Transcribe multilingual audio files
Transcribe fast, in multiple languages.
Speech to Text
Our advanced ASR technology is built on the robust foundation of OpenAI’s Whisper, known for its exceptional performance in multilingual speech recognition. However, we’ve significantly enhanced its capabilities with in-house innovations, including the implementation of phonetic time-stamps. These detailed markers provide an extra layer of precision by capturing the timing of specific phonetic elements within the audio, enabling more granular analysis and synchronization.
Our ASR component supports up to 49 languages and handles code-switching, so it can transcribe audio that blends multiple languages. Built-in automatic language detection means users do not have to specify the language manually, which streamlines the workflow, and precise time-stamps make it easy to navigate and review audio content.
Audio transcription without PHI reduction
By default, transcription runs under the protect task, which applies PHI reduction. To get a transcript without any reduction, set the task parameter to transcribe.
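As a rough illustration of the task switch described above, the sketch below builds the parameters for a request. Only the parameter name task and the values protect and transcribe come from this page; the function name, the dictionary shape, and the validation step are assumptions made for illustration, not the service's actual client API.

```python
# Hypothetical helper: the real client, endpoint, and payload layout may differ.
# Only the "task" parameter and its two values are taken from the documentation.
VALID_TASKS = {"protect", "transcribe"}


def build_request_params(task: str = "protect") -> dict:
    """Return request parameters for a transcription job.

    "protect" (the default) transcribes with PHI reduction applied;
    "transcribe" returns the transcript without any reduction.
    """
    if task not in VALID_TASKS:
        raise ValueError(f"task must be one of {sorted(VALID_TASKS)}, got {task!r}")
    return {"task": task}


# Plain transcription, without PHI reduction.
params = build_request_params(task="transcribe")
```

Keeping protect as the default in the helper mirrors the documented behavior, so a caller has to opt out of PHI reduction explicitly.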
Transcribe multilingual audio data
Set target transcription language
Output example
Benchmarks for the 49 supported languages
Rank | Code | Language | WER (%) on FLEURS
---|---|---|---
1 | es | Spanish | 3.0 |
2 | it | Italian | 4.0 |
3 | en | English | 4.2 |
4 | pt | Portuguese | 4.3 |
5 | de | German | 4.5 |
6 | ja | Japanese | 5.0 |
7 | pl | Polish | 5.6 |
8 | ru | Russian | 5.6 |
9 | nl | Dutch | 6.1 |
10 | id | Indonesian | 6.4 |
11 | fr | French | 7.1 |
12 | tr | Turkish | 7.3 |
13 | sv | Swedish | 8.1 |
14 | uk | Ukrainian | 8.3 |
15 | ms | Malay | 8.7 |
16 | no | Norwegian | 9.1 |
17 | fi | Finnish | 9.2 |
18 | vi | Vietnamese | 10.9 |
19 | th | Thai | 11.5 |
20 | el | Greek | 13.0 |
21 | cs | Czech | 13.4 |
22 | hr | Croatian | 13.9 |
23 | tl | Tagalog | 14.3 |
24 | da | Danish | 14.3 |
25 | ko | Korean | 14.4 |
26 | ro | Romanian | 14.6 |
27 | bg | Bulgarian | 14.7 |
28 | zh | Chinese | 15.6 |
29 | ht | Haitian Creole | 16.1 |
30 | mk | Macedonian | 17.5 |
31 | hi | Hindi | 21.5 |
32 | et | Estonian | 21.9 |
33 | ur | Urdu | 23.1 |
34 | fa | Persian | 23.4 |
35 | lt | Lithuanian | 24.2 |
36 | az | Azerbaijani | 27.1 |
37 | he | Hebrew | 27.7 |
38 | hy | Armenian | 28.1 |
39 | be | Belarusian | 31.3 |
40 | af | Afrikaans | 31.8 |
41 | sq | Albanian | 32.7 |
42 | sk | Slovak | 33.9 |
43 | sr | Serbian | 34.7 |
44 | kk | Kazakh | 37.7 |
45 | kn | Kannada | 38.1 |
46 | bn | Bengali | 39.7 |
47 | mr | Marathi | 40.9 |
48 | eu | Basque | 44.3 |
49 | ne | Nepali | 45.4 |
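For readers unfamiliar with the metric in the table, word error rate (WER) is the word-level edit distance between a reference transcript and the system output, divided by the number of reference words. The sketch below is a minimal, generic implementation of that standard definition. It is not the evaluation script behind the figures above, and it assumes both texts have already been normalized (casing, punctuation) and can be split on whitespace, which does not hold for languages written without spaces.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER in percent: (substitutions + deletions + insertions) / reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from output
                curr[j - 1] + 1,     # insertion: extra word in output
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return 100.0 * prev[-1] / len(ref)


# One substitution ("has" -> "had") and one deletion ("a") over 5 reference words.
print(word_error_rate("the patient has a fever", "the patient had fever"))  # 40.0
```

Because the denominator is the reference length, WER can exceed 100% when the output contains many inserted words.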
Good news to share with you!
Changelog
Looking ahead, we are refining our model on an extensive body of medical data to eliminate hallucinations and improve reliability in demanding environments. The Q2 release will improve the recognition of complex medical terminology and significantly reduce transcription errors, so that results are both accurate and reliable. Whether you need rapid transcription for global communications or precise documentation in critical healthcare settings, our ASR component is designed to deliver.