Transcription

Here's how to transcribe audio files with the VoiceHarbor API.

Quickstart

import json
import os
import time
import nijtaio

TOKEN = '<token>'
API_URL = 'https://api.nijta.com'
headers = {'Content-Type': 'application/json; charset=utf-8', 'TOKEN': TOKEN}
params = {
  'language': 'fr'
}
  
# Send a batch of audio files for processing
response = nijtaio.send_request(
    ['path/to/audio_1.wav', 'path/to/audio_2.mp3', 'path/to/audio_3.ogg'],
    params,
    nijtaio.session(TOKEN, api_url=API_URL),
    task='ASR',
    headers=headers,
    api_url=API_URL
)
 
task_id = json.loads(response.content)['data']['task_id']
print(f'{task_id = }')

print('Waiting for the batch to be processed.')
status = ''
while status != 'finished':
    time.sleep(1)
    status, transcriptions = nijtaio.read_response(task_id, api_url=API_URL)
    print(f'{status = }', end='\r')
print()

# Print one line per file and channel
for original_filepath in transcriptions:
    filename = os.path.basename(original_filepath)
    for channel in transcriptions[original_filepath]:
        print(filename, channel, transcriptions[original_filepath][channel]['transcription'])
print('Done.')


Step by step

Follow the steps below to transcribe audio files with the API:

  1. Setup and Configuration: Import necessary libraries and set up your API token and URL.

    import json
    import time
    import nijtaio
    
    TOKEN = '<token>'
    API_URL = 'https://api.nijta.com'
    headers = {'Content-Type': 'application/json; charset=utf-8', 'TOKEN': TOKEN}
    params = {
      'language': 'fr'
    }
    

Parameters:

  • language: if no language is given, it is detected automatically.
  • words: set this parameter to True to get the timestamps of each word along with the transcription, when the content parameter is True. For stereo files, you will also get the timestamped sequences of the dialogue.
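As a minimal sketch of how these parameters could be assembled, the hypothetical helper below (`build_params` is not part of `nijtaio`) leaves `language` out when it should be auto-detected and always sends `words` explicitly. The `content` parameter mentioned above is not described in this section, so it is only passed through via `extra` if you supply it.

```python
def build_params(language=None, words=False, **extra):
    """Build the params dict passed to the API (hypothetical helper).

    language: language code such as 'fr'; omitted from the dict when None
              so that the language is detected.
    words:    True to request word-level timestamps.
    extra:    any other parameters (for example content), passed through as-is.
    """
    params = {'words': bool(words)}
    if language is not None:
        params['language'] = language
    params.update(extra)
    return params


# French audio with word timestamps
print(build_params(language='fr', words=True))

# Let the language be detected
print(build_params())
```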

  2. Send Audio: Pass a list of audio files.

    # Send a batch of audio files for processing
    response = nijtaio.send_request(
        ['path/to/audio_1.wav', 'path/to/audio_2.mp3', 'path/to/audio_3.ogg'],
        params,
        nijtaio.session(TOKEN, api_url=API_URL),
        task='ASR',
        headers=headers,
        api_url=API_URL
    )
     
    task_id = json.loads(response.content)['data']['task_id']
    print(f'{task_id = }')
    

  3. Wait for Processing to Complete: Call the API to check the status of the batch processing.

    print('Waiting for the batch to be processed.')
    status = ''
    
    while status != 'finished':
        time.sleep(1)
        status, transcriptions = nijtaio.read_response(task_id, api_url=API_URL)
        print(f'{status = }', end='\r')
    print()
    

📘

Good to know

The result is kept only for 5 seconds on our server.

  4. Retrieve and Display Results: Once processing is complete, retrieve and print the transcriptions.

    # Each file maps to its channels ("mono", or "0" and "1" for stereo)
    for original_filepath in transcriptions:
        for channel in transcriptions[original_filepath]:
            print(original_filepath, channel, transcriptions[original_filepath][channel]['transcription'])
    
    print('Done.')
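The polling loop in the step above never exits if the task does not reach the 'finished' status. One possible variant, sketched below, adds a timeout. `wait_for_result` is a hypothetical helper, not part of `nijtaio`; it only assumes a callable that returns a `(status, result)` pair, as `nijtaio.read_response` does in the examples above.

```python
import time


def wait_for_result(fetch, timeout=60.0, interval=1.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Poll fetch() until it reports 'finished' or the timeout expires.

    fetch:    callable returning a (status, result) tuple.
    timeout:  maximum number of seconds to wait.
    interval: pause between two polls, in seconds.
    """
    deadline = clock() + timeout
    while True:
        status, result = fetch()
        if status == 'finished':
            return result
        if clock() >= deadline:
            raise TimeoutError(f'task not finished after {timeout} s (last status: {status!r})')
        sleep(interval)


# Usage with the calls shown above (requires a real task_id):
# transcriptions = wait_for_result(
#     lambda: nijtaio.read_response(task_id, api_url=API_URL),
#     timeout=120,
# )
```

Since the result is kept only for a short time on the server, keeping `interval` well below that retention time seems the safer choice.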
    

Expected Result

Mono file with words=False:

{
  "path/to/audio_1.wav": {
    "mono": {
      "language": "en",
      "transcription": " I will be out of the office on Thursday. I will talk to you on F
    }
  }
}

Mono file with words=True:

{
  "path/to/audio_1.wav": {
    "mono": {
      "language": "en",
      "transcription": " I will be out of the office on Thursday. I will talk to you on F
      "words": [
        {
          "end": 1.14,
          "start": 0.56,
          "text": " I"
        },
        [...]
        {
          "end": 4.86,
          "start": 4.52,
          "text": " Friday"
        },
        [...]
      ]
    }
  }
}
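Word timestamps make it possible to pull out the text spoken in a given time range. The sketch below is one way to do that on the client side; `text_between` is a hypothetical helper, and the sample list reuses two of the word entries shown above.

```python
def text_between(words, start, end):
    """Return the text of the words that lie entirely inside [start, end].

    words: list of {'start': float, 'end': float, 'text': str} entries,
           as found under the "words" key of a channel.
    """
    selected = [w for w in words if w['start'] >= start and w['end'] <= end]
    # Each entry carries its own leading space, so a plain join is enough.
    return ''.join(w['text'] for w in selected).strip()


words = [
    {'end': 1.14, 'start': 0.56, 'text': ' I'},
    {'end': 4.86, 'start': 4.52, 'text': ' Friday'},
]
print(text_between(words, 4.0, 5.0))
```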

Stereo file with words=False:

{
  "path/to/audio_2.wav": {
    "0": {
      "language": "fr",
      "transcription": " Oui oui je vous écoute. Non, j'ai déjà un abonnement. Au revoir."
    },
    "1": {
      "language": "fr",
      "transcription": " Bonjour monsieur, avez-vous quelques minutes à m'accorder s'il vous plaît ?"
    }
  }
}
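Mono and stereo results share the same shape: file path, then channel key ("mono", "0" or "1"), then the payload. A minimal sketch of walking that structure uniformly follows; `iter_transcriptions` is a hypothetical helper, and the sample dictionary copies the stereo result shown above.

```python
import os


def iter_transcriptions(results):
    """Yield (filename, channel, language, text) for every channel of every file."""
    for filepath, channels in results.items():
        for channel, payload in channels.items():
            yield (
                os.path.basename(filepath),
                channel,
                payload['language'],
                payload['transcription'].strip(),  # drop the leading space
            )


results = {
    'path/to/audio_2.wav': {
        '0': {
            'language': 'fr',
            'transcription': " Oui oui je vous écoute. Non, j'ai déjà un abonnement. Au revoir.",
        },
        '1': {
            'language': 'fr',
            'transcription': " Bonjour monsieur, avez-vous quelques minutes à m'accorder s'il vous plaît ?",
        },
    }
}

for row in iter_transcriptions(results):
    print(row)
```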

Stereo file with words=True:

{
  'path/to/audio_2.wav': {
    '0': {
      'language': 'fr', 
      'transcription': " Oui oui je vous écoute. Non, j'ai déjà un abonnement. Au revoir.", 
      'words': [
        {'end': 12.52, 'start': 12.14, 'text': ' Oui'}, 
        {'end': 12.6, 'start': 12.52, 'text': 'oui'}, 
        {'end': 12.72, 'start': 12.6, 'text': ' je'}, 
        {'end': 13.32, 'start': 12.72, 'text': ' vous'}, 
        {'end': 19.8, 'start': 19.56, 'text': ' écoute'}, 
        {'end': 19.88, 'start': 19.8, 'text': '.'}, 
        {'end': 19.96, 'start': 19.88, 'text': ' Non'}, 
        {'end': 20.08, 'start': 19.96, 'text': ' ,'}, 
        {'end': 20.3, 'start': 20.08, 'text': " j'ai"}, 
        {'end': 20.5, 'start': 20.3, 'text': ' déjà'}, 
        {'end': 20.68, 'start': 20.5, 'text': ' un'}, 
        {'end': 20.8, 'start': 20.68, 'text': ' abonnement'}, 
        {'end': 20.84, 'start': 20.8, 'text': '.'}, 
        {'end': 20.9, 'start': 20.84, 'text': ' Au'}, 
        {'end': 21.14, 'start': 20.9, 'text': ' revoir'}, 
        {'end': 21.34, 'start': 21.14, 'text': '.'}, 
      ]
    }, 
    '1': {
      'language': 'fr', 
      'transcription': " Bonjour monsieur, avez-vous quelques minutes à m'accorder s'il vous plaît ?", 
      'words': [
        {'end': 1.11, 'start': 0.37, 'text': ' Bonjour'}, 
        {'end': 1.51, 'start': 1.11, 'text': ' monsieur'}, 
        {'end': 2.71, 'start': 2.31, 'text': ','}, 
        {'end': 2.93, 'start': 2.71, 'text': ' avez'}, 
        {'end': 3.29, 'start': 2.93, 'text': ' vous'}, 
        {'end': 3.87, 'start': 3.29, 'text': ' quelques'}, 
        {'end': 4.19, 'start': 3.87, 'text': ' minutes'}, 
        {'end': 4.35, 'start': 4.19, 'text': ' à'}, 
        {'end': 4.77, 'start': 4.35, 'text': " m'accorder"}, 
        {'end': 5.35, 'start': 4.77, 'text': " s'il"}, 
        {'end': 5.49, 'start': 5.35, 'text': ' vous'}, 
        {'end': 5.79, 'start': 5.49, 'text': ' plaît'}, 
        {'end': 6.01, 'start': 5.79, 'text': ' ?'}
      ]
    }
  }
}
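The sample above lists the words of each channel separately. If you need a single time-ordered view of the conversation, one possible client-side approach is to sort all words by start time and group consecutive words from the same channel into turns. `dialogue_turns` is a hypothetical helper, and the sample input is an abbreviated copy of the result above.

```python
def dialogue_turns(channels):
    """Merge per-channel word lists into time-ordered speaker turns.

    channels: mapping of channel key ('0', '1') to a payload with a 'words' list.
    Returns a list of {'channel', 'start', 'end', 'text'} dicts.
    """
    # Flatten every word of every channel, then order by start time.
    words = sorted(
        (
            (w['start'], w['end'], channel, w['text'])
            for channel, payload in channels.items()
            for w in payload.get('words', [])
        ),
        key=lambda item: item[0],
    )
    turns = []
    for start, end, channel, text in words:
        if turns and turns[-1]['channel'] == channel:
            # Same speaker keeps talking: extend the current turn.
            turns[-1]['end'] = end
            turns[-1]['text'] += text
        else:
            turns.append({'channel': channel, 'start': start, 'end': end, 'text': text})
    for turn in turns:
        turn['text'] = turn['text'].strip()
    return turns


channels = {
    '0': {'words': [
        {'end': 20.9, 'start': 20.84, 'text': ' Au'},
        {'end': 21.14, 'start': 20.9, 'text': ' revoir'},
        {'end': 21.34, 'start': 21.14, 'text': '.'},
    ]},
    '1': {'words': [
        {'end': 1.11, 'start': 0.37, 'text': ' Bonjour'},
        {'end': 1.51, 'start': 1.11, 'text': ' monsieur'},
    ]},
}

for turn in dialogue_turns(channels):
    print(turn['channel'], turn['start'], turn['end'], turn['text'])
```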