VoiceHarbor API

powered by Nijta

😎

Become an anonymization alchemist!

The code below demonstrates how to use our VoiceHarbor API for speech anonymization. It imports the necessary modules, sets parameters such as language and gender, establishes a session with the API, sends audio files for processing, and waits for processing to complete. The script then saves the anonymized audio files and transcriptions in an output folder, using the specified parameters. Finally, it reports the processing status and the location of the results.

import os
import json
import time
import nijtaio  # pip install nijtaio

# Replace '<token>' with your actual token
TOKEN = '<token>'
API_URL = 'https://api.nijta.com/'

# Set up headers for the API request
headers = {
    "Content-Type": "application/json; charset=utf-8",
    "TOKEN": TOKEN
}

# Parameters for the API request
params = {
    'language': 'french',
    'gender': 'f',
    'voice': True,
    'content': True,
    'entities': 'Name,Organization,Location,City,Country,Numbers,Age,Date,Credit Card Number,Email,Concept,Product,Event,Technology,Group,Medical Condition,Characteristic,Research,County',
    'regex_entities': [('[\w\-\.]+@([\w-]+\.)+[\w-]{2,4}', 'email'),],
}

# Output folder where the results will be stored
output_folder = 'output'

# Start a new session with the provided token
session_id = nijtaio.session(TOKEN, api_url=API_URL)

# Send a batch of audio files for processing
response = nijtaio.send_request(
    ["path/to/audio_1.wav", "path/to/audio_2.wav"],
    params,
    session_id,
    headers=headers,
    api_url=API_URL
)

# Extract the task ID from the response
task_id = json.loads(response.content)['data']['task_id']

# Monitor the processing status of the batch
print('Waiting for the batch to be processed.')
while True:
    time.sleep(1)
    status, anonymized_batch = nijtaio.read_response(task_id, api_url=API_URL)
    if status == 'finished':
        break

# Process and save the anonymized results
print(f'Writing results in {output_folder}.')
os.makedirs(output_folder, exist_ok=True)
for original_filepath in anonymized_batch:
    filename = os.path.basename(original_filepath)
    # Save anonymized audio
    with open(os.path.join(output_folder, filename), mode='wb') as f:
        f.write(anonymized_batch[original_filepath]['audio'])
    if params['content']:
        # Print transcription if content parameter is True
        print(filename, anonymized_batch[original_filepath]['transcription'])

# Print completion message
print(f'Done. Check the results in the {output_folder} directory.')

Speech Anonymization

Requirements

Import necessary libraries.

πŸ“˜

Install NijtaIO

The NijtaIO module streamlines the process of working with audio datasets. It allows users to quickly convert audio data and related details into a format suitable for sending to the VoiceHarbor API. This simplifies the interaction with audio data and facilitates efficient communication with our API, making the workflow faster and more user-friendly.

pip install nijtaio

import nijtaio

Fill in your token and select the output folder where the output files will be stored.

TOKEN = '<token>'
API_URL = 'https://api.nijta.com/'
headers = {'Content-Type': 'application/json; charset=utf-8', 'TOKEN': TOKEN}
output_folder = 'output'

Build request parameters

Supported parameters:

  • language: 'french', 'english'. Only French and English models are available in this version.
  • gender: (optional) choose a gender for the target "pseudo" speakers (values: 'f' or 'm'). If no value is passed, it will be chosen randomly.
  • robotic: (optional) By default, the original variations of the pitch are preserved. With this option turned on, they are removed and the result sounds robotic (values: True, False. Default is False).
  • seed: (optional) you can set a seed for reproducibility during evaluation (not recommended in production).
  • voice: Set this parameter to True if you wish to transform the original voice.
  • content: Set this parameter to True if you wish to remove the sensitive content from the audio and the transcription, based on the categories passed in 'entities'.
  • entities: Give the list of categories of entities you want to hide. Some examples: Name, Organization, Location, City, Country, Numbers, Age, Date, Credit Card Number, Email, Concept, Product, Event, Technology, Group, Medical Condition, Characteristic, Research, County, ...
  • ner_threshold: (optional) Our NER hides entities based on a score. Depending on your needs, you might want to mask more or fewer entities than the default. Set a value between 0 and 1 (default is 0.5): closer to 0 hides more entities, while closer to 1 hides only highly scored ones.
  • regex_entities: (optional) In addition to NER, you can also mask entities based on regular expressions. Format: add the key 'regex_entities' to the params with a value of the form [('regex1', 'tag1'), ('regex2', 'tag2'), ...]. For example, to mask emails based on a regex: params = { 'regex_entities': [('[a-zA-Z0-9_.]+[@]{1}[a-z0-9]+[\.][a-z]+', 'email'),] }. Any email address matching the regex will be replaced with the tag <email>.
  • mask: If the content parameter is set to True, specify how you would like to conceal the content. Choose between "silence" (default) or "beep".
  • words: Set this parameter to True to get the timestamps of each word with the transcription, when the content parameter is True. If the audio is stereo, you will also get the sequential dialogue in the output.
  • separate: (optional) Set this parameter to True to apply speaker separation to your input files before applying anonymization. This is useful for mono files with 2 speakers. If the input file is stereo, no separation will be applied. Check the Speaker Separation section below for more information.

Supported languages:

  • English: voice anonymization: yes; content anonymization: yes; value of the language parameter: 'english'.
  • French: voice anonymization: yes; content anonymization: yes; value of the language parameter: 'french'.
  • Multilingual: voice anonymization: coming soon; content anonymization: yes.

To use voice anonymization only, set your parameters as follows:

params = {
    'language':'french',
    'gender':'f', 
    'voice':True
}

To also include content anonymization, set your parameters like this:

params = {
    'language':'french',
    'gender':'f',
    'robotic':False,
    'mask':'silence',
    'voice':True,
    'content':True,
    'entities': 'Name,Organization,Location,City,Country,Numbers,Age,Date,Credit Card Number,Email,Concept,Product,Event,Technology,Group,Medical Condition,Characteristic,Research,County',
    'regex_entities': [('[\w\-\.]+@([\w-]+\.)+[\w-]{2,4}', 'email'),],
    'words': True
}

Build payload and submit job

πŸ“˜

Optional: credit checking

Check your remaining credit (in minutes) manually by visiting the following page:

https://api.nijta.com/credit/<token>

curl https://api.nijta.com/credit/<token>
# {"minute":"1000.3"}
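The same check can be done from Python. Below is a minimal sketch using only the standard library; the endpoint and response shape are taken from the curl example above, and the `parse_credit` and `remaining_minutes` helpers are illustrative, not part of nijtaio.

```python
import json
from urllib.request import urlopen

def parse_credit(payload: dict) -> float:
    """Extract the remaining minutes from a credit response such as {"minute": "1000.3"}."""
    return float(payload['minute'])

def remaining_minutes(token: str, api_url: str = 'https://api.nijta.com') -> float:
    """Query the credit endpoint and return the remaining minutes."""
    with urlopen(f'{api_url}/credit/{token}', timeout=10) as response:
        return parse_credit(json.loads(response.read()))
```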

Supported inputs

You have multiple ways to forward your files to the API.

  • URLs (multiple): a list of audio URLs (e.g., "https://foo.bar/audio_1", "https://foo.bar/audio_2", etc.).
  • URL (single): a single audio URL ("https://foo.bar/audio_1").
  • Files (multiple): a list of local audio files ("path/to/audio_1.wav", "path/to/audio_2.wav", etc.).
  • Files (single): a single local audio file ("path/to/audio_1").
  • Directory: a local directory containing audio files ("path/to/audio_folder").
  • Archive: an archive containing audio files.
  • S3 bucket: the URL of an S3 bucket containing audio files: "s3://my-bucket/". You then have to pass your AWS credentials (see example below).

Send request

To do so, get a session ID by calling the NijtaIO session function:

session_id = nijtaio.session(TOKEN, api_url=API_URL)  # preliminary token and credit check

Read and load your files, then send your request to the VoiceHarbor API using the NijtaIO send_request function.

## url(s)
response = nijtaio.send_request(
    ['https://foo.bar/audio_1', 'https://foo.bar/audio_2'],
    params,
    session_id,
    headers=headers,
    api_url=API_URL,
)
## file(s)
response = nijtaio.send_request(
    ['path/to/audio_1.wav', 'path/to/audio_2.wav'],
    params,
    session_id,
    headers=headers,
    api_url=API_URL,
)
## directory
response = nijtaio.send_request(
    'path/to/audio_folder', params, session_id, headers=headers, api_url=API_URL
)
## archive
response = nijtaio.send_request(
    'https://s3.amazonaws.com/datasets.huggingface.co/SpeechCommands/v0.01/v0.01_test.tar.gz',
    params,
    session_id,
    headers=headers,
    api_url=API_URL,
)

## S3 bucket
storage_options = {
    'key': '<AWS_ACCESS_KEY_ID>',
    'secret': '<AWS_SECRET_ACCESS_KEY>',
}
response = nijtaio.send_request(
    's3://my-bucket/',
    params,
    session_id,
    headers=headers,
    api_url=API_URL,
    storage_options=storage_options,
)

❗️

Limitations

File Size Limit: Audio files should not exceed 500 MB in size. The API will not accept files larger than this limit.

Supported Audio Formats: An audio file batch is organized as a dictionary, using filenames as keys and binary content as values. The supported audio file formats are 'wav', 'mp3', 'ogg', and 'flac'. If an incompatible file is encountered, its name will be included in the failed_files list within the response.

License-based API call limits: Our API accommodates user-initiated submissions of multiple batches, with up to 5 (Foundation Plan) individual files per batch. Additionally, the system is engineered to manage concurrent requests, allowing a maximum of 3 (Foundation Plan) simultaneous requests per user.
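These limits can be respected client-side by chunking the file list before calling send_request. Here is a sketch; the `chunk` helper and the `BATCH_SIZE` constant are illustrative, so adjust them to your plan.

```python
BATCH_SIZE = 5  # Foundation Plan: at most 5 files per batch

def chunk(files, batch_size=BATCH_SIZE):
    """Split a list of file paths into batches of at most batch_size files."""
    for i in range(0, len(files), batch_size):
        yield files[i:i + batch_size]

files = [f'path/to/audio_{n}.wav' for n in range(12)]
for batch in chunk(files):
    # Each batch (here: 5, 5 and 2 files) can be passed to nijtaio.send_request,
    # keeping at most 3 submissions in flight at any time.
    print(len(batch))
```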

πŸ“˜

Batch Processing

Our API is structured to efficiently handle a substantial volume of files beyond the above-mentioned capacity. To process a greater number of files, reach out to us at [email protected] and explore the option of upgrading your current plan.

Success Response

Code: 200 (OK)

Content: a JSON object with information to monitor the status of the task:

  • task_id: the id of the created task, which can be passed to monitor the status of the task and to retrieve the anonymized content when finished
  • submission_status: the status of the task
  • submission_time: the creation time of the task
  • failed_files: the files that couldn't be included in the task, if any

Error Response

  • 400: "failed, no valid files" if none of the files in the batch is valid.
  • 429: "failed, too many requests" if the number of requests exceeds the authorized limit.
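A simple client-side way to handle these responses is to retry with backoff on 429 and fail fast otherwise. The sketch below is illustrative, not part of nijtaio: `submit_with_retry` wraps any zero-argument function that performs the submission, such as a lambda around nijtaio.send_request.

```python
import json
import time

def submit_with_retry(send, max_retries=5, backoff=2.0):
    """Call send() and return the task_id; retry with exponential backoff on HTTP 429."""
    for attempt in range(max_retries):
        response = send()
        if response.status_code == 200:
            return json.loads(response.content)['data']['task_id']
        if response.status_code == 429:
            # "failed, too many requests": wait and try again
            time.sleep(backoff * 2 ** attempt)
            continue
        raise RuntimeError(f'Submission failed: {response.status_code} {response.content!r}')
    raise RuntimeError('Gave up after repeated 429 responses')
```

For example: submit_with_retry(lambda: nijtaio.send_request(files, params, session_id, headers=headers, api_url=API_URL)).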

Monitor Job

πŸ“˜

Status & Result

Wait for the job to finish by checking the job status with the task_id:

task_id = json.loads(response.content)['data']['task_id']
print('Waiting for the batch to be processed.')
while True:
    time.sleep(1)
    status, anonymized_batch = nijtaio.read_response(task_id, api_url=API_URL)
    if status == 'finished':
        break

πŸ“˜

Monitor Status

Check your job status by logging "task_status". Modify the loop as follows to monitor the status continuously:

import logging
import requests

while True:
    time.sleep(1)
    response = requests.get('{}/{}'.format(API_URL, task_id))
    content = json.loads(response.content)
    logging.info(content['data']['task_status'])
    ...
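If you want the polling loop to give up eventually, you can wrap it with a deadline. The helper below is a sketch, not part of nijtaio; its defaults are assumptions, and it takes nijtaio.read_response as an argument so it can be tested in isolation.

```python
import time

def wait_for_batch(read_response, task_id, api_url, timeout=600, interval=1):
    """Poll until the task is finished or timeout seconds have elapsed."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status, batch = read_response(task_id, api_url=api_url)
        if status == 'finished':
            return batch
        time.sleep(interval)
    raise TimeoutError(f'Task {task_id} not finished after {timeout}s')
```

Usage: anonymized_batch = wait_for_batch(nijtaio.read_response, task_id, API_URL).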

Download result

The anonymized_batch dictionary serves as a container for processed data, where original file paths are keys, and their corresponding anonymized content is stored as values.

  • Audio Content:
    When voice or content is set to True, anonymized audio content can be accessed via the audio key.
  • Transcriptions:
    If content=True, additional data beyond audio is included:
    • Anonymized transcription: Accessible via the 'transcription' key.
    • Original transcription: Accessible under 'original' -> 'text'.
  • Word-Level Timestamps:
    When both content=True and words=True, each word's timestamp can be retrieved via the words key.
  • Sequential Dialogue: (requires nijtaio v1.1.8+)
    If content=True, words=True, and the audio is stereo, the sequential dialogue is available under the sequence key.
  • Report Summary: (requires nijtaio v1.1.9+)
    For each file, a report key provides useful data about the original content, including:
    • A list of personally identifiable information (PII) categories detected in the contents, independently of the categories passed in parameters to inform you about the categories you could want to add.
    • The legal frameworks associated with the detected PII (e.g., GDPR, CCPA).
    • A classification of the document based on ISO 27001 (e.g., Confidential, Restricted).
    • Metadata such as sampling rate, duration, and MOS (Mean Opinion Score) for audio quality.
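For instance, the per-file reports can be aggregated to see which PII categories occur most often across a batch. This is a sketch over the report structure described above; `aggregate_piis` is an illustrative helper.

```python
from collections import Counter

def aggregate_piis(anonymized_batch):
    """Sum the detected PII categories across all files in a batch."""
    totals = Counter()
    for result in anonymized_batch.values():
        totals.update(result.get('report', {}).get('piis', {}))
    return totals

batch = {
    'a.wav': {'report': {'piis': {'Location': 2, 'Person': 2}}},
    'b.wav': {'report': {'piis': {'Person': 1, 'Email': 3}}},
}
print(aggregate_piis(batch))  # Person: 3, Email: 3, Location: 2
```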

Here's an example of what the anonymized_batch dictionary looks like:

{
  'path/to/audio.wav': {
    'audio': b'RIFFb\x12\x03\x00WAVEfmt \x10\x00\[...]\xff\xeb\xff\xe6\xff',
    'transcription': 'Hey my name is <Name> I am from <City> ...',
    'language': 'en', 
    'words': [{'end': 1.14, 'start': 0.56, 'text': ' Hey'},
             {'end': 1.32, 'start': 1.14, 'text': ' my'},
             {'end': 1.48, 'start': 1.32, 'text': ' name'},
             {'end': 1.76, 'start': 1.48, 'text': ' is'},
             {'end': 1.92, 'start': 1.76, 'text': ' <Name>'},
             [...]],
    'report': {
		  'piis': {'Location': 2, 'Person': 2}, 
		  'regulations': ['CCPA (California)', 'PIPEDA (Canada)', 'LGPD (Brazil)', 'GDPR (EU)'], 
		  'sensitivity': 'Restricted', 
		  'infos': {
			  'sampling_rate': 16000, 
			  'sample_width': 2, 
			  'format_': 'wav', 
			  'duration': 6.315, 
			  'channels': 1, 
			  'mos': 0.8919036388397217
      }
	  }, 
  }
}

In the provided code snippet, a loop iterates over the keys of the anonymized_batch dictionary: the anonymized audio content is saved to the output folder and, if content is True, the anonymized transcription is printed.

# Read the result and write in a file
print(f'Writing results in {output_folder}.')
os.makedirs(output_folder, exist_ok=True)

for original_filepath in anonymized_batch:
    filename = os.path.basename(original_filepath)
    with open(os.path.join(output_folder, filename), mode='wb') as f:
        f.write(anonymized_batch[original_filepath]['audio'])
    if params['content']:
        print(filename, anonymized_batch[original_filepath]['transcription'])
print(f'Done. Check the results in the {output_folder} directory.')
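If you also want the transcriptions, word timestamps, and reports on disk, the loop can be extended to write every non-audio field as a JSON sidecar file. This is a sketch; the <name>.json naming convention is an assumption, not an API feature.

```python
import json
import os

def write_sidecars(anonymized_batch, output_folder):
    """Write each file's non-audio results (transcription, words, report, ...)
    next to the anonymized audio as <name>.json."""
    os.makedirs(output_folder, exist_ok=True)
    for original_filepath, result in anonymized_batch.items():
        name = os.path.splitext(os.path.basename(original_filepath))[0]
        # Keep everything except the raw audio bytes, which are not JSON-serializable
        sidecar = {key: value for key, value in result.items() if key != 'audio'}
        with open(os.path.join(output_folder, f'{name}.json'), 'w', encoding='utf-8') as f:
            json.dump(sidecar, f, ensure_ascii=False, indent=2)
```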

In the transcriptions, the anonymization process replaces sensitive information, such as personal names, locations, birth dates, and credit card numbers, with placeholders (e.g., <Name>, <City>, <Number>).

πŸ“˜

Example output

# Hey my name is <Name> I am from <City> I was born on <Number> March <Number> My credit card number is <Number>

CLI

nijtaio also comes with a voiceharbor command, providing another option for interacting with the Voice Harbor API. With just a few command-line arguments, you can anonymize and process audio files efficiently. The CLI simplifies submitting audio files for processing and retrieving anonymized results, without writing any code.

To use the CLI, simply invoke the voiceharbor command followed by the required arguments (run voiceharbor --help for the full list):

  • --token your API token
  • --input_data input audio file paths
  • --language choose between "english_16" and "french_8"
  • --gender the gender of the target speaker
  • --voice whether to anonymize the voice or not
  • --content whether to anonymize the content or not
  • --output_folder the folder where you want the results to be saved

The CLI handles communication with the Voice Harbor API and provides you with status updates as the processing takes place. Whether you are automating anonymization tasks or running one-off commands, the CLI offers a streamlined experience.

voiceharbor --token <token> \
  --input_data "[\"path/to/audio_1.wav\", \"path/to/audio_2.wav\"]" \
  --language english_16 \
  --gender m \
  --voice True \
  --content True \
  --output_folder path/to/folder

NER

You can choose to mask any category of entities. Here's a non-exhaustive list of categories considered as PII:

πŸ‘

Example PII:

Mobile number
IBAN
Location
Organization
Date and Time
Medical Conditions
Transportation
Landmarks and Attractions
Emergency Service Keywords
Credit card number
Health insurance number
Address
City
County
District
Borough
Age
Date
Birth date
CCV
Time
Emergency Type
Person
Numbers
Vehicle Description
Injured Person's Name
Medical Condition
Landmarks/Points of Interest
Intent

🚧

Generative Content Anonymization (WIP)

Currently, our system supports substituting sensitive entities with auditory cues such as beeps or periods of silence. We are actively developing a generative-AI-driven feature that will replace these entities with generated ones (audio + text) from the same semantic category.

This advancement aims to increase the utility of the anonymized speech.

Speaker Separation

Speaker separation is a feature within our API that isolates individual speakers recorded on one channel, in a mono audio file. The output is a stereo file where each speaker is in a separate channel, enabling better analysis of the dialogue. This is applied before anonymization. Below is an example of how you can use this feature in your code:

import os
import json
import time
import nijtaio  # pip install nijtaio

# Replace '<token>' with your actual token
TOKEN = '<token>'
API_URL = 'https://api.nijta.com/'

# Set up headers for the API request
headers = {
    "Content-Type": "application/json; charset=utf-8",
    "TOKEN": TOKEN
}

# Parameters for the API request
params = {
    'language': 'french_8',
    'gender': 'f',
    'voice': True,
    'content': True,
    'entities': 'Name,Organization,Location,City,Country,Numbers,Age,Date,Credit Card Number,Email,Concept,Product,Event,Technology,Group,Medical Condition,Characteristic,Research,County',
    'separate': True
}

# Output folder where the results will be stored
output_folder = 'output'

# Start a new session with the provided token
session_id = nijtaio.session(TOKEN, api_url=API_URL)

# Send a batch of audio files for processing
response = nijtaio.send_request(
    ["path/to/audio.wav"],  # audio.wav is a mono file with 2 speakers
    params,
    session_id,
    headers=headers,
    api_url=API_URL 
)

# Extract the task ID from the response
task_id = json.loads(response.content)['data']['task_id']

# Monitor the processing status of the batch
print('Waiting for the batch to be processed.')
while True:
    time.sleep(1)
    status, anonymized_batch = nijtaio.read_response(task_id, api_url=API_URL)
    if status == 'finished':
        break

This will give the following result in anonymized_batch:

{
  'path/to/audio.wav': {
    'audio': b'RIFFb\x12\x03\x00WAVEfmt \x10\x00\[...]\xff\xeb\xff\xe6\xff',
    'transcription': {
        '0': " Bonjour monsieur, je suis <Name> de <Organization>. Est-ce que vous avez cinqs minutes Γ  m'accorder ? [...]",
        '1': ' Bonjour <Name>, oui, je vous Γ©coute. [...]'
    }
  }
}
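Since the separated 'audio' is a stereo file with one speaker per channel, you may want to split it into two mono files for downstream tools. Here is a sketch using only the standard library wave module; it assumes PCM WAV output, and `split_channels` is an illustrative helper, not part of nijtaio.

```python
import wave

def split_channels(stereo_path, left_path, right_path):
    """De-interleave a stereo WAV file into two mono WAV files, one per speaker."""
    with wave.open(stereo_path, 'rb') as src:
        if src.getnchannels() != 2:
            raise ValueError('expected a stereo file')
        width = src.getsampwidth()
        rate = src.getframerate()
        frames = src.readframes(src.getnframes())
    left, right = bytearray(), bytearray()
    step = 2 * width  # one frame holds one sample per channel
    for i in range(0, len(frames), step):
        left += frames[i:i + width]
        right += frames[i + width:i + step]
    for path, data in ((left_path, left), (right_path, right)):
        with wave.open(path, 'wb') as dst:
            dst.setnchannels(1)
            dst.setsampwidth(width)
            dst.setframerate(rate)
            dst.writeframes(bytes(data))
```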

With words = True:

{
  'path/to/audio.wav': {
    'audio': b'RIFFb\x12\x03\x00WAVEfmt \x10\x00\[...]\xff\xeb\xff\xe6\xff',
    'transcription': {
        '0': " Bonjour monsieur, je suis <Name> de <Organization>. Est-ce que vous avez cinqs minutes Γ  m'accorder ? [...]",
        '1': ' Bonjour <Name>, oui, je vous Γ©coute. [...]'
    },
    'words': {
      "0": [
          {
              "end": 0.36,
              "start": 0.0,
              "text": " Bonjour"
          },
          {
              "end": 0.72,
              "start": 0.54,
              "text": " monsieur"
          },
          [...],
      ],
      "1": [
          {
              "end": 0.76,
              "start": 0.72,
              "text": " Bonjour,"
          },
          {
              "end": 0.76,
              "start": 0.72,
              "text": " <Name>,"
          },
          [...]
      ]
    },
    'sequence': [
      {
          "channel": "0",
          "text": " Bonjour monsieur, je suis <Name> de <Organization>. Est-ce que vous avez cinqs minutes Γ  m'accorder ?",
          "start": 0.0,
          "end": 9.4
      },
      {
          "channel": "1",
          "text": " Bonjour <Name>, oui, je vous Γ©coute.",
          "start": 9.4,
          "end": 9.4
      },
      {
          "channel": "0",
          "text": " Vous Γͺtes <Name> ?",
          "start": 9.48,
          "end": 10.22
      },
      [...]
    ]
  }
}
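The sequence list can be rendered as a readable dialogue for review. Below is a sketch; the speaker labels and the output format are illustrative choices, not part of the API.

```python
def format_dialogue(sequence, labels=None):
    """Render the sequential dialogue as '[start-end] speaker: text' lines."""
    labels = labels or {}
    lines = []
    for turn in sequence:
        speaker = labels.get(turn['channel'], f"Speaker {turn['channel']}")
        lines.append(f"[{turn['start']:.2f}-{turn['end']:.2f}] {speaker}: {turn['text'].strip()}")
    return '\n'.join(lines)

sequence = [
    {'channel': '0', 'text': ' Bonjour monsieur.', 'start': 0.0, 'end': 9.4},
    {'channel': '1', 'text': ' Bonjour <Name>.', 'start': 9.4, 'end': 9.4},
]
print(format_dialogue(sequence, labels={'0': 'Agent', '1': 'Caller'}))
# [0.00-9.40] Agent: Bonjour monsieur.
# [9.40-9.40] Caller: Bonjour <Name>.
```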