VoiceHarbor API

powered by Nijta

😎

Become an anonymization alchemist!

The code below demonstrates how to use the VoiceHarbor API for speech anonymization. It imports the necessary modules, sets parameters such as language and gender, establishes a session with the API, sends audio files for processing, and waits for processing to complete. The script then saves the anonymized audio files and transcriptions to an output folder. Finally, it reports the processing status and the location of the results.

import os
import json
import time
import nijtaio  # pip install nijtaio

# Replace '<token>' with your actual token
TOKEN = '<token>'
API_URL = 'https://api.nijta.com/'

# Set up headers for the API request
headers = {
    "Content-Type": "application/json; charset=utf-8",
    "TOKEN": TOKEN
}

# Parameters for the API request
params = {
    'language': 'french_8',
    'gender': 'f',
    'voice': True,
    'content': True,
    'entities': 'Name,Organization,Location,City,Country,Numbers,Age,Date,Credit Card Number,Email,Concept,Product,Event,Technology,Group,Medical Condition,Characteristic,Research,County',
    'regex_entities': [('[\w\-\.]+@([\w-]+\.)+[\w-]{2,4}', 'email')],
}

# Output folder where the results will be stored
output_folder = 'output'

# Start a new session with the provided token
session_id = nijtaio.session(TOKEN, api_url=API_URL)

# Send a batch of audio files for processing
response = nijtaio.send_request(
    ["path/to/audio_1.wav", "path/to/audio_2.wav"],
    params,
    session_id,
    headers=headers,
    api_url=API_URL
)

# Extract the task ID from the response
task_id = json.loads(response.content)['data']['task_id']

# Monitor the processing status of the batch
print('Waiting for the batch to be processed.')
while True:
    time.sleep(1)
    status, anonymized_batch = nijtaio.read_response(task_id, api_url=API_URL)
    if status == 'finished':
        break

# Process and save the anonymized results
print(f'Writing results in {output_folder}.')
os.makedirs(output_folder, exist_ok=True)
for original_filepath in anonymized_batch:
    filename = os.path.basename(original_filepath)
    # Save anonymized audio
    with open(os.path.join(output_folder, filename), mode='wb') as f:
        f.write(anonymized_batch[original_filepath]['audio'])
    if params['content']:
        # Print transcription if content parameter is True
        print(filename, anonymized_batch[original_filepath]['transcription'])

# Print completion message
print(f'Done. Check the results in the {output_folder} directory.')

Speech Anonymization

Requirements

Import necessary libraries.

πŸ“˜

Install NijtaIO

The NijtaIO module streamlines the process of working with audio datasets. It allows users to quickly convert audio data and related details into a format suitable for sending to the VoiceHarbor API. This simplifies the interaction with audio data and facilitates efficient communication with our API, making the workflow faster and more user-friendly.

pip install nijtaio

import nijtaio

Fill in your token and select an output folder where the output files will be stored.

TOKEN = '<token>'
API_URL = 'https://api.nijta.com/'
headers = {'Content-Type': 'application/json; charset=utf-8', 'TOKEN': TOKEN}
output_folder = 'output'

Build request parameters

Supported parameters:

  • language: 'french_8', 'english_16'. Only French/8kHz and English/16kHz are available in this version.
  • gender: choose a gender for the target "pseudo" speakers (values: 'f' or 'm').
  • robotic: (optional) By default, the original variations of the pitch are preserved. With this option turned on, they are removed and the result sounds robotic (values: True, False. Default is False).
  • seed: (optional) you can set a seed for reproducibility during evaluation (not recommended in production).
  • voice: (default True) Set this parameter to False if you don't wish to transform the original voice.
  • content: Set this parameter to True if you wish to remove sensitive content.
  • entities: Give the list of entity types you want to hide. Some examples: Name, Organization, Location, City, Country, Numbers, Age, Date, Credit Card Number, Email, Concept, Product, Event, Technology, Group, Medical Condition, Characteristic, Research, County, ...
  • ner_threshold: (optional) Our NER hides entities based on a score. Depending on your needs, you might want to mask more or fewer entities than the default. Set a value between 0 and 1 (default is 0.5): closer to 0 hides more entities, while closer to 1 only hides highly scored ones.
  • regex_entities: (optional) In addition to NER, you can also mask entities based on regular expressions.
    Format: add the key 'regex_entities' to the params with a value in the following format: [('regex1', 'tag1'), ('regex2', 'tag2'), ...]
    For example, to mask emails based on regex:
    params = { 'regex_entities': [('[a-zA-Z0-9_.]+[@]{1}[a-z0-9]+[\.][a-z]+', 'email'),] }
    Any email address matching the regex will be replaced with the tag <email>.
  • mask: If the content parameter is set to True, specify how you would like to conceal the content. Choose between "silence" (default) or "beep".
  • words: Set this parameter to True to get the timestamps of each word with the transcription, when the content parameter is True. If the audio is stereo, you will also get the sequential dialogue in the output.
  • separate: (optional) Set this parameter to True to apply speaker separation to your input files before applying anonymization. This is useful for mono files with 2 speakers. If the input file is stereo, no separation will be applied. Check the Speaker Separation section below for more information.

Supported languages:

Language       Voice Anonymization   Content Anonymization   Parameter
English        Yes                   Yes                     english_16
French         Yes                   Yes                     french_8
Multilingual   Coming Soon!          Yes

To use only the Voice Anonymization API, set your parameters as follows:

params = {
    'language':'french_8',
    'gender':'f', 
    'voice':True
}

To include content anonymization, modify your parameter settings as follows:

params = {
    'language': 'french_8',
    'gender': 'f',
    'robotic': False,
    'mask': 'silence',
    'voice': True,
    'content': True,
    'entities': 'Name,Organization,Location,City,Country,Numbers,Age,Date,Credit Card Number,Email,Concept,Product,Event,Technology,Group,Medical Condition,Characteristic,Research,County',
    'regex_entities': [('[\w\-\.]+@([\w-]+\.)+[\w-]{2,4}', 'email')],
    'words': True
}

Build payload and submit job

πŸ“˜

Optional: credit checking

Check your credit manually in seconds/minutes by visiting the following page:

https://api.nijta.com/credit/<token>

curl https://api.nijta.com/credit/<token>
# {"minute":"1000.3"}
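If you prefer to check credit from Python rather than curl, a minimal stdlib-only sketch could look like the following. The parse_credit and remaining_minutes helpers are our own, not part of nijtaio; only the endpoint URL and the {"minute": "..."} response shape come from the documentation above.

```python
import json
from urllib.request import urlopen

API_URL = 'https://api.nijta.com/'

def parse_credit(payload: str) -> float:
    """Extract the remaining minutes from the credit endpoint's JSON body."""
    # The endpoint returns e.g. {"minute": "1000.3"}
    return float(json.loads(payload)['minute'])

def remaining_minutes(token: str) -> float:
    """Query the credit endpoint and return the remaining credit in minutes."""
    with urlopen(f'{API_URL}credit/{token}') as response:
        return parse_credit(response.read().decode('utf-8'))
```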

Supported inputs

You have multiple ways to forward your files to the API.

  • URLs (Multiple): a list of audio URLs (e.g., "https://foo.bar/audio_1", "https://foo.bar/audio_2", etc.).
  • URL (Single): a single audio URL ("https://foo.bar/audio_1").
  • Files (Multiple): a list of local audio files ("path/to/audio_1.wav", "path/to/audio_2.wav", etc.).
  • Files (Single): a single local audio file ("path/to/audio_1").
  • Directory: a local directory containing audio files ("path/to/audio_folder").
  • Archive: an archive containing audio files.
  • S3 Bucket: the URL to an S3 bucket containing audio files ("s3://my-bucket/"). You then have to pass your AWS credentials (see example below).

Send request

To do so, get a session id by calling the NijtaIO session function:

session_id = nijtaio.session(TOKEN, api_url=API_URL)  # preliminary token and credit check

Read and load your files and send your request to the VoiceHarbor API using the NijtaIO send_request function.

## url(s)
response = nijtaio.send_request(
    ['https://foo.bar/audio_1', 'https://foo.bar/audio_2'],
    params,
    session_id,
    headers=headers,
    api_url=API_URL,
)
## file(s)
response = nijtaio.send_request(
    ['path/to/audio_1.wav', 'path/to/audio_2.wav'],
    params,
    session_id,
    headers=headers,
    api_url=API_URL,
)
## directory
response = nijtaio.send_request(
    'path/to/audio_folder', params, session_id, headers=headers, api_url=API_URL
)
## archive
response = nijtaio.send_request(
    'https://s3.amazonaws.com/datasets.huggingface.co/SpeechCommands/v0.01/v0.01_test.tar.gz',
    params,
    session_id,
    headers=headers,
    api_url=API_URL,
)

## S3 bucket
storage_options = {
    'key': '<AWS_ACCESS_KEY_ID>',
    'secret': '<AWS_SECRET_ACCESS_KEY>',
}
response = nijtaio.send_request(
    's3://my-bucket/',
    params,
    session_id,
    headers=headers,
    api_url=API_URL,
    storage_options=storage_options,
)

❗️

Limitations

File Size Limit: Audio files should not exceed 500 MB in size. The API will not accept files larger than this limit.

Supported Audio Formats: An audio file batch is organized as a dictionary, using filenames as keys and binary content as the corresponding values. The permissible audio file formats are 'wav', 'mp3', 'ogg', and 'flac'. If an incompatible file is encountered, its name will be included in the failed_files list within the Response.
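To avoid files ending up in failed_files, you can filter out unsupported formats client-side before submitting. The helper below is a hypothetical sketch of the batch structure just described (filenames as keys, binary content as values); it is not part of nijtaio:

```python
import os

SUPPORTED_FORMATS = {'.wav', '.mp3', '.ogg', '.flac'}

def build_batch(filepaths):
    """Build a batch dict (filename -> binary content), collecting unsupported files."""
    batch, failed_files = {}, []
    for path in filepaths:
        if os.path.splitext(path)[1].lower() in SUPPORTED_FORMATS:
            with open(path, 'rb') as f:
                batch[os.path.basename(path)] = f.read()
        else:
            failed_files.append(path)
    return batch, failed_files
```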

License-based API call limits: Our API accepts user-initiated submissions of multiple batches, with up to 5 (Foundation Plan) individual files per batch. Additionally, the system is engineered to efficiently manage concurrent requests, allowing a maximum of 3 (Foundation Plan) simultaneous requests per user.

πŸ“˜

Batch Processing

Our API is structured to efficiently handle a volume of files well beyond the above-mentioned capacity. To process a greater number of files, reach out to us at [email protected] to explore upgrading your current payment plan.
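Until then, a large file list can be split client-side into batches that respect the per-batch limit (5 files on the Foundation Plan). A minimal sketch, assuming each chunk is then sent with its own send_request call:

```python
def chunk_batches(files, batch_size=5):
    """Split a file list into batches no larger than the per-batch limit."""
    return [files[i:i + batch_size] for i in range(0, len(files), batch_size)]

# e.g. 12 files on the Foundation Plan -> three batches of 5, 5 and 2 files
```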

Success Response

  • Code: 200 (OK)
  • Content: a JSON object with information to monitor the status of the task:
      • task_id: the id of the created task; pass it to monitor the status of the task and to retrieve the anonymized content when finished.
      • submission status: status of the task.
      • submission_time: creation time of this task.
      • failed_files: files that couldn't be included in the task, if any.
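As a sketch of how these fields can be pulled out of the raw response body (the parse_submission helper is ours, not part of nijtaio; only task_id is required by the rest of the examples):

```python
import json

def parse_submission(response_content: bytes) -> dict:
    """Extract the monitoring fields listed above from a submission response."""
    data = json.loads(response_content)['data']
    return {
        'task_id': data['task_id'],
        'failed_files': data.get('failed_files', []),
    }
```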

Error Response

  • 400: "failed, no valid files" if none of the files in the batch are valid.
  • 429: "failed, too many requests" if the number of requests exceeds the authorized limit.
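A 429 response is transient, so a client can retry with exponential backoff. A minimal sketch, where submit is any zero-argument callable returning an object with a status_code attribute (e.g. a wrapped send_request call; the helper itself is not part of nijtaio):

```python
import time

def submit_with_backoff(submit, max_retries=5, initial_delay=1.0):
    """Retry a submission on HTTP 429, doubling the delay each time."""
    delay = initial_delay
    for _ in range(max_retries):
        response = submit()
        if response.status_code != 429:
            return response
        time.sleep(delay)
        delay *= 2
    raise RuntimeError('failed, too many requests')
```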

Monitor Job

πŸ“˜

Status & Result

Wait for the job to finish by checking the job status via the task_id:

task_id = json.loads(response.content)['data']['task_id']
print('Waiting for the batch to be processed.')
while True:
    time.sleep(1)
    status, anonymized_batch = nijtaio.read_response(task_id, api_url=API_URL)
    if status == 'finished':
        break

πŸ“˜

Monitor Status

Check your job status by logging task_status. Modify the code to monitor the status continuously:

import requests  # additional imports needed for this snippet
import logging
...
while True:
    time.sleep(1)
    response = requests.get('{}/{}'.format(API_URL, task_id))
    content = json.loads(response.content)
    logging.info(content['data']['task_status'])
    ...
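The open-ended while True loop shown earlier can hang if a task never finishes. A timeout-aware variant is sketched below; it is our own wrapper around nijtaio.read_response, which is passed in as a callable so the logic can be tested in isolation:

```python
import time

def wait_for_task(read_response, task_id, api_url, poll_interval=1.0, timeout=600.0):
    """Poll until the task status is 'finished', or raise after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status, batch = read_response(task_id, api_url=api_url)
        if status == 'finished':
            return batch
        time.sleep(poll_interval)
    raise TimeoutError(f'task {task_id} did not finish within {timeout}s')
```

Usage would be `anonymized_batch = wait_for_task(nijtaio.read_response, task_id, API_URL)`.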

Download result

The anonymized_batch dictionary serves as a container for processed data, where original file paths are keys, and their corresponding anonymized content is stored as values.

  • When the 'voice' parameter is set to True, the anonymized audio content can be accessed using the key 'audio'.
  • When the 'content' parameter is enabled, the anonymized transcription can be retrieved with the 'transcription' key.
  • When the 'content' and 'words' parameters are set to True, the timestamps of each word can be retrieved with the 'words' key.
  • When the 'content' and 'words' parameters are set to True and the result audio is stereo, the sequential dialogue can be retrieved with the 'sequence' key.

Here's an example of what the anonymized_batch dictionary looks like:

{
  'path/to/audio.wav': {
    'audio': b'RIFFb\x12\x03\x00WAVEfmt \x10\x00[...]\xff\xeb\xff\xe6\xff',
    'transcription': 'Hey my name is <Name> I am from <City> ...',
    'words': [{'end': 1.14, 'start': 0.56, 'text': ' Hey'},
             {'end': 1.32, 'start': 1.14, 'text': ' my'},
             {'end': 1.48, 'start': 1.32, 'text': ' name'},
             {'end': 1.76, 'start': 1.48, 'text': ' is'},
             {'end': 1.92, 'start': 1.76, 'text': ' <Name>'},
             [...]]
  }
}

In the following code snippet, a loop iterates through the keys of the anonymized_batch dictionary: the anonymized audio content is saved to the output folder, and the anonymized transcription is printed.

# Read the result and write in a file
print(f'Writing results in {output_folder}.')
os.makedirs(output_folder, exist_ok=True)

for original_filepath in anonymized_batch:
    filename = os.path.basename(original_filepath)
    with open(os.path.join(output_folder, filename), mode='wb') as f:
        f.write(anonymized_batch[original_filepath]['audio'])
    if params['content']:
        print(filename, anonymized_batch[original_filepath]['transcription'])
print(f'Done. Check the results in the {output_folder} directory.')
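The loop above only persists the audio and prints the transcription. When words (and sequence) are requested, those parts of the result can be written to JSON sidecar files as well. A hypothetical helper, not part of nijtaio:

```python
import json
import os

def save_extras(anonymized_batch, output_folder, keys=('words', 'sequence')):
    """Write the optional 'words'/'sequence' results as JSON files next to the audio."""
    os.makedirs(output_folder, exist_ok=True)
    for original_filepath, result in anonymized_batch.items():
        base = os.path.splitext(os.path.basename(original_filepath))[0]
        for key in keys:
            if key in result:
                path = os.path.join(output_folder, f'{base}.{key}.json')
                with open(path, 'w') as f:
                    json.dump(result[key], f, indent=2)
```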

In the transcriptions, the anonymization process replaces sensitive information, such as personal names, locations, birthdates, and credit card numbers, with placeholders (e.g. <Name>, <Number>).

πŸ“˜

Example output

# Hey my name is <Name> I am from <City> I was born on <Number> March <Number> My credit card number is <Number>

CLI

nijtaio also comes with a command voiceharbor, providing another option for interacting with the Voice Harbor API. With just a few command-line arguments, you can anonymize and process audio files efficiently. This CLI feature simplifies the process of submitting audio files for processing and retrieving anonymized results, all without the need for coding.

To use the CLI, simply invoke the voiceharbor command followed by the required arguments (run voiceharbor --help to get the full list):

  • --token your API token
  • --input_data input audio file paths
  • --language choose between "english_16" and "french_8"
  • --gender the gender of the target speaker
  • --voice whether to anonymize the voice or not
  • --content whether to anonymize the content or not
  • --output_folder the folder where you want the results to be saved

The CLI handles communication with the Voice Harbor API and provides you with status updates as the processing takes place. Whether you are automating anonymization tasks or running one-off commands, the CLI offers a streamlined experience.

voiceharbor --token <token> \
  --input_data "[\"path/to/audio_1.wav\", \"path/to/audio_2.wav\"]" \
  --language english_16 \
  --gender m \
  --voice True \
  --content True \
  --output_folder path/to/folder

NER

πŸ‘

NER V3.0

https://nijta.readme.io/reference/features

Example entities:

Email
Date
Concept
Product
Event
Technology
Group
Medical Condition
Characteristic
Research
County
Module
Unit
Feature
Cell
Package
Anatomical Structure
Equipment
Attribute Value
Pokemon
Immune Response
Physiology
Animals
Cell Feature
FAC (Functional Annotation Clustering)
Input Device
Ward
Broadcast

🚧

Generative Content Anonymization (WIP)

Presently, our system supports substituting sensitive entities with auditory cues such as beeps or periods of silence. We are actively developing a generative-AI-driven feature that will replace these entities with generated ones (audio and text) drawn from the same semantic category.

This advancement aims to improve the utility of the anonymized speech.

Speaker Separation

Speaker separation is a feature within our API that isolates individual speakers recorded on one channel, in a mono audio file. The output is a stereo file where each speaker is in a separate channel, enabling better analysis of the dialogue. This is applied before anonymization. Below is an example of how you can use this feature in your code:

import os
import json
import time
import nijtaio  # pip install nijtaio

# Replace '<token>' with your actual token
TOKEN = '<token>'
API_URL = 'https://api.nijta.com/'

# Set up headers for the API request
headers = {
    "Content-Type": "application/json; charset=utf-8",
    "TOKEN": TOKEN
}

# Parameters for the API request
params = {
    'language': 'french_8',
    'gender': 'f',
    'voice': True,
    'content': True,
    'entities': 'Name,Organization,Location,City,Country,Numbers,Age,Date,Credit Card Number,Email,Concept,Product,Event,Technology,Group,Medical Condition,Characteristic,Research,County',
    'separate': True
}

# Output folder where the results will be stored
output_folder = 'output'

# Start a new session with the provided token
session_id = nijtaio.session(TOKEN, api_url=API_URL)

# Send a batch of audio files for processing
response = nijtaio.send_request(
    ["path/to/audio.wav"],  # audio.wav is a mono file with 2 speakers
    params,
    session_id,
    headers=headers,
    api_url=API_URL 
)

# Extract the task ID from the response
task_id = json.loads(response.content)['data']['task_id']

# Monitor the processing status of the batch
print('Waiting for the batch to be processed.')
while True:
    time.sleep(1)
    status, anonymized_batch = nijtaio.read_response(task_id, api_url=API_URL)
    if status == 'finished':
        break

This will produce the following result in anonymized_batch:

{
  'path/to/audio.wav': {
    'audio': b'RIFFb\x12\x03\x00WAVEfmt \x10\x00[...]\xff\xeb\xff\xe6\xff',
    'transcription': {
        '0': " Bonjour monsieur, je suis <Name> de <Organization>. Est-ce que vous avez cinqs minutes Γ  m'accorder ? [...]",
        '1': ' Bonjour <Name>, oui, je vous Γ©coute. [...]'
    }
  }
}

With words = True:

{
  'path/to/audio.wav': {
    'audio': b'RIFFb\x12\x03\x00WAVEfmt \x10\x00[...]\xff\xeb\xff\xe6\xff',
    'transcription': {
        '0': " Bonjour monsieur, je suis <Name> de <Organization>. Est-ce que vous avez cinqs minutes Γ  m'accorder ? [...]",
        '1': ' Bonjour <Name>, oui, je vous Γ©coute. [...]'
    },
    'words': {
      "0": [
          {
              "end": 0.36,
              "start": 0.0,
              "text": " Bonjour"
          },
          {
              "end": 0.72,
              "start": 0.54,
              "text": " monsieur"
          },
          [...],
      ],
      "1": [
          {
              "end": 0.76,
              "start": 0.72,
              "text": " Bonjour,"
          },
          {
              "end": 0.76,
              "start": 0.72,
              "text": " <Name>,"
          },
          [...]
      ]
    },
    'sequence': [
      {
          "channel": "0",
          "text": " Bonjour monsieur, je suis <Name> de <Organization>. Est-ce que vous avez cinqs minutes Γ  m'accorder ?",
          "start": 0.0,
          "end": 9.4
      },
      {
          "channel": "1",
          "text": " Bonjour <Name>, oui, je vous Γ©coute.",
          "start": 9.4,
          "end": 9.4
      },
      {
          "channel": "0",
          "text": " Vous Γͺtes <Name> ?",
          "start": 9.48,
          "end": 10.22
      },
      [...]
    ]
  }
}
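To turn the sequence list into a readable transcript, one can simply walk the turns in order. A small sketch (the output formatting is our own choice, not something the API prescribes):

```python
def format_dialogue(sequence):
    """Render the 'sequence' turns as a speaker-labelled, time-stamped transcript."""
    lines = []
    for turn in sequence:
        # Each turn carries the channel, the anonymized text and its time span.
        lines.append(
            f"[{turn['start']:.2f}-{turn['end']:.2f}] channel {turn['channel']}:{turn['text']}"
        )
    return '\n'.join(lines)
```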