VoiceHarbor API
powered by Nijta
Become an anonymization alchemist!
The script below shows how to use the VoiceHarbor API for speech anonymization. It imports the necessary modules, sets parameters such as language and gender, opens a session with the API, sends audio files for processing, and waits for processing to complete. It then saves the anonymized audio files in an output folder, prints the transcriptions, and reports the processing status and the location of the results.
import os
import json
import time

import nijtaio  # pip install nijtaio

# Replace '<token>' with your actual token
TOKEN = '<token>'
API_URL = 'https://api.nijta.com/'

# Set up headers for the API request
headers = {
    "Content-Type": "application/json; charset=utf-8",
    "TOKEN": TOKEN
}

# Parameters for the API request
params = {
    'language': 'french',
    'gender': 'f',
    'voice': True,
    'content': True,
    'entities': 'Name,Organization,Location,City,Country,Numbers,Age,Date,Credit Card Number,Email,Concept,Product,Event,Technology,Group,Medical Condition,Characteristic,Research,County',
    # Raw string so the regex backslashes reach the API unchanged
    'regex_entities': [(r'[\w\-\.]+@([\w-]+\.)+[\w-]{2,4}', 'email'),],
}

# Output folder where the results will be stored
output_folder = 'output'

# Start a new session with the provided token
session_id = nijtaio.session(TOKEN, api_url=API_URL)

# Send a batch of audio files for processing
response = nijtaio.send_request(
    ["path/to/audio_1.wav", "path/to/audio_2.wav"],
    params,
    session_id,
    headers=headers,
    api_url=API_URL
)

# Extract the task ID from the response
task_id = json.loads(response.content)['data']['task_id']

# Monitor the processing status of the batch
print('Waiting for the batch to be processed.')
while True:
    time.sleep(1)
    status, anonymized_batch = nijtaio.read_response(task_id, api_url=API_URL)
    if status == 'finished':
        break

# Process and save the anonymized results
print(f'Writing results in {output_folder}.')
os.makedirs(output_folder, exist_ok=True)
for original_filepath in anonymized_batch:
    filename = os.path.basename(original_filepath)
    # Save anonymized audio
    with open(os.path.join(output_folder, filename), mode='wb') as f:
        f.write(anonymized_batch[original_filepath]['audio'])
    if params['content']:
        # Print transcription if content parameter is True
        print(filename, anonymized_batch[original_filepath]['transcription'])

# Print completion message
print(f'Done. Check the results in the {output_folder} directory.')
Speech Anonymization
Requirements
Import necessary libraries.
Install NijtaIO
The NijtaIO module streamlines working with audio datasets. It lets you quickly convert audio data and related details into a format suitable for sending to the VoiceHarbor API, which makes the workflow faster and more user-friendly.
pip install nijtaio
import nijtaio
Fill in your token and select an output folder to store the output files
TOKEN = '<token>'
API_URL = 'https://api.nijta.com/'
headers = {'Content-Type': 'application/json; charset=utf-8', 'TOKEN': TOKEN}
output_folder = 'output'
Build request parameters
Supported parameters:
Parameter | Description |
---|---|
language | 'french', 'english'. Only French and English models are available in this version. |
gender | (optional) choose a gender for the target "pseudo" speakers (values: 'f' or 'm'). If no value is passed, it will be chosen randomly. |
robotic | (optional) By default, the original variations of the pitch are preserved. With this option turned on, they are removed and the result sounds robotic (values: True, False. Default is False). |
seed | (optional): you can set a seed for reproducibility during evaluation (not recommended in production). |
voice | Set this parameter to True if you wish to transform the original voice. |
content | Set this parameter to True if you wish to remove the sensitive content from the audio and the transcription, based on the categories passed in 'entities'. |
entities | Give the list of categories of entities you want to hide. Some examples: Name,Organization,Location,City,Country,Numbers,Age,Date,Credit Card Number,Email,Concept,Product,Event,Technology,Group,Medical Condition,Characteristic,Research,County, ... |
ner_threshold | (optional) Our NER hides entities based on a score. Depending on your needs, you might want to mask more or fewer entities than the default. Set a value between 0 and 1 (default is 0.5): closer to 0 hides more entities, while closer to 1 hides only highly scored ones. |
regex_entities | (optional) In addition to NER, you can also mask entities based on regular expressions. Add the key 'regex_entities' to the params with a value in the following format: [('regex1', 'tag1'), ('regex2', 'tag2'), ...]. For example, to mask emails based on a regex: params = { 'regex_entities': [('[a-zA-Z0-9_.]+[@]{1}[a-z0-9]+[\.][a-z]+', 'email'),] }. Any email address matching the regex is replaced with the tag <email>. |
mask | If the content parameter is set to True, specify how you would like to conceal the content. Choose between "silence" (default) or "beep". |
words | Set this parameter to True to get the timestamps of each word with the transcription, when the content parameter is True. If the audio is stereo, you will also get the sequential dialogue in the output. |
separate | (optional) Set this parameter to True to apply speaker separation to your input files before applying anonymization. This is useful for mono files with 2 speakers. If the input file is stereo, no separation is applied. See the Speaker Separation section below for more information. |
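The value constraints listed in the table can be checked locally before a request is sent. The sketch below is a hypothetical helper, not part of nijtaio: `validate_params` and its messages are assumptions made for illustration, and it only covers the constraints the table states explicitly (gender, mask, boolean flags, and the ner_threshold range).

```python
# Hypothetical helper: the allowed values mirror the parameter table above.
ALLOWED = {
    "gender": {"f", "m"},
    "mask": {"silence", "beep"},
}
BOOLEAN_KEYS = ("voice", "content", "robotic", "words", "separate")


def validate_params(params):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    for key, allowed in ALLOWED.items():
        if key in params and params[key] not in allowed:
            problems.append(f"{key} must be one of {sorted(allowed)}")
    for key in BOOLEAN_KEYS:
        # Catches string values such as 'false', which are truthy in Python.
        if key in params and not isinstance(params[key], bool):
            problems.append(f"{key} must be True or False")
    threshold = params.get("ner_threshold")
    if threshold is not None and not 0 <= threshold <= 1:
        problems.append("ner_threshold must be between 0 and 1")
    return problems


print(validate_params({"language": "french", "gender": "f", "voice": True}))
print(validate_params({"robotic": "false"}))
```

Checking types locally is mainly useful for flags: a string such as 'false' is truthy in Python, so it is easy to enable an option by accident.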
Supported languages:
Language | Voice Anonymization | Content Anonymization | Value of the language parameter |
---|---|---|---|
English | Yes | Yes | english |
French | Yes | Yes | french |
Multilingual | Coming Soon! | Yes |
To use voice anonymization only, set your parameters as follows:
params = {
'language':'french',
'gender':'f',
'voice':True
}
To include content anonymization, set your parameters as follows:
params = {
'language':'french',
'gender':'f',
'robotic':False,
'mask':'silence',
'voice':True,
'content':True,
'entities': 'Name,Organization,Location,City,Country,Numbers,Age,Date,Credit Card Number,Email,Concept,Product,Event,Technology,Group,Medical Condition,Characteristic,Research,County',
'regex_entities': [(r'[\w\-\.]+@([\w-]+\.)+[\w-]{2,4}', 'email'),],
'words': True
}
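Before sending a regex to the API, it can help to preview what it matches. The sketch below applies the email pattern from the params locally with Python's `re` module. `preview_masking` is a hypothetical helper, and the service's own regex engine and replacement behavior may differ from this local approximation.

```python
import re

# Pattern and tag copied from the params above.
regex_entities = [(r'[\w\-\.]+@([\w-]+\.)+[\w-]{2,4}', 'email')]


def preview_masking(text, regex_entities):
    """Replace every regex match with <tag>, to preview what a pattern catches."""
    for pattern, tag in regex_entities:
        text = re.sub(pattern, f'<{tag}>', text)
    return text


sample = 'Write to jane.doe@example.com or call me.'
print(preview_masking(sample, regex_entities))  # Write to <email> or call me.
```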
Build payload and submit job
Optional: credit checking
Check your credit manually in seconds/minutes by visiting the following page:
https://api.nijta.com/credit/<token> # {"minute":"1000.3"}
curl https://api.nijta.com/credit/<token>
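The same check can be scripted. This is a minimal sketch that assumes the endpoint returns a JSON body shaped like the `{"minute":"1000.3"}` example above. `parse_credit` and `fetch_credit` are hypothetical names, and `fetch_credit` performs a real network call.

```python
import json
import urllib.request


def parse_credit(body):
    """Extract the remaining minutes from a body like {"minute":"1000.3"}."""
    return float(json.loads(body)["minute"])


def fetch_credit(token, api_url="https://api.nijta.com/"):
    """Fetch and parse the credit for a token (performs a network call)."""
    with urllib.request.urlopen(f"{api_url}credit/{token}") as resp:
        return parse_credit(resp.read())


print(parse_credit('{"minute":"1000.3"}'))  # 1000.3
```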
Supported inputs
You have multiple ways to forward your files to the API.
Input | Description |
---|---|
URLs (Multiple) | a list of audio URLs (e.g., "https://foo.bar/audio_1", "https://foo.bar/audio_2", etc.). |
URL (Single) | a single audio URL ("https://foo.bar/audio_1"). |
Files (Multiple) | a list of local audio files ("path/to/audio_1.wav", "path/to/audio_2.wav", etc.). |
Files (Single) | a single local audio file ("path/to/audio_1"). |
Directory | a local directory containing audio files ("path/to/audio_folder"). |
Archive | an archive with audio files |
S3 Bucket | the URL to an S3 bucket containing audio files: "s3://my-bucket/". You then have to pass your AWS credentials (see example below). |
Send request
To send a request, first get a session ID by calling the NijtaIO session function:
session_id = nijtaio.session(TOKEN, api_url=API_URL) # preliminary token and credit check
Then load your files and send your request to the VoiceHarbor API with the NijtaIO send_request function.
## url(s)
response = nijtaio.send_request(
['https://foo.bar/audio_1', 'https://foo.bar/audio_2'],
params,
session_id,
headers=headers,
api_url=API_URL,
)
## file(s)
response = nijtaio.send_request(
['path/to/audio_1.wav', 'path/to/audio_2.wav'],
params,
session_id,
headers=headers,
api_url=API_URL,
)
## directory
response = nijtaio.send_request(
'path/to/audio_folder', params, session_id, headers=headers, api_url=API_URL
)
## archive
response = nijtaio.send_request(
'https://s3.amazonaws.com/datasets.huggingface.co/SpeechCommands/v0.01/v0.01_test.tar.gz',
params,
session_id,
headers=headers,
api_url=API_URL,
)
## S3 bucket
storage_options = {
'key': '<AWS_ACCESS_KEY_ID>',
'secret': '<AWS_SECRET_ACCESS_KEY>',
}
response = nijtaio.send_request(
's3://my-bucket/',
params,
session_id,
headers=headers,
api_url=API_URL,
storage_options=storage_options,
)
Limitations
File Size Limit: Audio files should not exceed 500 MB in size. The API will not accept files larger than this limit.
Supported Audio Formats: An audio file batch is organized as a dictionary, with filenames as keys and binary content as values. The supported audio file formats are 'wav', 'mp3', 'ogg', and 'flac'. If an incompatible file is encountered, its name is included in the failed_files list in the response.
License-based API call limits: You can submit multiple batches, with up to 5 files per batch on the Foundation Plan. The API also handles concurrent requests, up to a maximum of 3 simultaneous requests per user on the Foundation Plan.
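To stay within these limits on the client side, a file list can be filtered by extension and split into batches. The sketch below assumes the Foundation Plan limit of 5 files per batch. `split_supported` and `make_batches` are hypothetical helpers, and the extension check only looks at the file name, not the actual content.

```python
import os

SUPPORTED_EXTENSIONS = {'.wav', '.mp3', '.ogg', '.flac'}


def split_supported(paths):
    """Separate paths with a supported audio extension from the rest."""
    ok, rejected = [], []
    for path in paths:
        ext = os.path.splitext(path)[1].lower()
        (ok if ext in SUPPORTED_EXTENSIONS else rejected).append(path)
    return ok, rejected


def make_batches(paths, batch_size=5):
    """Yield consecutive batches of at most batch_size paths."""
    for start in range(0, len(paths), batch_size):
        yield paths[start:start + batch_size]


paths = [f'audio_{i}.wav' for i in range(12)] + ['notes.txt']
ok, rejected = split_supported(paths)
batches = list(make_batches(ok))
print([len(b) for b in batches], rejected)  # [5, 5, 2] ['notes.txt']
```

Each batch can then be passed to send_request in turn, keeping the number of simultaneous requests under the plan's limit.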
Batch Processing
Our API can handle a much larger volume of files than the limits above. To process more files, contact us at [email protected] to explore upgrading your current payment plan.
Success Response
Object | Description |
---|---|
Code | 200 (OK) |
Content | A JSON object with information to monitor the status of the task: |
task_id | the ID of the created task, which can be passed to monitor the status of the task and retrieve the anonymized content when finished |
submission status | status of the task |
submission_time | creation time of this task |
failed_files | files that couldn't be included in the task, if any |
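A success response can be unpacked as sketched below. The 'data' -> 'task_id' path is the one used in this document's examples. The placement of failed_files next to task_id is an assumption, and the sample body is made up for illustration. `read_submission` is a hypothetical helper.

```python
import json


def read_submission(content):
    """Return (task_id, failed_files) from the JSON body of a 200 response."""
    data = json.loads(content)['data']
    # Assumption: failed_files sits next to task_id under 'data'.
    return data['task_id'], data.get('failed_files', [])


# Illustrative body only; real responses may carry more fields.
body = b'{"data": {"task_id": "abc123", "failed_files": ["notes.txt"]}}'
task_id, failed = read_submission(body)
if failed:
    print(f'Task {task_id} created, but these files were skipped: {failed}')
```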
Error Response
Code | Description |
---|---|
400 | "failed, no valid files" if none of the files in the batch is valid. |
429 | "failed, too many requests" if the number of requests exceeds the authorized limit. |
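A 429 response is usually worth retrying after a pause. The sketch below retries with exponential backoff. It assumes the submit call returns an object with a `status_code` attribute, as an HTTP response object would; whether nijtaio.send_request returns such an object or raises on errors is not stated here, so adapt accordingly. `submit_with_retry` is a hypothetical helper.

```python
import time


def submit_with_retry(submit, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call submit() until it returns a status other than 429.

    submit is any zero-argument callable returning an object with a
    status_code attribute (an assumption about the response type).
    """
    for attempt in range(max_attempts):
        response = submit()
        if response.status_code != 429:
            return response
        if attempt < max_attempts - 1:
            sleep(base_delay * 2 ** attempt)  # waits 1s, 2s, 4s, ...
    return response  # still 429 after the last attempt
```

For example, wrap the call from the Send request section: `submit_with_retry(lambda: nijtaio.send_request(files, params, session_id, headers=headers, api_url=API_URL))`. A 400 response is returned immediately, since retrying a batch with no valid files would not help.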
Monitor Job
Status & Result
Wait for the job to finish, checking its status by task_id:
task_id = json.loads(response.content)['data']['task_id']
print('Waiting for the batch to be processed.')
while True:
    time.sleep(1)
    status, anonymized_batch = nijtaio.read_response(task_id, api_url=API_URL)
    if status == 'finished':
        break
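The loop above waits indefinitely if the task never reaches 'finished'. A minimal sketch of the same polling with a timeout follows. `wait_for_batch` is a hypothetical helper that takes the read function as an argument, and the default timeout is an arbitrary choice.

```python
import time


def wait_for_batch(read_response, task_id, timeout=600.0, interval=1.0,
                   sleep=time.sleep, clock=time.monotonic):
    """Poll read_response(task_id) until status is 'finished' or time runs out."""
    deadline = clock() + timeout
    while True:
        status, batch = read_response(task_id)
        if status == 'finished':
            return batch
        if clock() >= deadline:
            raise TimeoutError(f'task {task_id} not finished after {timeout}s')
        sleep(interval)
```

With nijtaio, this could be called as `wait_for_batch(lambda t: nijtaio.read_response(t, api_url=API_URL), task_id)`.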
Monitor Status
Check your job status by logging "task_status". Modify the code as follows to monitor the status continuously:
...
# requires: import logging, import requests
while True:
    time.sleep(1)
    response = requests.get('{}/{}'.format(API_URL, task_id))
    content = json.loads(response.content)
    logging.info(content['data']['task_status'])
...
Download result
The anonymized_batch dictionary serves as a container for processed data, where original file paths are keys and their corresponding anonymized content is stored as values.
- Audio Content: When voice or content is set to True, the anonymized audio content can be accessed via the audio key.
- Transcriptions: If content=True, additional data beyond audio is included:
  - Anonymized transcription: Accessible via the 'transcription' key.
  - Original transcription: Accessible under 'original' -> 'text'.
- Word-Level Timestamps: When both content=True and words=True, each word's timestamp can be retrieved via the words key.
- Sequential Dialogue (requires nijtaio v1.1.8+): If content=True, words=True, and the audio is stereo, the sequential dialogue is available under the sequence key.
- Report Summary (requires nijtaio v1.1.9+): For each file, a report key provides useful data about the original content, including:
  - A list of personally identifiable information (PII) categories detected in the content, independently of the categories passed in the parameters, to inform you about categories you may want to add.
  - The legal frameworks associated with the detected PII (e.g., GDPR, CCPA).
  - A classification of the document based on ISO 27001 (e.g., Confidential, Restricted).
  - Metadata such as sampling rate, duration, and MOS (Mean Opinion Score) for audio quality.
Here is an example of what the anonymized_batch dictionary looks like:
{
'path/to/audio.wav': {
'audio': b'RIFFb\x12\x03\x00WAVEfmt \x10\x00[...]\xff\xeb\xff\xe6\xff',
'transcription': 'Hey my name is <Name> I am from <City> ...',
'language': 'en',
'words': [{'end': 1.14, 'start': 0.56, 'text': ' Hey'},
{'end': 1.32, 'start': 1.14, 'text': ' my'},
{'end': 1.48, 'start': 1.32, 'text': ' name'},
{'end': 1.76, 'start': 1.48, 'text': ' is'},
{'end': 1.92, 'start': 1.76, 'text': ' <Name>'},
[...]],
'report': {
'piis': {'Location': 2, 'Person': 2},
'regulations': ['CCPA (California)', 'PIPEDA (Canada)', 'LGPD (Brazil)', 'GDPR (EU)'],
'sensitivity': 'Restricted',
'infos': {
'sampling_rate': 16000,
'sample_width': 2,
'format_': 'wav',
'duration': 6.315,
'channels': 1,
'mos': 0.8919036388397217
}
},
}
}
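The report can be used to spot PII categories that were detected but not requested in 'entities'. The sketch below compares names case-insensitively. `missing_categories` is a hypothetical helper, and the comparison is purely textual, so a detected category such as 'Person' is reported even if a differently named requested category already covers it.

```python
def missing_categories(report, entities):
    """List PII categories found in the report but absent from 'entities'."""
    requested = {name.strip().lower() for name in entities.split(',')}
    return sorted(c for c in report['piis'] if c.lower() not in requested)


# 'piis' values copied from the example above.
report = {'piis': {'Location': 2, 'Person': 2}}
print(missing_categories(report, 'Name,Location,City'))  # ['Person']
```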
In the code snippet below, a loop iterates through the keys of the anonymized_batch dictionary, saves the anonymized audio content in an output folder, and prints the anonymized transcription.
# Read the result and write in a file
print(f'Writing results in {output_folder}.')
os.makedirs(output_folder, exist_ok=True)
for original_filepath in anonymized_batch:
    filename = os.path.basename(original_filepath)
    with open(os.path.join(output_folder, filename), mode='wb') as f:
        f.write(anonymized_batch[original_filepath]['audio'])
    if params['content']:
        print(filename, anonymized_batch[original_filepath]['transcription'])
print(f'Done. Check the results in the {output_folder} directory.')
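To keep the transcriptions on disk rather than only printing them, the same loop can write them next to the audio. This is a sketch under assumptions: `save_batch` is a hypothetical helper, and storing the transcription as a JSON file with the audio file's base name is a choice made here, not something the API prescribes.

```python
import json
import os


def save_batch(anonymized_batch, output_folder):
    """Write each file's audio, plus its transcription when one is present."""
    os.makedirs(output_folder, exist_ok=True)
    written = []
    for original_filepath, result in anonymized_batch.items():
        filename = os.path.basename(original_filepath)
        audio_path = os.path.join(output_folder, filename)
        with open(audio_path, mode='wb') as f:
            f.write(result['audio'])
        written.append(audio_path)
        if 'transcription' in result:
            # JSON handles both a plain string and a per-channel dictionary.
            text_path = os.path.splitext(audio_path)[0] + '.json'
            with open(text_path, 'w', encoding='utf-8') as f:
                json.dump(result['transcription'], f, ensure_ascii=False)
            written.append(text_path)
    return written
```

Because only the base name is kept, two inputs with the same file name in different directories would overwrite each other in the output folder.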
In the transcriptions, the anonymization process replaces sensitive information, such as personal names, locations, birthdates, and credit card numbers, with placeholder tags such as <Name> or <City>.
Example output
# Hey my name is <Name> I am from <City> I was born on <Number> March <Number> My credit card number is <Number>
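To see at a glance how much was masked, the placeholder tags in a transcription can be counted. The sketch below assumes placeholders always take the `<Tag>` form shown in the example output. `count_placeholders` is a hypothetical helper.

```python
import re
from collections import Counter


def count_placeholders(transcription):
    """Count <Tag> placeholders in an anonymized transcription."""
    return Counter(re.findall(r'<([^<>]+)>', transcription))


text = ('Hey my name is <Name> I am from <City> I was born on <Number> '
        'March <Number> My credit card number is <Number>')
counts = count_placeholders(text)
print(counts['Number'], counts['Name'], counts['City'])  # 3 1 1
```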
CLI
nijtaio also comes with a command, voiceharbor, providing another option for interacting with the VoiceHarbor API. With just a few command-line arguments, you can anonymize and process audio files. The CLI lets you submit audio files for processing and retrieve anonymized results without writing any code.
To use the CLI, invoke the voiceharbor command followed by the required arguments (run voiceharbor --help to get the list):
- --token: your API token
- --input_data: input audio file paths
- --language: choose between "english_16" and "french_8"
- --gender: the gender of the target speaker
- --voice: whether to anonymize the voice
- --content: whether to anonymize the content
- --output_folder: the folder where you want the results to be saved
The CLI handles communication with the VoiceHarbor API and provides you with status updates as the processing takes place. Whether you are automating anonymization tasks or running one-off commands, the CLI offers a streamlined experience.
voiceharbor --token <token> \
--input_data "[\"path/to/audio_1.wav\", \"path/to/audio_2.wav\"]" \
--language english_16 \
--gender m \
--voice True \
--content True \
--output_folder path/to/folder
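When the command is launched from a script, building the argument list in Python avoids hand-escaping the JSON list passed to --input_data. The sketch below uses only the flags listed above. `build_voiceharbor_command` is a hypothetical helper, and passing booleans as the strings "True"/"False" follows the example command.

```python
import json
import shlex


def build_voiceharbor_command(token, files, language, gender, output_folder,
                              voice=True, content=True):
    """Return the voiceharbor command as an argument list."""
    return [
        'voiceharbor',
        '--token', token,
        '--input_data', json.dumps(files),  # JSON list, as in the example
        '--language', language,
        '--gender', gender,
        '--voice', str(voice),
        '--content', str(content),
        '--output_folder', output_folder,
    ]


cmd = build_voiceharbor_command(
    '<token>', ['path/to/audio_1.wav'], 'english_16', 'm', 'path/to/folder')
print(shlex.join(cmd))  # shell-quoted form, for logging or copy-paste
```

The list can be passed directly to `subprocess.run(cmd, check=True)`, which bypasses the shell and its quoting rules.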
NER
You can choose to mask any category of entities. Here is a non-exhaustive list of categories considered as PII:
Example PII:
Mobile number, IBAN, Location, Organization, Date and Time, Medical Conditions, Transportation, Landmarks and Attractions, Emergency Service Keywords, Credit card number, Health insurance number, Address, City, County, District, Borough, Age, Date, Birth date, CCV, Time, Emergency Type, Person, Numbers, Vehicle Description, Injured Person's Name, Medical Condition, Landmarks/Points of Interest, Intent
Generative Content Anonymization (WIP)
Currently, the system replaces sensitive entities with auditory cues such as beeps or periods of silence. We are developing a generative AI feature that will replace these entities with generated ones (audio and text) from the same semantic category.
This is intended to make the anonymized speech more useful.
Speaker Separation
Speaker separation is a feature within our API that isolates individual speakers recorded on one channel, in a mono audio file. The output is a stereo file where each speaker is in a separate channel, enabling better analysis of the dialogue. This is applied before anonymization. Below is an example of how you can use this feature in your code:
import os
import json
import time
import nijtaio # pip install nijtaio
# Replace '<token>' with your actual token
TOKEN = '<token>'
API_URL = 'https://api.nijta.com/'
# Set up headers for the API request
headers = {
"Content-Type": "application/json; charset=utf-8",
"TOKEN": TOKEN
}
# Parameters for the API request
params = {
    'language': 'french_8',
    'gender': 'f',
    'voice': True,
    'content': True,
    'entities': 'Name,Organization,Location,City,Country,Numbers,Age,Date,Credit Card Number,Email,Concept,Product,Event,Technology,Group,Medical Condition,Characteristic,Research,County',
    'separate': True
}
# Output folder where the results will be stored
output_folder = 'output'
# Start a new session with the provided token
session_id = nijtaio.session(TOKEN, api_url=API_URL)
# Send a batch of audio files for processing
response = nijtaio.send_request(
["path/to/audio.wav"], # audio.wav is a mono file with 2 speakers
params,
session_id,
headers=headers,
api_url=API_URL
)
# Extract the task ID from the response
task_id = json.loads(response.content)['data']['task_id']
# Monitor the processing status of the batch
print('Waiting for the batch to be processed.')
while True:
    time.sleep(1)
    status, anonymized_batch = nijtaio.read_response(task_id, api_url=API_URL)
    if status == 'finished':
        break
This gives the following result in anonymized_batch:
{
'path/to/audio.wav': {
'audio': b'RIFFb\x12\x03\x00WAVEfmt \x10\x00[...]\xff\xeb\xff\xe6\xff',
'transcription': {
'0': " Bonjour monsieur, je suis <Name> de <Organization>. Est-ce que vous avez cinqs minutes à m'accorder ? [...]",
'1': ' Bonjour <Name>, oui, je vous écoute. [...]'
}
}
}
With words=True:
{
'path/to/audio.wav': {
'audio': b'RIFFb\x12\x03\x00WAVEfmt \x10\x00[...]\xff\xeb\xff\xe6\xff',
'transcription': {
'0': " Bonjour monsieur, je suis <Name> de <Organization>. Est-ce que vous avez cinqs minutes à m'accorder ? [...]",
'1': ' Bonjour <Name>, oui, je vous écoute. [...]'
},
'words': {
"0": [
{
"end": 0.36,
"start": 0.0,
"text": " Bonjour"
},
{
"end": 0.72,
"start": 0.54,
"text": " monsieur"
},
[...],
],
"1": [
{
"end": 0.76,
"start": 0.72,
"text": " Bonjour,"
},
{
"end": 0.76,
"start": 0.72,
"text": " <Name>,"
},
[...]
]
},
'sequence': [
{
"channel": "0",
"text": " Bonjour monsieur, je suis <Name> de <Organization>. Est-ce que vous avez cinqs minutes à m'accorder ?",
"start": 0.0,
"end": 9.4
},
{
"channel": "1",
"text": " Bonjour <Name>, oui, je vous écoute.",
"start": 9.4,
"end": 9.4
},
{
"channel": "0",
"text": " Vous êtes <Name> ?",
"start": 9.48,
"end": 10.22
},
[...]
]
}
}
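The 'sequence' entries can be turned into a readable dialogue. The sketch below sorts the turns by start time and prints one line per turn. `format_dialogue` is a hypothetical helper, the sample turns are copied from the output above, and labeling each channel as a "speaker" is a presentation choice made here.

```python
def format_dialogue(sequence):
    """Render 'sequence' entries as one line per turn, in order of start time."""
    lines = []
    for turn in sorted(sequence, key=lambda t: t['start']):
        text = turn['text'].strip()
        lines.append(
            f"[{turn['start']:.2f}-{turn['end']:.2f}] "
            f"speaker {turn['channel']}: {text}"
        )
    return '\n'.join(lines)


sequence = [
    {'channel': '0', 'text': ' Vous êtes <Name> ?', 'start': 9.48, 'end': 10.22},
    {'channel': '1', 'text': ' Bonjour <Name>, oui, je vous écoute.',
     'start': 9.4, 'end': 9.4},
]
print(format_dialogue(sequence))
```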