Advanced vs. Mini

Our Mini and Advanced systems are designed to meet different customer needs when it comes to PHI redaction.

The Mini system uses a fine-tuned transformer model to redact PHI quickly and accurately. Mini is ideal for high-volume or time-sensitive scenarios: its streamlined architecture performs redaction rapidly without compromising on quality, making it a strong choice for environments where speed is crucial. Mini is 60% faster than our Advanced solution.

In contrast, the Advanced system leverages a private large language model (LLM) equipped with sophisticated reasoning capabilities. Its integrated reasoning allows for a deeper contextual analysis, ensuring that even subtle or ambiguous PHI elements are accurately identified and redacted. This level of precision is particularly beneficial for applications where data security and comprehensive compliance are paramount.

Not sure which model best fits your data? In May 2025, we’ll launch an auditing service that analyzes a representative subset of your data and provides tailored model recommendations. In the meantime, audits are available upon request. Just ask, and we’ll gladly assess your data and recommend the model that best fits your needs.

A simple model selection example (read more here):

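from pathlib import Path

input_file = "example.txt"  # hypothetical path; replace with your own file

# Advanced model: "agents" names the label set(s) to apply
params = {
    "files": [Path(input_file).name],
    "model": "advanced",
    "agents": ["health-generic"]
}

# Mini model (the default):
params = {
    "files": [Path(input_file).name],
    "model": "mini"
}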
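
The two dictionaries above can also be produced by a small helper. This is a minimal sketch, not part of the official client: build_params is a hypothetical name, and the rule that "agents" is only sent with the Advanced model is an assumption drawn from the examples above, where only the Advanced request includes it. The agent names are the ones listed under Advanced model labels.

```python
from pathlib import Path

# Agent names taken from the "Advanced model labels" section.
KNOWN_AGENTS = {"health-generic", "clinical", "hipaa"}


def build_params(input_file, model="mini", agents=None):
    """Build a request-parameter dict (hypothetical helper, not an official API)."""
    if model not in ("mini", "advanced"):
        raise ValueError(f"unknown model: {model!r}")

    # Only the file name is sent, mirroring Path(input_file).name in the example.
    params = {"files": [Path(input_file).name], "model": model}

    if model == "advanced":
        # Assumption: fall back to "health-generic", the agent used in the example.
        agents = list(agents) if agents else ["health-generic"]
        unknown = sorted(set(agents) - KNOWN_AGENTS)
        if unknown:
            raise ValueError(f"unknown agents: {unknown}")
        params["agents"] = agents
    elif agents:
        # Assumption: agents apply only to the Advanced model.
        raise ValueError("agents are only used with the advanced model")

    return params
```

Called with only a file path, the helper returns the Mini request; passing model="advanced" adds the "agents" key, and an unrecognized model or agent name raises a ValueError before anything is submitted.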

When submitting large volumes to the Advanced model, we recommend using a scheduled download approach. Read Documentation.

Labels

Mini model labels

"medical_record_number": "A unique identifier assigned to a patient's medical records.",
"date_of_birth": "The birth date of an individual.",
"ssn": "Social Security Number, a unique number assigned to U.S. citizens and some residents.",
"date": "A generic date field representing an event or record date.",
"first_name": "The given name of a person.",
"email": "An electronic mail address used for communication.",
"last_name": "The family or surname of a person.",
"customer_id": "A unique identifier for a customer account.",
"employee_id": "A unique identifier for an employee within an organization.",
"name": "A general name field which could represent a full name or title.",
"street_address": "The physical street address for a residence or business.",
"phone_number": "A contact telephone number.",
"credit_card_number": "A number associated with a credit card account.",
"address": "A general address field for various uses.",
"user_name": "A chosen username for account login or identification.",
"device_identifier": "A unique identifier assigned to a device.",
"date_time": "Combined date and time information.",
"company_name": "The name of a company or business entity.",
"unique_identifier": "A general purpose unique identifier.",
"account_number": "A number associated with a financial or user account.",
"city": "The name of a city in an address.",
"time": "A specific time, separate from the date.",
"postcode": "A postal or ZIP code used for mail delivery.",
"coordinate": "Geographical coordinate data (latitude/longitude).",
"country": "The name of a country.",
"api_key": "A key used to authenticate and authorize API requests.",
"ipv6": "An IPv6 address used in modern computer networks.",
"password": "A secret word or phrase used for authentication.",
"health_plan_beneficiary_number": "An identifier for a health plan beneficiary.",
"national_id": "A national identification number.",
"tax_id": "A number used for tax identification purposes.",
"url": "A web address.",
"state": "A state or region within a country.",
"swift_bic": "A code used for international bank transfers (SWIFT/BIC).",
"cvv": "The Card Verification Value found on credit cards."

Advanced model labels

"health-generic": {
    "NAME": "Any personal name (first, middle, last, suffix).",
    "LOCATION": "Indicates the geographical location relevant to the health record, such as city or country.",
    "ORGANIZATION": "Names of organizations like hospitals, clinics, or labs.",
    "PHONE": "Represents a contact telephone number associated with the record.",
    "AGE": "Age or age group of an individual.",
    "DATE": "A significant date related to the health event, such as admission or diagnosis date.",
    "DISEASE": "Diseases that can affect an individual (e.g., cancer, diabetes).",
    "DISORDER": "Mental or physical disorders (e.g., anxiety, arthritis).",
    "DRUG": "Medications or drugs prescribed or used.",
    "RESULT": "Clinical or lab results indicating test outcomes.",
    "SYMPTOM": "Reported symptoms indicating potential conditions.",
    "SYNDROME": "Recognized medical syndromes.",
    "TEST": "Medical tests or procedures (e.g., blood test).",
    "TREATMENT": "Medical treatments or therapeutic procedures.",
    "ORGANIZATION_ID": "Unique identification codes for organizations."
},
"clinical": {
    "GENE": "Denotes a specific gene relevant to the clinical or genetic data.",
    "PROTEIN": "Represents a protein that is studied or measured in clinical settings.",
    "CHEMICAL": "Specifies a chemical substance involved in biomedical studies or treatments.",
    "DISEASE_Syndrome_Disorder": "A combined label representing various clinical conditions such as diseases, syndromes, or disorders.",
    "DRUG_Ingredient": "Refers to the active component or ingredient of a drug formulation.",
    "DRUG_BrandName": "Represents the commercial brand name under which the drug is marketed.",
    "MUTATION": "Indicates a genetic mutation relevant to diagnosis or research.",
    "PATHWAY": "Represents a biological pathway involved in physiological or pathological processes.",
    "CELL_TYPE": "Specifies the type of cell that is relevant in the clinical context.",
    "BIOPROCESS": "Describes a biological process that is significant in clinical research or treatment."
},
"hipaa": {
    "NAME": "Any personal name (first, middle, last, suffix).",
    "GEOGRAPHIC": "Any geographic data (street, city, county, ZIP, etc.).",
    "DATES": "Dates (except year) directly related to an individual (birth, admission, discharge, death).",
    "TELEPHONE": "Telephone numbers (landline or mobile).",
    "FAX": "Fax numbers.",
    "EMAIL": "Email addresses.",
    "SSN": "Social Security numbers or equivalent identifiers.",
    "MEDICAL_RECORD": "Unique medical record numbers.",
    "HEALTH_PLAN": "Health plan beneficiary or insurance policy numbers.",
    "ACCOUNT": "Account numbers related to healthcare billing.",
    "LICENSE": "Certificate or license numbers issued by authorities.",
    "VEHICLE": "Vehicle identifiers or license plate numbers.",
    "DEVICE": "Device identifiers and serial numbers for medical equipment.",
    "WEB": "Web URLs or website addresses.",
    "IP": "IP address numbers.",
    "BIOMETRIC": "Biometric identifiers (fingerprints, retinal scans, voice prints).",
    "PHOTOGRAPHIC": "Full face photographic images or comparable images.",
    "OTHER_UNIQUE": "Any other unique identifying number, characteristic, or code.",
    "ORGANIZATION": "Names of organizations like hospitals, clinics, or labs.",
    "AGE": "Age or age group of an individual.",
    "DISEASE": "Diseases that can affect an individual (e.g., cancer, diabetes).",
    "DISORDER": "Mental or physical disorders (e.g., anxiety, arthritis).",
    "DRUG": "Medications or drugs prescribed or used.",
    "RESULT": "Clinical or lab results indicating test outcomes.",
    "SYMPTOM": "Reported symptoms indicating potential conditions.",
    "SYNDROME": "Recognized medical syndromes.",
    "TEST": "Medical tests or procedures (e.g., blood test).",
    "TREATMENT": "Medical treatments or therapeutic procedures.",
    "ORGANIZATION_ID": "Unique identification codes for organizations."
}
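
Because several labels (NAME and DRUG, for example) appear under more than one agent, it can help to look up which agents report a given label. The sketch below is illustrative only: AGENT_LABELS copies a small subset of the label names listed above, and agents_for_label is a hypothetical helper, not part of the service.

```python
# Subset of the label names listed above, keyed by agent (descriptions omitted).
AGENT_LABELS = {
    "health-generic": {"NAME", "LOCATION", "PHONE", "DATE", "DRUG", "TEST"},
    "clinical": {"GENE", "PROTEIN", "CHEMICAL", "MUTATION", "DRUG_BrandName"},
    "hipaa": {"NAME", "GEOGRAPHIC", "DATES", "TELEPHONE", "SSN", "DRUG", "TEST"},
}


def agents_for_label(label):
    """Return the agents whose label set contains `label`, sorted by name."""
    return sorted(agent for agent, labels in AGENT_LABELS.items() if label in labels)


print(agents_for_label("DRUG"))     # ['health-generic', 'hipaa']
print(agents_for_label("GENE"))     # ['clinical']
print(agents_for_label("api_key"))  # [] (a Mini label, not an Advanced one)
```

Extending AGENT_LABELS to the full listings makes the same lookup usable for choosing the "agents" value in an Advanced request.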