How to handle domain-specific jargon (medical, legal, technical)
When working with specialized domains like medicine, law, or engineering, transcription systems often mis-transcribe jargon, abbreviations, and less common terms. To improve accuracy, enable the custom_vocabulary parameter and list those terms in custom_vocabulary_config.
This feature lets you prioritize specific terms, add pronunciations, and fine-tune the system to better recognize words critical to your use case.
Example: Medical Terminology
Suppose you are transcribing patient consultations or medical lectures. Drug names like rybelsus or lisinopril may be misheard or misspelled. By explicitly defining them in your vocabulary, along with the ways they tend to be misheard (the pronunciations field), you guide the model to recognize and correctly transcribe them.
{
  "custom_vocabulary": true,
  "custom_vocabulary_config": {
    "vocabulary": [
      {
        "value": "rybelsus",
        "pronunciations": ["rebellious", "rebesses"],
        "intensity": 0.8,
        "language": "en"
      },
      {
        "value": "lisinopril",
        "pronunciations": ["Lizzie No Pryor", "Lizzie Knoprol"],
        "intensity": 0.7,
        "language": "en"
      }
    ],
    "default_intensity": 0.6
  }
}
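If you generate this configuration from code rather than writing it by hand, a small helper can keep the structure consistent. The following is a minimal Python sketch, not an official client: the function names vocab_entry and build_custom_vocabulary are hypothetical, and the check that intensity lies between 0 and 1 is an assumption based on the sample values above. Only the field names are taken from the example.

```python
import json


def vocab_entry(value, pronunciations=None, intensity=None, language=None):
    """Build one vocabulary entry, leaving out optional fields that are not set.

    Hypothetical helper; field names mirror the JSON example above.
    """
    entry = {"value": value}
    if pronunciations:
        entry["pronunciations"] = list(pronunciations)
    if intensity is not None:
        # Assumption: intensity is a number in [0, 1], as the sample values suggest.
        if not 0.0 <= intensity <= 1.0:
            raise ValueError(f"intensity out of assumed range [0, 1]: {intensity}")
        entry["intensity"] = intensity
    if language:
        entry["language"] = language
    return entry


def build_custom_vocabulary(entries, default_intensity=0.6):
    """Wrap a list of entries in the custom_vocabulary structure shown above."""
    return {
        "custom_vocabulary": True,
        "custom_vocabulary_config": {
            "vocabulary": list(entries),
            "default_intensity": default_intensity,
        },
    }


config = build_custom_vocabulary([
    vocab_entry("rybelsus", ["rebellious", "rebesses"], 0.8, "en"),
    vocab_entry("lisinopril", ["Lizzie No Pryor", "Lizzie Knoprol"], 0.7, "en"),
])

# Serialize to JSON so it can be merged into your request configuration.
print(json.dumps(config, indent=2))
```

The resulting dictionary matches the JSON example field for field, so it can be merged into whatever request configuration you already send.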
Best Practices
Start with your most critical 10–20 terms (drug names, procedures, diagnoses).
Add abbreviations and acronyms (e.g., ECG, HbA1c).
Review transcripts regularly and update your vocabulary with new terms.
For multilingual contexts (e.g., Latin medical terms), set the language field on each vocabulary entry to specify the language its pronunciations are written in.
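One way to keep the vocabulary current after each transcript review is to merge newly spotted terms into the existing list without creating duplicates. The sketch below assumes each entry is a dictionary with at least a value field, as in the example above; merge_terms is a hypothetical helper name, and the case-insensitive comparison is a design choice of this sketch, not documented behavior.

```python
def merge_terms(vocabulary, new_entries):
    """Return a new list with new_entries appended, skipping any whose
    value is already present (compared case-insensitively)."""
    seen = {entry["value"].lower() for entry in vocabulary}
    merged = list(vocabulary)
    for entry in new_entries:
        key = entry["value"].lower()
        if key not in seen:
            merged.append(entry)
            seen.add(key)
    return merged


# Existing vocabulary, e.g. your initial set of critical terms.
vocabulary = [
    {"value": "rybelsus", "intensity": 0.8, "language": "en"},
    {"value": "lisinopril", "intensity": 0.7, "language": "en"},
]

# Terms noticed while reviewing recent transcripts.
updates = [
    {"value": "ECG"},
    {"value": "HbA1c"},
    {"value": "Lisinopril"},  # already present, differs only by case
]

vocabulary = merge_terms(vocabulary, updates)
print([entry["value"] for entry in vocabulary])
```

Existing entries keep their settings; only terms that are not yet listed are appended.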
With this setup, medical terms and specific jargon will be transcribed more reliably, reducing errors that could otherwise distort meaning in critical domains.
To learn more about the custom vocabulary feature, see our documentation: https://docs.gladia.io/chapters/live-stt/features#custom-vocabulary