How to combine real-time transcription with translation?

When you enable translation in a real-time transcription session, Gladia sends not only the usual transcription payloads but also a dedicated translation payload. This message provides the translated text alongside the original utterance and its metadata.

Enable translation

To enable translation in a real-time transcription session, add the following to your configuration:

"realtime_processing": { "translation": false, "translation_config": { "target_languages": [ "en","fr" ], "model": "base", "match_original_utterances": true, "lipsync": true, "context_adaptation": true, "context": "<string>", "informal": false } }

Important note on multiple target languages

Currently, when multiple target languages are enabled, translations are processed sequentially rather than in parallel.

This means each language is translated one after the other. As the number of requested languages increases, the processing time accumulates and can introduce additional latency before translations are received.

For real-time use cases such as live subtitles, we recommend limiting the number of simultaneous target languages to keep latency low.

If your application requires a large number of translated languages in parallel (for example 10+ languages), a common architecture is to use Gladia for real-time transcription and send the transcription output to a separate translation pipeline that can handle large-scale multilingual translation in parallel.
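The fan-out architecture described above can be sketched as follows. This is an illustration only: translate_stub is a hypothetical stand-in for whatever external translation service you use, and fan_out is not part of any Gladia SDK. The point is that one task is started per target language and all of them are awaited together, so the total wait is roughly that of the slowest language rather than the sum of all languages.

```python
import asyncio


async def translate_stub(text: str, target: str) -> str:
    # Hypothetical placeholder for a call to an external translation service.
    await asyncio.sleep(0.01)
    return f"[{target}] {text}"


async def fan_out(text, targets, translate=translate_stub):
    # Start one translation per target language and wait for all of them at once.
    results = await asyncio.gather(*(translate(text, t) for t in targets))
    return dict(zip(targets, results))


# "Hello everyone" stands in for the text of a transcription received from Gladia.
translations = asyncio.run(fan_out("Hello everyone", ["fr", "de", "es"]))
print(translations)
```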

The Translation Payload

A translation message always has type: "translation". It contains both the original utterance and its translated counterpart, so you can display or store them together.

{
  "session_id": "4a39145c-2844-4557-8f34-34883f7be7d9",
  "created_at": "2021-09-01T12:00:00.123Z",
  "error": null,
  "type": "translation",
  "data": {
    "utterance_id": "00-00000011",
    "utterance": {
      "language": "en",
      "start": 123,
      "end": 123,
      "confidence": 123,
      "channel": 1,
      "speaker": 1,
      "words": [
        { "word": "<string>", "start": 123, "end": 123, "confidence": 123 }
      ],
      "text": "<string>"
    },
    "original_language": "af",
    "target_language": "af",
    "translated_utterance": {
      "language": "en",
      "start": 123,
      "end": 123,
      "confidence": 123,
      "channel": 1,
      "speaker": 1,
      "words": [
        { "word": "<string>", "start": 123, "end": 123, "confidence": 123 }
      ],
      "text": "<string>"
    }
  }
}

To learn more about the translation event, see https://docs.gladia.io/api-reference/v2/live/callback/translation
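As a rough illustration of consuming this payload, the Python sketch below filters incoming messages on type and pulls out the fields most applications need. The function name handle_message and the sample values are made up for the example; only the field names come from the payload above. The sketch does not interpret the error field, so add your own handling if you rely on it.

```python
import json


def handle_message(raw: str):
    """Return the key fields of a translation message, or None for other messages."""
    message = json.loads(raw)
    if message.get("type") != "translation":
        return None  # transcription and other message types are handled elsewhere
    data = message.get("data")
    if not data:
        return None
    return {
        "utterance_id": data["utterance_id"],
        "target_language": data["target_language"],
        "source_text": data["utterance"]["text"],
        "translated_text": data["translated_utterance"]["text"],
    }


# Illustrative sample message; the values are invented for this example.
sample = json.dumps({
    "type": "translation",
    "error": None,
    "data": {
        "utterance_id": "00-00000011",
        "utterance": {"language": "en", "text": "Hello"},
        "original_language": "en",
        "target_language": "fr",
        "translated_utterance": {"language": "fr", "text": "Bonjour"},
    },
})
result = handle_message(sample)
print(result)
```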

How to Use It

  • Display dual captions: Show the source utterance.text and the translated translated_utterance.text together.

  • Align by utterance ID: Use utterance_id to match each translation to the correct transcription, especially when several target languages are enabled.

  • Leverage metadata: Use speaker, start, and end to keep captions well-timed and attributed.

  • Store bilingual transcripts: Save both utterance objects to allow replay, subtitle generation, or exporting to SRT/VTT later.
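The ideas in the list above can be combined in a small store that groups translations by utterance_id and exports dual captions as SRT. The Python sketch below is one possible approach, not an official implementation: the BilingualTranscript class and srt_timestamp helper are hypothetical names, the sample data is invented, and the code assumes that start and end are expressed in seconds.

```python
def srt_timestamp(seconds: float) -> str:
    # Convert seconds (assumed unit of start/end) to the SRT form HH:MM:SS,mmm.
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


class BilingualTranscript:
    """Groups translated utterances under the utterance_id of their source."""

    def __init__(self):
        self.entries = {}

    def add(self, data: dict) -> None:
        # `data` is the "data" object of a translation message.
        entry = self.entries.setdefault(
            data["utterance_id"],
            {"source": data["utterance"], "translations": {}},
        )
        entry["translations"][data["target_language"]] = data["translated_utterance"]

    def to_srt(self, target_language: str) -> str:
        # Keep only utterances translated into the requested language, in time order.
        cues = sorted(
            (e for e in self.entries.values() if target_language in e["translations"]),
            key=lambda e: e["source"]["start"],
        )
        blocks = []
        for index, entry in enumerate(cues, start=1):
            src = entry["source"]
            dst = entry["translations"][target_language]
            blocks.append(
                f"{index}\n"
                f"{srt_timestamp(src['start'])} --> {srt_timestamp(src['end'])}\n"
                f"{src['text']}\n{dst['text']}\n"
            )
        return "\n".join(blocks)


transcript = BilingualTranscript()
hello = {"start": 0.0, "end": 1.5, "speaker": 0, "text": "Hello"}
thanks = {"start": 1.5, "end": 3.0, "speaker": 0, "text": "Thank you"}
transcript.add({"utterance_id": "00-00000001", "target_language": "fr",
                "utterance": hello, "translated_utterance": {"text": "Bonjour"}})
transcript.add({"utterance_id": "00-00000001", "target_language": "de",
                "utterance": hello, "translated_utterance": {"text": "Hallo"}})
transcript.add({"utterance_id": "00-00000002", "target_language": "fr",
                "utterance": thanks, "translated_utterance": {"text": "Merci"}})
srt = transcript.to_srt("fr")
print(srt)
```

Timing the cues from the source utterance keeps the two caption lines in sync even if the translated utterance carries slightly different timestamps. The stored speaker value can be added as a prefix to each caption line if attribution is needed.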