How to handle the WebSocket connection?
WebSockets are the backbone of Gladia’s real-time transcription API. Every piece of audio you stream in and every event or transcript you get back flows through this single channel. Getting this right ensures low latency, reliable transcription, and smooth user experiences.
Here’s how to handle WebSocket connections the right way with Gladia 👇
One Connection = One Session
Always start with POST /v2/live to create a session.
You’ll receive a WebSocket URL (with a session token).
Use this URL to open exactly one WebSocket per session.
Keep it alive until the session is finished.
👉 Why? This keeps overhead low, avoids duplicate events, and makes your application predictable.
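As a rough illustration of the flow above, here is a minimal Python sketch that creates a session and extracts the WebSocket URL. The base URL `https://api.gladia.io`, the `x-gladia-key` header, the request body fields, and the `url` response field are assumptions made for this example; check the API reference for the exact names.

```python
import json
import urllib.request

# Assumed base URL; only the POST /v2/live path comes from this article.
GLADIA_LIVE_ENDPOINT = "https://api.gladia.io/v2/live"


def build_session_request(api_key: str, sample_rate: int = 16000,
                          channels: int = 1) -> urllib.request.Request:
    """Build (but do not send) the session-creation request."""
    # Body field names and the auth header are assumptions for illustration.
    body = {
        "encoding": "wav/pcm",
        "sample_rate": sample_rate,
        "bit_depth": 16,
        "channels": channels,
    }
    return urllib.request.Request(
        GLADIA_LIVE_ENDPOINT,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "x-gladia-key": api_key},
        method="POST",
    )


def extract_websocket_url(response_body: bytes) -> str:
    """Pull the WebSocket URL (with its session token) out of the response."""
    payload = json.loads(response_body)
    url = payload["url"]  # assumed field name
    if not url.startswith(("ws://", "wss://")):
        raise ValueError(f"unexpected WebSocket URL: {url!r}")
    return url


def create_session(api_key: str) -> str:
    """Create one session and return the URL to open exactly one WebSocket on."""
    request = build_session_request(api_key)
    with urllib.request.urlopen(request, timeout=10) as response:
        return extract_websocket_url(response.read())
```

Keep the returned URL for the whole session: it is the one you open once, and the one you reuse if you have to reconnect.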
Handling error 429
A free Gladia account is limited to one open WebSocket at a time. If you try to open more, the API returns a 429 (Too Many Requests) error.
https://docs.gladia.io/chapters/limits-and-specifications/concurrency
To raise this limit, contact our sales team to discuss a custom package.
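If your app can wait for a slot to free up, one option is to retry with a growing delay when a 429 comes back. The sketch below assumes the 429 surfaces as a `urllib.error.HTTPError`; the helper name `retry_on_429` and its parameters are hypothetical, and retrying only helps once another session has actually closed.

```python
import time
import urllib.error
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_on_429(action: Callable[[], T], max_attempts: int = 5,
                 base_delay: float = 1.0,
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Run `action`, retrying with exponential backoff on HTTP 429 only."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return action()
        except urllib.error.HTTPError as exc:
            # Any other status, or the last attempt, is re-raised unchanged.
            if exc.code != 429 or attempt == max_attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...
    raise RuntimeError("unreachable")
```

Pass in whatever function opens your session, for example `retry_on_429(lambda: open_session())`, where `open_session` stands for your own code.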
Understand the Message Types
Every message from Gladia includes a type field. Handling them correctly is the secret to a stable app:
| Type | What it means | How to use it |
|---|---|---|
| lifecycle | Session updates (start_session, speech_end, …) | Sync your app with session state |
| transcript | Partial & final text | Show live captions or feed your app |
| acknowledgment | Confirms audio chunks received | Optional, but great for debugging reliability |
| post-processing | Outputs like translation, summarization | Use it for logs or further processing, depending on your use case |
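One way to act on the `type` field is a small router that maps each type to a handler. This is a sketch only: the `MessageRouter` class is hypothetical, and the type strings used here are the category names from the table above, which may not match the exact values sent on the wire.

```python
import json
from typing import Any, Callable, Dict, List

Handler = Callable[[Dict[str, Any]], None]


class MessageRouter:
    """Route incoming JSON messages to handlers keyed by their `type` field."""

    def __init__(self) -> None:
        self._handlers: Dict[str, Handler] = {}
        # Messages with no registered handler are kept for inspection.
        self.unhandled: List[Dict[str, Any]] = []

    def on(self, message_type: str, handler: Handler) -> None:
        """Register the handler to call for one message type."""
        self._handlers[message_type] = handler

    def dispatch(self, raw: str) -> None:
        """Parse one text message and call the matching handler, if any."""
        message = json.loads(raw)
        handler = self._handlers.get(message.get("type"))
        if handler is None:
            self.unhandled.append(message)
            return
        handler(message)
```

Register a handler per type (`router.on("transcript", show_caption)`, where `show_caption` is your own function) and call `router.dispatch(raw)` for every message you receive. Unknown types are collected instead of raising, so a new message type does not break your app.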
Send Audio the Right Way
Match your audio sample rate and channel count to what you declared when starting the session.
You can send audio either as binary frames or as base64-encoded data.
Send small, continuous chunks (e.g. 20–50 ms of audio).
For multi-channel input, interleave audio buffers in the right order — Gladia preserves speaker identity by channel.
If you’re not sure, preprocess your audio before sending (resample, normalize, denoise). Better input = better transcription.
To go further, check the documentation: https://docs.gladia.io/chapters/live-stt/getting-started
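To make the chunking advice concrete, here is a small sketch for raw PCM audio. It computes how many bytes a chunk of a given duration holds, splits a buffer accordingly, and interleaves two mono channels sample by sample. The helper names are hypothetical, and the JSON shape used for base64 audio (`audio_chunk` with a `data.chunk` field) is an assumption to verify against the documentation.

```python
import base64
import json
from typing import Iterator


def chunk_size_bytes(sample_rate: int, channels: int,
                     bit_depth: int = 16, chunk_ms: int = 20) -> int:
    """Number of bytes in `chunk_ms` milliseconds of raw PCM audio."""
    bytes_per_frame = channels * bit_depth // 8
    frames = sample_rate * chunk_ms // 1000
    return frames * bytes_per_frame


def iter_chunks(pcm: bytes, size: int) -> Iterator[bytes]:
    """Yield consecutive chunks of `size` bytes (the last may be shorter)."""
    for start in range(0, len(pcm), size):
        yield pcm[start:start + size]


def as_base64_message(chunk: bytes) -> str:
    """Wrap a chunk in a JSON text message (assumed message shape)."""
    encoded = base64.b64encode(chunk).decode("ascii")
    return json.dumps({"type": "audio_chunk", "data": {"chunk": encoded}})


def interleave(first: bytes, second: bytes, bit_depth: int = 16) -> bytes:
    """Interleave two mono PCM buffers sample by sample, first channel first."""
    width = bit_depth // 8
    if len(first) != len(second) or len(first) % width:
        raise ValueError("channels must have equal, sample-aligned lengths")
    out = bytearray()
    for i in range(0, len(first), width):
        out += first[i:i + width]
        out += second[i:i + width]
    return bytes(out)
```

For 16 kHz, 16-bit mono audio, a 20 ms chunk is 640 bytes. The values you pass here should be the same sample rate, bit depth, and channel count you declared when creating the session.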
Closing the Connection
When you’re done recording, send this message over the WebSocket:
{ "type": "stop_recording" }
This signals the end of audio and lets Gladia finalize the transcript, after which you can safely close the WebSocket. Closing promptly matters because the number of WebSockets you can have open at once is limited by your plan.
NB: The WebSocket closes automatically after roughly 30 seconds of inactivity (no audio sent), with close code 4408.
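A shutdown routine along these lines can be sketched as follows. It assumes a WebSocket client object with a `send()` method that can be iterated to receive text messages until the server closes the connection; adapt it to the library you use. `finish_session` and `describe_close` are hypothetical helper names.

```python
import json
from typing import Any, Callable, Dict

INACTIVITY_CLOSE_CODE = 4408  # close code used after ~30 s without audio
NORMAL_CLOSE_CODE = 1000      # standard WebSocket "normal closure" code


def finish_session(ws: Any, on_message: Callable[[Dict[str, Any]], None]) -> int:
    """Send stop_recording, then drain remaining messages until close.

    `ws` is assumed to expose `send(str)` and to be iterable over incoming
    text messages, ending the iteration when the connection closes.
    Returns the number of messages received after the stop signal.
    """
    ws.send(json.dumps({"type": "stop_recording"}))
    received = 0
    for raw in ws:
        on_message(json.loads(raw))
        received += 1
    return received


def describe_close(code: int) -> str:
    """Turn a close code into a short, log-friendly description."""
    if code == NORMAL_CLOSE_CODE:
        return "normal closure"
    if code == INACTIVITY_CLOSE_CODE:
        return "closed after inactivity (no audio sent)"
    return f"closed with code {code}"
```

Draining the socket after `stop_recording` matters: the final transcripts arrive between the stop signal and the close, so closing immediately on your side could cut them off.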
Handling Disconnects & Reliability
WebSockets can drop—Wi-Fi hiccups, server restarts, etc. Plan for it:
Auto-reconnect: try to re-open the same session using the original URL (the token remains valid).
Enable TCP keep-alive to prevent idle disconnections on long sessions.
And don’t forget acknowledgments—they’re a great way to confirm your audio is being received and processed.
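The reconnect and keep-alive advice above can be sketched like this. The `connect` callable stands for whatever function your WebSocket library uses to open a connection, and the helper names are hypothetical. The sketch only catches `OSError` (which covers `ConnectionError` and timeouts from the standard library); if your library raises its own exception types, catch those as well.

```python
import socket
import time
from typing import Callable, Optional, TypeVar

C = TypeVar("C")


def enable_tcp_keepalive(sock: socket.socket) -> None:
    """Ask the OS to send TCP keep-alive probes on an otherwise idle socket."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)


def reconnect(connect: Callable[[str], C], url: str, max_attempts: int = 5,
              base_delay: float = 0.5, max_delay: float = 8.0,
              sleep: Callable[[float], None] = time.sleep) -> C:
    """Re-open the same session URL, backing off between failed attempts."""
    last_error: Optional[BaseException] = None
    for attempt in range(max_attempts):
        try:
            # Always reuse the original URL so the same session resumes.
            return connect(url)
        except OSError as exc:
            last_error = exc
            if attempt < max_attempts - 1:
                sleep(min(max_delay, base_delay * 2 ** attempt))
    raise ConnectionError(
        f"could not reconnect after {max_attempts} attempts"
    ) from last_error
```

The delay is capped so a long outage does not push retries minutes apart. Remember that the session itself closes after roughly 30 seconds without audio, so short delays are what make a reconnect useful here.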