How to handle the WebSocket connection?

WebSockets are the backbone of Gladia’s real-time transcription API. Every piece of audio you stream in and every event or transcript you get back flows through this single channel. Getting this right ensures low latency, reliable transcription, and smooth user experiences.

Here’s how to handle WebSocket connections the right way with Gladia 👇

One Connection = One Session

Always start with POST /v2/live to create a session.

You’ll receive a WebSocket URL (with a session token).
Use this URL to open exactly one WebSocket per session.
Keep it alive until the session is finished.
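
The flow above can be sketched in Python. This is a minimal sketch, not an official client. It assumes the API host https://api.gladia.io, an x-gladia-key authentication header, a JSON body with encoding, sample_rate, bit_depth and channels fields, and a response that carries the WebSocket URL in a url field. Only the /v2/live path comes from this article, so check the rest against the API reference. The helper names are hypothetical.

```python
import json
import urllib.request

# Assumed endpoint; only the path /v2/live comes from this article.
API_URL = "https://api.gladia.io/v2/live"


def build_session_request(api_key, sample_rate=16000, channels=1):
    """Build the POST /v2/live request that creates a session."""
    body = {
        # Field names below are assumptions; adjust to the API reference.
        "encoding": "wav/pcm",
        "sample_rate": sample_rate,
        "bit_depth": 16,
        "channels": channels,
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "x-gladia-key": api_key},
        method="POST",
    )


def extract_ws_url(response_body):
    """Pull the WebSocket URL (with its session token) out of the response."""
    payload = json.loads(response_body)
    url = payload["url"]  # assumed field name
    if not url.startswith(("ws://", "wss://")):
        raise ValueError(f"unexpected WebSocket URL: {url!r}")
    return url


def create_session(api_key):
    """Create one session and return the single URL to connect to."""
    with urllib.request.urlopen(build_session_request(api_key)) as resp:
        return extract_ws_url(resp.read())
```

Call create_session once per recording, then pass the returned URL to the WebSocket client of your choice and keep that one connection for the whole session.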

👉 Why? This keeps overhead low, avoids duplicate events, and makes your application predictable.

Handling error 429

A free Gladia account is limited to one open WebSocket at a time. If you try to open more, our API returns a 429 error (Too Many Requests).

https://docs.gladia.io/chapters/limits-and-specifications/concurrency

To raise this limit, contact our sales team to discuss a custom package: contact us
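
If a 429 is only temporary (for example, a previous session is still closing), one option is to retry with exponential backoff. The sketch below is an illustration under assumptions: TooManyRequests is a hypothetical exception that your own connect function raises when the API answers 429, and the delays are arbitrary defaults.

```python
import time


class TooManyRequests(Exception):
    """Hypothetical error raised by your connect function on HTTP 429."""


def connect_with_backoff(connect, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call connect() until it succeeds, backing off after each 429.

    connect is any zero-argument callable you supply that opens the
    session and raises TooManyRequests when the API answers 429.
    """
    for attempt in range(max_attempts):
        try:
            return connect()
        except TooManyRequests:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the 429 to the caller
            # Exponential backoff: 1 s, 2 s, 4 s, ... with the defaults.
            sleep(base_delay * (2 ** attempt))
```

Retrying only helps when a slot is about to free up. If you regularly need more concurrent sessions than your plan allows, backoff will not fix that.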

Understand the Message Types

Every message from Gladia includes a type field. Handling them correctly is the secret to a stable app:

Type	What it means	How to use it
lifecycle	Session updates (start_session, speech_end, …)	Sync your app with session state
transcript	Partial & final text	Show live captions or feed your app
acknowledgment	Confirms audio chunks received	Optional, but great for debugging reliability
post-processing	Outputs like translation, summarization	Use it for logs or downstream processing, depending on your use case
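
One way to act on the type field is a small dispatcher that routes each incoming message to a handler. This is a sketch: only the type field and the type names come from the table above. The rest of each payload is not described here, so the handlers receive the whole decoded message and the function name dispatch is hypothetical.

```python
import json


def dispatch(raw_message, handlers):
    """Route one incoming JSON message to the handler for its type.

    handlers maps a type string (e.g. "transcript") to a callable.
    Returns the handler's result, or None for types you do not handle.
    """
    message = json.loads(raw_message)
    handler = handlers.get(message.get("type"))
    if handler is None:
        return None  # ignore unknown types instead of crashing
    return handler(message)


# Example wiring: print transcripts, ignore everything else for now.
example_handlers = {
    "transcript": lambda message: print("transcript message:", message),
}
```

Ignoring unknown types keeps the app running if new message types appear later.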

Send Audio the Right Way

Match your audio sample rate and channel count to what you declared when starting the session.
You can send audio as binary or as base64.
Send small, continuous chunks (e.g. 20–50 ms of audio).
For multi-channel input, interleave audio buffers in the right order — Gladia preserves speaker identity by channel.
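
The chunking and interleaving rules above can be sketched as follows. The sketch assumes raw PCM with a whole number of bytes per sample (16-bit by default). The function names are hypothetical and the code only prepares bytes; sending them is left to your WebSocket client.

```python
def chunk_size_bytes(sample_rate, channels, bit_depth=16, chunk_ms=20):
    """Bytes of raw PCM needed for chunk_ms milliseconds of audio."""
    bytes_per_frame = channels * (bit_depth // 8)
    frames = sample_rate * chunk_ms // 1000
    return frames * bytes_per_frame


def iter_chunks(pcm, size):
    """Yield consecutive slices of pcm, each at most size bytes long."""
    for start in range(0, len(pcm), size):
        yield pcm[start:start + size]


def interleave(channel_buffers, bit_depth=16):
    """Interleave equal-length mono PCM buffers sample by sample.

    The order of channel_buffers becomes the channel order on the wire.
    """
    width = bit_depth // 8
    length = len(channel_buffers[0])
    if any(len(buf) != length for buf in channel_buffers):
        raise ValueError("all channel buffers must have the same length")
    out = bytearray()
    for offset in range(0, length, width):
        for buf in channel_buffers:
            out += buf[offset:offset + width]
    return bytes(out)
```

For 16 kHz mono 16-bit audio, a 20 ms chunk is 640 bytes, so iter_chunks(pcm, chunk_size_bytes(16000, 1)) yields pieces of that size.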

If you’re not sure, preprocess your audio before sending (resample, normalize, denoise). Better input = better transcription.

To go further, check the documentation: https://docs.gladia.io/chapters/live-stt/getting-started

Closing the Connection

When you’re done recording, send this message over the WebSocket:


{ "type": "stop_recording" }

This signals the end of audio, lets Gladia finalize the transcript, and then you can safely close the WebSocket. Closing promptly is worthwhile because the number of WebSockets you can keep open at once is limited by your plan.

NB: The WebSocket closes automatically after roughly 30 seconds of inactivity (no audio sent), with close code 4408. It also closes itself after 1 minute if no text is transcribed or no audio is sent, with close code 4504.
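
A small helper can build the stop message and turn close codes into readable log lines. This is a sketch: the 4408 and 4504 meanings are taken from the note above, 1000 is the standard WebSocket code for a normal closure, and the function names are hypothetical.

```python
import json

# Close codes and meanings as described in this article.
CLOSE_REASONS = {
    4408: "closed after roughly 30 seconds without audio",
    4504: "closed after 1 minute without transcribed text or audio",
}


def stop_recording_message():
    """JSON text frame that tells the server no more audio is coming."""
    return json.dumps({"type": "stop_recording"})


def describe_close(code):
    """Turn a WebSocket close code into a log-friendly explanation."""
    if code == 1000:
        return "normal closure"
    return CLOSE_REASONS.get(code, f"unexpected close code {code}")
```

Send stop_recording_message() as a text frame, wait for the final messages, then close the socket. Logging describe_close(code) in your close handler makes timeouts easy to tell apart from real failures.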

Handling Disconnects & Reliability

WebSockets can drop—Wi-Fi hiccups, server restarts, etc. Plan for it:

Auto-reconnect: try to re-open the same session using the original URL (the token remains valid).
Enable TCP keep-alive to prevent idle disconnections on long sessions.
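
Both points can be sketched with the standard library. This is an illustration under assumptions: run_once is a hypothetical callable you supply that connects to the URL, streams audio, returns when the session ends, and raises ConnectionError when the socket drops. How you reach the underlying TCP socket depends on your WebSocket client.

```python
import socket
import time


def enable_keepalive(sock):
    """Turn on TCP keep-alive probes for an already-created socket."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    return sock


def run_with_reconnect(url, run_once, max_reconnects=3, delay=1.0, sleep=time.sleep):
    """Run a session, reopening the same URL if the connection drops.

    Returns the number of reconnects that were needed, or re-raises
    ConnectionError once max_reconnects has been used up.
    """
    reconnects = 0
    while True:
        try:
            run_once(url)  # always the original session URL
            return reconnects
        except ConnectionError:
            if reconnects == max_reconnects:
                raise
            reconnects += 1
            sleep(delay)
```

Capping the number of reconnects avoids looping forever when the session has actually ended on the server side.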

And don’t forget acknowledgments—they’re a great way to confirm your audio is being received and processed.