How to upload large files efficiently?

The Gladia API applies certain limits to guarantee stability and performance for all users:

Maximum audio duration:
- Standard plan → 135 minutes per request
- YouTube direct links → 120 minutes
- Enterprise plan → up to 4h15
Maximum file size: 1000 MB (1 GB). This limit can be increased with an Enterprise plan.

Any file exceeding these limits will be rejected by the API.
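
To avoid a rejected upload, the limits above can be checked locally first. The following is a minimal sketch, not an official Gladia tool: it assumes the Standard plan limits quoted above, treats 1000 MB as decimal megabytes, and relies on `ffprobe` (shipped with ffmpeg) being installed. The function names `within_limits` and `check_file` are hypothetical.

```shell
# Standard plan limits as quoted in this article.
MAX_SECONDS=$((135 * 60))            # 135 minutes
MAX_BYTES=$((1000 * 1000 * 1000))    # 1000 MB, assuming decimal megabytes

# within_limits DURATION_SECONDS SIZE_BYTES
# Prints "ok" (status 0) or the reason for rejection (status 1).
within_limits() {
  if [ "$1" -gt "$MAX_SECONDS" ]; then
    echo "too long"
    return 1
  fi
  if [ "$2" -gt "$MAX_BYTES" ]; then
    echo "too large"
    return 1
  fi
  echo "ok"
}

# check_file PATH
# Hypothetical wrapper that measures a real file, then applies the check.
check_file() {
  # ffprobe prints the duration as decimal seconds (e.g. 4500.032000).
  seconds=$(ffprobe -v error -show_entries format=duration \
    -of default=noprint_wrappers=1:nokey=1 "$1")
  seconds=${seconds%.*}              # drop the fraction (rounds down)
  bytes=$(( $(wc -c < "$1") ))       # file size in bytes
  within_limits "$seconds" "$bytes"
}
```

For example, `check_file input.mp4` prints `ok` for the 1h15, 450 MB video used in the example below, since both values are under the Standard plan limits.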

When dealing with long or heavy recordings, the best approach is to optimize the file step by step before uploading. Here are some best practices:

- If you have a video, consider extracting only the audio → reduces file size and improves transcription performance.
- Downsample the audio to 16 kHz → sufficient for speech recognition while keeping the file lighter.
- Split files longer than 1 hour into smaller chunks → improves stability and transcription accuracy.

Let’s go through a concrete example.

We start with a 1h15 MP4 video weighing 450 MB.

First, we strip out the video part and extract the audio as a WAV file:


ffmpeg -i input.mp4 -vn -c:a pcm_s16le temp.wav

We now have an uncompressed PCM WAV file, but it can be optimized further.

For transcription, a 16 kHz sampling rate is sufficient and helps reduce file size while keeping voice quality intact.


ffmpeg -i temp.wav -ar 16000 -c:a pcm_s16le optimized.wav

This produces an API-compatible file that is lighter and faster to process.
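
To see how much lighter the file gets, the size of 16-bit PCM audio can be estimated as sample rate × channels × 2 bytes × duration. The sketch below assumes the source audio is 44.1 kHz stereo, which the example does not state, and ignores the small WAV header. `pcm_wav_mb` is a hypothetical helper name.

```shell
# pcm_wav_mb SAMPLE_RATE CHANNELS SECONDS
# Approximate size in decimal MB of 16-bit (2 bytes per sample) PCM audio.
pcm_wav_mb() {
  echo $(( $1 * $2 * 2 * $3 / 1000000 ))
}

# 1h15 = 4500 seconds
pcm_wav_mb 44100 2 4500   # prints 793: assumed 44.1 kHz stereo source (temp.wav)
pcm_wav_mb 16000 2 4500   # prints 288: same audio at 16 kHz (optimized.wav)
pcm_wav_mb 16000 1 4500   # prints 144: 16 kHz mono, for comparison
```

Under these assumptions, the intermediate uncompressed WAV can be larger than the original 450 MB video, and it is the 16 kHz step that brings it well below that. The mono figure corresponds to adding ffmpeg's `-ac 1` option, which is not part of the workflow described here and may not be appropriate when each channel carries a different speaker.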

Since the optimized file is still 1h15 long, we split it into two smaller files, each no longer than 1 hour:


ffmpeg -i optimized.wav -f segment -segment_time 3600 -c copy chunk_%02d.wav

This produces chunk_00.wav (the first hour) and chunk_01.wav (the remaining 15 minutes). Both files now follow best practices: shorter chunks usually provide better stability and more accurate transcriptions.
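
As a quick sanity check after splitting, the expected number of chunks is the total duration divided by the segment length, rounded up. The sketch below uses hypothetical helper names and assumes `ffprobe` is available. With `-c copy` the cut points are approximate, so the reported durations may differ slightly from exactly 3600 seconds.

```shell
# chunk_count TOTAL_SECONDS SEGMENT_SECONDS
# Number of files the split should produce (ceiling division).
chunk_count() {
  echo $(( ($1 + $2 - 1) / $2 ))
}

# chunk_durations
# Hypothetical helper: prints each chunk's name and its duration in seconds.
chunk_durations() {
  for f in chunk_*.wav; do
    printf '%s ' "$f"
    ffprobe -v error -show_entries format=duration \
      -of default=noprint_wrappers=1:nokey=1 "$f"
  done
}

chunk_count 4500 3600   # prints 2: 1h15 split into 1-hour segments
```

Running `chunk_durations` in the output directory then lists the duration of each chunk so it can be compared with the expected count.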

After these steps, we end up with two WAV PCM files, each ≤ 1 hour and optimized at 16 kHz for efficient transcription.
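
The three commands can also be combined into a single ffmpeg invocation that extracts the audio, resamples it, and segments it in one pass, without intermediate files. This is a sketch under the same settings as above. The `prepare_audio` function and its `DRY_RUN` switch are hypothetical conveniences, not part of ffmpeg or the Gladia API.

```shell
# prepare_audio INPUT [OUTDIR] [SEGMENT_SECONDS]
# Extracts the audio track, resamples it to 16 kHz 16-bit PCM and writes
# chunks of at most SEGMENT_SECONDS (default 3600) into OUTDIR (default ".").
# With DRY_RUN=1 the ffmpeg command is printed instead of executed.
prepare_audio() {
  input=$1
  outdir=${2:-.}
  segment=${3:-3600}
  # Build the command as positional parameters so quoting is preserved.
  set -- ffmpeg -i "$input" -vn -ar 16000 -c:a pcm_s16le \
    -f segment -segment_time "$segment" "$outdir/chunk_%02d.wav"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    mkdir -p "$outdir" && "$@"
  fi
}
```

For example, `prepare_audio input.mp4 chunks` writes `chunks/chunk_00.wav`, `chunks/chunk_01.wav`, and so on. Prefixing the call with `DRY_RUN=1` shows the exact command first.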