How to upload large files efficiently?
API Limitations
The Gladia API applies certain limits to guarantee stability and performance for all users:
Maximum audio duration:
Standard plan → 135 minutes per request
YouTube direct links → 120 minutes
Enterprise plan → up to 4h15
Maximum file size: 1000 MB (1 GB)
Maximum file size can be increased with an Enterprise plan
Any file exceeding these limits will be rejected by the API.
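Checking a file against these limits before uploading avoids a rejected request. The following is a minimal sketch, not an official Gladia tool: the function name check_limits is hypothetical, the limits are the Standard plan values listed above, and treating 1000 MB as 1,000,000,000 bytes is an assumption.

```shell
# Standard plan limits from the list above; adjust for other plans.
MAX_DURATION_SEC=$((135 * 60))          # 135 minutes
MAX_SIZE_BYTES=$((1000 * 1000 * 1000))  # assumption: 1000 MB counted in decimal megabytes

# check_limits DURATION_SEC SIZE_BYTES   (hypothetical helper)
# Prints "ok" and returns 0 when both values are within the limits,
# otherwise prints the reason and returns 1.
check_limits() {
    duration_sec=$1
    size_bytes=$2
    if [ "$duration_sec" -gt "$MAX_DURATION_SEC" ]; then
        echo "too long"
        return 1
    fi
    if [ "$size_bytes" -gt "$MAX_SIZE_BYTES" ]; then
        echo "too large"
        return 1
    fi
    echo "ok"
}

# Example usage (requires ffprobe, which ships with ffmpeg):
#   duration=$(ffprobe -v error -show_entries format=duration \
#       -of default=noprint_wrappers=1:nokey=1 input.mp4)
#   size=$(($(wc -c < input.mp4)))
#   check_limits "${duration%.*}" "$size"   # duration truncated to whole seconds
```

The 1h15, 450 MB video used in the example below passes both checks, so the preparation steps that follow are about efficiency rather than about getting under a hard limit.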
Preparing Large Media Files for Upload
When dealing with long or heavy recordings, the best approach is to optimize the file step by step before uploading. Here are some best practices:
If you have a video, consider extracting only the audio → reduces file size and improves transcription performance.
Downsample the audio to 16 kHz → sufficient for speech recognition while keeping the file lighter.
Split files longer than 1 hour into smaller chunks → improves stability and transcription accuracy.
Let’s go through a concrete example.
Step 1: Starting Point
We start with a 1h15 MP4 video weighing 450 MB.
Step 2: Remove the Video Track
First, we strip out the video part and extract the audio as a WAV file:
ffmpeg -i input.mp4 -vn -c:a pcm_s16le temp.wav
-vn → removes the video stream.
-c:a pcm_s16le → ensures WAV PCM encoding.
Now we have a raw audio WAV file, but it can be optimized a bit more.
Step 3: Downsample to 16 kHz
For transcription, a 16 kHz sampling rate is sufficient and helps reduce file size while keeping voice quality intact.
ffmpeg -i temp.wav -ar 16000 -c:a pcm_s16le optimized.wav
-ar 16000 → sets the sample rate to 16 kHz.
This produces an API-compatible file that is lighter and faster to process.
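To see why the sample rate matters, the size of uncompressed 16-bit PCM audio can be estimated as duration × sample rate × 2 bytes × channels. A rough sketch follows; wav_size_bytes is a hypothetical helper, and the 44.1 kHz stereo source format is an assumption, since the article does not state the audio format of the example video.

```shell
# wav_size_bytes DURATION_SEC SAMPLE_RATE_HZ CHANNELS   (hypothetical helper)
# Approximate size of 16-bit PCM audio (2 bytes per sample),
# ignoring the small WAV header.
wav_size_bytes() {
    echo $(( $1 * $2 * 2 * $3 ))
}

wav_size_bytes 4500 44100 2   # 1h15 at 44.1 kHz stereo (assumed source) -> 793800000
wav_size_bytes 4500 16000 1   # 1h15 at 16 kHz mono                      -> 144000000
```

Under these assumptions, a 1h15 recording goes from roughly 794 MB to about 144 MB. Note that the commands above keep the source channel count, so a stereo source stays stereo and the 16 kHz file would be about 288 MB.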
Step 4: Split into Chunks of 1 Hour
Since the optimized file is still 1h15 long, we split it into two smaller files, each no longer than 1 hour:
ffmpeg -i optimized.wav -f segment -segment_time 3600 -c copy chunk_%02d.wav
chunk_00.wav → first 60 minutes.
chunk_01.wav → last 15 minutes.
Both files now follow best practices: shorter chunks usually provide better stability and more accurate transcriptions.
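For recordings of other lengths, the number of chunks and the length of the last one follow from simple integer arithmetic. This is an illustrative sketch; chunk_count and last_chunk_sec are hypothetical helper names, and a duration greater than zero is assumed.

```shell
# chunk_count DURATION_SEC SEGMENT_SEC
# Number of chunks produced, computed with ceiling division.
chunk_count() {
    echo $(( ($1 + $2 - 1) / $2 ))
}

# last_chunk_sec DURATION_SEC SEGMENT_SEC
# Length in seconds of the final chunk (a full segment if it divides evenly).
last_chunk_sec() {
    rem=$(( $1 % $2 ))
    if [ "$rem" -eq 0 ]; then
        echo "$2"
    else
        echo "$rem"
    fi
}

chunk_count 4500 3600      # 1h15 in 1-hour segments -> 2
last_chunk_sec 4500 3600   # -> 900 (15 minutes)
```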
Final Result
After these steps, we end up with two WAV PCM files, each ≤ 1 hour and optimized at 16 kHz for efficient transcription.
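The three ffmpeg steps can also be combined into a single pass, which avoids writing the intermediate WAV files. The wrapper below is a sketch under that assumption, not part of the original procedure: prepare_audio and the DRY_RUN variable are hypothetical names, while the ffmpeg options are the same ones used in the steps above.

```shell
# prepare_audio INPUT [SEGMENT_SEC]   (hypothetical wrapper)
# Drops the video track, resamples to 16 kHz 16-bit PCM and splits the
# result into chunks of SEGMENT_SEC seconds (default: 3600).
# With DRY_RUN=1 the command is printed instead of executed.
prepare_audio() {
    input=$1
    segment_sec=${2:-3600}
    # Build the command as the function's own argument list.
    set -- ffmpeg -i "$input" -vn -ar 16000 -c:a pcm_s16le \
        -f segment -segment_time "$segment_sec" chunk_%02d.wav
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Example usage:
#   DRY_RUN=1; prepare_audio input.mp4   # show the command only
#   DRY_RUN=0; prepare_audio input.mp4   # run it (requires ffmpeg)
```

Running the steps separately, as shown in the article, makes it easier to inspect each intermediate file; the single-pass form is mainly useful once the settings are known to work for a given source.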