
How to Upload Large Files Efficiently?

API Limitations

The Gladia API applies certain limits to guarantee stability and performance for all users:

  • Maximum audio duration:

    • Standard plan → 135 minutes per request

    • YouTube direct links → 120 minutes

    • Enterprise plan → up to 4h15

  • Maximum file size: 1000 MB (1 GB)

The maximum file size can be increased with an Enterprise plan.

Any file exceeding these limits will be rejected by the API.
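
Before uploading, it can help to check a file against these limits locally. The following is a minimal sketch, assuming that ffprobe (shipped with ffmpeg) is installed, that the Standard plan limits apply, and that 1000 MB means 10^9 bytes. The function names within_limits and check_file are illustrative and are not part of the Gladia API.

```shell
# Limits taken from the list above (Standard plan); adjust for your plan.
MAX_SECONDS=$((135 * 60))          # 135 minutes
MAX_BYTES=$((1000 * 1000 * 1000))  # 1000 MB, assuming 1 MB = 10^6 bytes

# within_limits DURATION_SECONDS SIZE_BYTES
# Succeeds (exit status 0) only when both values fit the limits.
within_limits() {
  [ "$1" -le "$MAX_SECONDS" ] && [ "$2" -le "$MAX_BYTES" ]
}

# check_file FILE
# Reads duration and size with ffprobe, then applies the check.
# The duration is truncated to whole seconds before comparison.
check_file() (
  duration=$(ffprobe -v error -show_entries format=duration -of csv=p=0 "$1") || exit 2
  size=$(ffprobe -v error -show_entries format=size -of csv=p=0 "$1") || exit 2
  within_limits "${duration%.*}" "$size"
)

# Example (hypothetical file name):
#   check_file input.mp4 && echo "OK to upload" || echo "Too long or too large"
```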

Preparing Large Media Files for Upload

When dealing with long or heavy recordings, the best approach is to optimize the file step by step before uploading. Here are some best practices:

  • If you have a video, consider extracting only the audio → reduces file size and improves transcription performance.

  • Downsample the audio to 16 kHz → sufficient for speech recognition while keeping the file lighter.

  • Split files longer than 1 hour into smaller chunks → improves stability and transcription accuracy.

Let’s go through a concrete example.

Step 1: Starting Point

We start with a 1h15 MP4 video weighing 450 MB.

Step 2: Remove the Video Track

First, we strip out the video part and extract the audio as a WAV file:

ffmpeg -i input.mp4 -vn -c:a pcm_s16le temp.wav
  • -vn removes the video stream.

  • -c:a pcm_s16le encodes the audio as 16-bit little-endian PCM, the standard WAV encoding.

We now have an uncompressed WAV file, but it can be optimized further.

Step 3: Downsample to 16 kHz

For transcription, a 16 kHz sampling rate is sufficient: it reduces file size while keeping speech clear enough for recognition.

ffmpeg -i temp.wav -ar 16000 -c:a pcm_s16le optimized.wav
  • -ar 16000 → sets the sample rate to 16 kHz.

This produces an API-compatible file that is lighter and faster to process.
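
To see roughly how much downsampling saves, the size of 16-bit PCM audio can be estimated as duration × sample rate × channels × 2 bytes. The sketch below applies this to the 1h15 example. The article does not state the sample rate or channel count of the source, so the 48 kHz stereo input is an assumption, and pcm_wav_bytes is an illustrative helper name.

```shell
# pcm_wav_bytes DURATION_SECONDS SAMPLE_RATE_HZ CHANNELS
# Approximate size of 16-bit PCM audio (2 bytes per sample),
# ignoring the small WAV header.
pcm_wav_bytes() {
  echo $(( $1 * $2 * $3 * 2 ))
}

# 1h15 = 4500 seconds. The 48 kHz stereo source is an assumption.
pcm_wav_bytes 4500 48000 2   # prints 864000000 (about 864 MB)
pcm_wav_bytes 4500 16000 2   # prints 288000000 (about 288 MB)
```

Under these assumptions, the WAV from Step 2 would be larger than the 450 MB MP4 it came from, because PCM is uncompressed. The downsampling in Step 3 is what brings the size down.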

Step 4: Split into Chunks of 1 Hour

Since the optimized file is still 1h15 long, we split it into two smaller files, each no longer than 1 hour:

ffmpeg -i optimized.wav -f segment -segment_time 3600 -c copy chunk_%02d.wav
  • chunk_00.wav → first 60 minutes.

  • chunk_01.wav → last 15 minutes.

Both files now follow best practices: shorter chunks usually provide better stability and more accurate transcriptions.
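
The number of chunks to expect is the duration divided by the segment length, rounded up. The sketch below computes that and, as an optional check, prints the duration of each chunk. It assumes ffprobe is installed and that the chunks are in the current directory. chunk_count and show_chunk_durations are illustrative helper names.

```shell
# chunk_count DURATION_SECONDS SEGMENT_SECONDS
# Expected number of output files: ceiling of duration / segment length.
chunk_count() {
  echo $(( ($1 + $2 - 1) / $2 ))
}

chunk_count 4500 3600   # prints 2 (1h15 split into 1-hour segments)

# show_chunk_durations
# Prints the duration in seconds of every chunk_*.wav found.
show_chunk_durations() {
  for f in chunk_*.wav; do
    [ -f "$f" ] || continue   # skip when no chunk files exist
    printf '%s: ' "$f"
    ffprobe -v error -show_entries format=duration -of csv=p=0 "$f"
  done
}

# Example: show_chunk_durations
```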

Final Result

After these steps, we end up with two WAV PCM files, each ≤ 1 hour and optimized at 16 kHz for efficient transcription.
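
The three ffmpeg commands can also be combined into a single pass, which avoids writing the intermediate temp.wav and optimized.wav files. This is a sketch of one possible variation and not the method used in the walkthrough. It reuses only the flags shown in the steps, and prepare_audio is an illustrative function name.

```shell
# prepare_audio INPUT [SEGMENT_SECONDS]
# Drops the video stream, resamples to 16 kHz, encodes as 16-bit PCM,
# and splits the result into chunks (1 hour by default).
prepare_audio() {
  ffmpeg -i "$1" -vn -ar 16000 -c:a pcm_s16le \
    -f segment -segment_time "${2:-3600}" chunk_%02d.wav
}

# Example (hypothetical file name):
#   prepare_audio input.mp4
```

Keeping the steps separate, as in the walkthrough, makes it easier to inspect each intermediate file. The single pass is mainly useful once the settings are known to work.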