FFmpeg adds first AI feature with Whisper audio transcription filter

Alfonso Maruccia · Aug 15, 2025

Forward-looking: Although FFmpeg is often associated with video transcoding tasks, it can also handle audio streams and files with ease. The open-source project is now introducing its first AI-powered feature: an audio transcription filter based on a popular speech recognition model developed by OpenAI.

For the first time in its long history, FFmpeg is integrating AI models with the introduction of the new Whisper audio filter. This filter can process audio streams or files to perform automatic speech recognition, potentially simplifying media transcoding workflows – even for live events.

Whisper, developed by OpenAI, is a general-purpose speech recognition model trained on a large and diverse audio dataset. It supports multilingual transcription, speech translation, and language identification. The model is available in six different sizes, each offering a trade-off between speed and accuracy.

With Whisper, FFmpeg users can output transcriptions in multiple formats, including raw text, SRT subtitle files, or JSON. The filter also allows users to balance accuracy against performance and even supports GPU acceleration for faster processing.
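For readers who want to see what this could look like in practice, here is a minimal Python sketch that assembles an FFmpeg command line for the filter. It assumes an FFmpeg build with the Whisper filter enabled and a locally downloaded whisper.cpp model file. The option names (model, language, queue, use_gpu, destination, format) follow the filter's documentation as we understand it and should be checked against your own build. The helper name and the file names are hypothetical.

```python
def build_whisper_command(input_path, model_path, destination,
                          fmt="srt", language="en",
                          queue_seconds=3, use_gpu=True):
    """Return an argument list that runs FFmpeg's whisper audio filter.

    Hypothetical helper: it only builds the command, it does not run it.
    Paths containing ':' or other filtergraph special characters would
    need escaping, which this sketch does not attempt.
    """
    if fmt not in {"text", "srt", "json"}:
        raise ValueError(f"unsupported output format: {fmt}")

    # Filter options are written as key=value pairs separated by ':'.
    options = {
        "model": model_path,          # whisper.cpp model file (assumed local)
        "language": language,         # spoken language of the audio
        "queue": queue_seconds,       # seconds of audio buffered per pass;
                                      # larger values trade latency for accuracy
        "use_gpu": int(use_gpu),      # 1 to request GPU acceleration, 0 for CPU
        "destination": destination,   # where the transcript is written
        "format": fmt,                # text, srt, or json
    }
    filter_spec = "whisper=" + ":".join(f"{k}={v}" for k, v in options.items())

    # -vn drops video; "-f null -" discards the media output, since only
    # the transcript written by the filter is of interest here.
    return ["ffmpeg", "-i", input_path, "-vn",
            "-af", filter_spec, "-f", "null", "-"]


if __name__ == "__main__":
    cmd = build_whisper_command("talk.mp4", "ggml-base.en.bin", "talk.srt")
    print(" ".join(cmd))
```

Passing the resulting list to `subprocess.run` would start the transcription. The queue setting is the accuracy-versus-performance knob mentioned above: a short queue returns text sooner, which suits live streams, while a longer one gives the model more context.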

FFmpeg developers have always prioritized speed and performance in media processing tasks. The team is known for its use of handwritten assembly code and vector-based parallel processing on modern chips. Audio processing appears to follow the same high-performance philosophy.

The Whisper filter introduces integrated speech recognition and transcription capabilities, allowing users to avoid relying on external services or additional software for similar results. This feature will be particularly useful for content creators, streamers, and professionals who need to handle repetitive archiving tasks.

The filter is especially significant because it brings the first AI model ever integrated into FFmpeg, marking what many see as an important precedent. This step could pave the way for more AI-driven features in the future, even as FFmpeg maintains its core focus on fast media processing and transcoding.

FFmpeg remains one of the most important multimedia frameworks, offering libraries and tools to handle video, audio, and other media formats. It supports a wide range of open standards and provides numerous filters for transforming or converting streams. Many major platforms and organizations rely on FFmpeg for transcoding or playback, including YouTube, Google Chrome, the Linux version of Firefox, and countless others.


mbrowne5061 · Aug 15, 2025

I wonder if this might finally "solve" subtitles on software like Plex and Jellyfin. Lots of files out there have PGS subtitles that trigger transcoding and performance hits, or lack subtitles entirely. If this feature ever makes it into a version of Plex, I could see it allowing Plex to auto-generate subtitles in your preferred languages whenever media is added to your library that lacks subs or only has PGS subs.

letsgoiowa · Aug 15, 2025

This is very exciting to me. However, Whisper is quite outdated at this point and many other models either match quality and run wayyyyy faster OR are far superior in quality. Hopefully those can be swapped in later.

mbrowne5061 · Aug 18, 2025

letsgoiowa said:
This is very exciting to me. However, Whisper is quite outdated at this point and many other models either match quality and run wayyyyy faster OR are far superior in quality. Hopefully those can be swapped in later.

I wonder if it's a licensing issue?

Anyway, I think I might have used this feature in Shotcut last night. Shotcut uses FFmpeg, as far as I can tell, and I needed some subs generated. I downloaded a model, and it seems to have generated perfect subs, synced perfectly. It took about 40-60 minutes to process a video about 15 minutes in length, however, and it got 2 words spelled incorrectly (it spelled them phonetically). Definitely seems like there is potential for improvement in either speed or accuracy, but still pretty good!
