FFmpeg gets 100x faster with AVX-512 and handwritten assembly code

Alfonso Maruccia

What just happened? FFmpeg developers keep on crunching "handwritten" assembly code to make the multimedia project faster than ever before. Thanks to newer vector-based instructions included in modern x86 processors, FFmpeg can truly provide a massive speedup in media transcoding workloads – if you are lucky enough.

The FFmpeg team recently announced a massive speed increase thanks to some newly patched code. The open-source project is now more than 100 times faster – likely the biggest performance increase it's ever experienced. However, the developers warn that only a single function is receiving this full boost, though some huge speed improvements are coming to other parts of the project as well.

As clearly stated in the recently submitted patch, the "rangedetect8_avx512" function is now 100 times faster. The coders credit their handwritten assembly code for the speed increase, together with the extensive use of the AVX-512 extensions to the x86 ISA available in modern computer processors.
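For a sense of what such a speedup is measured against, here is a minimal sketch of a plain C baseline. The article does not describe what the function computes, so this assumes, purely from its name, that it scans 8-bit samples for their smallest and largest values. The `SampleRange` type and `detect_range_scalar` function are hypothetical names for illustration, not FFmpeg's code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result type: smallest and largest sample seen. */
typedef struct {
    uint8_t min;
    uint8_t max;
} SampleRange;

/* Plain C baseline (an assumption, not FFmpeg's implementation):
 * visits one 8-bit sample per loop iteration. */
static SampleRange detect_range_scalar(const uint8_t *data, size_t n)
{
    assert(data != NULL && n > 0);
    SampleRange r = { data[0], data[0] };
    for (size_t i = 1; i < n; i++) {
        if (data[i] < r.min) r.min = data[i];
        if (data[i] > r.max) r.max = data[i];
    }
    return r;
}
```

A loop like this touches a single byte per step, which is why a version that handles dozens of bytes per instruction has so much room to improve on it.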

The FFmpeg team is clearly a big proponent of assembly programming. There is even an online school focused on how assembly is used in the project, where people interested in taking up the challenge are pushed to "open their eyes" to what's actually going on in a computer when it runs binary code from RAM.

Assembly is a low-level programming language whose human-readable instructions correspond directly to the machine code instructions of a CPU architecture. Unlike code written in high-level languages such as C, assembly is not passed through a compiler that decides which instructions to emit; it is simply "assembled" into binary code for a specific processor ISA. That level of control makes it one of the most effective (and most difficult) ways to extract every last bit of number-crunching performance from a CPU.

FFmpeg programmers explain their preference for handwritten code bluntly: "register allocator sucks on compilers." The AVX-512 instruction set is a vector-based addition to the traditional x86 ISA, a "single instruction, multiple data" (SIMD) extension implemented by Intel and AMD in modern(ish) CPUs. Its registers are 512 bits wide, so a single instruction can operate on 64 8-bit values at once.

Vector-based instructions such as AVX-512, or the more recent AVX10 ISA introduced by Intel, can indeed provide a massive performance boost in parallel processing workloads. FFmpeg, a comprehensive suite of libraries and tools for processing multimedia streams, is well suited to exploit this kind of computing acceleration. The project experienced its first massive AVX-512-powered speed boost in 2024, when video decoding routines became 3 to 94 times faster.

Even on older processors that don't provide AVX-512 hardware support, the latest FFmpeg patch can still bring some eye-opening speed increases. The "rangedetect8_avx2" function is now 64 times faster, and the AVX2 extensions it relies on have been available since Intel's Haswell microarchitecture arrived back in 2013.
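How one program can serve both newer and older CPUs can be sketched roughly as follows. This is an illustration using compiler intrinsics rather than handwritten assembly, and it is not FFmpeg's code: the function names (`max_scalar`, `max_avx2`, `max_sample`) and the `HAVE_AVX2_PATH` macro are hypothetical. It assumes GCC or Clang on x86 for the vector path and falls back to the scalar loop everywhere else.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <immintrin.h>
#define HAVE_AVX2_PATH 1
#else
#define HAVE_AVX2_PATH 0
#endif

/* Scalar loop: one 8-bit sample per iteration. */
static uint8_t max_scalar(const uint8_t *p, size_t n)
{
    uint8_t m = 0;
    for (size_t i = 0; i < n; i++)
        if (p[i] > m) m = p[i];
    return m;
}

#if HAVE_AVX2_PATH
/* AVX2 loop: each max instruction compares 32 samples at once. */
__attribute__((target("avx2")))
static uint8_t max_avx2(const uint8_t *p, size_t n)
{
    __m256i acc = _mm256_setzero_si256();
    size_t i = 0;
    for (; i + 32 <= n; i += 32) {
        __m256i v = _mm256_loadu_si256((const __m256i *)(p + i));
        acc = _mm256_max_epu8(acc, v);   /* per-byte unsigned maximum */
    }
    /* Reduce the 32 per-lane maxima, then handle leftover samples. */
    uint8_t lanes[32];
    _mm256_storeu_si256((__m256i *)lanes, acc);
    uint8_t m = max_scalar(lanes, 32);
    uint8_t tail = max_scalar(p + i, n - i);
    return tail > m ? tail : m;
}
#endif

/* Runtime dispatch: pick the fastest path the running CPU supports. */
static uint8_t max_sample(const uint8_t *p, size_t n)
{
#if HAVE_AVX2_PATH
    if (__builtin_cpu_supports("avx2"))
        return max_avx2(p, n);
#endif
    return max_scalar(p, n);
}
```

The same pattern extends to an AVX-512 path with 64 samples per instruction. The gains reported for FFmpeg come from handwritten assembly, so a sketch like this should not be expected to reproduce those figures.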

The optimisations were submitted by Niklas Haas, the lead developer of libplacebo.

Though welcome, the function in question won't affect most FFmpeg usage, being part of a filter. Nonetheless, Haas added some other AVX2/512 optimisations too.
 
I really want to see someone add AC-4 decoding to ffmpeg. Without it, ATSC 3.0 and OTA TV in general, on the PC, will be dead.
 
You would need an AMD Zen 4 or Zen 5 (preferably Zen 5) to take full advantage of these new performance improvements.
 
Nice. This is why I exclusively encode on Zen 4/5 CPUs. Those things absolutely haul while encoding video.
 