Meta's AI-assisted audio codec claims 10x compression rate compared to MP3s

Alfonso Maruccia · Nov 2, 2022

TL;DR: Encodec is a next-generation audio codec based on a complex neural network design, a system that can squeeze a lot of audio juice into minimal storage space. The codec would work for Metaverse experiences and optimizing mobile phone calls.

Thanks to its high efficiency and integrated support by iconic products like the everlasting Winamp player, the MP3 codec became the de facto standard for sharing audio files on the internet during the Nineties and beyond. Now, a new codec wants to make history again by offering even more extreme gains in efficiency and bandwidth savings. The secret is an AI algorithm capable of "hypercompressing" audio streams.

Meta researchers conceptualized Encodec as a potential solution for supporting "current and future" high-quality experiences in the metaverse. The new technology is a neural network trained to "push the boundaries of what's possible" in audio compression for online applications. The system can achieve "an approximate 10x compression rate" compared to the MP3 standard.

Meta trained the AI "end to end" to achieve a specific target size after compression. Encodec can squeeze a 64 kbps MP3 data stream into 6 kbps, which means it needs roughly 750 bytes per second (yes, bytes) to keep the same quality as the original. The researchers say the codec can compress 48 kHz stereo audio samples for speech, an industry first.
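The bitrate arithmetic is easy to sanity-check. The sketch below assumes decimal prefixes (1 kbps = 1,000 bits per second) and 8 bits per byte; the function names are illustrative and not part of any Meta tooling.

```python
BITS_PER_BYTE = 8

def bytes_per_second(kbps: float) -> float:
    """Convert a bitrate in kbps (decimal: 1 kbps = 1,000 bit/s) to bytes per second."""
    return kbps * 1000 / BITS_PER_BYTE

def clip_size_bytes(kbps: float, seconds: float) -> float:
    """Storage needed for a clip of the given length at a constant bitrate."""
    return bytes_per_second(kbps) * seconds

mp3_kbps = 64      # MP3 bitrate quoted in the article
encodec_kbps = 6   # Encodec bitrate quoted in the article

print(bytes_per_second(mp3_kbps))          # 8000.0
print(bytes_per_second(encodec_kbps))      # 750.0
print(round(mp3_kbps / encodec_kbps, 2))   # 10.67, the "approximate 10x" figure
print(clip_size_bytes(encodec_kbps, 60))   # 45000.0 bytes for one minute
```

At these rates, one minute of audio takes about 480 kB as 64 kbps MP3 and about 45 kB at 6 kbps.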

The AI-based approach can "compress and decompress audio in real-time to state-of-the-art size reductions," with potentially incredible results, as seen in the sample shared on Meta's AI blog. Classic codecs like MP3, Opus, or EVS decompose the signal into different frequency bands and encode them as efficiently as possible, leveraging psychoacoustics (the study of human sound perception). Encodec instead relies on a neural design comprising three parts: the encoder, the quantizer, and the decoder.

The encoder takes uncompressed data and turns it into a higher dimensional and lower frame rate representation. The quantizer compresses this stream to the target size while retaining the most vital information to rebuild the original signal. Finally, the decoder turns the compressed signal into a waveform that is "as similar as possible to the original."
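To make the three stages concrete, here is a deliberately tiny sketch of an encode, quantize, decode round trip. It is not Meta's model: in Encodec each stage is a trained neural network, whereas here the "encoder" just groups samples into frames and the codebook is a hand-written, hypothetical table. The frame size, codebook values, and function names are all assumptions for illustration.

```python
import math

FRAME = 4  # samples per frame (hypothetical; chosen small for readability)

# Hypothetical 4-entry codebook: each entry is one frame-shaped vector.
CODEBOOK = [
    [0.0, 0.0, 0.0, 0.0],
    [0.5, 0.5, 0.5, 0.5],
    [-0.5, -0.5, -0.5, -0.5],
    [0.5, -0.5, 0.5, -0.5],
]

def encode(samples):
    """Encoder stand-in: group samples into frames (fewer steps, more values per step)."""
    usable = len(samples) - len(samples) % FRAME
    return [samples[i:i + FRAME] for i in range(0, usable, FRAME)]

def quantize(frames):
    """Quantizer: replace each frame with the index of its nearest codebook entry."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return [min(range(len(CODEBOOK)), key=lambda k: dist(f, CODEBOOK[k]))
            for f in frames]

def decode(indices):
    """Decoder stand-in: look up each index and concatenate back into a waveform."""
    out = []
    for k in indices:
        out.extend(CODEBOOK[k])
    return out

signal = [0.4, 0.6, 0.5, 0.5, -0.6, -0.4, -0.5, -0.5]
codes = quantize(encode(signal))
bits_per_frame = math.ceil(math.log2(len(CODEBOOK)))

print(codes)            # [1, 2]
print(decode(codes))    # approximation of the input signal
print(bits_per_frame)   # 2
```

Only the indices need to be stored or transmitted, so the bitrate is set by how many bits each index takes and how many frames there are per second. The reconstruction is close to the input but not identical, which is what makes the scheme lossy.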

Encodec's machine learning model identifies audio changes that are imperceptible to humans, using discriminators to improve the perceived quality of the generated sounds. Meta described this process as a "cat-and-mouse game," with the discriminator differentiating between the original and reconstructed samples. The final result is superior audio compression in low-bitrate speech (1.5 kbps to 12 kbps).
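The "cat-and-mouse game" can be sketched as two opposing loss functions. The article does not say which adversarial loss Meta uses; the hinge formulation below is one common choice and is an assumption here, as are the function names and the made-up discriminator scores.

```python
def discriminator_loss(real_scores, fake_scores):
    """Hinge loss: push scores for original audio above +1, reconstructions below -1."""
    real_term = sum(max(0.0, 1.0 - s) for s in real_scores) / len(real_scores)
    fake_term = sum(max(0.0, 1.0 + s) for s in fake_scores) / len(fake_scores)
    return real_term + fake_term

def codec_adversarial_loss(fake_scores):
    """The codec is rewarded when the discriminator scores its reconstructions highly."""
    return -sum(fake_scores) / len(fake_scores)

real = [2.0, 1.5]            # scores for original samples (made up)
fake_early = [-2.0, -1.5]    # reconstructions that are easy to spot (made up)
fake_late = [1.0, 0.5]       # reconstructions that fool the discriminator (made up)

print(discriminator_loss(real, fake_early))   # 0.0
print(codec_adversarial_loss(fake_early))     # 1.75
print(codec_adversarial_loss(fake_late))      # -0.75
```

The discriminator's loss is lowest when it separates originals from reconstructions cleanly, while the codec's loss falls as its output becomes harder to tell apart from the original. Training alternates between the two.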

Encodec can encode and decode audio data in real-time on a single CPU core, Meta said, and there is still room for improvement toward even smaller file sizes. Beyond supporting next-gen Metaverse experiences on today's internet connections, the new model could potentially deliver higher-quality phone calls in areas where mobile coverage is anything but optimal.

Permalink to story.

https://www.techspot.com/news/96528-encodec-facebook-ai-assisted-audio-codec-rich-metaverse.html

Plutoisaplanet · Nov 2, 2022

One adjustment to make with the article... 6kB per second is actually 48kbps. To represent 6kb in bytes, you would use 768 bytes. I'd recommend using kbit/s to make this obvious, and remember the k for kilo- is actually lowercase. https://en.wikipedia.org/wiki/Data-rate_units#Decimal_multiples_of_bits

seeprime · Nov 2, 2022

I'd like to hear them in an A-B comparison. Talk is cheap. Also, I trust nothing that Meta claims. So, prove it.

richalone442 · Nov 2, 2022

Yeah, but what does it sound like with real music, not just test signals?
I have little hope that it will even be as good as MP3, which I stopped using years ago because of the low quality of its compression techniques.
High-bandwidth, uncompressed material sounds best. Don't expect anything but crappy audio from these silly ideas.

winjer · Nov 2, 2022

Does it include spyware and ads from Meta?

rmcrys · Nov 2, 2022

winjer said:
Does it include spyware and ads from Meta?

Exactly what I thought. Most probably the AI is programmed so that parts of the voice information go to their servers "just for the user's sake..."

Alfonso Maruccia · Nov 2, 2022

richalone442 said:
Yeah, but what does it sound like with real music, not just test signals?
I have little hope that it will even be as good as MP3, which I stopped using years ago because of the low quality of its compression techniques.
High-bandwidth, uncompressed material sounds best. Don't expect anything but crappy audio from these silly ideas.

Years ago I thought MP3 was "good enough" for my listening experience. Then I got a pair of good-quality speakers (Edifier R1280T), then I started listening to FLAC audio files, and finally I said to myself: "MP3 is NOT good enough". Lossless compression ftw

Arbie · Nov 2, 2022

seeprime said:
Talk is cheap.

Looks like it may be getting even cheaper.

axiomatic13 · Nov 3, 2022

If it's from Meta, I don't want it.

p51d007 · Nov 3, 2022

Nothing sounds as good as a good turntable and a tube (valve) amplifier connected to some old school speakers...ALL analog.
Digital is nice and convenient but you can oversample only so much...it's still a digital signal.

wiyosaya · Nov 3, 2022

rmcrys said:
Exactly what I thought. Most probably the AI is programmed so that parts of the voice information go to their servers "just for the user's sake..."

I agree. I'm sure it's replete with spyware that Meta can sell to anyone to ensure that you are not pirating copyrighted material.

Why we should trust Meta with something like this is beyond my comprehension.

wiyosaya · Nov 3, 2022

p51d007 said:
Nothing sounds as good as a good turntable and a tube (valve) amplifier connected to some old school speakers...ALL analog.
Digital is nice and convenient but you can oversample only so much...it's still a digital signal.

Speaking of which, TS was thinking of you https://www.techspot.com/news/96529...ects-iconic-sound-burger-portable-record.html

wiyosaya · Nov 3, 2022

Alfonso Maruccia said:
Years ago I thought MP3 was "good enough" for my listening experience. Then I got a pair of good-quality speakers (Edifier R1280T), then I started listening to FLAC audio files, and finally I said to myself: "MP3 is NOT good enough". Lossless compression ftw

Granted, FLAC is great and open source, but DSD is also great.

rmcrys · Nov 4, 2022

Alfonso Maruccia said:
Years ago I thought MP3 was "good enough" for my listening experience. Then I got a pair of good-quality speakers (Edifier R1280T), then I started listening to FLAC audio files, and finally I said to myself: "MP3 is NOT good enough". Lossless compression ftw

Lossless is always the best, but it is only practical for local music files, because streaming it worldwide would consume too much bandwidth, not to mention if movies got the same treatment...

But I think that, as music files are rather small, most of the effort in developing good codecs went into video.

BobHome · Nov 4, 2022

rmcrys said:
Exactly what I thought. Most probably the AI is programmed so that parts of the voice information go to their servers "just for the user's sake..."

Hmmm... remember subliminal messaging?
