In a nutshell: Meta is introducing another AI model for researchers and companies interested in language translation and related applications. The company, formerly known as Facebook, says it has developed the first "all-in-one" model that can handle a wide array of tasks across roughly 100 languages.

Meta's new AI model could one day underpin robust and efficient "universal translation" services. Called SeamlessM4T, it is the first all-in-one multilingual, "multimodal" AI translation and transcription model, according to Meta. It supports up to 100 languages, depending on the task.

SeamlessM4T is a foundational multilingual and multitask model that can translate and transcribe both speech and text. Because it is multimodal, it accepts either speech or text as input and can produce output in either form, covering speech-to-speech, speech-to-text, text-to-speech, and text-to-text translation.

Notably, the AI can perform automatic speech recognition without requiring reference samples.

Language support varies by task: 35 languages plus English for speech-to-speech and text-to-speech output, and nearly 100 languages for the other modes. Meta has released SeamlessM4T under a Creative Commons Attribution-NonCommercial license (CC BY-NC 4.0), signaling that it wants researchers and developers to build on the work to create new applications and solutions.

Seemingly inspired by the fictional Babel Fish from Douglas Adams' "The Hitchhiker's Guide to the Galaxy," SeamlessM4T aims to turn the "universal translator" from fiction into reality. Meta acknowledges that this is difficult, however, because existing speech-to-speech and speech-to-text systems cover only a fraction of the world's languages.

Meta believes SeamlessM4T could be a significant step toward a universal translator. Handling multiple modalities in a single model has the potential to reduce errors and delays, improving both efficiency and translation quality for speakers of different languages. If it delivers, the technology could make communication more accessible and foster better understanding among people from diverse linguistic backgrounds.

SeamlessM4T builds on years of progress in translation technology by Meta and other organizations. Meta points to its own No Language Left Behind (NLLB) text-to-text machine translation model, which supports 200 languages and has been integrated into Wikipedia's translation tools. The company also highlights its recent Universal Speech Translator demo, the first direct speech-to-speech translation system for Hokkien.

Meta's recently unveiled Massively Multilingual Speech (MMS) model adds speech recognition and speech synthesis for more than 1,100 languages, plus language identification for thousands more. For more details about SeamlessM4T, its translation performance, and the responsible approach Meta took to sensitive issues such as toxicity and gender bias, check out Meta AI's official blog. The code for SeamlessM4T is also available on GitHub.