New Microsoft AI can accurately mimic a human voice after analyzing a 3-second sample

midian182 · Jan 10, 2023

What just happened? Concerned that artificial intelligence seems to be advancing at a rapid pace, potentially threatening more human jobs? Then here's some news that could add to those concerns. A team of Microsoft researchers has announced a new AI that can accurately mimic a human voice from a mere three-second-long audio sample.

Microsoft's voice AI tool, called Vall-E, is trained on "discrete codes derived from an off-the-shelf neural audio codec model" as well as 60,000 hours of speech from more than 7,000 speakers, 100 times more than existing systems use. Most of that audio comes from LibriVox public domain audiobooks.

Ars Technica reports that Vall-E builds on a technology called EnCodec that Meta announced in October 2022. It works by analyzing a person's voice, breaking down the information into components, and using its training to synthesize how the voice would sound if it were speaking different phrases. Even after hearing just a three-second sample, Vall-E can replicate a speaker's timbre and emotional tone.

Microsoft have announced their AI "VALL-E"

Using a 3-second sample of human speech, it can generate super-high-quality text-to-text speech from the same voice. Even emotional range and acoustic environment of the sample data can be reproduced. Here are some examples. pic.twitter.com/ExoS2VWO6d
— Tuvok @ NaughtyDog (@TheCartelDel) January 7, 2023

"Experiment results show that Vall-E significantly outperforms the state-of-the-art zero-shot TTS system [AI that recreates voices it's never heard] in terms of speech naturalness and speaker similarity," states the research paper, available on arXiv, the preprint server hosted by Cornell University. "In addition, we find VALL-E could preserve the speaker's emotion and acoustic environment of the acoustic prompt in synthesis."

You can hear examples of Vall-E recreating voices on GitHub. Many are genuinely amazing, sounding almost identical to the speaker despite being based on such a short audio sample. A few are slightly more robotic and sound a bit closer to traditional text-to-voice software, but they're still impressive, and we can expect the AI to improve over time.

Microsoft's researchers believe Vall-E could find use as a text-to-voice tool, a way of editing speech, and an audio creation system by combining it with other generative AIs such as GPT-3.

As with all AIs, there are concerns about the potential misuse of Vall-E. Impersonating public figures such as politicians is one example, especially when the tool is used alongside deepfakes. It could also trick people into believing they are talking to family, friends, or officials and into handing over sensitive data. There's also the fact that some security systems use voice identification. As for its impact on jobs, Vall-E would likely be a cheaper alternative to hiring voice actors.

Addressing the risks of Vall-E being misused, the researchers said these could be mitigated. "It is possible to build a detection model to discriminate whether an audio clip was synthesized by Vall-E. We will also put Microsoft AI Principles into practice when further developing the models."

Masthead: Ociacia

Permalink to story.

https://www.techspot.com/news/97217-new-microsoft-ai-can-accurately-mimic-human-voice.html

Plutoisaplanet · Jan 10, 2023

I can't wait to see this technology used by Bad Lip Reading so videos like these can become even more realistic:

https://twitter.com/i/web/status/1611934183110365184

Arbie · Jan 10, 2023

This is great!

[Full disclosure: all my comments are made by an AI]

loki1944 · Jan 10, 2023

Hey that's about how long it takes for a Microsoft update to break your pc. What a coincidence.

Eldritch · Jan 10, 2023

It will be awesome to integrate with GPT. Imagine generating stories in an instant and then listening to them read by your favorite characters/celebs.

Also RIP voice actors.

Feng Lengshun · Jan 10, 2023

We already have MoeTTS (online instance) and from my testing, while it is still apparent as being synthetic, it's also good enough that if you're doing it for funsies it works well enough for the characters that it has models for, and you can make more models for it -- though to be fair, visual novels do tend to have a LOT of voices to sample from.

Also, Neuro-sama, while also synthetic-feeling, has gotten to a state where it's cute enough that people don't mind listening to her streaming.

The real breakthrough will be in making it easy enough for laypeople to use it on their 2020s laptops. I really do hope AMD catches up in that regard because a decent amount of the AI stuff I found seems to assume an Nvidia GPU and that sucks. Also, kinda hoping for Linux peeps to eventually make a simple GUI to set these things up, but that's a pipe dream and I'd settle for the AUR repos being on par with the Windows installers.

meric · Jan 11, 2023

In my opinion, AI is advancing unchecked, with no control, no regulation and no care. I wonder how many countries and how many universities impose regulatory principles during development, let alone a common global base. We have to establish core principles during the development phase, before it gets out of hand. With the current trend, it's hard to anticipate exactly when AI will gain self-awareness. Could be tomorrow, could be in 10 years. And I see no regulation concerning who could use it and for what purpose.

Avro Arrow · Jan 11, 2023

MasterMace · Jan 11, 2023

You'll just need to
One
Setup your Windows Sound settings
and ... oh ... sorry you unplugged an audio device
Two
Windows just reset the Sound settings
Goto One

Hodor · Jan 14, 2023

Avro Arrow said:
youtube

I had exactly the same thought, except from the sequel:

Avro Arrow · Jan 16, 2023

Hodor said:
I had exactly the same thought, except from the sequel:

Yeah, either way, it's damn creepy.
