Microsoft VASA tech can create realistic deepfakes using a single photo and one audio track

Cal Jeffrey

Through the looking glass: Microsoft Research Asia has published a paper on a generative AI application it is developing. The program is called VASA-1, and it can create highly realistic videos from just a single image of a face and a vocal soundtrack. Even more impressive, the software can generate the video in real time.

The Visual Affective Skills Animator, or VASA, is a machine-learning framework that analyzes a facial photo and then animates it to a voice, syncing the lips and mouth movements to the audio. It also simulates facial expressions, head movements, and even unseen body movements.
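Microsoft has not released code or an API for VASA-1, but the paper's description boils down to a three-stage flow: encode the single photo into identity and appearance features, generate per-frame facial and head motion from the audio, then decode one video frame per motion step. The following is a minimal, purely illustrative Python sketch of that data flow; the function names, frame rate, and placeholder math are assumptions standing in for the learned networks, not Microsoft's implementation.

# Hypothetical sketch of a VASA-style talking-face pipeline (not Microsoft's code).
# The real system uses learned networks; these stubs only illustrate the data flow:
# one face image + one audio track -> per-frame motion latents -> rendered frames.
import numpy as np

FPS = 25  # assumed output frame rate

def encode_face(image: np.ndarray) -> np.ndarray:
    """Stand-in for the appearance/identity encoder: returns a fixed latent vector."""
    return image.mean(axis=(0, 1))  # placeholder "identity" features

def generate_motion_latents(audio: np.ndarray, seconds: float) -> np.ndarray:
    """Stand-in for the audio-conditioned motion generator (a diffusion model in the paper).
    Here each output frame's 'motion' is derived from the audio energy in its time slice."""
    n_frames = int(seconds * FPS)
    chunks = np.array_split(audio, n_frames)
    energy = np.array([np.sqrt(np.mean(c ** 2)) for c in chunks])
    # Pretend each frame's latent is (lip openness, head nod, expression intensity).
    return np.stack([energy, np.roll(energy, 1), energy * 0.5], axis=1)

def render_frames(identity: np.ndarray, motion: np.ndarray) -> list:
    """Stand-in for the decoder that renders one frame per motion latent."""
    frames = []
    for m in motion:
        frame = np.full((64, 64, 3), identity, dtype=np.float32)
        mouth_open = int(10 * min(float(m[0]) * 5, 1.0))
        frame[40:40 + mouth_open, 28:36] = 0.0  # crude "mouth opening"
        frames.append(frame)
    return frames

if __name__ == "__main__":
    face = np.random.rand(512, 512, 3).astype(np.float32)            # the single input photo
    seconds, sr = 2.0, 16_000
    audio = np.random.randn(int(sr * seconds)).astype(np.float32)    # the input speech track
    video = render_frames(encode_face(face), generate_motion_latents(audio, seconds))
    print(f"Rendered {len(video)} frames from one image and one audio clip")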

Like all generative AI, it isn't perfect. Machines still struggle with fine details like fingers or, in VASA's case, teeth. Pay close attention to an avatar's teeth and you can see them change size and shape, giving them an accordion-like quality. The effect is relatively subtle and seems to fluctuate with the amount of movement in the animation.

There are also a few mannerisms that don't look quite right. They are hard to put into words; it's more that your brain registers something slightly off about the speaker. However, it is only noticeable under close examination. To casual observers, the faces can pass as recordings of real humans speaking.

The faces used in the researchers' demos are also AI-generated, using StyleGAN2 or DALL-E 3. However, the system works with any image of a face, real or generated, and it can even animate painted or drawn faces. The Mona Lisa singing Anne Hathaway's performance of "Paparazzi" from Conan O'Brien is hilarious.

Joking aside, there are legitimate concerns that bad actors could use the tech to spread propaganda or attempt to scam people by impersonating their family members. Considering that many social media users post pictures of family members on their accounts, it would be simple for someone to scrape an image and mimic that family member. They could even combine it with voice cloning tech to make it more convincing.

Microsoft's research team acknowledges the potential for abuse but offers no answer for combating it beyond careful video analysis. It points to the previously mentioned artifacts as telltale signs while ignoring that its own ongoing research will keep improving the system and erasing them. The team's only tangible effort to prevent abuse is declining to release the tool publicly.
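For a sense of what "careful video analysis" could mean at its most basic, the toy check below, which is not Microsoft's method and not a real deepfake detector, simply measures how much a fixed mouth-area crop fluctuates from frame to frame, the kind of accordion-like jitter described above. The crop coordinates, the threshold, and the synthetic test clips are all assumptions for demonstration.

# Illustrative frame-consistency check: flag a clip if a fixed mouth-region crop's
# brightness jitters unusually hard across frames. Purely a toy example.
import numpy as np

def region_brightness_series(frames, box=(200, 260, 230, 300)):
    """Mean brightness of a fixed crop (top, bottom, left, right) in each frame."""
    t, b, l, r = box
    return np.array([f[t:b, l:r].mean() for f in frames])

def flags_inconsistency(frames, threshold=0.15):
    """True if the crop's relative standard deviation across frames exceeds the threshold."""
    series = region_brightness_series(frames)
    return series.std() / (series.mean() + 1e-8) > threshold

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic stand-ins: a "stable" clip and one whose mouth region flickers per frame.
    stable = [np.full((512, 512, 3), 0.5) + rng.normal(0, 0.01, (512, 512, 3)) for _ in range(50)]
    flicker = [f.copy() for f in stable]
    for f in flicker:
        f[200:260, 230:300] *= rng.uniform(0.5, 1.5)  # accordion-like size/shape fluctuation
    print("stable clip flagged: ", flags_inconsistency(stable))   # expected: False
    print("flicker clip flagged:", flags_inconsistency(flicker))  # expected: True

Real detection research looks at far richer signals, but the point stands: the artifacts are statistical regularities a reviewer can hunt for, at least until the models improve.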

"We have no plans to release an online demo, API, product, additional implementation details, or any related offerings until we are certain that the technology will be used responsibly and in accordance with proper regulations," the researchers said.

The technology does have some intriguing and legitimate practical applications, though. One would be using VASA to create realistic video avatars that render locally in real time, eliminating the need for a bandwidth-hungry video feed. Apple is already doing something similar with the spatial Personas available on the Vision Pro.
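To put the bandwidth argument in rough numbers (these bitrates are generic ballpark assumptions, not measurements of VASA or any particular video-call product): a conventional video stream moves orders of magnitude more data than a voice stream plus a one-time reference photo.

# Back-of-the-envelope comparison: stream video vs. stream only audio and render
# an avatar locally. All bitrates below are assumed ballpark figures.
VIDEO_KBPS = 1_500        # rough figure for a 720p video-call stream
AUDIO_KBPS = 32           # compressed speech, e.g. a low-bitrate Opus stream
STILL_IMAGE_KB = 200      # one-time upload of the reference photo

def transferred_mb(kbps, minutes):
    """Data moved by a constant-bitrate stream over the call, in megabytes."""
    return kbps * 60 * minutes / 8 / 1000

minutes = 30
video_mb = transferred_mb(VIDEO_KBPS, minutes)
avatar_mb = transferred_mb(AUDIO_KBPS, minutes) + STILL_IMAGE_KB / 1000

print(f"{minutes}-minute call, video stream:   ~{video_mb:.0f} MB")   # ~338 MB
print(f"{minutes}-minute call, audio + avatar: ~{avatar_mb:.1f} MB")  # ~7.4 MB

Under those assumptions, a half-hour call drops from hundreds of megabytes to under ten, which is the appeal of rendering the talking head on the receiving device.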

Check out the technical details in the paper published on the arXiv repository. There are also more demos on Microsoft's website.

 
Best I have seen so far.
There is one thing I want to know, though. AI needs to feed; it will never stop feeding on new info. So this tech could replace and make obsolete a huge number of people who previously created stuff like this. But since the best ones, the most talented, would still be able to make money, they will start guarding their content from these devourers. Anyone with a brain will then realize that AI using their content pretty much fires them from their job. People will viciously guard their content and sue any AI firm that uses even as much as one image or a second of music. Then AI won't have as much high-quality data to learn from.
Would this be the point where AI no longer has enough data to improve as fast? Would that make AI much more expensive to develop, since firms will have to pay people a lot more than they do now, when they pretty much suck up everything from everywhere?
Of course, I cannot know how it will turn out, but I find this topic very interesting. Furthermore, I think people who make a living selling the kind of work AI could do should start to unite and push for laws and apps that can check with certainty whether even a single image was used by any AI system, with or without the creator's permission.
 
Sigh... hey, I have a great idea: let's keep improving tech that will make people kill themselves, kill other people, and ruin people's lives.

Let's make software that people will use to:
Make 100x better deepfake porn of celebrities
Make 100x better deepfakes of your co-worker to get them fired, or to get you fired
Make 100x better deepfakes of religious leaders encouraging violence
Make 100x better deepfakes of political figures making speeches or saying outright BS
Make 100x better deepfake porn of your children by taking four or five quick photos at the park, and sell it online

People are garbage, and this tech will kill people. Lots of them.
 
Some of these are fairly easy to block. The porn ones, for example. Most companies will want people to use their AI as an ONLINE service, giving them control over everything users can and cannot do. These models still take a lot of hardware power, which already excludes a lot of criminals because of how expensive running this stuff at home would be.
 
Overseas doesn't care about restrictions.
 
Just like with all the other fakery tech that has existed for millennia? Fake videos are just the next thing that some will perceive as the sky falling yet again, like people did in the past with previous tech. It's a wonder that civilization has been able to recreate itself from the ashes of its fiery death so many times.
 
Technologically impressive, socially and morally reprehensible IMO.
Indeed, very impressive... but also dangerous. This is the kind of thing that can start a revolution or a war, or just let people steal money from someone. It could be a nice way to ruin someone's reputation too. The future will be interesting, for sure. I only want this tech for movies. Imagine dead actors coming back. Some people hate it; I think it's kind of cool, and it makes them, in a way, immortal. Animation is like that already: we will have Scooby-Doo forever, just like Vincent Price, because he was part of it. Why not have a realistic Tom Cruise (not animated) in movies 100 years from now? Why not have Clint Eastwood in more westerns after he's gone? Sure, this is the type of thing the family has to agree on; it should not be up to the movie studios.
 
What about tools for creating biological nightmares, say a coronavirus with greater mortality and the infectiousness of a norovirus, built with just things bought online?

Even in the '70s, nothing stopped people from buying diesel, fertiliser (probably high in chlorates for extra oxygen), or superfine metal powders.

I take your point, but lots of young teens have already self-harmed over sexual photo/video extortion on social media.

Unfortunately, we have to find ways to combat it. You will have people crying about their freedoms and their right of expression, yet we know for a fact that they would insist on searching people's bags if they knew 1% of those coming into contact with their families carried a semi-auto pistol holding 30-plus rounds with intent to murder their family.

Life is complicated.

The main way to combat it is the Gulag Archipelago defense: so what, why do I care, F off, I'm reporting you.

Aleksandr Solzhenitsyn wrote that the only way people have power over you is if you care about keeping your fingernails intact, your family alive, etc.

Kids knew this in the playground: kids grab your jacket to have you chase after it.
Two strategies:
1. Ignore them and walk away.
2. Just start randomly beating the crap out of any kid you can grab, ignoring the jacket, which magically gets dropped.
 
https://www.theregister.com/2024/04/25/ai_voice_arrest/?td=rt-3a

Here you go, easy peasy: faked audio used to get a guy fired.
 