Researchers show off generated video of Obama that uses lip-syncing AI tech

midian182

Turning an audio clip into a realistic video of a person speaking those words usually doesn’t turn out well, but researchers at the University of Washington have developed a system that produces some amazing results.

The team demonstrated their work by generating a surprisingly lifelike video of Barack Obama. They trained a neural network by feeding it 14 hours of the former President’s weekly address videos. It was then able to create mouth shapes that synced with the audio clips of Obama talking about different topics. Finally, these were superimposed and blended over a clip of his face from another video (not the same one as the audio source).
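The final "superimposed and blended" step can be illustrated with a generic feathered alpha blend. This is only a minimal sketch of the compositing idea, not the researchers' actual method: the function names (`feather_mask`, `blend_patch`), the grayscale list-of-lists frame format, and the `border` parameter are all assumptions made for illustration.

```python
def feather_mask(height, width, border):
    """Return a height x width mask: 1.0 in the interior, ramping down toward the edges."""
    mask = []
    for y in range(height):
        row = []
        for x in range(width):
            # Distance in pixels from this position to the nearest patch edge.
            d = min(x, y, width - 1 - x, height - 1 - y)
            row.append(min(1.0, (d + 1) / (border + 1)))
        mask.append(row)
    return mask


def blend_patch(frame, patch, top, left, border=2):
    """Alpha-blend a grayscale patch into a copy of frame at (top, left).

    Hypothetical helper: 'frame' and 'patch' are lists of rows of floats.
    """
    out = [row[:] for row in frame]  # leave the input frame untouched
    mask = feather_mask(len(patch), len(patch[0]), border)
    for y, patch_row in enumerate(patch):
        for x, value in enumerate(patch_row):
            a = mask[y][x]
            # Weighted mix: patch dominates in the middle, frame at the edges.
            out[top + y][left + x] = a * value + (1.0 - a) * frame[top + y][left + x]
    return out


# Usage: paste a bright 6x6 "mouth" patch into a dark 8x8 "face" frame.
frame = [[0.0] * 8 for _ in range(8)]
patch = [[1.0] * 6 for _ in range(6)]
blended = blend_patch(frame, patch, top=1, left=1, border=2)
```

Feathering the edges of the patch avoids a visible seam where the generated region meets the original footage, which is the kind of artifact viewers tend to spot quickly.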

The process is almost entirely automated, and while the end result isn’t quite 100 percent perfect, you can see in the video below that it’s not far off.

“These type of results have never been shown before,” said Dr Ira Kemelmacher-Shlizerman, an assistant professor at the UW’s Paul G. Allen School of Computer Science & Engineering. “Realistic audio-to-video conversion has practical applications like improving video conferencing for meetings, as well as futuristic ones such as being able to hold a conversation with a historical figure in virtual reality by creating visuals just from audio.”

Previous attempts to create realistic video from audio have resulted in the “uncanny valley” issue, a term that describes manufactured human likenesses that can’t quite pass as a real person, making them appear a bit creepy.

“People are particularly sensitive to any areas of your mouth that don’t look realistic,” said lead author Dr Supasorn Suwajanakorn. “If you don’t render teeth right or the chin moves at the wrong time, people can spot it right away, and it’s going to look fake. So you have to render the mouth region perfectly.”

While the process should make it easier to animate characters that appear in movies, TV shows, and game cutscenes, it could also be used for nefarious purposes, such as creating clips of political figures giving fake statements. Right now, though, such videos are not easy to produce, as the system requires many hours of source material, and the same system could be used to identify fake videos.

 
Didn't they demo this last year with Trump?

Gotta plug Presbo, I suppose.
I was just about to say that it's odd they didn't try this with Trump. There's sure a lot of mentally ill people trying to get words in his mouth.
 
I wasn't going to, but since there was already a reference to Trump, if it wasn't for the teleprompter, Obama would never utter more than ummmm...ahhhh...ummmmmm....ahhhhhh
 
Sadly, this will pretty much end the allowance of video testimony in courts. All the defense has to do is bring up the existence of this technology through objections of falsified or tampered evidence. Of course, for anyone falsely incriminated by the military, govt. agencies, etc., it makes a great defense. The military and CIA were doing this many decades ago (Vietnam era) to manipulate people, and it was highly effective. Seeing a lip-synced video would have to be a slam dunk for most jury pools that are unaware of the new technology.
 
Very impressive. The lower jaw and mouth region are still a little in uncanny valley for me.
 
The rules of evidence are created to prevent the introduction of such evidence or records without proving relevance and laying a proper foundation for their introduction. Even then, they can still be considered for weight (in other words, even if true, it might not mean much).

I'm far more worried about this being used for propaganda purposes during tense domestic or international events. The local media's reporting that Chavez was killed, with selective video clips, in the 2002 coup created greater riots. (He wasn't killed, but the media was run by conservatives who wished for his downfall.) The same goes for numerous other historical events that could have turned with further agitation (the Cuban Missile Crisis being a classic).

In the past we were worried about false audio editing, which is pretty simple. False video evidence is much more disturbing.
 