Google’s DeepMind artificial intelligence program may be best known for building AlphaGo, which beat one of the world’s best Go players, but the technology has numerous applications in the field of science and could prove especially helpful to the hearing impaired.
Researchers from Oxford University and DeepMind teamed up to create an AI system trained using 5,000 hours of BBC videos, which contained 118,000 sentences. It managed to outperform a professional lip-reader who provides services for UK courts.
When shown a random sample of 200 videos from BBC broadcasts, the human lip-reader was able to decipher less than a quarter of the spoken words. But when the AI system was tested using the same data set, it deciphered almost half the words and could make out entire complex phrases.
Additionally, the machine was able to annotate 46 percent of the words without error, whereas the professional only managed around 12 percent. Most of the AI’s mistakes were minor, like missing the ‘s’ from the end of words.
Two weeks ago, another deep learning system that can read lips was developed at the University of Oxford. LipNet was also able to beat a human when it came to accurately reading lips, though the data set used in this instance, called GRID, contained only 51 unique words, whereas the BBC data contains nearly 17,500, according to New Scientist.
GRID also used well-lit videos of people facing the camera and reading three seconds’ worth of words. After being shown 29,000 videos, the AI had an error rate of just 6.6 percent, while humans who were tested using 300 similar videos had an average error rate of 47.7 percent.
Researchers say the system could find use in mobile technologies, virtual assistants, and for general speech recognition tasks. It could also be invaluable in helping deaf and hearing-impaired people understand others.
"A machine that can lip read opens up a host of applications: 'dictating' instructions or messages to a phone in a noisy environment; transcribing and redubbing archival silent films; resolving multi-talker simultaneous speech; and, improving the performance of automated speech recognition in general," wrote the researchers in their paper.