A hot potato: Algorithms have vastly improved voice recognition, but it seems the technology still needs a lot of real human help to train itself, even if that comes at the cost of privacy. As with Amazon's Alexa, a recent report sheds light on how Google trains its virtual assistant by paying contractors to transcribe audio clips, including clips recorded when the assistant wasn't even invoked.
Although virtual assistants remain widely popular, many people are concerned about the privacy risks of owning devices that listen to conversations everywhere from a corporate workspace to a family living room. Following a report on how Amazon's Alexa works, it looks like Google is using pretty much the same process to improve its algorithms.
VRT NWS, a Belgian public broadcaster, has published a report revealing that it was able to listen to more than a thousand excerpts recorded via Google Assistant. It then tracked down people in those recordings, using addresses and other sensitive information present in the audio clips, and confronted them with what it had heard.
People in Belgium and the Netherlands were taken by surprise when they heard themselves or a relative in the recordings. "This is undeniably my own voice," said one man, while a couple immediately recognized the voices of their son and their grandchild.
"Why is Google storing these recordings and why does it have employees listening to them? They are not interested in what you are saying, but the way you are saying it. Google’s computer system consists of smart, self-learning algorithms. And in order to understand the subtle differences and characteristics of the Dutch language, it still needs to learn a lot," notes VRT NWS, adding that Google uses its online tool Crowdsource when its system has difficulty analyzing a particular speech command.
While anyone can use the tool for free to help Google get better at describing images and facial expressions, audio recordings are accessible only to Google subcontractors, who log in to a secure part of the tool where a list of audio excerpts awaits their analysis. VRT NWS says it has three sources who confirm this is how Google works. "Most recordings made via Google Home smart speakers are very clear. Recordings made with Google Assistant, the smartphone app, are of telephone quality. But the sound is not distorted in any way."
Although user information is stripped from the audio excerpts to anonymize them, the recordings themselves make it easy to work out who is speaking: subcontractors can trace a person by looking up an address or company name mentioned in a clip on Google or Facebook. The broadcaster, which also released a video report, says that of the more than a thousand excerpts it listened to, 153 had been recorded by mistake, meaning the "Okay Google" command was never issued. These included "bedroom conversations, conversations between parents and their children, but also blazing rows and professional phone calls containing lots of private information."
An employee who transcribes such audio clips also tells of a woman who he could tell was in "definite stress," but given the lack of guidelines for such cases, there wasn't much he could do. The rules do, however, state that account numbers and passwords must be marked as sensitive information. The recordings also reveal that men search for porn a lot, even via smart speakers, though that in itself is hardly a revelation.
Google responded to the report by acknowledging that this is how language experts help to improve its speech technology. "This happens by making transcripts of a small number of audio files," said the company's spokesperson for Belgium, adding that "this work is of crucial importance to develop technologies sustaining products such as the Google Assistant."
Google also claims that its language experts review only about 0.2 percent of all audio fragments and that those fragments aren't linked to any personal or identifiable information. Given that VRT NWS was able to confront people with their own recordings, the reality would appear to be quite different.