Something to look forward to: Researchers at Penn State University have demonstrated a new method of remote surveillance that reconstructs phone conversations from the subtle vibrations generated by a cellphone's earpiece. This technique, known as wireless tapping, uses millimeter-wave radar sensors to detect and interpret these minute vibrations from distances of up to ten feet.

The team positioned a millimeter-wave radar device, which uses technology similar to that found in self-driving cars and advanced motion detectors, a few feet away from a smartphone. As speech played through the phone's earpiece, the radar detected the surface vibrations caused by the audio signal.

These vibrations are not perceptible to humans or nearby microphones but permeate the entire structure of the device. The radar's measurements, after careful preprocessing to reduce hardware and environmental noise, are analyzed using machine learning techniques.
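The article does not describe the preprocessing steps in detail, so the following is only a minimal sketch of one common approach in radar vibration sensing, not the researchers' pipeline. It assumes the radar reports a phase value per sample, converts phase to surface displacement with the standard relation d = λφ / (4π), and subtracts a moving average to suppress slow drift. The function names, the 77 GHz carrier, and the synthetic signal are all hypothetical choices made for illustration.

```python
import math

C = 299_792_458.0  # speed of light in m/s

def phase_to_displacement(phases_rad, carrier_hz):
    """Convert radar phase samples (radians) to displacement (meters).

    Uses d = wavelength * phase / (4 * pi); the factor of 4*pi reflects the
    round trip of the signal to the surface and back.
    """
    wavelength = C / carrier_hz
    return [wavelength * p / (4 * math.pi) for p in phases_rad]

def remove_slow_drift(samples, window):
    """Subtract a centered moving average to suppress slow drift.

    This is a crude high-pass filter, assumed here for illustration only.
    """
    n = len(samples)
    half = window // 2
    out = []
    for i in range(n):
        lo = max(0, i - half)
        hi = min(n, i + half + 1)
        mean = sum(samples[lo:hi]) / (hi - lo)
        out.append(samples[i] - mean)
    return out

# Synthetic input (hypothetical): slow phase drift plus a tiny 200 Hz
# vibration, sampled at 8 kHz for 0.1 seconds.
fs = 8000
phases = [
    0.5 * (i / fs) + 0.01 * math.sin(2 * math.pi * 200 * i / fs)
    for i in range(800)
]
displacement = phase_to_displacement(phases, carrier_hz=77e9)
vibration = remove_slow_drift(displacement, window=41)
print(f"peak residual displacement: {max(abs(v) for v in vibration):.2e} m")
```

A real system would likely need more than this, such as isolating the range bin that contains the phone and filtering hardware noise, but the sketch shows why sub-millimeter surface motion is measurable at all: at millimeter wavelengths, tiny displacements produce detectable phase changes.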

Standard speech recognition systems are designed for clean, high-quality audio and perform poorly when applied directly to noisy radar data. To address this, the researchers adapted Whisper, an open-source, large-scale speech recognition model, using a method called low-rank adaptation (LoRA).

By retraining just one percent of the model's parameters, they specialized it for noisy radar signals, converting vibration measurements into text with an accuracy of up to 60 percent for a vocabulary as large as 10,000 words. Although this accuracy is still limited, even partial transcriptions or keyword recovery can be useful in real-world eavesdropping scenarios.
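To see why low-rank adaptation trains so few parameters, consider a rough sketch of the arithmetic. In this technique, a large weight matrix W stays frozen and only two small matrices B and A are trained, so the effective weight becomes W + BA. The layer size and rank below are hypothetical values chosen for illustration; they are not taken from Whisper or from the researchers' configuration.

```python
def lora_trainable_fraction(d_out, d_in, rank):
    """Fraction of parameters trained when a frozen d_out x d_in matrix W
    is adapted as W + B @ A, with B of shape (d_out, rank) and A of shape
    (rank, d_in). Only B and A are updated during training."""
    frozen = d_out * d_in
    trainable = rank * (d_out + d_in)
    return trainable / (frozen + trainable)

# Hypothetical layer size and rank, for illustration only.
fraction = lora_trainable_fraction(d_out=1024, d_in=1024, rank=8)
print(f"{fraction:.2%} of this layer's parameters are trainable")
# prints "1.54% of this layer's parameters are trainable"
```

Because the adapter size grows with the rank rather than with the full matrix size, the trainable share stays in the low single digits even for large models, which is consistent with the roughly one percent figure reported here.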

This achievement builds on earlier work by the same group, which in 2022 showed that radar sensors could identify as many as 10 predetermined words, letters, or numbers with about 83 percent accuracy when the sensor was within a foot of the phone. Extending that work, the new method successfully extracts longer phrases and partial conversations from farther away.

Researchers note that, as with lip reading, contextual clues could further improve the interpretation of partially accurate transcripts, meaning that even mistaken or incomplete outputs might suffice to discern the gist of sensitive discussions.

The authors emphasize that their laboratory setup was strictly for research and awareness, anticipating what malicious actors might one day attempt with miniaturized or concealed radar devices. They advise users to recognize this emerging privacy risk, particularly when discussing sensitive topics in settings where such surveillance might be possible.

The research, supported by the National Science Foundation, is published in the Proceedings of the 18th ACM Conference on Security and Privacy in Wireless and Mobile Networks.