How To Fool an Eavesdropping AI … With Another AI

Photo credit: PM Images - Getty Images
  • A Columbia University team has created an artificial intelligence (AI) system that effectively confuses eavesdropping tools that also use AI.

  • The new system used machine learning in a lab setting, but researchers hope it can be employed in real situations.

  • The system works far better than simple white noise to disguise conversations.


Can one artificial intelligence tool outsmart another?

Scientists at Columbia University in New York City think they’ve devised an AI that can effectively prevent an eavesdropping automatic speech recognition system from transcribing your private conversation. So in the future, you may not have to worry that someone is using spyware to record your phone calls, or that your Alexa is listening in when it shouldn’t be.


Their Neural Voice Camouflage system prevents eavesdroppers from secretly transcribing your audio conversation by overlaying a custom, static-like noise on your speech. The noise is set to the same volume as normal background noise—no louder than a regular air conditioning unit—so people you’re talking to can still easily make out what you’re saying. However, the automatic speech recognition (ASR) system that’s attempting to eavesdrop will get confused and produce a gobbledygook transcription.

This process of producing a custom background noise is more complicated than it seems. It’s not like turning on a faucet to produce white noise, as a character in a TV show might do to prevent the hidden microphones planted by the bad guys from catching what they say.

Instead, researchers turned to machine learning to train their system to find patterns in the audio of people’s speech, and then make predictions about what they would say next. Based on the predicted words, the system generates the noises most effective at blocking comprehension by an enemy AI, project lead Mia Chiquier says in a Science news article. Researchers trained the Neural Voice Camouflage on hours upon hours of speech recorded during the project, which the system processed continuously, two milliseconds at a time. The team presented their research in a peer-reviewed conference paper at the International Conference on Learning Representations in late April 2022.
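The predict-then-mask loop described above can be sketched roughly as follows. This is a minimal illustration, not the team’s method: the predictor and noise generator here are naive placeholders standing in for their trained neural networks, and the amplitude cap is an assumption.

```python
import random

CHUNK_MS = 2                 # the article: speech is processed 2 ms at a time
SAMPLES_PER_CHUNK = 32       # 2 ms of audio at a 16 kHz sampling rate

def predict_next_chunk(history):
    """Hypothetical stand-in for the trained predictive model. The real
    system is a neural network trained on hours of recorded speech; here
    we just repeat the most recent chunk as a naive 'prediction'."""
    if not history:
        return [0.0] * SAMPLES_PER_CHUNK
    return history[-1]

def camouflage_noise(predicted_chunk, level=0.05):
    """Illustrative noise generator: static-like samples kept at a quiet,
    background-level volume (the `level` value is our assumption, not a
    figure from the paper)."""
    return [level * random.uniform(-1.0, 1.0) for _ in predicted_chunk]

# Streaming loop: the noise that masks each chunk is computed *before*
# the chunk is spoken, from a prediction based only on past audio.
history, noise_stream = [], []
for spoken_chunk in [[0.1] * SAMPLES_PER_CHUNK, [0.2] * SAMPLES_PER_CHUNK]:
    noise_stream.append(camouflage_noise(predict_next_chunk(history)))
    history.append(spoken_chunk)
```

The key design point survives even this toy version: because the noise is generated from a prediction, it can be broadcast at the same instant the words are spoken, rather than lagging behind them.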

Streaming audio presents a particular challenge because the software has to outsmart an eavesdropping AI in real time. Audio sampling rates, the number of audio samples recorded each second, are at least 16 kilohertz, according to Chiquier’s project page, which means noise computed in reaction to audio that has already been heard would need to be ready “within milliseconds, which is currently infeasible.” Researchers also had to make sure that the disruptive background noise was loud enough to reach eavesdropping microphones, carrying the same distance as the voice it was trying to camouflage.
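To see how tight that budget is, a quick bit of arithmetic at the minimum sampling rate cited shows how little audio fits in each processing window:

```python
SAMPLE_RATE_HZ = 16_000   # minimum sampling rate cited on the project page
CHUNK_MS = 2              # the article: the system processes 2 ms at a time

samples_per_chunk = SAMPLE_RATE_HZ * CHUNK_MS // 1000
print(samples_per_chunk)  # 32 samples of audio arrive in each 2 ms window
```

Any system that waits to hear a sound before responding to it has already fallen a window behind, which is why the team predicts ahead instead of reacting.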

The system’s central defense is the use of “predictive attacks,” which ensure that any words spoken in real time are shrouded enough to prevent ASR systems from working properly. The system takes into account what a person has already said in order to generate confusing noise. While it can’t know for sure what the speaker will say next, it predicts a few possible phrases.

For example, in one test, the speaker says, “He doesn’t say, but it’s on the frontier and on the map,” while the spying AI transcribes the speaker as saying, “O doesn’t say but it’s on the front hir eser and on thelag.”

White noise, which you may be familiar with as steady rain or the droning of a fan, is a sound that contains all audible frequencies. It tends to mask other sounds, but it doesn’t compare to the high word-error rate the Neural Voice Camouflage can generate in an eavesdropping transcription AI. The same sentence is more comprehensible when white noise is used: “He doesn’t say, but it’s on the front hier and on the map.”
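For contrast, white-noise masking is untargeted: each sample is drawn independently at random, so on average its energy is spread flat across all frequencies rather than shaped against the words being spoken. A minimal sketch, in which the 440 Hz “speech” tone, amplitudes, and sample counts are all illustrative:

```python
import math
import random

def white_noise(n, amplitude=0.1):
    """White noise: independent, identically distributed random samples,
    so the expected energy is flat across all audible frequencies."""
    return [amplitude * random.gauss(0.0, 1.0) for _ in range(n)]

# Mask a toy 440 Hz tone (sampled at 16 kHz) by simple sample-wise addition.
speech = [math.sin(2 * math.pi * 440 * t / 16_000) for t in range(160)]
masked = [s + n for s, n in zip(speech, white_noise(len(speech)))]
```

Because the noise ignores the content of the speech, an ASR system can still recover most words through it, which matches the modest error rates the article reports for white noise.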

The word-error rate with the research team’s software was 80.2 percent, compared to 11.3 percent using white noise. Even an eavesdropping AI system that the researchers trained to transcribe audio speech from the Neural Voice Camouflage didn’t perform well, with an error rate of 52.5 percent.
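Word-error rate is the standard metric behind those percentages: the number of word substitutions, insertions, and deletions needed to turn the transcription back into the reference, divided by the reference length. A self-contained implementation using word-level Levenshtein distance (the metric is standard; the helper name is ours, and we make no claim about the team’s exact scoring setup):

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + insertions + deletions) / reference length,
    computed by Levenshtein distance over words."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between first i ref words, first j hyp words
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,          # deletion
                           dp[i][j - 1] + 1,          # insertion
                           dp[i - 1][j - 1] + cost)   # substitution/match
    return dp[-1][-1] / len(ref)

# Lower-cased versions of the transcripts quoted in the article:
ref = "he doesn't say but it's on the frontier and on the map"
hyp = "o doesn't say but it's on the front hir eser and on thelag"
wer = word_error_rate(ref, hyp)
```

A perfect transcription scores 0.0, and rates above 1.0 are possible when the ASR output inserts many extra words.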

Chiquier hopes to expand her research to protect all forms of privacy that are potentially challenged by AI technology, such as the unauthorized use of facial recognition software, she tells Science. And the predictive AI component of the team’s research could be used for self-driving cars, she adds, since they require real-time processing, like anticipating and avoiding pedestrians.
