That smart home speaker isn’t listening to everything you say, according to new research – but it is listening a lot more than it should. Researchers have found some speakers activating by mistake up to 19 times each day.
Virtual assistants like Siri and Alexa are programmed not to listen to your conversation constantly. Instead, they listen for a ‘wake phrase’. When they hear it, it’s their cue to listen to what you subsequently say, which could be an instruction or a request. Google Assistant responds to “OK Google”, Apple’s Siri perks up when you say “Hey Siri” and Microsoft’s Cortana pricks up its digital ears when you say “Hey Cortana”.
The problem is that, just like humans, virtual assistants often mishear things. Siri might think that “Seriously” sounds enough like its wake word to start listening to what you’re saying, but that’s just one of a range of sounds that might trigger it. That’s why Siri has reportedly recorded everything from sex to criminal deals.
Until now, we haven’t known just how (in)accurate these voice assistants are at listening for wake phrases. Thanks to research by academics at Northeastern University and Imperial College London, now we do. It turns out they’re not that accurate at all.
The researchers wanted to simulate real-world conditions, so they set up a variety of smart speakers with embedded virtual assistants and played them 125 hours of audio from various Netflix shows ranging from The Office to The Big Bang Theory and Narcos. They tested the first-generation Google Home Mini, Apple’s first-generation HomePod, Amazon’s second- and third-generation Echo Dot, and the Harman Kardon Invoke, which has Microsoft’s Cortana embedded.
The researchers detected when speakers were recording by capturing video feeds to determine whether their lights activated, and by monitoring the network to spot any traffic that they were sending back to the cloud. They also checked their cloud accounts to watch for any self-reported recordings.
They found that devices would activate up to 19 times each day on average. The HomePod was the worst offender, with an over-enthusiastic Siri switching on for lots of phrases. Speech that triggered it started with “Hi” or “Hey”, followed by something that sounds like an “S” plus a vowel, or something that sounds like “ri”. Examples of speech that set it off included “He clearly”, “Hey sorry” or “I’m sorry”, and “Okay, yeah”, so watch who you’re apologising to or agreeing with. Even “historians” would set it off.
When the devices did wake up, they’d often do so for relatively long periods. The HomePod and the Echos would wake up for at least six seconds more than half the time. The second-generation Echo Dot and the Harman Kardon speaker had the longest activations, earwigging for between 20 and 43 seconds.
Amazon’s Echo Dot 3 mistakenly woke up the fewest times, and has by far the widest range of wake-up phrases. You have to set the chosen wake word in advance, so we can assume the researchers ran the test using each wake word – “Alexa”, “Amazon”, “Echo”, or “Computer”.
… we found activations with words that contain “k” and sound similar to “Alexa,” such as “exclamation”, “kevin’s car”, “congresswoman”
An “Amazon”-enabled Dot did apparently wake up when it heard “My pants on”, which could be, um, embarrassing, depending on the context.
Every show caused at least one device to wake up, and most shows woke up multiple devices. However, the results were mostly inconsistent. The team experimented with each device 12 times (other than the Harman Kardon speaker, which only got four tests). Only 8.44% of the activations occurred consistently across 75% of the tests. The researchers said:
This could be due to some randomness in the way smart speakers detect wake words, or the smart speakers may learn from previous mistakes and change the way they detect wake words.
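To make the consistency figure concrete, here is a minimal sketch of how such a share could be computed. It assumes that “consistent” means an activation recurred in at least 75% of the repeated runs, and that activations are compared by the phrase that triggered them. Both are assumptions made for illustration, not the researchers’ documented method. The function name and the sample data are hypothetical.

```python
from collections import Counter


def consistent_share(runs, threshold=0.75):
    """Fraction of distinct activations seen in at least `threshold` of the runs.

    `runs` is a list of sets, one per repeated test, each holding the
    phrases that woke the speaker during that run (an assumed data layout).
    """
    if not runs:
        return 0.0
    # Count in how many runs each distinct phrase caused an activation.
    counts = Counter(phrase for run in runs for phrase in set(run))
    if not counts:
        return 0.0
    needed = threshold * len(runs)
    consistent = [phrase for phrase, n in counts.items() if n >= needed]
    return len(consistent) / len(counts)


# Hypothetical data: four repeated runs of the same audio on one speaker.
runs = [
    {"hey sorry", "historians"},
    {"hey sorry"},
    {"hey sorry", "okay, yeah"},
    {"hey sorry", "he clearly"},
]

share = consistent_share(runs)
print(f"{share:.0%} of distinct activations were consistent")
```

In this made-up example only “hey sorry” recurs in every run, so one of the four distinct activations counts as consistent. A low share like the 8.44% reported by the researchers means most false activations showed up in only some of the runs.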
That inconsistency compounds a known problem with AI-driven devices: they’re opaque. AI algorithms can’t explain what they do. They’re black boxes that produce results based on statistical models. There isn’t a procedural set of instructions that you can follow to predict their results. It’s a problem that distances us from the tech, putting it outside our complete control.
There were some upsides, though. Despite some past incidents, the researchers found no evidence in their tests that these devices were constantly recording people’s conversations.
The good news is that you can turn off active listening on many of these devices, although doing so might leave you with a relatively expensive Bluetooth speaker unless your hardware has an alternative tap-to-talk option. In the meantime, be careful what you say – particularly immediately after mentioning Radiohead’s ground-breaking third studio album “OK Computer”.