Picture a podcast or YouTube video playing at home. Without your noticing, it emits a sound that you cannot detect but that sends commands to your AI assistant. The assistant then starts sharing sensitive data with the attacker or installing malware. We already had prompt injection attacks; now comes audio prompt injection.
The experiment. It sounds like science fiction, but it is perfectly possible. A team of researchers from China and Singapore has discovered a way to create malicious sounds that can “hijack” voice AI models, causing them to execute commands without you knowing or being able to stop them. Speaking to IEEE Spectrum, the study's lead author says that “It only takes half an hour to train this signal and, since it is context-independent, it can be used to attack a model whenever you want, regardless of what the user says.”
The authors tested this technique against thirteen AI models, including services from Microsoft and Mistral. In the tests, they got these models to perform sensitive searches, send emails containing user information, and download files. They achieved a success rate of between 79% and 96%.
Undetectable. LALMs (Large Audio Language Models) have a critical security flaw. Because they receive instructions in audio format, malicious commands can be injected through manipulated sounds. Worst of all, these sounds are not spoken instructions, which would be fairly easy to detect. Instead, they rely on a method called “convolutional mixing” that disguises the sound as natural reverb or echo in the room.
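To make the "reverb or echo" idea concrete, here is a minimal sketch of what convolving audio with a room response means in signal-processing terms. It is not the researchers' method, and it does not produce an attack: the toy tone, the `toy_room_response` helper, and its decay values are assumptions chosen purely for illustration.

```python
import math

def convolve(signal, impulse_response):
    """Full discrete convolution of two sequences of floats."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out

def toy_room_response(length, decay):
    """Hypothetical impulse response: a direct path plus exponentially fading echoes."""
    return [1.0 if n == 0 else 0.3 * math.exp(-decay * n) for n in range(length)]

# A short 440 Hz tone at an 8 kHz sample rate stands in for the original audio.
sample_rate = 8000
dry = [math.sin(2 * math.pi * 440 * n / sample_rate) for n in range(200)]

# Convolving the "dry" audio with the room response yields the "wet" audio,
# which a listener would perceive as the same sound played in a reverberant room.
rir = toy_room_response(length=50, decay=0.1)
wet = convolve(dry, rir)

print(len(dry), len(rir), len(wet))
```

A listener hears this kind of filtering as room acoustics rather than as a separate sound, which is why a perturbation shaped like a reverb is hard to notice.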
Why it is important. An attack of this type upends the defenses we have internalized (do not click on links, do not download things, do not give out your data…). Something as harmless as playing a YouTube video, a podcast, or a TikTok in the background can trigger an attack without us even realizing it. And given that the power of AI agents, such as the recently announced Gemini Spark, lies precisely in having access to our entire digital life, an attack of this type can wreak havoc.
Hijacking attention. Pre-instructing the model with examples of malicious commands so that it ignores them reduces attack success by a mere 7%. Similarly, asking the AI to “reflect” on whether its response matches what the user actually asked for detects only 28% of attacks. Current security measures are ineffective because the manipulated audio hijacks the model’s mathematical “attention,” inducing the AI to produce high-confidence outputs and making it impossible to distinguish a legitimate user command from an adversarial attack.
Open source. The “good” part is that, for now, this type of attack has only been trained on open-weight models. However, the researchers have found that once the malicious audio is trained, it can be transferred to breach closed models.
As we said, the authors put it to the test with services from Mistral and Microsoft. So far, Mistral has not commented, but Microsoft sent the following statement to IEEE Spectrum:
We appreciate the work of the researchers to deepen the understanding of this type of technique. This study assesses model resilience through controlled, direct interactions with the model itself, helping to define our approach to building resilience. In practice, AI models are often integrated into user applications, and we provide developers with tools and guidance they can use to implement additional layers of protection to help safeguard users.
Image | Yassine Ait Tahit, Unsplash
