Authors :
Dr. N. M. Balamurugan; Abiney Yadav R.; Akash S.; Gowthaman J.; R. Anitha
Volume/Issue :
Volume 11 - 2026, Issue 1 - January
Google Scholar :
https://tinyurl.com/6f5xd46b
Scribd :
https://tinyurl.com/57nkr7yf
DOI :
https://doi.org/10.38124/ijisrt/26jan1055
Abstract :
Modern speech-controlled systems rely on Voice Activity Detection (VAD) as the gatekeeper of the speech processing pipeline. Although adversarial attacks on Automatic Speech Recognition (ASR) have been studied extensively, VAD security remains largely unexplored, leaving a fundamental vulnerability exposed. This paper introduces Silent Deception, a Universal Adversarial Perturbation (UAP) framework designed to force VAD models to misclassify active speech as silence. By targeting VAD rather than ASR, the proposed attack disrupts the speech pipeline at its entry point, creating a silent denial-of-service (DoS) condition in which downstream components never receive valid input. The UAPs are crafted using gradient-based optimization on Silero VAD and WebRTC VAD, maximizing the False Negative Rate (FNR) while strictly preserving perceptual quality. Evaluation demonstrates a 90% bypass success rate and significant ASR degradation, measured via Word Error Rate (WER). This work highlights the urgent need to treat adversarial robustness in VAD systems as a primary defense capability in next-generation speech pipelines.
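To make the idea of a universal perturbation that raises the FNR concrete, the following is a minimal sketch and not the authors' implementation. It assumes a hand-built, differentiable toy VAD (a hypothetical band-energy detector, not Silero VAD or WebRTC VAD), synthetic tone frames in place of real speech, and an L-infinity bound `eps` as a crude stand-in for the perceptual-quality constraint. All names, filters, and constants (`vad_logit`, `craft_uap`, `W_HIGH`, `THRESH`, the 250 Hz and 4 kHz bands) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
N, FS = 256, 16000            # frame length (samples) and sample rate; assumed values
n = np.arange(N)

def unit(v):
    return v / np.linalg.norm(v)

# Toy VAD filters (assumption): a 250 Hz "speech" band (cos + sin for phase
# invariance) and a 4 kHz band this hypothetical detector treats as noise evidence.
C_LOW = np.stack([unit(np.cos(2 * np.pi * 250 * n / FS)),
                  unit(np.sin(2 * np.pi * 250 * n / FS))])
C_HIGH = unit(np.cos(2 * np.pi * 4000 * n / FS))
W_HIGH, THRESH = 400.0, 1.0   # hand-picked so the toy model is attackable

def vad_logit(x):
    """Score > 0 means 'speech'. x has shape (batch, N)."""
    e_low = ((x @ C_LOW.T) ** 2).sum(axis=1)
    e_high = (x @ C_HIGH) ** 2
    return e_low - W_HIGH * e_high - THRESH

def vad_logit_grad(x):
    """Analytic gradient of vad_logit with respect to each input frame."""
    low = 2 * (x @ C_LOW.T) @ C_LOW
    high = 2 * W_HIGH * (x @ C_HIGH)[:, None] * C_HIGH
    return low - high

def make_speech(count):
    """Synthetic stand-in for speech: a 250 Hz tone with random amplitude and phase."""
    amp = rng.uniform(0.3, 0.6, size=(count, 1))
    phase = rng.uniform(0, 2 * np.pi, size=(count, 1))
    tone = amp * np.sin(2 * np.pi * 250 * n / FS + phase)
    return tone + 0.01 * rng.standard_normal((count, N))

def false_negative_rate(speech_frames, delta=0.0):
    """Fraction of true speech frames that the VAD labels as silence."""
    return float(np.mean(vad_logit(speech_frames + delta) <= 0))

def craft_uap(train, eps=0.05, alpha=0.005, steps=100):
    """Projected sign-gradient descent on the mean speech logit; one shared delta."""
    delta = rng.uniform(-0.1 * eps, 0.1 * eps, size=N)
    for _ in range(steps):
        grad = vad_logit_grad(train + delta).mean(axis=0)
        # Step against the gradient, then project back into the L-infinity ball.
        delta = np.clip(delta - alpha * np.sign(grad), -eps, eps)
    return delta

train, test = make_speech(200), make_speech(200)
delta = craft_uap(train)      # a single perturbation reused for every frame
fnr_before = false_negative_rate(test)
fnr_after = false_negative_rate(test, delta)
print(f"FNR clean: {fnr_before:.2f}, FNR with UAP: {fnr_after:.2f}")
```

The perturbation is fitted on one set of frames and evaluated on a held-out set, which is what makes it universal rather than per-input. A realistic attack would replace the toy detector with the target VAD's gradients and the plain amplitude bound with a perceptual constraint; measuring the downstream WER would additionally require running an ASR model on the frames that survive the VAD.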
Keywords :
Voice Activity Detection, Universal Adversarial Perturbations, Speech Security, Adversarial Machine Learning, Silent Denial-of-Service, Automatic Speech Recognition, Spectral Preservation.