Speech Enhancement Using Deep Neural Networks


Authors : V. Sudha Rani; Dr. A. N. Satyanrayana; Aroju Santhosh; Maliha; Erravelly Sricharan

Volume/Issue : Volume 9 - 2024, Issue 4 - April

Google Scholar : https://tinyurl.com/mtzm7rfm

Scribd : https://tinyurl.com/mr22syrr

DOI : https://doi.org/10.38124/ijisrt/IJISRT24APR2694

Abstract : This paper studies audio enhancement in challenging noisy environments, departing from conventional approaches that target specific sound components. It focuses on a modified U-Net architecture that integrates broader audio features and uses a probabilistic framework to reconstruct spectral content directly. Multiple variants of the system were tested across diverse noise levels and reverberation conditions, and performance was evaluated with objective metrics: signal-to-distortion ratio (SDR), signal-to-noise ratio (SNR), voice quality evaluation, and intelligibility scores. The paper demonstrates that the proposed enhanced U-Net architecture, characterized by strategically designed connections within its structure, consistently outperforms traditional audio enhancement methods across a range of noise scenarios. Notably, the improvements in audio quality were most pronounced in highly reverberant environments, where conventional techniques often struggle to deliver satisfactory results. These results highlight the effectiveness of the approach in enhancing audio fidelity and intelligibility, particularly in real-world noisy conditions.

Keywords : Audio Enhancement, Noisy Environments, U-Net Architecture, Spectral Content Reconstruction, SDR, SNR.
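
The abstract names SDR and SNR as evaluation metrics but does not give their exact definitions. As a minimal sketch, assuming the common energy-ratio form (reference energy over residual energy, in dB), the two can be computed as below. The function name `energy_ratio_db` and the synthetic signals are illustrative assumptions, not taken from the paper, and toolkit variants such as BSS Eval SDR or scale-invariant SDR would give different numbers.

```python
import numpy as np

def energy_ratio_db(reference, estimate):
    """Energy of the reference over energy of (reference - estimate), in dB.

    Assumed definition: with estimate = noisy input this is the input SNR;
    with estimate = enhanced output it is a simple SDR.
    """
    reference = np.asarray(reference, dtype=np.float64)
    estimate = np.asarray(estimate, dtype=np.float64)
    residual = reference - estimate
    eps = 1e-12  # guards against division by zero for a perfect estimate
    return 10.0 * np.log10((np.sum(reference ** 2) + eps) /
                           (np.sum(residual ** 2) + eps))

# Synthetic stand-ins for real recordings (assumption for illustration only).
rng = np.random.default_rng(0)
t = np.arange(16000) / 16000.0
clean = np.sin(2 * np.pi * 440.0 * t)           # 1 s tone at 16 kHz
noise = 0.5 * rng.standard_normal(clean.shape)  # additive noise
noisy = clean + noise
enhanced = clean + 0.1 * noise                  # pretend the model removed 90% of the noise amplitude

input_snr = energy_ratio_db(clean, noisy)
output_sdr = energy_ratio_db(clean, enhanced)
improvement = output_sdr - input_snr
print(f"input SNR: {input_snr:.2f} dB, output SDR: {output_sdr:.2f} dB, "
      f"improvement: {improvement:.2f} dB")
```

Scaling the residual amplitude by 0.1 scales its energy by 0.01, so the improvement in this toy case is 20 dB regardless of the random seed.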


