Speech Enhancement in Multi-Source Environments

Kanwal Rehman; Insia Mukhtar; Muzammil Haider

doi:10.65492/01/201/2024/9

Authors

Kanwal Rehman, Institute of Computing, Women University, Multan, Punjab, Pakistan.
Insia Mukhtar, Institute of Computing, Women University, Multan, Punjab, Pakistan.
Muzammil Haider, Institute of Computing, Women University, Multan, Punjab, Pakistan.

Keywords:

Speech improvement, noise, signal-to-noise ratio (SNR), Wiener filter, Perceptual Evaluation of Speech Quality (PESQ)

Abstract

Background noise, prevalent in real-world settings, can adversely affect speech communication for both people and automated systems. Of the available strategies, speech separation using a single microphone is the most advantageous from an application perspective. The resulting monaural speech separation problem has been a central concern in speech processing for several decades, yet success to date has been limited. This research develops speech separation systems that combine time-frequency masking, deep neural networks, and model-based reconstruction. The objective of each system is to enhance the perceived quality of the speech estimates. The performance of many speech processing applications is significantly degraded in the presence of both noise and reverberation. This study proposes a two-stage noise reduction system that diminishes background noise in single-microphone recordings with low signal-to-noise ratios (SNR), using Wiener filtering and ideal binary masking. In the first stage, a Wiener filter with an improved signal-to-noise ratio is applied to reduce background noise in the noisy speech. In the second stage, the ideal binary mask (IBM) is computed in each time-frequency channel from the pre-processed speech of the first stage by comparing each channel against a predetermined threshold, in order to minimize residual noise. Channels that meet the threshold criterion are preserved, while all others are attenuated. The proposed approach has been evaluated in a simulation environment, and the results indicate that it can effectively enhance speech.
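The two stages described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the noisy signal has already been transformed into per-frame, per-channel power values, that a positive noise power estimate is available for each channel, and that the a priori SNR is approximated by a simple maximum-likelihood estimate. The function names, the 0 dB threshold, and the attenuation factor are hypothetical choices made only for illustration.

```python
import math


def wiener_gain(noisy_power, noise_power, floor=1e-3):
    """Stage 1 gain xi / (1 + xi), where xi is an assumed
    maximum-likelihood a priori SNR estimate (not from the paper)."""
    xi = max(noisy_power / noise_power - 1.0, 0.0)
    return max(xi / (1.0 + xi), floor)


def two_stage_gains(noisy_power, noise_power, threshold_db=0.0, attenuation=0.1):
    """Return an amplitude gain for every time-frequency unit.

    noisy_power: list of frames, each a list of per-channel power values.
    noise_power: list of per-channel noise power estimates (assumed > 0).
    threshold_db, attenuation: hypothetical illustration values.
    """
    gains = []
    for frame in noisy_power:
        row = []
        for power, noise in zip(frame, noise_power):
            # Stage 1: Wiener filtering of the noisy unit.
            g = wiener_gain(power, noise)
            filtered_power = (g ** 2) * power
            # Stage 2: binary decision on the pre-processed unit.
            # Local SNR is estimated against the noise estimate.
            snr_db = 10.0 * math.log10(max(filtered_power, 1e-12) / noise)
            mask = 1.0 if snr_db > threshold_db else attenuation
            row.append(g * mask)
        gains.append(row)
    return gains


# One frame with a speech-dominated channel and a noise-dominated channel.
example = two_stage_gains([[10.0, 1.05]], [1.0, 1.0])
```

In this example the first channel passes the threshold and keeps its Wiener gain, while the second falls below it and is attenuated further. A true ideal binary mask requires access to the clean speech and noise separately, so the mask here is only an estimate derived from the first-stage output.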
