Speech Enhancement in Multi-Source Environments
Keywords:
Speech enhancement, noise, signal-to-noise ratio (SNR), Wiener filter, Perceptual Evaluation of Speech Quality (PESQ)
Abstract
Background noise, prevalent in real-world settings, can adversely affect speech communication for both human listeners and automated systems. Of the available strategies, speech separation using a single microphone is the most advantageous from an application perspective. The resulting monaural speech separation problem has been a central concern in speech processing for decades, yet its success has so far been limited. This research develops speech separation systems that combine time-frequency masking, deep neural networks, and model-based reconstruction. The objective of each system is to enhance the perceived quality of the speech estimates. The performance of many speech processing applications degrades significantly in the presence of both noise and reverberation. This study proposes a two-stage noise reduction system that reduces background noise in single-microphone recordings with low signal-to-noise ratios, using Wiener filtering and ideal binary masking (IBM). In the first stage, a Wiener filter with an improved signal-to-noise ratio is applied to reduce the background noise in the noisy speech. In the second stage, the IBM is computed in each time-frequency channel from the pre-processed speech of the first stage by comparing each channel against a predetermined threshold, in order to minimize residual noise. Channels that meet the threshold criterion are preserved, while all others are attenuated. The proposed approach has been evaluated in a simulation environment, and the results indicate that it can effectively enhance speech.
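As an illustration only, the two stages described in the abstract can be sketched on a small grid of time-frequency channel powers. This is a minimal sketch under stated assumptions, not the authors' implementation: the noise power is assumed to be known per channel, the Wiener gain uses a simple power-subtraction SNR estimate, and the names `wiener_gain`, `two_stage_enhance`, `gain_floor`, `threshold_db`, and `attenuation`, together with their default values, are hypothetical.

```python
import math


def wiener_gain(noisy_power, noise_power, gain_floor=0.05):
    """Stage 1: Wiener gain for one time-frequency channel.

    noisy_power: power of the noisy speech in the channel.
    noise_power: estimated noise power in the channel (assumed known).
    gain_floor:  hypothetical lower bound on the gain.
    """
    # Crude SNR estimate by power subtraction, clipped at zero (assumption).
    snr = max(noisy_power - noise_power, 0.0) / max(noise_power, 1e-12)
    return max(snr / (1.0 + snr), gain_floor)


def two_stage_enhance(noisy_power, noise_power, threshold_db=0.0, attenuation=0.1):
    """Apply a Wiener gain, then a binary mask, to a grid of channel powers.

    Both inputs are lists of rows (frames) of per-channel powers.
    Returns (mask, enhanced_power), shaped like the inputs.
    """
    mask, enhanced = [], []
    for noisy_row, noise_row in zip(noisy_power, noise_power):
        mask_row, out_row = [], []
        for y, n in zip(noisy_row, noise_row):
            g = wiener_gain(y, n)
            # The gain acts on amplitude, so power scales by g squared.
            stage1 = (g ** 2) * y
            # Stage 2: local SNR of the pre-processed channel, measured
            # against the noise estimate (an assumed criterion).
            local_snr_db = 10.0 * math.log10(max(stage1, 1e-12) / max(n, 1e-12))
            keep = local_snr_db >= threshold_db
            mask_row.append(1 if keep else 0)
            # Channels below the threshold are attenuated, not zeroed.
            out_row.append(stage1 if keep else attenuation * stage1)
        mask.append(mask_row)
        enhanced.append(out_row)
    return mask, enhanced
```

In practice the channel powers would come from a time-frequency analysis of the recording (for example a short-time Fourier transform or a gammatone filterbank), and the threshold and attenuation would be tuned experimentally; the values above are placeholders.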