Robust Voice Activity Detection for Interview Speech in NIST Speaker Recognition Evaluation

The introduction of interview speech in recent NIST Speaker Recognition Evaluations (SREs) has necessitated the development of robust voice activity detectors (VADs) that can work under very low signal-to-noise ratios (SNRs). This paper highlights the characteristics of interview speech files in NIST SREs and discusses the difficulties of detecting speech/non-speech segments in these files. To alleviate these difficulties, this paper proposes a VAD that uses noise reduction as a pre-processing step. A strategy to avoid the undesirable effects of impulsive signals and sinusoidal background signals on the VAD is also proposed. The proposed VAD is compared with the VAD in the ETSI-AMR speech coder for removing silence regions of interview speech files. The results show that the proposed VAD is more robust in detecting speech segments under very low SNR.
