Document Type : Review Article

Abstract

Endpoint detection, the task of distinguishing speech from non-speech segments, is one of the key preprocessing operations in automatic speech recognition (ASR) systems. The energy of the speech signal and the zero-crossing rate (ZCR) are usually used to locate the beginning and end of an utterance. Both features have been shown to be effective for endpoint detection; however, they fail in high-noise environments. In this paper, we integrate the modified Teager energy approach with energy-entropy features. In the proposed algorithm, the Teager energy is used to determine crude endpoints, and the energy-entropy features are used to make the final decision. The advantage of this method is that the background noise does not need to be estimated. It is therefore well suited to cases where the noise at the beginning or end of the recording is very strong, or where there is not enough "silence" at the beginning or end of the utterance. Experimental results on Farsi speech show that the accuracy of the algorithm is satisfactory for speech endpoint detection.
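
As a rough illustration of the crude-endpoint stage only, the sketch below applies the standard discrete Teager energy operator, psi[n] = x[n]^2 - x[n-1]*x[n+1], and marks the first and last frames whose mean Teager energy exceeds a fraction of the peak. The function names, the frame length, and the peak-relative threshold are assumptions made for this example; they are not taken from the paper, whose modified Teager measure and energy-entropy refinement are not reproduced here.

```python
def teager_energy(x):
    """Discrete Teager energy operator: psi[n] = x[n]^2 - x[n-1] * x[n+1]."""
    return [x[n] * x[n] - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]


def frame_means(values, frame_len):
    """Mean of each non-overlapping frame; a trailing partial frame is dropped."""
    return [
        sum(values[i:i + frame_len]) / frame_len
        for i in range(0, len(values) - frame_len + 1, frame_len)
    ]


def crude_endpoints(x, frame_len=80, ratio=0.1):
    """Return (first, last) frame indices above ratio * peak Teager energy.

    The peak-relative threshold is an assumption of this sketch, chosen so
    that no separate background-noise estimate is required.
    Returns None if no frame carries positive Teager energy.
    """
    energies = frame_means(teager_energy(x), frame_len)
    if not energies:
        return None
    peak = max(energies)
    if peak <= 0:
        return None
    active = [i for i, e in enumerate(energies) if e > ratio * peak]
    return active[0], active[-1]
```

A final decision stage, such as the energy-entropy features described above, would then refine these frame indices; a fixed peak-relative threshold on its own would not be expected to be robust in strong noise.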
