Robust Speech Recognition Based on Mixed Histogram Transform and Asymmetric Noise Suppression

Farsi, Hassan; Kuhimoghadam, Samana

Document Type : Review Article

Authors

Hassan Farsi ¹
Samana Kuhimoghadam ²

¹ University of Birjand

² Department of Engineering, University of Payam Noor, Mashhad

Abstract

This paper proposes a new noise-robust feature extraction algorithm based on histogram compensation and asymmetric filtering. The algorithm combines nonlinear filtering with temporal masking, which is intended to improve automatic speech recognition (ASR) systems, specifically in matched and multistyle training conditions. The power histogram of the input in each frequency band is matched to the histogram obtained over clean training data, and the processed and unprocessed spectra are then mixed, which increases speech recognition accuracy. The results show that recognition accuracy improves over MFCC, PLP, and PNCC in various training conditions.
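The histogram matching and mixing step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the rank-based matching, and the mixing weight `mix` are assumptions made for illustration, and the asymmetric filtering and temporal masking stages are not covered because the abstract gives no details about them.

```python
import numpy as np


def match_band_histogram(noisy_power, clean_power):
    """Remap one band's power values so their empirical distribution
    follows that of the clean reference band (rank-based matching)."""
    noisy_power = np.asarray(noisy_power, dtype=float)
    clean_sorted = np.sort(np.asarray(clean_power, dtype=float))
    # Rank of each input frame within its band, turned into a CDF value in (0, 1).
    ranks = np.argsort(np.argsort(noisy_power))
    cdf = (ranks + 0.5) / noisy_power.size
    # CDF positions of the sorted clean values.
    clean_cdf = (np.arange(clean_sorted.size) + 0.5) / clean_sorted.size
    # Look up the clean power value at the same CDF position.
    return np.interp(cdf, clean_cdf, clean_sorted)


def mixed_histogram_transform(noisy_spec, clean_spec, mix=0.5):
    """Apply per-band histogram matching, then blend with the input.

    noisy_spec: array of shape (bands, frames)
    clean_spec: array of shape (bands, clean_frames)
    mix: hypothetical weight on the processed spectrum (assumed, not from the paper)
    """
    noisy_spec = np.asarray(noisy_spec, dtype=float)
    matched = np.vstack(
        [match_band_histogram(n, c) for n, c in zip(noisy_spec, clean_spec)]
    )
    # Mix the processed and unprocessed spectra.
    return mix * matched + (1.0 - mix) * noisy_spec
```

With `mix=1.0` the output is the fully histogram-matched spectrum, and with `mix=0.0` the input passes through unchanged. Intermediate values keep part of the original spectral detail, which is the motivation for mixing that the abstract suggests.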

Majlesi Journal of Electrical Engineering

Volume 7, Issue 2 - Serial Number 2
June 2013
Pages 1-11
