Comprehensive modulation representation for automatic speech recognition
We present a new feature representation for speech recognition based on both amplitude modulation spectra (AMS) and frequency modulation spectra (FMS). A comprehensive modulation spectral (CMS) approach is defined and analyzed based on a modulation model of the band-pass signal. The speech signal is first processed by a bank of specially designed auditory band-pass filters. CMS features are then extracted from the filter outputs and used as the input to automatic speech recognition (ASR). The new features yield a significant improvement in performance on noisy speech. On the Aurora 2 task, they result in an improvement of 23.43% relative to traditional mel-cepstrum front-end features using a 3 GMM HMM back-end. Although the improvements are relatively modest, the novelty of the method and its potential for performance enhancement warrant serious attention for future-generation ASR applications.
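As a rough illustration of the idea in the abstract, the sketch below isolates one band, splits it into an amplitude envelope and an instantaneous frequency, and takes the spectrum of each as stand-ins for AMS and FMS. It is not the authors' method: the brick-wall FFT filter replaces their specially designed auditory filters, demodulation via the Hilbert analytic signal is an assumption, and all function names and test-signal parameters are hypothetical.

```python
import numpy as np


def bandpass_fft(x, fs, f_lo, f_hi):
    """Crude brick-wall band-pass: zero all FFT bins outside [f_lo, f_hi].

    Assumption: stands in for one channel of an auditory filter bank.
    """
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    spectrum[(freqs < f_lo) | (freqs > f_hi)] = 0.0
    return np.fft.irfft(spectrum, n=len(x))


def analytic_signal(x):
    """Analytic signal computed by zeroing the negative-frequency bins."""
    n = len(x)
    weights = np.zeros(n)
    weights[0] = 1.0
    if n % 2 == 0:
        weights[n // 2] = 1.0
        weights[1:n // 2] = 2.0
    else:
        weights[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(np.fft.fft(x) * weights)


def modulation_spectra(x, fs, f_lo, f_hi):
    """Return envelope, instantaneous frequency, and their magnitude spectra."""
    z = analytic_signal(bandpass_fft(x, fs, f_lo, f_hi))
    envelope = np.abs(z)                           # amplitude modulation
    phase = np.unwrap(np.angle(z))
    inst_freq = np.diff(phase) * fs / (2 * np.pi)  # frequency modulation, Hz
    # Remove the mean so the modulation peak is not masked by the DC bin.
    ams = np.abs(np.fft.rfft(envelope - envelope.mean()))
    fms = np.abs(np.fft.rfft(inst_freq - inst_freq.mean()))
    return envelope, inst_freq, ams, fms


# Synthetic test tone (hypothetical values): a 1 kHz carrier with
# 4 Hz amplitude modulation and 8 Hz frequency modulation (50 Hz deviation).
fs = 8000
t = np.arange(fs) / fs
x = (1 + 0.5 * np.cos(2 * np.pi * 4 * t)) * np.cos(
    2 * np.pi * 1000 * t + (50 / 8) * np.sin(2 * np.pi * 8 * t)
)

envelope, inst_freq, ams, fms = modulation_spectra(x, fs, 800, 1200)
am_rate = np.fft.rfftfreq(len(envelope), d=1.0 / fs)[np.argmax(ams)]
fm_rate = np.fft.rfftfreq(len(inst_freq), d=1.0 / fs)[np.argmax(fms)]
print(f"dominant AM rate: {am_rate:.2f} Hz, dominant FM rate: {fm_rate:.2f} Hz")
```

A full front end would repeat this for every filter channel and for short overlapping frames, then pass the resulting vectors to the recognizer. Those steps are left out here because the abstract does not specify them.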
9th European Conference on Speech Communication and Technology
Wang, Yadong, Steven Greenberg, Jayaganesh Swaminathan, Ramdas Kumaresan, and David Poeppel. "Comprehensive modulation representation for automatic speech recognition." 9th European Conference on Speech Communication and Technology (2005): 3025-3028. https://digitalcommons.uri.edu/ele_facpubs/658