Average instantaneous frequency (AIF) and average log-envelopes (ALE) for ASR with the Aurora 2 database
We have developed a novel approach to speech feature extraction based on a modulation model of a band-pass signal. Speech is processed by a bank of band-pass filters. At the output of each band-pass filter, the signal is subjected to a log-derivative operation, which naturally decomposes the band-pass signal into analytic (called $\dot{\alpha}(t) + j\dot{\hat{\alpha}}(t)$) and anti-analytic (called $\dot{\beta}(t) - j\dot{\hat{\beta}}(t)$) components. The average instantaneous frequency (AIF) and average log-envelope (ALE) are then extracted as coarse features at the output of each filter. Refined features may also be extracted from the analytic and anti-analytic components, although this is not done in this paper. We then evaluated the front end on the Aurora 2 task, in which the noise corruption is synthetic. For clean training, compared to the mel-cepstrum front end with a 3-mixture HMM back end, our AIF/ALE front end achieves an average improvement of 13.97% on set A, 17.92% on set B, and a negative 'improvement' of -31.72% on set C. The overall improvement in accuracy rates for clean training is 7.97%. Although the improvements are modest, the novelty of the front end and its potential for future enhancement are the strengths of the approach.
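The coarse features described above can be illustrated with a minimal sketch. It is not the authors' implementation: instead of the log-derivative decomposition, it obtains each band's envelope and instantaneous frequency from a standard analytic signal, and it replaces the filter bank with an ideal FFT mask. The function names (`analytic_band`, `aif_ale`), the band edges, and the frame length and hop are all assumptions made for illustration.

```python
import numpy as np


def analytic_band(x, fs, f_lo, f_hi):
    """Return the analytic signal of x restricted to [f_lo, f_hi] Hz.

    Assumption: an ideal (brick-wall) FFT mask stands in for a real
    band-pass filter. Keeping only positive frequencies and doubling
    them yields the analytic signal of the band in one step.
    """
    spectrum = np.fft.fft(x)
    freqs = np.fft.fftfreq(len(x), d=1.0 / fs)
    mask = (freqs >= f_lo) & (freqs <= f_hi)
    return np.fft.ifft(2.0 * spectrum * mask)


def aif_ale(x, fs, bands, frame_len, hop, eps=1e-12):
    """Per-frame average instantaneous frequency and average log-envelope.

    Returns two arrays of shape (n_frames, n_bands): AIF in Hz and ALE
    in natural-log amplitude units. Plain frame means are an assumption;
    the paper's exact averaging is not stated in the abstract.
    """
    n = len(x)
    starts = range(0, n - frame_len + 1, hop)
    aif = np.zeros((len(starts), len(bands)))
    ale = np.zeros((len(starts), len(bands)))
    for b, (f_lo, f_hi) in enumerate(bands):
        z = analytic_band(x, fs, f_lo, f_hi)
        log_env = np.log(np.abs(z) + eps)            # log-envelope
        phase = np.unwrap(np.angle(z))
        inst_freq = np.diff(phase) * fs / (2.0 * np.pi)  # Hz, length n-1
        for i, s in enumerate(starts):
            ale[i, b] = log_env[s:s + frame_len].mean()
            aif[i, b] = inst_freq[s:s + frame_len - 1].mean()
    return aif, ale
```

For a steady tone that lies inside one band, this sketch should give an AIF close to the tone frequency and an ALE close to the log of its amplitude in every frame. A practical front end would use overlapping filters with finite roll-off rather than the brick-wall mask assumed here.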
EUROSPEECH 2003 - 8th European Conference on Speech Communication and Technology
Wang, Yadong, Jesse Hansen, Gopi Krishna Allu, and Ramdas Kumaresan. "Average instantaneous frequency (AIF) and average log-envelopes (ALE) for ASR with the Aurora 2 database." EUROSPEECH 2003 - 8th European Conference on Speech Communication and Technology (2003): 25-28. https://digitalcommons.uri.edu/ele_facpubs/661