Decomposition of a bandpass signal and its applications to speech processing
Date of Original Version
We have developed a novel approach to speech feature extraction based on a modulation model of a band-pass signal. Speech is processed by a bank of band-pass filters. At the output of each band-pass filter, the signal is subjected to a log-derivative operation, which naturally decomposes the band-pass signal into analytic (called $\dot{\alpha}(t) + j\dot{\hat{\alpha}}(t)$) and anti-analytic (called $\dot{\beta}(t) - j\dot{\hat{\beta}}(t)$) components. The average instantaneous frequency (AIF) and average log-envelope (ALE) are then extracted as coarse features at the output of each filter. We indicate how further refined features may also be extracted from the analytic and anti-analytic components. We evaluated the feature extraction procedure on the Aurora 2 task, where the noise corruption is synthetic. For clean training, compared with the mel-cepstrum front end with a 3-mixture HMM back-end, our AIF/ALE front end achieves an average improvement of 13.97% on set A and 17.92% on set B, and a negative 'improvement' of -31.72% on set C. The overall improvement in accuracy for clean training is 7.97%. Although the improvements are modest, the novelty of the front end and its potential for future enhancement are its strengths.
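As a rough illustration of the coarse features described above, the following is a minimal sketch and not the authors' implementation. It assumes that the log-derivative of the complex band-pass output z(t) has a real part equal to the derivative of the log-envelope and an imaginary part equal to the instantaneous angular frequency, so that AIF and ALE can be approximated by averaging the instantaneous frequency and the log-envelope over a segment. The FFT-mask band-pass filter, the function names `analytic_bandpass` and `aif_ale`, and all parameter values are hypothetical choices made for this example; the split into analytic and anti-analytic components is not attempted here.

```python
import numpy as np


def analytic_bandpass(x, fs, f_lo, f_hi):
    """Complex analytic signal of x restricted to [f_lo, f_hi] Hz.

    Assumption: an ideal FFT-mask filter stands in for one channel of the
    band-pass filter bank; f_lo must be > 0 so only positive frequencies pass.
    """
    n = len(x)
    spectrum = np.fft.fft(x)
    freqs = np.fft.fftfreq(n, d=1.0 / fs)
    mask = (freqs >= f_lo) & (freqs <= f_hi)
    # Doubling the positive-frequency band yields the analytic signal.
    return np.fft.ifft(2.0 * spectrum * mask)


def aif_ale(z, fs, eps=1e-12):
    """Average instantaneous frequency (Hz) and average log-envelope of z.

    Assumption: plain averages over the whole segment; the paper's exact
    framing and averaging are not specified in the abstract.
    """
    log_envelope = np.log(np.abs(z) + eps)           # log|z(t)|
    phase = np.unwrap(np.angle(z))                   # continuous phase of z(t)
    inst_freq = np.diff(phase) * fs / (2.0 * np.pi)  # d(phase)/dt in Hz
    return inst_freq.mean(), log_envelope.mean()


# Hypothetical test signal: a unit-amplitude 1 kHz tone sampled at 8 kHz.
fs = 8000.0
t = np.arange(800) / fs
x = np.cos(2.0 * np.pi * 1000.0 * t)

z = analytic_bandpass(x, fs, f_lo=800.0, f_hi=1200.0)
aif, ale = aif_ale(z, fs)
print(aif, ale)
```

For a pure tone inside the pass band, the AIF should sit near the tone frequency and the ALE near the log of its amplitude. In a front end of the kind described, one such pair would be computed per filter and per frame.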
Conference Record of the Asilomar Conference on Signals, Systems and Computers
Kumaresan, Ramdas, Gopi K. Allu, Jayaganesh Swaminathan, and Yadong Wang. "Decomposition of a bandpass signal and its applications to speech processing." Conference Record of the Asilomar Conference on Signals, Systems and Computers 2 (2003): 2078-2082. https://digitalcommons.uri.edu/ele_facpubs/660