Decomposition of a bandpass signal and its applications to speech processing
Document Type
Conference Proceeding
Date of Original Version
12-1-2003
Abstract
We have developed a novel approach to speech feature extraction based on a modulation model of a band-pass signal. Speech is processed by a bank of band-pass filters. At the output of each band-pass filter, the signal is subjected to a log-derivative operation, which naturally decomposes the band-pass signal into analytic (called $\dot{\alpha}(t) + j\dot{\hat{\alpha}}(t)$) and anti-analytic (called $\dot{\beta}(t) - j\dot{\hat{\beta}}(t)$) components. The average instantaneous frequency (AIF) and average log-envelope (ALE) are then extracted as coarse features at the output of each filter. We indicate how further refined features may also be extracted from the analytic and anti-analytic components. We then evaluated the feature extraction procedure on the Aurora 2 task, in which the noise corruption is synthetic. For clean training, compared with the mel-cepstrum front end and a 3-mixture HMM back-end, our AIF/ALE front end achieves an average improvement of 13.97% on set A, 17.92% on set B, and a negative 'improvement' of -31.72% on set C. The overall improvement in accuracy rates for clean training is 7.97%. Although the improvements are modest, the strengths of this work are the novelty of the front end and its potential for future enhancement.
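To illustrate the two coarse features named in the abstract, the following is a minimal sketch and not the authors' method: it replaces the log-derivative decomposition with a standard FFT-based analytic signal, and it uses an idealized brick-wall filter in place of a real band-pass filter bank. The function names (`analytic_signal`, `bandpass_fft`, `aif_ale`), the band edges, and the choice to average over the whole segment rather than over short frames are all assumptions made for illustration.

```python
import numpy as np

def analytic_signal(x):
    """FFT-based analytic signal: keep DC, double positive frequencies, zero negative ones."""
    n = len(x)
    spectrum = np.fft.fft(x)
    weights = np.zeros(n)
    weights[0] = 1.0
    if n % 2 == 0:
        weights[n // 2] = 1.0          # Nyquist bin is kept once for even lengths
        weights[1:n // 2] = 2.0
    else:
        weights[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(spectrum * weights)

def bandpass_fft(x, fs, f_lo, f_hi):
    """Idealized brick-wall band-pass via FFT masking (assumed stand-in for a filter bank)."""
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    spectrum = np.fft.rfft(x)
    spectrum[(freqs < f_lo) | (freqs > f_hi)] = 0.0
    return np.fft.irfft(spectrum, n=len(x))

def aif_ale(x, fs, bands, eps=1e-12):
    """Return one (AIF in Hz, ALE in natural-log units) pair per band."""
    features = []
    for f_lo, f_hi in bands:
        z = analytic_signal(bandpass_fft(x, fs, f_lo, f_hi))
        envelope = np.abs(z)
        phase = np.unwrap(np.angle(z))
        # Instantaneous frequency as the discrete derivative of the unwrapped phase.
        inst_freq = np.diff(phase) * fs / (2.0 * np.pi)
        aif = float(np.mean(inst_freq))
        ale = float(np.mean(np.log(envelope + eps)))  # eps guards against log(0)
        features.append((aif, ale))
    return features

# Usage: two steady tones, one per band (hypothetical band edges).
fs = 8000
t = np.arange(fs) / fs
x = np.cos(2 * np.pi * 1000 * t) + 0.5 * np.cos(2 * np.pi * 2500 * t)
features = aif_ale(x, fs, bands=[(800, 1200), (2300, 2700)])
```

For a steady tone inside a band, the AIF should sit near the tone frequency and the ALE near the logarithm of the tone amplitude. A practical front end would compute both quantities frame by frame, which this sketch does not attempt.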
Publication Title
Conference Record of the Asilomar Conference on Signals, Systems and Computers
Volume
2
Citation/Publisher Attribution
Kumaresan, Ramdas, Gopi K. Allu, Jayaganesh Swaminathan, and Yadong Wang. "Decomposition of a bandpass signal and its applications to speech processing." Conference Record of the Asilomar Conference on Signals, Systems and Computers 2 (2003): 2078-2082. https://digitalcommons.uri.edu/ele_facpubs/660