Decomposition of a bandpass signal and its applications to speech processing

Document Type

Conference Proceeding

Date of Original Version

12-1-2003

Abstract

We have developed a novel approach to speech feature extraction based on a modulation model of a band-pass signal. Speech is processed by a bank of band-pass filters. At the output of each filter, the signal is subjected to a log-derivative operation, which naturally decomposes the band-pass signal into analytic (called $\dot{\alpha}(t) + j\hat{\dot{\alpha}}(t)$) and anti-analytic (called $\dot{\beta}(t) - j\hat{\dot{\beta}}(t)$) components. The average instantaneous frequency (AIF) and average log-envelope (ALE) are then extracted as coarse features at the output of each filter. We indicate how further refined features may also be extracted from the analytic and anti-analytic components. We then evaluated the feature extraction procedure on the Aurora 2 task, where noise corruption is synthetic. For clean training, compared to the mel-cepstrum front end with a 3-mixture HMM back end, our AIF/ALE front end achieves an average improvement of 13.97% with set A, a 17.92% improvement with set B, and a -31.72% (negative) 'improvement' with set C. The overall improvement in accuracy rates for clean training is 7.97%. Although the improvements are modest, the novelty of the front end and its potential for future enhancements are our strengths.
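
As a rough illustration of the coarse features named in the abstract, the sketch below computes a per-frame average instantaneous frequency and average log-envelope for one band. It is not the authors' log-derivative decomposition into analytic and anti-analytic components. It relies only on the general identity that the log-derivative of an analytic signal z(t) has the derivative of log|z(t)| as its real part and the instantaneous angular frequency as its imaginary part. The function names `analytic_bandpass` and `aif_ale`, the FFT-mask band-pass filter, the frame length, and the band edges are all assumptions made for this example.

```python
import numpy as np


def analytic_bandpass(x, fs, f_lo, f_hi):
    """Analytic signal of x restricted to [f_lo, f_hi] Hz.

    Assumption: an ideal FFT-mask filter stands in for one filter of the
    bank. Only positive-frequency bins inside the band are kept, and they
    are doubled so the real part equals the band-passed signal.
    """
    n = len(x)
    spectrum = np.fft.fft(x)
    freqs = np.fft.fftfreq(n, d=1.0 / fs)
    mask = (freqs >= f_lo) & (freqs <= f_hi)
    return np.fft.ifft(2.0 * spectrum * mask)


def aif_ale(x, fs, f_lo, f_hi, frame_len):
    """Per-frame average instantaneous frequency (Hz) and average log-envelope.

    Hypothetical helper: a simplified stand-in for the AIF/ALE features,
    computed directly from the analytic signal of one band.
    """
    z = analytic_bandpass(x, fs, f_lo, f_hi)

    # Log-envelope; the small constant guards against log(0).
    log_env = np.log(np.abs(z) + 1e-12)

    # Instantaneous frequency from the phase step between adjacent samples.
    inst_freq = np.angle(z[1:] * np.conj(z[:-1])) * fs / (2.0 * np.pi)

    # Average both tracks over non-overlapping frames of frame_len samples.
    n_frames = (len(x) - 1) // frame_len
    used = n_frames * frame_len
    aif = inst_freq[:used].reshape(n_frames, frame_len).mean(axis=1)
    ale = log_env[:used].reshape(n_frames, frame_len).mean(axis=1)
    return aif, ale
```

For a unit-amplitude 1 kHz tone sampled at 8 kHz and a band of 800 to 1200 Hz, every frame's AIF should sit at about 1000 Hz and every frame's ALE at about 0. A full front end along the lines of the abstract would repeat this for each filter in the bank and use the authors' decomposition in place of this shortcut.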

Publication Title

Conference Record of the Asilomar Conference on Signals, Systems and Computers

Volume

2
