On decomposing speech into modulated components

We model a segment of a filtered speech signal as a product of elementary signals rather than as a sum of sinusoids. This model clarifies the basic relationships between a signal's envelope and its phase or instantaneous frequency (IF), and these relationships reveal some interesting properties of the signal's modulations. For instance, if the contribution of the envelope, specifically the Hilbert transform of the log-envelope, is removed from the signal's phase, the IF of the resulting signal is strictly positive. In addition, a filtered speech signal with a bandwidth of B Hz can essentially be represented by a log-envelope and an IF that each have the same bandwidth of B Hz. In this paper, we extend these ideas to decompose speech into modulated components. Specifically, a bank of data-adaptive filters in a cross-coupled configuration is used to separate speech into its components. Each adaptive filter is a simple single-resonance bandpass filter, whose center frequency (pole location) closely follows the desired formant frequency, supplemented by an adaptive all-zero filter, whose zero locations sufficiently reduce unwanted leakage from neighboring formants. The filtered components are then represented by their respective log-envelopes and positive IFs; this small number of modulations closely approximates the speech signal. © 2000 IEEE.
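The structure of one channel of the filter bank can be sketched under stated assumptions. A complex one-pole resonator with pole r·e^{jω_c} stands in for the single-resonance bandpass filter, and an FIR factor with zeros on the unit circle at neighboring formant frequencies stands in for the all-zero filter. The center frequency and zero locations are fixed here; the data-adaptive formant tracking and the cross-coupling between channels are not reproduced. The function name `channel_filter`, the pole radius r = 0.95, and the test frequencies are hypothetical.

```python
import numpy as np


def channel_filter(x: np.ndarray, fc: float, zero_freqs: list[float],
                   fs: float, r: float = 0.95) -> np.ndarray:
    """One fixed-parameter channel: all-zero FIR followed by a one-pole resonator.

    The FIR places a zero on the unit circle at each frequency in zero_freqs
    (hypothetical neighboring formants); the resonator's pole r*e^{j*wc} sits
    at the channel's center frequency fc. Gain at fc is normalized to 1.
    """
    wc = 2 * np.pi * fc / fs
    pole = r * np.exp(1j * wc)

    # All-zero part: product of (1 - e^{j*wz} z^-1) factors.
    b = np.array([1.0 + 0j])
    for fz in zero_freqs:
        b = np.convolve(b, [1.0, -np.exp(2j * np.pi * fz / fs)])

    # Normalize so that the overall response at fc is exactly 1.
    gain = np.sum(b * np.exp(-1j * wc * np.arange(len(b)))) / (1 - pole * np.exp(-1j * wc))
    b = b / gain

    v = np.convolve(x, b)[:len(x)]      # FIR stage (causal, same length)
    y = np.empty(len(v), dtype=complex)
    prev = 0j
    for n, vn in enumerate(v):          # y[n] = v[n] + pole * y[n-1]
        prev = vn + pole * prev
        y[n] = prev
    return y


fs = 8000.0
n = np.arange(2000)
wc = 2 * np.pi * 500 / fs
wz = 2 * np.pi * 1500 / fs
x = np.exp(1j * wc * n) + np.exp(1j * wz * n)   # "formants" at 500 and 1500 Hz
y = channel_filter(x, 500.0, [1500.0], fs)
```

With the zero exactly on the unit circle, the 1500 Hz tone is nulled and, once the resonator's transient decays, the output is the 500 Hz tone at unit gain. In the paper the zero locations are adapted to reduce leakage rather than fixed as here.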

Publication Title

IEEE Transactions on Speech and Audio Processing