On decomposing speech into modulated components
We model a segment of a filtered speech signal as a product of elementary signals, as opposed to a sum of sinusoidal signals. Using this model, one can better appreciate the basic relationships between the envelopes and the phases or instantaneous frequencies (IF's) of signals. These relationships reveal some interesting properties of the signal's modulations. For instance, if the contribution due to a signal's envelope, specifically the Hilbert transform of its log-envelope, is removed from the signal's phase, then the resulting signal's IF is strictly positive. In addition, a filtered speech signal with a bandwidth of B Hz can be essentially represented by a log-envelope and an IF that each have the same B Hz bandwidth. In this paper, we extend the above ideas to decompose speech into modulated components. Specifically, a bank of data-adaptive filters (in a cross-coupled configuration) is used to decompose speech into its components; each adaptive filter is a simple single-resonance bandpass filter (whose center frequency or pole location closely follows the desired formant frequency) supplemented by an adaptive all-zero filter (whose zero locations sufficiently reduce unwanted leakage from neighboring formants). The filtered components are then represented by their respective log-envelopes and positive IF's; this small number of modulations closely approximates the speech signal. © 2000 IEEE.
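The envelope and phase relationship described in the abstract can be illustrated with a minimal sketch. It is not the paper's data-adaptive filter bank: it assumes a synthetic two-tone signal that is exactly periodic in the analysis window, a plain FFT-based analytic signal, and hypothetical helper names (`analytic_signal`, `hilbert_transform`, `log_envelope_and_ifs`) chosen only for illustration. It computes the log-envelope, the ordinary IF, and the IF that remains after the Hilbert transform of the log-envelope is subtracted from the phase.

```python
import numpy as np


def analytic_signal(x):
    """FFT-based analytic signal of a real 1-D array (assumes periodicity)."""
    n = len(x)
    spectrum = np.fft.fft(x)
    weights = np.zeros(n)
    weights[0] = 1.0                       # keep the DC bin
    if n % 2 == 0:
        weights[n // 2] = 1.0              # keep the Nyquist bin
        weights[1:n // 2] = 2.0            # double the positive frequencies
    else:
        weights[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(spectrum * weights)  # negative frequencies are zeroed


def hilbert_transform(x):
    """Hilbert transform as the imaginary part of the analytic signal."""
    return np.imag(analytic_signal(x))


def log_envelope_and_ifs(x, fs):
    """Return the log-envelope, the ordinary IF, and the compensated IF (Hz)."""
    z = analytic_signal(x)
    log_env = np.log(np.abs(z))
    phase = np.unwrap(np.angle(z))
    # Remove the envelope's contribution from the phase.
    residual_phase = phase - hilbert_transform(log_env)
    to_hz = fs / (2.0 * np.pi)
    raw_if = np.diff(phase) * to_hz
    compensated_if = np.diff(residual_phase) * to_hz
    return log_env, raw_if, compensated_if


# Assumed test signal: a strong 1000 Hz tone plus a weaker 1200 Hz tone,
# both falling exactly on DFT bins for an 800-sample window at 8 kHz.
fs = 8000.0
t = np.arange(800) / fs
x = np.cos(2 * np.pi * 1000 * t) + 0.5 * np.cos(2 * np.pi * 1200 * t)

log_env, raw_if, comp_if = log_envelope_and_ifs(x, fs)
print("ordinary IF range (Hz):   ", raw_if.min(), raw_if.max())
print("compensated IF range (Hz):", comp_if.min(), comp_if.max())
```

For this particular signal the ordinary IF fluctuates with the beating between the two tones, while the compensated IF stays positive and close to the frequency of the stronger tone. Real speech would first need to be bandpass filtered around a formant, and windowing or edge effects would have to be handled, neither of which this sketch attempts.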
IEEE Transactions on Speech and Audio Processing
Rao, Ashwin, and Ramdas Kumaresan. "On decomposing speech into modulated components." IEEE Transactions on Speech and Audio Processing 8, 3 (2000): 240-254. doi:10.1109/89.841207.