On accurately tracking the harmonic components' parameters in voiced-speech segments and subsequent modeling by a transfer function

We propose an improved method for modeling voiced speech signals. First, we describe a method that accurately models the signal as a linear combination of harmonically related sinewaves. The method fits, in the least-squares sense, a linear combination of sines and cosines whose frequencies are integer multiples of the unknown fundamental (pitch) frequency to the speech data. The sinewave amplitudes and the fundamental frequency are the unknowns and are determined simultaneously by the least-squares fit. We show how this method yields smoothly varying frequency and amplitude tracks for all the harmonics and thus a parsimonious model of the speech signal. After obtaining the harmonic decomposition, we regard the time-varying amplitudes of the cosinusoidal and sinusoidal harmonic components as the real and imaginary parts of the complex-valued frequency response of a slowly time-varying filter that represents the vocal tract in cascade with the glottal excitation pulse generator. We then fit a sequence of all-pole/pole-zero models to these complex frequency response values.
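
To make the harmonic least-squares fit concrete, the following is a minimal Python sketch for a single analysis frame. It rests on assumptions that are not taken from the method itself: the fundamental is found by a simple grid search over candidate values, the amplitudes are solved by linear least squares for each candidate, and the number of harmonics is fixed in advance. The names `harmonic_design`, `fit_harmonics`, `n_harm`, and `f0_grid` are hypothetical and chosen only for illustration.

```python
import numpy as np


def harmonic_design(f0, n_harm, t):
    """Build the design matrix for one candidate fundamental f0 (Hz).

    Columns 0..n_harm-1 hold cos(2*pi*k*f0*t) and columns
    n_harm..2*n_harm-1 hold sin(2*pi*k*f0*t), for k = 1..n_harm.
    """
    k = np.arange(1, n_harm + 1)
    phase = 2.0 * np.pi * f0 * np.outer(t, k)
    return np.hstack([np.cos(phase), np.sin(phase)])


def fit_harmonics(x, fs, n_harm, f0_grid):
    """Jointly estimate f0 and the harmonic amplitudes for one frame.

    Assumption: f0 is searched over a grid (illustrative choice). For each
    candidate the amplitudes are the linear least-squares solution, and the
    candidate with the smallest squared residual is kept.
    """
    t = np.arange(len(x)) / fs
    best = None
    for f0 in f0_grid:
        A = harmonic_design(f0, n_harm, t)
        coef, *_ = np.linalg.lstsq(A, x, rcond=None)
        err = float(np.sum((x - A @ coef) ** 2))
        if best is None or err < best[0]:
            best = (err, float(f0), coef)
    err, f0_hat, coef = best
    # Cosine and sine amplitudes of harmonics 1..n_harm, in that order.
    return f0_hat, coef[:n_harm], coef[n_harm:], err
```

Under the interpretation given above, the cosine and sine amplitudes returned for each frame would play the role of the real and imaginary parts of the frequency response values to which the all-pole/pole-zero models are subsequently fitted. In practice a coarse grid would likely be followed by a local refinement of the fundamental; that step is omitted here.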

