Model-based approach to envelope and positive instantaneous Model-based approach to envelope and positive instantaneous frequency estimation of signals with speech applications frequency estimation of signals with speech applications

An analytic signal s ( t ) is modeled over a T second duration by a pole-zero model by considering its periodic extensions. This type of representation is analogous to that used in discrete-time systems theory, where the periodic frequency response of a system is characterized by a ﬁnite number of poles and zeros in the z -plane. Except, in this case, the poles and zeros are located in the complex-time plane. Using this signal model, expressions are derived for the envelope, phase, and the instantaneous frequency of the signal s ( t ). In the special case of an analytic signal having poles and zeros in reciprocal complex conjugate locations about the unit circle in the complex-time plane, it is shown that their instantaneous frequency (cid:126) IF (cid:33) is always positive. This result paves the way for representing signals by positive envelopes and positive IF (cid:126) PIF (cid:33) . An algorithm is proposed for decomposing an analytic signal into two analytic signals, one completely characterized by its envelope and the other having a positive IF. This algorithm is new and does not have a counterpart in the cepstral literature. It consists of two steps. In the ﬁrst step, the envelope of the signal is approximated to desired accuracy using a minimum-phase approximation by using the dual of the autocorrelation method of linear prediction, well known in spectral analysis. The criterion that is optimized is a waveform ﬂatness measure as opposed to the spectral ﬂatness measure used in spectral analysis. This method is called linear prediction in spectral domain (cid:126) LPSD (cid:33) . The resulting residual error signal is an all-phase or phase-only analytic signal. In the second step, the derivative of the error signal, which is the PIF, is computed. The two steps together provide a unique AM-FM or minimum-phase/all-phase decomposition of a signal. This method is then applied to synthetic signals and ﬁltered speech


INTRODUCTION
Many natural and man-made signals of interest are timevarying or nonstationary in nature, i.e., their frequency content or spectrum changes with time.Examples include speech signals, animal calls, biological/biomedical signals such as cardiac rhythms, etc. Techniques for characterizing such signals are of great importance in applications involving such signals.A collection of short-time Fourier spectra known as spectrogram is a common tool for analyzing such time-varying signals.Unfortunately, the spectrogram suffers from the need to compromise time and frequency resolution, i.e., a large time window is required to resolve closely spaced frequencies.To overcome this problem, a number of so-called time-frequency distributions or representations have been developed. 1,2The time-frequency analysis tools are very useful in visualizing the time and frequency behavior of simple signals like a chirp.However, when the signals are complex, as in the case of speech, it is hard to interpret time-frequency representations because of the interactions between components in the signal.The time-frequency analysis methods also create a practical problem.They result in enormous 2D data sets.Although sometimes these 2D data sets can be viewed by humans to sort out the important features of interest, it is hard to program a machine to reliably extract such features.Hence it has been difficult to apply these methods to automatic signal classification problems.
In the area of speech processing, the above problem is circumvented by directly extracting features from short segments of a speech signal.Such algorithms are based on short-term spectral analysis in the form of linear prediction ͑which captures the spectral envelope of a signal with a few parameters͒, 3,4 cepstral analysis, 5 and Mel-cepstrum. 6Using these procedures, spectral templates or feature vectors are computed and used in applications like machine recognition/ verification.However, these methods are vulnerable to interference and channel degradations as encountered in telephone speech.Signals are also often analyzed over shorttime intervals, using specific signal models, such as sum of sinusoidal or damped sinusoidal signals or phase-modulated sinusoidal signals.If such models are appropriate for the data at hand, then significant advantages can be gained.In this paper a model-based approach is proposed for representing signals by their envelope and instantaneous frequency which is guaranteed to be positive.

A. Envelope and instantaneous frequency of signals
Many of the above-mentioned methods represent a signal by characterizing its power as a function of time and frequency.Are there other alternatives?Clearly, a signal's phase and envelope carry information about how various components of the signal are related to each other.Hence, is it possible to characterize a signal by its phase and envelope modulations?In his 1946 paper, among other important ideas, Gabor approached this question by defining the socalled analytic signal or pre-envelope. 7,8Recall that if s r (t) is a real signal, then the corresponding analytic signal is s(t)ϭs r (t)ϩ js ˆr(t), where s ˆr(t) is the Hilbert transform 9 of s r (t).The Fourier transform of s(t), S(), is nonzero only for Ͼ0.The envelope of s r (t) is then defined as ͉s(t)͉ and its instantaneous frequency ͑IF͒ is denoted by the first derivative of s(t)'s phase function scaled by 1/2.][13][14] Engineers and scientists are most familiar with IF in the context of frequency-modulated signals as in a FM radio.But what about the IF of an arbitrary signal?For an arbitrary signal, the IF is typically an erratic function whose range may extend from negative to positive infinity. 10For example, for a signal consisting of two complex sine waves, i.e., s(t)ϭa 1 e j 1 t ϩa 2 e j 2 t , the IF could lie anywhere in the range of ͑Ϫϱ,ϱ͒ depending on the relative sizes of a 1 and a 2 .The general impression among researchers is that the IF function is unusable unless it is sufficiently smoothed. 15ome incompletely resolved questions regarding IF include: how do we interpret the envelope and IF of naturally occurring ͑not man-made͒ signals like speech?How are phase or IF and envelope related to each other?Is IF more important than envelope?When is a signal's IF a smooth function?Under what conditions is a signal's IF guaranteed to be positive, and so on.Further, one of the factors that has discouraged researchers 15,16 in using phase and envelope to represent a signal is the following: for example, if bandpass filtered speech is decomposed into envelope and IF, then the resulting modulations, rather ironically, have bandwidths that are typically much greater compared to that of the original band-limited signal.
In addition to Gabor, Dugundji, and others, 7,8,[17][18][19][20][21][22] significant contributions to understanding analytic signals were made nearly 30 years ago by Voelcker. 23,24Voelcker proposed a methodical way to understanding the IF and logenvelope of signals which may help answer some of the questions raised above.Unfortunately, Voelcker's work never became popular because it was somewhat hard to read.He proposed that complex-valued signals ͑and hence analytic signals͒ be modeled as polynomials or a ratio of polynomials in the complex variable t ͑time͒, just like a given system or frequency response may be modeled by a ratio of polynomials in the s-domain ͑continuous-time systems͒ or the z-domain ͑discrete-time systems͒.He called it ''product representation of signals.''Once we realize that signals may be represented by a polynomial or a ratio of polynomials with complex coefficients, then a myriad of ideas that have been developed in systems literature can be applied to this socalled product representation of signals.In this paper we extend Voelcker's work by applying some of the well-known ideas from the theory of linear prediction 25 to his signal model.
A motivation for representing signals by envelope and IF comes from our desire to understand and model the signal processing function performed by the auditory periphery, particularly the cochlea.The cochlea is known 26 to decompose acoustic stimuli into frequency components along the length of the basilar membrane.This phenomenon is called tonotopic decomposition.Further, it is also known that the nerve fibers emanating from a high-frequency location in the cochlea ''phase-lock'' to the envelope of the stimulus around that frequency, i.e., convey information about the envelope modulations in the signal. 27Thus, to a first-order approximation, it is often argued that the tonotopic location/place along the length of the basilar membrane conveys the IF or frequency information about the signal, and the rate of nerve fiber activity around that location conveys the envelope information.Hence analytical signal models that explicitly characterize the envelope and phase variations of a complex stimulus on a short-time basis may eventually help in understanding the cochlear function.

B. Organization of the paper
In Sec.I we consider complex-valued periodic signals and express them as a product of so-called elementary signals a `la Voelcker.This type of representation is analogous to that used in discrete-time systems theory, where the periodic frequency response of a system is characterized by a finite number of poles and zeros, except in our case the poles and zeros are located in a complex-time plane.Using this signal model, we derive expressions for the envelope, phase, and the instantaneous frequency.In the special case of an analytic signal having poles and zeros in reciprocal complex conjugate locations about the unit circle in the complex-time plane, it is shown in Sec.II that their instantaneous frequency ͑IF͒ is always positive.This result paves the way for representing signals by positive envelopes and positive IF ͑PIF͒ as desired in literature associated with time-frequency distributions. 10,14In Sec.III we propose a new algorithm which consists of two steps to achieve a unique decomposition of an analytic signal into two analytic signals, one completely described by its envelope and the other having a positive IF.This type of decomposition is different from those known in the cepstral literature. 5In the first step, the envelope of the signal is approximated to desired accuracy using a minimum-phase approximation by using the dual of the autocorrelation method of linear prediction 25 well known in spectral analysis.The criterion that is optimized is a waveform flatness measure as opposed to the spectral flatness measure used in the spectral domain.We call our method, linear prediction in spectral domain ͑LPSD͒.The resulting residual error signal is an all-phase or phase-only analytic signal.In the second step, the derivative of the error signal is approximated.The two steps together provide a unique AM-FM or minimum-phase/all-phase decomposition of a signal.This method is then applied to synthetic signals and filtered speech signals.

I. ENVELOPE AND IF IN TERMS OF A SIGNAL MODEL
Consider a periodic analytic signal s(t), with period T seconds.Let ⍀ϭ2/T denote its fundamental angular fre-quency.If s(t) has finite bandwidth, it may be described by the following model for a sufficiently large M, over an interval of T seconds: ͑1͒ e j t t represents a frequency translation.In other words, t у0 is the nominal carrier frequency of the signal.a k are the complex amplitudes of the sinusoids e jk⍀t ; a 0 0 and a M 0. By analytic continuation we may regard e j⍀t as a complex variable ͑a `la the complex variable Z͒.That is, t, the time variable, is regarded as complex-valued.Note that in Eq. ͑1͒ the M th degree polynomial in e j⍀t represents the complex envelope of the signal s(t).We may factor this polynomial into its M (ϭ PϩQ) factors and rewrite s(t) as ͑2͒ p 1 ,p 2 ,...,p P , and q 1 ,q 2 ,...,q Q denote the polynomial's roots; p i ϭ͉p i ͉e j i , q i ϭ͉q i ͉e j i .p i denote roots inside the unit circle in the complex plane, q i are outside the unit circle.Currently we assume that there are no roots on the circle.That is ͉p i ͉Ͻ1 and ͉q i ͉Ͼ1.Each factor of the form ( 1Ϫp i e j⍀t ) in the above is called an ''elementary signal.'' 23he p i and q i are referred to as zeros of the signal s(t).The above expressions, representing a band-limited periodic signal, may be recognized as the counterpart of the frequency response of a finite impulse response ͑FIR͒ filter in discretetime systems theory. 28More generally, if s(t) consists of an infinite number of spectral lines ͓i.e., its Fourier transform, S()ϭ ͚ kϭ0 ϱ a k ␦(Ϫk⍀)͔, then we can represent s(t) over T seconds to desired accuracy using a sufficient number of poles and zeros as follows: ͑3͒ p i and q i correspond to zeros inside and outside the unit circle, respectively.u i correspond to the signal's poles.Since the spectrum of the signal is assumed to have only positive frequencies, poles are restricted to be inside the unit circle.Again this representation is analogous to causal, stable IIR filters in discrete-time systems literature.Even more generally, if the spectrum of s(t) is two-sided then we may model s(t) using poles and zeros inside and outside the unit circle.e j t t , the arbitrary frequency translation, is analogous to an arbitrary time shift in the impulse response in the case of a discrete-time filter.In summary, we model complex-valued periodic signals using an all-zero or a pole-zero signal model as in Eqs.͑2͒ and ͑3͒, respectively.This type of signal modeling goes back to the work of Cauchy and Hadamard and is related to the theory of entire functions. 29,30Voelcker called this way of modeling signals as ''product representation of signals.''We shall primarily work with the all-zero models since they are easier to use.
The factors corresponding to the zeros inside the unit circle, ⌸ iϭ1 P (1Ϫp i e j⍀t ), constitute the minimum-phase ͑MinP͒ signal.Similarly, the factors corresponding to the zeros outside the circle, ⌸ iϭ1 Q (1Ϫq i e j⍀t ), constitute the maximum-phase ͑MaxP͒ signal.These are the direct counterparts of the frequency responses of the well-known minimum-and maximum-phase FIR filters in discrete-time systems theory; 5 just as in systems theory ͑see Sec.10.3 in Ref. 5͒ the phase of the MinP signal is the Hilbert transform of its log-envelope.That is, the MinP signal may be expressed in the form e ␣(t)ϩ j␣ ˆ(t) .See Appendix A for details.␣ ˆ(t) is the Hilbert transform of ␣(t).Similarly, since a maximum-phase ͑MaxP͒ signal has zeros outside the unit circle, it may be expressed as e ␤(t)Ϫ j␤ ˆ(t) .Thus, envelope or phase alone is sufficient to essentially characterize a MinP or a MaxP signal.͓Along the same lines, an all-phase ͑AllP͒ analytic signal ͑the analog of an all-pass filter͒ would be of the form e j␥(t) .͔Thus s(t) may be expressed as

͑4͒
where the ''hat'' stands for Hilbert transform.c is Q⍀ ͑contributed by the linear phase term from the MaxP signal͒ plus the arbitrary frequency translation, t , shown in Eq. ͑2͒.A c is a 0 ⌸ iϭ1 Q (Ϫq i ).See Appendix A for details.The expressions for ␣(t) and ␤(t) are derived in Appendix A.
Closed-form expressions can be obtained for ␣ ˆ(t) and ␤ ˆ(t). 23,31The ''dot'' stands for the time-derivative operation.Note that the envelope of s(t) is A c e ␣(t)ϩ␤(t) and the IF is c ϩ␣ ˆ(t)Ϫ␤ ˆ(t).A detailed description of properties of envelope and IF of signals described by Eq. ͑2͒ can be found in Ref. 31.We briefly summarize the main points here.The envelope, log-envelope, and phase ͑or IF͒ of s(t) are not band-limited quantities.It can be shown that if s(t) is bandlimited then ͉s(t)͉ 2 and dЄs(t)/dt͉s(t)͉ 2 are band-limited.Further, it can also be shown that no ''information'' is lost by filtering the log-envelope and IF of a band-limited s(t), using a lowpass filter with bandwidth equal to that of the signal s(t).That is, in principle, it is possible to essentially reconstruct the signal s(t) given ideally filtered versions of log-envelope and IF of s(t).The counterpart of this property in the systems domain is the property of complex cepstrum ͑see Ch. 12 in Ref. 5͒.That is, even though the complex cepstrum of a finite-length discrete-time sequence is infinite in length, only a finite number of samples of the complex cepstrum is needed to recover the original sequence.
Using the above product representation model, in addition to being able to obtain explicit expressions for the logenvelope and IF, it is also easy to gain intuitive understanding of the relationship between phase and envelope of signals based on familiar results in systems theory.Just like the unit circle in the ͑discrete-time͒ z plane corresponds to the interval between zero frequency and the sampling frequency, 5 the unit circle in the complex-time plane corresponds to the interval of T seconds.If a periodic signal is such that a zero of the signal, p i or q i , is close to the unit circle, then significant phase changes will occur in the temporal neighborhood of this zero, which will be reflected in the IF values.Specifically, a zero close to the unit circle will result in a large spike in the IF.In fact, if a zero happens to fall on the circle, the envelope goes to zero ͑at a time instant determined by the zero's location͒ and the IF at that time instant is undefined ͑a la group delay of systems͒.Thus if we want to use IF and log-envelope as information-bearing attributes of a signal, then it is necessary to ''tame'' these quantities by shaping the signal spectrum.That is, we must preprocess the signal such that the zeros, p i and q i , stay away from the unit circle.This preprocessing then becomes part and parcel of the signal representation.

A. Extension to nonstationary signals
The model in Eq. ͑2͒ describes a stationary and periodic signal.Of course, most signals of interest are not stationary and certainly not periodic.Hence, as in the case of short-time spectral analysis/spectrogram, we may consider a short T-second segment of a nonstationary signal and imagine that it is periodically extended in order to apply the model in Eq. ͑2͒.Then, successive overlapping T-second segments of a signal may be described as in Eq. ͑2͒, possibly with slowly drifting parameters (p i and q i ) and the associated envelope and IF they represent.Thus although the model described in this section is strictly valid for a periodic signal, we intend to apply it to nonstationary signals by viewing the signal through a sliding T-second window.In fact there is no reason to fix the window length to T seconds.The window length may be a function of the nominal center frequency of the signal s(t) as its characteristics change.Next, we use the above model to define a signal whose IF is positive.

II. POSITIVE INSTANTANEOUS FREQUENCY "PIF… OF A SIGNAL
Recall that an analytic signal is said to be minimumphase ͑MinP͒ if its log-envelope (ln͉s(t)͉) and its phase angle are related by Hilbert transform.An analytic signal is said to be maximum-phase ͑MaxP͒ if its log-envelope is the negative of the Hilbert transform of its phase angle.An important property of these signals is that their logarithm is also an analytic signal.Another important aspect is that either envelope or phase of these signals is essentially sufficient information to characterize these signals.An analytic signal is said to be all-phase ͑AllP͒ if its envelope, ͉s(t)͉, is constant.That is, AllP is a pure phase signal with one-sided spectrum.Now we shall discuss signals whose IF is always positive.

A. General case
Let s(t) be any analytic signal with spectrum confined to the positive side of the frequency axis, The IF could lie anywhere in the interval of ͑Ϫϱ,ϱ͒ depending on the makeup of s(t).Let us rewrite s(t) as s͑t͒ϭe ln a͑t ͒ϩ j͑t ͒ .͑7͒ Adding and subtracting in the exponent the term 31 j ln a (t), ͑''hat'' stands for Hilbert transform͒, we get after rearranging,

͑8͒
The above is analogous to the unique decomposition of the frequency response of a linear, causal, continuous-time system into its minimum-phase and all-pass parts. 9Observe that in the above the first term on the right is a MinP analytic signal.If we multiply both sides of the above by e Ϫln a(t)Ϫj ln a (t) ͑which is also MinP with spectrum confined to positive frequencies͒, since the spectrum of s(t) is already confined to positive frequencies only, it follows that the spectrum of e j((t)Ϫln a (t)) is nonzero only for positive frequencies.Hence e j((t)Ϫln a (t)) must be an AllP analytic signal.The AllP signal is also called a Blaschke function in analytic function theory, 32,33 and may be written as a product of all-phase ''sections,'' i.e., as ⌸ i (tϪz i )/(tϪz i *).It can be shown that the AllP signal has not only a one-sided spectrum but has the remarkable property that its IF is a positive definite function. 23,32Based on this property we have defined a function (t), called the positive IF ͑PIF͒, 34 of any analytic signal s(t) as follows: In words, we define an analytic signal's PIF as the derivative of that part of its phase which is left over after removing the contribution due to the signal's log-envelope ͑specifically the Hilbert transform of its log-envelope͒ from the original phase.The main point is that any analytic signal can be characterized by two positive functions: a positive envelope function ͑the magnitude of the MinP part͒ and a positive IF function ͑of its AllP part͒ rather than by its usual IF ͓phasederivative, ˙(t)͔.This is an important observation that we repeatedly exploit.

B. Periodic case
Although the above decomposition is valid for any analytic signal, as mentioned before, in practice one has to work with a finite, T-second, segment of a possibly nonstationary signal, s(t).Hence, we may invoke the ͑periodic extension͒ model we have used in Eq. ͑1͒.We shall repeat Eqs.͑2͒ and ͑4͒ here for convenience.

͑11͒
Note that the zeros, q i , and p i are assumed to be outside and inside the unit-circle, respectively.We shall reflect the q i to inside the circle ͑as 1/q i *) and cancel them using poles.Then we group all the zeros inside the unit circle to form a different MinP signal and the zeros outside the circle and the poles that are their reflections inside the unit circle to form the all-phase or AllP part of the signal.That is,

͑12͒
Equivalently, multiplying and dividing Eq. ͑11͒ by e j2␤ ˆ(t) and collecting terms we get

͑13͒
This grouping of signals is, of course, analogous to wellknown decomposition of a linear discrete-time system into minimum-phase and all-pass systems ͑see Sec.5.6 in Ref. 5͒.Analogous to the fact that the group delay of the all-pass filters is always positive ͑Sec.5.5 in Ref. 5͒, the IF of AllP part will always be positive ͑even if t , the frequency translation, is zero͒.See Appendix B for a derivation of the IF of an AllP signal.Thus the PIF, (t), of s(t) is a positive function and is as follows: The expression for ␤ ˆ(t) is the same as that of ␤(t) in Eq. ͑5͒ with cosine replaced with sine.Of course, we could also group the zeros outside the unit circle together to form a MaxP-AllP decomposition.That is, we could also rewrite Eq. ͑12͒ as a MaxP/AllP product as follows: s͑t ͒ϭA c e ␣͑t͒ϩ␤͑t͒Ϫ j"␣ ˆ͑t ͒ϩ␤ ˆ͑t ͒… e j" c tϩ2␣ ˆ͑t ͒… .͑15͒ In this case the IF corresponding to the AllP part will be always negative ͑assuming the frequency translation t is zero͒ and may be called negative IF ͑NIF͒.If we can separate the MinP and the AllP components of the signal s(t), the MinP part conveys the AM information, i.e., e ␣(t)ϩ␤(t) ͓or equivalently, its logarithm ␣(t)ϩ␤(t)͔ around the carrier c and the AllP part conveys the PIF information, (t).
The next question is: given s(t) over a T-second interval, how do we compute the PIF of the signal or equivalently separate the MinP and AllP components?There are at least three not so elegant ways to separate the MinP and AllP components.First, one could find the Fourier coefficients of s(t), then root the polynomial formed using the Fourier coefficients, i.e., find p i and q i , and then group them as in Eq. ͑12͒ to separate the components.Alternatively, one could compute the log-envelope of s(t) ͑i.e., ln͉s(t)͉), compute its Hilbert transform, and subtract it from the phase of s(t) ͓as in Eq. ͑8͔͒.Third, we can use the block diagram in Fig. 12.7 ͑p.784͒ of Oppenheim and Schafer 5 by replacing their X(e j ) by s(t).In this case one computes the logarithm of s(t) and keeps the causal part of its spectrum ͑i.e., spectrum corresponding to the positive frequencies͒ as the MinP part.The AllP part is obtained by dividing s(t) by the MinP part as in Ref. 5.However, there is a new and elegant way of achieving this decomposition which we describe next. 34Remarkably, it does not require explicit computation of the logarithm or the Hilbert transform or rooting of a polynomial.We also called this method a generalized AM-FM demodulator since the outputs of the algorithm are the envelope and PIF.

III. ALGORITHM FOR DECOMPOSING AN ANALYTIC SIGNAL INTO ENVELOPE AND PIF
Although in the previous section we have pointed to the fact that any analytic signal can be written as a product as in Eq. ͑13͒, the question is how do we separate these multiplied components?In this section we describe a remarkably simple algorithm to separate the MinP and AllP components.This is shown in Fig. 1.It consists of two parts.In the first part, which consists of a multiplier or modulator, an inverse signal generator ͑ISG͒, and an error minimization block, a model fitting procedure is used to flatten the envelope of the signal s(t).This is achieved by minimizing the energy of an error signal e(t)"ϭh(t)s(t)….The energy of e(t) is defined as follows: generates a low-pass periodic signal.The error energy is minimized by choosing the coefficients, h k .The reader who is familiar with model-based spectral analysis will immediately recognize the analogy between this method and the ''autocorrelation method'' of linear prediction. 4,25In the autocorrelation method, a discrete-time FIR filter, called an inverse filter or prediction-error filter, with frequency response H(e j ) ͑with first coefficient held at unity͒, is used to flatten the envelope of a spectrun X(e j ) of a sequence x(n) by minimizing the error ͐ 0 2 ͉X(e j )H(e j )͉ 2 d.This is an exact analog of Eq. ͑16͒.Analogous to the autocorrelation method, the error in Eq. ͑16͒ is a measure of the flatness of the envelope of e(t).Also, minimizing the error in Eq. ͑16͒ amounts to performing linear prediction on the Fourier coefficients of the signal s(t) and hence we called it linear prediction in spectral domain or LPSD in earlier work. 34The signal h(t) may be called the ''inverse signal'' analogous to the inverse filter.
Similar to the MinP property of the prediction-error filter used in linear prediction, 25 minimizing ͐ 0 T ͉e(t)͉ 2 dt results in a h(t) that is a MinP signal ͑having all its signal zeros inside the unit circle͒.This is true even if the envelope of s(t) goes to zero at some points between 0 and T seconds, i.e., even if some p i or q i fall on the unit circle.The significance of this MinP property is that, as we already know, h(t)'s log-envelope and phase are Hilbert transforms.Because the error minimization is performed to flatten s(t)'s envelope, if the value of H is chosen sufficiently large, then h(t) will be given by h͑t ͒Ϸe Ϫ"␣͑t ͒ϩ␤͑ t ͒… e Ϫ j"␣ ˆ͑t ͒ϩ␤ ˆ͑t ͒… .͑17͒

Thus, 1/h(t) is the desired approximation to s(t)'s MinP component and hence the name ''inverse signal'' for h(t).
Consequently, the error signal e(t) will be e(t) ϷA c e j" c tϪ2␤ ˆ(t)… , and hence is an approximation to the AllP component of s(t).In the second part, denoted in Fig. 1 as ''measure frequency,'' the PIF is computed as e ˙(t)/͉e(t)͉ or dЄe(t)/dt.The next section describes the algorithm used to minimize the error ͐ 0 T ͉e(t)͉ 2 dt.

A. LPSD algorithm using signal samples
In this section we present the details of the LPSD algorithm for computing the MinP and AllP approximations given the samples of the signal s(t).The algorithm amounts to performing linear prediction on the discrete Fourier transform ͑DFT͒ values of the signal samples.Let s͓n͔ (n ϭ0,1,...,K), given by Eq. ͑1͒, denote samples of the given signal; KϭNϪ1.Let ⍀ϭ2/N be the assumed fundamental frequency.By replacing h(t) and e(t) by their respective sampled versions, we have which can be further expressed in matrix notation as e jK⍀ s͓K͔ e j2K⍀ s͓K͔ ¯e jKH⍀ s͓K͔

͑19͒
If we let s, H, h, and e denote the vectors/matrices from left to right in Eq. ͑19͒, then the solution vector, h, that minimizes e T eϭ ͚ nϭ0 NϪ1 ͉e͓n͔͉ 2 , in Eq. ͑19͒, is given by h ˜ϭϪ͑H T H͒ Ϫ1 H T s. ͑20͒ Here T stands for conjugate-transpose and ( ) Ϫ1 denotes matrix inverse operation.The matrix, H, can be further decomposed into a product HϭS NϫN X NϫH : ] ] e jK⍀ e j2K⍀ ¯e jHK⍀ ͪ NϫH .

͑21͒
In Eq. ͑21͒, observe that S is a diagonal matrix consisting of signal samples while X is essentially the DFT matrix.Using this decomposition, the solution vector, h ˜, given by Eq. ͑20͒, can be rewritten as Clearly, the solution depends only on the magnitude of s͓n͔.h͓n͔ can then be reconstructed by substituting elements of the vector h ˜in h͓n͔ϭ1ϩ ͚ kϭ1 H h k e jk⍀n •s MinP ͓n͔ can then be computed as 1/h͓n͔; the log-envelope and phase of s MinP ͓n͔ correspond to ␣͓n͔ϩ␤͓n͔ and ␣ ˆ͓n͔ϩ␤ ˆ͓n͔, respectively.The positive frequency, c Ϫ2␤ ˆ͓n͔, can be found as the IF of the error signal, e͓n͔, using any standard IF estimator such as the phase difference between neighboring samples. 35Instead, as mentioned earlier, we may also apply the LPSD algorithm again to e ˙͓n͔ ͓because the envelope of the first derivative of e(t) is (t), which is the PIF͔.We call this step the second-stage LPSD.
The LPSD algorithm attempts to flatten the envelope of the signal s(t) by using an adaptive amplitude demodulator.This process not only eliminates the AM but also automatically removes from the phase of s(t) a quantity equal to the Hilbert transform of the log-envelope of s(t).This is what causes the IF of e(t) to be positive.Instead, if we simply ''clip'' s(t), i.e., obtain s(t)/͉s(t)͉, then its phase derivative, the traditional IF, will not always be positive.Second, the MinP property of h(t) guarantees that the envelope approximation 1/͉h(t)͉ will never equal zero.Further, MinP signals will have their energy concentrated over a relatively small region in the spectral domain analogous to a MinP filter which has its impulse response peaking close to origin.It is also possible to use the LPSD algorithm to achieve a MinP-MaxP ͑instead of MinP-AllP͒ decomposition of s(t).Separation of these components may also be viewed as deconvolution of their spectra in the frequency domain.Third, an important advantage of the LPSD algorithm is that it achieves the separation of the MinP and AllP components without explicitly rooting a polynomial or computing the logarithm or Hilbert transform of the signal s(t).

B. Simulation results
We now provide results of applying the LPSD procedure to decompose synthetic signals.It will be followed by an example of a speech signal.
The signal samples were fed to the LPSD algorithm described in the previous subsection.The coefficients of the inverse signal h(t) were computed using Eq.͑20͒.Once the coefficients of h(t) are computed, then h(t) ͑actually its samples͒ is synthesized.For the case of 60 coefficients ͓i.e., Hϭ60 in Eq. ͑19͔͒, the estimated log-envelope given by 1/͉h(t)͉ is shown ͑solid line͒ in Fig. 3͑a͒.Actually, two periods ͑10 msec͒ of the log-envelope are shown.Also shown is the true envelope ͑dashed line͒ given by ln͉s(t)͉.They perfectly match and hence the dashed line is not visible.The magnitude of the error signal e(t) is shown in the dasheddotted line in Fig. 3͑a͒, and is close to unity, indicating that the error signal e(t) is indeed AllP.In Fig. 3͑b͒ we have plotted the signal's raw IF ͓obtained by differencing the phase angles of adjacent samples of the signal s(t)͔.Note that the raw IF goes negative ͑dashed line͒.On the other hand, the PIF ͑i.e., c Ϫ2␤ ˆ͓n͔) computed by differencing the phases of the neighboring samples of the error signal e(t), stays positive, as it should.The PIF can also be obtained by using the LPSD algorithm on e ˙(t); we call this second-stage LPSD.The PIF obtained by differencing the phase angles of neighboring samples of e(t) or by using the second-stage LPSD gave essentially the same results.Also plotted in Fig. 3͑b͒ is the true PIF ͑dashed-dotted line, again not visible͒.The true PIF was obtained, for the purpose of comparison, by using the roots of the polynomial in Eq. ͑1͒ and synthesizing the AllP signal given in Eq. ͑12͒ and determining its IF.c was estimated as the mean of PIF and ␤ ˆ͓n͔ was separated by subtracting c 's estimate from the PIF.Further, ␣ ˆ͓n͔ was computed by subtracting the estimate of ␤ ˙͓n͔ from the MinP signal's (1/h(t)'s͒ IF; the solid line in Fig. 3͑c͒ corresponds to the separated ␣ ˙͓n͔; it matches with the true one ͑obtained using the signal's roots͒ shown as a dashed-dotted line.In Fig. 3͑d͒ we have displayed the real part of the signal reconstructed using the separated MinP and MaxP components using a solid line; the dashed-dotted line corresponds to two periods of the real part of the original signal s(t); they match exactly.
Figure 3͑e͒ corresponds to the estimated PIF ͑solid line͒ when Hϭ20 in first stage and Hϭ15 in second-stage LPSD.Clearly, a higher model order ͓the results of which are shown in Fig. 3͑b͔͒ results in a better approximation.The effect of varying a signal's duration and changing model order is shown in Fig. 3͑f͒: we have plotted Ϫ10 log ͑error͒ as a function of the signal length and model order; ''error'' denotes sum of squared error between the true PIF and the estimated one.First, not surprisingly, the approximation gets better as model order increases.Second, as T approaches the true period ͑80 samples͒ of the signal, the approximation improves.However, as T further increases, the assumed fundamental frequency, ⍀, decreases and hence LPSD requires a much higher order for a better approximation.The above example had no roots with unit magnitude.To test LPSD on signals with some zeros on the unit circle, magnitude of one of the zeros of the signal used in a previous example was set to unity.We used Hϭ40 in LPSD's first stage and Hϭ10 in the second one.The results are displayed in Fig. 4͑a͒ and ͑b͒.In Fig. 4͑a͒ we plot the log envelopes; sharp dips in the signal's log magnitude ͑dashed line͒ are due to the on-circle zero.Observe that the approximation ͑solid line͒ tends to exclude this zero.Further observe that the magnitude of the error signal e(t) is unity, but for the time corresponding to location of on-circle zero ͑dashed-dotted line͒.In Fig. 4͑b͒ we show the approximated PIF using a solid line along with the true c Ϫ2␤ ˆ͓n͔ ͑dashed line͒.Clearly, the PIF approximates the spikes due to oncircle zeros in addition to closely matching the IF due to zeros off the unit circle.To summarize thus far, given a signal s(t), its various components ͑MinP/MaxP/AllP͒, which are actually multiplied components, can be separated using simple linear techniques without resorting to logarith-FIG.3. The separated log-envelope using LPSD ͑60 coefficients͒ is shown ͑solid line͒ in ͑a͒; the true one is shown as dashed line; the magnitude of the error signal e(n) is shown as dashed-dotted line.In ͑b͒ we plot the signal's raw IF ͑dashed͒ which goes negative; the solid line refers to estimate of PIF (Hϭ15 in second stage͒; the true PIF is also displayed ͑dashed-dotted͒.First stage estimate of ␣ ˙͓n͔ is shown as solid line in ͑c͒ along with true ␣ ˙͓n͔ plotted as dashed-dotted line.The real part of the reconstructed signal using the separated components is plotted in ͑d͒ ͑using solid line͒ along with the real part of the original signal s(t) ͑dashed-dotted line͒; they match exactly.The PIF when 20 coefficients were used in LPSD's first stage and 15 in the second is plotted in ͑e͒.The effect of increasing a signal's duration and increasing model order is shown in ͑f͒.We plot Ϫlog 10 ͑error͒ as a function of the signal length ͑in samples͒ and model order; error denotes sum of squared error between true PIF and estimated one.Time is shown in samples.mic processing or rooting algorithms.We now give an example using speech signals.

Speech signal
In this section we give results of processing clean voiced speech, obtained from the TIMIT database, in the sentence train/dr3/fcke0/si1111.wav which corresponded to the utterance ''How do we define it?''Figure 5͑a͒ shows the results for a segment, whereas Fig. 5͑b͒ shows the results for the entire sentence.The signal ͑sampled at 16 kHz͒ was preemphasized using a high-pass filter ͑with transfer function 1 Ϫ0.98zϪ1 ) and its analytic version was computed using the fast Fourier transform ͑FFT͒ based Hilbert transformer in Matlab.We then chose 14.56 ms of the signal ͑samples 6851:7084͒ that was part of the phoneme /iy/.This signal was then bandpass filtered using three bandpass filters ͑BPFs͒ which were part of ''Lyon's Passive Long Wave Cochlear Model'' proposed by Lyon. 36The bandpass filters ͑BPFs͒ were manually chosen such that their center frequencies were roughly centered around the formant locations.In Fig. 5͑a͒ we have shown the magnitude spectrum of the preemphasized speech signal ͑solid line͒ along with the normalized magnitude responses of the three BPFs ͑dotted lines͒.The signals at these BPFs' output were inputs to our LPSD algorithm.The bandwidths (B c ) for BPFs centered at Ϸ500 Hz, 2.25 kHz, and 5 kHz were approximately 120, 340, and 900 Hz, respectively.These bandwidths roughly correspond to the critical bandwidths of the auditory filters at the given center frequencies.Recall that LPSD assumes a fundamental  frequency, ⍀, of 2/N; this corresponds to 32 Hz for the present example.Having specified a certain bandwidth for envelope approximation, one can compute the algorithm's model order as Hϭ2B c /⍀.Based on these calculations, we chose LPSD model orders, H, to be 12, 33, and 84, corresponding to three times the critical bandwidths for firststage envelope approximation.The values of H were set to 4, 11, and 28 for approximating the PIFs in second-stage processing.One may also keep H fixed and vary the processing interval for each BPF proportional to 1/B c .Our goal was not to parsimoniously describe the signal but to demonstrate that the carrier frequency and the modulations carry sufficient information to describe the signal.The estimated log envelopes are shown in Fig. 5͑b͒ as solid lines ͑not to scale͒ along with the signal's Hilbert envelopes for each of the three filters ͑dashed, dashed-dotted, and dotted for BPFs 1, 2, and 3, respectively͒.The raw IFs ͑obtained by phase-differencing͒ for signals filtered by the three BPFs are displayed in Fig. 5͑c͒ as dashed, dashed-dotted, and dotted lines, respectively, along with corresponding lowpass filtered ͑with order 50 and cutoffs 120, 340, and 900 Hz͒ IFs shown as solid lines.The PIFs resulting from second-stage processing are depicted in Fig. 5͑d͒.
Based on earlier discussions we can see that the sharp spikes in raw log-envelopes and most of the spikes in raw IFs ͑especially for signals at output of BPFs 2 and 3͒ are due to signal zeros very close to the unit circle; the latter may be caused by neighboring peaks in the signal's spectral envelope ͑or neighboring formants͒.Further, the raw IFs also go negative at times.In general, the raw log-envelopes and IFs are highly fluctuating quantities.Clearly, the LPSD may be viewed as a technique to compute a signal's envelope's logarithm.The IF approximated by LPSD has two distinct advantages over techniques that merely filter the raw IF.First, in the absence of on-circle zeros, it is always positive.Second, it approximates the typically impulsive IF better ͑due to the all-pole model assumption͒ as opposed to lowpass filtered IF.
When a composite signal consists of many spectral regions of interest which are time-varying, as in speech, the signal must be decomposed by a bank of time-varying filters which may then be followed by envelope and PIF decomposition described here.The bank of filters must be data adaptive and should form part of the speech signal representation.A block diagram depicting this basic idea is shown in Fig. 6.We have made some progress in implementing this block diagram, 31 but due to space limitations the details are not presented here.Figure 7͑a͒, ͑b͒, ͑c͒, and ͑d͒ show the results FIG. 6.We envision a ''tonotopic signal analyzer'' as a general purpose processor that decomposes an input signal ͑on the time-frequency plane͒ around regions of dominant spectral energies into carrier frequencies, log amplitudes, and MinP-AllP ͑or MinP-MaxP͒ modulations ͓␣ k (t) and ␤ k (t)͔.These modulations are further broken down into their respective center frequencies, and so on.The result is a treelike break-up of the signal wherein higher nodes of the tree correspond to more significant temporal-spectral events in the signal. of processing the entire sentence ''How do we define it?''using this decomposition.We may call this approach ''Tonotopic Signal Analysis ͑TSA͒,'' since the procedure not only attempts to track the formant center frequencies but also provides the details of modulations ͑the ␣ and ␤͒ about those frequencies.Reference 31 provides several such speech processing examples.

IV. DISCUSSION
In this paper our main accomplishment is the decomposition of an analytic signal into two analytic signals using a simple ͑LPSD͒ algorithm.Decomposition of analytic functions of a complex variable has been studied in systems theory and filter design since the days of Henrik Bode 37 in the 1930s.However, much of that work dealt with frequency responses, i.e., frequency is viewed as a complex variable s ͑continuous-time systems͒ or z ͑discrete-time systems͒.Cepstrum-related research 5 may be viewed as an extension of this work.Voelcker's contribution, which extends Gabor's work, 7 is that he recognized that analytic functions could be used for studying the relationships between phase and envelope of signals by treating time as a complex variable.To our knowledge, Voelcker did not attempt to decompose signals into MinP and MaxP or AllP components.The MinP/MaxP/ AllP decomposition was, perhaps, first done by Oppenheim and colleagues ͑see Ch. 12 in Ref. 5, and references therein͒.However, their decomposition was achieved by rooting a polynomial or computing logarithm/log-derivative in the z-transform or frequency domain.In contrast, the significance of our result is that the MinP-AllP or MinP-MaxP decomposition is achieved using an elegant adaptive demodulator without rooting, Hilbert transformation, or phase unwrapping, directly from the given signal s(t).A similar procedure can be developed for the frequency domain as well.The primary difference between our approach and the cepstrum analysis is that we explore the signal's logarithm in the time domain which yields a physically acceptable quantity like the positive instantaneous frequency.This helps us in characterizing the IF of signals which consist of many components such as a speech formant.The average PIF ͑i.e., the carrier frequency͒ indicates the place-location of a signal's spectral concentration.
Unfortunately, in this paper, we still need to form the analytic signal before the proposed decomposition can be achieved.That is, since in practice only real-valued signals are available for processing, one has to compute its Hilbert transform.In more recent work 38 we have proposed an algorithm which avoids computation of the analytic signal.It is possible to obtain the envelope and PIF directly from the real signal under certain restrictions.

ACKNOWLEDGMENT
This research was supported by a grant from the National Science Foundation under Grant No. CCR-9804050.

APPENDIX A: MINIMUM AND MAXIMUM PHASE SIGNALS
An elementary signal, 23  After exponentiating both sides, we get the following identity: From the above expression we note that for an elementary MinP signal, e(t), the logarithm of its envelope and its phase angle are related through the Hilbert transform.Similarly, for an elementary MaxP signal (1Ϫqe j⍀t ) where qϭ͉q͉e j , ͉q͉Ͼ1, we get the following identity: The key difference between Eqs. ͑A3͒ and ͑A4͒ is the change in the sign of the phase function.
Using the above identities in Eq. ͑2͒ yields s MinP ͑ t ͒ϭe ␣͑t͒ϩ j␣ ˆ͑t ͒ ͑A5͒ and s MaxP ͑ t ͒ϭA 0 e ␤͑t͒ϩ j͑ 0 tϪ␤ ˆ͑t ͒͒ , ͑A6͒ where ␣͑t͒ϭ ͚ Thus s(t) as described in Eq. ͑2͒ can be compactly represented as s͑t ͒ϭA c e j c t e ␣͑t͒ϩ j␣ ˆ͑t ͒ e ␤͑t͒Ϫ j␤ ˆ͑t ͒ , ͑A9͒ where A c corresponds to the overall amplitude of the signal and c denotes its ''carrier'' frequency.c is equal to Q⍀ plus any arbitrary frequency translation that the signal s(t) may have been subjected to.The log-envelope and phase of s(t) are expressed in terms of ␣(t) and ␤(t) as ln͉s͑t ͉͒ϭ␣͑ t ͒ϩ␤͑ t ͒ϩln A c ͑A10͒ and Єs͑t ͒ϭ c tϩ␣ ˆ͑t ͒Ϫ␤ ˆ͑t ͒, ͑A11͒ respectively.The above expressions can be a useful pedagogical tool in explaining phase-envelope relationships in the signal as well as systems domains.For instance, the wellknown results in Ref. 39, where one attempts to reconstruct a signal from either phase or magnitude information, may easily be explained using the above expressions.For example, if a pair of roots of s(t) occurs in complex conjugate reciprocal locations, i.e., p i ϭ1/q i * , then the ith term in the summation in Eqs.͑A7͒ and ͑5͒ are identical and hence vanish from the expression for phase in Eq. ͑A11͒.Hence, in this case, phase does not uniquely specify the signal s(t).This is essentially theorem 1 in Ref. 39, which is stated in the systems domain.
Similarly if p i ϭϪ1/q i * , then from Eq. ͑A10͒ we see that magnitude alone is not sufficient to specify a signal s(t).In general, both phase and envelope are required to represent s(t).
The instantaneous frequency ͑IF͒ of s(t) is the derivative of the phase of s(t) and is simply c ϩ␣ ˆ(t)Ϫ␤ ˆ(t) ͑where the dot stands for the first derivative͒, i.e., it consists of a dc ͑corresponding to carrier frequency͒ and a sum of IFs of s(t)'s MinP and MaxP components.Thus we have dЄs͑t ͒ dt Clearly, the spectrum of s(t)'s IF ͓given by Eq. ͑A12͔͒ contains an infinite number of harmonic components ͑⍀ being the fundamental frequency͒.A closed-form expression for IF is obtained by summing Eq. ͑A12͒ as dЄs͑t ͒/dt ͉1/q i ͉͑cos͑⍀tϩ i ͒Ϫ͉1/q i ͉͒ 1Ϫ2͉͑1/q i ͉͒cos͑ ⍀tϩ i ͒ϩ͉͑ 1/q i ͉͒ 2ͬ .͑A13͒ The above reveals that s(t)'s IF tends to Ϯϱ whenever one or more of its zeros tend to lie on the unit circle ͑see Ref. 31 for details͒.All these results were known to Voelcker.
Rearranging the numerator we have z͑t ͒ϭϪqe j⍀t 1Ϫ͑1/q ͒e Ϫ j⍀t 1Ϫ͑1/q*͒e j⍀t .͑B2͒ Simplifying the above equation, we find that z(t)'s envelope is a constant ͑equal to ͉q͉͒ for all time, t, and that its phase angle is Taking the first derivative of Єz(t), its IF can be expressed as Since the right side of Eq. ͑B4͒ is ⍀(1Ϫ͉1/q͉ 2 )͉1 Ϫ(1/q*)e j⍀t ͉ Ϫ2 and is analogous to a ''power spectrum,'' z(t)'s IF is always positive.We may generalize this result to the case of a signal consisting of a product of rational signals as in Eq. ͑B2͒, i.e., z(t) of form 1Ϫq i e j⍀t 1Ϫ͑1/q i *͒e j⍀t .

͑B5͒
Since the phase angle contribution due to each of the L terms in the above equation adds up, the corresponding IF is

͑B6͒
Since each of the L terms in the above summation is positive, we claim that the final IF given by Eq. ͑B6͒ is positive.These results are analogous to the results well known in discrete time all-pass ͑AP͒ systems, where the equivalent of IF is the group delay; 40 our derivation is slightly different than the one given in Oppenheim and Schafer. 5

FIG. 2 .
FIG. 2. The eight zeros of the synthetic signal s(t) are shown in ͑a͒; its magnitude spectrum is plotted in ͑b͒.The signal ͑sampled at 16 kHz͒ has a 200-Hz fundamental frequency and a carrier frequency of 800 kHz.

FIG. 5 .
FIG. 5.The spectrum of a preemphasized voiced speech segment is displayed in ͑a͒.The signal was filtered using 3 BPFs ͓magnitude responses shown in ͑a͒ as dotted lines͔ which correspond to Lyon's auditory filters.LPSD parameters were selected based on BPFs' bandwidths.The estimated log envelopes are shown ͑not to scale͒ in ͑b͒ as solid lines along with the signals' true log envelopes shown as dashed, dashed-dotted, and dotted lines for BPFs 1, 2, and 3, respectively.The raw IFs for signals filtered by BPFs #1, #2, and #3 are displayed in ͑c͒ as dashed, dashed-dotted, and dotted lines respectively, along with corresponding lowpass filtered ͑with order 50 and cutoffs 120, 340, and 900 Hz͒ IFs shown as solid lines.In ͑d͒ we plot the PIFs estimated using LPSD with Hϭ4, 11, and 28, respectively.

FIG. 4 .
FIG. 4. We consider a signal with a zero of unity magnitude.Its log envelope is shown in ͑a͒ as a dashed line; dips correspond to location of the on-circle zero.We used Hϭ40 in LPSD's first stage and Hϭ10 in the second one.The estimated log envelope and PIF are plotted ͑solid lines͒ in ͑a͒ and ͑b͒, respectively; original functions are shown using dashed line; dashed-dotted line in ͑a͒ denotes error's magnitude.

FIG. 7 .
FIG. 7. ͑a͒ The speech signal for the sentence ''How do we define it?''is plotted; this segment was obtained from the TIMIT database ͑TIMIT/train/ dr3/fcke0/si1111.wav͒.͑b͒ We have displayed the estimated average logenvelopes as solid, dashed-dotted, and dotted lines at the output of the three time-varying filters.The details of the time-varying bandpass filters ͑BPFs͒ are given in Ref. 31.͑c͒ We have superimposed on the spectrogram the estimated PIFs of the components at the output of the time-varying BPFs.͑d͒ The averages of the PIFs are shown.They tend to follow the trajectories of the first three formants.
e(t), is defined as e͑t ͒ϭ1Ϫ pe j⍀t , ͑A1͒where pϭ͉p͉e j .If ͉p͉Ͻ1 then e(t) is called a MinP signal, since no other signal with the same envelope has a smaller phase angle.Observe that ͉e(t)͉Ͼ0.Taking the natural logarithm of both sides and using the series expansion, ln(1Ϫy)