Some Limit Theorems for Szego Polynomials

We investigate a variety of convergence phenomena for measures on the unit circle associated with certain discrete time stationary stochastic processes, and for the class of Szego polynomials orthogonal with respect to such measures. Szego polynomials, which form the basis of autoregressive (AR) methods in spectral analysis, are not uniquely defined when the degree exceeds the number of points on which the spectral measure is supported; that is, when the spectral measure corresponds to a sum of complex sinusoids, the number of which is less than the degree. We consider the asymptotic behavior of Szego polynomials of fixed degree for certain sequences of measures which converge weakly to such a sum of point masses. The sequence of measures can be formed in various ways, one of which is by convolving point mass sums with approximate identities, or kernels. In signal processing applications, this corresponds to "windowing" a signal composed of complex sinusoids. The Poisson and Fejer kernels are considered. Another way to form the measures is to add an absolutely continuous measure to a sum of point masses, thus obtaining a spectral measure for sinusoids with additive noise, where the noise coloration is described by the density of the absolutely continuous part. We characterize a limit polynomial for several different classes of sequences of measures. Some special cases are used to interpret research done by others in the field. Situations where the polynomial degree approaches infinity are considered for fixed measures with a rational spectral density. These measures are the spectral measures for autoregressive moving average (ARMA) random processes. We study the asymptotic behaviors of the reflection coefficients, or constant terms, of the polynomials, and the zero-distribution measures, which consist of point masses at each of the polynomial zeros.
These analyses help describe the behavior of the "non-signal" zeros observed in some signal processing situations.

Acknowledgements
The candidate wishes to thank his advisor Lewis Pakula, and his parents, Robert and Barbara Arciero, to whom this thesis is dedicated.


Introduction
Given a measure on the unit circle there is an associated sequence of polynomials in the complex variable z, called Szego polynomials, which are orthogonal on the unit circle with respect to the measure. Szego polynomials minimize the integral of the squared modulus, over monic polynomials of degree k, with respect to the measure. These polynomials have many applications in analysis and applied mathematics, and their properties have been studied extensively. The classic references for much of the early work include Grenander and Szego ([GS]), and Geronimus ([G1], [G2]).
In recent decades there has been much interest in Szego polynomials by researchers in signal processing and control theory due to their intimate connection with problems in linear prediction and spectral estimation. The book by Kay ([K]) includes many examples of their use, as well as an abundance of references to the related engineering literature.
Szego polynomials are not defined when the degree of the polynomial exceeds the number of points on which the measure is supported. Such "point mass" measures can arise, for example, as the spectral measure of signals consisting of a finite number of complex sinusoids. However, we can approximate such measures by other absolutely continuous measures, for which the Szego polynomials are defined, and study the polynomials as the approximation improves. Sections 3 and 4 deal with the questions of uniqueness and existence of limits for Szego polynomials of fixed degree with respect to a sequence, or family, of measures which converges, in some sense, to a sum of point mass measures. A sequence of absolutely continuous measures cannot converge to a finite sum of point masses in the usual sense; that is, in total variation norm on measures. We consider a weaker type of convergence, which can be seen to arise naturally in applications.
Suppose we have a sequence of measures converging in some sense to a sum of point mass measures. Fix k, with k greater than the number of point masses, and consider the sequence of Szego polynomials of degree k corresponding to the converging measures. The convergence of a sequence of measures, in either the usual sense or in the weaker sense we consider, does not guarantee convergence of the associated Szego polynomials if the degree k is greater than the number of point masses. We show that any limit point, in polynomial space, of these Szego polynomials must have zeros at the point mass locations. Any limit point will thus have an "extra" factor with degree equal to the difference of the polynomial degree k and the number of point masses.
One condition under which a unique limit does exist is that the Fourier coefficients of the family of measures, parametrized by h with h approaching zero, depend analytically on h.
The two main results of Section 3.2 deal with families of measures formed by convolving the familiar Poisson and Fejer kernels with point mass measures. These kernels are examples of approximate identities whose Fourier coefficients have analytic dependence on h. In Theorems 3.4 and 3.5 we characterize the limit polynomial for convolution with the Poisson and Fejer kernels, respectively, and show that, in each case, the extra factor is actually a Szego polynomial with respect to an absolutely continuous measure, which we specify. Furthermore, we obtain the same limit polynomial in each case. Comparison of the analytic dependence of the Fourier coefficients of these kernels on h, in particular the agreement of the linear terms in h, suggests that this might hold, but the proofs require detailed analysis exploiting two properties: the orthogonality of the Szego polynomials and the rate of convergence of zeros of the Szego polynomials to the point mass locations.
Recent work involving sequences of convergent measures and the associated Szego polynomials includes that of Pan and Saff ([PS]), Jones, Njastad and Saff ([JNS]), and Jones, Njastad and Waadeland ([JNW]). In Section 3.2.2, we use arguments of Pan and Saff to address the convergence rate of the zeros for a general situation where the sequence of measures satisfies two properties, one of which is analytic dependence on h. The cases of convolution with the Poisson and Fejer kernels are then considered separately.
We also consider, in Section 4, families of measures formed by adding a multiple, h, of a fixed absolutely continuous measure to a sum of point mass measures. We characterize limits of the Szego polynomials of degree k, with k greater than the number of point masses. Again we find that the extra factor is the Szego polynomial with respect to an absolutely continuous measure.
This situation differs from that of convolution of approximate identities with point masses, in that the sequence of measures here does converge in total variation norm. We give a method of constructing sequences of measures that converge in total variation norm but whose associated Szego polynomials do not converge.
Another situation which has received attention is where the degree, k, of the Szego polynomials with respect to a fixed measure, approaches infinity. The behavior of the constant term, or reflection coefficient, $R_k$, of the Szego polynomial of degree k has been of particular interest. Since $|R_k|^{1/k}$ is the geometric mean of the moduli of the zeros of the Szego polynomial of degree k, information about the reflection coefficients can help describe the asymptotic behavior of the polynomial zeros.
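The identity behind this remark is elementary: the constant term of a monic polynomial is, up to sign, the product of its zeros. The following is a minimal numerical sketch of that fact; the zeros are arbitrary illustrative values, not taken from any example in the text, and `R_k` simply names the constant term.

```python
import numpy as np

# Hypothetical zeros inside the unit disk, chosen only for illustration.
zeros = np.array([0.5, -0.25j, 0.8 * np.exp(0.3j)])
k = len(zeros)

# Monic polynomial with these zeros; np.poly lists coefficients from the
# highest degree down, so the last entry is the constant term.
coeffs = np.poly(zeros)
R_k = coeffs[-1]

# Geometric mean of the moduli of the zeros.
geometric_mean = np.prod(np.abs(zeros)) ** (1.0 / k)

print(abs(R_k) ** (1.0 / k), geometric_mean)
```

The two printed numbers agree to rounding error, which is why asymptotics of $|R_k|^{1/k}$ carry information about where the zeros sit.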
The work of Pakula ([P]), Nevai and Totik ([NT]), Saff ([S]), Petersen ([Pe2]), and others addresses this situation. As in the previously described case of polynomials of fixed degree with respect to sequences of measures, the question of existence of limits is a central concern.
In Section 5 we consider absolutely continuous measures whose densities can be expressed as the squared modulus of a rational function on the unit circle. Such measures arise in applications as the spectral measures of certain random processes. Our main result here is Theorem 5.2, which is an extension of a result in [P], where measures whose densities can be expressed as the squared modulus of a polynomial are considered. A phenomenon which has been observed in the literature by Kumaresan, in [Ku], and others, is that the zeros of the Szego polynomials with respect to certain measures appear to accumulate on a circle of a certain radius. This phenomenon is interpreted in [P] as the convergence of a sequence of measures related to the Szego polynomials, and applies to a general class of measures which includes those with rational densities.
Theorem 5.2 states that, for measures with rational densities, under certain assumptions, $|R_k|^{1/k}$ approaches a limit as $k \to \infty$; this limit being the modulus of the zero of largest modulus of the numerator of the density. Results in [P] are then used to draw conclusions about the behavior of the zeros of the Szego polynomial of degree k as $k \to \infty$. The examples of Section 3.2.5 are special cases of Theorem 5.2. Here, they are also interpreted in the context of Theorems 3.4 and 3.5, and we attempt to make some connections between the situation with Szego polynomials of fixed degree with respect to sequences of measures, and that of fixed measures with polynomial degree approaching infinity.
A related situation considered by Kumaresan and Tufts in [KT1] and [KT2], that of a signal consisting of damped exponentials, arises in modeling of speech. In Section 5.4, a sequence of measures is formed from a sum of damped exponentials. This sequence converges to a measure with rational density. This is interpreted in light of Theorem 5.2, and observations are made regarding the behavior of the zeros of the associated Szego polynomials.

The classical trigonometric moment problem (see, e.g., [GS], [L], [A]) will serve as a starting point for our discussions. It is stated as follows: Given a sequence $\{r_\ell\}_{\ell=-\infty}^{\infty}$ of complex numbers, when is a representation of the form

$$r_\ell = \int_{-\pi}^{\pi} \exp(i\ell\theta)\, d\mu(\theta) \qquad (1)$$

possible for some positive measure $\mu$?
A necessary condition for (1) to hold is that $\{r_\ell\}$ must be a positive semi-definite sequence. That is, given any finite sequence $\{c_\ell\}$ of complex numbers we must have

$$\sum_{j,\ell} c_j\, \overline{c_\ell}\, r_{j-\ell} \ge 0. \qquad (2)$$

This can be seen upon substitution of (1) into (2). Conversely, if $\{r_\ell\}$ is positive semi-definite there exists a unique positive measure $\mu$ satisfying (1). This result is often referred to as Herglotz's Theorem ([Ka]), but there are also proofs due to Caratheodory, Toeplitz, F. Riesz, and Krein ([A]).
Thus there is a one-to-one correspondence between positive semi-definite sequences and positive measures on the unit circle. The $r_\ell$ are the Fourier coefficients, or trigonometric moments, of $\mu$; $r_\ell = \hat{\mu}(\ell)$. We will henceforth regard $\mu$ as a measure on the unit circle.
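As a concrete instance of this correspondence, the sketch below builds the moments of a sum of two point masses and checks numerically that the resulting Toeplitz matrix is positive semi-definite, with rank equal to the number of point masses. The weights and angles are illustrative assumptions, and the helper names `moment` and `toeplitz_of_moments` are hypothetical; the moment convention is the one in (1).

```python
import numpy as np

def moment(l, weights, thetas):
    """r_l = sum_j weights[j] * exp(i*l*thetas[j]): the l-th moment of a sum of point masses."""
    return np.sum(weights * np.exp(1j * l * thetas))

def toeplitz_of_moments(n, weights, thetas):
    """n-by-n Hermitian Toeplitz matrix whose (j, l) entry is r_{j-l}."""
    return np.array([[moment(j - l, weights, thetas) for l in range(n)]
                     for j in range(n)])

weights = np.array([1.0, 2.0])   # illustrative positive weights
thetas = np.array([0.4, -1.1])   # illustrative distinct angles (m = 2)

T = toeplitz_of_moments(4, weights, thetas)
eigenvalues = np.linalg.eigvalsh(T)          # real, since T is Hermitian
rank = np.linalg.matrix_rank(T, tol=1e-10)

print(eigenvalues, rank)
```

The vanishing eigenvalues are the numerical signature of a measure supported on fewer points than the size of the matrix, which is the degenerate situation studied in Section 3.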

Stationary Processes.
A stochastic process is a sequence, $\{X_n\}_{n=-\infty}^{\infty}$, of real or complex random variables on a probability space $(\Omega, \mathcal{M})$. We will assume that for all n,

Let $H_X$ denote the closure in $L^2(\Omega, \mathcal{M})$

Definition and Properties
Let $A_k$ denote the space of monic polynomials of degree k, and let $\mu$ be a positive measure on $[-\pi, \pi)$. We will refer to the polynomials $P_k(z, \mu)$, k = 1, 2, 3, ..., in the complex variable z of degree k satisfying

$$\int |P_k(e^{i\theta}, \mu)|^2\, d\mu(\theta) = \min_{p \in A_k} \int |p(e^{i\theta})|^2\, d\mu(\theta) \qquad (4)$$

as Szego polynomials. The $P_k(z, \mu)$ are uniquely defined if $\mu$ is supported on more than k points. The measures

$$d\mu_k(\theta) = \frac{\sigma_k}{|P_k(e^{i\theta}, \mu)|^2}\, d\theta, \qquad (5)$$

where $\sigma_k$ are normalization constants, are the basis for autoregressive (AR) spectral estimates. In the AR approach, one uses $\mu_k$ to estimate $\mu$. For k large, $\mu_k$ is "close to" $\mu$ in the sense of the following well-known result, which can be found in [GS].

Theorem: The measures $\mu_k$ in (5) converge to $\mu$ in the weak-star (denoted weak-*) sense [GS], [L]. That is,

$$\lim_{k \to \infty} \int f\, d\mu_k = \int f\, d\mu \qquad (6)$$

for all continuous f on the unit circle.
Situations where (6) holds will be studied in Section 3. We remark that weak-* convergence is weaker than convergence in total variation norm on measures (see, for example, Theorem 3.1 and the remarks which follow). The above theorem can be proved using the characterization (23) of Section 3.1 and the Wiener-Khinchin Theorem, which can be found in [K].
Similarly, (4) can be seen to justify the AR approach in frequency estimation, where one uses the arguments of the zeros of largest modulus of $P_k(z, \mu)$ as frequency estimates [K]. Let $\mu$ be a measure of mixed type. That is, $\mu$ is a weighted sum of point masses $\delta_{\theta_j}$, j = 1, 2, ..., m, plus a measure $\gamma$, where $\theta_j \in [-\pi, \pi)$, $\delta_{\theta_j}$ is the point mass at $\theta_j$ and $\gamma$ is absolutely continuous. The measure $\mu$ is then the spectral measure of a time series comprised of complex sinusoids with additive noise. In order to achieve the minimum in (4) one expects $P_k(z, \mu)$ to have zeros close to $e^{i\theta_j}$ for large k ([S], [PS]). Indeed, if $\gamma = 0$ it is clear that any polynomial with zeros at $z = e^{i\theta_j}$ for j = 1, 2, 3, ..., m will attain the minimum of zero in (4).
The property (4) can also be interpreted from the perspective of linear prediction. If we wish to estimate the random variable $X_n$ in the least squares sense using a linear combination of $X_0, X_1, \ldots, X_{n-1}$, we can write

$$\Big\| X_n - \sum_{\ell=1}^{n} a_{n-\ell} X_{n-\ell} \Big\|^2 = \int_{-\pi}^{\pi} \Big| e^{in\theta} - \sum_{\ell=1}^{n} a_{n-\ell}\, e^{i(n-\ell)\theta} \Big|^2 d\mu(\theta).$$

From (4) we see that the prediction coefficients, $a_j$, coincide with the coefficients of $P_k$. In this context $P_k$ represents the prediction error filter of order n, and the minimum in (4) is called the prediction error power (see footnote 2).

It is easy to see that the $P_k(z, \mu)$ are multiples of the orthonormal polynomials of degree k with respect to $\mu$ obtained by performing the Gram-Schmidt procedure on $1, z, z^2, \ldots$. One simply expands an arbitrary $p \in A_k$ in terms of these orthogonal basis polynomials and observes that the minimum in (4) is achieved for a multiple of the kth basis element. Thus $P_k(z, \mu) \perp A_{k-1}$, and we have the following orthogonality property, which characterizes $P_k(z, \mu)$ (assuming that it is well-defined).
Orthogonality Property: If p(z) is any polynomial of degree less than k, then

$$\int P_k(e^{i\theta}, \mu)\, \overline{p(e^{i\theta})}\, d\mu(\theta) = 0. \qquad (8)$$

We will make extensive use of this property in the proofs of Theorems 3.4 and 3.5, which are the main results of Section 3.
Given a polynomial p(z), of degree k, we define the reverse polynomial $p^*(z) := z^k\, \overline{p(1/\bar{z})}$, so that (9)

Thus, for example, if $P_k(z, \mu) = \prod_{j=1}^{k} (z - z_j)$, then $P_k^*(z, \mu) = \prod_{j=1}^{k} (1 - z\bar{z}_j)$ and the zeros of $P_k^*$ are obtained from those of $P_k$ by reflection in the unit circle. It is well-known that if $\mu$ is supported on more than k points, then $P_k(z, \mu)$ is defined and the zeros of $P_k(z, \mu)$ lie in the open unit disk. There are many proofs of this minimum phase property (see, e.g., [KP], [S], [L]). That the zeros lie in the closed disk is a consequence of Fejer's Convex Hull Theorem ([Ka]): The zeros of the polynomials orthogonal with respect to a measure are contained in the closed convex hull of the support of the measure.

Suppose that $P_k(z, \mu) = \prod (z - w_j)$. Then $P_k^*(z, \mu) = \prod z(1/z - \bar{w}_j) = \prod (1 - z\bar{w}_j)$. Thus the zeros of $P_k^*$ are obtained from those of $P_k$ by reflecting them with respect to the unit circle, and therefore have modulus greater than 1. Note also that

2 In the engineering literature, the prediction error power is usually defined as $\frac{1}{2\pi} \int_{-\pi}^{\pi} |P_k(e^{i\theta}, \mu)|^2\, d\mu(\theta)$.

Representation of Pk(z, µ)
Let $\mu$ be a finite measure on the unit circle, with moments $\hat{\mu}(\ell)$. The Toeplitz matrix $C_n$ in (11), formed from these moments, is positive semi-definite, and is strictly positive definite for all n if $\log \mu'$ is integrable, where $\mu'$ is the density of the absolutely continuous part of $\mu$ (with respect to Lebesgue measure), in which case $\mu$ is said to satisfy Szego's condition. See [GS], [H], or [JNS] for further discussion. The following equivalent conditions are well known, and can be found in [JNS].

has no zeros inside the unit circle; that is, g is an outer function. This factorization is called the spectral factorization of $d\mu/d\theta$. See [GS] or [H] for further discussion. In particular, if $d\mu/d\theta$ is a positive trigonometric polynomial, it can be factored as the squared modulus of a polynomial in z of the same degree. This result is due to Riesz and Fejer (see [GS], Sec. 1.12).

The matrix $C_n$ is often referred to as either the autocorrelation, or ACF, matrix. It is also referred to as the covariance matrix. This terminology is due to two methods used to estimate the $\hat{\mu}(\ell)$ from the data; that is, from a realization of a random process with spectral measure $\mu$ (see, e.g., [K], Ch. 7). In this context, the sequence of moments is often referred to as the autocorrelation (ACF) function. This terminology also follows naturally from the relations (1) and (3).
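If we replace the last row in the right-hand side of (11) with the vector $(1, z, z^2, \ldots, z^n)$, we get a matrix whose determinant is a polynomial in z. If we denote this determinant, for degree k, by $D_k(z, \mu)$ (15) and the determinant of the corresponding Toeplitz matrix by $D_k(\mu)$ (16), then

$$P_k(z, \mu) = \frac{D_k(z, \mu)}{D_{k-1}(\mu)}. \qquad (17)$$

We shall refer to (17) as the determinant representation of $P_k(z, \mu)$.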
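The polynomials $P_k(z, \mu)$ can also be generated by the computationally efficient Levinson's recursion ([GS]):

$$P_k(z, \mu) = z P_{k-1}(z, \mu) + R_k(\mu)\, P_{k-1}^*(z, \mu),$$

where $P_0(z, \mu) = 1$, and the reflection coefficients $R_k(\mu)$ are the constant terms defined by

$$R_k(\mu) = P_k(0, \mu).$$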
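A recursion of this type can be sketched numerically as follows. This is an illustration under stated assumptions rather than the thesis's own algorithm: the moments are supplied as a hypothetical callable `r(l)` following the convention in (1), polynomials are stored as ascending coefficient arrays, and each reflection coefficient is chosen so that the new polynomial is orthogonal to the constants (orthogonality to the other lower powers then follows from the orthogonality of the previous polynomial). The test moments $\rho^{|\ell|} e^{i\ell\theta_0}$ are an illustrative choice, corresponding to a point mass at $\theta_0$ smoothed by a Poisson kernel.

```python
import numpy as np

def inner_with_one(coeffs, r):
    """<p, 1> = sum_j a_j r(j) for p(z) = sum_j a_j z^j (ascending coefficients),
    assuming r(l) is the integral of exp(i*l*theta) against the measure."""
    return sum(a * r(j) for j, a in enumerate(coeffs))

def reverse(coeffs):
    """Coefficients of p*(z) = z^k conj(p(1/conj(z))): conjugate and flip."""
    return np.conj(coeffs[::-1])

def szego_by_recursion(r, k):
    """Return (ascending coefficients of the monic P_k, [R_1, ..., R_k])."""
    p = np.array([1.0 + 0.0j])                        # P_0 = 1
    reflection = []
    for _ in range(k):
        z_p = np.concatenate(([0.0], p))              # z * P_{j-1}(z)
        p_star = np.concatenate((reverse(p), [0.0]))  # P_{j-1}^*(z), zero-padded
        # Pick R_j so that z*P_{j-1} + R_j*P_{j-1}^* is orthogonal to constants.
        R = -inner_with_one(z_p, r) / inner_with_one(p_star, r)
        p = z_p + R * p_star
        reflection.append(R)
    return p, reflection

# Illustrative moments: rho^{|l|} * exp(i*l*theta0).
rho, theta0 = 0.9, 0.7
r = lambda l: rho ** abs(l) * np.exp(1j * l * theta0)

P3, R = szego_by_recursion(r, 3)
print(P3, R)
```

For these particular moments the first polynomial has its zero at $\rho e^{i\theta_0}$ and the later reflection coefficients vanish, so the constant term of each polynomial coincides with the corresponding entry of `R`.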

Conventions
We adopt the conventions throughout this paper that $\zeta = e^{i\theta}$ denotes an arbitrary point on the unit circle, and that all integrals are over $[-\pi, \pi)$ unless indicated otherwise. We will also, as in Section 2.3, define the prediction error power as the minimum in (4), noting that this differs from the usual definition by a factor of $1/2\pi$.
with positive weights and the $\theta_j$ distinct. In the next sections we will consider absolutely continuous measures $\mu_h$ on the unit circle such that

$$\mu_h \to \mu \quad \text{as } h \to 0, \qquad (21)$$

where the convergence is with respect to the weak-* topology on probability measures characterized in (6). For convenience, we restate this characterization for measures $\mu_h$ parametrized by h > 0.
The measures $\mu_h$ converge to $\mu$ in the weak-* sense if and only if

$$\lim_{h \to 0} \int f\, d\mu_h = \int f\, d\mu \quad \text{for all continuous } f \text{ on the unit circle}. \qquad (22)$$

It is well-known, and not hard to show, that a necessary and sufficient condition for (22) to hold is that the moments of $\mu_h$ converge to those of $\mu$:

$$\lim_{h \to 0} \hat{\mu}_h(\ell) = \hat{\mu}(\ell) \quad \text{for all } \ell. \qquad (23)$$

We will study the family $\{P_k(z, \mu_h)\}_{h>0}$ of Szego polynomials as $h \to 0$, where $\mu_h$ is an absolutely continuous family which approaches a sum of point masses, as in (21). We will consider cases where $\mu_h$ is obtained by convolution of absolutely continuous measures with point masses of the form (20), and also measures consisting of a sum of point masses plus an absolutely continuous part.
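To make the weak-* statement concrete, the following sketch integrates a continuous test function against a single point mass smoothed by the Poisson kernel and watches the result approach the value of the function at the point mass as h = 1 - r tends to zero. Everything specific here is an assumption made for illustration: the test function `f`, the location `theta0`, the normalization by $1/2\pi$, and the helper names.

```python
import numpy as np

def poisson_kernel(theta, r):
    """Poisson kernel (1 - r^2) / (1 - 2 r cos(theta) + r^2); its mean over [-pi, pi) is 1."""
    return (1.0 - r**2) / (1.0 - 2.0 * r * np.cos(theta) + r**2)

def integrate_against_smoothed_mass(f, theta0, r, n_grid=200_000):
    """Riemann-sum approximation of (1/2pi) * integral of f(theta) * psi_r(theta - theta0)."""
    theta = -np.pi + 2.0 * np.pi * np.arange(n_grid) / n_grid
    return np.mean(f(theta) * poisson_kernel(theta - theta0, r))

f = lambda t: np.cos(t) + 0.5 * np.sin(2.0 * t)   # arbitrary continuous test function
theta0 = 0.3                                      # illustrative point mass location

errors = [abs(integrate_against_smoothed_mass(f, theta0, 1.0 - h) - f(theta0))
          for h in (1e-1, 1e-2, 1e-3)]
print(errors)
```

The errors shrink roughly in proportion to h, even though the smoothed measures stay at a fixed total variation distance from the point mass.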

The Limit Points of {Pk(z, µh)}
If k > m, then $P_k(z, \mu)$ is not defined, since $D_{k-1}(\mu)$ in the denominator of (17) is zero, but note that any polynomial of the form

$$\prod_{j=1}^{m} (z - e^{i\theta_j})\, Q(z), \qquad (24)$$

with $Q \in A_{k-m}$, has norm in $L^2(d\mu)$ equal to zero in (4). We will show that for fixed k > m, all limit points as $h \to 0$, in the space of polynomials of degree k, of the family $\{P_k(z, \mu_h)\}$, have the form (24).
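The non-uniqueness described here is easy to see numerically: against a sum of m point masses, any monic polynomial of degree k > m that vanishes at the point mass locations has zero norm, whatever the remaining factor is. The sketch below uses illustrative weights, angles, and an arbitrary monic factor `Q`; none of these values come from the text.

```python
import numpy as np

weights = np.array([0.7, 1.3])        # illustrative positive weights
thetas = np.array([0.5, 2.0])         # illustrative distinct angles, so m = 2
points = np.exp(1j * thetas)

def squared_norm(coeffs_desc):
    """Integral of |p|^2 against the point masses: sum_j weights[j] * |p(e^{i theta_j})|^2.
    Coefficients are listed from the highest degree down, as np.polyval expects."""
    return np.sum(weights * np.abs(np.polyval(coeffs_desc, points)) ** 2)

signal_factor = np.poly(points)                 # prod_j (z - e^{i theta_j})
Q = np.array([1.0, -0.3 + 0.2j, 0.1])           # arbitrary monic Q of degree k - m = 2
p = np.polymul(signal_factor, Q)                # monic polynomial of degree k = 4

norm_with_signal_zeros = squared_norm(p)
norm_of_z4 = squared_norm(np.array([1.0, 0.0, 0.0, 0.0, 0.0]))
print(norm_with_signal_zeros, norm_of_z4)
```

Replacing `Q` by any other monic quadratic leaves the first number at rounding level, so the minimization in (4) cannot single out one polynomial.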
Remark: If h is a continuous parameter, we will call a decreasing sequence $\{h_\ell\}$ whose limit

To this end we write

The first term on the right-hand side of (27) approaches zero as h approaches zero by weak-* convergence of $\mu_h$. Since $\mu_h \to \mu$ we must have $\mu_h[-\pi, \pi) < M$ for some M > 0 and all h. Thus the second integral on the right-hand side of (27) is less than $M\, \| |P|^2 - |P_h|^2 \|$, which approaches zero as $h \to 0$ since the uniform convergence $|P_h|^2 \to |P|^2$ on $|z| = 1$ follows from uniform convergence of $P_h$ on $|z| = 1$. Thus the Lemma will be proved if we show that $\int |P_h|^2\, d\mu_h \to 0$.
If $Q \in A_{k-m}$ is arbitrary, by the minimization property (4)

We shall suppose, without loss of generality, that $w_j^{(h)} \to e^{i\theta_j}$ for j = 1, 2, ..., m. Our aim is to study $Q_h(z)$ as $h \to 0$.
In the context of the frequency estimation problem in signal processing, the $w_j^{(h)}$ are often called signal zeros, while those of $Q_h$ are called non-signal, or extraneous, zeros. Information about the zeros of any limit point, Q, of $Q_h$ could be useful in discerning which of the zeros of $P_k(z, \mu_h)$ correspond to signal frequencies.
If the moments $\hat{\mu}_h(\ell)$ are analytic functions of h, we can say more. If the $\hat{\mu}_h(\ell)$ are analytic functions of h then $D_k(z, \mu_h)$ and $D_k(\mu_h)$ in (15) and (16) will also be analytic functions of h. For h > 0, $\mu_h$ satisfies Szego's condition, so $D_k \neq 0$, and we see from the representation (17) that $P_k(z, \mu_h)$ is an analytic function of h of the form

$$P_k(z, \mu_h) = \frac{h^p\, T(z) + \beta(h, z)}{M h^n + \gamma(h)}, \qquad (28)$$

where M is a constant, T(z) is a polynomial in z, $\beta(h, z)$ is a polynomial in h consisting only of terms with degree larger than p, and $\gamma(h)$ is a polynomial in h consisting only of terms with degree larger than n. The coefficients of $\beta(h, z)$ are functions of z. We must have $n \le p$, otherwise $P_k(z, \mu_h)$ is unbounded as $h \to 0$. On the other hand, if n < p, then $\lim_{h \to 0} P_k(z, \mu_h) = 0$, which cannot happen by Proposition 1. □
We consider the convolution of point masses with the Poisson kernel,

Moreover, we will characterize the limit polynomial.

We begin, in Section 3.2.1, by studying some properties of approximate identities, establishing a basic result and giving two simple examples. In Section 3.2.2 we will show that the convergence of signal zeros of Szego polynomials with respect to convolution of point masses with an approximate identity is of the rate O(h) if the approximate identity satisfies Properties 1 and 2. We will use some arguments from [PS], where a situation in which Property 2 holds is considered.

[Figure: $\psi_r(\theta)$ for r = 0.5, 0.6, 0.9.]

A well-known property of both $\psi_r$ and $\phi_n$ is reflected in the following result, which can be found in [R].

Approximate Identities
Theorem 3.1 Let $K_h(\theta)$ be an approximate identity for $L^1[-\pi, \pi)$, and let $\nu$ be a finite measure on the unit circle. Then the convolution $\nu * K_h$ converges in the weak-* sense to $\nu$ as $h \to 0$.
Remark: If $\mu$ is a discrete measure, as in (20), and $\nu$ an absolutely continuous measure, then

$$\| \mu - \nu \| = \| \mu \| + \| \nu \|,$$

where $\| \cdot \|$ is the total variation norm, which induces the usual topology on measures. Thus, an absolutely continuous family cannot converge in the strong sense (i.e., in total variation norm) to a discrete measure.
Let $K_h$ be any approximate identity and define

where $Q(z) \in A_{k-m}$. Since the $P_k(z, \mu_h)$ are polynomials of fixed degree equal to k with all zeros inside the unit disk, the convergence of any convergent subsequence is uniform on compact sets in $\mathbb{C}$,

and suppose, without loss of generality, that $w_j^{(h_n)} \to e^{i\theta_j}$ for j = 1, 2, ..., m. Our aim, then, is to study $Q_{h_n}(z)$ as $n \to \infty$.
Two Examples: We compare the Szego polynomial limits with respect to two approximate identities. Note that

for any $L^1$ function f; this corresponds to finding the Szego polynomial limits with respect to convolution of each identity with the point mass at $\theta = 0$. We find that the limit is the same in each case. Comparing the moments of each approximate identity, we find that they agree to first nonconstant terms when expanded as Taylor series in h.

Note that the weak-* limit of both $F_n$ and $G_n$ is point mass at $\theta = 0$. By Proposition 1, we know that every limit point of both $\{P_k(z, F_n)\}$ and $\{P_k(z, G_n)\}$ has $z - 1$ as a factor. We have the following:

This follows from Fejer's Convex Hull Theorem (see Section 2.3.1).
Direct computation of $P_k(z, G_n)$, for example, using Maple, for various n and k suggests that $\lim_{n \to \infty} P_k(z, G_n) = (z - 1)^k$ as well. Computing the moments of the two kernels we find, with h

where $d_1(h, \ell)$ and $d_2(h, \ell)$ both contain only terms in h of degree higher than 2. Thus, with suitable reparametrization of h, the moments of the two kernels agree to first non-constant term in h and $\ell$. We will see that this is the case with the moments of the Poisson and Fejer kernels in 3.2.3 and 3.2.4.

Convergence of Signal Zeros
We will show that for a certain class of kernels, the zeros $w_j^{(h_n)}$ converge at the rate (at least) $O(h_n)$. To do this, we will use some of the arguments of Pan and Saff in [PS]. There a discrete

If k > 2I + 1 the Szego polynomials $P_k(z, \nu_N)$ do not, in general, approach a limit as $N \to \infty$, but all limit polynomials are of the form (33) for some $Q \in A_{k-(2I+1)}$, the zeros of which are necessarily in $|z| \le 1$. In the proof of Theorem 2.4 in [PS] it is shown that the zeros of any such Q are strictly inside the unit circle. The assertion is that this is sufficient to prove the Theorem, but it leaves open the possibility that the zeros of limit factors Q(z) get arbitrarily close to the unit circle. What is actually needed to prove Theorem 2.4 is that all limit factors Q(z) have all zeros uniformly bounded away from the unit circle.
When k = 2I + 1,

and it is shown in [PS] that the rate of this convergence, as well as that of the moments of $\nu_N$, its prediction error power, and the zeros of $P_k(z, \nu_N)$, is O(1/N).
For the situation addressed here, note that the measure in (20) corresponds to m complex sinusoids. Also, by Proposition 2, if the moments $\hat{\mu}_h(\ell)$ are analytic functions of h, as is the case for the Poisson and Fejer kernels, $\lim_{h \to 0} P_k(z, \mu_h)$ exists, and the problem of (asymptotically) discerning signal zeros from extraneous zeros does not arise. We will, however, use the proof of Pan and Saff to show that the zeros of any limit factor Q(z) lie strictly inside the unit circle.
Let $\rho_{h,k}$ denote the prediction error power for $P_k(z, \mu_h)$, defined as the minimum in (4). As we will later show, the following properties hold for the measures when $K_h = \psi_r$ and $K_h = \phi_n$. They will be assumed here for otherwise arbitrary $K_h$, with $\mu_h$ defined in (32).

3 Recall our convention of omitting the factor of $1/2\pi$ in the definition of the prediction error power.

Assuming that Property 1 holds, by Proposition 2 $\lim_{h \to 0} P_k(z, \mu_h)$ exists. Using the notation of (34) and (35)

and (42)

where $w_j^{(h_n)} \to e^{i\theta_j}$ for j = 1, 2, ..., m. We will use some of the arguments of Pan and Saff to show that the rate of convergence of the signal zeros $w_j^{(h)}$ in (42) is at least O(h).
We can use the relation

found in [GS] and elsewhere, to bound the reflection coefficients, $P_{k+1}(0, \mu_h)$, away from the unit circle uniformly in both k and h. The following is an immediate consequence of (40) and (43).
Lemma 3.1 Suppose that $\mu_h$ has Property 2. Then for all k > m and h > 0

We now study the convergence of $P_k(z, \mu_h)$ in (41). If $K_h$ has moments which are analytic functions of h, by Proposition 2 and its proof (see eq. (28)), we can write

where M is a constant, T(z) is a polynomial in z, and $\beta(h, z)$ and $\gamma(h)$ are polynomials in h consisting only of terms with degree larger than p. Evidently, $P_k(z) = T(z)/M$. This yields

where $T(h, z)$ and $K(h)$ are polynomials in h consisting only of terms of degree at least 1,

for all $|z| \le 1$.
Before we address the convergence of the signal zeros, $w_j^{(h)}$, we will need to show that all the zeros of Q(z) in (41) lie in the open unit disk. To do this we use the proof of Theorem 2.4 in [PS].

Assume that Property 2 holds. Then all the zeros of the limit factor Q(z) in (41) lie in the open unit disk.

Then (41), (50), and (51) yield

By (51) and (53), we see that r(z) is a constant of modulus 1 on $\mathbb{C} - \bigcup_{j=1}^{m} \{e^{i\theta_j}\}$, with removable singularities at the $e^{i\theta_j}$. We can then write

where $\tau = \prod_{j} e^{-i\theta_j}$. The theorem will be proved if we show that all the zeros of $R^*(z)$ lie in $|z| > 1$.
We will show that Property 1 holds for $\psi_r$. The main result of this section is Theorem 3.4, which characterizes the limit $P_k$ in (41) for the measure $\mu_r$. First, we consider the convergence of the moments, $\hat{\psi}_r$ and $\hat{\mu}_r$.
The prediction error power in (38) can then be written

Recall that $\rho_{k,r}$ is the minimum in (4) for the measure $\mu_r$. We now prove that $\psi_r$ has Property 2.
Proof: Since $P_k(z, \mu_r)$ and $X_j(z)$ are analytic in $|z| < 1 + \epsilon$ for some $\epsilon > 0$, each integrand in the right-hand side of (68) is subharmonic in that region. Thus (70)

This, with (68), proves the left-hand side inequality in (69).
To finish the proof, since $\rho_{k,r}$ is the minimum in (4) and $z^{k-m} \prod_{j=1}^{m} (z - r e^{i\theta_j}) \in A_k$, the representation (67)

Using a

in (73) gives (74). In studying the limit factor Q(z) of $P_k(z, \mu_r)$ in (41) we would like to take limits under the integral sign in (74). The next result concerns the convergence of the rational function in the integrand of (74), and is the key idea used in the proof of Theorem 3.4, our first main result.

We come now to the first main result of this work. Lemma 3.5 will allow us to let r approach 1 under the integral sign in (74), and we can now characterize the limit polynomial, $P_k$, in (41). We will see that the "extra" factor Q(z), in (41), is actually a Szego polynomial of degree k - m with respect to an absolutely continuous measure, which we specify.
Proof: We need to show that the factor Q(z), in (41), is the Szego polynomial $P_{k-m}(z, \nu)$.
We will show that Q(z) has the orthogonality property (8), which characterizes $P_{k-m}(z, \nu)$.
Consider the factors in the integrand of (74). With the exception of $\prod_{j=1}^{m} (\zeta - w_j)^{-1}$, which

The Fejer Kernel
We now let h = 1/n and consider the convolution of the Fejer kernel with the sum of point masses

We will see that the Poisson and Fejer kernels have a similar character. In fact we will show that the Szego polynomials with respect to either of these kernels have the same limit; that is, the limit in (41) is the same limit found in Theorem 3.4 with a change of parameter from the continuous $r \to 1$ in the case of the Poisson kernel, to the discrete $n \to \infty$ in the present case.
One starting point for comparison is the respective Fourier coefficients. To compare those of $\psi_r$ with (79) we substitute r = 1 - 1/n in (59) to obtain $1 - h|\ell| + \gamma(n)$, where $\gamma$ contains only terms in 1/n of order larger than 1. Upon comparison with (79), we see that the moments of the two kernels agree up to linear terms in h = 1/n = 1 - r.
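The agreement of the two kernels to first order in h, and its effect on the Szego polynomials, can be checked numerically. The sketch below assumes the standard Fourier coefficients $r^{|\ell|}$ for the Poisson kernel and $1 - |\ell|/n$ (for $|\ell| < n$) for the Fejer kernel, takes a single point mass at $\theta = 0$ (so m = 1), and computes the monic orthogonal polynomial of degree k = 3 by solving the linear system given by the orthogonality property. The helper names and the values of h and k are illustrative assumptions.

```python
import numpy as np

def poisson_moment(l, h):
    """Fourier coefficient r^{|l|} of the Poisson kernel, with r = 1 - h."""
    return (1.0 - h) ** abs(l)

def fejer_moment(l, h):
    """Fourier coefficient 1 - |l|/n of the Fejer kernel, with h = 1/n (zero for |l| >= n)."""
    return max(0.0, 1.0 - h * abs(l))

def szego_coeffs(r, k):
    """Ascending coefficients of the monic P_k orthogonal to 1, z, ..., z^{k-1},
    for moments r(l) = integral of exp(i*l*theta) against the measure."""
    G = np.array([[r(j - l) for j in range(k)] for l in range(k)], dtype=complex)
    rhs = -np.array([r(k - l) for l in range(k)], dtype=complex)
    return np.concatenate((np.linalg.solve(G, rhs), [1.0]))

h, k = 1e-3, 3

# The moments differ only at second order in h.
second_order_gap = (poisson_moment(3, h) - fejer_moment(3, h)) / h**2

# One point mass at theta = 0: its moments are all 1, so the moments of the
# smoothed measure are just the kernel's Fourier coefficients.
P_poisson = szego_coeffs(lambda l: poisson_moment(l, h), k)
P_fejer = szego_coeffs(lambda l: fejer_moment(l, h), k)
print(second_order_gap, P_poisson, P_fejer)
```

In this m = 1 case both polynomials are close to $z^2(z - 1)$ for small h. Replacing the two lambdas by moments of several point masses multiplied by the same kernel coefficients gives a way to experiment with the m > 1 situation treated in Theorems 3.4 and 3.5.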
as the analog of (65). We see that $\hat{\mu}_r(\ell)$ of (64) also agrees up to the linear term in h with $\hat{\mu}_n(\ell)$. We can thus consider $\hat{\mu}_r(\ell)$ and $\hat{\mu}_n(\ell)$ as polynomials in h = 1/n, with the latter as linear approximations of the former. In either case (that is, for h = 1 - r with $\mu_r$ defined in (58), or for h = 1/n with $\mu_n$ defined in (78)), we see from (15) and (16)

(28), in light of these remarks, suggests that only the constant and linear terms, $1 - h|\ell|$, of (82), contribute in the limit $P_k(z)$ found in Theorem 3.4.

We will prove directly that $\lim_{n \to \infty} P_k(z, \mu_n) = P_k(z)$ by exploiting the analytical properties of the Fejer kernel and the orthogonality of the $P_k(z, \phi_n)$. A key point will be the convergence of the signal zeros, $w_j^{(n)} \to e^{i\theta_j}$, which has as counterpart the rate of convergence in (71).

The above yield two more representations for $\mu_n$. From (87) and (20)

density in (90)

where we have used (91) and the properties of $P_k^*(\zeta, \mu_n)$. Since the integrand in (93)

By the orthogonality condition (8) characterizing $P_k(z, \mu_n)$, we have, using (80) and (86)

We write down some simple bounds for future reference which hold for all n = 1, 2, ... and j = 1, 2, ..., m. As a result of the relationship between chord length and arc length between points on the unit circle we have (98)

As a result of (95)

for j = 1, 2, ..., m, do not converge in $L^1$, due to the $\cos n(\theta - \theta_j)$ term. We must also carefully

where the inequality follows from an elementary bound on $1 - \cos\theta$ that holds for all $\theta$. Equations (102) and (103)

Lemma 3.5 of the last section addressed the $L^1$ convergence of the rational factor in the integrand of (74). The corresponding factor of the present section is neither bounded, nor does it converge in $L^1$. However, we have the following result, which is one of the key ideas used in the proof of Theorem 3.5, the second main result of this work.

Let $f_n \to f$ in $L^1(d\theta)$, and let $g_n$ be a bounded sequence with $|g_n(\theta)|$

Letting $n \to \infty$ gives the lemma. □

We can now state and prove our second main result: that the limit $P_k(z)$ defined in (81) is the same as that found in the case of the Poisson kernel of the last section.
We will use Lemma 3.7 to show that $\gamma(n, \zeta)$ is bounded on $\bigcup_{j=1}^{m} I_j(\delta)$ uniformly in n, and therefore that the first integral on the right hand side of (109) is small if $\delta$ is small. First we rewrite $\gamma(n, \zeta)$. For $s \in \{1, 2, \ldots, m\}$, we can write

and

In the sum appearing on the right hand side of the above, $j \neq s$. Thus each of the products which appear in the terms of the sum contains the factor $|\zeta - e^{i\theta_s}|^2$. Therefore this sum can be written

It follows from (112) and (109)

Regarding the second term on the right hand side of (114), we use Lemma 3.9 with (115)

Special Cases and Related Results
In this section we will consider the measure $\nu(\theta)$ of Theorems 3.4 and 3.5 and relate these results to a result in [P] concerning the limit of the reflection coefficients of Szego polynomials, with respect to a measure whose density is the squared modulus of a polynomial, as $k \to \infty$. We then consider convolution of $m$ point masses with either the Poisson or Fejer kernel, for $m = 1$ and $m = 2$.

For $m = 1$ we exhibit the limit polynomial $P_k$ characterized in Theorems 3.4 and 3.5. For the case $m = 2$, we factor the density $\frac{d\nu}{d\theta}$ of Theorems 3.4 and 3.5 as the squared modulus of a linear function. Using results in [P] we relate the modulus of the zero of this function to the distribution of the zeros of $P_{k-2}(z,\nu)$ as $k \to \infty$. The explicit form of the limit $P_k$ is given for a "degenerate" case. The $m = 2$ situation, in light of Theorems 3.4 and 3.5 and the results in [P], is used to interpret an example of Petersen in [Pe2].

Densities of the form (125), and the associated Szego polynomials, are considered by Pakula in [P]. There, the asymptotic behavior as $k \to \infty$ of the zeros of $P_k(z,\nu)$ is studied. Suppose that (123) and (126) hold, and assume the following:

1. The $v_j$ are distinct.
2. There is a unique $v_1$ of maximum modulus; without loss of generality [...]

Lemma 5 of [P] states that if $\nu$ has a density that can be factored in the form (126) on the unit circle, where Assumptions 1 and 2 hold, then
$$\lim_{k\to\infty} |P_k(0,\nu)|^{1/k} = r. \tag{127}$$
Assumptions 1 and 2 are key considerations here. As discussed in [P], if they do not hold it may only be the case that [...] The quantity $|P_k(0,\nu)|^{1/k}$, formed from the reflection coefficient $P_k(0,\nu)$, is the geometric mean of the moduli of the zeros of $P_k(z,\nu)$, and (127) gives information about the modulus of the zeros of $P_k$ for large $k$.
It has been observed ([Ku], [S]) that for several different processes, including damped sinusoids (which we consider in Section 5.4), the zeros of polynomials used in AR estimation tend to become uniformly distributed on circles of various radii when the polynomial degree is large. In an attempt to interpret this observed phenomenon, Pakula, in [P], defines the zero-distribution measure, $\frac{1}{k}\sum_{j=1}^{k}\delta_{w_j}$, consisting of point masses of weight $1/k$ at each of the zeros, $w_1, w_2, \ldots, w_k$, of $P_k(z,\nu)$. This is a measure on the unit disk. We have the following (Theorem 4 of [P]).

Theorem 3.6 (Pakula) Suppose (127) holds. Then the zero distribution measures of $P_k(z,\nu)$ converge in the weak-* sense to the uniform measure on the circle of radius $r$.
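The limit (127) can be illustrated numerically in the simplest case of a density $|\zeta - v_0|^2$ with a single zero. The sketch below is not the author's computation: it assumes the normalization $d\theta/2\pi$, the moment convention $c_n = \int \zeta^n\,d\nu$, and the standard Szego (Levinson-type) recursion for monic orthogonal polynomials. The helper `szego_polys` and the value of `v0` are hypothetical.

```python
import cmath

def szego_polys(moment, kmax):
    """Monic Szego polynomials Phi_0..Phi_kmax as coefficient lists
    (index j holds the coefficient of z**j), from moment(n) = integral of zeta**n."""
    phi = [1 + 0j]                      # Phi_0(z) = 1
    norm2 = moment(0).real              # squared norm of Phi_0
    polys = [phi]
    for _ in range(kmax):
        # <z*Phi_k, 1> = sum_j a_j * moment(j + 1)
        inner = sum(a * moment(j + 1) for j, a in enumerate(phi))
        c = inner / norm2
        reversed_poly = [a.conjugate() for a in reversed(phi)]  # Phi_k^*
        shifted = [0j] + phi                                    # z * Phi_k
        phi = [s - c * r for s, r in zip(shifted, reversed_poly + [0j])]
        norm2 *= 1 - abs(c) ** 2
        polys.append(phi)
    return polys

v0 = 0.5 * cmath.exp(0.7j)   # illustrative zero inside the unit disk

def moment(n):
    """Moments of |zeta - v0|^2 d(theta)/(2*pi)."""
    if n == 0:
        return 1 + abs(v0) ** 2
    if n == 1:
        return -v0
    if n == -1:
        return -v0.conjugate()
    return 0.0

polys = szego_polys(moment, 40)
for k in (5, 10, 20, 40):
    reflection = polys[k][0]            # P_k(0), the constant term
    print(k, abs(reflection) ** (1.0 / k))
```

Under these assumptions the printed values approach $|v_0| = 0.5$ from below as $k$ grows, consistent with (127) for $r = |v_0|$.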
Recall that the main results of Sections 3.2.3 and 3.2.4 depend on an assumption analogous to Assumption 1 above; the $\theta_j$ of the point mass measure defined in (20) of Section 3.1 were assumed distinct. This was used in the proof of Lemma 3.2.
We now consider the zeros of $g$ and the limit factor $P_{k-m}(z,\nu)$ in (75) and (106).
If we write $g(z) = b_0 + b_1 z$ and obtain the Fourier coefficients $\hat{\nu}(\ell)$, we can solve the system (129) for $b_0$ and $b_1$. In this simple case, we solve a quadratic equation and obtain
$$v_0 = \frac{1 \pm \sqrt{2\alpha(1-\cos\omega)(1-\alpha)}}{1 - \alpha + \alpha e^{-i\omega}}.$$
Requiring that $|v_0| < 1$ leads to $g(z) = \sqrt{c}\,(z - v_0)$ where [...]

The case $\alpha = 1/2$, $\omega = \pi$ may be considered "degenerate" in the sense that, from (132), we see that $|g(\zeta)|^2 = 2$ is constant. That is, $g(z)$ is constant. So this situation is like the case $m = 1$, where we saw that all the extraneous zeros of $P_k(z,\mu_h)$ approach the origin, and Theorems 3.4 and 3.5 give [...] Since $g$ is constant, it has no zeros. This is reflected in the fact that $v_0$ is undefined in (135).
Furthermore, by Theorem 4 of [P], the zero distribution measures for $P_{k-2}(z,\nu)$ converge weak-* to the uniform measure on $|z| = |v_0|$. Now the two zeros of the limit polynomial $P_k$ (from Theorems 3.4 and 3.5) on the unit circle will not contribute asymptotically to the zero-distribution measure for $P_k$. Thus it follows from Theorem 4 of [P] that the zero-distribution measures for $P_k$ converge in the weak-* sense to the uniform measure on the circle $|z| = |v_0|$, where $v_0$ is given by (135).
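The factorization in the $m = 2$ case can be checked numerically. The sketch below assumes that the density in question is $(1-\alpha)|\zeta - 1|^2 + \alpha|\zeta - e^{i\omega}|^2$; this assignment of the weights to the two points is an assumption made for illustration, chosen because it reproduces the denominator $1 - \alpha + \alpha e^{-i\omega}$ of the formula for $v_0$. The function names are hypothetical.

```python
import cmath
import math

def factor_density(alpha, omega):
    """Return (c, v0) with c * |zeta - v0|**2 equal to the assumed density.

    The root with |v0| < 1 corresponds to the minus sign in the formula.
    """
    s = math.sqrt(2 * alpha * (1 - alpha) * (1 - math.cos(omega)))
    beta = (1 - alpha) + alpha * cmath.exp(1j * omega)
    c = 1 + s
    v0 = (1 - s) / beta.conjugate()
    return c, v0

def density(alpha, omega, zeta):
    """Assumed m = 2 density, evaluated at a point zeta on the unit circle."""
    return ((1 - alpha) * abs(zeta - 1) ** 2
            + alpha * abs(zeta - cmath.exp(1j * omega)) ** 2)

alpha, omega = 0.3, 2.0      # illustrative mass and frequency
c, v0 = factor_density(alpha, omega)
print("c =", c, " v0 =", v0, " |v0| =", abs(v0))
for phi in (0.0, 1.0, 2.5, 4.0):
    zeta = cmath.exp(1j * phi)
    print(phi, density(alpha, omega, zeta), c * abs(zeta - v0) ** 2)
```

In the degenerate case $\alpha = 1/2$, $\omega = \pi$ the denominator vanishes, which matches the remark that $v_0$ is then undefined. Under this sketch the limiting radius is $|v_0| = \sqrt{(1-s)/(1+s)}$ with $s = \sqrt{2\alpha(1-\alpha)(1-\cos\omega)}$.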
An example considered by Petersen [Pe2] can be seen as a special case of the above situation.
Recall the real signal [...] then (139). By (37) and the convergence of moments of a weak-* convergent sequence, (139) gives (140), which is of the form (58). Thus the R-process, in the limiting case $N = \infty$, can be seen as a specialization of (58) to real signals, and therefore a special case of (32). We remark that in [JNS], a limit for the polynomials $P_k(z, X_{r,N})$, for fixed $k$, as $r \to 1$ and $N \to \infty$ is neither exhibited nor characterized, as we do here in Theorem 3.4 for the case $N = \infty$. Furthermore, that a limit merely exists as $r \to 1$ for the case $N = \infty$ is easily established by Proposition 1, which is quite simple and general. On the other hand, Petersen, in [Pe1], does prove the existence of a limit as both $N \to \infty$ and $r \to 1$ in a prescribed manner. The fact that the signal is real is exploited in the proof, and the limit is not characterized.
Now consider the reflection coefficients $R_k$ of $P_k(z, X_{r,\infty})$ as $k$ approaches infinity. Shortly, we collect two immediate consequences of Theorem 3.4 and Lemma 5 of [P] for the case where the limits are taken in the order $N \to \infty$, $r \to 1$, $k \to \infty$. In [Pe2], Petersen finds an explicit form for the reflection coefficients, where the limits are taken in this order, for what is seen as a special instance of the $m = 2$ case just considered in (130). Petersen considers a signal of the form [...]
To summarize and place the example of [Pe2] in the present context, observe that, with $X_{r,\infty}$ defined in (142) and $\mu_r = X_{r,\infty}$, the measure $\nu$ of Theorem 3.4 is simply a rotation of the measure (134) with $\alpha = 1/2$. (Recall that we assumed, without loss of generality, that $\theta_1 = 0$ and defined $\omega := \theta_2$ in (130).) The measure $\nu$ has density $|g(\zeta)|^2$, which is a rotation of (132) and can be factored as $|\sqrt{c}\,(\zeta - v_0)|^2$.

The General Case; a Conjecture
Suppose that $m$ is arbitrary and let $\nu$ of Theorems 3.4 and 3.5 have density $|g(\zeta)|^2$ with $g(z)$ given by (126). Let $R_k(h)$ denote the $k$th reflection coefficient of $P_k(z,\mu_h)$, and denote by $\mu(h,k)$ the zero-distribution measure for $P_k(z,\mu_h)$. Equations (75) and (106) [...] $v_j$ of maximum modulus form a set of measure zero in $\mathbb{R}^{2m}$. Thus the above statements hold for almost every choice of the signal zeros $\theta_j$ and masses $\alpha_j$. We note that the above results apply to the generalization of the example in [Pe2], considered previously, to $m$ complex sinusoids. We wish to consider the situation where $h \to 0$ and $k \to \infty$ simultaneously. The above remarks suggest that if $k \to \infty$ slowly enough, the measures $\mu(h,k)$ may converge. We make the following conjecture [...] where the limit is taken as $h \to 0$ and $k \to \infty$ in a manner to be determined.
The asymptotic behavior of reflection coefficients for measures with rational densities will be studied in Section 5. We will also consider the reflection coefficients for a related measure associated [...] where $\gamma$ is an absolutely continuous measure. Here, the "mixed" measure $\mu_h$ is the spectral measure of a sum of complex sinusoids with additive noise, where $\gamma$ is the density of the noise process. In the case of white noise, for example, $\gamma$ is the uniform measure on the circle. We remark that, in contrast to measures obtained by convolution of point masses with approximate identities discussed in Section 3.2, measures of the form (144) actually converge in the strong sense, that is, in total variation norm; thus $\mu_h$ clearly satisfies (21). By Proposition 1, for $k > m$ all limit points of $P_k(z,\mu_h)$ are of the form (24). We remark that the fact that (21) holds for a measure of mixed type says little about the existence of a limit of the associated Szego polynomials. We will comment further on this shortly. However, in the present case we have [...] (147)

In particular, (147) holds for $q(z) = \prod_{j=1}^{m}(z - e^{i\theta_j})\,r(z)$ where $r \in \Lambda_{k-m-1}$. Thus, for all $h > 0$ we [...] Since $r \in \Lambda_{k-m-1}$ is arbitrary, $Q \perp \Lambda_{k-m-1}$ with respect to the measure $\prod_{j=1}^{m}|\zeta - e^{i\theta_j}|^2\,d\gamma$. By the orthogonality property (8), $Q$ is the unique Szego polynomial: $Q(z) = P_{k-m}\bigl(z, \prod_{j=1}^{m}|\zeta - e^{i\theta_j}|^2\,d\gamma\bigr)$.
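The factorization of the limit polynomial for a mixed measure can be checked in the smallest nontrivial case. The sketch below assumes $\mu_h = \delta_0 + h\,d\theta/2\pi$, that is, one unit point mass at $\theta_1 = 0$ plus white noise scaled by $h$. The precise form of (144) is not reproduced in this excerpt, so this choice, the moment convention $c_n = \int \zeta^n\,d\mu$, and the helper names are all assumptions.

```python
def szego_polys(moment, kmax):
    """Monic Szego polynomials as coefficient lists (index j <-> z**j)."""
    phi = [1 + 0j]
    norm2 = moment(0).real
    polys = [phi]
    for _ in range(kmax):
        inner = sum(a * moment(j + 1) for j, a in enumerate(phi))
        c = inner / norm2
        reversed_poly = [a.conjugate() for a in reversed(phi)]
        shifted = [0j] + phi
        phi = [s - c * r for s, r in zip(shifted, reversed_poly + [0j])]
        norm2 *= 1 - abs(c) ** 2
        polys.append(phi)
    return polys

def poly_mul(p, q):
    """Product of two polynomials given as coefficient lists."""
    out = [0j] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def mixed_moment(h):
    """Moments of delta_0 + h * d(theta)/(2*pi): every moment of delta_0 is 1."""
    return lambda n: 1.0 + (h if n == 0 else 0.0)

def weighted_moment(n):
    """Moments of |zeta - 1|^2 d(theta)/(2*pi)."""
    return {0: 2.0, 1: -1.0, -1: -1.0}.get(n, 0.0)

p2 = szego_polys(mixed_moment(1e-4), 2)[2]          # P_2(z, mu_h) for small h
limit_factor = szego_polys(weighted_moment, 1)[1]   # P_1 for the weighted noise
predicted = poly_mul([-1.0, 1.0], limit_factor)     # (z - 1) * P_1(z)
print("P_2(z, mu_h):     ", p2)
print("(z - 1) * P_1(z): ", predicted)
```

For small $h$ the two coefficient lists agree closely, both being near $z^2 - z/2 - 1/2 = (z-1)(z+1/2)$: the signal zero at $z = 1$ times the degree $k - m = 1$ polynomial for the noise measure weighted by $|\zeta - 1|^2$.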

Reflection Coefficients for Rational Densities
Introduction and Statement of Main Result
In this section we consider a fixed spectral measure with rational spectral density, and study the asymptotic behavior of the reflection coefficients, $R_k = P_k(0,\mu)$, as $k \to \infty$. This was considered in Section 3.2.5 for measures with densities of the form $\prod_{j=1}^{m}|\zeta - w_j|^2$. Lemma 5 of [P] and Theorem 3.6 (Theorem 4 in [P]) describe the asymptotic behavior of the reflection coefficients and zero distribution measures, respectively, of $P_k(z,\mu)$ in this case. We seek to generalize these results to measures with rational densities. The main result of this section is an extension of Lemma 5 in [P] regarding $\lim |R_n|^{1/n}$. The idea of the proof is sketched in [P]. The significance of our result is that $\lim |R_n|^{1/n}$ exists, and not merely $\limsup |R_n|^{1/n}$. As a corollary we will have an extension of Theorem 3.6 to rational spectral densities.
The behavior of reflection coefficients is considered in [S] and [NT], where we have the following [...] has an analytic continuation to the disk $|z| < 1/r$.
We now state the main result of this section.

Theorem 5.2 Suppose that $\mu$ has a rational density of the form (152). Let $R_k(\mu)$ be the $k$th reflection coefficient for the measure $\mu$, as defined in (19). We assume, without loss of generality, that all the zeros of $p(z)$ and $q(z)$ lie within the closed unit disk, and that $w_j \neq 0$ for $j = 1, 2, \ldots, \ell$. We further assume the following:

1. The zeros $w_j$ are distinct.
2. There is a unique $w_j$ of maximum modulus, this being strictly less than 1: $1 > |w_1| > |w_j|$ for $j = 2, \ldots, \ell$.

Then
$$\lim_{k\to\infty} |R_k(\mu)|^{1/k} = |w_1|. \tag{154}$$

The proof of the theorem, though elementary, is quite technical, and we will need some results about the form of the inverse of the autocorrelation matrix defined in Section 2.3.2 for the measure $\mu$ defined in (152). Therefore, for clarity, before proving the theorem we will sketch the main ideas.
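The statement of the theorem can be explored numerically for the simplest rational density, $|\zeta - w|^2 / |\zeta - v|^2$ with one zero $w$ and one pole-generating point $v$ inside the unit disk. The sketch below is an illustration under assumptions, not the proof technique of this section: it assumes the normalization $d\theta/2\pi$, moments $c_n = \int \zeta^n\,d\mu$, and the standard Szego recursion. The values of `v` and `w` and all function names are hypothetical.

```python
import cmath

def szego_polys(moment, kmax):
    """Monic Szego polynomials as coefficient lists (index j <-> z**j)."""
    phi = [1 + 0j]
    norm2 = moment(0).real
    polys = [phi]
    for _ in range(kmax):
        inner = sum(a * moment(j + 1) for j, a in enumerate(phi))
        c = inner / norm2
        reversed_poly = [a.conjugate() for a in reversed(phi)]
        shifted = [0j] + phi
        phi = [s - c * r for s, r in zip(shifted, reversed_poly + [0j])]
        norm2 *= 1 - abs(c) ** 2
        polys.append(phi)
    return polys

def rational_moment(v, w):
    """Moments of |zeta - w|^2 / |zeta - v|^2 d(theta)/(2*pi), with |v| < 1."""
    def base(n):
        # moments of 1 / |zeta - v|^2
        root = v if n >= 0 else v.conjugate()
        return root ** abs(n) / (1 - abs(v) ** 2)
    def moment(n):
        # multiply by |zeta - w|^2 = 1 + |w|^2 - w/zeta - conj(w)*zeta
        return ((1 + abs(w) ** 2) * base(n)
                - w * base(n - 1)
                - w.conjugate() * base(n + 1))
    return moment

v = 0.3 + 0j                      # illustrative zero of q
w = 0.6 * cmath.exp(1.0j)         # illustrative zero of p
polys = szego_polys(rational_moment(v, w), 40)
for k in (5, 10, 20, 40):
    print(k, abs(polys[k][0]) ** (1.0 / k))
```

The printed $k$th roots drift toward $|w| = 0.6$, the modulus of the zero of the numerator, rather than toward $|v|$.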

Sketch of Proof:
Let $n = k + \ell$ and define
$$B(z) := P_k(z,\mu)\,p(z).$$
By the minimization characterization (4) of $P_k(z,\mu)$, $B(z)$ [...] So it suffices to show that [...] (158)

The goal of the proof is to obtain a suitable expression for $b_n$. In fact, we will see (eq. (198)) that we can write [...] (159). To show that $\lim_{n\to\infty} |b_n|^{1/n} = |w_1|$ is then straightforward, and the theorem follows from (157).

To get an expression of the form (159), we recast the minimum property (156) for $B$ in matrix form. Define [...] and (162), where we suppress the $n$-dependence for the vectors $d$ and $b$, and for the matrix $W$. In light of (165) and the remarks of Section 5.2.1, the condition (156) is satisfied if and only if $b\,C_n\,b^*$ is a minimum, subject to $Wb = d$, over all vectors $[1\ a_1\ a_2\ \ldots\ a_n]$, where $C_n$ is the covariance matrix for the measure $d\mu = |q(\zeta)|^{-2}\,d\theta$, as defined in (11). It follows that [...] We now study the form of the matrix $C_n^{-1}$.

The Autocorrelation Matrix
Let $\mu$ be an absolutely continuous measure on the unit circle with $d\mu = f(\theta)\,d\theta$ and suppose that
$$0 < m \le f(\theta) \le M < \infty. \tag{164}$$
Let $C_n$ be the ACF matrix defined in (11) of Section 2.3.2. We briefly consider an isometry between the space of polynomials of degree $n$ in $L^2(d\mu)$ and $\mathbb{R}^{n+1}$.

To each polynomial $A(z)$ of degree $n$, associate the vector of coefficients $a = [a_0\ a_1\ \ldots\ a_n]^t$, so that $A(z) = a_0 z^n + a_1 z^{n-1} + \cdots + a_n$. If $A(z)$ and $B(z)$ are two polynomials of degree $n$ we can express the $L^2(d\mu)$ inner product (see, for example, [L]) as (165). Thus, the space of polynomials of degree $n$ in $L^2(d\mu)$ is isometrically isomorphic to $\mathbb{R}^{n+1}$ with the inner product defined in (165), and we have $\|A(z)\|^2 = a^* C_n a$.
Using results in [GS, Sec. 5.2] we find bounds for the eigenvalues of $C_n$ with $\mu$ bounded as above. Suppose that $\int |A(\zeta)|^2\,d\theta = 1$. Interpreting this in light of (165), with $d\mu = d\theta$ and $C_n$ as the identity matrix (or simply by direct computation), we have $a^*a = 1$. Now (164) and (165) give
$$m \le a^* C_n a \le M. \tag{166}$$
If $e$ is an eigenvector of $C_n$, normalized so that $e^*e = 1$, with corresponding eigenvalue $\lambda$, then
$$e^* C_n e = \lambda\, e^* e = \lambda, \tag{167}$$
so that $m \le \lambda \le M$.
Note that these bounds are independent of n.
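The $n$-independence of these bounds can be observed on a small example. The sketch below assumes the density $f(\theta) = |e^{i\theta} - V|^2$ with $V = 0.5$ and the normalization $d\theta/2\pi$, so that $m = (1 - V)^2$ and $M = (1 + V)^2$ and the ACF matrix is tridiagonal Toeplitz. The density and the trial vectors are illustrative choices, not taken from the text.

```python
import cmath

V = 0.5  # zero of the illustrative density f(theta) = |exp(i*theta) - V|**2

def acf_matrix(n):
    """(n+1) x (n+1) Toeplitz ACF matrix of f with respect to d(theta)/(2*pi)."""
    def c(j):
        if j == 0:
            return 1 + V * V
        return -V if abs(j) == 1 else 0.0
    return [[c(i - j) for j in range(n + 1)] for i in range(n + 1)]

def rayleigh_quotient(C, a):
    """(a* C a) / (a* a) for a complex vector a."""
    size = len(a)
    num = sum(a[i].conjugate() * C[i][j] * a[j]
              for i in range(size) for j in range(size))
    den = sum(abs(x) ** 2 for x in a)
    return (num / den).real

m_bound, M_bound = (1 - V) ** 2, (1 + V) ** 2
for n in (4, 16, 64):
    C = acf_matrix(n)
    trial_vectors = [
        [1 + 0j] * (n + 1),                               # constant vector
        [(-1) ** j + 0j for j in range(n + 1)],           # alternating signs
        [cmath.exp(1j * 0.9 * j) for j in range(n + 1)],  # complex exponential
    ]
    quotients = [rayleigh_quotient(C, a) for a in trial_vectors]
    print(n, m_bound, [round(q, 4) for q in quotients], M_bound)
```

Every Rayleigh quotient, and hence every eigenvalue, stays inside $[m, M] = [0.25, 2.25]$ regardless of the matrix size.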

Inverses for a Class of ACF Matrices
Let $q(z) = \prod_{j=1}^{m}(z - v_j)$ and let $C_n$ be the ACF matrix defined in (11) of Section 2.3.2 for the measure $|q(\zeta)|^{-2}\,d\theta$. We will need some results for the form of $C_n^{-1}$. Spectral measures of the form $|q(\zeta)|^{-2}\,d\theta$ correspond to autoregressive processes of order $m$. The properties of $C_n^{-1}$ have been studied extensively ([K], [KVM], [Si]). Write $q(z) = z^m + q_1 z^{m-1} + \cdots + q_m$, and denote by $\rho_n$ the prediction error power defined in (4) [...] Note that $C_n$, $Q_n$, and $\Omega_n$ are $(n+1)\times(n+1)$ matrices whose elements depend on $n$.

Define $q_\ell := 0$ for $\ell > n$ and $\ell < 0$. We will denote the $\{i,j\}$ entry of a matrix $A$ by $\{A\}_{i,j}$.

The Toeplitz-like character of $C_n^{-1}$ is reflected in the following, due to Trench ([T]): [...] It follows from (172) that (173) [...] $C_n^{-1}$ has a Toeplitz structure in its central portion. From the representation (169) [...] where $\rho_0 := 1$. Here, the first $i - 1$ entries of the first vector on the right-hand side are zero, as are the last $n + 1 - i - m$ entries. The first $j - 1$ entries of the second vector are zero, as are the last $n + 1 - j - m$ entries. More precisely, if we define $q_\ell = 0$ for $\ell < 0$ and $\ell > m$, we have the following, due to Siddiqui [Sd], which can also be found in [K], p. 176: (175). From the form (174) and (178) [...]

Additionally, the matrix $C_n^{-1}$ is persymmetric, that is, it is symmetric with respect to the principal cross-diagonal (see [K]). This can also be proved using [...] Thus, $\{C_n^{-1}\}_{n+1,n+1-j}$ is constant in $n$.

The salient points, for our purposes, of the preceding discussion can be summarized as follows: $C_n^{-1}$ is a persymmetric band matrix of bandwidth $2m + 1$. Each element $\{C_n^{-1}\}_{i,j}$ is a sum of at most $m + 1$ not necessarily distinct terms of the form $q_i\bar{q}_j$. Also, for $n > m + 1$, the vectors $T_j$ do not change with $n$. This is evident from the form (174) and from (177) and (178).
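The band and persymmetric structure summarized above can be seen exactly in the order $m = 1$ case. The sketch below assumes $q(z) = z - \rho$ with real $\rho$ and the normalization $d\theta/2\pi$, for which the ACF entries are $\rho^{|i-j|}/(1-\rho^2)$. It inverts the matrix in exact rational arithmetic with a plain Gauss-Jordan elimination; the function names are illustrative.

```python
from fractions import Fraction

def ar1_acf(rho, n):
    """(n+1) x (n+1) ACF matrix of the density 1/|zeta - rho|^2 (assumed AR(1) case)."""
    return [[rho ** abs(i - j) / (1 - rho ** 2) for j in range(n + 1)]
            for i in range(n + 1)]

def invert(A):
    """Gauss-Jordan inverse without pivoting (adequate for positive definite A)."""
    size = len(A)
    aug = [list(row) + [Fraction(int(i == j)) for j in range(size)]
           for i, row in enumerate(A)]
    for col in range(size):
        pivot = aug[col][col]
        aug[col] = [x / pivot for x in aug[col]]
        for r in range(size):
            if r != col:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[size:] for row in aug]

rho = Fraction(1, 2)
n = 5
inverse = invert(ar1_acf(rho, n))
for row in inverse:
    print([str(x) for x in row])
```

The printed inverse is tridiagonal (bandwidth $2m + 1 = 3$), with corner entries $1$, interior diagonal entries $1 + \rho^2$, and off-diagonal entries $-\rho$. It is unchanged by reflection in the cross-diagonal, and its central rows do not depend on $n$.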

Reflection Coefficients Revisited
We now proceed with the proof of the main result of the section.
Proof of Theorem 5.2: We first consider the form of $C_n^{-1}W^*$. In light of (160), for $i, j > 1$ we can represent the $i,j$ entry of $W^*$ as $\{W^*\}_{i,j} = (w_{j-1})^{n-(i-1)}$, so that the elements of $C_n^{-1}W^*$ are of the form (182). From the observations of Section 5.2.2, the summation in any element of (182) is over at most $2m + 1$ terms, each in turn consisting of at most $m + 1$ terms of the form $\rho_i q_j \bar{q}_k$. Moreover, the elements of $C_n^{-1}W^*$, considered as polynomials in the $q_j$ and $\bar{q}_j$, have products of the $\rho_j$ and powers of the $w_j$ as coefficients, and as $n$ increases it is only the powers of the $w_j$ that may change.

Indeed, from the form of $C_n^{-1}$ and inspection of the matrix $W^*$, for $i$ and $j$ fixed, the powers of $w_j$ appearing as coefficients increase with $n$. Since each of the $w_j$ has modulus less than 1, each $\{C_n^{-1}W^*\}_{i,j}$ is the partial sum of a convergent geometric series.

On the other hand, if $i$ is fixed, then for $n > i + m$ the last $i$ rows of $C_n^{-1}W^*$ are constant in $n$.

In particular, by (176), $\{C_n^{-1}\}_{n+1,j} = 0$ for $j \le n + 1 - m$, so that the terms in each of the sums in the last row of (182) are non-zero only for $j > n + 1 - m$. We can thus express the last row of [...] where $T_i(w_j, q)$ is a polynomial in the elements of $q$ whose coefficients are powers of $w_j$.

Note that the sums in the first column are from $i = 2$ to $m + 1$ since, by (176) [...]

Of course, it would also follow that the limit matrix is invertible. We write
$$\frac{(W^*t)^*\,C_n^{-1}\,(W^*t)}{t^*t} = \frac{(W^*t)^*\,C_n^{-1}\,(W^*t)}{(W^*t)^*(W^*t)} \cdot \frac{t^*WW^*t}{t^*t}. \tag{189}$$
Rayleigh-Ritz and (188) applied to the first factor on the right-hand side of (189) yield [...] Thus, (188) will be proved if we show that there exists a positive constant $\varepsilon$ such that
$$\frac{t^*WW^*t}{t^*t} \ge \varepsilon \quad \text{for all } t \text{ and all } n. \tag{190}$$
Again, using Rayleigh-Ritz, (190) will hold if all eigenvalues of $WW^*$ are bounded away from zero, or, equivalently, if $\lim_{n\to\infty} WW^*$ is invertible.
This is similar to the situation addressed in [P]. Note that $WW^*$ is positive definite and Hermitian. We see that [...] In general, the elements $u^{-1}_{i,j}$ can be expressed in the form (196), where $\tau(i,j)$ is a sum of determinants of principal minors of $U$, so the $\tau(i,j)$ likewise converge to polynomials in the elements of $q$.

Let $\delta_j$ denote the $j$th element of the last row of $C_n^{-1}W^*$, with $\delta_1$ defined in (195). Using (183) and (196) [...] where $\kappa_{i-1}$ is again a polynomial in the elements of $q$. Note that with this notation, the indices on the $\kappa_i$ run from 1 to $\ell$. With $b$ and $d$ defined in (162) and (161), (163) [...] where $\Gamma_i = \gamma_i - \kappa_i$. By Claim 1, $\det U$ is bounded away from zero, thus it converges to a positive constant. The $\Gamma_i$ converge to polynomials in the elements of $q$.

Recall the zero-distribution measure of $P_k(z,\mu)$, defined in the remarks preceding Theorem 3.6, which assigns mass $1/k$ at each of the zeros of $P_k(z,\mu)$. From Theorems 5.2 and 3.6 follows [...]

A Signal Consisting of Damped Sinusoids
We now consider a signal consisting of damped sinusoids. Let [...] (200) where $|v_j| < 1$ and the $\alpha_j$ are complex. We can write $\alpha_j = a_j e^{i\omega_j}$ and $v_j = \rho_j e^{i\theta_j}$, for $j = 1, 2, \ldots, K$, where the $a_j$ are real amplitudes, the $\rho_j < 1$ are damping factors, and the $\theta_j$ and $\omega_j$ are frequencies and phases, respectively. Several methods exist for exactly determining amplitudes, damping factors, frequencies, and phases for the signal (200) given at least $2K$ of the $X_n$ (see [K], p. 224). Modern techniques such as the covariance method are variations of Prony's method, which dates to 1795.
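For concreteness, the idea behind Prony's method can be sketched for $K = 2$ in the noise-free case. The sketch assumes the signal has the form $X_n = \alpha_1 v_1^n + \alpha_2 v_2^n$ for $n \ge 0$ and uses $2K = 4$ consecutive samples. It is a textbook-style illustration with hypothetical names and test values, not the covariance method mentioned above.

```python
import cmath

def prony_two_modes(x):
    """Recover v1, v2 from x[0..3], assuming x[n] = a1*v1**n + a2*v2**n.

    The samples satisfy x[n+2] + c1*x[n+1] + c2*x[n] = 0, and v1, v2 are
    the roots of z**2 + c1*z + c2.
    """
    x0, x1, x2, x3 = x[:4]
    det = x1 * x1 - x0 * x2                 # determinant of the 2 x 2 system
    c1 = (x0 * x3 - x1 * x2) / det          # Cramer's rule
    c2 = (x2 * x2 - x1 * x3) / det
    disc = cmath.sqrt(c1 * c1 - 4 * c2)
    roots = [(-c1 + disc) / 2, (-c1 - disc) / 2]
    return sorted(roots, key=abs, reverse=True)

# Illustrative damped sinusoids (values chosen only for this example).
v1 = 0.9 * cmath.exp(0.5j)
v2 = 0.7 * cmath.exp(-1.2j)
a1 = 1.0
a2 = 0.5 * cmath.exp(0.3j)
samples = [a1 * v1 ** n + a2 * v2 ** n for n in range(4)]

estimates = prony_two_modes(samples)
print("true zeros:     ", v1, v2)
print("recovered zeros:", estimates[0], estimates[1])
```

Once the $v_j$ are known, the complex amplitudes $\alpha_j$ follow from a linear system in the same samples. With noise present this exact recovery breaks down, which is the setting of the methods cited in the next paragraph.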
Several methods have been investigated for determining signal zeros for damped sinusoids when noise is present ([KT1], [KT2]). Our interest here is in the mathematical properties of the Szego polynomials, and in illuminating the behavior of AR techniques, rather than in proposing alternate means of determining signal zeros in applications.
We consider a related sequence of measures derived from the signal (200) [...] The reason for not normalizing is that we wish to consider the behavior of $\mu_N$ as $N \to \infty$.
Since $|v_j| < 1$, $X_N$ is bounded in $N$. On the other hand, $(1/N)\mu_N \to 0$. More precisely, from [...] where $p(\zeta) = \prod_{j=1}^{K-1}(\zeta - w_j)$ and $q(\zeta) = \prod_{j=1}^{K}(\zeta - v_j)$. Thus, the asymptotic situation is the same as that considered in Section 5. In particular, if there is a unique $w_j$ of maximum modulus, which is less than 1, Theorem 5.2 holds, and by Corollary 5.1, the zero distribution measures of $P_k(z,\mu_N)$ converge in the weak-* sense to the uniform measure on $|z| = r$. We make the following conjecture:

Conjecture:
Let $X_n$ and $\mu_N$ be defined in (200), (201), and (202), and assume that the $v_j$ and $w_j$ in (206) are distinct and less than one in modulus, with $1 > r = |w_1| > |w_j|$ for $j = 2, 3, \ldots, K-1$. Then as $N$ and $k$ approach infinity in a manner to be determined, the zero distribution measures of $P_k(z,\mu_N)$ converge in the weak-* sense to the uniform measure on $|z| = r$.
Note that the conjecture is true if we first let $N \to \infty$ while holding $n$ constant.

Convergence of zeros of $P_n(z,\mu)$
We now consider convergence of the zeros of $P_n(z,\mu)$ to signal zeros $v_j$. Suppose that the hypotheses of the above conjecture hold, and that $|v_j| > r$ for $j = 1, 2, \ldots, d$, and $|v_j| \le r$ for $j = d+1, \ldots, K$. We will expand upon the remarks on p. 575 of [P], and show that as $n \to \infty$ the $d$ largest zeros of $P_n(z,\mu)$ converge to the $v_j$ for $j = 1, 2, \ldots, d$.
By the maximum modulus principle the above bound holds for all $|z| \le \rho$; that is, $|\phi_n(z)| < (L + \varepsilon)\rho^n$ for $|z| \le \rho$ and $n > M$.
The limit function is analytic, and, by (211), is the analytic continuation of $1/(g(0)g(z))$.
We further restrict $\rho$ so that only the $d$ smallest zeros of $1/(g(0)g(z))$ fall inside $|z| < \rho$. Since $k_n\phi_n^*(z) \to 1/(g(0)g(z))$ uniformly on $|z| \le \rho$, for large $n$ the function $k_n\phi_n^*(z)$ will have no zeros on $|z| = \rho$. By Hurwitz' Theorem, $k_n\phi_n^*(z)$ has exactly $d$ zeros inside $|z| < \rho$ for large $n$. Since the zeros of $\phi_n^*(z)$ are those of $\phi_n(z)$ reflected in the unit circle, letting $\rho \to 1/r$ yields the following: $\phi_n(z)$ has exactly $d$ zeros in $|z| > r$; these approach $v_1, \ldots, v_d$.
In the context of estimating the signal zeros $v_j$ via the autocorrelation method, the preceding remarks suggest that for large $N$, the largest zeros of $P_n(z,\mu_N)$ will be close to those $v_j$ which lie outside $|z| = r = \max|w_j|$ in (206). Since the $w_j$ depend on the $\alpha_j$, and thus upon the relative phases $\omega_j$ in (200), it would be of interest to determine the exact nature of the dependence of the $w_j$ upon the phases $\omega_j$. For example, what conditions on the phases guarantee that one or more of the $v_j$ lie outside $|z| = r$? There do not seem to be general results in this area, but the following corollary of Lucas' Theorem can be found in [M].