Bounds and Algorithms for Subspace Estimation on Riemannian Quotient Submanifolds

Subspace estimation appears in a wide variety of signal processing applications such as radar, communications, and underwater acoustics, often as a prelude to high-resolution parameter estimation. As with any estimation problem, the availability of statistical benchmarks on estimator accuracy is key to developing and understanding algorithm performance. The parameter space in general subspace/basis estimation problems is naturally described as a Riemannian quotient manifold. The concept of a manifold is central to many parts of geometry and modern mathematical physics because it allows more complicated structures to be described and understood in terms of the properties of Euclidean spaces. This identification permits the well-developed tools of differential geometry to be brought to bear on the analysis of subspace/basis estimation problems. Classical Cramér-Rao bounds (CRB), originally formulated for standard Euclidean spaces, have recently been generalized to Riemannian quotient manifolds, and the particular case of the standard real linear statistical model with the Grassmann manifold (set of subspaces) as the parameter space has been analyzed. The present work applies this differential geometric approach to the analysis of the complex linear model estimation problem. We consider both the general unconstrained signal model and, most importantly, the parametrically constrained signal model. First, we show that the appropriate parameter space for the most general unconstrained signal model is a modified Stiefel manifold, termed here the Basis manifold. Elements of the Basis manifold are semi-unitary matrices grouped into equivalence classes dictated by the model invariances. Using this formulation we derive the full Fisher Information Matrix (FIM) and, from this, intrinsic CRBs on both subspace and rotation estimation accuracy.
Among the corollaries that flow from this Basis manifold formulation is a CRB on the individual columns of the eigenvector matrix, say Y (an N × p semi-unitary matrix), of the true signal covariance. The eigen decomposition of the sample covariance matrix (SCM) yields an estimate of Y; the analytical expression for the corresponding estimation error covariance is well known from statistics. We show the relationship between this existing expression and the CRB developed here using the geometric formulation. Secondly, we consider the situation when the signal matrix is constrained, and in particular, the important special case when the constraint arises due to some underlying parametric model. Here the set of semi-unitary matrices representing the signal matrix is constrained to a submanifold of the Basis manifold. In this geometric approach these underlying parameters are viewed as a set of natural coordinates on the submanifold; the set of values that describe the related N × p matrix representation is an alternate (extrinsic) coordinate representation. Using this differential geometric framework, we derive the FIM with respect to a set of normal coordinates, and the resultant intrinsic CRB on subspace and rotational estimation accuracies for this constrained model. The relationship between this new bound on constrained subspace estimation accuracy and the existing bounds developed for system parameters (i.e., natural coordinates) is established and its relevance to the estimation problem is explored in detail. The CRB development for the constrained model naturally suggests an asymptotically ML estimation approach that leverages the estimate given by the standard EVD of the SCM (ML for the unconstrained signal matrix assumption). This estimate, referenced to an algorithmically convenient extrinsic coordinate set, may be transformed to the natural coordinates defined by the underlying parametric model.
By the maximum likelihood (ML) invariance principle, this estimation approach yields an asymptotically ML estimator of the desired parameters. An efficient implementation of this general theoretical approach is developed for the important special case of the uniform multi-dimensional array and complex exponential waveform model. Utilizing the shift-invariant properties that define the constraint in this setting, we derive a closed-form estimator of the signal matrix Y. This new estimation approach is applied to the challenging 2-D Direction-of-Arrival (DOA) estimation problem, and a set of specific scenarios drawn from the literature is evaluated to demonstrate performance.


Introduction
Subspace and basis estimation appear in a wide variety of signal processing applications such as radar, communications, and underwater acoustics [1], either as an end in themselves or as a prelude to high resolution parameter estimation. As with any estimation problem the availability of statistical benchmarks on estimator accuracy is key to developing and understanding algorithm performance. The parameter space in general subspace/basis estimation problems is naturally described as a Riemannian quotient manifold. The concept of a manifold is central to many parts of geometry and modern mathematical physics because it allows more complicated structures to be analyzed in terms of the relatively well-understood properties of Euclidean spaces. This identification permits the well-developed tools of differential geometry to be brought to bear on the analysis of estimation problems on manifolds. Cramér-Rao bounds (CRB) have recently been generalized to Riemannian quotient manifolds and the particular case of the standard real linear statistical model with the Grassmann manifold (set of subspaces) as the problem parameter space has been analyzed [2]. Here we consider the complex linear model using a modified Stiefel manifold as the parameter space. In particular, we consider the case when the parameter space is constrained and develop intrinsic CRBs and estimation algorithms for this important model.
A vast number of algorithms for estimating unknown signal parameters from the measured output of an N-element sensor array have appeared in the literature. This problem is posed using the complex linear model with the N × p signal matrix (N sensors or array elements, p signals), H(θ), constrained by its parametric description. There are a variety of so-called subspace-based estimation methods that employ a two-step procedure consisting of a subspace estimation step followed by a parameter extraction step (see figure 1). An estimate in the form of a semi-unitary N × p matrix, say Ŷ, is formed from the K-snapshot measurement data matrix, say Z = [z₁ · · · z_K], and the p-tuple θ̂ is then extracted from Ŷ in the second step. Standard algorithms (e.g., R-D Unitary ESPRIT) for the important and commonly encountered special case of uniform arrays use the p dominant eigenvectors of the sample covariance matrix (SCM) as the subspace estimate. This is the maximum likelihood (ML) estimate of Y for the unconstrained signal matrix model, but is not ML when constraints are present, and as a result, standard subspace-based methods are nonoptimal. More recent algorithms such as structured least squares (SLS) [3] make partial use of the constraint information in forming the subspace estimate and therefore yield improved parameter estimators, but remain sub-optimal and perform poorly in certain challenging scenarios (e.g., highly correlated sources).
Formulating the problem with a constrained variation of the Stiefel manifold as the parameter space, we develop an intrinsic CRB on the subspace estimation step. Using the geometric insights that flow from this formulation, we develop a closed-form, asymptotically ML subspace estimator applicable to uniform arrays. Using this new constrained subspace estimator, an improved subspace-based parameter estimation algorithm is achieved. This introductory chapter gives a brief outline of the general approach and summarizes some key results.

Differential Geometry and its Relevance to Subspace Estimation

Archetypal Example: 2-Sphere
The archetypal example of a Riemannian manifold is the surface of a ball, the 2-sphere, denoted S², embedded in the familiar 3-dimensional Euclidean space, denoted here as E. The defining contribution to the theory of such surfaces was made by Gauss [4] in two papers written in the early nineteenth century. The approach developed there "marked a new departure from tradition because for the first time Gauss considered the intrinsic geometry of a surface, the properties which are determined only by the geodesic distances between points on the surface independently of the particular way in which the surface is located in the ambient Euclidean space" [5]. This point of view was extended to higher-dimensional spaces by Riemann and led to what is known today as differential geometry. The most important general concepts within this broad subject relevant to our discussion are: manifolds, coordinates, tangent spaces, metrics, exponential map, geodesics, and covariant derivative.

Figure 1. Subspace Based Estimation Form
While modern differential geometry generally prefers coordinate-free approaches, calculations require numerical representations, rather than abstract ones, and coordinates on the manifold fulfill this requirement. A coordinate system assigns a real N-tuple of values to each point of the manifold. A particular system, say x, is a mapping x : M → R^N (perhaps defined only locally). The N-tuple given by

x = x(P) ≡ (x₁(P), · · · , x_N(P)); P ∈ M     (1)

are the numerical coordinates of the point P with respect to this particular coordinate system (the common convention is to use the same symbol to denote both the mapping and the resultant set of values, that is, x = x(P)). This identification enables calculus on manifolds [2]. There are, of course, an infinite variety of ways to label points on manifolds. Points on the sphere, S² ⊂ E, for example, may be labeled by their 3-space Cartesian coordinates (e.g., Earth-Centered Earth-Fixed (ECEF) coordinates) or by their latitude and longitude. The first set will be called an extrinsic set since it references points on S² to the ambient space and thereby uses 3 numbers to label a point on a 2-dimensional surface. Coordinates are chosen for their convenience to the particular task at hand. In an estimation application, a position estimator may choose the set of coordinates that is algorithmically convenient, and results established in one coordinate set may be transformed to an alternate set, if desired, provided the transformation is known.
Tangent spaces, vector spaces notionally attached to each point P of a manifold M, play a prominent role in the geometric development; at the base point P ∈ M the tangent space is denoted T_P M. In the standard 3-dimensional Euclidean space, for example, the set of all displacement vectors at an arbitrary point P defines a 3-dimensional vector space, denoted T_P E. The length of a displacement vector between two points, say ∆ = P₁ − P₀, defines the Euclidean distance between the points, dist_E(P₁, P₀) = ‖∆‖. For the sphere, M ≡ S², the plane tangent to the north pole, say P₀, is a particular instance of a tangent space on the sphere, denoted T_{P₀}S²; the plane tangent to a given point on the equator, say P₁, is another, denoted T_{P₁}S². The space T_{P₀}S², for example, is evidently a 2-dimensional vector subspace of the 3-dimensional space T_{P₀}E. The tangent spaces of the sphere inherit the Euclidean inner product of the ambient space via restriction and thereby become inner product spaces. This process yields a smoothly varying inner product on the tangent spaces, termed a Riemannian metric. In general, a manifold endowed with such a metric is a Riemannian manifold.
The distance between any two points on such manifolds is well defined and the distance-minimizing curves between points are termed geodesics. The geodesic or Euler-Lagrange equation (derived using the calculus of variations) is a vector differential equation generalizing the equation ẍ = 0 that defines geodesics (straight lines) in a Euclidean space. The geodesic equation on S², referenced to a Cartesian coordinate system y with origin at the center of the sphere, is given by

ÿ + (ẏᵗẏ) y = 0.

The geodesic emanating from the point P₀, with coordinates y₀, in the direction of the unit tangent vector ∆ ∈ T_{y₀}S² may be expressed in coordinates as the exponential mapping y(t) = exp_{y₀}(∆ · t); this is the solution to the geodesic equation with initial conditions y(0) = y₀ and ẏ(0) = ∆, and is a great circle (the precise meaning of this notation is given in section 2.1). The running length along this geodesic curve is dist_{S²}(P(t), P₀) = ‖∆‖t, so that, just as in the Euclidean case, the distance between points is given by the magnitude of the connecting tangent vector. The geodesic distance, an intrinsic measure or one that is independent of the choice of coordinates, is a useful estimation error measure allowing estimators that may use different coordinates to be evaluated with the same "yardstick".
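The exponential map just described is easy to exercise numerically. The following sketch (our own illustration, not part of the text) checks that the great circle y(t) = y₀ cos t + ∆ sin t stays on the sphere, that its run length equals the geodesic distance, and that it satisfies the geodesic equation ÿ + (ẏᵗẏ)y = 0 by finite differences:

```python
import numpy as np

# Geodesic on S^2 from the north pole y0 in the direction of a unit
# tangent vector d: the great circle y(t) = y0*cos(t) + d*sin(t).
def exp_map(y0, d, t):
    """Exponential map on the unit sphere for a unit tangent vector d."""
    return y0 * np.cos(t) + d * np.sin(t)

y0 = np.array([0.0, 0.0, 1.0])        # north pole
d = np.array([1.0, 0.0, 0.0])         # unit tangent at y0 (orthogonal to y0)
t = 0.7                               # run length along the geodesic

y = exp_map(y0, d, t)
assert np.isclose(np.linalg.norm(y), 1.0)      # stays on the sphere
assert np.isclose(np.arccos(y0 @ y), t)        # geodesic distance equals |t|

# Finite-difference check of the geodesic equation ydd + (yd . yd) y = 0.
h = 1e-5
yd = (exp_map(y0, d, t + h) - exp_map(y0, d, t - h)) / (2 * h)
ydd = (exp_map(y0, d, t + h) - 2 * y + exp_map(y0, d, t - h)) / h**2
assert np.allclose(ydd + (yd @ yd) * y, 0.0, atol=1e-4)
```

The loose tolerance on the last assertion reflects the roundoff of the second-order finite difference, not the geometry.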

Special Case: Unitary Group
The sphere example, S² ⊂ E, notionally generalizes to the orthogonal group O(N) and its complex counterpart, the unitary group U(N). The geodesic equation for U(N) is the matrix differential equation

Ẍ + X(Ẋᴴ Ẋ) = 0,

where X is N × N unitary. Direct substitution verifies that this non-linear differential equation has the general solution

X(t) = X₀ expm(tW)     (2)

for W an arbitrary anti-Hermitian matrix, X₀ an arbitrary unitary matrix, and expm the matrix exponential operator. The shorthand notation for this mapping (2), with ∆ ≡ X₀W, is

exp_{X₀}(t∆) ≡ X₀ expm(tW).     (3)
In this expression ∆ ∈ T_{X₀}U(N) is a unit tangent vector at X₀ represented as an N × N complex matrix. Thus, the exponential map (3) describes geodesics on U(N). The complex Stiefel manifold, St(N,p), is defined via the equivalence relation that identifies X₁, X₂ ∈ U(N) as equivalent if the first p columns of the N × N matrix X₁ are identical to those of X₂. The equivalence subset, denoted [X₁], a fiber submanifold of U(N), is a "point" of the Stiefel manifold. Such quotient manifolds inherit their geometric properties from U(N) by restriction of geodesics to the quotient manifold tangent space (termed the horizontal space).
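The geodesic solution above can be checked directly in a small numerical sketch (ours, not from the text). The matrix exponential of an anti-Hermitian W is computed via the eigendecomposition of the Hermitian matrix −iW, so only NumPy is needed; the curve X(t) = X₀ expm(tW) stays unitary and satisfies Ẍ + X(ẊᴴẊ) = 0:

```python
import numpy as np

def expm_antihermitian(W):
    """Matrix exponential of anti-Hermitian W: W = 1j*H with H Hermitian,
    so expm(W) = U diag(exp(1j*lam)) U^H with H = U diag(lam) U^H."""
    lam, U = np.linalg.eigh(-1j * W)
    return U @ np.diag(np.exp(1j * lam)) @ U.conj().T

rng = np.random.default_rng(0)
N = 4
A = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
W = (A - A.conj().T) / 2                       # anti-Hermitian: W^H = -W
X0, _ = np.linalg.qr(rng.standard_normal((N, N))
                     + 1j * rng.standard_normal((N, N)))  # unitary start

def geo(t):
    """The geodesic X(t) = X0 expm(tW) through X0."""
    return X0 @ expm_antihermitian(t * W)

t, h = 0.3, 1e-5
X = geo(t)
assert np.allclose(X.conj().T @ X, np.eye(N))  # X(t) stays unitary

# Finite-difference check of the geodesic equation Xdd + X (Xd^H Xd) = 0.
Xd = (geo(t + h) - geo(t - h)) / (2 * h)
Xdd = (geo(t + h) - 2 * X + geo(t - h)) / h**2
assert np.allclose(Xdd + X @ (Xd.conj().T @ Xd), 0.0, atol=1e-3)
```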
We introduce a closely related quotient manifold on U(N) termed here the Basis manifold and denoted as B_{N,p} (or simply B when N and p are understood). In estimation applications, the statistical model dictates the appropriate equivalence relation and, for the complex linear model considered here, this leads us to the Basis manifold. The tangent spaces of B naturally partition as T_Y B ≡ A_Y ⊕ B_Y. A geodesic emanating in the direction of a tangent vector in the B_Y space represents a subspace perturbation, while motion in the A_Y directions represents a rotation. This partitioning figures prominently in the formulation of intrinsic bounds on subspace and rotational estimation accuracy.
Since the tangent spaces of the Basis manifold are naturally partitioned, the geodesic distance is likewise partitioned as

dist_B(P₀, P₁) = (‖P_B ∆‖² + ‖P_A ∆‖²)^{1/2}.
In this expression ∆ ≡ exp⁻¹_{P₀} P₁ is the connecting tangent vector; P_A and P_B are the projectors onto A_Y and B_Y, respectively. A standard measure of the difference between two subspaces, represented say by Y₁ and Y₀, is the set of subspace angles, which generalize the notion of the angle between vectors. The root sum square of these angles is the subspace distance, denoted dist_SS(Y₁, Y₀). In the Grassmann manifold, G, formulation this corresponds to the geodesic distance between the two points, that is, dist_G(Y₁, Y₀) = dist_SS(Y₁, Y₀). For the Basis and Stiefel manifold formulations, the situation is more complicated. We show that for these formulations the subspace distance is equal to the minimum geodesic distance between the point Y₀ and an invariant submanifold over Y₁. This relationship enables the development of an intrinsic CRB on subspace estimation accuracy for these parameter spaces.
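The subspace angles and dist_SS can be computed directly from the singular values of Y₀ᴴY₁. The following sketch (our illustration, using real matrices for simplicity) also checks the invariance that motivates the quotient construction: rotating Y₁ within its own column span leaves the subspace distance unchanged.

```python
import numpy as np

def subspace_distance(Y0, Y1):
    """Root-sum-square of the principal angles between the column spans
    of semi-unitary Y0 and Y1: theta_k = arccos(sigma_k(Y0^H Y1))."""
    s = np.linalg.svd(Y0.conj().T @ Y1, compute_uv=False)
    angles = np.arccos(np.clip(s, -1.0, 1.0))   # principal angles
    return np.sqrt(np.sum(angles**2))

rng = np.random.default_rng(1)
N, p = 6, 2
Y0, _ = np.linalg.qr(rng.standard_normal((N, p)))
Y1, _ = np.linalg.qr(rng.standard_normal((N, p)))
Q, _ = np.linalg.qr(rng.standard_normal((p, p)))  # rotation within span(Y1)

assert np.isclose(subspace_distance(Y0, Y1), subspace_distance(Y0, Y1 @ Q))
assert subspace_distance(Y0, Y0) < 1e-6           # zero distance to itself
```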

Estimation on Manifolds
In estimation theory and statistics, the CRB relates the covariance matrix of estimators to the Fisher Information Matrix (FIM) of an estimation problem through matrix inequalities. In the classical case of a vector parameter θ ∈ R^p in the distribution f(z|θ), this inequality is

C ⪰ F⁻¹,     (4)

where C is the estimator error covariance C = E[(θ̂ − θ)(θ̂ − θ)ᵗ], F is the FIM given by F_{i,j} ≡ E[(∂L/∂θ_i)(∂L/∂θ_j)]; i, j = 1, · · · , p, L = ln(f), and θ̂ = θ̂(z) is an unbiased estimator of θ based on the measurement z. The matrix inequality means that the difference C − F⁻¹ is positive semi-definite.
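As a toy illustration of (4) (our example, not from the text), consider K i.i.d. samples z_k ∼ N(θ, σ²). The FIM for θ is F = K/σ² and the sample mean attains the bound, which a short Monte Carlo run confirms:

```python
import numpy as np

rng = np.random.default_rng(2)
theta, sigma, K, trials = 1.5, 2.0, 50, 20000

# trials independent realizations of a K-sample experiment.
z = rng.normal(theta, sigma, size=(trials, K))
theta_hat = z.mean(axis=1)               # ML (and efficient) estimator

F = K / sigma**2                         # scalar FIM for the mean
crb = 1.0 / F                            # CRB = sigma^2 / K
emp_var = theta_hat.var()
assert abs(emp_var - crb) / crb < 0.05   # empirical variance tracks the CRB
```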
The derivation of this inequality implicitly relies on the assumption that the parameter space is a Euclidean space. For a more general Riemannian manifold as the parameter space, the role of the error vector θ̂ − θ is played by the error tangent vector exp⁻¹_θ(θ̂) appearing in the CRB inequality, so that in this setting (4) becomes [2]

C ⪰ F⁻¹ + curvature terms.     (5)
At high SNR, estimation errors are small and these additional terms become negligible. The CRB, therefore, depends asymptotically only on the FIM, just as with classical bounds. While the components of C and F depend on the particular choice of coordinates, the error variance tr C and the bound tr F⁻¹ do not, provided the metric is properly accounted for. When the parameter space is the unit sphere, tr C represents the 2-D mean square positional estimation error measured in radians squared, and tr F⁻¹ bounds this error.

Complex Linear Model: General
The complex linear model has received a huge amount of attention due to its simplicity and relevance in a large number of applications. Consider the model

z = Hn₁ + n₀     (6)

where n₀ is a normal random complex N-tuple, n₀ ∼ CN_N(0, σ²I), and n₁ is a normal random complex p-tuple, n₁ ∼ CN_p(0, P). The observation z is then a normal random N-tuple, z ∼ CN_N(0, R), where R = E[zzᴴ] = R_s + σ²I with signal covariance R_s defined by

R_s(H, P) ≡ HPHᴴ.     (7)
In this expression P is an unknown deterministic full rank Hermitian matrix and represents the correlation matrix of the source symbols n 1 . The complex N × p signal matrix H ∈ C N ×p is a deterministic unknown of full rank. We refer to this case as unconstrained since there are no constraints other than rank of H and the positive-definite Hermitian form of P . This data model is most relevant when we are unable or unwilling to make more detailed assumptions regarding the signal matrix H.
Sampling of the distribution reveals information about the unknown parameters, P and H, which are therefore subject to estimation. It is readily seen from (6), however, that this model precludes the simultaneous and unambiguous estimation of these unknowns. To see this, note that for any invertible T, substituting H′ = HT⁻¹ and n₁′ = Tn₁ into (6) leaves the measurement unchanged. Thus, both the pairs (H, P) and (HT⁻¹, TPTᴴ) define the same measurement distribution. As noted in [6], "whenever two parameters give rise to the same measurement distribution they are indistinguishable in the sense that no argument based on the observed measurement can be used to favor one parameter over the other as estimator".
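This indistinguishability is easy to verify numerically. The following sketch (our illustration, with arbitrary dimensions) confirms that (H, P) and (HT⁻¹, TPTᴴ) yield identical signal covariances:

```python
import numpy as np

rng = np.random.default_rng(3)
N, p = 5, 2
H = rng.standard_normal((N, p)) + 1j * rng.standard_normal((N, p))
A = rng.standard_normal((p, p)) + 1j * rng.standard_normal((p, p))
P = A @ A.conj().T + np.eye(p)             # positive-definite Hermitian
T = rng.standard_normal((p, p)) + 1j * rng.standard_normal((p, p))

Rs1 = H @ P @ H.conj().T                   # R_s for (H, P)
H2 = H @ np.linalg.inv(T)                  # transformed pair (H T^-1, T P T^H)
P2 = T @ P @ T.conj().T
Rs2 = H2 @ P2 @ H2.conj().T
assert np.allclose(Rs1, Rs2)               # same measurement distribution
```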
An alternate parameterization of the problem is provided by the eigen decomposition of the signal covariance R_s(H, P) in (7). The eigen decomposition defines the mapping or transformation

[Y, Λ] = eig(R_s(H, P)), (H, P) ∈ C^{N×p} × P_p, Λ ∈ D_p.

In this expression P_p is the set of all p × p positive-definite Hermitian matrices and D_p ⊂ P_p is the subset of positive-definite diagonal matrices.
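The mapping above can be sketched numerically. This toy check (ours, with arbitrary dimensions) confirms that the eigen decomposition of R_s = HPHᴴ returns a semi-unitary Y and diagonal Λ that reproduce R_s:

```python
import numpy as np

rng = np.random.default_rng(4)
N, p = 6, 2
H = rng.standard_normal((N, p)) + 1j * rng.standard_normal((N, p))
A = rng.standard_normal((p, p)) + 1j * rng.standard_normal((p, p))
P = A @ A.conj().T + np.eye(p)             # positive-definite Hermitian

Rs = H @ P @ H.conj().T                    # rank-p signal covariance
w, V = np.linalg.eigh(Rs)                  # eigenvalues in ascending order
Y = V[:, -p:]                              # p dominant eigenvectors (N x p)
Lam = np.diag(w[-p:])                      # Lambda in D_p

assert np.allclose(Y.conj().T @ Y, np.eye(p))   # Y is semi-unitary
assert np.allclose(Y @ Lam @ Y.conj().T, Rs)    # R_s = Y Lambda Y^H
```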
The transformed problem in terms of this alternate parameterization is

z = Yn₁ + n₀, n₁ ∼ CN_p(0, Λ).     (10)

The proper choice for the space B is dictated by the statistical model. As discussed in chapter 3, the Stiefel manifold is "too big" to serve as the proper parameter space for this model, while the Grassmann manifold is "too small". The given model dictates a quotient manifold formulation in which multiplication by unitary diagonal matrices has been "quotiented" out of the Stiefel manifold. The quotient manifold with this desired feature is called here the Basis manifold, B, and defining the domain for Y in (10) as B yields a well-posed estimation problem.
We derive the full FIM for the estimation problem (10) on the product manifold, B × D_p, and, from this, intrinsic CRBs on subspace estimation accuracy. Among the corollaries that flow from this Basis manifold formulation is a CRB on eigenvector (interpreted as the columns of Y) estimation accuracy for the complex linear model. For the unconstrained case the maximum likelihood (ML) estimate of Y ∈ B is given by the p dominant eigenvectors of the SCM. The corresponding estimation error covariance is well known from statistics [7] and is often used to develop analytical expressions for estimator performance [8], [9], [10], [11]. The relationship between this error covariance expression and the CRB developed here is established in chapter 3.

Complex Linear Model: Constrained
When the set of allowable signal matrices, H in (6), is constrained in some way, the set of allowable semi-unitary matrices Y in (10) is likewise constrained to a submanifold denoted B̃ ⊂ B. We derive the general form for the intrinsic CRB on constrained subspace estimation accuracy on B̃ and then consider the special case when the constraint defining B̃ arises due to some underlying parameterization, that is, when H = H(θ). In this parametrically constrained case the eigen decomposition of the true signal covariance effects a transformation, denoted here as [Y, Λ] = Φ(θ, P), between the (θ, P) and the (Y ∈ B̃, Λ ∈ D_p) parameterizations. This transformation is

[Y, Λ] = eig(H(θ)PHᴴ(θ)).     (11)
The parameters (θ, P) and (Y, Λ) are then two different sets of coordinates on the manifold B̃ × D_p. This is analogous to using either latitude-longitude or 3-space Cartesian coordinates to label points on S² ⊂ E.
The tangents to the θ coordinate curves yield a natural coordinate basis for the tangent spaces at each point of B̃. From this an orthogonal basis may be formed, and via the exponential map these new basis vectors yield a set of so-called normal coordinates, denoted here as (α, s, λ), about any given point. This set is a third parameterization of L, or equivalently, a third set of coordinates on the product manifold B̃ × D_p; the interrelationships between these coordinate choices are shown in figure 3.
We compute the general form for the constrained FIM in these partitioned normal coordinates and the corresponding intrinsic bounds on (constrained) subspace estimation accuracy. The coordinate Jacobian transforms the FIM in a tensorial way; bounds (and estimates) are likewise transformed. Quantities computed in one coordinate system, which may be more amenable to computation, may be transformed to an alternate system, which may have a more meaningful interpretation. In particular, the FIM with respect to the natural coordinates, F_θθ, and the subspace (normal) coordinates, F_ss, are related via the Jacobian of the coordinate transformation, as indicated in figure 3. As noted in [12], "because the Jacobian determines how coordinate transformations affect accuracy bounds, it is sometimes called a sensitivity matrix." In a variety of applications the signal matrix in (6) has a multi-dimensional modal form, that is, H(θ) = [h(θ₁), · · · , h(θ_p)], and the waveform h(·) is either a damped or undamped exponential. The FIM for this model, F_θθ, was derived in [13] based on a maximum likelihood argument that ultimately relied on the eigenvector covariance result of [7]. A more direct "textbook" derivation of the parameter space FIM, F_θθ, and the associated stochastic CRB was later developed in [14]. Alternatively, F_θθ may be found from the subspace block of the full FIM, F_ss, and the coordinate Jacobian. The geometric approach thus yields another derivation of the parameter space FIM, F_θθ.

Figure 3. Coordinates on Product Manifold
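The tensorial transformation of the FIM mentioned above can be illustrated with a toy numerical check (ours; the matrices are arbitrary stand-ins, not the F_θθ or F_ss of the text). For a smooth reparameterization s = g(θ) with Jacobian J = ∂s/∂θ, the FIM transforms as F_θ = Jᵗ F_s J, and the inverse FIMs (the CRBs) transform with the inverse Jacobian:

```python
import numpy as np

rng = np.random.default_rng(6)
p = 3
A = rng.standard_normal((p, p))
Fs = A @ A.T + p * np.eye(p)               # a stand-in SPD FIM in s-coordinates
J = rng.standard_normal((p, p))            # stand-in Jacobian ds/dtheta

Ft = J.T @ Fs @ J                          # FIM in theta-coordinates

# The CRBs (inverse FIMs) are related by the inverse Jacobian.
Jinv = np.linalg.inv(J)
assert np.allclose(np.linalg.inv(Ft), Jinv @ np.linalg.inv(Fs) @ Jinv.T)
```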

Maximum Likelihood Estimator
Typically, estimates of the underlying parameters, θ, are required rather than the semi-unitary matrix that they define, that is, Y(θ) via (11). The parameter estimation step in two-step subspace-based methods is related to the various submanifold coordinates as follows. The inverse of the eigen decomposition (11) defines a mapping from (Y, Λ) to (θ, P) coordinates and has the general form

(θ, P) = ϕ(Y, Λ) ≡ (ϕ_θ(Y), ϕ_P(Y, Λ)).     (13)

In particular, the θ-component, ϕ_θ, depends only on Y ∈ B̃. Conceptually, a general parameter estimation/extraction algorithm is generated by extending the domain of the function ϕ in (13), initially defined on B̃, to the parent manifold B. If Ỹ ≡ Ỹ(Z), with range space B, is some estimator of Y ∈ B̃, then the corresponding parameter estimator in this generic subspace-based approach is

θ̃ = ϕ_θ(Ỹ),     (14)

where the mapping ϕ, (13), is here interpreted as having the extended domain B. For harmonic signals on regular array geometries, R-D Unitary ESPRIT [3] is an example implementation of such a general subspace-based estimation program. In this approach the subspace estimate Ỹ is the p-dominant eigenvector matrix of the SCM as computed by the EVD. This subspace estimate, Ỹ, although an ML estimate under the unconstrained assumption, B, is not ML when constraints exist and Y ∈ B̃. The ML invariance principle [15], which states that the ML property is preserved under transformations, is therefore not applicable.
To improve subspace-based parameter estimation it is therefore desirable to find an ML estimator of Y ∈ B̃ ⊂ B, or failing that, one that comes closer to the constrained subspace CRB than does the standard EVD estimate Ỹ. For the one-dimensional uniform line array (ULA) problem the SVD of the forward-backward data matrix yields an improved subspace estimate. This approach, which makes partial use of the constraint information, is extended to the multidimensional case in the Higher-Order SVD (HO-SVD) [16] approach. More computationally intensive approaches like Structured Least Squares (SLS) [3] and Improved SLS (I-SLS) [17] make further implicit, but still incomplete, use of the B̃ submanifold constraint information. As a result, although they yield lower variance subspace estimators compared to the standard EVD, they are not ML estimators of Y ∈ B̃. The associated subspace-based parameter estimates are therefore degraded to varying degrees in stressful scenarios such as that of closely spaced and/or highly correlated sources [12].
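As a hedged illustration of the forward-backward idea mentioned above (a toy construction of ours, with illustrative parameter values, not the algorithm developed in this work), FB averaging of the SCM restores a rank-2 signal subspace for a coherent source pair on a ULA:

```python
import numpy as np

rng = np.random.default_rng(5)
N, K = 8, 200
J = np.eye(N)[::-1]                        # exchange (flip) matrix

def steer(mu):
    """ULA steering vector with electrical angle mu."""
    return np.exp(1j * mu * np.arange(N))

H = np.stack([steer(0.5), steer(0.9)], axis=1)
s = rng.standard_normal(K) + 1j * rng.standard_normal(K)
S = np.stack([s, 0.95 * s])                # coherent source pair
Z = H @ S + 0.1 * (rng.standard_normal((N, K))
                   + 1j * rng.standard_normal((N, K)))

R = Z @ Z.conj().T / K                     # forward SCM
Rfb = (R + J @ R.conj() @ J) / 2           # forward-backward average

wf = np.linalg.eigvalsh(R)[::-1]           # forward-only: rank-1 signal part
w = np.linalg.eigvalsh(Rfb)[::-1]          # FB: second source reappears
assert w[1] > 10 * w[2]                    # FB second eigenvalue >> noise
assert w[1] > 5 * wf[1]                    # ... unlike the forward-only SCM
```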
In parameter estimation applications, a standard approach to achieving MLE performance is to use a sub-optimal estimator (e.g., R-D Unitary ESPRIT) to initialize a Newton's method approach that uses the full signal model. The success of this type of approach will depend, in part, on the quality of the initial estimate. As an alternative, we introduce here a subspace-based estimation algorithm for Y ∈ B̃ that is derived from the ML criterion for the damped exponential signal assumption. The approach leverages Ỹ, the ML estimate for the unconstrained problem, and is notionally implemented using a set of normal coordinates on B with origin at Ỹ. The overall approach may be viewed as an implementation of a geometric Newton's method on B̃ that avoids the need for explicit initialization by incorporating the submanifold constraint condition (i.e., shift-invariance) in a more integrated way. The algorithm developed here is closely related to the approach first proposed in [18]. Given this ML estimator of Y ∈ B̃_damped, the corresponding ML parameter estimator (which here consists of angle and damping parameters) is found via the transformation (14).
Different signal waveform assumptions yield different submanifolds of B. Denoting the submanifolds corresponding to the damped and undamped signal assumptions as B̃_damped and B̃_harmonic, respectively, we have the set relation

B̃_harmonic ⊂ B̃_damped ⊂ B.

As a practical application, we apply the asymptotically ML estimator developed here for the B̃_damped assumption to the problem of DOA estimation, for which the B̃_harmonic submanifold applies. This new subspace-based estimation approach to the DOA problem is discussed in detail in chapter 5, and simulation results are shown for a variety of stressful scenarios, in which other methods are severely degraded, to demonstrate asymptotic efficiency.

Organization
Chapter 2 provides the background in differential geometry and related topics required to develop the CRB for the Basis manifold formulation of the complex linear model. The chapter begins with a brief discussion of the geometry of the sphere using notation consistent with later developments. The discussion of the Basis manifold is a straightforward extension of [19] to the complex case. Readers familiar with differential geometry and/or with [19], which gives a differential geometric foundation for numerical linear algebra, may wish to skim this chapter. In chapter 3 the complex linear statistical model is analyzed using this foundation and intrinsic CRBs are developed for rotational and subspace estimation accuracy for unconstrained and constrained signal models. A connection is made between the full inverse FIM for the unconstrained problem and the distribution of the eigenvectors of a sample covariance matrix given in [7].
Intrinsic bounds are developed for the constrained submanifold case and their relation to existing parameter bounds available for particular signal models is examined.
In chapter 4 a new subspace estimation algorithm is developed based on this differential geometric formulation for an important special case arising in array processing applications. In chapter 5 various estimation packages are compared to these intrinsic bounds for a variety of stressful scenarios. The highly correlated signal case often presented in the literature as an algorithm stress test is examined in detail.
[11] F. Roemer and M. Haardt, "Performance analysis of ESPRIT-type algorithms for strictly non-circular sources using structured least squares," in Computational Advances in Multi-Sensor Adaptive Processing, 2013.

CHAPTER 2
Differential Geometry for Signal Processing Estimation Applications

Geometry of the Sphere
A differentiable manifold is one of the most fundamental concepts in mathematics and physics. We begin by considering the familiar 3-dimensional Euclidean space denoted as E. We take the many properties of this space as given, including its inner product space structure and related norm. Each point P ∈ E may be assigned a real 3-tuple Y = (y₁, y₂, y₃), said to be the Cartesian coordinates of P. The 3-tuple Y is an element of the real 3-space R³ = R × R × R. The assignment is a mapping, denoted here by the symbol y, from the manifold E to R³, or

Y = (y₁, y₂, y₃) = y(P), P ∈ E.
This assignment process, or labeling of the point P, is not unique, and there are a number of common coordinate systems (e.g., spherical, affine) that are often used.
The space E is also an inner product space (a vector space with an inner product). If v and w are vectors represented in an orthonormal basis e₁, e₂, e₃ as v = e₁v₁ + e₂v₂ + e₃v₃ and w = e₁w₁ + e₂w₂ + e₃w₃, then the inner product is

⟨v, w⟩ = v₁w₁ + v₂w₂ + v₃w₃.     (19)

The associated norm is ‖v‖ = ⟨v, v⟩^{1/2}. The pair (E, ⟨·, ·⟩) provides the simplest example of a Riemannian manifold. If P and O are two points in E and v is the vector with tail at O and head at P, then the distance between these points, denoted d(P, O), is given by d(P, O) = ‖v‖. In a different set of Cartesian coordinates obtained by translating or rotating the first set, the 3-tuples representing both P and O will of course change. The distance d(O, P), however, will remain fixed, independent of the choice of Cartesian coordinate system.
In general, a Riemannian manifold, (M, g_P), is roughly defined as a smooth manifold M equipped with an inner product or metric function g_P on the tangent space T_P M at each point P that varies smoothly from point to point. For the Euclidean space example, g_P ≡ ⟨·, ·⟩.
Smooth closed subsets of E inherit the metric function ⟨·, ·⟩ and are thereby also Riemannian manifolds. The important example considered here is the usual 2-sphere.
The approach in this section is to discuss the geometry of the sphere while deliberately "over-notating" things, employing the notation used later when we consider signal processing manifolds. This approach makes things more cumbersome for the sphere, but it makes the transition to signal processing applications more natural.
We consider a sphere as a set embedded in E. The set of all points equidistant from some fixed point, say O, in this 3-dimensional space defines the 2-dimensional sphere S^2. In particular, the unit sphere is the subset of points at unit distance, S^2 = {P ∈ E : d(P, O) = 1}. Let y denote a Cartesian coordinate system with origin at O ∈ E, that is, y(O) = (0, 0, 0), and let the coordinates of P under y be denoted Y, that is, Y = y(P). Then, expressed in these coordinates, S^2 is defined as S^2 = {Y ∈ R^3 : Y^t Y = 1}.

Tangent Plane
The collection of all vectors with origin or "tail" at the point P_0 ∈ E is denoted T_P0 E. This is a 3-dimensional inner product space with the usual Euclidean space inner product (19). The tangent plane to the sphere at a point P_0 is illustrated in figure 1. This 2-dimensional vector space is denoted T_P0(S^2) and is, evidently, a subspace of T_P0 E.
If P_0 is the north pole, then the Cartesian coordinates Y_0 ≡ y(P_0) are Y_0 = (0, 0, 1)^t, and the submanifold tangent space T_P0(S^2) is spanned by the two vectors b_1 = (1, 0, 0)^t and b_2 = (0, 1, 0)^t. An arbitrary tangent vector ∆ ∈ T_P0(S^2) expands as ∆ = β_1 b_1 + β_2 b_2. If we define Y_0⊥ as the 3 × 2 matrix Y_0⊥ = [b_1 b_2], this ∆ may alternatively be represented in numerical form as ∆ = Y_0⊥ B, where B = (β_1, β_2)^t (this seemingly awkward representation of the vector ∆ is used extensively in our later signal processing examples). Note that the defining requirement on Y_0⊥ is orthogonality to Y_0, so that it may more generally be expressed as Y_0⊥ Q(θ) for an arbitrary 2 × 2 rotation Q(θ), θ arbitrary.
The dimension of the tangent space equals the number of linearly independent basis vectors required to span the space, or equivalently here the degrees of freedom in B, so that dim(T_P0(S^2)) = 2. While trivial here, this approach is important in determining the dimension of the various tangent spaces encountered in later signal processing examples.
The orthogonal complement of T_P0(S^2) is the normal space, denoted N_P0, yielding the decomposition T_P0 E = T_P0(S^2) ⊕ N_P0. For an arbitrary ∆ ∈ T_P0(E) we have P_⊥ ∆ ∈ T_P0(S^2), where P_⊥ = I − Y_0 Y_0^t denotes the orthogonal projection operator onto T_P0(S^2). The metric on the tangent space T_P0(S^2) is simply the restriction of the ambient space metric to this tangent plane; given this metric, the pair (S^2, ⟨·,·⟩) is a Riemannian manifold. The standard basis for T_P0(S^2), given an arbitrary valid choice for Y_0⊥, is defined by b_k = Y_0⊥ B_k with B_1 = (1, 0)^t and B_2 = (0, 1)^t. This choice of B_1, B_2 holds for every P ∈ S^2 in the sense that, for Y = y(P) and a valid Y_⊥, the vectors b_k = Y_⊥ B_k form an orthonormal basis of T_P(S^2). In such an orthonormal basis, the components of (25) are β_k = ⟨b_k, ∆⟩, and the sum has the concrete numerical form ∆ = Y_0⊥(B_1 β_1 + B_2 β_2) = Y_0⊥ B. This tangent vector may also be expressed as ∆ = U σ, where U and σ represent the pointing direction and length of ∆, respectively. The unit vector U is U = Y_0⊥ U_1 for some 2-tuple U_1. In later applications, the tangent vectors are represented in N × p matrix form and the above will correspond to the compact SVD of ∆ (that is, ∆ = U Σ V^H).
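The tangent-plane construction at the north pole can be sketched numerically; the coordinate values of B and the test vector v below are illustrative.

```python
import numpy as np

Y0 = np.array([0.0, 0.0, 1.0])          # north pole, Y0 = y(P0)
Y0_perp = np.array([[1.0, 0.0],
                    [0.0, 1.0],
                    [0.0, 0.0]])         # columns b1, b2 span T_P0(S^2)

B = np.array([0.3, -0.7])                # coordinates (beta_1, beta_2)
Delta = Y0_perp @ B                      # tangent vector Delta = Y0_perp B

# defining property of the tangent plane: Delta is orthogonal to Y0
assert np.isclose(Y0 @ Delta, 0.0)

# orthogonal projection of an arbitrary ambient vector onto T_P0(S^2)
P_perp = np.eye(3) - np.outer(Y0, Y0)
v = np.array([1.0, 2.0, 3.0])
assert np.allclose(P_perp @ v, [1.0, 2.0, 0.0])  # normal component removed
```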

Tangents to curves
The tangent space at a point is alternately described by the collection of tangents to all curves passing through the point. If Y(t) is a smooth curve on the sphere, then at every point Y^t(t) Y(t) = 1. Differentiating, and using the notation ∆ = Ẏ, yields Y^t ∆ + ∆^t Y = 2 Y^t ∆ = 0. At the point Y_0 = Y(0) on this curve the above equation is satisfied by choosing ∆ = Y_0⊥ B, where B is an arbitrary real 2-tuple and where Y_0^t Y_0⊥ = 0.

Geodesics and Great Circles
A geodesic on a manifold is defined as the shortest path between two given points.
For the Euclidean space, geodesics are straight lines, which may be specified by the familiar equation Ÿ(t) = 0. This is the Euler-Lagrange or geodesic equation for the flat Euclidean space, represented in Cartesian coordinates, and has the general solution Y(t) = Y_0 + ∆ t. The analogous geodesic equation in the general case is derived rigorously using the calculus of variations. For the sphere, a simplified approach that leads to the geodesic equation is to differentiate the defining relation (36) a second time. This approach yields Ÿ^t Y + Ẏ^t Ẏ = 0. Direct substitution verifies this is equivalent to Ÿ + (Ẏ^t Ẏ) Y = 0. This geodesic equation has the general solution Y(t) = Y_0 cos(σt) + U sin(σt), alternately expressed in matrix form as Y(t) = [Y_0 U] (cos(σt), sin(σt))^t, where Y_0, U, σ specify initial location, launch direction, and speed, respectively. This is referred to as the CS form of the geodesic. To verify this solution, note that the first derivative is Ẏ(t) = −Y_0 σ sin(σt) + U σ cos(σt) and that the 2nd derivative is Ÿ(t) = −σ^2 Y(t). The factor Ẏ^t Ẏ, the squared speed, is the constant σ^2. Substituting these results into (40) verifies the solution.
Since Y is the radial vector from the origin to the surface of the sphere, the vector at the point Y extending in this direction is normal to the surface. From (45) we observe that the accelerationŸ is normal to the surface at every point on the geodesic and that the tangential acceleration is zero. In the case of the sphere, acceleration for uniform motion on a great circle is directed radially and therefore normal to the surface; therefore, great circles are geodesics on the sphere.
Initial Value Problem: The initial value problem (IVP) is specified by the differential equation (41) together with the initial conditions Y(0) = Y_0 and Ẏ(0) = σU. In these expressions U is a unit tangent vector at Y_0, that is, U ∈ T_Y0 S^2, and σ is the scalar speed. This IVP has the solution (42), where the parameters U, Y_0, and σ are specified as initial conditions. For unit speed, (42) is said to be the geodesic emanating from the point P_0 in the direction of U ∈ T_P0 S^2.
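The CS-form solution of the IVP can be checked numerically: the curve stays on the sphere and its acceleration is purely radial. The speed and launch direction below are illustrative values.

```python
import numpy as np

sigma, gamma = 0.8, 0.25 * np.pi                    # example speed and direction
Y0 = np.array([0.0, 0.0, 1.0])                      # north pole
U = np.array([np.cos(gamma), np.sin(gamma), 0.0])   # unit tangent at Y0

def geodesic(t):
    # CS form of the great-circle geodesic: Y(t) = Y0 cos(sigma t) + U sin(sigma t)
    return Y0 * np.cos(sigma * t) + U * np.sin(sigma * t)

t = 0.6
Y = geodesic(t)
assert np.isclose(np.linalg.norm(Y), 1.0)           # curve stays on S^2

# second derivative is -sigma^2 Y(t): radial, so tangential acceleration is zero
Ydd = -sigma**2 * Y
assert np.allclose(Ydd - (Y @ Ydd) * Y, 0.0)
```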
Example: IVP at the North Pole: If P_0 is the north pole in the given y coordinates, then an arbitrary tangent vector ∆ = σU in T_P0 S^2 has the form U = (cos γ, sin γ, 0)^t, where γ specifies direction in the tangent space and σ is the magnitude of the vector. Substituting this form into (42) and carrying out the sum gives Y(t) = (cos γ sin(σt), sin γ sin(σt), cos(σt))^t. Differentiating (42), the tangent to the geodesic curve at Y(t) is given by Ẏ(t) = −Y_0 σ sin(σt) + U σ cos(σt).

Exponential Map
The exponential map plays a prominent role in our later applications. The following lemma relates the geodesic specified by (42) to an equivalent exponential form.
Lemma: The CS form of the geodesic (42) is equal to the exponential map form defined by Y(t) = [Y_0 Y_0⊥] expm(t [[0, −B^t], [B, 0]]) I_{3,1}, where I_{3,1} = (1, 0, 0)^t is a selection matrix. Proof: Let B be represented by its SVD, B = U_1 σ, where U_1 is a 2 × 1 unit vector and σ = ‖B‖. Using (56), the argument of the matrix exponential (54) is t W, where W = [[0, −σ U_1^t], [σ U_1, 0]], and where we have used the fact that U_1^t U_1 = 1. Using the matrix exponential identity expm(U Λ U^H) = U expm(Λ) U^H applied to the eigen-decomposition of this anti-symmetric matrix, together with the fact that W^3 = −σ^2 W, the matrix exponential in (54) expands as a rotation by the angle σt in the plane spanned by Y_0 and U = Y_0⊥ U_1. Post-multiplying (62) by the selection matrix I_{3,1} extracts the first column, (cos(σt), sin(σt) U_1^t)^t. Pre-multiplication by [Y_0 Y_0⊥], letting σ̄ = σt, and noting (57) yields Y(t) = Y_0 cos(σt) + U sin(σt). The desired form (53) is recovered, and this establishes equivalence with the exponential map form (54).
The shorthand notation for the exponential map (54) is Y(t) = exp_Y0(t∆), where the tangent vector ∆ = Y_0⊥ B. The exponential map defined in this way is seen to be a map from the tangent space at P_0, T_P0 S^2, to the manifold S^2.
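The lemma's claim, that the matrix-exponential form of the geodesic agrees with the CS form, can be verified numerically. The tangent coordinates B below are illustrative; scipy's `expm` supplies the matrix exponential.

```python
import numpy as np
from scipy.linalg import expm

Y0 = np.array([[0.0], [0.0], [1.0]])     # north pole as a 3x1 matrix
Y0_perp = np.array([[1.0, 0.0],
                    [0.0, 1.0],
                    [0.0, 0.0]])
B = np.array([[0.4], [0.9]])             # tangent coordinates, Delta = Y0_perp B
sigma = np.linalg.norm(B)                # speed

# exponential-map form: [Y0 Y0_perp] expm([[0, -B^t],[B, 0]]) I_{3,1}
X0 = np.hstack([Y0, Y0_perp])
W = np.block([[np.zeros((1, 1)), -B.T],
              [B, np.zeros((2, 2))]])
Y_exp = (X0 @ expm(W))[:, :1]            # select the first column

# equivalent CS form: Y0 cos(sigma) + (Y0_perp B / sigma) sin(sigma)
Y_cs = Y0 * np.cos(sigma) + (Y0_perp @ B / sigma) * np.sin(sigma)

assert np.allclose(Y_exp, Y_cs)          # the two forms agree
```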

Inverse Exponential Map, Normal Coordinates and the Two-Point Boundary Value Problem
The principal inverse of the exponential map, termed the inverse exponential map, is therefore a mapping from S^2 to T_P0 S^2. Given P_0 and the tangent vector ∆, let P_1 = exp_P0 ∆. The inverse exponential map then is ∆ = exp_P0^{−1} P_1. Given an arbitrarily chosen orthonormal basis b_1, b_2 for T_P0 S^2, the tangent vector ∆ ∈ T_P0 S^2 expands as ∆ = β_1 b_1 + β_2 b_2, where β_k = ⟨b_k, ∆⟩. Suppose that Y_0 = y(P_0) and Y_1 = y(P_1) are the Cartesian coordinates of these points. The above procedure that defines the 2-tuple β provides an alternate means of labeling points on the sphere. Let the transformation between these two sets of coordinates, Y to β, which consists of computing the tangent vector ∆ and its components β with respect to the given basis, be denoted as β = ϕ(Y). In general, coordinates defined via the exponential map are termed normal coordinates.
The coordinate transformation from the Y coordinates to these polar normal coordinates is denoted as (σ, γ) = ϕ_p(Y). The transformation from normal to Cartesian coordinates is therefore Y = y ∘ ϕ^{−1}(β) and is defined by the exponential map (54). Using the equivalent CS form of the exponential mapping given by (42) with t = 1, we have Y = Y_0 cos(σ) + U sin(σ), with σ = ‖β‖ and U = Y_0⊥ β/‖β‖. An alternate, parametric definition of the sphere is given by this mapping as β ranges over the normal coordinates.

Two point boundary value problem
In the two-point boundary value problem (BVP) the auxiliary conditions are Y(0) = Y_0 and Y(1) = Y_1. The objective here is to determine the initial velocity (i.e., speed σ and direction U), or tangent vector, required to move from Y_0 to Y_1 in unit time (in our later applications this procedure is analogous to determining the tangent vector ∆ that connects Y_0 to Y_1 via a geodesic). The required initial velocity vector is determined from the boundary conditions. From (42) with t = 1, Y_1 = Y_0 cos(σ) + U sin(σ). The speed σ then is σ = acos(Y_0^t Y_1), and the unit tangent vector direction is U = (Y_1 − Y_0 cos(σ))/sin(σ). The resultant tangent vector that satisfies this BVP is ∆ = Uσ.
Example: 2-point BVP: Let Y_0 be the north pole, Y_0 = (0, 0, 1)^t, and let Y_1 be an arbitrary point on the equator, Y_1 = (cos γ, sin γ, 0)^t. The polar normal coordinates of Y_1 are therefore (π/2, γ). The standard normal coordinates are β_1 = (π/2) cos(γ) and β_2 = (π/2) sin(γ), so that using (68), ‖β‖ = π/2, as expected.
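The BVP solution above can be sketched as an inverse exponential (log) map on the sphere; the equator point and its direction angle are the example values from the text.

```python
import numpy as np

def sphere_log(Y0, Y1):
    """Inverse exponential map on S^2: the tangent vector Delta = U*sigma whose
    geodesic carries Y0 to Y1 in unit time (assumes Y1 != ±Y0)."""
    sigma = np.arccos(np.clip(Y0 @ Y1, -1.0, 1.0))   # speed = geodesic distance
    U = (Y1 - Y0 * np.cos(sigma)) / np.sin(sigma)    # unit launch direction
    return sigma * U

Y0 = np.array([0.0, 0.0, 1.0])                       # north pole
gamma = 0.3
Y1 = np.array([np.cos(gamma), np.sin(gamma), 0.0])   # a point on the equator

Delta = sphere_log(Y0, Y1)
assert np.isclose(np.linalg.norm(Delta), np.pi / 2)  # quarter great circle

# replay the geodesic forward: exp_{Y0}(Delta) recovers Y1
sigma = np.linalg.norm(Delta)
U = Delta / sigma
assert np.allclose(Y0 * np.cos(sigma) + U * np.sin(sigma), Y1)
```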

Coordinate Curves
Let b_1 and b_2 be an orthonormal basis for the tangent space at the north pole, represented in the extrinsic frame as b_1 = (1, 0, 0)^t and b_2 = (0, 1, 0)^t. The extrinsic coordinate representation of the normal coordinate (geodesic) curve emanating from the north pole in the b_1 direction is denoted Y^(1)(t) and is given by Y^(1)(t) = (sin t, 0, cos t)^t. The tangent at an arbitrary time t is Ẏ^(1)(t) = (cos t, 0, −sin t)^t. At t = π/2 we have Y^(1)(π/2) = (1, 0, 0)^t, which corresponds to a point on the equator.
The tangent to this coordinate curve at t = π/2 evaluates to (0, 0, −1)^t. This vector is attached to the point Y^(1)(π/2) and points in a downward direction.
Similarly, the normal coordinate (geodesic) curve emanating in the b_2 = (0, 1, 0)^t direction is denoted Y^(2)(t) and is given by Y^(2)(t) = (0, sin t, cos t)^t, and the tangent vector is Ẏ^(2)(t) = (0, cos t, −sin t)^t. At t = π/2 we have Y^(2)(π/2) = (0, 1, 0)^t, which corresponds to a point on the equator.

Natural Geodesic Distance Between Points
The running length of the arbitrary smooth curve Y(t) emanating from the point Y_0 is given by elementary calculus as L(t) = ∫_0^t ‖∆(τ)‖ dτ, where ∆(t) = dY(t)/dt. The geodesic distance between two points on the sphere is the length of the shortest curve connecting the two points. The distance between Y_1 and Y_0 is then defined as d(Y_0, Y_1) = min L(1), minimized over all curves Y(t) ∈ S^2 between Y_0 and Y_1.
If Y (t) is a geodesic then the tangent at each point on the curve is given by (52).
The magnitude of ∆(t) is constant along the curve, with ‖∆(t)‖ = ‖∆(0)‖ = σ. Using this expression in the integral (140) yields L(t) = σt. On the unit sphere this arc is a great circle, and the arc length is the angle in radians separating the two end points (σ = radians/sec). Remark: If Y_0 is the North Pole and Y(t) is the geodesic given by (50), then Y_0^t Y(t) = cos(σt), and the geodesic distance between the points is evidently d(Y_0, Y(t)) = σt = acos(Y_0^t Y(t)). In the signal processing examples discussed later, the subspace angles between the semi-unitary matrices Y_0 and Y_1 are defined as the acos of the singular values of Y_0^H Y_1.
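The subspace-angle definition specializes to the great-circle distance for p = 1. This is a hedged sketch: the points chosen are the north pole and a point on the equator, so the expected angle is π/2.

```python
import numpy as np

def subspace_angles(Y0, Y1):
    # angles between the column spaces of semi-unitary Y0, Y1 (N x p):
    # acos of the singular values of Y0^H Y1
    s = np.linalg.svd(Y0.conj().T @ Y1, compute_uv=False)
    return np.arccos(np.clip(s, -1.0, 1.0))

# p = 1 case on the unit sphere: north pole vs. a point on the equator
Y0 = np.array([[0.0], [0.0], [1.0]])
Y1 = np.array([[0.0], [1.0], [0.0]])

# the single subspace angle equals the great-circle distance
assert np.isclose(subspace_angles(Y0, Y1)[0], np.pi / 2)
```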

Estimation Example on the Unit Sphere
Consider the linear model z = Y n_1 + n_0, where n_0 is a 3 × 1 additive random noise vector, n_0 ∼ N(0, σ_0^2 I) with σ_0 known, and n_1 is a scalar random variable, n_1 ∼ N(0, σ^2), where σ^2 is a deterministic unknown. The observation z then is a 3 × 1 normal random vector, z ∼ N(0, R_s + σ_0^2 I), where the signal covariance R_s is defined by R_s = σ^2 Y Y^t. In this model Y is a deterministic unknown and represents a point on the unit sphere.
Let Z = [z_1, · · · , z_K] be a 3 × K data matrix whose columns are independent and identically distributed (iid) random vectors defined by (93). The log-likelihood function is (ignoring constants) L(Y) = −K[ln det(R) + tr(R^{−1} R̂)], where R = σ^2 Y Y^t + σ_0^2 I and R̂ is the sample covariance matrix (SCM), R̂ = K^{−1} Z Z^t. Representing Y in normal coordinates (74) yields an alternate parameterization of this function as L(β) = L(Y).
Let Ŷ denote an estimator of Y ∈ S^2 and let ∆ = exp_Y^{−1} Ŷ be the estimation error tangent vector. For b_1, b_2 an orthonormal basis, suppose that this stochastic error vector expands as ∆ = β_1 b_1 + β_2 b_2. The geodesic distance error measure (92) is then d^2(Y, Ŷ) = ‖β‖^2. Because this distance corresponds to the norm of the coefficients β, the Fisher information, denoted F_ββ, and the inverse Fisher information matrix (FIM) are computed using this parameterization of the log-likelihood function. The FIM for this problem is developed in [1, equation 143]. For the standard basis (32) this is expressed in 2 × 2 matrix form, so that, ignoring curvature, the variance of any unbiased estimator Ŷ satisfies the inequality E[d^2(Y, Ŷ)] ≥ tr(F_ββ^{−1}). The ML estimator of Y ∈ S^2 for this linear model is the principal eigenvector of the SCM R̂.
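A minimal Monte Carlo sketch of this estimation example, with illustrative values for Y, σ, σ_0, and K: data are drawn from the model, the SCM is formed, and the principal eigenvector serves as the ML estimate. The error threshold in the final check is a loose sanity bound, not the CRB itself.

```python
import numpy as np

rng = np.random.default_rng(1)
Y = np.array([1.0, 2.0, 2.0]) / 3.0      # true point on S^2 (example value)
sigma, sigma0, K = 2.0, 0.5, 4000        # signal/noise std devs, sample count

# draw K iid samples of z = Y*n1 + n0
Z = (Y[:, None] * rng.normal(0.0, sigma, K)        # rank-one signal component
     + rng.normal(0.0, sigma0, (3, K)))            # additive white noise
R_hat = Z @ Z.T / K                                # sample covariance matrix

# ML estimate of Y: principal eigenvector of the SCM (sign-aligned with Y,
# since Y and -Y are equivalent in this model)
w, V = np.linalg.eigh(R_hat)
Y_hat = V[:, -1] * np.sign(V[:, -1] @ Y)

# geodesic estimation error d(Y, Y_hat) is small for large K
err = np.arccos(np.clip(Y_hat @ Y, -1.0, 1.0))
assert err < 0.1
```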

Geometry of Signal Processing Manifolds
Many estimation problems can be formulated as the minimization of a function F(Y) where the parameter Y is an orthonormal matrix, that is, Y is constrained to the set of N × p matrices such that Y^t Y = I. This set is known as the (real) Stiefel manifold. Often the objective function of interest has some type of homogeneity property, F(Y Q) = F(Y) for all p × p orthogonal matrices Q. In other words, the objective function depends only on the subspace spanned by the columns of Y; it does not depend on the particular basis used to represent the subspace. The set of p-dimensional subspaces in R^N is called the Grassmann manifold, and so this manifold is the natural parameter space or domain for functions with this homogeneity condition. A variation for the complex case is the homogeneity F(Y Q) = F(Y) for all p × p unitary matrices Q.

Coordinates on Manifolds
Modern differential geometry relies on coordinate-free methods, but because the focus here is on estimation algorithms, a coordinate-based approach is used from the start. A coordinate system on a manifold M is a mapping, say y, from the abstract manifold M to R^N. A point P ∈ M has the coordinates Y ∈ R^N, assigned by the coordinate mapping y: Y = y(P). In the usual theoretical development the dimensions of R^N and M are equal, but this is not required, and in our signal processing applications it is more natural initially to work with extrinsic coordinate representations.
Extrinsic coordinates correspond to the natural numerical matrix representation of elements of the Stiefel manifold. The length of the extrinsic coordinate vector is greater than the dimension of the space. For example, the complex Stiefel manifold has dimension 2pN − p^2, but its representation as N × p complex matrices requires 2Np values. The dimension of the Stiefel manifold reflects the orthonormality constraint on the columns of the matrix representation. As discussed in the previous section, this is the same situation that occurs when a point on the usual 2-dimensional sphere is represented by the standard (x, y, z) Cartesian coordinates of the ambient 3-dimensional space.
The alternative to such extrinsic coordinates is intrinsic coordinates, in which the number of required coordinate components equals the manifold dimension (normal coordinates are a special type of intrinsic coordinates). For the case of the sphere, the (u, v) coordinates on the 2-sphere defined by the mapping to extrinsic coordinates, x = cos u sin v, y = sin u sin v, z = cos v, provide an example of intrinsic coordinates.
A more general discussion of intrinsic coordinates and their application is given in appendix A.1.

Unitary Group as a Riemannian Manifold
As noted in [2], "given a manifold whose geometry is well understood (where there are closed form expressions for the geodesics and, perhaps also, parallel transport), there is a very natural, efficient, and convenient way to generate closed form formulas on quotient spaces of that manifold." This is precisely the situation with the Stiefel, Basis, and Grassmann manifolds, which are quotient spaces within the Unitary Group U(N ).
We first consider the Unitary Group and review its geometric properties. Using this geometry the geometric properties of quotient manifolds on U(N ) are developed in a straightforward way.
The set C^{N^2} ≡ C^{N×N}, the collection of all N × N complex matrices, along with the standard operations of matrix addition and multiplication by real scalars, defines a real vector space of real dimension 2N^2. We define an inner product on this real vector space as ⟨X_1, X_2⟩ = Re tr(X_1^H X_2). The set of all vectors with origin at the arbitrary point X_0 ∈ C^{N^2} is a vector space, denoted T_X0 C^{N^2}, with the same dimension as C^{N^2}. This is also an inner product space, with the inner product (108) a bilinear form on each tangent space. The set C^{N^2}, along with this inner product defined on the tangent spaces, defines a Riemannian manifold. The unitary group U(N) is defined here as a submanifold of C^{N^2}: the set of all X ∈ C^{N^2} such that X^H X = I. The set U(N) is also a group, where the group operation is standard matrix multiplication. U(N) is closed with respect to the group operation, so that for X_0 ∈ U(N), Q ∈ U(N), the product X = X_0 · Q is also in U(N). Note that any unitary matrix Q can be written Q = expm(W) for some anti-Hermitian W, where expm is the matrix exponential function.

Example: Unitary Matrices
Consider an anti-Hermitian matrix W and its eigen-decomposition W = U Λ U^H, where Λ is diagonal with pure imaginary entries. From the basic properties of matrix exponentials [3], X = expm(W) = U expm(Λ) U^H, so that X^H X = U expm(Λ)^H expm(Λ) U^H = I and X is unitary. This is a general fact: if W is anti-Hermitian, then expm(W) is a unitary matrix. Furthermore, every unitary matrix has this form, and therefore the exponential map from the set of anti-Hermitian matrices to the set of unitary matrices is surjective. By virtue of the embedding in the Riemannian manifold C^{N^2}, the submanifold U(N) is endowed with the inherited metric (108) and as such constitutes a Riemannian manifold.
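The example's claim can be checked numerically for a randomly generated anti-Hermitian matrix; the size N = 4 is illustrative.

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(2)
N = 4
A = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
W = (A - A.conj().T) / 2                 # anti-Hermitian part: W^H = -W
assert np.allclose(W.conj().T, -W)

X = expm(W)                              # matrix exponential of anti-Hermitian W
assert np.allclose(X.conj().T @ X, np.eye(N))   # X is unitary
```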

Tangent Space
Intuitively, the tangent space at a point is the plane tangent to the submanifold at that point (see Figure 2.1) in the ambient space. For d-dimensional manifolds, this plane is a d-dimensional vector space with origin at the point of tangency. The normal space is the orthogonal complement in the ambient space. In the general case the tangent space at the point P ∈ M is denoted T P M ; P is the base point or origin.
All elements of T_X0 U(N) have the form ∆ = X_0 W, with W anti-Hermitian, while all elements of (T_X0 U(N))^⊥, the normal space, have the form Φ = X_0 S, where S is Hermitian.
The anti-Hermitian requirement on the matrix W above may be derived from the defining property of unitary matrices, X^H(t) X(t) = I. In this expression X(t) represents an arbitrary smooth curve on U(N). Differentiating yields ∆^H X + X^H ∆ = 0, where ∆ = Ẋ is the N × N matrix derivative.
Substituting the representation ∆ = X W of the arbitrary tangent matrix ∆ into (116) yields W^H + W = 0; that is, W is constrained to be anti-Hermitian.
Since all elements of T_X0 U(N) must have the form X_0 W, the real dimension of this space is equal to the degrees of freedom in the N × N anti-Hermitian matrix W, so that dim(T_X0 U(N)) = N^2. To see this, note that for an N × N matrix there are N(N−1)/2 complex elements above the diagonal. If the matrix is anti-Hermitian, the elements below the diagonal are determined by those above, and the N diagonal elements are pure imaginary. The total number of real independent elements in an anti-Hermitian matrix, therefore, is N(N−1) + N = N^2.

Riemannian Manifold Inner Product
The restriction of the Euclidean metric (108) to T_X0 U(N) defines an inner product on the tangent space at each X_0 ∈ U(N), denoted ⟨∆_1, ∆_2⟩_X0. Since all ∆ ∈ T_X0 U(N) have the form ∆ = X W, this product becomes ⟨∆_1, ∆_2⟩_X0 = Re tr(W_1^H W_2). The squared magnitude of ∆ = X W is ‖∆‖^2 = Re tr(W^H W).

Geodesics
On a Riemannian manifold a geodesic is the curve of shortest length between two points. From [1, page 309]: "A straightforward exercise from the calculus of variations reveals that for the case of manifolds embedded in a Euclidean space the acceleration vector at each point along a geodesic is normal to the submanifold so long as the curve is traced with uniform speed (i.e., tangential acceleration is zero). This condition is necessary and sufficient. In the case of the sphere, acceleration for uniform motion on a great circle is directed radially and therefore normal to the surface; therefore, great circles are geodesics on the sphere." Thus for a curve X(t) on U(N) to be a geodesic, the acceleration vector Ẍ must be in the normal space (T_X(t) U(N))^⊥ at every point along the curve X(t). Since all elements of the normal space at X(t) have the form X(t) S, S Hermitian, we conclude that Ẍ = X S. An arbitrary curve X(t) satisfies the manifold defining constraint (115); twice differentiating this equation yields Ẍ^H X + 2 Ẋ^H Ẋ + X^H Ẍ = 0. If X(t) is a geodesic then it must satisfy (124); substituting this into (125) results in S = −Ẋ^H Ẋ. Back-substituting this result into (124) yields Ẍ = −X Ẋ^H Ẋ, or equivalently Ẍ + X Ẋ^H Ẋ = 0. This differential equation is referred to as the geodesic equation or the Euler-Lagrange equation of motion. Direct substitution verifies that this non-linear differential equation has the general solution X(t) = C expm(W t), for W anti-Hermitian and C an arbitrary constant unitary matrix. From the basic properties of matrix exponentials [4] we have Ẋ = C expm(W t) W and Ẍ = C expm(W t) W^2. Substituting these into (130) yields Ẍ + X Ẋ^H Ẋ = C expm(W t) W^2 − C expm(W t) W^2 = 0, where we have used the anti-Hermitian condition W^H = −W.
The geodesic equation (130) and the arbitrary initial conditions, X(0) = X_0 and Ẋ(0) = X_0 W_0, W_0 anti-Hermitian, define an initial value problem (IVP). This IVP has the solution X(t) = X_0 expm(W_0 t). This curve in U(N) is said to be the geodesic emanating from X_0 in the direction of the tangent vector ∆ = X_0 W_0. The shorthand notation for this geodesic mapping (133) is X(t) = exp_X0(t∆). In this expression, ∆ is a tangent vector at X_0 represented as an N × N complex matrix.
The tangent at any point along the geodesic curve X(t) is ∆(t) = Ẋ(t) = X_0 expm(W_0 t) W_0, and at t = 0 we have ∆(0) = X_0 W_0.
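The IVP solution and the normality of its acceleration can be checked numerically; the size N = 3 and the random starting point are illustrative.

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(3)
N = 3
A = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
W0 = (A - A.conj().T) / 2                # anti-Hermitian launch direction
X0, _ = np.linalg.qr(A)                  # an arbitrary point of U(N)

def geodesic(t):
    # IVP solution: X(t) = X0 expm(W0 t), with X(0) = X0, Xdot(0) = X0 W0
    return X0 @ expm(W0 * t)

t = 0.7
X = geodesic(t)
assert np.allclose(X.conj().T @ X, np.eye(N))       # curve stays in U(N)

# acceleration Xdd = X(t) W0^2 lies in the normal space: X^H Xdd = W0^2,
# and W0^2 is Hermitian since (W0^2)^H = (-W0)^2 = W0^2
S = W0 @ W0
assert np.allclose(S, S.conj().T)
```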

Orthonormal Basis
An orthonormal basis for the tangent space T_X U is given by e_k = X W_k, k = 1, . . . , N^2, where the anti-Hermitian matrices W_k are chosen to form an orthonormal set, that is, ⟨e_j, e_k⟩ = Re tr(W_j^H W_k) = δ_jk. An arbitrary element ∆ ∈ T_X U is then represented w.r.t. this basis as ∆ = Σ_{k=1}^{N^2} e_k α_k with α_k = ⟨e_k, ∆⟩.
Example: For N = 2 we have dim(T_X U) = N^2 = 4, and an orthonormal basis for T_X U is given by e_k = X W_k for a choice of four orthonormal anti-Hermitian matrices W_k. This choice of basis is not unique: multiplying the above matrix representations of the basis vectors by an arbitrary 2 × 2 unitary matrix yields an alternate basis.
Example: The IVP defined by (130) and the initial conditions X(0) = X_0 and Ẋ(0) = e_3, defined above, has the solution X(t) = X_0 expm(W_3 t).

Natural Geodesic Distance Between Points
The running length of the arbitrary smooth curve X(t) emanating from the point X_0 is given by elementary calculus as L(t) = ∫_0^t ‖∆(τ)‖ dτ, where ∆(t) = dX(t)/dt is the tangent to the curve at X(t). The Riemannian or geodesic distance between two points is the length of the shortest curve connecting the two points. The distance between X_1 ≡ X(t_1) and X_0 is then defined as d(X_0, X_1) = min L(t_1), minimized over all curves X(t) between X_0 and X_1. On the unit sphere this arc is a great circle and the arc length is the angle in radians separating the two end points.
Let X(t) be the geodesic curve emanating from the point X_0 in the direction ∆ = X_0 W_0, where W_0 is a constant anti-Hermitian matrix. This curve is given by X(t) = exp_X0(t∆) (133), and the tangent vector at any point along this curve is ∆(t) = X_0 expm(W_0 t) W_0. Since X^H(t) X(t) = I, the magnitude squared ‖∆(t)‖^2 is constant at each point of the curve and is given by ‖∆(t)‖^2 = Re tr(W_0^H W_0). The right-hand side of this equation corresponds to ‖∆(0)‖^2, so that the magnitude of the running tangent vector is the constant ‖∆(t)‖ = ‖∆(0)‖. Substituting this into (140) gives the running length of the curve as L(t) = t ‖∆(0)‖. If X_1 is an arbitrary point near X_0, and ∆ = exp_X0^{−1} X_1, then the geodesic distance between the points is d(X_0, X_1) = ‖∆‖. The equality of the distance between points and the magnitude of the corresponding tangent vector will be used later in analyzing the subspace estimation problem.
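The equality d(X_0, X_1) = ‖∆‖ can be sketched numerically for nearby points: the connecting tangent is recovered with the principal matrix logarithm (scipy's `logm`), and its magnitude matches the geodesic distance. The step size 0.1 keeps X_1 close to X_0 so the principal log is the correct inverse.

```python
import numpy as np
from scipy.linalg import expm, logm

rng = np.random.default_rng(4)
N = 3
A = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
W = 0.1 * (A - A.conj().T)               # small anti-Hermitian step
X0, _ = np.linalg.qr(A)                  # a point of U(N)
X1 = X0 @ expm(W)                        # nearby point along the geodesic

# inverse exponential map: recover the connecting tangent, W = logm(X0^H X1)
W_rec = logm(X0.conj().T @ X1)

# geodesic distance = magnitude of the connecting tangent vector
d = np.sqrt(np.real(np.trace(W_rec.conj().T @ W_rec)))
assert np.isclose(d, np.linalg.norm(W))  # matches ||Delta|| (Frobenius norm)
```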

Signal Processing Manifold Representations
Having described the geometry of the Unitary group manifold we next consider the related spaces that naturally arise in signal processing applications. These spaces may be described in two different, but equivalent, ways.
In section 2.2.4 the spaces of interest are considered as quotient manifolds on U(N). This approach is useful for obtaining closed-form expressions for the geometrical objects of interest. The Stiefel manifold, denoted St_(N,p), is represented as a quotient space within U(N) as St_(N,p) = U(N)/U(N−p). Each point of St_(N,p) represents a subset of U(N) defined by the equivalence relation (148). Thus two points in U(N) represented by matrices X_0, X_1 are in the same Stiefel equivalence class (X_1 ∼ X_0), and therefore represent the same point on the Stiefel manifold, if X_1 = X_0 [[I_p, 0], [0, Q]] for some unitary matrix Q ∈ U(N−p). This quotient manifold is sometimes written as St_(N,p) = U(N)/∼ to suggest that elements in the same equivalence class are "divided" or "quotiented" out, where ∼ is understood. Variations on this equivalence relation lead to the Basis and Grassmann manifolds discussed in the following sections.
In section 2.2.7 we consider more concrete manifold representations that are useful in numerical applications. For numerical signal processing applications, a single representative of an equivalence class must necessarily be chosen. In this alternate concrete representation, a point on the Stiefel manifold St_(N,p) is represented by an N × p complex matrix, and the manifold is defined as the collection St_(N,p) = {Y ∈ C^{N×p} : Y^H Y = I_p}.

Quotient Space Representation of the Complex Stiefel Manifold
Representing St_(N,p) as a quotient space of U(N) (147) allows the geometric properties of St_(N,p) to be derived from the known geometry of U(N) (e.g., geodesics defined by the exponential map). The key ideas here are the partitioning of the unitary group tangent spaces into horizontal and vertical spaces, the metric, and their relationship to geodesics.
Referring to (147), the dimension of St_(N,p) is given by the difference dim(St_(N,p)) = dim(U(N)) − dim(U(N−p)) = N^2 − (N−p)^2. A point in the Stiefel manifold is a particular subset of the unitary matrices. In particular, the equivalence class [X] (148) is the set of all N × N unitary matrices with the same first p columns as X. The set [X] is a Riemannian submanifold of U(N), termed a fiber submanifold. To introduce the decomposition of the tangent space in the quotient space setting, we write the N × N unitary matrix X_0 in partitioned form as X_0 = [Y_0 Y_0⊥], where Y_0 and Y_0⊥ correspond to the first p and the remaining (N − p) columns of X_0, respectively. Then, for any X_1 ∈ U(N) partitioned in the same way, X_1 ∼ X_0 in the Stiefel sense if the first p columns of X_1 and X_0 are equal: that is, if Y_1 = Y_0.
Although we introduce the partitioning notation in this section, we are working with equivalence classes and an N × N class representative rather than an N × p matrix representation. The latter, concrete method of representation, applicable to the Stiefel manifold, is considered in section 2.2.7.

Quotient Tangent Space
A tangent vector has the general form ∆ = X_0 W (135). Writing W in partitioned form as W = [[A, −B^H], [B, C]], where A is p × p anti-Hermitian, C is (N − p) × (N − p) anti-Hermitian, and B is (N − p) × p arbitrary complex, and using (153), the tangent vector ∆ = X_0 W takes the form ∆ = [Y_0 Y_0⊥] [[A, −B^H], [B, C]]. The tangent space T_X0 U(N) can be partitioned into two complementary subspaces based on this tangent vector form and the properties of the exponential map. These spaces, described below, are called the vertical and horizontal spaces and denoted respectively V_X0 and H_X0. The tangent space T_X0 U(N) then may be expressed as the direct sum T_X0 U(N) = V_X0 ⊕ H_X0; see Figure 5. The vertical fiber submanifold is the set [X] = exp_X Φ, Φ ∈ V_X. All points on this fiber belong to the same equivalence class and therefore represent the same point of the quotient manifold. The metric and geodesics must all be restricted to the horizontal space.
Vertical Space V_X0: The vertical space, denoted V_X0, is a subspace of T_X0 U(N) consisting of tangent vectors that leave the first p columns of X_0 unchanged under the action of the exponential map. Using the notation (153), vectors in this space have the form Φ = X_0 [[0, 0], [0, C]]. Movement along a geodesic in the direction of any vector in this space stays within the equivalence class (148). To verify this, compute the exponential map X_1 = X_0 expm([[0, 0], [0, C]]), so that X_1 = [Y_0 Y_0⊥ Q_C], where Q_C ≡ expm(C) is unitary since C is anti-Hermitian. Observe that the first p columns of X_1 ≡ [Y_1 Y_1⊥] are left unchanged, while the remaining columns undergo a rotation. The vertical space and the resulting fiber manifold (the set [X] = exp_X Φ) are illustrated in figure 5. The dimension of V_X0 is equal to the degrees of freedom of the (N−p) × (N−p) anti-Hermitian matrix C, that is, dim(V_X0) = (N−p)^2.
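The vertical-space property can be checked numerically: moving along a vertical direction leaves the first p columns of X_0 fixed. The sizes N = 5, p = 2 are illustrative.

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(5)
N, p = 5, 2
A = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
X0, _ = np.linalg.qr(A)                  # point of U(N); first p columns are Y0

Ac = rng.standard_normal((N - p, N - p)) + 1j * rng.standard_normal((N - p, N - p))
C = (Ac - Ac.conj().T) / 2               # anti-Hermitian (N-p) x (N-p) block

# vertical tangent direction Phi = X0 [[0, 0], [0, C]]
Wv = np.block([[np.zeros((p, p)), np.zeros((p, N - p))],
               [np.zeros((N - p, p)), C]])
X1 = X0 @ expm(Wv)                       # exponential map along the fiber

# first p columns (Y0) unchanged: X1 and X0 are in the same Stiefel class,
# while the remaining columns are rotated by Q_C = expm(C)
assert np.allclose(X1[:, :p], X0[:, :p])
assert np.allclose(X1.conj().T @ X1, np.eye(N))
```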

Horizontal Space
The orthogonal complement of V_X0 in T_X0 U(N) is defined as the horizontal space, denoted H_X0. The significance of the horizontal space is that it provides a representation of tangents to the quotient manifold, which in the current discussion is St_(N,p), that is, T_X0 St_(N,p) ≅ H_X0. Movement along a geodesic in an H_X0 direction changes the first p columns, resulting in a different point in the quotient space. As discussed previously, movements in the vertical direction make no change in the quotient space. Therefore, the metric and geodesics must all be restricted to the horizontal space.
Referring to (157) and (155), tangent vectors in H_X0 have the general partitioned matrix form ∆ = X_0 [[A, −B^H], [B, 0]], or equivalently ∆ = [Y_0 A + Y_0⊥ B, −Y_0 B^H]. The unitary group geodesic corresponding to an arbitrary tangent vector in H_X0 is evidently X(t) = X_0 expm(t [[A, −B^H], [B, 0]]), or in partitioned form [Y(t) Y_⊥(t)] = [Y_0 Y_0⊥] expm(t [[A, −B^H], [B, 0]]). The tangent vector ∆(t) ∈ H_X(t) at every point along this curve is ∆(t) = X(t) [[A, −B^H], [B, 0]]. The curve defined by (167), a geodesic in U(N), is a geodesic in the quotient space St_(N,p) as well. We will return to this formula when deriving an expression for geodesics with respect to the concrete representation of the Stiefel manifold in section 2.2.7.
The dimension of H_X0 is equal to the degrees of freedom of the p × p anti-Hermitian matrix A plus the degrees of freedom of the arbitrary complex (N−p) × p matrix B, so that dim(H_X0) = p^2 + 2p(N−p). Note that dim(V_X0) + dim(H_X0) = (N−p)^2 + p^2 + 2p(N−p) = N^2 = dim(T_X0 U(N)). The partitioning of T_X0 U(N) into vertical and horizontal subspaces (156) is dependent on the choice of equivalence relation. Different equivalence relations result in different partitions. This will be evident when we consider the related Basis and Grassmann manifold formulations (see Table 2).

Partitioning the Horizontal space
In later signal processing applications we wish to distinguish between pure rotations and motions which change the subspace. It is therefore useful to partition the horizontal space H_X0 into a rotation space and its complement. Referring to (154), an arbitrary element of H_X0 can be expressed as the sum ∆ = X_0 [[A, 0], [0, 0]] + X_0 [[0, −B^H], [B, 0]]. The space A_X0 is defined as the subspace of H_X0 consisting of vectors of the first form, X_0 [[A, 0], [0, 0]]. The dimension of this space, denoted N_a, is equal to the degrees of freedom in a p × p skew-Hermitian matrix, so that N_a = p^2. The complement of A_X0 in H_X0, denoted B_X0, consists of vectors of the second form, X_0 [[0, −B^H], [B, 0]]. Since B is an arbitrary (N − p) × p complex matrix, the real dimension of this vector space is N_b = 2p(N−p). By construction, H_X0 = A_X0 ⊕ B_X0. Referring to (169), (173) and (175), we note that the spaces are dimensionally correct: N_a + N_b = p^2 + 2p(N−p) = dim(H_X0). The exponential map corresponding to tangent vectors in A_X0 is X(t) = X_0 expm(t [[A, 0], [0, 0]]), and in partitioned form the first p columns are Y(t) = Y_0 expm(tA). The last N − p columns are unchanged. Motions along A_X0 directions (178) represent pure rotations of the p columns of Y_0; there is no change in the subspace.
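The rotation-only character of A_X0 directions can be checked numerically: the motion changes the basis Y_0 but not the subspace projector. The sizes N = 5, p = 2 are illustrative.

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(6)
N, p = 5, 2
M = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
X0, _ = np.linalg.qr(M)                  # point of U(N)
Y0 = X0[:, :p]                           # first p columns

Ap = rng.standard_normal((p, p)) + 1j * rng.standard_normal((p, p))
A = (Ap - Ap.conj().T) / 2               # p x p anti-Hermitian block

# tangent direction in A_X0: Delta = X0 [[A, 0], [0, 0]]
WA = np.block([[A, np.zeros((p, N - p))],
               [np.zeros((N - p, p)), np.zeros((N - p, N - p))]])
Y1 = (X0 @ expm(WA))[:, :p]              # first p columns after the motion

# Y1 = Y0 expm(A): a pure rotation of the basis columns ...
assert np.allclose(Y1, Y0 @ expm(A))
# ... with the subspace projector, hence the subspace itself, unchanged
assert np.allclose(Y1 @ Y1.conj().T, Y0 @ Y0.conj().T)
```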
The significance of the B_X0 space is that it corresponds to motions that alter the subspace spanned by the first p columns of X_0. The exponential map corresponding to tangent vectors in B_X0 (174) is given in partitioned form as [Y(t) Y_⊥(t)] = [Y_0 Y_0⊥] expm(t [[0, −B^H], [B, 0]]). The first-order approximation of this map is Y(t) ≈ Y_0 + t Y_0⊥ B. Basis for A_X0 and B_X0: An orthonormal basis for A_X0 has the form e_k = X_0 [[A_k, 0], [0, 0]], k = 1, . . . , N_a, where the indexed set of p × p matrices A_k are anti-Hermitian and furthermore satisfy the orthonormality condition Re tr(A_j^H A_k) = δ_jk. Similarly, an orthonormal basis for B_X0 has the form e_{N_a+k} = X_0 [[0, −B_k^H], [B_k, 0]], k = 1, . . . , N_b, where the indexed set of (N − p) × p matrices B_k are arbitrary complex and satisfy the orthonormality condition 2 Re tr(B_j^H B_k) = δ_jk. Any tangent vector ∆ ∈ H_X0 can be represented with respect to the above basis vectors as ∆ = Σ_k e_k α_k + Σ_k e_{N_a+k} β_k for some real N_a-tuple α and some real N_b-tuple β. For a given ∆ the coefficients α and β are computed in the standard way, α_k = ⟨e_k, ∆⟩ and β_k = ⟨e_{N_a+k}, ∆⟩. If we define A = Σ_k α_k A_k and B = Σ_k β_k B_k, then we recover the A and B matrices appearing in the tangent vector form (165).

Example: Basis Vectors for Stiefel Manifold Tangent Space
For p = 2, N arbitrary, we have dim(A_X0) = N_a = p^2 = 4 and dim(B_X0) = N_b = 2p(N−p) = 4(N−2). For this case the basis matrices A_k appearing in (183) are four orthonormal 2 × 2 anti-Hermitian matrices. The standard set of B_k matrices appearing in (185) are formed by reshaping the vectors i_k (for the real parts) and i·i_k (for the imaginary parts) into (N−p) × p matrices, where i_k is a real p(N−p) × 1 unit vector of all zeros and a 1 in the kth location. For N = 5 we have N_b = 12; in this case the k = 2 basis matrix is formed from i_2, and the k = 2 + N_b/2 = 8 basis matrix from i·i_2. Relation between inner product, norm, and metric: The canonical metric on the Stiefel manifold is the restriction of the unitary group metric (110) to the horizontal space H_X0. Any element ∆ ∈ H_X0 can be represented as (187) or equivalently (165).
The squared norms corresponding to these two representations are given by (196) and (199). The components α_k and β_k are related to the A and B matrices appearing in the (187) representation through the basis expansions, so that ‖β‖^2 = 2 Re tr(B^H B), and the expressions for the squared norm given by (199) and (196) are equivalent. The canonical metric on the Stiefel manifold is then simply the restriction of the unitary group metric to the horizontal space, where P_H is the projector onto H_X0.

Basis Manifold
The Basis manifold is defined as the quotient U(N)/(D_p × U(N − p)), with each point being the equivalence class given by (207), where D_p is the space of p × p unitary diagonal matrices. Elements of D_p have the form diag(exp(iθ_1), ..., exp(iθ_p)) for θ_k real, so that dim(D_p) = p. A point in the Basis manifold is a particular subset of the unitary matrices, and the Basis manifold itself is the collection of all these subsets. The dimension of B_(N,p) is given by the difference dim U(N) − dim(D_p × U(N − p)). The vertical space, V_X0, consists of vectors of the form shown, where the elements M_kk are pure imaginary. The dimension of V_X0 is equal to the degrees of freedom of C ∈ U(N − p) plus the degrees of freedom of M ∈ D_p, that is, (N − p)^2 + p. The dimension of the vertical space for the Basis manifold formulation is thus increased by p relative to the Stiefel manifold case (compare eqns. (209) and (157)). Accordingly, the dimension of the horizontal space H_X0 is decreased by p relative to the Stiefel manifold case.
As in the Stiefel manifold case, it is useful to partition T_X0 B_(N,p) ≡ H_X0 into two component spaces as in (176). The space B_X0 is defined in exactly the same way as in the Stiefel manifold case (174). The space A_X0 is defined similarly to the Stiefel manifold case, with the exception that the directions generating unitary diagonal rotations have been removed (these are exactly the directions that appear in the vertical space V_X0 for the Basis manifold case (209)).

Example: Basis Vectors for Basis Manifold Tangent Space
Referring to example 2.2.4, the basis elements for the A_X0 space are given below. The basis elements A_3 and A_4, appearing in the complex Stiefel manifold example, do not appear in the Basis manifold case since they generate unitary diagonal rotations and, as such, belong to the Basis manifold vertical space V_X0.
In general, the standard basis for the Basis manifold A_X0 space is found by removing the p diagonal basis elements (the generators of unitary diagonal rotations) that appear in the Stiefel manifold A_X0 space (note that we are using the same symbol A_X0 to denote two different spaces as they appear in the Stiefel and Basis manifold formulations). The dimension of the Basis manifold A_X0 space is therefore reduced by p relative to the Stiefel formulation (173), so that N_a = p^2 − p. The basis elements for B_X0 are identical to those in the Stiefel case (192), as this space is unchanged.
The canonical metric on the Basis manifold is the restriction of the unitary group metric to the horizontal space

Grassmann Manifold
The Grassmann manifold is defined as the quotient U(N)/(U(p) × U(N − p)), with each point an equivalence class. A point in the Grassmann manifold is a particular subset of the unitary matrices, and the Grassmann manifold itself is the collection of all these subsets. The vertical space at a point X_0 consists of vectors of the partitioned form in which A is p × p anti-Hermitian and C is (N − p) × (N − p) anti-Hermitian. We see that the first p columns and the last N − p columns undergo independent rotations. The N × p matrices Y_0 and Y are in the same Grassmann equivalence class, so that Y ∼ Y_0. Note that the dimension of the vertical space for the Grassmann manifold formulation is increased by p^2 relative to the Stiefel manifold case (compare eqn. (157)).
The horizontal space H X 0 at X 0 consists of the tangent vectors of the form The A X 0 subspace of H X 0 appearing in different forms in the Stiefel and Basis manifold cases is empty here since these rotations, (219), now belong to the vertical space V X 0 .
The canonical metric on the Grassmann manifold is the restriction of the unitary group metric (121) to the horizontal space H X 0 = T X 0 G, that is where P H X 0 is the projection onto H X 0 ; for the Grassmann manifold H X 0 ≡ B X 0 .
The table below summarizes the dimensions of the various spaces encountered above for each of the manifolds; in each case the vertical and horizontal dimensions sum to dim U(N) = N^2, as expected.
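The dimension bookkeeping for the three formulations can be tabulated directly; the sketch below (function and key names are ours) encodes the counts stated above and checks that vertical plus horizontal dimensions always sum to dim U(N) = N^2:

```python
def space_dims(N, p):
    """Real dimensions of the vertical, A and B spaces at a point, for the
    three quotient formulations of U(N). Vertical + A + B = N^2 in each case."""
    return {
        # Stiefel: U(N)/U(N-p)
        "Stiefel": dict(vertical=(N - p) ** 2,
                        A=p ** 2,                  # p x p anti-Hermitian block
                        B=2 * p * (N - p)),        # complex (N-p) x p block
        # Basis: U(N)/(D_p x U(N-p)) -- p diagonal rotations moved to vertical
        "Basis": dict(vertical=(N - p) ** 2 + p,
                      A=p ** 2 - p,
                      B=2 * p * (N - p)),
        # Grassmann: U(N)/(U(p) x U(N-p)) -- all p^2 rotations moved to vertical
        "Grassmann": dict(vertical=(N - p) ** 2 + p ** 2,
                          A=0,
                          B=2 * p * (N - p)),
    }

dims = space_dims(8, 3)
```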

Concrete Representation of the Stiefel Manifold
In this section the Stiefel manifold is represented as the collection of all N × p semi-unitary matrices, viewed as a subset of C^(N×p). This is in contrast to the quotient space representation, where points on the manifold were identified with subsets of U(N) ⊂ C^(N×N). For this complex case the Stiefel manifold is defined in concrete form as the set St_(N,p) = {Y ∈ C^(N×p) : Y^H Y = I_p}.

Tangent Space
In the quotient manifold approach the ambient space was N^2-dimensional and tangent vectors were represented as N × N matrices of the form (171). Introducing the notation ∆ ≡ Ẏ, this becomes an N × p representation. An arbitrary N × p matrix ∆ ∈ T_Y C^(N×p) may be decomposed as (226) for A a p × p anti-Hermitian matrix, S a p × p Hermitian matrix, B an arbitrary (N − p) × p complex matrix, and Y^H Y_⊥ = 0. The tangent space T_Y St_(N,p) ⊂ T_Y C^(N×p) is constrained by the tangency condition (225). Substituting (226) into (225) shows that the symmetric part of (226) must vanish, that is, S ≡ 0. Elements of T_Y St_(N,p) therefore have the form (227). This form is the same as found by selecting the first p columns of the quotient space tangent vector representation (165). The space T_Y St_(N,p) partitions as before (176), where A_Y and B_Y are defined by selecting the first p columns in the definitions (172) and (174), respectively. The symmetric component of (226), Y S, is normal to the embedded manifold in the ambient space; the collection of all such vectors is the normal space, and the total space decomposes as the direct sum of the tangent and normal spaces. Note that for p = 1, Y is an N × 1 unit vector and S is a real scalar.

Canonical Metric
A metric on a manifold is a smoothly varying mapping that at each point satisfies the standard inner product conditions (see section A.3.2). The standard Euclidean metric on M = C^(N×p), viewed as a vector space over the reals, is given by (231); the scale factor 2 has been included for convenience, for reasons that will become evident below. The subscript "e" stands for Euclidean. Substituting the tangent vector ∆ ∈ T_Y St_(N,p) form given by (227) into (231) yields (232). This is not the metric selected for St_(N,p) since it counts each independent element of A twice. To see this, consider the 2 × 2 anti-Hermitian matrix with off-diagonal element a: the independent element a is counted twice in tr(A^H A). To correct this, the canonical metric on St_(N,p), denoted g_c, is instead selected as (235), or equivalently (236). For an arbitrary ∆ ∈ T_Y St_(N,p) given by (227) this yields (236). Comparing the above result (236) to (199), the metric for the quotient manifold formulation, we see that the two forms are identical. Therefore, the formulas for geodesics for the Stiefel manifold given in section 2.2.4 remain correct if we view the Stiefel manifold as the set of semi-unitary N × p matrices with the metric g_c (235). The canonical metric g_c is thus not simply the restriction of g_e to the submanifold.
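A short numerical check of the double-counting argument, assuming (235) takes the standard canonical-metric form g_c(∆, ∆) = 2 Re tr(∆^H (I − ½ Y Y^H) ∆), consistent with (236) and (199):

```python
import numpy as np

rng = np.random.default_rng(1)
N, p = 7, 3

Q, _ = np.linalg.qr(rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N)))
Y, Yperp = Q[:, :p], Q[:, p:]

A = rng.standard_normal((p, p)) + 1j * rng.standard_normal((p, p))
A = 0.5 * (A - A.conj().T)                      # anti-Hermitian block
B = rng.standard_normal((N - p, p)) + 1j * rng.standard_normal((N - p, p))
D = Y @ A + Yperp @ B                           # tangent vector of the form (227)

retr = lambda M: float(np.real(np.trace(M)))

g_e = 2 * retr(D.conj().T @ D)                  # Euclidean metric: counts A twice
g_c = 2 * retr(D.conj().T @ (D - 0.5 * Y @ (Y.conj().T @ D)))  # canonical metric

# Block expressions the two metrics should reduce to:
blk_e = 2 * retr(A.conj().T @ A) + 2 * retr(B.conj().T @ B)
blk_c = retr(A.conj().T @ A) + 2 * retr(B.conj().T @ B)
```

The two asserted identities hold exactly; the difference g_e − g_c = Re tr(A^H A) is precisely the double-counted rotational part.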
For simplicity we will sometimes use the alternate angle-bracket notation for the metric, ⟨∆, ∆⟩ ≡ g_c(∆, ∆), where any dependence on the base point Y has been suppressed.

Geodesics (Stiefel)
Let Y(t) denote an arbitrary smooth curve in St_(N,p) between the points Y_0 and Y_1. The length of this curve is given by the integral of the speed, where Ẏ = dY/dt is the tangent to the curve at Y(t).
Direct substitution verifies that the general solution is given by (240). As anticipated, this is identical to the geodesic derived from the quotient manifold formula and given by (167).
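Since the geodesic formula (240) appears above only through the quotient construction, the sketch below assumes the standard form Y(t) = (X_0 expm(tW)) I_{N,p}, with W the partitioned horizontal generator, and checks the constant-speed property that collapses the length integral to ‖∆‖ (names ours):

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(2)
N, p = 6, 2
X0, _ = np.linalg.qr(rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N)))

A = rng.standard_normal((p, p)) + 1j * rng.standard_normal((p, p))
A = 0.5 * (A - A.conj().T)                     # anti-Hermitian rotation block
B = rng.standard_normal((N - p, p)) + 1j * rng.standard_normal((N - p, p))
W = np.zeros((N, N), dtype=complex)
W[:p, :p], W[p:, :p], W[:p, p:] = A, B, -B.conj().T

def Y(t):
    return (X0 @ expm(t * W))[:, :p]

def Ydot(t):
    """Exact tangent along the curve: d/dt of X0 expm(tW), first p columns."""
    return (X0 @ expm(t * W) @ W)[:, :p]

def g_c(Yt, D):
    """Canonical metric with the factor-2 convention (235)."""
    return 2 * float(np.real(np.trace(D.conj().T @ (D - 0.5 * Yt @ (Yt.conj().T @ D)))))

speeds = [g_c(Y(t), Ydot(t)) for t in np.linspace(0.0, 1.0, 5)]
length = np.sqrt(speeds[0])    # constant speed => length on [0,1] is ||Delta||_c
```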

Basis Manifold as a Quotient Manifold of the Stiefel Manifold
The Basis manifold can be naturally defined as a quotient of the above concrete Stiefel manifold formulation. Recalling (207), the Basis manifold is represented as the quotient St (N,p) /D p with each point being an equivalence class given by A point in the Basis manifold is thus a particular subset of the N × p semi-unitary matrices, and the Basis manifold itself is the collection of all these subsets. When performing computations on the Basis manifold we will use an N × p matrix Y to represent an entire equivalence class; any representative of the class will do.
The vertical space, a subspace of T_Y0 St_(N,p), corresponds to motions that remain within the set (242) and consists of tangent vectors generated by diagonal matrices with purely imaginary entries. This collection defines the vertical space V_Y0 at Y_0. The orthogonal complement is the horizontal space, H_Y0. The significance of the horizontal space is that it provides a representation of tangents to the quotient manifold, which in the current discussion is B_(N,p). Movements in the vertical direction make no change in the quotient space; therefore, the metric and geodesics must all be restricted to the horizontal space. The Grassmann manifold can be defined as a quotient manifold of St_(N,p) in a similar way.

Subspace Angles, Subspace Distance and Geodesic Distance
The principal angles between subspaces (PABS), also called canonical angles, provide a measure of the difference between two subspaces. We begin by stating two alternate and equivalent definitions of PABS. Next we relate the root sum squared (RSS) of the PABS, defined as the subspace distance, to the natural (geodesic) distance between points on the signal processing manifolds of interest (i.e., Grassmann, Basis and Stiefel).
The acute angle γ between two real unit vectors x and y is defined implicitly as cos γ = ⟨x, y⟩. This definition can be recursively extended to PABS; see, e.g., [2, 10, 11].
Definition 1: Recursive PABS. Let X ∈ C^(N×p) and Y ∈ C^(N×p). The principal angles γ_k between X and Y are recursively defined by maximizing cos γ_k = ⟨x, y⟩ over x ∈ col(X) and y ∈ col(Y), subject to ‖x‖ = 1, ‖y‖ = 1, and ⟨x_i, x⟩ = 0 and ⟨y_i, y⟩ = 0 for all i < k. The notation x ∈ col(X) indicates that x is an N × 1 vector in the column space of X, col(X). The inner product is ⟨x, y⟩ = Re(x^H y).
Remark: Denote the pair that maximizes the first iteration as (x_1, y_1). The second iteration maximizes ⟨x, y⟩ over the set of all x ∈ col(X), y ∈ col(Y) subject to the constraints ⟨x_1, x⟩ = 0 and ⟨y_1, y⟩ = 0.
An alternate definition of PABS, from [10,11], that is useful for computations is based on the SVD of the product X H Y . It is stated below in the form of a theorem.
Theorem 2 (PABS from SVD of X^H Y). Let the SVD of X^H Y be X^H Y = U C V^H, where U and V are unitary matrices and C is a p × p diagonal matrix with real diagonal elements C_kk in nonincreasing order. Then the subspace angles γ_k between col(X) and col(Y) from Definition 1 are equal to γ_k = arccos(C_kk). Note that for the special case p = 1, with X and Y real N × 1 unit vectors, this reduces to the standard definition of the angle between two vectors.
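Theorem 2 translates directly into a few lines of numerical linear algebra; the helper names below are ours:

```python
import numpy as np

def pabs(X, Y):
    """Principal angles between col(X) and col(Y) via the SVD of X^H Y
    (Theorem 2). Columns of X and Y are assumed orthonormal."""
    s = np.linalg.svd(X.conj().T @ Y, compute_uv=False)
    return np.arccos(np.clip(s, -1.0, 1.0))   # clip guards against roundoff

def subspace_dist(X, Y):
    """RSS of the principal angles: the subspace distance of (251)."""
    return float(np.linalg.norm(pabs(X, Y)))

rng = np.random.default_rng(3)
N, p = 8, 3
X, _ = np.linalg.qr(rng.standard_normal((N, p)) + 1j * rng.standard_normal((N, p)))
Y, _ = np.linalg.qr(rng.standard_normal((N, p)) + 1j * rng.standard_normal((N, p)))
```

Note that pabs is invariant to a rotation of either frame, which is exactly why it measures the subspace and not the basis.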
The subspace distance between X and Y is defined in terms of this p-tuple of angles, γ, as the RSS value dist_SS(X, Y) = ‖γ‖. This p-tuple may also be used to define several other "distances" between two subspaces [?, page 337].
For the Grassmann manifold formulation this distance is exactly equal to the geodesic distance. For the Stiefel and Basis manifold formulations the situation is more complicated. In the next section we show that for these formulations dist_SS(Y_1, Y_0) corresponds to the geodesic distance between Y_0 and an appropriately defined fiber submanifold over Y_1. This submanifold corresponds to the equivalence class containing Y_1 in the Grassmann formulation.
The rotational distance defined below for the Stiefel and Basis manifolds recognizes that on these manifolds X and XQ, where Q is an arbitrary unitary matrix, are different points.
Definition: Stiefel and Basis Manifold Rotational Angles. Let the SVD of X^H Y be given by (248) and let Q = U^H V. The rotational angles, denoted ν_k, between X and Y are defined through ρ_k, the kth singular value of Q for the Stiefel manifold and the kth singular value of (Q − diag(Q)) for the Basis manifold. The rotational distance is defined as the RSS of the rotational angles. Note that in the Basis manifold formulation Y ∼ Y Λ_Q for Λ_Q unitary diagonal; the definition of rotational angles using (Q − diag(Q)) for the Basis manifold accounts for this equivalence.

Subspace Distance on Grassmann and Basis Manifolds
For the Grassmann manifold, the subspace distance (251) is identical to the geodesic distance between points [1]. In all the manifold formulations the geodesic distance between two points, Y_1 and Y_0, is given by the magnitude of the tangent vector ∆ = exp^{-1}_{Y_0} Y_1 (146). Note that the meaning of this inverse operation depends on the manifold formulation.
We first establish that the subspace distance corresponds to the geodesic distance in the Grassmann manifold formulation of the problem. This is done most easily using the CS decomposition form of the exponential map discussed below. This form was encountered earlier in the discussion of the sphere in section 2.1.
Theorem 3 (Equivalence of forms: CS decomposition). The exponential (geodesic) map (254) is equal to the CS decomposition form (256), where U, V and Σ are defined by the compact SVD of ∆ (257). Using the standard matrix exponential identity expm(U Λ U^H) ≡ U expm(Λ) U^H, the matrix exponential in (254) can be evaluated in closed form. Simplifying yields the desired result (256), where we have used the fact that U = Y_0⊥ U_1 (verified by premultiplying both sides of (258) by Y_0⊥ and noting (257)).

Using the CS form (255), with the left-hand side given by Y_1, the product Y_0^H Y_1 takes the form V cos(Σ) V^H. Since the subspace angles γ_k are defined as the arccos of the singular values of Y_0^H Y_1, see (249), we conclude that γ_k = σ_k. The subspace distance (251) can therefore equivalently be written in terms of these singular values. The geodesic distance between Y_0 and Y_1 is equal to the magnitude of the tangent vector ∆, which, expressed in terms of its SVD (257), yields the same quantity. Therefore the subspace distance defined by (251) is equal to the geodesic distance between Y_0 and Y_1. Finally, the result may be expanded with respect to an arbitrary orthonormal basis b_k for the tangent space.
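The CS-decomposition argument can be verified numerically: constructing Y_1 from the CS form with prescribed σ_k, the singular values of Y_0^H Y_1 come back as cos σ_k, so the principal angles equal the σ_k (a sketch in our notation):

```python
import numpy as np

rng = np.random.default_rng(4)
N, p = 8, 3
Q, _ = np.linalg.qr(rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N)))
Y0, Y0perp = Q[:, :p], Q[:, p:]

# Horizontal tangent Delta = U Sigma V^H with U orthogonal to Y0 (compact SVD (257)).
sigma = np.array([0.9, 0.5, 0.2])                 # the angles we intend to induce
M = rng.standard_normal((N - p, p)) + 1j * rng.standard_normal((N - p, p))
U1, _, V1h = np.linalg.svd(M, full_matrices=False)
U = Y0perp @ U1                                   # N x p with U^H Y0 = 0
V = V1h.conj().T

# CS-decomposition form of the exponential map (256):
Y1 = (Y0 @ V @ np.diag(np.cos(sigma)) @ V.conj().T
      + U @ np.diag(np.sin(sigma)) @ V.conj().T)

svals = np.linalg.svd(Y0.conj().T @ Y1, compute_uv=False)
gamma = np.arccos(np.clip(svals, 0.0, 1.0))       # principal angles, as in (249)
```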

Basis Manifold
In contrast to the Grassmann manifold case, the Basis manifold formulation does not "quotient out" rotations, so the matrices Y_1 and Y_1 Q are represented by different points on the manifold. While dist_G(Y_1, Y_1 Q) = 0, this is not true for the Basis manifold. In the distance expression (272), P_A and P_B are the orthogonal projectors onto the spaces A and B, respectively. Note that this distance (272) includes a measure of the rotation between the points and so does not correspond to the desired subspace distance measure.
Rather, the subspace distance is equal to the minimum geodesic distance between the point Y_0 and an appropriately defined fiber submanifold over the point Y_1. This fiber submanifold, denoted A_Y1, consists of the points reached via rotations of Y_1; that is, all points of the form Y_1 Q for Q an arbitrary p × p unitary matrix. All Y ∈ A_Y1 have identical column spans and therefore represent the same subspace as Y_1. In the Grassmann manifold formulation A_Y1 is the equivalence class containing Y_1; in the Basis manifold formulation the elements of A_Y1 are regarded as different points.
Using this definition, the subspace distance for the Basis manifold case is equal to the minimum geodesic distance from Y_0 to the fiber A_Y1. The above formulation also yields the rotational distance as the geodesic distance between the intermediate point Y_g0 and the end point Y_1. In the following section we introduce an alternate coordinate system and establish these results.

Alternate Non-Normal Coordinate System
Geodesic distances on manifolds are invariants, independent of any particular coordinate system. Although coordinate independent, the subspace distance for the Stiefel/Basis manifold case is most easily analyzed using the non-normal coordinate system introduced in this section. This system approximates a normal coordinate system to first order, in the sense discussed below.
These coordinates are developed from a two-leg path between Y_0 and Y_1 by way of an intermediate point. Substituting (282) into the above yields the path expression, where we have used the fact that the rotation Q = expm(Â) can be expanded in a power series. Using the standard basis matrices A_k, B_k, (191) and (192), define the real parameters (α̂, β̂). We thus have a parameterization, or coordinates, for arbitrary points Y_1 ∈ B near Y_0.
Denote the implied coordinate mapping accordingly. Equation (284) defines a two-leg path from Y_0 to Y_1 in which each leg is a geodesic path. The first leg travels from Y_0 to Y_g in the direction of the tangent vector; the second leg, from Y_g to the destination Y_1, is a pure rotation (283).
These (α̂, β̂) coordinates may be related to a Riemannian normal coordinate representation as follows. If Y_1 = exp_{Y_0} ∆, then by definition this exponential map has the numerical form (291), where A and B are defined by (190). The Riemannian normal coordinates of Y_1 are (α, β), with the implied coordinate mapping denoted analogously. The (α, β) and (α̂, β̂) coordinates are related through (288) and (292). Using a fixed set of basis matrices A_k and B_k, the transformation between the matrix forms is notionally expressed in a similar way. To develop the explicit functional relationship for (293), note that since Y_1 is arbitrary, from (291) and (284) we may equate the two exponential-map representations. Recall that the exponential of a sum and a product of exponentials generally differ, so in general A ≠ Â and B ≠ B̂; equality holds, however, in the special case Â = 0 and in the special case B̂ = 0.
The following theorem shows that the Jacobian of the transformation (293) between these two non-linearly related coordinate systems is the identity matrix.

Theorem 4 (Jacobian of the Transformation)
Let (α, β) and (α̂, β̂) be the coordinates defined above. Then the Jacobian matrix of the transformation is the identity. Proof: The Baker-Campbell-Hausdorff (BCH) formula expresses expm(U)expm(V) as the exponential of U + V plus higher-order Lie bracket terms, where the Lie bracket of two matrices is [U, V] = UV − VU. Identifying U and V with the generators of the two legs and using this expression in (299), the matrix argument of the two-leg path form (284) is obtained. Equating the matrix arguments of (291) and (284), and using (303), yields the relation (305) between (A, B) and (Â, B̂), in which the difference terms are second order. Expressing this in expanded form with respect to the same basis matrix set gives ∂α_j/∂α̂_k = δ_jk, and an analogous result holds for the β coordinates. Writing this in matrix form yields the desired result.

The point Y_g used in the coordinate construction was defined as a point on the fiber A_Y1.

Lemma 5 (The tangent vector to the closest point on A_Y1 lies in B_Y0). The coefficients A and B define the normal coordinates of Y_1, and the squared distance (313) follows from (272) and (312). In terms of the non-normal coordinates, all points on A_Y1 have the form (Â, B̂), where B̂ is constant on A_Y1. Using the coordinate relation (306), the factor B^H B appearing in (313) can be expressed in terms of B̂ and Â (316).
In developing the above we have used the fact that Re tr(B̂^H B̂ Â) = 0 (317), since Re tr(SA) = 0 for S Hermitian and A anti-Hermitian. Substituting (316) into the distance formula (313) yields the distance as a function of Â. Since B̂ is fixed for all Y ∈ A_Y1, we conclude that the distance is minimized when Â = 0, and therefore, from (305), the normal coordinate parameter A = 0. Therefore P_A ∆_g0 = 0 and so P_B ∆_g0 = ∆_g0; that is, ∆_g0 ∈ B_Y0.
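The BCH argument underlying Theorem 4 can be checked numerically: composing the two legs and taking the matrix logarithm, the recovered (A, B) blocks agree with (Â, B̂) to first order, with the discrepancy shrinking quadratically in the step size (a sketch in our notation; scipy's expm/logm assumed available):

```python
import numpy as np
from scipy.linalg import expm, logm

rng = np.random.default_rng(5)
N, p = 5, 2

Ahat = rng.standard_normal((p, p)) + 1j * rng.standard_normal((p, p))
Ahat = 0.5 * (Ahat - Ahat.conj().T)            # anti-Hermitian rotation generator
Bhat = rng.standard_normal((N - p, p)) + 1j * rng.standard_normal((N - p, p))

def blocks(Ablk, Bblk):
    W = np.zeros((N, N), dtype=complex)
    W[:p, :p], W[p:, :p], W[:p, p:] = Ablk, Bblk, -Bblk.conj().T
    return W

def normal_coords(eps):
    """Compose the two legs and read off the single-exponential generator
    W = logm(expm(eps*W_B) expm(eps*W_A)); return its A and B blocks."""
    X = expm(blocks(0 * Ahat, eps * Bhat)) @ expm(blocks(eps * Ahat, 0 * Bhat))
    W = logm(X)
    return W[:p, :p], W[p:, :p]

errs = []
for eps in (0.1, 0.05, 0.025):
    A, B = normal_coords(eps)
    errs.append(np.linalg.norm(A - eps * Ahat) + np.linalg.norm(B - eps * Bhat))
```

First-order agreement means each halving of eps should cut the residual by roughly a factor of four, consistent with the identity Jacobian.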

Computing the Tangent Vector /Inverse Exponential Map
Given any two points Y 1 , Y 0 ∈ B there is a two-leg geodesic path between them of the form The tangent vector ∆ ∈ B Y 0 and the rotation matrix Q that define this path are computed as follows.
Expressing the tangent vector in terms of its SVD, ∆ = U Σ V^H, the exponential map from Y_0 to Y_1 (319) can be expressed in the CS decomposition form (329). Let the SVD of the complex p × p matrix Y_0^H Y_1 be given by (323), with the singular values of C_1 in ascending order. Identifying the factors on the r.h.s. of (322) with the l.h.s. of (323) yields the factors V, Σ and Q; the last of these identifications is equivalent to (327), and the diagonal matrix Σ follows from the arccos of the singular values. This completes the specification of the factors V, Σ, Q in the first term of (321) in terms of the SVD of Y_0^H Y_1. The remaining factor U is found by solving (321) for U in terms of the given Y_1 and Y_0 and the factors computed above.

Two-Leg form Algorithm
The algorithm for computing the two-leg path, (319) and (320), between the given points Y_0 and Y_1 is as follows:
1. Compute the SVD of Y_0^H Y_1 (323).
2. Compute Q (327).
3. Compute Σ and S_1.
4. Compute U by solving (321).
5. Compute ∆ = U Σ V^H.
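A runnable sketch of the two-leg decomposition: the first leg is constructed with a standard Grassmann-style log/exp pair (one common realization of the elided steps (321)-(327); the helper names and the arctan-based log formula are our choices, not necessarily the author's exact factorization), after which the rotation of the second leg falls out as Q = Y_g^H Y_1:

```python
import numpy as np

rng = np.random.default_rng(6)
N, p = 8, 3

def rand_frame(n, k):
    Q, _ = np.linalg.qr(rng.standard_normal((n, k)) + 1j * rng.standard_normal((n, k)))
    return Q

Y0, Y1 = rand_frame(N, p), rand_frame(N, p)

# Leg 1: Grassmann-style log from Y0 toward col(Y1), then exp, landing at Yg.
M = Y0.conj().T @ Y1                            # assumed invertible (generic case)
G = (Y1 - Y0 @ M) @ np.linalg.inv(M)            # component "pointing away" from Y0
U, s, Vh = np.linalg.svd(G, full_matrices=False)
Sigma = np.arctan(s)                            # principal-angle magnitudes
Delta = U @ np.diag(Sigma) @ Vh                 # tangent vector Delta = U Sigma V^H
Yg = (Y0 @ Vh.conj().T @ np.diag(np.cos(Sigma))
      + U @ np.diag(np.sin(Sigma))) @ Vh        # CS-form exponential map

# Leg 2: pure rotation from Yg to Y1.
Qrot = Yg.conj().T @ Y1                         # unitary, since col(Yg) = col(Y1)
```

Because Y_g lands in the fiber over Y_1, the residual factor Q_rot is exactly unitary and Y_g Q_rot reproduces Y_1.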

CHAPTER 3 Intrinsic Cramér Rao Bounds for Basis Manifold and Complex Linear Model
The likelihood function that arises in the standard complex linear model may be expressed as a function of an N × p semi-unitary matrix Y. This function is invariant to right multiplication of Y by any p × p unitary diagonal matrix M_p. In the Stiefel manifold formulation the matrices Y and Y M_p are considered different points, and so the Stiefel manifold is "too big" to serve as the proper parameter space for this function. The form of the likelihood function dictates a different quotient manifold formulation in which multiplication by such unitary diagonal matrices has been removed or "quotiented" out. The quotient manifold with this desired feature is specified below and called here the Basis manifold, denoted B_(N,p).

Complex Linear Model
Consider the complex linear model z = H n_1 + n_0, where n_0 ∼ CN_N(0, σ^2 I); that is, n_0 is a normal random complex N-tuple and σ^2 is known. Similarly, n_1 ∼ CN_p(0, P) is a normal random complex p-tuple. In this expression P is an unknown deterministic full-rank Hermitian matrix and represents the correlation matrix of the source symbols n_1. The complex N × p signal matrix H ∈ C^(N×p) is a deterministic unknown of full rank. We refer to this case as unconstrained since there are no constraints other than the ranks of H and P. In section 3.8 we consider the case when H is constrained in some way.
The observation z, then, is a normal random N-tuple with covariance R = R_s + σ^2 I, where the signal covariance is R_s = H P H^H. Let Z = [z_1, ..., z_K] be a random sampling of (337), so that Z is a complex N × K data matrix whose columns are independent and identically distributed (iid) random vectors. The data matrix Z is an element of the measurement sample space M, and the log-likelihood function (ignoring constants) is expressed through the sample covariance matrix (SCM) R̂ = K^{-1} Z Z^H. Sampling the distribution reveals information about the unknown parameters P and H. It is readily seen from (339) and (340) that this model precludes the unambiguous and simultaneous estimation of the unknowns H and P. To see this, note that for any invertible T, substituting H → H T^{-1} and n_1 → T n_1 into (339) leaves the measurement unchanged. Thus, both the pairs (H, P) and (H T^{-1}, T P T^H) define the same measurement distribution. As noted in [1], "whenever two parameters give rise to the same measurement distribution, they are indistinguishable, in the sense that no argument based on the observed measurement can be used to favor one parameter over the other as estimator". We also observe that, in general, the pairs (H, P) and (H T^{-1}, P) define different measurement distributions.
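The indistinguishability of (H, P) and (H T^{-1}, T P T^H) is a one-line covariance identity, checked numerically below (variable names ours):

```python
import numpy as np

rng = np.random.default_rng(7)
N, p, sigma2 = 6, 2, 0.5

H = rng.standard_normal((N, p)) + 1j * rng.standard_normal((N, p))
P0 = rng.standard_normal((p, p)) + 1j * rng.standard_normal((p, p))
P = P0 @ P0.conj().T + np.eye(p)               # Hermitian positive definite
T = rng.standard_normal((p, p)) + 1j * rng.standard_normal((p, p))  # generically invertible

def covariance(H, P):
    """Measurement covariance R = H P H^H + sigma^2 I of the linear model."""
    return H @ P @ H.conj().T + sigma2 * np.eye(N)

R1 = covariance(H, P)
R2 = covariance(H @ np.linalg.inv(T), T @ P @ T.conj().T)  # transformed pair
```

Since a zero-mean Gaussian is fully determined by its covariance, equal covariances mean identical measurement distributions.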
Since H and H T^{-1} define the same subspace, col(H), it is evident for fixed P that the subspace alone is not sufficient to uniquely specify the distribution for this model.^{1,2} An alternate parameterization of the problem is provided by the eigendecomposition of the signal covariance (340). The "eig" function is viewed here as the transformation or mapping eig: P_p → B × D_p, where P_p is the set of all positive-definite Hermitian matrices and D_p ⊂ P_p is the subset of positive-definite diagonal matrices. Under this transformation the complex linear model (337) is restated as z = Y n_1 + n_0, where n_1 ∼ CN_p(0, Λ) and n_0 ∼ CN_N(0, σ^2 I) as in the original formulation. The set B appearing in (344) and (345) is discussed in the following.
The eigenvector matrix Y and the signal matrix H are related as Y = H T for some nonsingular T, so that col(H) = col(Y). In addition to yielding the subspace defined by col(H), the eigenvector matrix Y also reveals information about the source correlation matrix P. In terms of this alternate parameterization the signal covariance R_s (340) is R_s = Y Λ Y^H, and the full covariance (337) is therefore R = Y Λ_s Y^H + σ^2 Y_⊥ Y_⊥^H, where Y_⊥^H Y = 0 and Λ_s ≡ Λ + σ^2 I.

Footnote 1: Observe that R_s = H P H^H = (H T^{-1})(T P T^H)(H T^{-1})^H = H P H^H.

Footnote 2: [2] "Therefore, only the column span of H and the covariance matrix of z may be measured, and we ask how accurately we are able to estimate this subspace, i.e., the column span of H, in the presence of the unknown covariance matrix P."
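The eigen-parameterization can be checked numerically: taking the top-p eigenpairs of R_s = H P H^H recovers R_s = Y Λ Y^H, and the full covariance assembles as Y Λ_s Y^H + σ^2 Y_⊥ Y_⊥^H (a sketch; names ours):

```python
import numpy as np

rng = np.random.default_rng(8)
N, p, sigma2 = 6, 2, 0.5

H = rng.standard_normal((N, p)) + 1j * rng.standard_normal((N, p))
P0 = rng.standard_normal((p, p)) + 1j * rng.standard_normal((p, p))
P = P0 @ P0.conj().T + np.eye(p)

Rs = H @ P @ H.conj().T                        # rank-p signal covariance (340)
lam, Yfull = np.linalg.eigh(Rs)                # eigenvalues in ascending order
Y, Lam = Yfull[:, -p:], np.diag(lam[-p:])      # top-p eigenpairs: Rs = Y Lam Y^H
Yperp = Yfull[:, :-p]                          # orthogonal complement basis

R_model = H @ P @ H.conj().T + sigma2 * np.eye(N)
R_eig = Y @ (Lam + sigma2 * np.eye(p)) @ Y.conj().T + sigma2 * (Yperp @ Yperp.conj().T)
```

The identity follows because Y Y^H + Y_⊥ Y_⊥^H = I, so the σ^2 terms reassemble into σ^2 I.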

The estimation problem is thus recast in terms of the semi-unitary matrix Y and the diagonal eigenvalue matrix Λ.
A well-posed estimation problem requires enough parameters to completely specify the distribution, but not so many that the problem is ambiguous (by ambiguous we mean different parameter values that yield identical measurement distributions). This was the situation in the initial formulation of the problem, where it was noted that (H, P) and (H T, T^{-1} P T^{-H}) defined the same distribution. For the given log-likelihood function (342) involving complex data, the complex Stiefel manifold is "too big" for the problem in the sense that we can choose different elements of St_(N,p) that yield identical measurement distributions. Suppose that the parameter Λ is fixed. Then both Y and Y Λ_Q, with Λ_Q an arbitrary unitary diagonal matrix, define the same signal covariance R_s and so give rise to the same measurement distribution (same set of numbers). This follows because diagonal matrices commute and Λ_Q Λ_Q^H = I. Since Y and Y Λ_Q are different points on the complex Stiefel manifold, we are led to an alternate quotient manifold defined by the equivalence relation Y ∼ Y Λ_Q. The set of elements in St_(N,p) that are so related defines an equivalence class, denoted Y, and given by (349). All elements in the equivalence class define the same distribution, and so the quotient manifold is the natural parameter space for the estimation problem. The quotient manifold B_(N,p) is formed from the complex Stiefel manifold by "quotienting out" multiplications by unitary diagonal matrices (denoted here as UD(p)).
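The invariance Y ∼ Y Λ_Q is immediate since diagonal matrices commute and Λ_Q Λ_Q^H = I; numerically:

```python
import numpy as np

rng = np.random.default_rng(9)
N, p = 6, 2

Y, _ = np.linalg.qr(rng.standard_normal((N, p)) + 1j * rng.standard_normal((N, p)))
Lam = np.diag(rng.uniform(1.0, 5.0, size=p))                    # fixed eigenvalues
LamQ = np.diag(np.exp(1j * rng.uniform(0, 2 * np.pi, size=p)))  # unitary diagonal

Rs1 = Y @ Lam @ Y.conj().T                     # signal covariance from Y
Rs2 = (Y @ LamQ) @ Lam @ (Y @ LamQ).conj().T   # same covariance: Y ~ Y Lam_Q
```

A full p × p unitary Q (the Grassmann equivalence) would not, in general, commute with Λ, which is exactly why the Grassmann quotient is "too small" here.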
Said another way, a point in the Basis manifold is a particular subset of the semi-unitary matrices defined by (349), and the Basis manifold itself is the collection of all these subsets. Note that in the Grassmann manifold formulation the defining equivalence relation removes all p × p unitary rotations; since the likelihood (346) is not invariant under such rotations, we conclude that the Grassmann manifold is "too small" to properly parameterize L. While the Stiefel manifold is "too big" and the Grassmann manifold is "too small", the Basis manifold B_(N,p) is just right for the problem. Since N and p are assumed known, throughout the remainder we drop the subscripts and denote the Basis manifold simply as B.
The parameters of the log-likelihood are then Y ∈ B and the real p-tuple of positive eigenvalues specified in the diagonal matrix Λ ∈ D_p. The complete parameter space of interest is then the product manifold B × D_p, which has real dimension dim(B × D_p) = 2p(N − p) + p^2 (see section 2.2.5).

Estimation on Riemannian (Quotient) Manifolds
Estimation problems in which the parameter space is a Riemannian manifold differ from more typical parameter estimation problems in two important ways. In the typical estimation problem (e.g., angle and Doppler) a fixed set of coordinates for the parameter space is naturally provided. In contrast, in the general manifold case no set of intrinsic or preferred coordinates is given. This is the situation with position estimation on the Euclidean flat plane, for example, where no particular set of coordinates is naturally provided and Cartesian and polar coordinates are equally valid choices; estimators and evaluators are free to choose among these or any others. The second difference from the usual estimation problem is that more general curved Riemannian manifolds, such as the sphere, are not themselves vector spaces, and so the vector space concepts implicitly used in classical Cramér-Rao bound (CRB) approaches are not immediately applicable.
In these situations, a universally meaningful estimation error measure is the distance between the estimate and the true location. Distance is an invariant or intrinsic quantity dependent on the given metric but independent of any particular choice of coordinates (e.g., the 2-D mean square positional error on the sphere).
In [2] a methodology is established for computing the CRB when the parameter space is an arbitrary Riemannian quotient manifold. The frequently encountered example of estimating an unknown subspace is analyzed, and the author gives intrinsic CRBs in the form of matrix inequalities relating the covariance of estimators and the Fisher information. The classical bounds developed for Euclidean spaces are generalized to Riemannian quotient manifolds, such as B, via the exponential map, i.e., geodesics emanating from the estimate to points in the parameter space.^3 In section 10 the approach detailed in [2] is used to compute the full FIM for the Basis manifold formulation of the unconstrained problem (342). The subspace submatrix of the FIM for this case is shown to be identical in form to that given in [2] for the Grassmann manifold formulation. The rotation submatrix of the FIM, which appears in the Basis manifold formulation (but is absent in the Grassmann case), is also computed here.
In section 3.8 the FIM for the constrained version of (342) is developed and closed form results are provided for the important special case when the signal matrix H(θ) that defines the constraint has a modal form.

Exponential Map and Normal Coordinates
Geodesics on general curved Riemannian manifolds are defined as distance-minimizing curves between points. For the special case of the standard Euclidean (flat) manifold E, a geodesic emanating from the point Y_0 in the direction of the vector ∆ is the straight line Y(t) = Y_0 + t∆. The vector ∆ is an element of a vector space with origin at Y_0. This vector space is denoted T_Y0 E, where "T" is read as "tangent" and Y_0 is the origin or base point of the vector.
For the unitary group manifold U(N), a geodesic emanating from the point X_0 in the direction ∆ = X_0 W (W anti-Hermitian) is the curve given by the exponential map X(t) = X_0 expm(tW). Note that Ẋ(0) = X_0 W = ∆, so the description of ∆ as a tangent is consistent with intuition. The collection of all such tangents through the point X_0 defines a vector space termed the tangent space.
This space is denoted T_X0 U(N), and the tangent vector ∆ is an element of this space. The exponential or geodesic map plays a central role in the development of the intrinsic CRB for U(N) and related quotient manifolds. For the B manifold, a quotient manifold of U(N), geodesics are naturally defined by constraining the set of tangent vectors to a subspace of T_X0 U(N), termed the horizontal space, that corresponds to T_X0 B. Elements of this subspace have the same general form as (352), with the constraint expressed through the matrix W in partitioned form as (354). The exponential map can be expressed in a more concrete form if we select only the first p columns of (351) by right-multiplying by the selection matrix I_{N,p}, writing Y(t) = X(t) I_{N,p}. The tangent vector is an element of the tangent space T_Y B. This space is partitioned as the direct sum of A_Y and B_Y, whose elements have the general forms Y A and Y_⊥ B, respectively. The exponential map on B defined by (355) and (354) is denoted in shorthand as Y = exp_{Y_0} ∆. Denoting the basis vectors for A_Y and B_Y as a_k and b_k, respectively, an arbitrary ∆ expands with real components α and β with respect to the given basis.^5 The exponential map Y = exp_{Y_0} ∆, which maps Y_0 to Y based on ∆, is invertible (for small ∆ this map is well approximated numerically by Y_0 + ∆); given an arbitrary point Y near Y_0, the inverse exponential map assigns the tangent vector ∆ = exp^{-1}_{Y_0} Y. The distance between two points on a Riemannian manifold is the path length of the geodesic curve between them. It was shown in section 2.2.7 that this length is equal to the magnitude of the tangent vector ∆.

Footnote 5: A point P on the sphere S^2 may be described in alternate coordinate systems (u, v). A real function on S^2 is the mapping L: S^2 → R, and its value at the point P is L(P); for two different coordinate systems the same value is expressed in the respective coordinates.

Estimation Problem
We consider an estimation problem on the product manifold B × D based on measurements in M = C^N × · · · × C^N, with the probability density function of the measurement given a parameter (Y ∈ B, Λ ∈ D) denoted f(Z|Y, Λ). The measurement data matrix Z is a random sample whose probability density function is shaped by the unknown parameters Y and Λ. An estimator of Y, denoted Ŷ, is a mapping from the probability space M to the parameter space B, that is, Ŷ: M → B. Similarly, a generic estimator of Λ is denoted Λ̂ and is given by the analogous mapping Λ̂: M → D.
The estimation error in the general manifold case is the tangent vector ∆̂(Z) ≡ exp⁻¹_Y Ŷ(Z). Going forward, for conciseness we write Ŷ to mean Ŷ(Z) and ∆̂ to mean ∆̂(Z). By definition ∆̂ ∈ T_Y B, and given a partitioned orthonormal basis for this space, the estimation error vector expands as ∆̂ = Σ_k α̂_k a_k + Σ_k β̂_k b_k. For conciseness we again write α̂, β̂ to mean the stochastic quantities α̂(Z), β̂(Z).
For unbiased estimators the error covariance is given by the tensor product E[∆̂ ⊗ ∆̂]. Viewed as a matrix defined by the coefficients with respect to the given partitioned basis, this covariance matrix has the block form C = [[C_αα, C_αβ], [C_βα, C_ββ]]. Denoting the orthogonal projector onto B_Y by P_B, the subspace error variance is E‖P_B ∆̂‖² = E‖β̂‖² = tr C_ββ, where we have used the standard vector identity ‖β‖² = tr(ββ^t).

Intrinsic Mean Square Error Measure
In this section we define the mean square error measures used to analyze estimator performance. Because there are no intrinsic or preferred coordinates in the general manifold case, different system evaluators are free to choose different coordinate systems.
The error measure chosen to evaluate estimator performance should therefore be an invariant quantity, or one that is independent of the arbitrarily chosen coordinate system.
Given such an error measure, all evaluators will record the same "error" irrespective of their choice of coordinates.

Subspace and Rotation Angles
The difference between two elements of the basis manifold can be represented in terms of two sets of angles (subspace and rotational angles) that correspond to the natural partitioning of the tangent space (357). We first define these angles independently of the geometric discussion.
Let Y_0 and Y_1 be two N × p semi-unitary matrices and denote the SVD of the product Y_0^H Y_1 as UΣV^H. The p-tuple of subspace angles, γ, between Y_0 and Y_1 is defined by γ_k = cos⁻¹(σ_k), where the σ_k are the singular values. The root sum square (RSS) of these angles is referred to as the subspace distance and denoted dist_S(Y_0, Y_1). The p-tuple of rotational angles, denoted ν, is defined implicitly in terms of the unitary factors of this SVD; the subtraction of the diag(Q) term in that definition is in recognition of the Basis manifold quotient structure, for which Y and Y Λ_Q, with Λ_Q unitary diagonal, represent the same point. The RSS of these angles is referred to as the rotation distance and denoted dist_R(Y_0, Y_1).

Subspace Angles on Grassmann and Basis Manifolds
In this section (see also section 2.3.1) we relate the subspace distance defined above to geometric distances on the Grassmann and Basis manifolds. In the Grassmann manifold case this subspace distance is equal [2] to the geodesic distance between the points on the manifold, represented by the semi-unitary matrices, say Y_1 and Y_0. For the Basis manifold formulation this is not the case. Rather, the subspace distance here is equal to the minimum geodesic distance between Y_0 and a fiber submanifold containing Y_1. This submanifold, denoted A_Y1, consists of all points related to Y_1 via a rotation, that is, matrices of the form Y_1 Q for Q unitary. In the Grassmann formulation A_Y1 is recognized as the equivalence class containing Y_1. Denoting by Y_g0 the point of this fiber closest to Y_0, the rotational distance (369) is equivalent to the geodesic distance between Y_g0 and Y_1. It can be shown that the tangent ∆_g0 = exp⁻¹_Y0 Y_g0 corresponds to a pure subspace motion (no rotation), so that ∆_g0 ∈ B_Y0 (see figure 6).
The geodesic distance between points on Riemannian manifolds such as B is independent of the choice of coordinates. For numerical calculations, however, some set of coordinates is required. Normal coordinates are a standard choice, but the analysis here is facilitated by the alternate non-normal φ coordinates defined below (see also section 2.3.2).
Since ∆_g0 ∈ B_Y0, we have ∆_g0 = Y_0⊥ B̆ for some matrix B̆, and so the point Y_g0 can be expressed as in (374). Since by construction Y_1 = Y_g0 Q for some unitary Q, combining this with (374) yields the relation between Y_1 and Y_0, where we have used Q = expm(Ȃ). The desired alternate non-normal coordinates are then determined using the inner product (110), where A_k, B_k are the standard set of orthogonal basis matrices (see section A.1).
The above defines an algorithm for assigning a unique pair (α, β) to each point Y in the neighborhood of Y_0. The result of this process is the coordinate mapping φ. These coordinates are closely related, but not identical, to normal coordinates; they differ because in general the exponential of a sum of tangents is not the composition of the individual exponential maps. The advantage of this choice is that once the φ coordinates of Y_1 are calculated as (α, β), those of Y_g0 are readily revealed as (0, β). The normal coordinates of Y_g0 are likewise (0, β), but those of Y_1 are only approximately equal to (α, β). We seek a bound on the subspace distance error measure, where Ŷ is an unbiased estimator of Y_0. Since this distance is given by ‖β‖, we develop the bound in terms of this N_b-tuple. Using the Baker-Campbell-Hausdorff formula (299), it can be shown that the Jacobian of the transformation between the normal coordinates ϕ and the non-normal coordinates φ is the identity matrix. Because of this relationship, we are able to develop the intrinsic CRB on (382) by considering the FIM for the log-likelihood function expressed in normal coordinates as L(α, β).
If P_A and P_B are the orthogonal projection operators onto A_Y and B_Y, respectively, then P_A ∆ and P_B ∆ are the rotational and subspace components of a tangent vector ∆.

Intrinsic CRB
Cramér-Rao bounds relate the covariance matrix of estimators to the Fisher information matrix (FIM) of an estimation problem through matrix inequalities. In the general manifold case the Fisher information metric tensor F (see section A.3.2) is defined with respect to a given coordinate basis by F_ij = E[∇L[e_i] ∇L[e_j]]. In this expression ∇L[e_i] is the directional derivative of L in the direction of the coordinate basis vector e_i at the point of interest. This coordinate basis is formed from the tangents to the coordinate functions, e_k = dθ^(k)(t)/dt. To emphasize this relation, the coordinate basis vectors are often denoted symbolically as e_k ≡ ∂/∂θ^k, while the dual coordinate basis is denoted e^k ≡ dθ^k. Using the partial-derivative notation, the component matrix takes the more familiar form F_ij = E[(∂L/∂θ^i)(∂L/∂θ^j)]. In the classical problem of estimating a vector parameter θ ∈ R^p in the distribution f(z; θ), the CRB is given by the inverse of the Fisher information matrix F, and the lower bound is expressed as the matrix inequality C ≥ F⁻¹, where C is the estimator error covariance. The classical proof of the inequality (388) relies on the fact [4] that unbiasedness fixes the correlation between the estimation error and the score ∇L. In the general Riemannian manifold case, the role of the error vector ∆ = θ̂ − θ is played by the error tangent vector ∆ = exp⁻¹_θ θ̂. The usual partial derivative ∂/∂θ becomes the covariant derivative, denoted by the symbol ∇, which pulls in the manifold's geometry: "curvature terms" involving the manifold's sectional and Riemannian curvatures appear as second-order and higher corrections. This key difference results in additional terms appearing in the intrinsic CRB.
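The classical inequality C ≥ F⁻¹ can be checked in the simplest special case. This Monte Carlo sketch is our illustration, not from the text: for K i.i.d. samples z_k ~ N(θ, σ²) the Fisher information is F = K/σ², and the sample mean attains the bound var(θ̂) = σ²/K.

```python
import numpy as np

# Classical CRB sanity check: for z_k ~ N(theta, sigma^2), K i.i.d.
# samples, F = K / sigma^2 and the sample mean attains F^{-1}.
rng = np.random.default_rng(2)
K, sigma2, trials = 50, 2.0, 20000
# each row is one experiment; the estimator is the sample mean
est = rng.normal(0.0, np.sqrt(sigma2), size=(trials, K)).mean(axis=1)
crb = sigma2 / K                           # scalar F^{-1}
assert abs(est.var() - crb) / crb < 0.05   # empirical variance meets the bound
```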
The intrinsic generalization of the CRB to the general Riemannian manifold case is as follows.
Theorem 6 (Cramér-Rao [2]). Let f(Z|Y) be a family of pdfs parameterized by Y ∈ M, let L(Y) be the log-likelihood function, let F = E[∇L ⊗ ∇L] be the Fisher information metric, and let ∇ be a Riemannian connection on M. Then for any unbiased estimator Ŷ of Y, the error covariance C satisfies C ≥ F⁻¹ plus curvature terms. In this expression C is the matrix form of E[∆̂ ⊗ ∆̂] and F is the component matrix corresponding to the tensor F in arbitrary coordinates (see section 3.2).
Neglecting the curvature terms at small errors: at high SNR estimation errors are small and these additional terms become negligible. Thus, just as with classical bounds, the unbiased intrinsic CRBs depend asymptotically only on the Fisher information, and the error variance is then bounded by the trace of F⁻¹.
Alternate form of the FIM: Reference [2] gives an alternate form for the Fisher information that employs the Hessian of the log-likelihood function for arbitrary affine connections, "a useful tool because in the great majority of applications the second-order terms are much easier to compute".
The Hessian tensor of L is denoted ∇²L, or in coordinates as L_;ij, where e_i is a coordinate basis for the tangent space. In arbitrary curvilinear coordinates θ, at the point θ_0 the components evaluate to L_;ij = ∂²L/∂θ^i∂θ^j − Γ^k_ij ∂L/∂θ^k, where the derivatives are evaluated at θ_0 and the Γ^k_ij are the Christoffel symbols, which reflect the bending and twisting of the coordinate curves (see section A.4).
Theorem 1 from [2], restated below, allows us to compute the Fisher information on an arbitrary manifold using the same approach as is done in classical Cramér-Rao analysis.
Theorem 7 ([2]). Let f(Z|Y) be a family of pdfs parameterized by θ ∈ M, let L = log f be the log-likelihood function, and let (385) be the Fisher information tensor. Suppose that the pdf satisfies the regularity condition E[∇L] = 0, or equivalently, in the arbitrary coordinates θ, E[∂L/∂θ^i] = 0. Then for any affine connection ∇ defined on M, F = −E[∇²L].
Proof: The second term in the above expression vanishes as a result of the regularity condition.
Integration by parts of the first term in the sum, as in classical CRB analysis, then yields the result.

Computing the Hessian Tensor
As discussed in section A.4.1, the Hessian ∇²L of the function L is defined by the quadratic form ∇²L[∆, ∆] = d²/dt² L(exp_Y0(t∆))|_{t=0}, the ordinary second derivative of L along the geodesic with initial tangent ∆. In extrinsic coordinates this geodesic is Y(t) = exp_Y0(t∆). Using (400), the second derivatives of the left-hand side of (402) represent the ordinary Hessian matrix of L interpreted as a scalar function defined on R^{N_b}. To compute the above, substitute the expanded form of Y^(k)(t) given by (A.39) into the right-hand side of (402) and compute the derivative (see section A.5). The cross terms of the full Hessian are found via polarization (A.126) based on (400) and vectors of the form ∆ = b_i + b_j, ∆ = a_i + b_j, etc., as required. This approach is used in the next section to compute the full FIM for the log-likelihood function (342).
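The second-derivative-along-geodesics recipe can be checked numerically. The sketch below is our illustration: the toy function L(X) = Re tr(C^H X) on U(N) is an assumption (not the chapter's likelihood), chosen because the exact second derivative along X(t) = X_0 expm(tW) is available in closed form, Re tr(C^H X_0 W²).

```python
import numpy as np
from scipy.linalg import expm

# Finite-difference version of the quadratic form: the Hessian of a
# scalar function L along the geodesic X(t) = X0 expm(tW) is the
# ordinary second derivative d^2/dt^2 L(X(t)) at t = 0.
rng = np.random.default_rng(3)
N = 4
X0, _ = np.linalg.qr(rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N)))
A = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
W = (A - A.conj().T) / 2                       # skew-Hermitian tangent generator
C = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
L = lambda X: np.trace(C.conj().T @ X).real    # toy scalar function (assumption)
h = 1e-4
num = (L(X0 @ expm(h * W)) - 2 * L(X0) + L(X0 @ expm(-h * W))) / h**2
exact = np.trace(C.conj().T @ X0 @ W @ W).real # closed-form second derivative
assert abs(num - exact) < 1e-4
```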

Nuisance Parameters
The form of the statistical model (337) dictated the choice of the Basis manifold rather than the Grassmann manifold as the proper parameter space for both analysis and algorithm development. In estimation applications, a nuisance parameter is "any parameter which is not of immediate interest but which must be accounted for in the analysis of those parameters which are of interest". In many signal processing applications it is the subspace that is of primary estimation interest, rather than the precise set of basis vectors that parameterize the model. In DOA estimation problems, for example, the subspace is sufficient to unambiguously determine the DOA parameters. In the (α, β, λ) normal coordinates, the parameter β is the subspace parameter; the rotation parameter α and the eigenvalues λ would be viewed as nuisance parameters in the DOA application.

CRB for Basis Manifold formulation of Complex Linear Model
Theorem 7 gave the FIM as the expected value of the Hessian ∇²L of the likelihood function. The likelihood function of interest has as its domain the product manifold B × D, and so the derivative computation requires a basis for T_Y B and T_Λ D at each point (Y, Λ). In addition, for the Basis manifold formulation, the T_Y B tangent space is naturally partitioned as A_Y ⊕ B_Y with bases a_k and b_k, respectively; the coefficient matrices of ∇²L with respect to this set are recorded below, together with those for a basis d_k of T_Λ D at an arbitrary point of D. All derivatives are evaluated at the base point (Y, Λ) with normal coordinates (0, 0, λ).
The approach outlined in section 3.4 is used to compute the full FIM components. The required ancillary calculations are provided in the following two lemmas and the final FIM submatrices are given in the follow-on theorem. Since the parameters of interest, Y and Λ, make their appearance in the log likelihood (342) through the covariance matrix R (347) we first record the expansion of L in terms of R.
Lemma 8 (Log-Likelihood Expansion as a Function of the Model Covariance R). The given log-likelihood function (342), viewed as a function of R, can be expressed through its first- and higher-order Taylor series terms, and the expectation of δL follows. Proof: see appendix 8.
The following lemma records the value of F[∆, ∆] for an arbitrary tangent vector ∆; the term δR in the expansion (406) is then given in terms of ∆. Proof: see appendix 9.
The component submatrices F_αα, F_ββ and F_αβ of the full FIM are computed from (411) and (410). The submatrix of cross terms F_αβ and the off-diagonal terms of F_ββ and F_αα are computed via polarization.
Theorem 10 (Full FIM for the Complex Linear Model (337)). The full Fisher information matrix for the log-likelihood function (342), in the (α, β, λ) normal coordinates (defined by the basis a_k, b_k and d_k) at the arbitrary point (Y, Λ) (normal coordinates (0, 0, λ)), has block diagonal form: all the off-diagonal submatrices vanish, and the square matrices F_αα, F_ββ and F_λλ are defined below with respect to an arbitrary choice of basis for the tangent space. The basis sets for A_Y and B_Y are Y A_k and Y_⊥ B_k, respectively, with the matrices A_k, B_k chosen so as to define an orthonormal basis. In the following, Λ_s and Λ are as given in (347) and (343), respectively, and K is the number of independent snapshots. This result is expressed alternatively in matrix form in terms of E_b, an N_b × N_b complex matrix defined from the basis matrices B_k. Since the cross terms F_βα and F_βλ vanish, there is no estimation loss on the subspace in this example. The inverse of (422), substituted into (420), then yields the inverse of F_ββ (426).

Intrinsic CRB Subspace Distance
Ignoring curvature terms, the covariance matrix C_ββ (363) and the FIM submatrix F_ββ are related by the CRB matrix inequality C_ββ ≥ F⁻¹_ββ. As a consequence, tr C_ββ ≥ tr F⁻¹_ββ. Note that both sides of the inequality are independent of the orthonormal basis selected for B_Y.
The RMS subspace distance between the true subspace, represented by the matrix Y, and any unbiased estimate Ŷ was defined independently of the geometric formulation. Following the discussion surrounding (382), the intrinsic CRB on the subspace estimation accuracy is given by (431). Using (427), tr F⁻¹_ββ evaluates to (432), where we have used (419).
Remark 2 (Equi-Power Sources): If λ_k ≡ λ for all k = 1, …, p, we have SNR = λ/σ², and (432) becomes (433); at high SNR, (433) further simplifies to (434). Recognizing the numerator, N_b = 2p(N − p), as the dimension of the tangent space B_Y, this high-SNR equi-powered case can be expressed as (435).
Remark 3: Note that neither the basis represented by the matrix Y nor its tangent space represented by the matrix E_b appears in the bound (431).

EigenVector CRB
Using the mathematical framework established above, we develop a CRB, in the form of a matrix inequality, on estimation of the ordered columns of Y in the statistical model (337). Recall that Y is defined as the eigenvector matrix of the true signal covariance (343). The standard estimator of Y is the eigenvector matrix of the SCM and, as is well known (and shown in chapter 4), this is the ML estimator of Y for the given model.
For convenience and consistency with [5], in this section we denote the columns of Y as s_1, …, s_p, and similarly denote the columns of the orthogonal complement Y_⊥. The columns of the estimate Ŷ are denoted Ŷ = [ŝ_1 ⋯ ŝ_p]. The classical error covariance result [5], for the case in which the estimator Ŷ is the p-dominant part of the EVD of the SCM, is stated below.
Lemma 12 (Covariance of Eigenvectors [6]). The estimation errors ŝ_i − s_i are asymptotically (for large K) jointly Gaussian distributed, with zero means and the covariance matrix given below.
The CRB for this estimation covariance is developed as follows. In the geometric approach the estimation error is given by the tangent vector ∆̂ = exp⁻¹_Y0 Ŷ. Given a fixed basis a_k, b_k for the partitioned tangent space, this vector expands as in (362). The covariance cov(∆̂) satisfies the matrix inequality (450), where the FIM sub-block component matrices F_αα and F_ββ are given by (439) and (426) (and F_αβ = 0). In this formulation the error in estimating the k-th column of Y is ∆̂ · i_k, where i_k is a unit vector with 1 at the k-th location. This representation follows since ∆̂ ≈ Ŷ − Y, and so ∆̂ · i_k ≈ ŝ_k − s_k. We therefore seek a lower bound on the error covariance cov(∆̂ · i_k). This bound is developed from (450) and is given in the following theorem.
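The estimator in question, the p-dominant eigenvectors of the SCM, can be exercised in simulation. The sketch below is our illustration under assumed model values (N = 8, p = 2, arbitrary eigenvalues): draw K snapshots z_k ~ CN(0, R), form the SCM, and check that the estimated subspace approaches the true one.

```python
import numpy as np

# Simulate the complex linear model and estimate Y by the p-dominant
# eigenvectors of the SCM Rhat = Z Z^H / K.
rng = np.random.default_rng(4)
N, p, K = 8, 2, 5000
Y, _ = np.linalg.qr(rng.standard_normal((N, p)) + 1j * rng.standard_normal((N, p)))
lam = np.array([10.0, 5.0])                  # assumed signal eigenvalues
sigma2 = 1.0                                 # assumed noise power
R = Y @ np.diag(lam) @ Y.conj().T + sigma2 * np.eye(N)
# K snapshots z_k ~ CN(0, R)
Lch = np.linalg.cholesky(R)
Z = Lch @ (rng.standard_normal((N, K)) + 1j * rng.standard_normal((N, K))) / np.sqrt(2)
Rhat = Z @ Z.conj().T / K                    # sample covariance matrix
w, V = np.linalg.eigh(Rhat)                  # eigenvalues in ascending order
Yhat = V[:, -p:]                             # p-dominant eigenvectors
# RSS of the subspace angles between span(Y) and span(Yhat) is small
sig = np.linalg.svd(Y.conj().T @ Yhat, compute_uv=False)
gamma = np.arccos(np.clip(sig, 0.0, 1.0))
assert np.linalg.norm(gamma) < 0.2
```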
where T_k and M_k are selection matrices defined below ((B.132) and (455)) and F_αα and F_ββ are as defined in theorem 10. At high SNR the right-hand side simplifies, resulting in (452).
Proof: (see B.1.4; an outline follows). The k-th column of ∆̂ is expressed in the partitioned basis. Using the main result of theorem 10 expressed as (450), substituting (439) and (442) for F⁻¹_αα, and using (446) and (447), yields the high-SNR asymptotic result (452).

Constrained Basis Estimation
In many problems of interest the signal matrix H is constrained to some subset M. The eigendecomposition of the true covariance HPH^H then defines a mapping between this subset and a submanifold B̃ ⊂ B. The source correlation matrix P ∈ P_p is not constrained, and because of this the submanifold B̃ has the property that if Y ∈ B̃ then YQ ∈ B̃ for any unitary Q. This property is important in defining the tangent space for B̃ and is established in the following lemma.
Lemma 14. The submanifold B̃ defined as the image of the mapping (459) has the property that if Y ∈ B̃ then YQ ∈ B̃ for any p × p unitary matrix Q.
Proof: The signal covariance R_s ≡ HPH^H can be expressed through its eigendecomposition, whose eigenvector matrix is Y_0 Q. Note that Y_0 depends only on H, while Q depends on both H and P through (460). Since the correlation matrix P ∈ P_p is not constrained, the p × p unitary matrix Q is likewise unconstrained.
Since B̃ ⊂ B, the submanifold tangent space at any point Y ∈ B̃ is a vector subspace of T_Y B. As a result of the lemma, the tangent space T_Y B̃ can be partitioned into a rotational space and its complement in the same way as (357). Using this expression in (462), the full tangent space of B can be expressed as the direct sum of A_Y, S_Y and N_Y. Motion in the N_Y direction corresponds to subspace perturbations that leave the submanifold, while motions in the S_Y and A_Y directions remain on the constraint submanifold B̃: motion in A_Y corresponds to a rotation, and motion in the S_Y direction corresponds to a subspace perturbation. In the remainder of this chapter we denote the dimension of S_Y by q. Let s_k denote an orthonormal basis for S_Y and, as in the unconstrained case, let a_k denote an orthonormal basis for A_Y. Then with respect to this basis any tangent vector expands with real components s and α. The numerical form of the basis vectors for S_Y is s_k = Y_⊥ S_k, where the S_k's are appropriately chosen (N − p) × p complex matrices. The matrices that define the spanning basis for S_Y vary from point to point on the submanifold and are defined by the problem specifics (i.e., the submanifold definition).

Estimation on the Submanifold
Let Ŷ be an estimator for Y ∈ B̃. The estimation error vector, defined by ∆̂ = exp⁻¹_Y Ŷ, is expanded in the basis a_k, s_k. In this expression the basis vectors are fixed and the coefficients are the stochastic error components α̂ = α̂(Z) and ŝ = ŝ(Z). The error covariance, denoted C, is defined in terms of these components (see (363)).

Computing the Constrained-Case FIM
Given an orthonormal basis for the partitioned space (463), the computation of the FIM proceeds exactly as in the unconstrained case. Since the source of the constraint in this problem is the signal matrix H, while the source correlation matrix P is unconstrained, there are no additional constraints on the rotations or the eigenvalues. The associated submatrices of F̃ are therefore identical to those of the unconstrained problem, that is, F̃_αα = F_αα and F̃_λλ = F_λλ, as given in theorem 10. Also, since F_βα = 0 and F_βλ = 0 in the unconstrained case, it follows that F_sα = 0 and F_sλ = 0. The full FIM for the constrained case is denoted F̃ and has the general block form (470). The q × q submatrix F_ss, with respect to the basis s_k (466) on S_Y, is given by (471). In the parametrically defined submanifolds discussed in a following section, the parameterization defines B̃, and the underlying parameters are the natural coordinates on B̃. The tangents to these coordinate curves naturally define a basis for the tangent spaces S_Y; a basis defined in this way is termed a natural coordinate basis.
Remark: The approach here uses an arbitrary but explicit basis (472) for the submanifold tangent space. An intrinsic CRB on the geodesic distance can alternately be computed using a projection onto the submanifold tangent space [1, Theorem 2]. Of course, if an orthonormal basis for S_Y is available, defined by the transformation T_s, then tr(C_ββ) = tr(T_s C_ss T_s^t) = tr(C_ss T_s^t T_s) = tr(C_ss).
Note that in this expression C ββ is the submanifold error as measured in the ambient frame. See section A.12 for additional details on transformations.

Constrained CRB
Ignoring curvature terms, as in the unconstrained case, the lower bound on the subspace estimation accuracy is given by (476) and (477). The dimension ratio between the two different constraint assumptions is (478). The ratio of the unconstrained CRB to the constrained CRB gives the processing gain realized by an optimal subspace estimator on B̃ ⊂ B relative to one that is optimal on B, that is, (479). For benign signal conditions (e.g., widely spaced uncorrelated sources in DOA estimation problems) this ratio is found to equal the dimension ratio given in (478). As is shown later, an estimator that is optimal for the damped exponential modal model (r = 2) will suffer a 3 dB loss when applied to the harmonic signal model problem (r = 1), consistent with (479).

Parametrically Constrained Problem
The computation of F_ss (471) requires an explicit basis for S_Y. As in the general constrained case, if Y ∈ B̃ then YQ ∈ B̃. The submanifold B̃ is then the collection of all N × p semi-unitary matrices of the form Y(θ)Q, where Y(θ) is the left singular vector matrix of H(θ) for some θ ∈ R^q and Q is an arbitrary p × p unitary matrix.

Defining the Submanifold Tangent Space
Let θ_0 = [θ_0^1 ⋯ θ_0^q] be arbitrary and let Y_0 be the corresponding eigenvector matrix of H(θ_0)P H(θ_0)^H. Denote the k-th coordinate curve in parameter space as θ^(k)(t) = θ_0 + t e_k, where e_k is a q × 1 unit vector with 1 at the k-th location. The corresponding representation of this curve in the extrinsic Y coordinates on B, with origin at Y_0, is Y(θ^(k)(t)), where the function Y is defined by (482). This curve is in general not a geodesic. The tangent vector to Y^(k)(t) at Y_0 is denoted ∆^(k) ∈ T_Y B̃ and is defined by (484). The component of this vector corresponding to a subspace change is ś_k = P_B ∆^(k), where P_B is the orthogonal projector onto B_Y. The set of q vectors generated from the q coordinate curves in this way spans S_Y (and therefore is a basis for S_Y). Since this basis is derived from a set of natural coordinate curves it is called a natural coordinate basis; in general it is not orthonormal. An arbitrary vector ∆ ∈ S_Y expands in this basis as ∆ = Σ_k ś_k δθ^k, where we have used the suggestive notation δθ^k to denote the components of ∆ with respect to this natural coordinate basis.

Computing the Natural Coordinate Basis
To calculate the derivative (484), an explicit expression for the curve is required. Suppressing the index k, the curve Y(θ^(k)(t)) can be written in expanded form; the tangent vector to this curve at Y(θ_0) then follows, where the derivatives are evaluated at t = 0.
A change in the parameter θ results in motion in both the S_Y (change in subspace) and A_Y (pure rotation) directions. The component in S_Y is ś_k = Y_⊥ Ś_k, where the (N − p) × p matrices Ś_k are defined using (489). Remark: A straight line in R^q, the θ parameter space, emanating from the fixed point θ_0 in the direction of the vector δθ is given by θ(t) = θ_0 + t δθ. The value of θ(t) at t = 1 is θ_0 + δθ, with δθ = [δθ^1 ⋯ δθ^q]. The tangent to the image curve on the submanifold follows from the chain rule. Recalling from elementary calculus (see section A.2.4) that the partial derivative along a coordinate curve is the ordinary derivative with respect to the curve parameter, where θ^(k)(t) is the coordinate curve (483), an alternate expression for the coordinate basis vectors (485) is ś_k = P_B ∂Y/∂θ^k. Let Y_0 ∈ B̃ be the matrix corresponding to θ = θ_0, let Y_1 ∈ B̃ be an arbitrary point near Y_0, and let the difference vector be ∆ = exp⁻¹_Y0 Y_1, whose first-order approximation is Y_1 − Y_0. The term P_B ∆, which represents the change in the subspace, is an element of S_Y0 and expands in the natural basis. The components δθ^k are found using a Gram-Schmidt procedure (with respect to the inner product), and to first order they recover the parameter perturbation, so that the value of the parameter θ corresponding to Y_1 is θ_0 + δθ.
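The natural-basis construction can be sketched numerically with finite differences. The sketch below is our illustration: the modal signal matrix h(θ) = exp(1j·θ·n) for a uniform line array is an assumption (not the chapter's general model), and the per-column phase fix on the singular vectors is a numerical device to make θ → Y(θ) smooth enough to difference.

```python
import numpy as np

N, p = 8, 2
n = np.arange(N)

def Hmat(theta):
    # assumed modal signal matrix: uniform-line-array steering vectors
    return np.exp(1j * np.outer(n, theta))

def Ymat(theta):
    # left singular vectors of H(theta), with each column's phase fixed
    # (first row made real positive) so the map is numerically smooth
    U = np.linalg.svd(Hmat(theta), full_matrices=False)[0]
    return U / (U[0] / np.abs(U[0]))

theta0 = np.array([0.5, 1.3])
Y0 = Ymat(theta0)
k, h = 0, 1e-6
e_k = np.eye(p)[k]
dY = (Ymat(theta0 + h * e_k) - Ymat(theta0 - h * e_k)) / (2 * h)   # curve derivative
P_B = np.eye(N) - Y0 @ Y0.conj().T          # projector onto the range of Y0_perp
s_k = P_B @ dY                              # natural basis vector (subspace part)

# first-order check: the subspace distance to Y(theta0 + t e_k)
# grows like t * ||s_k||_F for small t
t = 1e-3
sig = np.linalg.svd(Y0.conj().T @ Ymat(theta0 + t * e_k), compute_uv=False)
gamma = np.arccos(np.clip(sig, 0.0, 1.0))
assert abs(np.linalg.norm(gamma) - t * np.linalg.norm(s_k)) < 0.05 * t * np.linalg.norm(s_k)
```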

Generating an Orthonormal Basis on S_Y from the Natural Coordinate Basis
The computation of F_ss is facilitated by using an orthonormal basis, while the natural coordinate basis ś_k developed above is useful when analyzing the underlying parameterization. An orthonormal basis may be formed from an appropriate linear combination of the natural coordinate basis vectors ś_k in the usual way: let s_k be defined by the linear combination (501), or in formal matrix form via a q × q real matrix J.
An arbitrary element ∆ ∈ S_Y may be represented with respect to these two sets of basis vectors as ∆ = [s_1 ⋯ s_q] s = [ś_1 ⋯ ś_q] ś.
The transformation matrix J (a Jacobian matrix) may be selected so that the s_k form an orthonormal basis. The real orthogonalizing matrix J is computed from the basis representations in the usual vector-space way. Let the numerical representations of the basis vectors s_k and ś_k be s_k = Y_⊥ S_k and ś_k = Y_⊥ Ś_k, where Ś_k is given and S_k is to be determined. The transformation (501) may then be expressed in matrix form as E_s = Eś J, where E_s and Eś are formed by vectorizing the sets of matrices S_k and Ś_m, that is, their columns are vec(S_k) and vec(Ś_m). The real q × q orthogonalizing matrix J is then determined as in section A.10. Using this transformation, estimates and bounds computed with respect to one coordinate frame may be transformed to the alternate frame, which may have a more meaningful interpretation. Since F_ss (471) is a tensor quantity it obeys the tensor transformation rules (513). As noted in [2], "because the Jacobian determines how coordinate transformations affect accuracy bounds, it is sometimes called a sensitivity matrix." This effect on error measures is explored in section 5.2.4.
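The orthogonalization step can be sketched numerically. This is our illustration: the real inner product ⟨u, v⟩ = Re(u^H v), implied by the real components used throughout the chapter, is assumed, and the symmetric whitening choice J = G^{-1/2} with Gram matrix G = Re(Eś^H Eś) is one standard way to orthonormalize the basis.

```python
import numpy as np

# Whiten a non-orthonormal natural basis: s_k = sum_m s'_m J_mk with
# J = G^{-1/2}, G = Re(E'^H E'), under the real inner product Re(u^H v).
rng = np.random.default_rng(5)
n_amb, q = 12, 3
E_nat = rng.standard_normal((n_amb, q)) + 1j * rng.standard_normal((n_amb, q))
G = (E_nat.conj().T @ E_nat).real          # real Gram matrix of the natural basis
w, V = np.linalg.eigh(G)                   # G is real symmetric positive definite
J = V @ np.diag(w ** -0.5) @ V.T           # symmetric inverse square root of G
E_orth = E_nat @ J                         # orthonormalized basis vectors
assert np.allclose((E_orth.conj().T @ E_orth).real, np.eye(q), atol=1e-10)
```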

Modal Forms
The above expressions (491) for the basis vectors on S_Y take a simple form when the signal matrix H(θ) has a modal form. We first develop the one-parameter-per-mode case (r = 1, M = 1, so that q ≡ r·M·p = p) before addressing the general case. For convenience we define the N × p matrix of derivatives D(θ), whose columns are the derivatives of the modal columns. Noting (483) and the modal form (514), this derivative takes the form (516). In this expression J_kk is the p × p selection matrix J_kk = e_k e_k^t, where e_k is a p × 1 unit vector with 1 at the k-th location; J_kk thus has a 1 at the k-th diagonal location and zeros elsewhere.
Substituting (516) into (491) yields the modal form for the Ś_k matrix appearing in the basis vector ś_k = Y_⊥ Ś_k. Suppressing the arguments, using the matrix identity vec(ABC) = (C^t ⊗ A)vec(B), and noting (510), yields vec(Ś_k) in terms of j_kk = vec(J_kk). The complete Eś matrix (510) may then be written as (520), where G is a p² × p real matrix whose columns are the vectorized p × p selection matrices J_kk, that is, G ≡ [j_11 ⋯ j_pp] ≡ [vec(J_11) ⋯ vec(J_pp)].
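The construction of G from the selection matrices can be sketched numerically for p = 2 (illustrative Python; column-major vec is assumed, consistent with vec(ABC) = (C^t ⊗ A)vec(B)):

```python
import numpy as np

# Selection matrices J_kk = e_k e_k^t and the matrix G whose columns
# are vec(J_kk), for p = 2 (column-major vectorization).
p = 2
Jkk = [np.outer(np.eye(p)[k], np.eye(p)[k]) for k in range(p)]
G = np.column_stack([J.flatten(order="F") for J in Jkk])
assert G.shape == (p * p, p)
# vec(J_11) = [1 0 0 0]^t and vec(J_22) = [0 0 0 1]^t
assert np.array_equal(G[:, 0], np.array([1.0, 0.0, 0.0, 0.0]))
assert np.array_equal(G[:, 1], np.array([0.0, 0.0, 0.0, 1.0]))
```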
Example: For p = 2 the individual selection matrices are J_11 = [[1, 0], [0, 0]] and J_22 = [[0, 0], [0, 1]], so that G = [vec(J_11) vec(J_22)].
Lemma 15 (Orthogonalizing Matrix J for the 1-Dimensional Case, q = p). For the modal case the p × p orthogonalizing matrix J (511) is as stated.
To bring the F_ss matrix into the desired form we use the following identity, which relates the diagonal matrix Λ_OPT (418) appearing in the general FIM definition (471) to the matrix P_s ≡ P · H^H · R⁻¹ · H · P, which appears later in the parameter-space FIM definition (533) (see also [6]).
Lemma 16 (Equivalence of Forms [5, eqn. 75]). Let Y Λ_s Y^H = HPH^H (340), and let T be the p × p non-singular matrix defined by Y = HT.
Define the p × p matrix P_s ≡ P · H^H · R⁻¹ · H · P, where R = HPH^H + σ²I. Then the stated identity holds, where Λ_OPT is given by (419).
Proof: (Appendix B.1.7 ) The FIM subspace block matrix F ss for the modal case is found using (527) and (520) in the general case definition, (471), and is recorded in the following theorem.
Theorem 17 ( FIM subspace block matrix F ss Modal Case). Let E s = EśJ (509) where Eś is defined by (520). Then the F ss matrix (471) with respect to this basis is where J and P s are defined by (524) and (526) respectively.
Proof: (Appendix B.1.8). The inverse of (528) is (529). Substituting (529) into (476), the lower bound on subspace estimation accuracy is given by (530); using (524), J⁻² evaluates to (531).
Coordinate Transformation of the Subspace FIM: The subspace FIM with respect to the natural coordinate basis, denoted Fśś, may be found via the transformation of F_ss as in (513). Using (528) and (513) yields (532). The inverse FIM is (533), from which the bounds on the estimation error covariance and on the error variance follow. Recalling that ∆ = Σ_{k=1}^q ś_k δθ^k, that is, that the components of the tangent vector represent the errors in the parameter estimates, we obtain the corresponding bounds on those errors. The inverse FIM in (533) is recognized as the asymptotic (for K ≫ 0) covariance matrix of the unconditional maximum likelihood (UML) estimate as given in [6]. Of course, the FIM could first be computed with respect to the natural coordinate basis (520) directly from (471) and then transformed into the orthonormal coordinates.
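The basis-change rule for the FIM can be verified numerically. The sketch below is our illustration; the convention that component vectors transform as ś = Js when the basis matrices satisfy E_s = Eś J is assumed, and the check is that the scalar quadratic form F[∆, ∆] is invariant under the change of basis.

```python
import numpy as np

# Tensor transformation of a FIM component matrix under a change of
# basis: components transform as s' = J s, so F' = J^{-T} F J^{-1}
# leaves the quadratic form unchanged.
rng = np.random.default_rng(6)
q = 4
A = rng.standard_normal((q, q))
F = A @ A.T + q * np.eye(q)                     # an SPD "FIM" in the s basis
J = rng.standard_normal((q, q)) + 2 * np.eye(q) # a generic invertible Jacobian
Jinv = np.linalg.inv(J)
F_nat = Jinv.T @ F @ Jinv                       # FIM in the natural basis
s = rng.standard_normal(q)                      # components in the s basis
s_nat = J @ s                                   # same vector, natural components
assert np.isclose(s @ F @ s, s_nat @ F_nat @ s_nat)
```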

Generalization to Higher Dimensions: Modal Model
The above results were developed for the one-parameter-per-mode case. These results can be generalized to the M̃ = r·M parameters-per-mode case in a straightforward way. In array processing applications M denotes the geometric dimension of the array, and r = 1 for undamped exponentials, r = 2 for damped exponentials.
In this general case the dimension of the S_Y tangent space is q = r·M·p, where p is the number of modes. In the 1-dimensional array problem, M = 1, and for the damped exponential signal model (shift-invariant model) there are 2 parameters per mode (r = 2), so that q = 2p. In the 2-dimensional array case, M = 2: for the undamped exponential model r = 1, so that q = 2p, while for the damped exponential model r = 2, so that q = 4p.
Thus each mode is defined by r·M parameters. The derivative matrix is the N × q complex matrix D(θ) = [D_1 ⋯ D_rM], where each D_m is an N × p complex matrix. The (m, k)-th column of the derivative matrix D(θ) corresponds to the q × p selection matrix J_mkk given by (545). In this expression, J_kk is the p × p selection matrix given by (517) and i_m is an rM × 1 unit vector with 1 at the m-th location (zeros elsewhere). The multi-dimensional extension of (518) is then (546), and so (547), where the index l is given by l = m + k.
The p columns of the matrix Eś corresponding to the m-th parameter vector θ_m are expressed through G_m, the qp × p matrix G_m = [vec(J_m11) ⋯ vec(J_mpp)].
Concatenating rM such matrices gives the complete Eś matrix, or equivalently its factored form.
Example (2-Dimensional Array, Harmonic Signals): In this case r = 1 and M = 2, so that there are 2 parameters per mode, and for p modes q = 2p. The modal signal matrix has the functional form (552). In this expression we use the suggestive notation m = h, v for the parameter index (h = horizontal, v = vertical). The derivative matrix (541) follows, and the 2p × p selection matrices (545) are as given. For 2 source signals, that is p = 2, we have N_b = 2(N − 2) and q = 4.
The J_mkk selection matrices are each 4 × 2, so that vec(J_mkk) is 8 × 1. The matrix G (557), formed from four such selection matrices, is therefore 8 × 4. The N_b × q matrix Eś for this choice of M, p and r is then dimensionally correct.
The N × 4p derivative matrix D has the partitioned form in which each of the 4 submatrices is N × p.

Pertinent Signal Models Summary
Generalization to Higher Dimensions: FIM and CRB
Lemma 18 (Orthogonalizing Matrix J for the M > 1 case). For the matrix Eś given by (551) the following equality holds where and where 1_M is an M × 1 column vector of all 1s. The real symmetric q × q orthogonalizing matrix J is where X_M̄ is q × q, given by and where 1_M̄ is an M̄ × 1 column vector of all ones (M̄ = rM).
Proof: Appendix B.1.6.
Remark: For the case of harmonic signals (r = 1) on a 1-dimensional array, M = 1 and M̄ = 1, so (1_M̄ ⊗ 1ᵗ_M̄) reduces to the scalar 1 and the previous result for the orthogonalizing matrix given by (511) is recovered.
The multidimensional (M > 1) generalizations of the formulas for F_ss (528) and Fśś (532) are given in the following theorem.
Theorem 19 (FIM Subspace Block Matrix F_ss, Modal Case, M > 1). The subspace block of the FIM with respect to the natural coordinate basis is where and P_s is given by (526) or (527). The subspace block of the FIM w.r.t. the orthonormal basis is where J is given by (567).
Proof: B.1.9.
Intrinsic CRB on subspace estimation accuracy. The lower bound on subspace estimation accuracy is given by the formula where and from (567)
CRB on the parameter θ. Following the discussion for the 1-dimensional case, the estimation error covariance C_θθ satisfies the CRB matrix inequality where the right hand side is the inverse of Fśś (569) and This is recognized as the asymptotic (for K ≫ 0) covariance matrix of the unconditional maximum likelihood (UML) estimate as given in [6]. The error variance is therefore bounded as The variance tr C_θθ is equivalent to
List of References
[1] N. Boumal, "On intrinsic Cramér-Rao bounds for Riemannian submanifolds and quotient manifolds," IEEE Transactions on Signal Processing, vol. 61, no. 7, April 2013.

CHAPTER 4 Maximum Likelihood Estimator
The log likelihood function (ignoring constants) for the stochastic complex linear model introduced in chapter 3 is where the parameters Y, Λ, σ² appear in the function through the model covariance where The data Z ≡ [z_1, · · · , z_K] is compressed into the SCM, denoted R̂ = K⁻¹ZZᴴ, which has the eigen decomposition where Λ̃_s is p × p and Λ̃_o is (N − p) × (N − p). The same symbol "R" is used both for the SCM R̂ and for the model covariance (580) expressed as a function of the unknown parameters Y, Λ, σ².
The ML estimates of σ² and Λ are given in terms of the components of the SCM (582) as σ̂² and Λ̂. In the remainder of the discussion we consider the compressed likelihood function defined as L(Y; R̂) ≡ L(Y, Λ = Λ̂, σ² = σ̂²; R̂).
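A NumPy sketch of this eigen-decomposition step, assuming the standard expressions referenced by (583): σ̂² is the average of the N − p smallest eigenvalues of the SCM and Λ̂ = Λ̃_s − σ̂²I. The function name and test data are illustrative, not from the text.

```python
import numpy as np

# Sketch of the compressed-likelihood setup: form the SCM from the data
# matrix Z, take its EVD, and read off the pieces (Y_tilde, Lambda_hat,
# sigma2_hat). Assumes the standard ML expressions: sigma2_hat = mean of
# the N - p smallest eigenvalues, Lambda_hat = Lambda_s - sigma2_hat * I.
def compressed_mle_pieces(Z: np.ndarray, p: int):
    N, K = Z.shape
    R_hat = (Z @ Z.conj().T) / K                 # SCM
    w, V = np.linalg.eigh(R_hat)                 # ascending eigenvalues
    w, V = w[::-1], V[:, ::-1]                   # reorder to descending
    Y_tilde = V[:, :p]                           # p dominant eigenvectors
    sigma2_hat = w[p:].mean()                    # noise-floor estimate
    Lambda_hat = np.diag(w[:p] - sigma2_hat)     # signal eigenvalue estimate
    return Y_tilde, Lambda_hat, sigma2_hat

rng = np.random.default_rng(0)
Z = rng.standard_normal((6, 200)) + 1j * rng.standard_normal((6, 200))
Y_t, Lam, s2 = compressed_mle_pieces(Z, 2)
print(np.allclose(Y_t.conj().T @ Y_t, np.eye(2)))  # Y_tilde is semi-unitary
```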
Let α, β denote a normal coordinate system generated by the partitioned basis (a_k, b_k) with its origin at an arbitrary point Y ∈ B. The arbitrary tangent vector ∆ expands as The transformation to extrinsic coordinates is Y(α, β) = exp_Y ∆(α, β). Under this coordinate transformation, or change of variables, a function L defined on B in the extrinsic Y coordinates as L(Y) becomes where Y = Y(α, β) expresses the coordinate transformation.
The partial derivative ∂L/∂α^k is defined as the directional derivative in the direction specified by the coordinate basis vector a_k = Y A_k. The partial derivative ∂L/∂β^k is defined in a similar way. The following lemma first records the form of the directional derivative for an arbitrary direction specified by a tangent vector and then computes the desired coordinate partials using the appropriate choices for A and B.
Lemma 20 (First Directional Derivative). The directional derivative of the function L (579) in the direction of the arbitrary tangent vector is where For ∆ = b_k = Y_⊥B_k, a basis vector for B_Y, this evaluates to For ∆ = a_k = Y A_k, a basis vector for A_Y, this evaluates to
Proof: see appendix C.1.
The ML estimate of Y ∈ B is defined as At this value the partials (590) and (591) vanish; the Y ∈ B satisfying this simultaneous set of equations is recorded in the following theorem.
Theorem 21 (Unconstrained Maximum Likelihood Basis Estimate). A necessary condition for the point Y to be an extremal of L is that the partial derivatives (590) and (591) vanish: and The solution to this set of equations that maximizes the value of L is given by
Proof: see appendix C.2.
Recall that the equivalence relation on the basis manifold is Y ∼ Y Λ_Q, where Λ_Q is an arbitrary unitary diagonal matrix. Therefore the numerical equivalence class of ML estimates is the set Ŷ Λ_Q, Λ_Q arbitrary unitary diagonal. In numerical applications one representative of the class must be selected, and the choice used in what follows is Ŷ = Ỹ_s, the principal eigenvectors given by the EVD of the SCM (582).
The model covariance function R (580) evaluated at the MLE parameter values is denoted R̄ and is given by and the inverse is Using this form, the product R̄⁻¹R̂, which appears in the ML function definition, when evaluated at the MLE is or where Using this expression for R̄⁻¹R̂, the value of the function at the ML estimates is

Approximating the compressed likelihood function near Ỹ
In this section we consider the compressed log-likelihood function L(Y) ≡ L(Y; Λ̂, R̂). We develop a 2nd-order approximation to L(Y) in a neighborhood of the unconstrained MLE Ỹ ∈ B. Let ∆ ∈ T_Ỹ B be an arbitrary unit-length tangent vector and let Y(t) be the corresponding geodesic curve emanating from Ỹ, given by Y(t) = exp_Ỹ(t∆). The value of the log likelihood function on this curve is denoted L(Y(t)).
The Taylor series expansion of L(Y(t)) is where Y(0) = Ỹ. At the MLE the first derivative dL(Y(0))/dt vanishes by definition.
The component sub-matrices of the 2nd covariant differential ∇²L with respect to this basis are given by the ordinary Hessian of L(α, β) in block form by where and where Λ_bb is real and Λ̂_OPT = Λ̂²(Λ̂ − σ̂²I)⁻¹, with δΛ_o given by (602).
The mixed-derivative submatrix is identically zero. For the standard basis on B_Ỹ, the basis matrix given by (425), the matrix L_ββ (607) simplifies to The second directional derivative in the direction and using (601) and (612) is To compute the desired derivative, recall (410) and (B.81), that is Using this, the second term in (614) is where Λ_OPT ≡ ΛΛ_s⁻¹Λ = Λ_s⁻¹Λ². Evaluating the first term in a similar way and summing the two terms yields Applying the matrix identity B.1.10, the second term is and the first term is Replacing B by B_k and substituting (619) and (620) into (618) yields the desired result. The second-order term in the Taylor expansion (605) may be written as Using the above quadratic expression and L_αβ = 0, the form (605) is where we have used the notation L(tα, tβ) = L(Y(t)) to represent L(Y(t)) in the (α, β) normal coordinates. Absorbing the parameter t into the components (α, β), a 2nd-order expression for the likelihood function in the neighborhood of Ỹ is Note that the origin of the (α, β) coordinate system corresponds to Ỹ.
Partitioning the real N_b-tuple β as β = [β_R; β_I] and using (607), the quadratic form becomes Defining the complex vector b ≡ β_R + i · β_I, this becomes or

Constrained MLE Criterion Function
In the constrained problem the log likelihood function has the same form as in the unconstrained case, but with a restriction on the domain. In this problem the set of N × p semi-unitary matrices Y is restricted to a lower-dimensional submanifold, denoted B̄, of the full basis manifold B.
The constrained MLE of Y is then defined by The membership constraint condition Y ∈ B̄ can be stated in terms of the exponential map based at the unconstrained MLE Ỹ as exp_Ỹ ∆ ∈ B̄, where ∆ ∈ T_Ỹ B. The set of tangent vectors at Ỹ that satisfy this constraint is denoted V_Ỹ. Using the quadratic approximation of the likelihood function (624) and representing ∆ = ∆(α, β) as in (585), the quadratic ML criterion is Let Y ∈ B̄ be an arbitrary point near Ỹ. Suppose that the corresponding tangent vector is ∆ = exp_Ỹ⁻¹ Y and that the components of ∆ are (α, β), so that Y = exp_Ỹ ∆(α, β). Following the discussion in section 2.3.2 we have where the matrix Q₁ can be expressed as Q₁ = exp(a_k α_k). Recall from lemma 14 that if Y ∈ B̄, it is also the case that Y Q ∈ B̄ for Q an arbitrary p × p unitary matrix.
Since both terms in (630) separate in this way, the set of tangent vectors V̄_Ỹ ⊂ V_Ỹ is defined as Since elements of the space B_Ỹ have the general numerical representation where B is an (N − p) × p complex matrix, the set V̄_Ỹ (641) can be defined in terms of The constrained MLE form (633) can then be restated in terms of the complex N̄_b-tuple b, using (A.265), as

Solution to the Constrained MLE Criterion using an S-N Frame
For any Y₀ ∈ B̄ the full tangent space B_Y₀, the space corresponding to subspace perturbations, is naturally partitioned as (see figure 7). Using parallel translation, a similarly partitioned basis for B_Ỹ at the unconstrained MLE Ỹ may be formed. Using this S-N frame, an alternate form of the MLE criterion (637) is developed.
After developing this form we consider a new solution method when the submanifold is defined by a shift-invariant constraint condition.

Constraint Submanifold using an S-N Frame: Examples
Before considering the full problem on the Basis manifold we discuss an example set in a 3-dimensional Euclidean space, with the primary objective of illuminating the submanifold membership constraint condition (636). In order to make the ideas more easily transferable we denote this Euclidean space by B and generic points in the space by the symbol Y. The submanifold of interest, denoted B̄, is a two-dimensional plane surface.
Consider a point Y₀ ∈ B̄, let n be normal to B̄, and let a and s be an orthogonal pair tangent to B̄. Define the 2-dimensional space spanned by n and s as B_Y₀ and the 1-dimensional space spanned by a as A_Y₀ (the full 3-dimensional tangent space is denoted T_Y₀B). Let Ỹ denote a point not on the submanifold B̄ that lies in the plane defined by B_Y₀. The submanifold membership condition Y ∈ B̄ can be stated in terms of Ỹ and the displacement vector ∆ as (Ỹ + ∆) ∈ B̄. The set of vectors ∆ that satisfy this condition is denoted V̄_Ỹ and is defined as (compare to (634)).
By translating the set of basis vectors a, s, n from Y₀ to Ỹ we establish a basis for the vector space B_Ỹ. Since the space is flat, these vectors can be translated to the point Ỹ in the usual rigid way (i.e., no rotations). Representing ∆ with respect to this basis creates a new coordinate system with origin at Ỹ.
The vector with tail at Ỹ and head at Y₀ is given by ∆ = (Y₀ − Ỹ). Since by construction Y₀ and Ỹ lie in the same B_Y₀ plane, this vector can be represented in the S-N basis for B_Ỹ as ∆ = s₀s + n̂n, where The 3-tuple (α = 0, s = s₀, n = n̂) gives the normal coordinates of Y₀ with respect to the given basis with origin at Ỹ. Since, by construction, s at Ỹ is parallel to the plane B̄, elements in the set V̄_Ỹ are given by where n̂ is given by (639) and s is a free parameter.
If Y ∈ B̄ then Y = Ỹ + (ŝs + n̂n) for n̂ fixed and ŝ chosen appropriately. The components (0, ŝ, n̂) are the normal coordinates of Y with respect to the s, n basis at Ỹ.
The point Y can, of course, also be expressed in terms of a coordinate system at Y₀. The distance between Ỹ and Y is equal to the length of the tangent vector ∆ and, for the s, n basis, we have From this expression it is evident that the point of B̄ nearest to Ỹ, denoted Y_MD, where the subscript stands for minimum distance, has normal coordinates (0, 0, n̂), so that
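The Euclidean example above can be made concrete with a small numeric sketch; the specific vectors and coordinates below are illustrative choices, not the text's.

```python
import numpy as np

# Toy 3-D version of the S-N construction: the submanifold is the plane
# spanned by {a, s} through Y0; n is its unit normal. All numbers here
# are illustrative, not taken from the text.
a = np.array([1.0, 0.0, 0.0])
s = np.array([0.0, 1.0, 0.0])
n = np.array([0.0, 0.0, 1.0])
Y0 = np.array([2.0, 3.0, 0.0])          # a point on the plane
Y_tilde = np.array([2.0, 1.0, 4.0])     # "unconstrained MLE", off the plane

delta = Y0 - Y_tilde                     # tangent vector from Y_tilde to Y0
s0 = s @ delta                           # free s-component
n_hat = n @ delta                        # n-component fixed by the constraint

# Nearest point of the plane to Y_tilde: keep n_hat, set the free s part to 0.
Y_MD = Y_tilde + n_hat * n
print(Y_MD)                              # lies on the plane (third entry 0)
```

Since the squared distance to any constraint-satisfying point is s² + n̂², minimizing over the free parameter s gives s = 0, exactly as in the text.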

Constraint Submanifold using an S-N Frame: Basis Manifold
The development of the submanifold membership condition for the manifold case proceeds in the same way as outlined above, with a few modifications. Recall that for any point Y₀ ∈ B̄, the tangent space B_Y₀ of the ambient manifold B partitions as where N_Y₀ is normal to the submanifold tangent plane T_Y₀B̄ = A_Y₀ ⊕ S_Y₀. Let {s_k} and {n_k} be orthonormal bases for S_Y₀ and N_Y₀, respectively.
In the Euclidean space example this basis defined at Y 0 was translated in the usual parallel way to form a basis for the tangent space BỸ at the pointỸ . In the general case this simple translation is replaced by its generalization, termed parallel translation.
Denoting the parallel translation operation as τ, we can define a set of q vectors at Ỹ by Denote the subspace of B_Ỹ spanned by this set as S_Ỹ and its orthogonal complement as N_Ỹ, so that Let (n_k)_Ỹ be an orthonormal basis for N_Ỹ. When it is clear from the context, the subscript indicating the base point of the vector is suppressed for convenience.
Using the orthonormal S-N basis for B_Ỹ, a vector in this space expands as Given the two points Ỹ and Y₀, the tangent vector between them is defined by ∆ = exp_Ỹ⁻¹ Y₀, and the components, denoted (s₀, n̂), are found using (639) and (640). These components (0, s₀, n̂) are the normal coordinates of Y₀ with respect to the given basis for B_Ỹ.
The set of tangent vectors ∆ ∈ B_Ỹ that satisfy the membership condition exp_Ỹ ∆ ∈ B̄ can be expressed in terms of this S-N basis and the components defined above. Since S_Ỹ is parallel to S_Y₀, the set of tangent vectors V̄_Ỹ (641) has the form where n̂ is fixed by (639) and s is a free real q-tuple.
The geodesic distance between the unconstrained MLE Ỹ and nearby elements Y ∈ B̄ is given by the magnitude of the tangent vector between them. Using (647), this magnitude is given in terms of the constraint-satisfying pair (s, n̂) as The point on B̄ that is the minimum distance from Ỹ, denoted Y_MD, occurs for s = 0.

Alternate Real Representations and the Set V̄_Ỹ
Suppose that a partitioned set of orthonormal S-N basis vectors for B_Ỹ is given by s_k = Ỹ_⊥S_k and n_k = Ỹ_⊥N_k. Using these basis vectors to represent the vector ∆ = Ỹ_⊥B as in (647) yields

Vectorizing both sides of this equation yields
where b = vec(B) and From (648), the set of complex N̄_b-tuples that satisfy the membership constraint condition is given by where s is a free real q-tuple.

Constrained MLE in terms of b
Substituting the constraint-satisfying form for b (655) into the constrained MLE criterion (637) yields the unconstrained minimization in s:

ŝ = arg min_s Re{ (b_n + E_s s)ᴴ Λ_bb (b_n + E_s s) }.
Expanding the quadratic form on the right hand side yields where the submatrix F_ss (see (471)) is and This term (660) corresponds to the first derivative of L with respect to the subspace coordinate variable s, evaluated at the point Y_MD (643); this is discussed more fully in appendix C.6. Since the last term on the right hand side, Re(b_nᴴ Λ_bb b_n) ≥ 0, is independent of s, the solution to (657) is given by Substituting this result for ŝ into (655) gives The MLE for Y ∈ B̄, denoted Ŷ, is then given by where the MLE tangent vector is
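The minimization over the real q-tuple s is an ordinary quadratic program; a sketch follows, using random stand-ins for E_s, Λ_bb and b_n (not the text's actual quantities) and the stationarity solution ŝ = −F_ss⁻¹ Re(E_sᴴ Λ_bb b_n) with F_ss = Re(E_sᴴ Λ_bb E_s).

```python
import numpy as np

# Sketch of the unconstrained quadratic minimization over the real q-tuple s:
#   min_s Re{ (b_n + E_s s)^H Lambda_bb (b_n + E_s s) }.
# Setting the gradient (in the real variable s) to zero gives
#   s_hat = -F_ss^{-1} Re(E_s^H Lambda_bb b_n),  F_ss = Re(E_s^H Lambda_bb E_s).
# The matrices below are random stand-ins, not the text's actual quantities.
rng = np.random.default_rng(1)
Nb, q = 8, 3
A = rng.standard_normal((Nb, Nb)) + 1j * rng.standard_normal((Nb, Nb))
Lam_bb = A @ A.conj().T                          # Hermitian positive definite
E_s = rng.standard_normal((Nb, q)) + 1j * rng.standard_normal((Nb, q))
b_n = rng.standard_normal(Nb) + 1j * rng.standard_normal(Nb)

F_ss = np.real(E_s.conj().T @ Lam_bb @ E_s)
L_s = np.real(E_s.conj().T @ Lam_bb @ b_n)
s_hat = -np.linalg.solve(F_ss, L_s)

def crit(s):
    v = b_n + E_s @ s
    return np.real(v.conj() @ Lam_bb @ v)

# s_hat should not be beaten by small coordinate perturbations
print(all(crit(s_hat) <= crit(s_hat + 1e-3 * d) for d in np.eye(q)))
```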

Solution Methods
The estimation method outlined above required a reference point Y₀ on the submanifold B̄ near the unconstrained MLE Ỹ. When the submanifold B̄ is defined by some underlying parametric model, H = H(θ), a standard approach to implementing the above program is to leverage a suboptimal estimate of the parameter, say θ₀, to establish this reference point. Using this point and the differentiable signal model, a basis for S_Y₀ is readily computed. Parallel translation then yields a basis for S_Ỹ, if desired, although it is generally preferable to work directly with S_Y₀ when available.
In this situation the overall approach reduces to a 1-step Newton's method (662) on the likelihood function in the parameter space. The overall performance of this approach depends in large part on the quality of the initial suboptimal estimate θ 0 . An example of this type of MLE approach using Newton's method is described in [1]. This Newton's method approach is discussed in the context of the established geometric framework in appendix C.6.
In the special case of the complex damped exponential signal model on uniformly sampled multidimensional arrays, the constrained submanifold is denoted B̄_damped. In this case the matrix representation of any Y ∈ B̄_damped satisfies a shift-invariance property defined below. By using this shift-invariant structure in a more integrated way, we develop a closed-form ML estimator of Y ∈ B̄_damped.

MLE for Submanifold Defined by Shift-Invariance: Uniform Arrays
In this section we consider the submanifold B̄_damped; for notational convenience we drop the subscript and denote this submanifold simply as B̄. The set V̄_Ỹ (641) of constraint-satisfying tangent vectors is specified in (655) by the matrix E_s and the complex N̄_b-tuple b_n. The matrix E_s is a representation of a basis for S_Ỹ. Together, Ỹ, E_s and b_n define the set of constraint-satisfying tangent vectors (655). Once this set is known, the ML estimate Ŷ ∈ B̄ is computed from (664), (665) and (663).

Complex Representation of Constraint
In the previous development we treated the submanifold tangent spaces S Y as real vector spaces. This was convenient for analysis purposes, since in the H(θ) model the parameters of interest are themselves real (and potentially odd in number). For the special case of the damped exponential signal model, these spaces may be represented as complex vector spaces of lower dimension. This key feature follows from the form of the derivative matrix that defines the space S Y and the related fact that the signal parameters occur in pairs, so that the real dimension of S Y is even.
Lemma 23 (Basis for the subspace S_Y on B̄_damped). The basis matrix E_s (653) under the B̄_damped assumption has the form The columns of Ē_s span S_Y as a complex vector space of dimension q̄ = q/2. To see this, let ∆ ∈ S_Y be arbitrary, where S_Y is spanned by E_s (see (A.161)). By definition, this vector has the form ∆ = Y_⊥B for some matrix B. Let s be the real q-tuple such that b = E_s s, where b = vec(B). Referring to (666), this can be expressed as where the real q-tuple s has been partitioned into two q̄-tuples (i.e., s_R and s_I are defined by the first q̄ and last q̄ elements of s, respectively). Define the complex q̄-tuple Using this, (667) becomes The following theorem records the complex form of the derivative matrices L_s and L_ss that appear in the ML estimate of s using this complex vector space formulation of the space S_Y.
Theorem 24. If the basis matrix E_s has the form (666), then
Proof: see C.4.
As a consequence, the ML estimate (663) is given by

Solution
Elements of B̄_damped satisfy a shift-invariance condition, defined later, which can be expressed in terms of a vector-valued function r(·) defined on B such that Y ∈ B̄ if and only if r(Y) = 0. In this case the submanifold membership condition is and the set of constraint-satisfying tangent vectors V̄_Ỹ (641) is Using the fact that elements of B_Ỹ have the general form ∆ = Ỹ_⊥B, we express the constraint as a function of b = vec(B) for a fixed Ỹ_⊥: r(b; Ỹ) ≡ r(exp_Ỹ ∆).
Consider the Taylor series expansion of the constraint function r(b; Ỹ), and for convenience define H (a function of Ỹ_⊥). The first-order approximation of the constraint then is A critical feature of this approximation is that H is constructed in such a way that it has a null space null(H) of complex dimension q̄ = dim S_Ỹ, and this null space well approximates S_Ỹ. Let Ê_s denote the matrix whose columns are a basis for null(H), whereas Ē_s is a basis for S_Ỹ.
The set of constraint-satisfying tangent vectors V̄_Ỹ ((636), (673)) is approximated by Elements in this set are given by the least squares solution of ‖r(Ỹ) − Hb‖² with respect to b, which is b = b_n + Ê_s s̄, where s̄ is a free complex q̄-tuple.
Example: 1-Dimensional ULA. On an N-element uniform line array (ULA) a single damped exponential signal has the form where z = ρ exp(iθ). The parameters θ and ρ are real and denote the spatial frequency and damping parameter, respectively, of the complex exponential waveform. Let J₀ and J₁ represent the selection matrices that select the elements (0 : N − 2) and (1 : N − 1), respectively. In the case of multiple signals we use subscripts, so that h_k is the damped exponential with mode z_k, where Ψ = diag(z₁, · · · , z_p).
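The defining shift relation of a single damped exponential mode on the ULA can be checked numerically; the damping and frequency values below are illustrative.

```python
import numpy as np

# The single damped exponential h = [1, z, z^2, ..., z^{N-1}]^T satisfies
# the shift relation J1 h = z * J0 h, where J0 and J1 are the selection
# matrices picking out elements 0:N-2 and 1:N-1, respectively.
N = 8
rho, theta = 0.9, 0.7                 # damping and spatial frequency (illustrative)
z = rho * np.exp(1j * theta)
h = z ** np.arange(N)

J0 = np.eye(N)[:-1, :]                # selects rows 0..N-2
J1 = np.eye(N)[1:, :]                 # selects rows 1..N-1
print(np.allclose(J1 @ h, z * (J0 @ h)))   # shift-invariance of the mode
```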
Because a single damped exponential is specified by two real parameters, all submanifolds here are of even dimension; that is, dim(S_Y₀) = q = 2p and q̄ = p.
Definition (Shift Invariance for Semi-Unitary Matrices): Define the function where J_k, J₀ ∈ R^{N_sel×N} are the selection matrices that select N_sel of the N elements. An N × p semi-unitary matrix Y is said to be shift invariant with respect to the pair of selection matrices J_k and J₀ if The relation (689) is often referred to in the literature as the invariance equation [1].
Lemma 25. If Y is shift-invariant with respect to J 0 and J k then Y Q is also shift invariant with respect to this same pair.
Proof: (appendix C.5).
The following lemma expresses the invariance condition X_k(Y) = 0 in a reduced form.
Lemma 26. Let Z = orth(J₀Y), let Z_⊥ be its orthogonal complement, and define Then
Proof: The columns of [Z Z_⊥] span the range space of X_k, so that It follows that
Note that each r_k(Y) is a complex N̄_b-tuple, so that r(Y) is M̄·N̄_b × 1. For an appropriately chosen set of M̄ + 1 selection matrices, the elements of B̄_damped satisfy or, equivalently, in terms of the exponential map at Ỹ,
Approximating the constraint. A key idea in developing the approximation for (694) is where and In the above we have used and where Φ̃_k ≡ Φ_k(Ỹ) and The tangent vectors of interest are elements of B_Ỹ ⊂ T_ỸB̄_damped, which have the general form ∆ = Ỹ_⊥B. Substituting this form into (698) yields Vectorizing (702) and (699) using the identity vec(ABC) = (Cᵗ ⊗ A)vec(B) yields the vectorized form of (697) as and and where r_k(·) ≡ vec(R_k(·)) and δΦ_k(b) ≡ δΦ_k(Y_⊥B). Note that H_k and C_k are To arrive at the desired form we note that, at high SNR, ‖b‖ is small, so that Dropping this term and the higher-order terms in (703) results in the approximation From these component function linearizations (707) we approximate the full constraint (694) as The matrix H̄(Ỹ) is approximately of rank N − q̄. Motivated by the dimension of the constraint submanifold tangent space S_Y, with complex dimension q̄, we replace H̄(Ỹ) by its rank N − q̄ approximation, denoted H. This defines the matrix H in (677) and (679); using (659) through (671) completes the subspace estimation algorithm, which is summarized in the next section.

MLESI Algorithm Summary
1. Compute the EVD of the SCM in partitioned form (p-dominant part).

Compute the full SVD of H̄ as H̄ = UΣVᴴ.
The Ē_s matrix is Ē_s ≡ V₀.
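The SVD step above can be sketched as follows: the trailing right singular vectors of a matrix span its (approximate) null space, which is exactly the basis sought. The matrix H below is a random stand-in with a forced rank deficiency, not the algorithm's actual constraint matrix.

```python
import numpy as np

# Sketch of "compute the full SVD of H and take E_s = V0": the trailing
# right singular vectors of H span its (approximate) null space. Here H
# is a random complex matrix built to have a rank deficiency of qbar.
rng = np.random.default_rng(2)
m, n, qbar = 10, 6, 2
A = rng.standard_normal((m, n - qbar)) + 1j * rng.standard_normal((m, n - qbar))
B = rng.standard_normal((n - qbar, n)) + 1j * rng.standard_normal((n - qbar, n))
H = A @ B                                # rank n - qbar (generically)

U, sv, Vh = np.linalg.svd(H)             # full SVD, Vh is n x n
V0 = Vh.conj().T[:, n - qbar:]           # last qbar right singular vectors
print(np.linalg.norm(H @ V0) < 1e-8)     # V0 spans the null space of H
```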

Example: 2-Dimensional Regular Grid
For the 2-dimensional uniform rectangular array (URA), let N₁ and N₂ denote the number of elements in the two array dimensions (row and column), so that N = N₁ · N₂ is the total number of elements. A signal may be represented by the Kronecker product For the multiple-signal case the k-th signal is Let J_h;0, J_h;1 and J_v;0, J_v;1 be the selection matrices for the two array dimensions; that is, and This holds for the different permutations. First consider The factor on the right hand side is Vectorizing this equation using the identity vec(ABC) = (Cᵗ ⊗ A)vec(B), observe that the right hand side becomes where h ≡ vec(h⁽¹⁾ ⊗ (h⁽²⁾)ᵗ) and J₁ ≡ (J_h;1 ⊗ J_v;1). The left hand side of (730) factors in the same way, yielding where Organizing these into matrix form yields where
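The Kronecker structure of the 2-D mode makes the two 1-D shift relations combine multiplicatively, which can be checked directly; the array sizes and mode values below are illustrative.

```python
import numpy as np

# For the 2-D URA the mode is the Kronecker product h = h_h (x) h_v, and
# the shift relations of the two 1-D factors combine multiplicatively:
#   (J_{h,1} (x) J_{v,1}) h = z_h * z_v * (J_{h,0} (x) J_{v,0}) h.
N1, N2 = 5, 4
z_h = 0.95 * np.exp(1j * 0.4)            # horizontal mode (illustrative)
z_v = 0.90 * np.exp(1j * 1.1)            # vertical mode (illustrative)
h_h = z_h ** np.arange(N1)
h_v = z_v ** np.arange(N2)
h = np.kron(h_h, h_v)                    # length N1*N2 array response

Jh0, Jh1 = np.eye(N1)[:-1, :], np.eye(N1)[1:, :]
Jv0, Jv1 = np.eye(N2)[:-1, :], np.eye(N2)[1:, :]
lhs = np.kron(Jh1, Jv1) @ h
rhs = z_h * z_v * (np.kron(Jh0, Jv0) @ h)
print(np.allclose(lhs, rhs))
```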

CHAPTER 5 Summary and Simulation Results
We briefly summarize the previous developments before presenting numerical simulation results demonstrating the various bounds and estimators. In chapter 3 we considered the standard stochastic complex linear model in the form where Y is an unknown deterministic N × p semi-unitary matrix, viewed as a point in the Basis manifold B (a quotient manifold of the complex Stiefel manifold).
The data appear in the log likelihood function L(Y, Λ; R̂) through the SCM, R̂ = K⁻¹ZZᴴ, whose eigen decomposition is where Ỹ is the N × p semi-unitary matrix corresponding to the p largest eigenvalues.
The parameter Y ∈ B appears in the compressed model covariance given by where σ̂², defined in equation (583), and Λ̂ = Λ̃_s − σ̂²I are the ML estimates of σ² and Λ, respectively.
In this geometric formulation our objective was to establish intrinsic CRBs for the accuracy of unbiased estimators of Y ∈ B, or equivalently, bounds on subspace and rotational estimation accuracy.

Unconstrained Problem: Y ∈ B
The subspace distance error measure was defined, independently of the differential geometric formulation, as the square root of the sum of squares of the subspace angles γ between the generic estimator Ŷ and the true Y. In MATLAB, γ is computed in radians as gamma = acos(svd(orth(Y1)'*orth(Y2)));.
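A NumPy sketch of the same computation as the MATLAB one-liner above, assuming both inputs are full-column-rank matrices; the function name is illustrative.

```python
import numpy as np

# NumPy version of the MATLAB principal-angle computation
#   gamma = acos(svd(orth(Y1)'*orth(Y2)));
# distSS is the root-sum-square of the principal angles gamma.
def subspace_dist(Y1: np.ndarray, Y2: np.ndarray) -> float:
    Q1, _ = np.linalg.qr(Y1)                   # orthonormal basis (orth equivalent)
    Q2, _ = np.linalg.qr(Y2)
    sv = np.linalg.svd(Q1.conj().T @ Q2, compute_uv=False)
    gamma = np.arccos(np.clip(sv, -1.0, 1.0))  # principal angles, radians
    return float(np.sqrt(np.sum(gamma ** 2)))

rng = np.random.default_rng(3)
Y = rng.standard_normal((5, 2)) + 1j * rng.standard_normal((5, 2))
print(subspace_dist(Y, Y) < 1e-6)              # distance of a subspace to itself
```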
The mean square error used to evaluate subspace estimation accuracy was We showed that in the geometric Basis manifold formulation this subspace distance corresponds to a minimum point-to-fiber submanifold geodesic distance (see figure 6, section 2.3.1). The tangent spaces of B are naturally partitioned into B_Y, the space of subspace perturbations, and A_Y, the space of rotations. We showed that, if P_A and P_B are the orthogonal projection operators from T_YB to A_Y and B_Y, respectively, then, to first order, distSS(Ŷ, Y) = ‖P_B∆‖, where ∆ is the estimation error vector ∆ = exp_Y₀⁻¹ Ŷ. Given an orthonormal basis a_k, b_k for A_Y₀ and B_Y₀, respectively, an arbitrary tangent vector ∆ expands as and the parameters (α, β) are said to be the normal coordinates of the point Y ∈ B.
We used this change of variables to express the log likelihood function as L(α, β) ≡ L(Y(α, β)) and to compute the full FIM for this coordinate choice in theorem 10. We developed the CRB on subspace estimation accuracy by computing the FIM in this normal coordinate parameterization (see section A.1). In particular, we showed that the subspace block of the full FIM was where and that the off-diagonal submatrices (F_αβ, etc.) vanished. From this we showed the intrinsic lower bound on p-dimensional subspace estimation accuracy, CRB_SS, which the EVD estimator approaches as the number of independent snapshots K → ∞. This is confirmed in the numerical simulations that follow in section 5.1.1.
The mean square error used to evaluate rotational estimation accuracy was where distR²(Ŷ, Y) = ‖µ‖² and µ is the p-tuple of rotational angles between Ŷ and Y.
Using the same normal coordinate parameterization, we showed that the intrinsic lower bound on rotational estimation accuracy was so that ε_R ≥ CRB_R. Denoting the rotational error in the EVD estimate as ε_R^svd, we have ε_R^svd → CRB_R as K → ∞; this is confirmed in the numerical simulations that follow. These rotational parameters are considered nuisance parameters in the usual subspace signal processing application, but are required to properly parameterize the likelihood function.

Simulation Results
In chapter 4 we showed that the EVD of the SCM is the ML estimator of Y ∈ B for the statistical model (740). In this section the performance of the EVD estimator is compared to the intrinsic CRBs for subspace and rotational accuracy established in chapter 3. A Monte Carlo simulation based on the statistical model (740) with N = 5 and p = 2 is considered. We note that for this unconstrained case, both CRB_SS (750) and CRB_R (753) are independent of Y. In the standard simulation approach Y would be fixed, with n₁ and n₀ drawn from random distributions. Because the relevant CRBs are independent of Y, in the simulation approach used here Y is also chosen randomly, from the uniform distribution on B_{5,2}. In MATLAB this random semi-unitary matrix is generated as Y = orth(randn(N,p) + 1i*randn(N,p));.
The random source vector n₁ is drawn from CN(0, σ_s²I_p) and the additive noise n₀ is drawn from CN(0, σ²I_N). The input SNR is then defined as SNR = σ_s²/σ². Once σ², σ_s², Y, and the number of independent snapshots K are specified, CRB_SS is computed using (750). Note that in the constrained cases considered in the next sections, that is Y ∈ B̄ ⊂ B, the subspace accuracy bound does depend on the choice of Y, and so Y must be fixed in the later simulation examples.
One thousand (N_trials = 1000) Monte Carlo trials are performed, each of which consists of generating a normal N × K data matrix Z = [z₁ · · · z_K] and computing the SCM R̂ = K⁻¹ZZᴴ. Using the MATLAB "eig" function, the estimate Ŷ is formed as the p dominant eigenvectors, Ỹ, of R̂ (741). The sample RMSE subspace distance, denoted ε_SS, is then computed over the N_trials simulation trials as where Ŷ_n represents the estimate for the n-th trial. The sample RMSE rotational distance is computed likewise. The results comparing the accuracy of the EVD estimator to CRB_SS are shown in figures 9 and 10 as functions of the SNR and sample support. In figure 9 the number of samples is fixed at K = 10 and the SNR is varied. For this small-sample-support case, the EVD-based estimator achieves an accuracy that is a small constant fraction above CRB_SS (750). In figure 10 the SNR is fixed and the number of snapshots is varied. As anticipated, the EVD-based estimator closely approaches CRB_SS as the number of snapshots, K, becomes large.
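A compact NumPy skeleton of this Monte Carlo experiment follows. The SNR, snapshot count, and trial count are illustrative stand-ins (not the text's exact settings), and the subspace distance is recomputed inline.

```python
import numpy as np

# Monte Carlo skeleton of the experiment described above: draw a random
# semi-unitary Y, generate K snapshots z = Y n1 + n0, form the SCM, take
# the p dominant eigenvectors, and accumulate the subspace-distance RMSE.
# SNR and trial counts are illustrative choices, not the text's settings.
rng = np.random.default_rng(4)
N, p, K, n_trials = 5, 2, 128, 50
sigma2_s, sigma2 = 10.0, 1.0            # source and noise powers (SNR = 10)

def crandn(*shape):
    """Unit-variance circular complex Gaussian samples."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

def dist_ss(Y1, Y2):
    """Root-sum-square of principal angles (inputs semi-unitary)."""
    sv = np.linalg.svd(Y1.conj().T @ Y2, compute_uv=False)
    return np.sqrt(np.sum(np.arccos(np.clip(sv, -1, 1)) ** 2))

Y, _ = np.linalg.qr(crandn(N, p))       # random semi-unitary truth
sq_err = 0.0
for _ in range(n_trials):
    Z = np.sqrt(sigma2_s) * Y @ crandn(p, K) + np.sqrt(sigma2) * crandn(N, K)
    w, V = np.linalg.eigh(Z @ Z.conj().T / K)
    Y_hat = V[:, ::-1][:, :p]           # p dominant eigenvectors of the SCM
    sq_err += dist_ss(Y_hat, Y) ** 2
rmse = np.sqrt(sq_err / n_trials)
print(0.0 < rmse < 0.5)                 # small subspace error at this SNR and K
```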
The intrinsic lower bound on rotational estimation accuracy, CRB_R (753), is a function of the differences between eigenvalues, but does not depend on the particular choice of Y ∈ B. The EVD rotational estimation performance, measured as in (757), is assessed using the same general Monte Carlo simulation approach used to assess subspace estimation performance. In the first test case, the additive noise power σ² and the larger signal eigenvalue λ₁ are fixed, and the smaller of the two signal eigenvalues is varied as λ₂ = c · λ₁, c ≤ 1. Results for this test case are shown in figure 11 as a family of curves, with each curve representing a different SNR (i.e., noise power). The curves in the lower portion of the figure are high SNR, while the curves in the upper portion are low SNR. As the figure indicates, for fixed SNR, the larger the power difference between the two sources (i.e., the smaller the power ratio c), the better we are able to estimate the rotation.
In the second test case λ 1 , λ 2 and σ 2 are fixed and the sample support is varied. Results for this case, shown in figure 12, demonstrate that the EVD-based estimate of rotation asymptotically approaches the lower bound, i.e., the EVD is an asymptotically efficient estimator of rotation.

Constrained Problem: Y ∈B ⊂ B
In many problems of interest the set of N × p semi-unitary matrices appearing in the model (740) is constrained to a submanifold B̄ ⊂ B. The rotational space A_Y is identical in both decompositions, and S_Y ⊂ B_Y, so that B_Y, the space of subspace perturbations for the full tangent space T_YB, can be decomposed as Motion in the S_Y direction corresponds to subspace perturbations that remain on the submanifold, while motion in N_Y corresponds to subspace perturbations that exit the submanifold in a normal direction. A basis {s_k} and {n_k} for S_Y and N_Y, respectively, forms a basis for B_Y, termed here an S-N basis.
The FIM for the constrained case is computed in same way as in unconstrained case. The subspace block of full FIM (471) for the general constrained problem is where E s is defined in terms of a basis on S Y , s k = Y ⊥ S k , as Figure 9. RMSEs of the EVD-based subspace estimator and the CRB of (435) versus SNR for the estimation problem of (740) on B. The RMSE of the EVD estimate is the the redline; CRB SS is the solid (gray) curve. The RMSEs of the EVD-based subspace estimators are a small constant fraction above the subspace CRB over all SNRs shown. Results are shown for K = 10 and K = 128. Figure 10. RMSEs of EVD-based subspace estimator and CRB SS (750) versus sample support for the estimation problem of (740) on B 5,2 . CRB SS is the solid (gray) curve; the subspace RMSE values appear as red dots. Results are shown for three SNRs, 1 dB, 7 dB and 13 dB. As expected, the RMSE of the EVD estimate approaches the subspace CRB as the sample support becomes large, i.e., this maximum likelihood estimator is asymptotically efficient.   (750) versus sample support for the estimation problem of (740) on B 5,2 . CRB SS is the solid (gray) curve; subspace RMSE values appear as red dots. The signal eigenvalues are fixed as λ 2 = 0.5λ 1 . As expected, the RMSE of the EVD estimate approaches the rotational CRB as the sample support becomes large, i.e., this maximum likelihood estimator is asymptotically efficient.
Here the S_k are an appropriately chosen set of (N − p) × p matrices. We use the notation crb_SS to denote the subspace CRB under a general constrained submanifold assumption (the subspace CRB under the unconstrained assumption is denoted CRB_SS). Ignoring curvature, the bound on subspace estimation accuracy is crb_SS = tr(F_ss⁻¹).

Parametric Constraint: General and Modal
When the signal matrix is defined by some underlying parameterization, H = H(θ), the eigen decomposition of the true signal covariance effects a transformation between the (θ, P) and (Y, Λ) parametrizations, that is, [Y, Λ] = eig(H(θ) P H^H(θ)). By definition, the eigenvector matrix Y is an element of the submanifold, Y ∈ B̄ ⊂ B. Both parameterizations serve as coordinate systems on the product manifold B̄ × D, the domain of the log-likelihood function.
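The transformation [Y, Λ] = eig(H(θ) P H^H(θ)) can be illustrated directly; the harmonic form of H(θ) and all numerical values below are assumptions for illustration only:

```python
import numpy as np

N, q = 8, 2

# Hypothetical parametric signal matrix H(theta): unit-norm harmonic columns.
theta = np.array([0.1, 0.35])
H = np.exp(1j * np.outer(np.arange(N), 2 * np.pi * theta)) / np.sqrt(N)
P = np.array([[1.0, 0.3], [0.3, 1.0]])      # source correlation (illustrative)
Rs = H @ P @ H.conj().T                     # rank-q signal covariance

# (Y, Lambda): principal eigenpairs of the signal covariance
# (np.linalg.eigh returns eigenvalues in ascending order).
lam, V = np.linalg.eigh(Rs)
Y, Lam = V[:, -q:], lam[-q:]

# Y is semi-unitary and spans the same subspace as H(theta):
# projecting H onto span(Y) reproduces H.
proj_err = np.linalg.norm(H - Y @ (Y.conj().T @ H))
```

The projection error vanishes, confirming that the eigenvector matrix of the signal covariance represents the same subspace as the parameterized signal matrix.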
The (θ, P) parameterization of the problem has a total of q + p² independent parameters. The dimension of B̄ is equal to the dimension of its tangent space; note that this value is consistent with the (θ, P) parametrization.

Changes in the source correlation P induce rotations, corresponding to motion in the A_Y direction, as well as changes in the eigenvalue space, but no motion in S_Y and therefore no change in subspace. Changes in θ result in motion confined to the subspace S_Y direction, along with changes in the eigenvalue space.
As discussed in section 3.9.4, the differentiable signal model H(θ) defines a natural coordinate basis for S_Y, expressed in terms of the derivative matrices D_k, where D_k is the derivative matrix of H with respect to θ_k, evaluated at the point of interest. In the modal case, H = [h(θ_1), · · · , h(θ_q)] and the E_ś basis matrix (761) has the form (520), where D is the N × q derivative matrix and G is a selection matrix.
We showed that substituting this form into the general form (760) and manipulating terms yields (513), repeated here. The right-hand side is recognized as the parameter space FIM for the multi-dimensional modal problem [1], [2]. The natural coordinate basis, in general non-orthogonal, is brought to orthonormal form by a linear transformation J, computed in the standard vector space way (see (511)). The components of the subspace FIM with respect to this orthonormal basis are given in (513). The CRB on subspace estimation accuracy for this parametrically constrained problem then follows from the inverse of this subspace FIM.
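Orthonormalizing a natural (non-orthogonal) basis through a transformation J can be sketched concretely. One conventional construction, shown below, factors the Gram matrix by Cholesky; the precise convention of (511) may differ in ordering or transposition, and all names here are illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)
n, q = 10, 3

# A non-orthogonal "natural" basis: columns of E_nat (analogue of the
# parameterization-induced basis). Illustrative random data.
E_nat = rng.standard_normal((n, q))

# Factor the Gram matrix G = J^T J (Cholesky); then E_orth = E_nat inv(J)
# has orthonormal columns.
G = E_nat.T @ E_nat
J = np.linalg.cholesky(G).T          # upper-triangular factor, G = J^T J
E_orth = E_nat @ np.linalg.inv(J)

# Components transform oppositely to the basis: if v = E_nat s_nat,
# then v = E_orth (J s_nat).
s_nat = rng.standard_normal(q)
v = E_nat @ s_nat
s_orth = J @ s_nat
```

The same vector v is represented in either basis, with the components related through J, mirroring the role the transformation plays in bringing the subspace FIM to an orthonormal basis.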

Asymptotic ML Estimators for Y ∈ B̄
In this geometric approach, our objective is to estimate the N × p semi-unitary matrix Y representing a point on the submanifold B̄ defined through the signal covariance R_s by (763). From the geometric perspective, the underlying parameterization provides a particular coordinate system on the submanifold. Since the parameterization also defines the submanifold, these parameters are termed here the natural coordinates.
For the unconstrained case, the p principal eigenvectors and eigenvalues of the SCM, as computed by the EVD and denoted here (Ỹ, Λ̃), are maximum likelihood estimates of Y ∈ B and Λ ∈ D, respectively. We consider the compressed likelihood (771). Using a partitioned (α, β) normal coordinate system with origin at the unconstrained MLE Ỹ, nearby points may be represented as Y = exp_Ỹ ∆(α, β) (see (746)), and the constrained ML criterion (771) may be written in terms of these normal coordinates. In this expression exp_Ỹ ∆(α, β) ∈ B̄ represents the submanifold membership condition, Y ∈ B̄, expressed in terms of the exponential map based at Ỹ and the tangent vector ∆(α, β).
In the Taylor series expansion of L(α, β) we use the fact that L_αβ = 0 and that the first derivatives in α and β vanish at the unconstrained MLE Ỹ.
As a consequence of the assumption that the source correlation matrix P is unconstrained, the set B̄ is such that if Y ∈ B̄ then Y Q ∈ B̄ for Q unitary and arbitrary. Thus the parameter α, which defines Q through the matrix exponential of the α-weighted rotational basis directions, is likewise unconstrained, and the constrained minimization in (773) is effectively over β alone (see (637)). Here V_Ỹ is the set of complex N_b-tuples that satisfy the submanifold membership condition. Using an S-N basis for the space B_Ỹ, we showed that these constraint-satisfying complex tuples take the form b = b_n + E_s s (776). In this expression E_s is the basis matrix for S_Ỹ, s is a free real q-tuple, and b_n defines the tangent vector from Ỹ to the closest point on B̄. Denoting this closest point, or minimum subspace distance point, by Y_MD, the second term in the sum (776), E_s s, corresponds to free subspace motion along the submanifold.
Substituting (776) into (774) results in an unconstrained minimization in s, which has the solution ŝ = −L_ss⁻¹ L_s|_{Y_MD}, where L_ss and L_s|_{Y_MD} are given by (659) and (660), respectively. The solution to this quadratic approximation of (771) is then expressed as an N × p semi-unitary matrix via (778). This estimation approach requires that the submanifold membership condition, expressed as (776), be known. This, in turn, requires that the S-N partitioning of the space B_Ỹ = S_Ỹ ⊕ N_Ỹ is known and that the minimum distance point Y_MD is also known.
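The stationarity step ŝ = −L_ss⁻¹ L_s can be sketched on a toy quadratic model of the compressed log-likelihood; the matrices below are illustrative stand-ins, not the expressions of (659) and (660):

```python
import numpy as np

rng = np.random.default_rng(3)
q = 4

# Toy quadratic model in the free coordinates s:
#   L(s) = g^T s + 0.5 s^T Hm s,  with Hm negative definite
# (Hm plays the role of L_ss, g the role of L_s).
A = rng.standard_normal((q, q))
Hm = -(A @ A.T + q * np.eye(q))
g = rng.standard_normal(q)

# Stationary point of the quadratic approximation: solve Hm s_hat = -g,
# mirroring s_hat = -L_ss^{-1} L_s.
s_hat = np.linalg.solve(Hm, -g)

def L(s):
    return g @ s + 0.5 * s @ Hm @ s

# s_hat maximizes the quadratic model: it beats random perturbations.
vals = [L(s_hat + 0.1 * rng.standard_normal(q)) for _ in range(50)]
```

Because Hm is negative definite, the stationary point is the unique maximizer of the quadratic model, which is why a single solve suffices for the quadratic approximation of the criterion.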

Relation to Newton's Method
When the submanifold B̄ is defined by some known underlying parameterization, a practical approach to solving the above constrained ML formulation is to leverage a suboptimal parameter estimate, say θ̂_0, to establish a reference point Y_0 ∈ B̄. Using the differentiable signal model, a basis is constructed for S_{Y_0}, expressed as the matrix E_ś.
The ML tangent vector based at Y_0 is then given by (782), and the resultant ML estimate of Y ∈ B̄ is Ŷ = exp_{Y_0} ∆̂. Comparing (782) with (780), note that the term b_n = E_n n̂ is absent from (782). This corresponds to the fact that here the reference point Y_0 lies on the desired submanifold B̄, so no movement in the normal direction, represented by the term b_n in (780), is required.
The ML tangent vector based at Y_0 may alternately be expressed in the natural coordinates. The component estimate δθ̂ represents a change in the underlying θ, and the equivalent parameter estimate is θ̂ = θ̂_0 + δθ̂. This approach (783) is recognized as a one-step Newton refinement if we make the appropriate correspondences. The overall performance of this approach relies on the performance of the unspecified estimator that provides the initial parameter estimate θ̂_0. For large initial errors, multiple iterations may be required and convergence is not guaranteed. For harmonic estimation problems on regular arrays, R-D Unitary ESPRIT may be used to generate the initial θ̂_0 estimate.
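The one-step Newton refinement θ̂ = θ̂_0 − L''(θ̂_0)⁻¹ L'(θ̂_0) can be illustrated on a toy scalar criterion; the function and all values are hypothetical, chosen only to show the characteristic error reduction when the initial estimate is already close:

```python
import numpy as np

theta_true = 0.7                           # unknown parameter (illustrative)
L   = lambda t: np.cos(t - theta_true)     # toy smooth criterion, max at theta_true
dL  = lambda t: -np.sin(t - theta_true)    # first derivative
d2L = lambda t: -np.cos(t - theta_true)    # second derivative

theta0 = theta_true + 0.2                  # coarse initial estimate (e.g., from ESPRIT)
delta = -dL(theta0) / d2L(theta0)          # one-step Newton refinement
theta1 = theta0 + delta

err0 = abs(theta0 - theta_true)
err1 = abs(theta1 - theta_true)
```

A single step shrinks the error from 0.2 to roughly err0³/3, illustrating why a one-step refinement of a good initialization can approach the bound, while a poor initialization may require multiple iterations or fail to converge.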

ML Estimation on B̄_damped
The above algorithm for the constrained optimization problem relies on a suboptimal parameter estimator to establish the reference point Y_0 that initializes Newton's method. In section 4.3.1 we introduced a new algorithm for the important special case of the damped exponential signal model on regular arrays. The submanifold corresponding to this signal model is denoted B̄_damped ⊂ B. This new algorithm used the structure of B̄_damped in a more integrated and optimal way and did not require a suboptimal reference solution. Given this model, we showed that, for an appropriately defined vector-valued function r(·) on B̄_damped, Y ∈ B̄_damped ⟺ r(Y) = 0. The submanifold membership condition in terms of the tangent vector ∆ ∈ B_Ỹ based at Ỹ is then r(b; Ỹ) = 0, where r(b; Ỹ) ≡ r(exp_Ỹ ∆). The function notation r(b; Ỹ) reflects the fact that elements of B_Ỹ have the general form ∆ = Ỹ_⊥ B for a fixed Ỹ_⊥ and b = vec(B).
The Taylor series expansion of the constraint function r(b; Ỹ) yields its first-order approximation, expressed in matrix form in terms of a matrix H̄ defined for convenience (H̄ is a function of Ỹ_⊥). A critical feature of this approximation is that H̄ is constructed in such a way that its null space well approximates the submanifold tangent space S_Ỹ. If Ê_s denotes the matrix whose columns are a basis for null(H̄), then notionally span(Ê_s) ≈ span(Ē_s), where Ē_s is a basis for S_Ỹ. Since the space S_Ỹ is q̄-dimensional, this requires that the null space of the approximation H̄ be q̄-dimensional as well, or rank(H̄) = N_b − q̄.
Using this approximation for H̄ and (677), the set of constraint-satisfying tangent vectors V_Ỹ (775) is approximated as in (680). Elements in this set are given by the q̄-parameter family of least squares solutions b = b̂_n + Ê_s s̄, where s̄ is a complex free q̄-tuple. This solution is in the same form as (776), the result developed using a reference point Y_0.
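The structure of this family of least squares solutions, a minimum-norm particular solution plus free motion in the null space, can be sketched with a generic underdetermined linear constraint; the matrix below merely stands in for the linearized constraint, and all sizes are illustrative:

```python
import numpy as np

rng = np.random.default_rng(4)
m, n = 6, 8                           # underdetermined constraint Hc b = r0

Hc = rng.standard_normal((m, n))      # stand-in for the linearized constraint matrix
r0 = rng.standard_normal(m)

# Minimum-norm particular solution (analogue of b_n) and an orthonormal
# basis for null(Hc) (analogue of the E_s-hat basis), via pinv and the SVD.
b_n = np.linalg.pinv(Hc) @ r0
U, S, Vt = np.linalg.svd(Hc)
E_s = Vt[m:].T                        # (n - m) null-space directions

# Every member of the family b = b_n + E_s s satisfies the constraint.
s = rng.standard_normal(n - m)
b = b_n + E_s @ s
res = np.linalg.norm(Hc @ b - r0)
```

Any choice of the free parameter s leaves the constraint satisfied, which is exactly the role played by the free subspace motion along the submanifold in the text.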
Given the constraint-satisfying tangent vector specified by (680), the final estimate Ŷ of Y ∈ B̄_damped is found using the complex form of (778) (see theorem 24), where b̂ = b̂_n + Ê_s ŝ. The estimate (798) may be approximated to first order as Ỹ + ∆̂.

Application to the harmonic retrieval problem
Multi-dimensional harmonic retrieval on regular arrays appears in a number of different applications. In the geometric approach, this estimation problem is framed in terms of the submanifold corresponding to the harmonic signal model, denoted B̄_harmonic, with the set relationship B̄_harmonic ⊂ B̄_damped ⊂ B. The eigen decomposition of the true signal covariance effects a transformation between the (θ, P) and (Y, Λ) parametrizations, that is, [Y, Λ] = eig(H(θ) P H^H(θ)). The inverse transformation has the general form θ = ϕ(Y) and, in particular, for harmonic signals on regular array geometries, R-D Unitary ESPRIT [3], restricted to B̄_harmonic, is the mapping ϕ.
If Ŷ is an ML estimator of Y ∈ B̄ ⊂ B̄_harmonic, then by the invariance principle [4], θ̂ = ϕ(Ŷ) is an ML estimator of θ ∈ R^p. In the standard approach, the subspace estimate Ỹ is the p-dominant eigenvector matrix of the SCM as computed by the EVD. This is the ML estimate for the unconstrained signal model corresponding to the B manifold. It is not ML with respect to the B̄_harmonic ⊂ B assumption, and so the invariance principle is not applicable here.
A subspace-based parameter estimation algorithm is formed by extending the domain of the mapping ϕ (802) to B; as noted, R-D Unitary ESPRIT with Ỹ as input is the prime example of such an approach. Since Ỹ is not ML with respect to the B̄_harmonic assumption, R-D Unitary ESPRIT is not an ML parameter estimator. To improve subspace-based parameter estimation, it is therefore desirable to find an ML estimator of Y ∈ B̄ ⊂ B̄_harmonic or, failing that, one that is closer to the constrained subspace CRB than the standard EVD estimate Ỹ. In chapter 4 we introduced a subspace estimation algorithm for the damped exponential signal assumption, B̄_damped, that was derived from the ML criterion. This estimation algorithm, termed MLESI, may be viewed as an implementation of a geometric Newton's method on B that avoids the need for explicit initialization by incorporating the submanifold constraint condition (i.e., shift-invariance) in a more integrated way. In the most benign scenario the subspace accuracy bound for the B̄_damped assumption is 10 log10(N − p) below that of the unconstrained B assumption.

Ratio of the Bounds
The ratio of bounds for the different signal, or submanifold, assumptions indicates the processing gain available to a subspace estimator using the full constraint information, relative to the standard EVD. The potential processing gain is defined as Gain := 20 log10(CRB_SS / crb_SS) (803), where CRB_SS is the bound based on the unconstrained assumption and crb_SS is the bound computed using the constraint information (e.g., for the harmonic signal model crb_SS = CRB_harmonic).
For benign scenarios consisting of widely spaced uncorrelated signals, the relative difference between the bounds is the square root of the ratio of the dimensions of the spaces of subspace perturbations for the different manifold assumptions. The dimensions of these spaces for the different assumptions, and the resultant ratios (M denotes the array dimension), are summarized below:

B (unconstrained): dimension 2p(N − p), ratio 1.
B̄_damped: dimension 2Mp, ratio (N − p)/M.
B̄_harmonic: dimension Mp, ratio 2(N − p)/M.
Thus, for benign scenarios on a 2-dimensional array (M = 2), the potential gains for the B̄_damped and B̄_harmonic assumptions are, respectively, 10 log10((N − p)/2) and 10 log10(N − p). The MLESI estimator, which is asymptotically efficient with respect to the B̄_damped assumption, realizes a gain of 10 log10((N − p)/2) in this benign example.
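These gain values are simple to evaluate; the sketch below uses N = 64 and p = 2, the sizes appearing in the simulation section, with M = 2:

```python
import numpy as np

# Dimension-ratio gains for the benign, widely spaced case (M = 2
# dimensional array, N sensors, p sources); values illustrative.
N, p, M = 64, 2, 2

gain_damped   = 10 * np.log10((N - p) / M)   # B_damped vs unconstrained B
gain_harmonic = 10 * np.log10(N - p)         # B_harmonic vs unconstrained B
```

For these sizes the two gains are about 14.9 dB and 17.9 dB, differing by exactly 10 log10(2) ≈ 3 dB, the extra factor contributed by removing the damping parameters.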

Example: Correlated vs. Uncorrelated
The case of two highly correlated signals is often used to compare the performance of different estimators, since in this stressful situation standard approaches like ESPRIT are severely degraded [3]. To better understand the mechanisms at play, we consider two orthonormal signals h_1 and h_2 and the signal matrix H = [h_1, h_2]; the SVD of the signal matrix in this case is simply H itself. Consider the two source correlation matrices corresponding to correlated and uncorrelated sources (806): P_corr, with unit diagonal and off-diagonal entries ρ, and P_uncorr = diag(1 + ρ, 1 − ρ). P_corr has the eigen decomposition P_corr = Q Λ Q^H, where Q is given by (808) and Λ = diag(1 + ρ, 1 − ρ). The signal covariance matrices for the correlated and uncorrelated cases are given, respectively, by R_s,corr = H P_corr H^H and R_s,uncorr = H P_uncorr H^H. Thus, for the correlated case (806) the parameter ρ represents the correlation coefficient, while in the uncorrelated case 1 + ρ and 1 − ρ are the powers of the two sources. By construction, the signal covariance matrices have identical eigenvalues, that is, σ(R_uncorr) = σ(R_corr). In a Grassmann manifold formulation, the eigenvector matrices Y_0 and Y_0 Q, which define the same subspace, belong to the same equivalence class and are therefore represented by the same point (as such, the (Grassmann) geodesic distance between them is zero). In the Basis manifold formulation used here, these two matrices belong to different equivalence classes and are therefore represented by distinct points; the geodesic distance between them is non-zero.
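The equality of the two spectra is easy to confirm numerically. The sketch below assumes the standard 2 × 2 forms consistent with the stated eigenvalues (1 + ρ, 1 − ρ); the exact expressions of (806) and (808) are not reproduced in this chunk, so these forms are an assumption:

```python
import numpy as np

rho = 0.9999

# Assumed standard forms: unit-diagonal correlated matrix and its
# uncorrelated counterpart with powers (1 + rho, 1 - rho).
P_corr   = np.array([[1.0, rho], [rho, 1.0]])
P_uncorr = np.diag([1.0 + rho, 1.0 - rho])

# Eigenvector matrix of P_corr (assumed form of the rotation Q).
Q = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2.0)

eig_corr   = np.sort(np.linalg.eigvalsh(P_corr))
eig_uncorr = np.sort(np.linalg.eigvalsh(P_uncorr))

# Reconstruct P_corr from its eigen decomposition Q Lambda Q^T.
recon = Q @ np.diag([1.0 + rho, 1.0 - rho]) @ Q.T
```

Both matrices share the spectrum (1 + ρ, 1 − ρ), while differing by the rotation Q, which is precisely the distinction the Basis manifold formulation retains and the Grassmann formulation discards.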
The bounds for the subspace estimation problem depend on the constraint assumptions on the signal matrix H, which translate into constraints on Y ∈ B. For the unconstrained case, the B manifold applies and the subspace CRB is given by (750). For the two-signal case, p = 2, at high SNR this bound is given by (813). This CRB depends on the problem's eigenvalues but not on Y, and so CRB_SS is identical for both the R_s,corr and R_s,uncorr cases.
If, on the other hand, H is constrained, then the B̄ ⊂ B submanifold formulation applies and crb_SS = tr(F_ss⁻¹), where the subspace FIM F_ss has the general form (760).
If E_s;u is the basis matrix for the uncorrelated case, then the corresponding matrix for the correlated case is E_s;c = (Q ⊗ I_{N−p}) E_s;u, where Q is given by (808). The ratio of the unconstrained bound to the constrained bound measures the potential processing gain available to an estimator using the full constraint information (812). Example: for harmonic signals the B̄_harmonic submanifold assumption applies. The subspace CRBs for the p = 2 uncorrelated case, P_uncorr (806), and for the correlated case, P_corr (806), lead to the corresponding potential processing gains (812). For the uncorrelated case, the gain is equal to the ratio of the dimensions of the corresponding subspace perturbation spaces.
Because both Y_0 and Y_0 Q map to the same parameter value θ, the distinction between them is not relevant in the harmonic parameter extraction step (802). The rotation Q (or equivalently the parameter α) is a nuisance parameter with respect to the θ parameter estimation objective. However, as the difference between the CRBs for the correlated and uncorrelated problems indicates, the distinction between Y_0 and Y_0 Q is relevant to the overall estimation problem.
The ratio between the intrinsic bounds for the two constraint cases, uncorrelated and correlated, is given by (818); for ρ = 0.9999, as in the example above, this ratio is 37 dB. This dependence on the parameter ρ is reflected in the θ parameter space CRB. At high SNR the parameter space FIM, and from it the bound CRB_θθ (820) on the parameter error variance, may be written down directly. For the orthogonal signal example under consideration, D^H P_⊥ D = I_p, so that the CRB on the total RMS parameter estimation error simplifies accordingly. The parameter CRBs for the uncorrelated and correlated cases (806) then follow, and the ratio between them is, as expected, consistent with the subspace bound ratio given by (818).

Performance Metrics
The performance evaluation of the various estimators is based on two related criteria. First, for the estimation packages that provide estimates of the parameter θ directly, we use the total RMS parameter estimation error. For the harmonic signal estimation problem on M-dimensional arrays (e.g., DOA estimation), the total RMSE in the spatial frequency domain, denoted ε_θ (825), is computed from the error vectors δθ(n) = θ̂(n) − θ, where δθ(n) denotes the error in the estimate for the n-th trial. For the M-dimensional harmonic case with p modes, δθ(n) is Mp × 1.
The second criterion is the subspace distance between the true signal subspace, represented by Y(θ), and the estimate Ŷ ∈ B; the corresponding RMSE is denoted ε_SS (826). In this expression Ŷ(n) is the N × p matrix that represents the estimate for the n-th trial.
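Both metrics are straightforward to compute. The sketch below uses the root-sum-of-squared principal angles as the subspace distance; this is a common convention, and the thesis's own definition in (826) may differ in normalization:

```python
import numpy as np

def subspace_dist(Y1, Y2):
    """Subspace distance via principal angles between span(Y1) and span(Y2);
    Y1, Y2 are N x p with orthonormal columns."""
    c = np.clip(np.linalg.svd(Y1.conj().T @ Y2, compute_uv=False), 0.0, 1.0)
    return np.linalg.norm(np.arccos(c))     # sqrt of sum of squared angles

def total_rms_param_err(theta_hat, theta):
    """Total RMS parameter error over trials; theta_hat is n_trials x q."""
    d = theta_hat - theta
    return np.sqrt(np.mean(np.sum(d * d, axis=1)))

# Sanity values: zero distance to itself; pi/sqrt(2) between two
# orthogonal 2-dimensional coordinate subspaces of R^6.
I6 = np.eye(6)
d0 = subspace_dist(I6[:, :2], I6[:, :2])
d1 = subspace_dist(I6[:, :2], I6[:, 2:4])
```

For two orthogonal planes each principal angle is π/2, giving a distance of π/√2; a subspace compared with itself has distance zero.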

Relationship between parameter errors and subspace angles
The relationship between these two error measures, dist_SS²(Ŷ, Y(θ)) and δθ^t δθ, is developed in this section. The basic result we wish to establish is (831). The error tangent vector ∆ = exp⁻¹_Ŷ Y(θ) lies in T_Ŷ B̄ ⊂ T_Ŷ B, and its subspace component is ∆̄ ≡ P_B̄ ∆ ∈ S_Y. This vector may be expanded in either basis: ś_k denotes the natural coordinate (non-orthogonal) basis and s_k an orthonormal basis. If J denotes the transformation matrix between s_k and ś_k (J orthogonalizes s_k), then s = J⁻¹ ś (see (505)). The development of the subspace estimation accuracy bound for the Basis manifold formulation relied on the fact that, to first order, the subspace distance equals the norm of this subspace component.

Figure 13. Scatter plot of the subspace distance error computed directly via (826) and indirectly using (831). The scatter lies along a line with unit slope, confirming the equivalence of the two formulas.
In addition, since to first order δθ ≈ ś, substituting this result into (829) and using (830) yields (831). To demonstrate this relation, we show a scatter plot (figure 13) of the left- and right-hand sides of (831) over 1000 trials for a stressful example. The scatter lies along a straight line, confirming the equivalence of the two formulas.
From (831), when the eigenvalues of J are approximately equal, so that J⁻² ≈ c² I, the two measures differ by a scale factor, that is, dist_SS² ≈ c² δθ^t δθ. In contrast, for closely spaced sources the eigenvalues of J vary widely and this simple relation does not hold. This situation is illustrated in examples 5 and 6, considered in section 5.2.5.
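The scale-factor relation can be checked directly, taking dist_SS² ≈ ‖J⁻¹ δθ‖² as discussed above; the Jacobian level c ≈ 2.3 below corresponds to eigenvalues of roughly 3.6 dB, and all values are illustrative:

```python
import numpy as np

rng = np.random.default_rng(6)
q, c = 4, 2.3

# When J ~ c I (well separated sources), the two error measures differ
# by a fixed scale: ||inv(J) dtheta||^2 = (1/c^2) dtheta^T dtheta.
J = c * np.eye(q)
dtheta = rng.standard_normal(q)

dist_sq = np.linalg.norm(np.linalg.inv(J) @ dtheta) ** 2
```

When the eigenvalues of J spread apart, J⁻¹ stretches different components of δθ by different amounts and this proportionality is lost, which is the behavior seen for closely spaced sources.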

Simulation Results
In this section we compare the performance of the above estimation packages to the intrinsic CRBs for the 3 different signal assumptions (800). Note that for P = I, the sum of the signal eigenvalues satisfies Σ σ(R_s) = Np = 64 · 2 = 128.
Using a Monte-Carlo simulation, an N × K data matrix is generated according to (740) and provided as input to each of the estimators for each trial. We measure the performance of the various estimators using the RMS subspace distance ε_SS (826) and, where appropriate, the total RMS parameter error ε_θ (825), over a set of 1000 realizations at fixed SNR. The subspace distance CRBs for the 3 signal models (800), harmonic, damped exponential, and unconstrained, are plotted versus SNR in figure 14, along with ε_SS for the EVD, MLESI and MLESIESP subspace estimators.
This set of plots is referred to as the subspace error plot set. If the spatial frequencies of table 4 are denoted µ_0, the spatial frequencies in this example are s · µ_0; the scale factor s is the horizontal axis, labeled "sep factor". Observe that the proposed estimator MLESI shows good agreement with the bound CRB_damped down to very small angular source separation.

Example 4: Correlated vs. Uncorrelated
This scenario examines the highly correlated signal case discussed in section 5.2.3.
Examples like this one are often used in the literature to evaluate proposed estimators, since the performance of standard algorithms like UE is poor in this situation. This particular example, with ρ = 0.9999 (correlation coefficient), corresponds to the test case considered in [3, figure 2]. A pair of widely spaced signals is considered, as in the first example (see table 4). The number of snapshots is K = 10. The two source correlation matrices that correspond to the correlated and uncorrelated test cases (806) discussed in 5.2.3 are considered. The eigenvalues of both P_uncorr and P_corr are (1 + ρ, 1 − ρ); for the choice ρ = 0.9999 these eigenvalues are (1.9999, 0.0001). The unconstrained subspace CRB, CRB_SS (813), is identical for both cases since the eigenvalues are identical.
The constrained CRB, crb SS , differs for the two different source correlation matrices.
The available gains for the P = P_corr case (816) and the P = P_uncorr case are given by (835) and (836). Simulation results for the P = P_corr and P = P_uncorr test cases are overlaid in figure 18.
Both the EVD performance and CRB_SS are identical for the two cases, while the bounds for the constrained model, B̄_damped, differ, as does estimator performance. At sufficiently high SNR the gain values, (836) and (835), are confirmed. For the highly correlated case, higher SNR is required before the constrained bounds are achieved. Figure 19 shows UE and UEN along with MLESIESP and MLESIESPN for the correlated example. As evident, UE is severely degraded in the highly correlated case; the one-step Newton's method (UEN) estimator shows dramatic improvement and achieves the bound at high SNR, with performance close to that of the MLESIESPN result.

Figure 18. Simulation results for the two source correlation matrices of (806). The eigenvalues of both P_uncorr and P_corr are (1 + ρ, 1 − ρ); for the choice ρ = 0.9999 these eigenvalues are (1.9999, 0.0001). For the unconstrained assumption, CRB_SS is identical for both cases since the eigenvalues are identical. The bounds for the constrained assumptions (i.e., CRB_damped and CRB_harmonic) differ for the two choices by the amount expected.

Figure 19. RMSE ε_θ (825) for the MLESIESP, MLESIESPN, UE and UEN estimators and the parameter space bound CRB_θθ (820) versus SNR for the two source correlation matrices of (806). Since the system eigenvalues are the same for both cases, the unconstrained bound is the same for both; the constrained bounds differ significantly between the two cases.

For the closely spaced source scenario of example 5 (table 5), the Jacobian eigenvalues are nearly equal: 10 log10(σ(J)) = (3.84, 3.79, 3.76, 3.66, 3.61, 3.54, 3.51, 3.49). (838)

Figure 20 shows the subspace performance plot for this scenario. We observe that MLESI achieves the CRB for the B̄_damped assumption (CRB_damped). Since the MLESIESP estimator very closely approaches CRB_harmonic at moderate SNR, the Newton's refinement in the MLESIESPN estimation package has little room to show improvement in this example except at the lowest SNRs. Figure 21 shows the parameter space error plot set, which shows the same performance ordering.
In the more stressful example that follows (example 6) this performance ordering (839) is maintained only at the highest SNRs.
Following the discussion in section 5.2.4, we plot in figure 22 the RSS subspace error, dist_SS(Y(θ_0), Ŷ), versus the RSS parameter space error, √(δθ^t δθ), for each of 1000 realizations at a fixed SNR of 21 dB. Recall that these errors are related by (831). Because the eigenvalues of J are all approximately equal (838), so that J⁻² ≈ c² I, we have dist_SS² ≈ c² δθ^t δθ. This fact gives the scatter plot the appearance of a straight line. In example 6, consisting of four very closely spaced sources, the eigenvalues of the Jacobian have a large spread (see (843)), so this simple relationship is not maintained.

Figure 22. Scatter of the two error measures for 1000 realizations at an SNR of 21 dB for the closely spaced source example specified in table 5. Each gray dot represents the error measured for a single trial of the MLESIESP estimator. The green dot represents the corresponding RMSE. The black dot represents the CRB for the two different measures (CRB_θθ and CRB_harmonic). The red dot represents the RMSE for UE w.r.t. the two measures.

Example 6: 4 very closely spaced, equal power sources at large angles
This example consists of four very closely spaced sources with locations specified in table 6. The number of independent snapshots is K = 512. In the SNR range of approximately 65 to 90 dB we see that MLESI < MLESIESP. The implication is that the UE parameter extraction step of the MLESIESP package has increased, rather than decreased, the subspace error. The refinement step (MLESIESPN) reduces the error to a level below that of MLESI, so that in this SNR range we have the performance ordering MLESIESPN < MLESI < MLESIESP.
As the SNR is decreased, a point is reached at which the UE parameter estimation step of MLESIESP is poor enough that the one-step Newton's method yields an error larger than that of MLESI. In this SNR regime we observe that the MLESI estimate maintains good performance, as measured by the subspace error, well below the SNR threshold (approximately 80 dB) at which the UE parameter extraction step of the MLESIESP estimator begins to degrade. In contrast, while UEN achieves the CRB_harmonic bound at the highest SNRs shown, it begins to deviate from the bound at SNRs less than 90 dB; this compares to a deviation point of 68 dB for MLESIESPN. For reference we include standard UE, which serves as the initialization for UEN. As is evident, UE is far from the bound even at the highest SNRs.
A scatter plot of the parameter estimates (1000 trials) at an SNR just above the breakdown level is shown in figure 25. To better illustrate performance differences, a histogram of the estimation errors associated with one of the two DOA parameters, for one of the sources, is shown in figure 26; the plot on the left in this figure shows the MLESIESP estimator and the one on the right shows the UE estimator. Figure 24 shows the parameter performance for this scenario. Although the relative ordering of the different estimators is preserved with respect to the two measures, subspace error and parameter error, the two performance plots appear quite different. This apparent difference is explained by the error relation (841).

Interpretation of Errors as Measured in Different Coordinates
In example 5 the Jacobian was J ≈ c⁻¹ · I, so that dist_SS²(Ŷ, Y_0) ≈ c² δθ^t δθ. This fact gave the error scatter plot the appearance of a straight line (see figure 22). In the present example the eigenvalues of J (843) have a large spread, so the relation between the two error measures is no longer so simple. Two parameter error vectors, say δθ_1 and

A.1 Coordinate Representations
The notion of coordinates plays an important role in manifold theory [1]. For the case of a sphere embedded in 3-dimensional Euclidean space, two different coordinate systems were discussed in chapter 1. The first, called extrinsic coordinates, represents points on the sphere by their (x, y, z) Cartesian coordinates in the ambient 3-D space.
The second set, called normal coordinates, was defined in terms of the tangent space at a given point and the exponential (geodesic) map (see section 2.1), giving the transformation from the (u, v) polar normal coordinates to (x, y, z). In this section we consider coordinates for the Basis manifold B that are analogous to the extrinsic and normal coordinates defined for the sphere example. Normal coordinates are a special type of intrinsic coordinates in which the coordinate lines are themselves geodesics generated by the exponential map. The more abstract coordinate-free representation of points on manifolds was not utilized in the main text, where coordinates were used at the outset. In this section we proceed more formally and introduce the N × p matrix representation of points as the result of a coordinate mapping. This coordinate function, denoted y, is defined as a mapping from St(N,p) to C^{N×p}. The coordinates of the abstract point 𝒴 ∈ St(N,p) are then represented as Y = y(𝒴), where Y is an N × p complex matrix. This coordinate mapping is invertible (on its image), so that 𝒴 = y⁻¹(Y). The representations Y, which reference the ambient space, are called extrinsic coordinates on B and are distinguished by the fact that the number of components in the coordinate tuple is greater than the dimension of the manifold (this is analogous to using the 3 (x, y, z) Cartesian coordinates to specify a point on a 2-dimensional sphere).
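The sphere's exponential map gives a concrete picture of normal versus extrinsic coordinates. The sketch below maps (u, v) normal coordinates at the north pole to (x, y, z); this is the standard form of the sphere exponential map, and the chapter 1 formula may differ in convention:

```python
import numpy as np

def sphere_exp(u, v):
    """Exponential (geodesic) map at the north pole of the unit sphere:
    (u, v) normal coordinates -> (x, y, z) extrinsic coordinates."""
    r = np.hypot(u, v)                      # geodesic radius in the tangent plane
    if r < 1e-12:
        return np.array([0.0, 0.0, 1.0])    # origin of the coordinate system
    return np.array([np.sin(r) * u / r, np.sin(r) * v / r, np.cos(r)])

pt = sphere_exp(0.3, 0.4)                    # normal-coordinate radius r = 0.5
geo_dist = np.arccos(np.clip(pt[2], -1.0, 1.0))  # geodesic distance from the pole
```

The image lies on the unit sphere, and the geodesic distance from the pole equals the Euclidean length of the normal-coordinate pair, which is the defining property of normal coordinates.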
The number of real values required to specify the complex N × p matrix Y is 2Np; this compares with the dimension of St(N,p), which is 2Np − p². As was the case with the sphere, the Riemannian normal coordinates are a local coordinate system in a neighborhood of a point Y_0, obtained by applying the exponential map to the tangent space at Y_0. Through the exponential map, any choice of basis for T_{Y_0}B yields coordinates on B near Y_0. These coordinates are formed as follows. Let ∆ ∈ T_{Y_0}B denote the tangent vector between Y_0 (thought of as the origin of the coordinate system) and an arbitrary 𝒴 ∈ B, that is, ∆ = exp⁻¹_{Y_0} 𝒴. Given an arbitrary, abstract, partitioned orthonormal basis, a_k, b_k, for T_{Y_0}B, any such tangent vector ∆ is specified, for a fixed orthonormal basis, by the pair (α, β).
We will sometimes write ∆(α, β) to denote the vector whose components are (α, β) when the basis is understood.
Define the invertible mapping X which, for the given basis, assigns to the tangent vector ∆ its components. Since the tangent space for the problem of interest is naturally partitioned, this mapping has the form X(∆) = (α, β) (see (A.7)). This mapping is invertible (A.10), and the inverse map constructs the tangent vector ∆ = X⁻¹(α, β) from the pair (α, β). The normal coordinate function, denoted ϕ, is then defined by the composite mapping ϕ = X ∘ exp⁻¹_{Y_0}. This function assigns to each point 𝒴 ∈ B the real tuples (α, β) ∈ R^{N_a} × R^{N_b} (A.14). The composite coordinate function ϕ first computes ∆ = exp⁻¹_{Y_0} 𝒴, the tangent vector from Y_0 to the point 𝒴. The function X then represents ∆ with respect to an arbitrary fixed basis for the tangent space; the coefficients of ∆ with respect to this basis are (α, β). The real tuples (α, β) are said to be the Riemannian normal coordinates of 𝒴.
The coordinate mapping ϕ is invertible (A.15), and the point in B corresponding to the coordinates (α, β) is given by the inverse coordinate map. The exponential mapping at the point Y_0 is a mapping from the tangent space to the manifold; noting (A.11), the inverse coordinate mapping is ϕ⁻¹ = exp_{Y_0} ∘ X⁻¹. Normal coordinates are not unique, since the choice of the orthonormal basis (A.7) used to represent the tangent vector was not unique. Note that the total number of normal coordinate components is equal to the dimension of the manifold (i.e., N_b = 2p(N − p) and N_a = p², so that N_a + N_b = 2Np − p², which is equal to the dimension of St(N,p)).
In a normal coordinate system, the Christoffel symbols (see section A.4.1) of the connection vanish at the point Y 0 , thus often simplifying local calculations. In such coordinates, the covariant derivative reduces to a partial derivative (at Y 0 only), and the geodesics through Y 0 are locally linear functions. This feature comes into play when we later compute the 2nd covariant derivative of the likelihood function.

A.1.3 Coordinate Transformations: Normal Coordinates and Extrinsic Coordinates
The transformation between these normal coordinates and the extrinsic coordinates is given by the composite map y ∘ ϕ⁻¹ and, using (A.11), may be written explicitly (see figure 3).

A.2 Curves on Manifolds
A curve on B is defined by a smooth map from the reals R to B, c: R → B. This curve is expressed in extrinsic coordinates via the composite mapping y ∘ c. When the curve c is understood, this composite mapping is often written Y(t) = y(c(t)). This same curve may be represented in normal coordinates by ϕ ∘ c; similarly, when the curve c is understood, this composite mapping is written (α(t), β(t)) = ϕ(t) = ϕ(c(t)) (A.26). The transformation from the normal coordinate representation of c(t) to the extrinsic coordinate representation is given by y ∘ ϕ⁻¹. This corresponds to the computational form in which the matrices A(t) and B(t) are defined with respect to the basis vectors {a_k = Y_0 A_k} and {b_k = Y_0⊥ B_k}.

A.2.1 Geodesic Curves
A geodesic curve emanating from the point Y_0 in the fixed direction ∆ is given by c(t) = exp_{Y_0}(t∆). Representing the tangent vector ∆(t) ≡ t∆ with respect to an orthonormal basis, the corresponding normal coordinate representation of the geodesic curve c(t) is the straight line in R^{N_a} × R^{N_b}, X(∆(t)) = (α(t), β(t)) = (tα, tβ).
The corresponding extrinsic coordinate representation of the geodesic curve c(t) is y(c(t)), which follows from (A.29), (A.30) and (A.33).
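A concrete way to compute such a geodesic numerically is the matrix exponential form for the canonical metric on the complex Stiefel manifold, Y(t) = [Y 0 , Y 0⊥ ] expm(t [A, −B H ; B, 0]) I N,p . This specific closed form is an assumption imported from the Riemannian optimization literature (Edelman, Arias and Smith), not an equation of this appendix; the sketch below, with arbitrary illustrative values, checks that the curve stays semi-unitary and has the prescribed tangent.

```python
import numpy as np
from scipy.linalg import expm, null_space

rng = np.random.default_rng(0)
N, p = 5, 2

# Reference point Y0 (semi-unitary, Y0^H Y0 = I_p) and its orthogonal complement.
Y0, _ = np.linalg.qr(rng.standard_normal((N, p)) + 1j * rng.standard_normal((N, p)))
Y0perp = null_space(Y0.conj().T)          # N x (N-p), with Y0^H Y0perp = 0

# Tangent vector Delta = Y0 A + Y0perp B with A anti-Hermitian, B arbitrary complex.
M = rng.standard_normal((p, p)) + 1j * rng.standard_normal((p, p))
A = (M - M.conj().T) / 2                  # anti-Hermitian part
B = rng.standard_normal((N - p, p)) + 1j * rng.standard_normal((N - p, p))

def geodesic(t):
    """Geodesic Y(t) through Y0 with tangent Y0 A + Y0perp B (canonical metric)."""
    Q = np.hstack([Y0, Y0perp])
    G = np.block([[A, -B.conj().T],
                  [B, np.zeros((N - p, N - p), dtype=complex)]])
    return Q @ expm(t * G) @ np.eye(N, p)

Yt = geodesic(0.7)
# The geodesic stays on the manifold: Y(t) remains semi-unitary.
semi_unitary_err = np.linalg.norm(Yt.conj().T @ Yt - np.eye(p))
# Its velocity at t = 0 matches Delta (finite-difference check).
h = 1e-6
tangent_err = np.linalg.norm((geodesic(h) - geodesic(-h)) / (2 * h)
                             - (Y0 @ A + Y0perp @ B))
```

The two checks mirror the defining properties used above: the curve remains on the manifold, and its initial velocity is the prescribed tangent ∆.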

Geodesic Coordinate Curves
A geodesic coordinate curve corresponding to the α k coordinate is given in normal coordinates by (t · i k , 0 N b ), where i k is an N a × 1 unit vector with 1 in the k-th location and zeros elsewhere. If the unit tangent is represented as a k = Y 0 A k , then the geodesic (A.35) is represented in extrinsic coordinates accordingly. Likewise, a geodesic coordinate curve corresponding to the β k coordinate is given in normal coordinates by (0 Na , t · i k ). If the unit tangent is represented as b k = Y 0⊥ B k , then the geodesic coordinate curve is likewise represented in extrinsic coordinates. For later reference, the expanded form of this curve and its first order approximation, together with the corresponding ones for the α geodesic coordinate curve set, will be important when we compute the covariant 2nd differential of the likelihood function defined on B.
Referring to (A.37)–(A.39), the extrinsic representations of the tangents to these geodesic coordinate curves follow; the right hand sides of both expressions are N × p complex matrices.

A.2.2 Functions on Manifolds
A real function on B is the mapping L: B → R, and the value of the function at Y ∈ B is denoted L(Y). The extrinsic coordinate representation of L is denoted L y and is defined by L y ≡ L ∘ y −1 . The value of the function at Y is L y (Y ), and this equals L(Y) for Y = y −1 (Y ). Likewise, in the (α, β) normal coordinates, the representation of L is denoted L ϕ and is defined by L ϕ ≡ L ∘ ϕ −1 . The value of the function at (α, β) is L ϕ (α, β), and this equals L(Y) for Y = ϕ −1 (α, β). The functional forms L ϕ (α, β) and L y (Y ) are related by composition through the coordinate transformation. In later sections we will denote L ϕ , L y and L by the same symbol "L", since the meaning will generally be clear from the choice of arguments; that is, L(Y ) when Y = y(Y) and L(α, β) when (α, β) = ϕ(Y). Henceforth we deal with the extrinsic coordinate representation Y = y(Y).

A.2.3 Coordinate Curves
Let c (k) (t) denote the geodesic coordinate curve emanating from Y 0 in the a k direction. The normal coordinate representation of this curve is (t · i k , 0). Therefore, we can compute the directional derivative in the direction a k by differentiating L along this curve. For computational purposes this can be written in terms of the extrinsic coordinates, using the notation established above.

A.2.4 Calculus on Manifolds : Partial Derivative
Let x be a coordinate system on the manifold M; then L x ≡ L ∘ x −1 is a function defined on R N . The standard practice of using the same symbol for both the independent variable and a function is used here. In elementary calculus, the partial derivative of L(x) with respect to x k is defined as the limit of a difference quotient. Equivalently, it is the derivative of L along the coordinate curve x + t · e k , where e k is a q × 1 unit vector with 1 at the k-th location.
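The coordinate curve definition of the partial derivative translates directly into a finite difference sketch (illustrative function and point; the central difference step h is arbitrary):

```python
import numpy as np

# A coordinate representation L_x = L o x^{-1} of a function on R^2 (illustrative).
def L(x):
    return np.sin(x[0]) * x[1] ** 2 + x[0] * x[1]

def partial(L, x, k, h=1e-6):
    """Partial derivative via the coordinate curve c(t) = x + t*e_k (central difference)."""
    e = np.zeros_like(x)
    e[k] = 1.0
    return (L(x + h * e) - L(x - h * e)) / (2 * h)

x0 = np.array([0.3, -1.2])
# Analytic partials of the illustrative L, for comparison.
dLdx0 = np.cos(x0[0]) * x0[1] ** 2 + x0[1]
dLdx1 = 2 * np.sin(x0[0]) * x0[1] + x0[0]
```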
Applied to the coordinate function Y = Y 1 , · · · , Y M we have

A.3 Elements of Differential Geometry

A.3.1 Natural Coordinate Basis
Let x be a coordinate system with origin at the point Y, that is x(Y) = (0, · · · , 0). The k-th coordinate curve through Y is x −1 (0, · · · , t, · · · , 0) (that is, t in the k-th location and zeros elsewhere). The set of tangents to these curves defines a basis for T Y B termed a coordinate basis. Such coordinate basis vectors are often denoted using the notation ∂/∂x k rather than e k .
If x́ is an alternate coordinate system then we write é k ↔ ∂/∂x́ k . In the previous section we introduced the notation ∂/∂x k to stand for e k when referring to coordinate basis vectors. Similarly, here dx k stands for e k , that is e k ↔ dx k (A.72). Using this notation, the relation (A.71) becomes ω = w k dx k for some set of coefficients w k ; for a given ω and basis these components are w k = ω(e k ). An alternate representation of the tensor F (A.78) is obtained by applying the change of basis (A.82); noting (A.79) we have F́ ij = T k i T l j F kl (A.83), or in matrix form F́ = T t F T (A.84). This equation represents the change of basis for a (0,2) tensor (covariant 2-tensor) (see [2, eqn 28]).
The expression for a quadratic form is as follows. Let b = e i β i ; then F (b, b) = β i β j F ij . Expressing the summation on the right hand side in matrix form, the result is F (b, b) = β t F β. Using (A.84) and (A.87), the quadratic form in the alternate basis takes the same value. As noted in [3], "...the tensor product of a tangent vector with itself is equivalent to the outer product given a particular choice of coordinates." For the tangent vector ∆ = e k β k the tensor product is ∆ ⊗ ∆. The components of the tensor C ≡ ∆ ⊗ ∆ are C ij = β i β j (A.90), or in matrix form C = ββ t .
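The change of basis rule for a covariant 2-tensor, the basis independence of the quadratic form, and the outer product representation can all be verified numerically (random illustrative components):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4

F = rng.standard_normal((n, n))          # components of a covariant 2-tensor in basis e_k
T = rng.standard_normal((n, n))          # change of basis: e'_j = e_i T^i_j (invertible a.s.)

F_alt = T.T @ F @ T                      # tensor components in the alternate basis

# The scalar F(b, b) is basis independent: if b has components beta in e_k,
# its components in the alternate basis are beta' = T^{-1} beta.
beta = rng.standard_normal(n)
beta_alt = np.linalg.solve(T, beta)
q1 = beta @ F @ beta
q2 = beta_alt @ F_alt @ beta_alt

# Tensor product of a tangent vector with itself: components beta^i beta^j.
C = np.outer(beta, beta)
```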

A.3.3 Tensor Fields
Vectors defined at all points on a manifold are called vector fields. "Intuitively, a vector field is best visualized as an "arrow" attached to each point of a region, with variable length and direction. One example of a vector field on a curved space is a weather map showing horizontal wind velocity at each point of the Earth's surface." Tensor fields are defined in an analogous way. The 9 element stress tensor appearing in fluid and solid mechanics is a standard example (1 normal and 2 shear stresses on each of the 3 faces of a cube).
Generic tensor fields are denoted here as T, and the value of the field at a particular point P is T(P). The most relevant types for our discussion are listed below.

A.3.4 Differentiation on Manifolds
Quoting from [3], "Differentiating real-valued functions defined on manifolds is straightforward because we may subtract the value of the function at one point on the manifold from its value at a different point. Differentiating first derivatives again or tangent vectors, however, is not well defined (without specifying additional information) because a vector tangent at one point is not (necessarily) tangent at another. Subtraction between two different vector spaces is not intrinsically defined. The additional structure required to differentiate vectors is called the affine connection, so-called because it allows one to connect different tangent spaces and compare objects defined separately at each point". The connection defines and is defined by the geodesics. The 2nd covariant differential computation described below relies on the geodesic definition.
1st Covariant Derivative The covariant differential of a function L along a tangent vector v is denoted ∇ v L and simply represents the standard directional derivative, that is the derivative of L(t) ≡ L(x(t)) at t = 0, where x(t) is an arbitrary smooth curve with tangent v. On the curve we have L(x(t)) = L(x 1 (t), x 2 (t), · · · , x N (t)) and the chain rule applies. The directional derivative of the real valued function L in the direction of the coordinate tangent vector e i is denoted ∇ e i L; for convenience we use the notation ∇ i = ∇ e i when the basis is understood. Define ∇L to be the linear operator that gives ∇ v L when it acts on v. The covariant differential of a function is an element of the cotangent space T * Y B, that is ∇L ∈ T * Y B, and is represented in coordinates as ∇L = (∇ i L) dx i .

A.4 Covariant Derivative
The covariant derivative generalizes the notion of a directional derivative in an invariant, coordinate free way.

Definition Covariant Derivative:
The "covariant derivative", ∇ v T, of a tensor field T along the arbitrary curve P(t) is defined as the derivative of the parallel translated field, where T(t) = T(P(t)) and v is given by (A.97).
The "parallel translated to 0" subscript stands for the operation of parallel translating the tensor T at P(t) to the point P(0). Once referenced to the same vector space, that is the one at P(0), the subtraction operation is well defined. The geodesic gives meaning to the parallel translation operation.
As before, define ∇T to be the linear operator that gives ∇ v T when it acts on v; for a given basis e k this takes a component form. Let T be a tangent vector ((1,0)-tensor) field (A.92). Then, using the Leibniz rule, the derivative splits into two terms. The factor ∇ j T i represents the derivative of a set of scalar functions. The second term in (A.108) is the derivative of a vector (covariant basis, e i ) field. Here "parallel translated to 0" denotes the operation of bringing the vector in T P(t) M to the vector space T P(0) M, enabling the subtraction operation in the same vector space.
This translation in the direction v = e i v i is given by the connection coefficients. The factor in parentheses on the right hand side represents the matrix components of the covariant derivative, and is denoted variously as T k ;i or ∇ i T k . Geodesics parallel translate their own tangent vectors, so that in a normal coordinate system we have Γ m ij (0) ≡ 0. If x is a set of normal coordinates, then the covariant derivative reduces to the partial derivative at the origin. The covariant derivative of an element of the cotangent space is defined in the same way.
In particular, we have (A.117), so that the covariant derivative of the (0, 1) tensor (A.93) follows. These are the components of the covariant (0,2) tensor (A.94). Going forward, we use the notation [∇ 2 L] to denote the component matrix of the Hessian tensor when the coordinates are understood.

A.4.1 2nd Covariant Differential of a Function
"For arbitrary manifolds, the Hessian is the second derivative along geodesics. In differential geometry it is the second covariant differential of L". The Hessian of a function L is defined as the quadratic form ∇ 2 L[∆, ∆] = d 2 L(c(t))/dt 2 | t=0 , where c(t) is the geodesic emanating in the direction of ∆ (i.e., ċ(0) = ∆) [4, equation 2.54].
Let x be a normal coordinate system with origin at the point P. In such a coordinate system the Christoffel symbols vanish at P, so that the diagonal Hessian components reduce to ordinary second derivatives along the coordinate geodesics. As with any bilinear form, the off-diagonal terms may be computed using the standard process of polarization: 4 ∇ 2 L[e i , e j ] = ∇ 2 L[e i + e j , e i + e j ] − ∇ 2 L[e i − e j , e i − e j ]. The first term on the right hand side is d 2 L(φ(t))/dt 2 | t=0 , where φ(t) is the geodesic emanating in the direction e i + e j . The second term is similarly computed, with φ(t) the geodesic emanating in the direction e i − e j .

Example: Cartesian Coordinates on the Plane
The usual Cartesian coordinate lines are geodesics on the flat plane. The components of the second covariant derivative ∇ 2 L are ∇ 2 L[e x , e x ] = d 2 L(ϕ x (t))/dt 2 | t=0 , where ϕ x (t) is the geodesic emanating in the direction e x , ϕ x (t) ≡ (x + t, y).
To compute the polarization form (A.125) we require the geodesic curves in the e x + e y and e x − e y directions. Denoting the geodesic in the first direction as ϕ x+y (t) ≡ (x + t, y + t), we have ∇ 2 L[e x + e y , e x + e y ] = d 2 L(ϕ x+y (t))/dt 2 | t=0 (A.131), and similarly for the difference direction, where ϕ x−y (t) denotes the geodesic tangent to the vector e x − e y . The standard Hessian matrix is defined in (A.133).
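On the flat plane this recipe is easy to test numerically; the sketch below compares the second differences along coordinate geodesics, and the polarization off-diagonal, against the analytic Hessian of an arbitrary illustrative function:

```python
import numpy as np

def L(x, y):
    return x ** 2 * y + np.cos(x * y)    # illustrative function on the plane

def second_deriv_along(L, x, y, ux, uy, h=1e-4):
    """d^2/dt^2 L(x + t*ux, y + t*uy) at t = 0: second derivative along a
    straight-line geodesic of the flat plane (central second difference)."""
    return (L(x + h * ux, y + h * uy) - 2 * L(x, y) + L(x - h * ux, y - h * uy)) / h ** 2

x0, y0 = 0.4, -0.8
Hxx = second_deriv_along(L, x0, y0, 1, 0)
Hyy = second_deriv_along(L, x0, y0, 0, 1)
# Off-diagonal by polarization: 4 Hxy = D^2[ex+ey] - D^2[ex-ey].
Hxy = (second_deriv_along(L, x0, y0, 1, 1) - second_deriv_along(L, x0, y0, 1, -1)) / 4

# Analytic Hessian of the illustrative L, for comparison.
Hxx_true = 2 * y0 - y0 ** 2 * np.cos(x0 * y0)
Hxy_true = 2 * x0 - np.sin(x0 * y0) - x0 * y0 * np.cos(x0 * y0)
Hyy_true = -x0 ** 2 * np.cos(x0 * y0)
```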

A.5 Computing Derivatives on B
In this section we restate the derivative definitions for functions defined on the B manifold using the notation just established. We recall that tangent spaces for this formulation naturally partition, and we denote the partitioned basis accordingly. The geodesic coordinate curve emanating in the b k direction is given in extrinsic coordinates by (A.40). The components of the first differential with respect to this basis follow directly. The components of the Hessian are computed using (A.122); the diagonal elements corresponding to the b k basis vectors, for example, are second derivatives along the curves Y (k) (t). To compute these quantities, substitute the expanded form of Y (k) (t), given by (A.39), into the given function and compute; the resulting Taylor series expansion has the form (A.137). The cross terms of the full Hessian are found using polarization (A.125); their computation requires vectors of the form ∆ = a i + b j , which have an explicit extrinsic form.

A.6 Inner Product Space
A vector space is a collection V of objects called vectors, which may be added together and multiplied ("scaled") by numbers, called scalars in this context. The set of scalars is denoted F; these are often taken to be real numbers, but there are also vector spaces with scalar multiplication by complex numbers, rational numbers, or generally any field. The operations of vector addition and scalar multiplication must satisfy certain requirements, called axioms. The vector space is then the pair (V, F). When the scalar field F is the real numbers R, the vector space is called a real vector space. When the scalar field is the complex numbers C, it is called a complex vector space. These two cases are the ones used most often in engineering.
An inner product space is a vector space with an additional structure called an inner product. This additional structure associates each pair of vectors in the space with a scalar quantity known as the inner product of the vectors. Inner products allow the rigorous introduction of intuitive geometrical notions such as the length of a vector, the angle between two vectors, and orthogonality between vectors (zero inner product).
Formally, an inner product space is a vector space (V, F) with an inner product given by the mapping that satisfies certain axioms.
A.6.1 2 by 1 complex vectors over the Reals (C 2 , R)

Let V be the collection of all complex n-tuples with addition defined in the usual "vector" way. We can define this as a vector space over either the real or the complex field.
We first consider the real case, F = R. For n = 2 a set of basis vectors for this space (real dimension = 4) is e 1 = (1, 0), e 2 = (0, 1), e 3 = (i, 0), e 4 = (0, i). Note that e 1 and e 3 are linearly independent since e 1 ≠ s · e 3 for any s ∈ F ≡ R. The standard inner product (A.140) for this space is v, w = Re(v H w). The idea of orthogonality depends on the choice of inner product; the basis defined above is an orthonormal one since Re(E H E) = I.
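A short numerical check of these statements; the explicit basis ordering below (e 1 = (1, 0), e 2 = (0, 1), e 3 = (i, 0), e 4 = (0, i)) is one natural choice consistent with the remarks above:

```python
import numpy as np

# Basis for (C^2, R), real dimension 4: columns e1=(1,0), e2=(0,1), e3=(i,0), e4=(0,i).
E = np.array([[1, 0, 1j, 0],
              [0, 1, 0, 1j]])

# Orthonormality w.r.t. the real inner product <v, w> = Re(v^H w): Re(E^H E) = I_4.
G = np.real(E.conj().T @ E)

# e1 and e3 are independent over R (no real s gives e1 = s*e3), but over C, e3 = i*e1.
e1, e3 = E[:, 0], E[:, 2]
real_rank = np.linalg.matrix_rank(np.vstack([np.concatenate([e1.real, e1.imag]),
                                             np.concatenate([e3.real, e3.imag])]))
complex_rank = np.linalg.matrix_rank(np.column_stack([e1, e3]))
```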
For an arbitrary basis e k , k = 1, · · · , 4, the components of v with respect to this basis are computed in the usual way.

A.6.2 2 by 1 complex vectors over the Complex (C 2 , C)

We now consider the same set V with scalar field given by the set of complex numbers, that is F ≡ C.
A.8 Subspaces of (C N , R) and (C N , C)

In this section we consider subsets of C N , N even, which are particular vector subspaces of (C N , R). Let V be a subspace of (C N , R) of dimension 2q ≤ N defined by the span of the set E = [e 1 , · · · , e q , i · e 1 , · · · , i · e q ], where [e 1 , · · · , e q ] is a linearly independent set. This same set V may be represented as a q dimensional subspace of (C N , C). In this formulation V is represented by the span of the basis Ē, where Ē is related to the basis E appearing in the real formulation by (A.161).
An arbitrary vector v ∈ V can be expressed in the (C N , R) formulation as v = E v. On the right hand side of this expression the real 2q-tuple v has been partitioned into two q-tuples as v = (v 1 ; v 2 ). Carrying out the multiplication yields v = Ē(v 1 + i · v 2 ). Thus, in the complex formulation v = Ē · v̄. As an example, let V ⊂ C 4 be defined by the span of four vectors of the above form in (C 4 , R). To verify the dimension claim, we need to show that an arbitrary v ∈ V can be expressed as a linear combination of e 1 and e 2 when the scalar field is complex. Choosing complex scalar components v̄ k = v 1;k + i · v 2;k accomplishes this, confirming that V is a 2-dimensional subspace of the 4-dimensional space (C 4 , C).
This relation between the dimensions of the real and complex vector space representations does not hold for arbitrarily defined subspaces. To see this, consider a subset W ⊂ C 4 defined by the span of a set of vectors not paired as above. By construction W is a vector subspace of (C 4 , R). Unlike the previous case (A.167), this set of basis vectors is also linearly independent in the complex vector space formulation.
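The dimension bookkeeping can be checked numerically. The specific vectors below are hypothetical stand-ins (the original example vectors are elided), chosen only to exhibit the two behaviors: a real span that halves in complex dimension, and one that does not:

```python
import numpy as np

def real_dim(vectors):
    """Real dimension of span_R of complex vectors: rank of stacked (Re, Im) parts."""
    M = np.array([np.concatenate([v.real, v.imag]) for v in vectors])
    return np.linalg.matrix_rank(M)

def complex_dim(vectors):
    """Complex dimension of span_C of the same vectors."""
    return np.linalg.matrix_rank(np.column_stack(vectors))

e1 = np.array([1, 0, 2, 0], dtype=complex)   # hypothetical illustrative vectors
e2 = np.array([0, 1, 0, -1], dtype=complex)

# V: spanned over R by {e1, i*e1, e2, i*e2} -- real dim 4, complex dim 2 (halved).
V = [e1, 1j * e1, e2, 1j * e2]

# W: spanned over R by vectors that stay C-independent -- real dim 2, complex dim 2.
W = [e1, np.array([1j, 0, 0, 1], dtype=complex)]
```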
A.9 Transforming a Basis for (W, R)

Let w k and v k be two basis sets for (W, R) and suppose that they are related by a linear transformation (A.175). Using the matrix forms E v ≡ [v 1 · · · v n ] and E w ≡ [w 1 · · · w n ] (A.176), we have E v = E w T for a real coefficient matrix T. Let x ∈ (W, R) have the representations x = E v β v = E w β w with respect to the basis sets v k and w k . The transformation of the basis vectors (A.177) defines a transformation of the components; using (A.177) we have β w = T β v .

A.10 Orthogonalizing a Basis
A basis v i is orthonormal with respect to the inner product if v i , v j = δ ij . Using (A.175) this relation is expressed in matrix form as Re(E H v E v ) = I. Given a non-orthogonal basis w k , the real matrix J that orthogonalizes this set is the real symmetric matrix J = Re(E H w E w ) −1/2 . To verify this, let E v = E w J; then Re(E H v E v ) = J Re(E H w E w ) J = I. Note that Re(E H w (:, i) E w (:, j)) = w i , w j , so that the matrix form (A.187) holds.

A.11 Specifics of the Tangent Space Problem

A.11.1 N by p complex matrices over the Reals

The set of N × p complex matrices, denoted C N ×p , along with the usual matrix addition and scalar multiplication operations defines a vector space. Typically, this space is considered as a vector space with the scalar field as the set of complex numbers, F = C.
Instead, here we consider the scalar field to be the set of reals, F = R. The vector space (C N ×p , R) has real dimension 2N p. The standard inner product on this space is v, w = Re(tr(V H W )), where V and W are the N × p complex matrices corresponding to the symbols v and w.
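Both the inner product just stated and the orthogonalizer of section A.10 can be exercised numerically on matrix-valued "vectors". The sketch assumes J = Re(E H w E w ) −1/2 is realized through an eigendecomposition, one standard way to form a symmetric inverse square root:

```python
import numpy as np

rng = np.random.default_rng(2)
N, p, n = 4, 2, 3

def ip(V, W):
    """Standard real inner product on (C^{N x p}, R): <v, w> = Re tr(V^H W)."""
    return np.real(np.trace(V.conj().T @ W))

# A non-orthonormal set of n "vectors" (N x p complex matrices).
Ws = [rng.standard_normal((N, p)) + 1j * rng.standard_normal((N, p)) for _ in range(n)]

# Real Gram matrix and the symmetric orthogonalizer J = G^{-1/2} of section A.10.
G = np.array([[ip(Wi, Wj) for Wj in Ws] for Wi in Ws])
w, U = np.linalg.eigh(G)
J = U @ np.diag(w ** -0.5) @ U.T

# Orthogonalized set v_j = sum_i w_i J_ij; the Gram matrix of the v's is the identity.
Vs = [sum(J[i, j] * Ws[i] for i in range(n)) for j in range(n)]
G_v = np.array([[ip(Vi, Vj) for Vj in Vs] for Vi in Vs])
```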
The standard basis for this space is given by the set e k = reshape(e k , N, p), k = 1, · · · , 2N p (A.189)–(A.190), where e k on the right hand side is a unit 2N p-tuple with 1 at the k-th location.
An arbitrary complex matrix w ∈ C N ×p is represented with respect to this basis as w = e k w k . The SVD of an arbitrary matrix X ∈ C N ×p may be expressed in the general partitioned form (A.194). In this expression Y and Y ⊥ are full rank semi-unitary matrices of size N × p and N × (N − p), respectively, and Y H Y ⊥ = 0. Expanding the above yields X = Y C + Y ⊥ B (A.195), where C is complex (p × p), given by C = DV H , and B is complex ((N − p) × p), given by B = D ⊥ V H . Expressing the (p × p) matrix C as the sum of Hermitian (S) and anti-Hermitian (A) components, C = A + S, yields X = Y A + Y S + Y ⊥ B (A.196). This partitions the space into three mutually orthogonal subspaces, since the three terms are pairwise orthogonal under (A.188) for any S Hermitian and A anti-Hermitian (A.197).
The three spaces, defined by collections of the form Y A, Y S and Y ⊥ B, are denoted A Y , N Y and B Y , respectively. The space (C N ×p , R) is then decomposed as the direct sum A Y ⊕ N Y ⊕ B Y (A.198). In particular, the B Y space is the collection of all N × p matrices of the form Y ⊥ B, where B is an arbitrary (N − p) × p complex matrix. This space has real dimension 2p(N − p) and is a subspace of the full space, that is B Y < C N ×p .
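The three-way split X = Y A + Y S + Y ⊥ B and its mutual orthogonality under the real inner product can be verified directly (random illustrative data; the complement Y ⊥ is obtained numerically):

```python
import numpy as np
from scipy.linalg import null_space

rng = np.random.default_rng(3)
N, p = 5, 2

Y, _ = np.linalg.qr(rng.standard_normal((N, p)) + 1j * rng.standard_normal((N, p)))
Yperp = null_space(Y.conj().T)           # N x (N-p), with Y^H Yperp = 0

X = rng.standard_normal((N, p)) + 1j * rng.standard_normal((N, p))

# X = Y C + Yperp B; split C into anti-Hermitian (A) and Hermitian (S) parts.
C = Y.conj().T @ X
B = Yperp.conj().T @ X
A = (C - C.conj().T) / 2
S = (C + C.conj().T) / 2

XA, XS, XB = Y @ A, Y @ S, Yperp @ B     # components in A_Y, N_Y and B_Y

def ip(V, W):
    """Real inner product on (C^{N x p}, R): <v, w> = Re tr(V^H W)."""
    return np.real(np.trace(V.conj().T @ W))

recon_err = np.linalg.norm(XA + XS + XB - X)
```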

A.11.3 Algebra of the B Y space
The inner product on B Y is defined by the restriction of the inner product (A.188) defined on C N ×p ; it is denoted ·, · B Y (A.199), where P B is the orthogonal projection P B : C N ×p → B Y . Let b k , k = 1, · · · , N b , be an orthonormal basis for B Y . Then any ∆ ∈ B Y expands as ∆ = b k β k . The components of ∆ with respect to this orthonormal basis are computed in the usual way, which yields the representation ∆ = Y ⊥ B. Vectorizing both sides of this sum yields the alternate form b = E b β, where b = vec(B) and E b = [vec(B 1 ) · · · vec(B N b )]. Using this expression, (A.203) is expressed in matrix form, where each matrix B k has 1 or i in the k-th location and zeros elsewhere.
For the case N = 5, p = 2, we have N b = 12 and the standard basis is defined by the set (A.208). In this case, the matrix E b appearing in (A.206) is E b = [I 6 , i · I 6 ]. Then, with respect to the standard basis, (A.207) evaluates to b = β R + i · β I . The inverse relation (A.205) evaluates to β = (β R ; β I ) = (Re(b); Im(b)), where β R and β I represent the first N b /2 and last N b /2 elements of β.
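For the N = 5, p = 2 example, the standard basis and the identification b = β R + i · β I can be constructed explicitly (column-major vec convention assumed):

```python
import numpy as np

N, p = 5, 2
m = p * (N - p)                          # 6; real dimension of B_Y is N_b = 2m = 12

def basis_matrix(k):
    """Standard basis B_k: 1 (k < m) or i (k >= m) in position k mod m, column-major."""
    v = np.zeros(m, dtype=complex)
    v[k % m] = 1.0 if k < m else 1.0j
    return v.reshape((N - p, p), order='F')

E_b = np.column_stack([basis_matrix(k).flatten(order='F') for k in range(2 * m)])

# With the standard basis, E_b = [I_m, i*I_m] and b = vec(B) = beta_R + i*beta_I.
beta = np.arange(1.0, 2 * m + 1)         # an arbitrary real 12-tuple
b = E_b @ beta
beta_R, beta_I = beta[:m], beta[m:]
```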

Change of basis
Let b́ j be an alternate basis for B Y . This alternate basis expands in the standard orthonormal basis b k as b́ j = b k T k j . If these basis vectors are represented in numerical form as b́ j = Y ⊥ B́ j , then vectorizing both sides of (A.215) yields É b = E b T, where E b is given by (A.206) and É b = [vec(B́ 1 ) · · · vec(B́ N b )].
Let ∆ = Y ⊥ B be an arbitrary vector represented with respect to the b́ k basis as ∆ = b́ k β́ k . Then, referring to (A.205), b = É b β́, where b = vec(B), so that the components of ∆ transform as β́ = T −1 β. The real transformation matrix appearing in (A.213) is defined by T k j = b k , b́ j and, using the concrete form of the vectors b i and b́ j , in matrix form this is T = Re(E H b É b ).

A.12 Partitioned space and basis
In the signal processing applications encountered later, the space B Y is partitioned into two orthogonal subspaces, B Y = S Y ⊕ N Y . Let s k = Y ⊥ S k and n k = Y ⊥ N k be given orthonormal basis vectors for S Y and N Y , respectively. These basis elements are related to the basis b k by (A.223). Writing (A.223) in matrix form and vectorizing both sides yields the matrix relation, where we have used (A.206) and where T ≡ [T s T n ].
The matrix T s (A.229) may be expressed in terms of the matrices E b and E s as T s = Re(E H b E s ); similarly, T n = Re(E H b E n ), where for convenience we have defined E n analogously. The matrix representations of the orthogonal projectors P S : B Y → S Y and P N : B Y → N Y with respect to this set of basis vectors are given in terms of the matrices X s and X n . Note that if the s k , n k and b k are all orthonormal sets, then (T t s T s ) = I q and (T t n T n ) = I N b −q , and hence X s = T s and X n = T n .

A.12.1 Tensor Transform in partitioned frame
In this section, and throughout the remainder, the subscripts β, α, s and n are not indices, but are used to denote sub-matrices of a larger matrix as shown below. We use i, j, k, l, m as indices unless otherwise noted.
Let L be a covariant 2-tensor (see section A.3.2); that is, L is a bilinear function defined on B Y × B Y . Denote the coefficient matrix with respect to the basis b i for B Y as L ββ . Recall that B Y = S Y ⊕ N Y ; applying the transformation (A.84) to the partitioned frame gives the block matrix T t L ββ T. The quadratic form β t L ββ β is then expressed in terms of the partitioned basis as (s; n) t (T t L ββ T)(s; n), where we have used (A.236).
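The partitioned-frame transformation of a covariant 2-tensor, and the invariance of the associated quadratic form, can be checked with random illustrative data:

```python
import numpy as np

rng = np.random.default_rng(4)
N_b, q_s = 12, 4                         # dim(B_Y) and the dimension of the S_Y part

L_bb = rng.standard_normal((N_b, N_b))
L_bb = (L_bb + L_bb.T) / 2               # symmetric coefficient matrix L_beta_beta

# Orthonormal partitioned frame: columns of T_s span S_Y, columns of T_n span N_Y.
Q, _ = np.linalg.qr(rng.standard_normal((N_b, N_b)))
T_s, T_n = Q[:, :q_s], Q[:, q_s:]
T = np.hstack([T_s, T_n])

# (0,2)-tensor change of basis to the partitioned frame: [[L_ss, L_sn], [L_ns, L_nn]].
L_part = T.T @ L_bb @ T
L_ss, L_sn = L_part[:q_s, :q_s], L_part[:q_s, q_s:]

# The scalar quadratic form is frame independent: beta = T @ [s; n].
s = rng.standard_normal(q_s)
n = rng.standard_normal(N_b - q_s)
sn = np.concatenate([s, n])
q1 = (T @ sn) @ L_bb @ (T @ sn)
q2 = sn @ L_part @ sn
```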

Special Form
If the real coefficient matrix L ββ has the special form given above, where Λ bb is an N b × N b real diagonal matrix, then the sub-matrices evaluate to the expressions shown; in these we have used E s = E b T s and E n = E b T n (A.232).
A sum of the form L s (s, n) ≡ L ss s + L sn n appears later in Newton's method. In this expression s and n are real tuples. Using (A.254) and (A.256), this sum evaluates as shown.

Alternate Complex form of Quadratic
Furthermore, since in the complex case under consideration N b = dim(B Y ) = 2p(N − p), the basis matrix has the form E 0 ≡ [I, i · I]. Using this, (A.253) simplifies. For the real N b -tuple β partitioned as β = (β R ; β I ), the quadratic form β t L ββ β may, using b = E 0 β = β R + i · β I , be expressed equivalently in terms of b.

The first term on the right hand side of the log likelihood function (342) follows directly. Using the identity det(I + A) = 1 + tr(A) + h.o.t., the log det(R + δR) term in the log likelihood function (342) is expressed as det(R + δR) = det(R) det(I + R −1 δR); taking the natural log yields log det(R + δR) = log(det(R)) + log(1 + tr(R −1 δR) + h.o.t.) (B.7). In addition, since E[R̂] = R, we have E[I − R̂R −1 ] = 0. Using these two expectations along with (B.7) yields the second desired result (408).

B.1.2 Proof of Lemma 9
The model covariance R evaluated along the geodesic curve Y (t) is obtained by substituting the geodesic in its expanded form; the result is written R + δR(A, B; t). In this form δR(A, B; t) represents the change in R along the geodesic Y (t); since Y (t) is determined by its tangent vector ∆ = Y A + Y ⊥ B and the path parameter t, we use the notation δR(A, B; t). The term δY (t) in (B.11) is given by (A.40), with a first-order term linear in t and second-order terms quadratic in t. Since differentiation and expectation commute, using the form (406), substituting (409) for δL(t), using (408), and taking derivatives yields the stated expectation. For ∆ = Y A, with the coefficient matrix A anti-Hermitian, substituting for C 1 and C 2 , the terms on the right hand side evaluate to tr(C 2 1 ) = tr(AΛΛ −1 s AΛΛ −1 s ) (B.27). Noting that tr(C 2 1 ) = tr(C 2 2 ), (B.26) simplifies accordingly. Next we evaluate the terms on the right hand side of the above. The off diagonal terms are computed via the polarization approach (B.42); expanding, the first term on the right hand side can be expressed in the form of (B.27). As shown below, each of the two terms in the sum vanishes for i ≠ j.

FIM submatrix F αα with respect to the Standard Basis for A Y

To evaluate the terms in the sum, consider the specific form of the basis vectors A k .
Let A (ij) denote the matrix with 1 at the (i, j) location, −1 at the (j, i) location, and zeros elsewhere. Make the identifications A 1 = A (12) , A 2 = A (13) , etc., as shown below.
Example: Recall that N a = p(p − 1) (see (213)), so that for p = 3 we have N a = 6.
The standard basis matrices A k , k = 1, · · · , 6, are listed below. For convenience in the calculations that follow, record the ΛA products. We see that ΛA 1 Λ −1 s A 2 = −ΛA 2 Λ −1 s A 1 , so that their sum, which appears in (B.52), vanishes. In addition, a similar relation holds for the remaining pairs, so that the second terms in the sum (B.52) also vanish.

Compute Diagonal
The second term, ΛA k Λ −1 s A k , in the sum (B.46) for both k = 1 and k = 2 is computed directly; in general, for k → ij, the product follows (B.60). The first and third terms in the sum (B.46) involve the products A k A k . In the general case S (ij) is a symmetric matrix, and collecting terms yields an expression whose right hand side we denote γ ij for convenience. Rearranging terms yields

γ ij = (λ i λ s;j − 2λ s;i λ s;j + λ j λ s;i + σ 2 (λ s;j + λ s;i )) / (λ s;i λ s;j ). (B.72)

Using λ s;i = λ i + σ 2 , this can alternatively be expressed in simplified form; squaring that expression gives the quantity needed below. Replacing B by B k and substituting (B.83) into (411) yields the diagonal elements. The off diagonal terms are computed using a polarization approach, together with the identities noted above.

Remark: Compute Cross Term F αβ

To compute the cross terms, recall (411). The polarization method is used to compute the cross terms of the bilinear form. Squaring and taking the trace yields (B.112); the term X denotes the factors involving the A-only and B-only quadratic (squared) terms.
where i m is the standard unit vector consisting of 1 in the m-th location and zeros elsewhere. This tangent vector ∆ can be represented with respect to an arbitrarily chosen orthonormal basis set; in this expression the real N a -tuple α and the real N b -tuple β are the stochastic components. We compute the terms P A ∆ · i m and P B ∆ · i m separately. Let the basis vectors have the numerical forms a k = Y A k and b k = Y ⊥ B k . For m = 1 the 6 terms in the sum (B.123) are listed below.

Then (B.123) is the sum of (B.124) to (B.129). This expression, and the corresponding ones for P A ∆ · i 2 , · · · , P A ∆ · i p , can be written in matrix form. We first develop the formulas for the case p = 2, N = 5 and then generalize the result. In this case N − p = 3 and N b = 2p(N − p) = 12. For m = 1 the P B ∆ · i m term is computed as follows. Using the standard basis, the B k matrices are such that B k i 1 = 0 for k = 4, 5, 6, 10, 11, 12 (B.136). The non-zero terms correspond to the k = 1, 2, 3 and k = 7, 8, 9 basis vectors; for k = 7, 8, 9 (with B 7 = i · B 1 , B 8 = i · B 2 and B 9 = i · B 3 ) the terms acquire a factor of i. Using the above, the sum (B.135) for m = 1 can be expressed as Y ⊥ M 1 β, where M 1 = [X 1 , i · X 1 ] and X 1 is ((N − p) × p(N − p)). For the example calculation at hand p = 2 and X 1 = [I N −p , 0]. Similarly, for m = 2 the non-zero terms in the sum correspond to the k = 4, 5, 6 and k = 10, 11, 12 basis vectors; for k = 10, 11, 12 we have Y ⊥ B n+9 i 2 = i · g n , n = 1, 2, 3 (B.149). Using the above, the sum (B.135) for m = 2 can be expressed as Y ⊥ M 2 β, where M 2 = [X 2 , i · X 2 ] and X 2 = [0, I N −p ].

p arbitrary
In the general case, for arbitrary p, the X k matrices are (N − p) × p(N − p) and are given by (B.151); it follows that the M k matrices are (N − p) × N b , N b = 2p(N − p), and are given by (B.152). The factor Y ⊥ M k is then N × N b (B.153). Using (B.151), we note that for p = 3 the X 2 matrix, for example, is X 2 = [0, I N −p , 0]. Substituting (B.131) and (B.135) into (B.121) yields the desired expression. To establish the bound, use the fact that C ⪰ F −1 (see theorem 10). Since the off diagonal sub-matrices (F αβ and F βα ) of the matrix F vanish, this becomes a sum of two terms, which for convenience is expressed as (B.161). Consider the first term on the right hand side of (B.161). For k = 1, using (B.132) and (439), together with the notation Y = [s 1 s 2 s 3 ], we obtain (B.164), with (439) supplying γ ij . Next, to evaluate the second term on the right hand side of (B.161), recall (B.174); note that the factor of 2 has cancelled. Using the notation Y ⊥ = [g 1 · · · g N −p ], this can be expressed compactly, which follows from the orthonormality of the columns.

The columns of E ś are given by E ś(:, k) = vec(Ś k ), where Ś k is given by (518). Using this, and the selection matrix identity (B.260), the factor on the right hand side can be expressed in selection matrix form. Factoring out the scalar term and applying the selection matrix identity (B.256), the result can be restated in matrix form using the Hadamard product. The columns of E ś are given by E ś(:, l 1 ) ≡ vec(Ś l 1 ); the indices on the right and left hand sides are related by (B.191). Using this expression, the product follows. Applying the selection matrix identity (B.263) and substituting into (B.193), the factor simplifies via the selection matrix identity (B.256) (B.196); back substituting yields (B.197). This can be expressed in full matrix form using Hadamard product notation (B.198). Using (B.188) to substitute for T T H above, and identifying parts, we see that the source correlation matrix P is equal to P = T ΛT H (B.200). The signal-plus-noise model covariance and its inverse then follow.
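The block structure of the X k and M k matrices is easy to generate programmatically (0-based block index k below, so the text's X 2 for p = 3 corresponds to X(1)):

```python
import numpy as np

N, p = 5, 3
m = N - p                                # N - p = 2
N_b = 2 * p * m                          # 12

def X(k):
    """X_k = [0 ... I_{N-p} ... 0]: p blocks of size (N-p) x (N-p), identity in block k."""
    blocks = [np.eye(m) if j == k else np.zeros((m, m)) for j in range(p)]
    return np.hstack(blocks)

def M(k):
    """M_k = [X_k, i*X_k], of size (N-p) x N_b."""
    return np.hstack([X(k), 1j * X(k)])
```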
Substituting this into P s = (P H H R −1 HP ) and then substituting (B.202) into the resulting expression yields the desired result (527).

B.1.8 Proof of Theorem 17
The general form of the Fśś matrix is (471). This can be expressed in component form, where the factor on the right hand side is evaluated as follows. Recall that the matrix E ś is given by (518), where j kk is the vectorized selection matrix, that is j kk = vec(J kk ). Substituting (B.212) into (B.211) and using the Kronecker product identity (B.250), the expression becomes one involving the factor ((T * Λ OPT T t ) ⊗ D H P ⊥ D) j jj , which is of the form (C t ⊗ A)b; using the identity (C t ⊗ A)b = vec(ABC), we can express the above accordingly. Note that J ii RJ jj ≡ [R] ij · J ij (see (B.260)). Substituting this into (B.220), factoring out the scalar term, and using the matrix identity (B.256) (note the transposition), then substituting into (B.210) and using the Hadamard product notation, the result is expressed in matrix form. Using the equivalence of forms lemma (527), Fśś for the modal model M ≥ 1 case follows, where P s = (P H H R −1 HP ) (see (526)).
The general form of the Fśś matrix is (471). The factor E H s (Λ OPT ⊗ I N −p )E ś is given in component form as follows. Let R be any M p × M p matrix; the product is then the p × p matrix given below, where l 1 = p(m 1 − 1) + k 1 and l 2 = p(m 2 − 1) + k 2 . In this expression R l 1 l 2 is a scalar and J k 1 k 2 is the p × p selection matrix J k 1 k 2 = e k 1 e t k 2 .
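Both matrix identities invoked in these computations, J ii R J jj = [R] ij J ij and the Kronecker identity (C t ⊗ A) vec(B) = vec(ABC), are standard and can be confirmed numerically:

```python
import numpy as np

rng = np.random.default_rng(5)
p = 4

def J(k1, k2, n):
    """Selection matrix J_{k1 k2} = e_{k1} e_{k2}^t (1 at (k1, k2), zeros elsewhere)."""
    M = np.zeros((n, n))
    M[k1, k2] = 1.0
    return M

R = rng.standard_normal((p, p)) + 1j * rng.standard_normal((p, p))

# Identity J_ii R J_jj = [R]_ij * J_ij used in the component computation.
i, j = 1, 3
lhs = J(i, i, p) @ R @ J(j, j, p)
rhs = R[i, j] * J(i, j, p)

# Kronecker identity (C^t kron A) vec(B) = vec(ABC), column-major vec.
A = rng.standard_normal((3, p))
B = rng.standard_normal((p, p))
C = rng.standard_normal((p, 2))
lhs2 = np.kron(C.T, A) @ B.flatten(order='F')
rhs2 = (A @ B @ C).flatten(order='F')
```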

List of References
[1] S. Smith, "Covariance, subspace, and intrinsic Cramér–Rao bounds," IEEE Trans. Signal Process., vol. 53, no. 5, pp. 1610–1630, May 2005.

The directional derivative in the ∆ = Y A + Y ⊥ B direction therefore is obtained by substituting for δL(t) and differentiating, which yields the desired result. To compute the directional derivative in the coordinate directions, recall (410). For notational simplicity, in the remainder of the calculation we drop the subscript on the matrix B k . The factor δR 1 (0, B)R −1 is therefore given by (B.80). Rearranging terms within the trace operator results in tr(δR 1 (0, B)R −1 ) = 0 (C.9), which follows since Y H Y ⊥ = 0.
The second term in (C.6) is a function of the data through R̂, the SCM. Reordering (C.6) within the trace operator, substituting for δR 1 (0, B), and defining C ≡ BΛΛ −1 s , this term is σ −2 tr(CY H R̂ Y ⊥ + C H Y H ⊥ R̂ Y ) (C.14) = σ −2 2Re tr(CY H R̂ Y ⊥ ). Writing R̂ in eigen decomposition partitioned form, the above becomes (C.30). In this expression, Y 1 is an N × p matrix whose columns are p arbitrarily selected eigenvectors of R̂ and Y 2 is the N × (N − p) matrix of the remaining eigenvectors. The equation is satisfied if Y is chosen as Y = Y 1 Q (C.31), for Q an arbitrary p × p unitary matrix; to verify this, substitute (C.31) into (C.30). To maximize, we choose the p eigenvectors corresponding to the largest p eigenvalues.

C.3 Proof of lemma 23
Consider the damped exponential signal waveform h(n) = z n , where z = ρ exp(iθ). Then q = 2M p and q̆ = M p. The natural coordinate basis is (518) s k = Y ⊥ S̆ k , where S̆ ρ;k = Y H ⊥ D ρ J kk T, k = 1, · · · , q̆ (C.51), and S̆ θ;k = Y H ⊥ D θ J kk T, k = 1, · · · , q̆ (C.52). The basis matrix E s has the partitioned form E s = [E s;θ , E s;ρ ] (C.53), where E s;θ ≡ [vec(S̆ θ;1 ) · · · vec(S̆ θ;q̆ )] (C.54) and E s;ρ ≡ [vec(S̆ ρ;1 ) · · · vec(S̆ ρ;q̆ )] (C.55). Since J kk selects the k-th column, using (C.48) we see that in vectorized form vec(S̆ θ;k ) = i · ρ k vec(S̆ ρ;k ) (C.56). The columns of the first set are scaled versions of the second set, and are therefore spanned by Ē s . Therefore, if Ē s is the orthogonal basis for E s;θ , that is Ē s ≡ orth(E s;θ ) (C.57), then i · Ē s is a basis for the set E s;ρ . The real vector space S Y therefore has the orthonormal basis given by E s = [Ē s , i · Ē s ] (C.58), or expressed alternatively via E 0 ≡ [I, iI].

C.4 Proof of theorem 24
Substituting (C.58) into the first derivative defined by (660) gives (C.79). The expanded form of (C.79), together with the corresponding form of (687) for Y Q, shows that if X_k(Y) = 0 then X_k(Y Q) = 0.

C.6 MLE using Reference Solution and Geometric Newton's Method
The discussion in section 4.2 made use of a reference point Y_0 on the submanifold B. Given an S-N basis at this point, we established the submanifold membership condition (641) as (655). When the constraint submanifold is defined by a parametric model, the reference point Y_0 may be specified by a reference parameter θ_0 via the singular vector matrix of H = H(θ_0). The basis vectors that define the tangent space T_{Y_0}B are derived from the differentiable signal model H(θ). The reference parameter θ_0 is typically provided by a computationally efficient suboptimal estimator (e.g., ESPRIT) or is the result of a previous iteration, as in a tracking application. Using this reference-point approach, the estimator in section 4.2 may be transformed into a geometric form of Newton's method, with the q-tuple θ interpreted as a coordinate system on B.
The formulation of the S-N basis approach leading to the solution form (655) required a basis for the space S_Ỹ. Notionally this space was defined by parallel translating the basis vectors for S_{Y_0}, where Y_0 ∈ B, to the unconstrained MLE Ỹ (in general Ỹ ∉ B). Using this, the constrained MLE Y was expressed in terms of the exponential map at Ỹ (664) and the tangent vector ∆ (665).
Given a reference solution, it is preferable to develop the ML estimate using an exponential map based at Y_0 rather than Ỹ. That is, we seek the estimate in the form Y = exp_{Y_0}(∆) with ∆ = Σ_k ś_k δθ_k, where ś_k is the natural coordinate basis at Y_0. The components δθ represent the change in θ required to move from Y_0 to Y, that is,

θ = θ_0 + δθ ,   (C.83)

where θ_0 and θ correspond to Y_0 and Y, respectively. In the following sections we derive the ML tangent vector estimate. The main point of this section is to justify the notation L_ś|_Ỹ ≡ 2Re(E_ś^H Λ_{bb} b̂_n) (see eqn. (660)) used earlier.

Constrained ML Criterion and Solution in S-N Coordinates
Let the partitioned basis (s, n) for B_Y = S_Y ⊕ N_Y have the numerical form s_k = Y_⊥ S_k and n_k = Y_⊥ N_k. These basis elements are related to the basis b_k for B_Y by a linear transformation, which can be expressed in equivalent matrix form (see (A.232)), where E_s, E_n and E_b are defined in terms of the matrices S_k, N_k and B_k, respectively (see (653) and (654)).
Given an arbitrary ∆ ∈ B_Y, represented as ∆ = Σ_{k=1}^{N_b} b_k β_k, the components β are obtained as in (607), and the matrices L_ss and L_sn follow. The product L_sn n̂ in (C.93) is evaluated using b̂_n ≡ E_n n̂.
We observe that (C.96) is equal to (660).

Evaluating the Derivatives L_ś, L_śn n̂, and ĉ

The first derivatives of L in the neighborhood of the unconstrained MLE Ỹ may be expressed using a Taylor expansion. By definition, all the first derivatives of L vanish at Ỹ. Since the ML tangent vector ∆̂ at Ỹ is such that P_A ∆̂ = 0 (i.e., α̂ = 0), we need only consider the expansion along tangent vectors in B_Ỹ (see eqn. (585)).
Define the parallel translated difference vector as ∆_{Y_0}. Since the basis vectors at Ỹ were formally defined by parallel translating the S_{Y_0} basis vectors defined at Y_0 to Ỹ, the translated components of ∆_{Y_0} are (ŝ − s_0, n̂ − n̄)_{Y_0}. If n̂ = n̄, these reduce to (ŝ, 0), or (ŝ_{Y_0}, 0).
The ML estimate can be expressed relative to the reference Y_0 by parallel translating the ML tangent vector from Ỹ to Y_0. The component with respect to the basis at Y_0 is therefore ŝ_{Y_0} ≡ ŝ_Ỹ − s_0. Substituting (C.110) for ŝ_Ỹ and noting (C.112) yields the corresponding ML tangent vector ∆̂_{Y_0} (C.117), and the resultant estimate of Y ∈ B is given by

Y = exp_{Y_0}(∆̂_{Y_0}) .
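The exponential map exp_{Y_0}(∆̂_{Y_0}) can be realized numerically. For horizontal tangents of the form ∆ = Y_⊥ B (so Y^H ∆ = 0), one standard realization is the Grassmann geodesic formula of Edelman, Arias, and Smith via a thin SVD of ∆; the sketch below assumes that formula applies to the tangents used here.

```python
import numpy as np

def expm_grassmann(Y, Delta):
    """Geodesic exp map exp_Y(Delta) for horizontal tangents (Y^H Delta = 0).

    Uses the thin-SVD formula: for Delta = U diag(s) V^H,
        exp_Y(Delta) = Y V cos(diag(s)) V^H + U sin(diag(s)) V^H.
    """
    U, s, Vh = np.linalg.svd(Delta, full_matrices=False)
    return (Y @ Vh.conj().T @ np.diag(np.cos(s)) @ Vh
            + U @ np.diag(np.sin(s)) @ Vh)

# Example: move along a horizontal tangent Delta = Y_perp B.
rng = np.random.default_rng(1)
N, p = 6, 2
Q, _ = np.linalg.qr(rng.standard_normal((N, N)))
Y, Y_perp = Q[:, :p], Q[:, p:]
Delta = Y_perp @ (0.1 * rng.standard_normal((N - p, p)))
Y_new = expm_grassmann(Y, Delta)
print(np.allclose(Y_new.conj().T @ Y_new, np.eye(p)))   # stays semi-unitary
```

The geodesic stays on the manifold exactly (up to round-off), unlike a naive step Y + ∆ followed by re-orthonormalization.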

Parameter Estimation
Given the signal model H(θ), a natural coordinate basis ś_k can be computed for S_{Y_0} in a straightforward way. As shown in section 3.9.2, a natural coordinate basis ś_k for the tangent space S_{Y_0} can be expressed in terms of the derivative matrix D of the signal matrix H evaluated at θ_0. This, in turn, defines the matrix E_ś that appears in L_śś and L_ś (see eqn. (520)).
The natural coordinate basis ś_k is related to the orthonormal basis s_k used in the earlier development.

Establishing a Reference Point Y_0

The above discussion supposed a reference point Y_0 ∈ B rotationally aligned with Y; that is, ∆ = exp_Y^{-1}(Y_0) was such that P_A ∆ = 0. This rotationally aligned reference is established as follows. Given a prior parameter estimate θ̂_0, the semi-unitary matrix Ȳ_0 is computed as the left singular vector matrix of H(θ̂_0):

Ȳ_0 = svd(H(θ̂_0)) .   (C.125)

The point Y_0 is the rotationally aligned version of Ȳ_0, given by Y_0 = Ȳ_0 Q, where the p × p unitary matrix Q is constructed from unitary matrices U_1 and V_1, using the fact that P(θ̂_0)H^# = H^#.
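The source leaves the definitions of U_1 and V_1 to equations not reproduced here; a common construction for such a rotational alignment, assumed in the sketch below, is the orthogonal Procrustes solution Q = U_1 V_1^H from the SVD of Ȳ_0^H Ỹ. The matrices here are synthetic stand-ins for the estimates in the text.

```python
import numpy as np

rng = np.random.default_rng(2)
N, p = 8, 3

# Y_tilde: unconstrained (EVD-based) estimate; Y_bar0: left singular
# vectors of H(theta_0). Both are synthetic stand-ins here.
Y_tilde, _ = np.linalg.qr(rng.standard_normal((N, p))
                          + 1j * rng.standard_normal((N, p)))
Y_bar0, _ = np.linalg.qr(rng.standard_normal((N, p))
                         + 1j * rng.standard_normal((N, p)))

# Orthogonal Procrustes (assumed construction): Q = U1 V1^H minimizes
# ||Y_tilde - Y_bar0 Q||_F over unitary Q.
U1, _, V1h = np.linalg.svd(Y_bar0.conj().T @ Y_tilde, full_matrices=False)
Q = U1 @ V1h
Y0 = Y_bar0 @ Q          # rotationally aligned reference point

print(np.linalg.norm(Y_tilde - Y0)
      <= np.linalg.norm(Y_tilde - Y_bar0))   # True
```

The alignment removes the rotational (A_Y) component of the error, leaving only the subspace component, which is what the condition P_A ∆ = 0 requires.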

C.6.1 Estimation for M-Dimensional Modal Signal Model
When the signal matrix H is specified by a modal form (section 4.3), the above results take a simple form. As shown in section B.1.3, a natural coordinate basis ś_k for the tangent space S_{Y_0} can be expressed in terms of the derivative matrix D of H(θ) at θ_0. This basis defines the matrix E_ś that appears in the definitions of L_śś (C.120) and L_ś (C.121). The E_ś matrix is defined by (546) for the general multi-dimensional case (the 1-dimensional case is given by (520)), where G = [G_1, …, G_M] is a selection matrix with submatrices G_m defined by (549). This is well approximated by (C.139), where

R̂_s ≡ Ỹ Λ_s Ỹ^H .   (C.141)

Proof: Represent the vector ∆̂ ∈ B_Ỹ as ∆̂ = Ỹ_⊥ B̂. Substituting the modal expression for the E_ś matrix, given by (551), into the general expression for L_ś, given by (C.121), yields an expression in which, to simplify notation, c ≡ 2K/σ². Repeated application of the identity (C^T ⊗ A)vec(B) = vec(ABC), together with the identification ∆̂ = Ỹ_⊥ B̂, yields the first desired result (C.139). Substituting (C.136) into the above yields the second desired result (C.141).
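The Kronecker identity invoked in the proof, (C^T ⊗ A) vec(B) = vec(ABC) for the column-major vec operator, can be verified directly:

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((3, 4)) + 1j * rng.standard_normal((3, 4))
B = rng.standard_normal((4, 5)) + 1j * rng.standard_normal((4, 5))
C = rng.standard_normal((5, 2)) + 1j * rng.standard_normal((5, 2))

# Column-major (Fortran-order) vec, the convention under which the
# identity holds.
vec = lambda M: M.reshape(-1, order="F")

# Kronecker identity: (C^T kron A) vec(B) = vec(A B C).
# Note the plain transpose (not conjugate), even for complex matrices.
lhs = np.kron(C.T, A) @ vec(B)
rhs = vec(A @ B @ C)
print(np.allclose(lhs, rhs))   # True
```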

Direct computation of Lś
Previously we computed the derivative L_ś at Y_0 using an indirect Taylor series approach based on the unconstrained MLE Ỹ. In this section we compute the derivative directly in the standard way and show that the two approaches yield the same result. The directional derivative of interest is the one given in lemma 20, where we showed that the first derivative of the likelihood function in the direction of an arbitrary tangent vector ∆ = Y_0⊥ B_k, ∆ ∈ B_{Y_0}, is (590). Let ∆^{(k)} denote the tangent to the coordinate curve θ^{(k)}(t) at t = 0. This tangent in general has components in both A_Y and S_Y. Denote the projection of ∆^{(k)} onto S_Y ⊂ B_Y as the vector ś_k, that is, ś_k = P_B ∆^{(k)} with ś_k ∈ S_Y ⊂ B_Y.

C.6.2 Geometric Newton's Method Refinement Algorithm Summary
Let the eigendecomposition of the SCM be R̂ = Ỹ Λ Ỹ^H + Ỹ_⊥ Λ_⊥ Ỹ_⊥^H. Assume that an initial parameter estimate θ_0 is given. The refinement δθ is computed as follows.
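The refinement steps themselves lie beyond this excerpt. As a toy scalar stand-in for the geometric Newton refinement (not the thesis algorithm: the single-mode model, the subspace-fit criterion, and the finite-difference derivatives below are all illustrative assumptions), the sketch refines a frequency parameter θ by Newton iteration on a criterion built from the SCM's principal eigenvector:

```python
import numpy as np

rng = np.random.default_rng(4)
N, K, theta_true = 16, 200, 0.5

# Snapshots of a single cisoid in noise; the SCM's top eigenvector plays
# the role of the unconstrained (EVD-based) estimate Y_tilde.
n = np.arange(N)
h = lambda th: np.exp(1j * th * n) / np.sqrt(N)   # unit-norm mode vector
X = np.outer(h(theta_true),
             rng.standard_normal(K) + 1j * rng.standard_normal(K))
X += 0.05 * (rng.standard_normal((N, K)) + 1j * rng.standard_normal((N, K)))
R_hat = X @ X.conj().T / K
y_tilde = np.linalg.eigh(R_hat)[1][:, -1]

# Toy refinement: Newton iteration on the subspace-fit criterion
# f(theta) = |h(theta)^H y_tilde|^2, derivatives by central differences.
f = lambda th: np.abs(h(th).conj() @ y_tilde) ** 2
theta, eps = 0.45, 1e-5            # theta_0 from a coarse initializer
for _ in range(5):
    g = (f(theta + eps) - f(theta - eps)) / (2 * eps)       # gradient
    H2 = (f(theta + eps) - 2 * f(theta) + f(theta - eps)) / eps**2
    theta -= g / H2                # Newton refinement delta-theta
print(theta)
```

The loop mirrors the structure of the summary: an EVD-based reference estimate, a parametric model H(θ), and a Newton update δθ in the model's natural coordinates.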