Smooth Local Subspace Projection for Nonlinear Noise Reduction Smooth Local Subspace Projection for Nonlinear Noise Reduction

Many nonlinear or chaotic time series exhibit an innate broad spectrum, which makes noise reduction diﬃcult. Local projective noise reduction is one of the most eﬀec-tive tools. It is based on proper orthogonal decomposition (POD), and works for both map-like and continuously sampled time series. However, POD only looks at geometrical or topological properties of data and does not take into account the temporal characteristics of time series. Here we present a new smooth projective noise reduction method. It uses smooth orthogonal decomposition (SOD) of bundles of reconstructed short-time trajectory strands to identify smooth local subspaces. Re-stricting trajectories to these subspaces imposes temporal smoothness on the ﬁltered time series. It is shown that SOD-based noise reduction signiﬁcantly outperforms the POD-based method for continuously sampled noisy time series.

Noise reduction in nonlinear or chaotic time series is problematic due to the inherent broad spectrum of the deterministic component in a signal.Projective noise reduction based on proper orthogonal decomposition has been shown to be effective for low noise levels.Here, we study a generalization of the local projective noise reduction method based on smooth orthogonal decomposition, which accounts for both topological and temporal characteristics of the time series.Through the analysis of synthetic chaotic time series contaminated by various levels of noise, we show that our method significantly outperforms the original and works even for low signal-to-noise ratios.Thus, it provides an effective tool for noise reduction in continuously sampled chaotic time series.

I. INTRODUCTION
Many natural and engineered systems generate nonlinear deterministic time series that are contaminated by random measurement/dynamical noise.Chaotic time series have inherently broad spectra and are not amenable to conventional spectral noise filtering.Two main methods for filtering noisy chaotic time series [1][2][3] are: (1) model based filtering 4 , and (2)   projective noise reduction 2 .While model based reduction has some merit for systems with known analytical models 4 , projective noise reduction provides a superior alternative 2 for experimental data.There are other methods like adaptive filtering or the wavelet shrinkage 5 method that can work in certain cases but require an opportune selection of parameters or some a priori knowledge of the system or the noise mechanisms 6 .
In this paper, we describe a new method for nonlinear noise reduction that is based on a smooth local subspace identification 7,8 in the reconstructed phase space 9,10 .In contrast to identifying the tangent subspace of an attractor using proper orthogonal decomposition (POD) of a collection of nearest neighbor points (i.e., as in local projective noise reduction scheme), we identify a smooth subspace that locally embeds the attractor using smooth orthogonal decomposition (SOD) of bundle of nearest neighbor trajectory strands.This new method accounts for not only geometrical information in the data, but its temporal characteristics too.As we demonstrate later, this dramatically increases the ability to filter low signal-to-noise-ratio (SNR) time series.
In the following section ideas behind nonlinear noise reduction are discussed, followed by the description of the SOD-based algorithm.Models generating test time series are introduced next, along with the methods used in the algorithm evaluation, which is followed by the description of results and discussion.

II. SMOOTH PROJECTIVE NOISE REDUCTION
Both POD-based local projective noise reduction 2 , and SOD-based smooth projective noise reduction work in the reconstructed phase space 9 of the system generating the noisy time series.The basic idea is that, at any point on the attractor, noise leaks out into higher dimensions than the dimension of the attractor's tangent space at that point.Thus, by embedding data into the higher d-dimensional space and then projecting it down to the tangent subspace of k-dimensions reduces the noise in the data.The differences between the methods are in the way this tangent space is identified.
For both noise reduction methods, the noisy time series {x i } n i=1 is embedded into a d-dimensional global phase space using delay coordinate embedding generating the vectorvalued trajectory {y i } n−(d−1)τ i=1 : where τ is the delay time, which is usually determined using the first minimum of the average mutual information 18 for the time series, by some other nonlinear correlation statistic 17 , or by visual inspection.
The selection of the global embedding dimension d, and the dimension of the tangent subspace k are important steps in both noise reduction schemes.For a noise-free time series, a method of false nearest neighbors 11,12 is usually used to estimate the minimum necessary embedding dimension D. However, the efficacy of this method is degraded by the presence of noise 13 .Alternatively, the embedding theorem provides 9 a measure of minimum sufficient embedding dimension which should be greater than twice the fractal dimension 14 of the attractor.The most commonly used estimate of the fractal dimension is a correlation dimension 15 , which is also not very robust with respect to noise.However, for moderate noise levels, one can still get rough approximation to the correlation dimension 16 , which could serve as the upper bound on the projection dimension.In practical applications, the use of methods that are robust with respect to noise (e.g., the minimum description length principle 13 or continuity statistic 17 ) are more appropriate.In this paper, however, we use synthetic time series which are artificially contaminated with additive noise.Therefore, the size of global embedding dimension is taken to be a slightly greater than the minimum required embedding dimension estimated using the clean time series and the false nearest neighbors method.

A. Local Projection Subspace
In the local projective noise algorithm 19 , for each point y i in the reconstructed ddimensional phase space, r temporarily uncorrelated nearest neighbor points {y j i } r j=1 are determined.The POD is applied to this cloud of nearest neighbor points, and the optimal k-dimensional projection of the cloud is obtained providing the needed filtered ȳi (which is also adjusted to account for the shift in the projection due to the trajectory curvature at y i 3 ).The filtered time series {x i } n i=1 is obtained by averaging the appropriate coordinates in the adjusted vector-valued time series {ȳ i } n−(d−1)τ i=1 .In contrast, smooth projective noise reduction works with short strands of the reconstructed phase space trajectory.These strands are composed of (2l + 1) time-consecutive reconstructed phase space points, where l is a small natural number.Namely, each reconstructed point y k has an associated strand s k = [y k−l , . . ., y k+l ], and an associated bundle of r nearest neighbor strands {s j k } r j=1 , including the original.This bundle is formed by finding (r − 1) nearest neighbor points {y j k } r−1 j=1 for y k and the corresponding strands.SOD is applied to each of these strands in the bundle and the corresponding k-dimensional smoothest approximations to the strands {s j k } r j=1 are obtained.The filtered ȳk point is determined by the weighted average of points {ỹ j k } r j=1 in the smoothed strands of the bundle.Finally, the filtered time series {x i } n i=1 are obtained just as in the local projective noise reduction.The procedure can be applied repeatedly for further smoothing or filtering.For completeness, the details of SOD and smooth subspace approximation are given in the following sections.

B. Smooth Orthogonal Decomposition
The objective of POD is to obtain the best low-dimensional approximate description of high-dimensional data in a least squares sense [20][21][22] .In contrast, SOD is aimed at obtaining the smoothest low-dimensional approximations of high-dimensional dynamical processes 7,23,24 .While POD only focuses on the spatial characteristics of the data, SOD considers both its temporal and spatial characteristics.
Consider a case where a field, y, is defined for some d state variables {x i } d i=1 (e.g., a system of d ordinary differential equations, d sensor measurements from some system, etc.).
Assume that this field is sampled at exactly n instants of time (e.g., n sets of simultaneous measurements at d locations).We arrange this data into a n×d matrix Y, such that element Y ij is the sample taken at the ith time instant from the jth state variable.We also assume that each column of Y has zero mean (or, alternatively, we subtract the mean value from each column).We are looking for a linear coordinate transformation of this matrix Y: where the columns of Q ∈ R n×d are new smooth orthogonal coordinates (SOCs) and smooth projective modes (SPMs) are given by the columns of Ψ ∈ R d×d .In this transformation, Q should contain time coordinates sorted by their roughness.
The SOD is accomplished by the following generalized eigenvalue problem: 36 where

C. Smooth Subspace Identification and Data Projection
Equation ( 3) can be solved using a generalized singular value decomposition of the matrix pair Y and Ẏ: where smooth orthogonal modes (SOMs) are given by the columns of Φ ∈ R d×d , SOCs are given by the columns of Q = UC ∈ R n×d and SOVs are given by the term-by-term division of diag(C T C) and diag(S T S).The resulting SPMs are the columns of the inverse of the transpose of SOMs: In this paper, we will assume that the SOVs are arranged in descending order (λ The magnitude of the SOVs quadratically correlates with the smoothness of the SOCs 7 .Please note that if the identity matrix is substituted for Σ in Eq. ( 3), it reduces to the POD eigenvalue problem that can be solved by the singular value decomposition of the matrix Y.
(c) For the same points ŷi , look up (r − 1) nearest neighbor points that are temporarily uncorrelated, and the corresponding nearest neighbor strands {s j i } r j=2 .
(d) Apply SOD to each of the r strands in this bundle, and obtain the corresponding k-dimensional smooth approximations to all d-dimensional strands in the bundle (e) Approximate the base strand by taking the weighted average of all these smoothed strands si = sj i j , using weighting that diminishes contribution with the increase in the distance from the base strand.4. Repeat the first three steps until data is smoothed out, or some preset criterion is met.

IV. EVALUATING THE ALGORITHM
In this section we evaluate the performance of the algorithm by testing it on a time series generated by Lorenz model 29 The chaotic signal used in the evaluation is obtained using the following initial conditions (x 0 , y 0 , z 0 ) = (20, 5, −5), and the total of 60,000 points were recorded using 0.01 sampling time period.The double-well Duffing equation used is: The steady state chaotic response of this system was sampled thirty times per forcing period and a total of 60,000 points were recorded to test the noise reduction algorithms.
In addition, a 60,000 point, normally distributed, random signal was generated, and was were applied to all noise signals using total of ten iterations.
To evaluate the performance of the algorithms, we used several metrics that include improvements in SNRs and power spectrum, estimates of the correlation sum and short-time trajectory divergence rates (used for estimating the correlation dimension and short-time largest Lyapunov exponent, respectively 30 ).In addition, phase portraits were examined to gain qualitative appreciation of noise reduction effects.
The largest Lyapunov exponent λ 1 31 characterizes the exponential growth of the distance between two points on the attractor that are very close initially.If this small initial distance between points is δ 0 and the distance at a later discrete time n is denoted as δ n , then δ n ∼ δ 0 exp (λ 1 t) as δ 0 → 0.Here we used a modified algorithm by Wolf et al. 32,33 , where for a set of reconstructed points y i on their fiducial trajectory, we look up the r temporarily uncorrelated nearest neighbor points and track their average divergence from the fiducial trajectory.
The correlation dimension characterizes the cumulative distribution of distances on the attractor 15,34,35 , and indicates the lower bound on the minimum number of variables needed to uniquely describe the dynamics on the attractor.Here, we used the modified algorithm described in 15 , where the correlation sum is estimated for the different length scales as: where Θ is a Heavyside step function, N is the total number of points, s is the interval needed to remove temporal correlations, and D 2 is the correlation dimension.
V. RESULTS

A. Lorenz Model Based Time Series
Using average mutual information estimated from the clean Lorenz time series, a delay time of 12 sampling time periods is determined.The false nearest neighbors algorithm (see    In addition, the rate of increase is considerably larger for the SOD algorithm compared to the POD, especially during the initial applications.As seen from the Table I, the SODbased algorithm provides about 13 dB improvement in SNRs, while POD-based algorithm manages only 7 ∼ 8 dB improvements.
The estimates of the correlation sum and short-time trajectory divergence for the noisereduced data and the original clean data are shown in Fig. 5.For the correlation sum, both POD-and SOD-based algorithms show similar scaling regions and improve considerably on the noisy data.For the short-time divergence both methods manage to recover some of the scaling regions, but SOD does a better job at small scales, where POD algorithm yields large local fluctuations in the estimates.
The qualitative comparison of the phase portraits for both noise reduction methods are shown in Fig. 6.While POD does a decent job al low noise levels, it fails at higher noise  levels where no deterministic structure is recovered at 80% noise level.In contrast, SOD provides smoother and cleaner phase portraits.Even at 80% noise level, the SOD algorithm is able to recover large deterministic structures in the data.While SOD still misses small scale features at 80% noise level, topological features are still similar to the original phase portrait in Fig. 3(a).

B. Duffing Equation Based Time Series
Using the noise-free Duffing time series and the average mutual information, a delay time of seven sampling time periods is determined.In addition, false nearest neighbors algorithm indicates that the attractor is embedded in D = 4 dimensions for 0 % noise (see Fig. 7).
Six is used for global embedding dimension (d = 6), and k = 3 is used as local projection dimension.The reconstructed phase portraits for the clean Duffing signal and the added     For the correlation sum, both POD-and SOD-based algorithms show similar scaling regions and improve substantially on the noisy data, with SOD following the noise trend closer than POD.For the short-time divergence rates both methods manage to recover some of the scaling regions, but SOD does a better job at small scales, where POD algorithm causes large local fluctuations in the estimates.
The qualitative comparison of the Duffing phase portraits for both noise reduction methods are shown in Fig. 11.For consistency all noise-reduced phase portraits are obtained after 10 consecutive applications of the algorithms.While POD does a decent job at low noise levels, it fails at high noise levels where no deterministic structure is recovered at 40% or 80% noise levels.In contrast, SOD provides smoother and cleaner phase portraits at every noise level.Even at 80% noise level, the SOD algorithm is able to recover some large deterministic structures in the data, providing topologically similar phase portrait.increase.In addition, SOD accomplishes this with considerably fewer applications of the noise reduction scheme, while POD works more gradually and requires many more iterations at higher noise levels.
Estimating nonlinear metrics used in calculations of correlation dimension and short-time Lypunov exponent showed that both methods are adequate at low noise levels.However, SOD performed better at small scales, where POD showed large local fluctuations.At larger noise levels both methods perform similarly for the correlation sum, but SOD provides larger scaling regions for short-time trajectory divergence rates.POD still suffers from large variations at small length/time scales for larger noise levels.
The above quantitative metrics do not tell a whole story, however; we also need to look at trajectories themselves.Visual inspection of the reconstructed phase portraits showed superior results for SOD.POD trajectories had small local variations even at low noise levels, and completely failed to recover any large dynamical structures at large noise levels.
In contrast, SOD method provides very good phase portraits up to 20% noise levels, and for larger noise levels the reconstructions still have clear topological consistency with the original phase portrait.However, SOD also fails to capture finer phase space structures for 80% noise level.
In summary, the SOD-based noise reduction procedure provides considerable improvement over the POD-based method for continuously sampled noisy time series.This was demonstrated using temporal, spatial and spectral characteristics of noise reduced signals.
The nature of SOD itself is well suited for identifying smooth local subspaces for noise reduction.However, precisely this feature makes SOD-based method not suitable for direct application to map-like noisy time series.Finally, the SOD-based algorithm is able to handle noise at both large and small scales, while POD is only good for large scale structures in the data.
are auto-covariance matrices for d states and their time derivatives, respectively.Eigenvalues λ i are called smooth orthogonal values (SOVs), and ψ i ∈ R d are individual SPMs (i = 1, . . ., d).The time derivative of the matrix Y is either known analytically or can be derived numerically Ẏ = DY, where D is some finite difference differential operator.
FIG. 1. Schematic illustration of smooth projective noise reduction algorithm

3 .
Shifting and Averaging: (a) Replace the points {ŷ i } n+l i=1+l at the center of each base strand by its approximation ȳi determined above to form Ȳ ∈ R [n+(d−1)τ ]×d .(b) Average each d smooth adjustment to each point in the time series to estimate the filtered point: xi = 1 d d k=1 Ȳ(k, i + (k − 1)τ ) .
mixed with the chaotic signals in 1/20, 1/10, 1/5, 2/5, and 4/5 standard deviation ratios, which respectively corresponds to 5, 10, 20, 40, and 80 % noise in the signal or SNRs of 26.02, 20, 13.98, 7.96, and 1.94 dB.This results in total of five noisy time series for each of the models.The POD-based projective and SOD-based smooth noise reduction procedures

Fig. 2 )
Fig. 2) indicates that the attractor is embedded in D = 4 dimensions for teh noise-free time series.Even for the noisy time series the trends show clear qualitative change around four or five dimensions.Therefore, six is used for global embedding dimension (d = 6), and k = 3 is used as local projection dimension.The reconstructed phase portraits for the clean Lorenz signal and the added random noise are shown in Fig. 3 (a) and (b), respectively.The corresponding power spectral densities for the clean and the noise-added signals are shown in Fig. 3 (c).The phase portraits of POD-and SOD-filtered signals with 5% noise are shown in Fig. 3 (d) and (e), respectively.Fig. 3 (f) shows the corresponding power spectral densities.In all these and the following figures, 64 nearest neighbor points are used for POD-based algorithm, and bundles of 9 eleven-point-long trajectory strands are used for SOD-based algorithm.The filtered data shown is for 10 successive applications of the noise reduction algorithms.The decrease in the noise floor after filtering is considerably more dramatic for the SOD algorithm when compared to the POD.

FIG. 3 .
FIG. 3. Reconstructed phase portrait from Lorenz signal (a); random noise phase portrait (b); power spectrum of clean and noisy signals (c); POD-filtered phase portrait of 5% noise added data (d); SOD-filtered phase portrait of 5% noise added data (e); and the corresponding power spectrums (f).

FIG. 6 .
FIG. 6. Phase space reconstructions for noisy and noise-reduced Lorenz time series after ten iterations of the POD-and SOD-based algorithms

FIG. 8 .
FIG. 8. Reconstructed phase portrait from Duffing signal (a); random noise phase portrait (b); power spectrum of clean and noisy signals (c); POD-filtered phase portrait of 5% noise added data (d); SOD-filtered phase portrait of 5% noise added data (e); and the corresponding power spectrums (f). 0

10 FIG. 11 .
FIG. 11.Phase space reconstructions for noisy and noise-reduced Duffing time series after ten iterations of the POD-and SOD-based algorithms Table I for the 5% and 20% noise added signals.While the SNRs are monotonically increasing for the SOD algorithm after each successive application, they peak and then gradually decrease for the POD algorithm.

TABLE I .
SNRs for the noisy time series from Lorenz model, and for noisy data filtered by POD- Table II for 5% and 20% noise added signals.Here, SNRs for both methods are peaking at some point and then gradually decrease.In addition, the rate of increase is considerably larger for the SOD compared to

TABLE II .
SNRs for the noisy time series from Duffing model, and for noisy data filtered by POD-