Multifractal Analysis of Unvoiced Speech Signals

In this thesis, we analyze the complexity involved in the production of unvoiced speech signals with measures from nonlinear dynamics and chaos theory. Previous research successfully characterized some speech signals as chaotic. However, in this dissertation, we use multifractal measures to postulate the presence of various fractal regimes present in the attractors of unvoiced speech signals. We extend prior work which used only correlation dimension D2 and Lyapunov Exponents to analyze some speech sounds. We capture the chaotic properties of unvoiced speech signals in the embedded vector space more succinctly by not only estimating the correlation dimension D2 , but by also estimating the generalized dimension Dq· The (nonconstant) generalized dimensions were estimated from phase space reconstructed vectors of single scalar variable realization of unvoiced speech signals. The largest of those dimensions is an indicator of the minimum dimension required in the phase space of any realistic dynamical model of speech signals. Results of the generalized dimension estimation support the hypothesis that unvoiced speech signals indeed have multifractal measures. The multifractal analysis also reveals that unvoiced speech signals exhibit low-dimensional chaos as well as "soft" turbulence. This is in contrast to the opinion that unvoiced speech signals are generated from what is technically known as "hard" turbulent flow, in which the dimension of a dynamical model is very high. Unvoiced speech signals may actually be generated from "soft" turbulent flow. In this dissertation, we also explore the relationship between the estimated generalized dimensions Dq and the singularity spectrum f(a). Existing algorithms for accurately estimating the resulting singularity spectrum f(a) from the samples of generalized dimensions Dq of a multifractal chaotic time series use either (a) linear


Abstract
In this thesis, we analyze the complexity involved in the production of unvoiced speech signals with measures from nonlinear dynamics and chaos theory. Previous Results of the generalized dimension estimation support the hypothesis that unvoiced speech signals indeed have multifractal measures. The multifractal analysis also reveals that unvoiced speech signals exhibit low-dimensional chaos as well as "soft" turbulence. This is in contrast to the opinion that unvoiced speech signals are generated from what is technically known as "hard" turbulent flow, in which the dimension of a dynamical model is very high. Unvoiced speech signals may actually be generated from "soft" turbulent flow.
In this dissertation, we also explore the relationship between the estimated generalized dimensions Dq and the singularity spectrum f(a). Existing algorithms for accurately estimating the resulting singularity spectrum f(a) from the samples of generalized dimensions Dq of a multifractal chaotic time series use either (a) linear / interpolation of the known, coarsely sampled, Dq values or (b) a finely sampled Dq curve obtained at great computational/experimental expense. Also, in conventional techniques the derivative in the expression for Legendre transform necessary to go from Dq to f (a) is approximated using first order centered difference equation. Finely sampling the Dq is computationally intensive and the simple linear approximations to interpolation and differentiation give erroneous end points in the f(a) curve. We propose using standard min-max filter design methods to more accurately interpolate between known samples of the Dq values and compute the differentiation needed to evaluate the Legendre transform. We use optimum (min-max) interpolators and differentiators designed with the Parks-McClellan algorithm. We have computed the generalized dimensions and singularity spectrum of 20 unvoiced speech sounds from the ISOLET database. The results not only indicate multifractality of certain unvoiced speech sounds, but also may lead to nonlinear maps that may be useful in improving the nonlinear dynamical modeling of speech sounds.
This new_ approach to f(a) singularity spectrum calculation exhibits computational reduction and improved accuracy. The proposed method also provides estimates of the generalized dimensions at D 00 and D_ 00 which are almost impossible to obtain from real data with limited number of data samples. Also, the asymmetric spread of a values with the corresponding f(a) around the maximum of f(a) reveal the inhomogeneity in the attractors of unvoiced speech signals just like the variations in the Dq values. The asymmetric spread of a values may also be an indication that the turbulent energy fields generated during unvoiced speech production are made of non-homogeneous fractals.
iii Acknowledgments It is indeed with pleasure and admiration that I want to acknowledge the expert guidance, consistent support and helpful suggestions of my major professor; Prof. G.   various periods that arise from a bifurcation process in equation (2.4) taken from [2,3). Basically, as r increases, stable fixed points of period 2n- 1  is computed from the growth of length elements (adopted from [6]). . 52 2.17 Values of Grassberger-Procaccia Entropy K 2 ,dE for the Lorenz Attractor (adopted from [7] 3.2 A cross-section of the human larynx (adopted from [8]) 61 3.3 Digital processing model for production of speech signals (adopted from [9]  Notice the intermittent and selfsimilar structure (adopted from [11]). Also notice the points in the pdf that tend to increase beyond bound while others shrink towards zero.
Phase of a signal Robert M. May 1 In this dissertation, a refinement is made to the nonlinear dynamical analysis of human speech signals. Previous research in this area has successfully characterized certain speech signals as chaotic. Evidence of low-dimensional chaos, which includes the existence of strange attractors, positive Lyapunov exponents, bifurcations and finite entropies, has been found in speech [14,15,16,17,18,19,20,21,22,23].
Due to the complexity involved in speech production, in terms of fluid flow and nonlinear interaction within the vocal apparatus, we analyze unvoiced speech signals using multifractal measures to determine the presence of various fractal regimes in speech attractors. These fractal regimes (multifractal structure) are not revealed by 1 The quote is from a review article on simple mathematical models (13]. 5 the estimation of only the correlation dimensions and Lyapunov exponents as done by Narayanan et al [14]. We therefore propose the estimation of the spectra of generalized dimensions to explain the multifractal nature of voiced and unvoiced fricative sounds. This method is based on the estimation of the generalized dimensions Dq from the phase-space reconstructed vectors of single scalar realization of speech signals. We also propose an improved method of evaluating the singularity spectrum, f(a), from the estimated generalized dimensions Dq. The new method is based on the best min-max digital filter that approximates the interpolation and differentiation operations needed to implement the Legendre transformation relating the generalized dimensions Dq to the singularity spectrum f (a). We later use this technique to determine the f (a) singularity spectrum for unvoiced speech signals.

Organization of Dissertation
This dissertation is organized as follows: In chapter 2, the Logistic Map [13] and the Lorenz equations [24] are used to introduce chaotic parameters, concepts, and properties such as phase-space, invariant density, bifurcations, ergodicity, etc. [4]. We also discuss the generalization of the correlation dimension D 2 to an infinite hierarchy of generalized dimensions Dq. This is important as prior work by Narayanan et al [14] used only D 2 in the analysis of speech signals whereas we will evaluate unvoiced speech signals using all Dq. We later quantitatively describe chaos in terms of measures such as dimension, entropy and Lyapunov exponents. In section 2.5, systematic steps relevant to the estimation of generalized dimensions from a single cc,mponent speech signal are introduced. These include the concepts of time delay embedding, correlation integral and local slopes.
Chapter 3 contains a brief overview of speech production and processing. In the chapter, we discuss the importance of speech in human beings as well as the different classes of sounds produced by human beings. In section 3.3, we describe the different methods that have been used in speech modeling in relation to their relative advantages and disadvantages. In particular, we describe why the popular Linear 6 Predictive Coding model is inadequate to explain the production of certain sounds, and how newer models using non-linearities and chaos concepts are intriguing. Towards the end of the section, we hypothesize the presence of different fractal regimes in unvoiced speech sounds. This is based in part on the success and inadequacies of previous nonlinear and chaos models.
In chapter 4, we examine another measure, the f (a) singularity spectrum, to characterize multifractal behavior on chaotic attractors. The triadic Koch curve will be used to illustrate how a multifractal may be generated and quantified using the invariant density measure of chaotic attractors. We also explore the relationship between the generalized dimensions Dq and the J(a) singularity spectrum. We discuss some of the existing methods of using the Legendre transform of Dq to compute f(a). We focus on computational complexity and introduction of erroneous artifacts. We therefore propose a new signal processing method of evaluating the Legendre transform in a more accurate fashion than popular techniques [12] .
In chapter 5, we present the results of applying multifractal analysis to unvoiced speech sounds from the ISOLETE database [25] . This database consist of letters of the English alphabet spoken in isolation. There are about 7800 spoken letters; 2 productions of each letter is spoken by 150 speakers. We selected 40 male speakers and female speakers making the unvoiced speech sounds, "s" and "z". The selection criteria used were availability of unvoiced speech samples and adequate number of speakers. The unvoiced speech sounds "s" and "z" were chosen so as to include both voiced and unvoiced fricatives. The broad speech selection is to ensure that the use of generalized dimensions Dq to analyze unvoiced speech signals is statistically conclusive. For each speech sample, an estimation of the generalized dimensions Dq as well as the singularity spectrum! (a) will be performed.
In chapter 6, we discuss the results of the multifractal analysis of chapter 5. The chapter ends with conclusions from foe analysis and a brief description of future work in this area of research. 7

Chapter 2
Chaos and Dynamical Systems

.1 Introduction
In the last 15 years, the study of chaos and nonlinear dynamical systems has provided theories and concepts to explain some of the strange behavior exhibited in large and complex systems such as weather prediction and -biological population growth.
This has been accomplished by the introduction of simple system of equations which mimic the strange behaviors found in complex systems. Chaos has provided scientists with tools to understand, visualize, and analyze behaviors using simple system of equations which may provide insight into some of the unexplained characteristics of larger systems. An example of such a complex system is speech which is characterized by many unanswered questions. Speech is a nonlinear phenomena that has been quantitatively described in terms of fluid fl.ow and turbulence (26]. Recently, however, speech has been found to exhibit low-dimensional chaos [27,23,28,14].
This dissertation seeks to obtain a more complete, low-dimensional chaotic description of speech signals. However, before springing into this large task, we focus on qualitatively defining chaos in terms of its properties and characteristics. We provide examples from both the Lorenz system and Logistic map. Later in this chapter, the problem of quantitatively describing chaos in terms of chaotic measures such as dimension, Lyapunov exponent and entropy is addressed.

What is Chaos?
Chaos is a "strange behavior" observed in either very simple, nonlinear or complex dynamical systems. The term chaos is used to describe the apparently random, aperiodic, but bounded, behavior of a deterministic system. A system is deterministic if its behavior over a long period of time is completely determined by the knowledge of the time evolution equations, the parameters describing the system and the initial conditions. With such knowledge, all values of the system over a long period of time or the steady state values can be computed or "determined" a priori. The fascination with chaos therefore arises from reconciling the determinism in its generation with the apparent randomness that is produced. Therefore, the formal definition of a chaotic system is a deterministic system that appears to exhibit random behaviors [2,1,29,30,4).
Chaotic systems exhibit sensitive dependence on initial conditions. This property is explained by the exponential divergence of the steady states (chaotic orbits) of chaotic system generated from two closely spaced initial conditions. That is, no matter how close are the initial conditions of separate implementations of a chaotic system, the trajectory of those implementations will eventually diverge exponentially, making them appear totally dissimilar. Chaotic signals are also characterized by broadband spectra and, therefore, rapidly decaying autocorrelation functions. These characteristics are reminiscent of random noise which makes it non-trivial for nonexperts in chaos to distinguish chaos from apparent randomness [2).
However, it is fascinating and intriguing that chaos sometimes arises from very simple systems (with only a few degreea of freedom) which are noise free. In the next section, we describe an example of a relatively simple system of coupled nonlinear equations that has been used in explaining some complex behaviors found in weather patterns.

Lorenz System
One of the earliest example of chaos in a simple system that is capable of explaining complex behavior in a larger system was by Lorenz. In 1963, Edward Lorenz, the father of modern chaos, gave one of the earliest descriptions of a chaotic process [24].
He developed a system of coupled-nonlinear differential equations to model weather patterns. The standard form of the Lorenz coupled equations is given as The parameter CT compares the rate of energy loss from the fluid due to viscosity to the rate of energy loss due to thermal conduction. Finally, R is proportional to the temperature difference between the bottom and top fluid layer, i.e., Rex 6T = Tw-Tc.
These kinds of experiments were first studied in 1900 by Bernard and a theoretical foundation for the experiment was later provided by Rayleigh in 1916, hence the famous Rayleigh-Bernard convection [1,4]. Depending on the magnitude of 6T, the model is capable of predicting various weather patterns. For example, when 6T is not too large, heat is transferred from the bottom plate to the top plate by thermal conduction 2 whereas when 6T is very 1arge, heat is transferred to the top plate by convection 3 . Other fluid circulation patterns could also be generated by tweaking 1 This is also referred to as the 3-dimensional manifold of the attractor. 2 Conduction is the transfer of heat due to the temperature gradient of a fluid at rest. 3 Convection is the transfer of heat by fluid motion between regions of unequal density that may result from non-uniform heating. the parameter R. The phenomena experienced in the Rayleigh-Bernard experiments were also noticed in the earth's atmo3pheric patterns [24]. Since the equations in Unfortunately for the science of chaos, the unprecedented work of Lorenz was published in a prestigious but obscure journal where it languished until the mid-70's when the interest in the study of chaos exploded. Since then, the study of chaos has attracted the attention of biologists [11], mathematicians, physicists, economists [31,32], scientists and engineers [33,34]. In recent years, due to availability of high speed computers and refined expuimental techniques, it has become clear that examples of chaotic phenomena are tibundant in nature and that they have farreaching consequences in many branches of science and engineering [1,35].
In the next section, we use the so called "Logistic Map" 4 , which is a first order, nonlinear difference equation, to intr0duce various concepts and terms in chaos. Robert May, a population biologist, first investigated some of the complex behavior exhibited by this simple model [13]. The map will also be used to introduce some of 4 Also called the Quadratic Map.

11
the properties that are typical to chaotic systems. These properties can be loosely classifie~ into two category of properties: (1) those, such as invariant density, that describe the temporal evolution of a chaotic system and (2) those that characterize the spatial distribution of points in the phase-space. Although the Logistic Map is used in the introduction of the concepts and properties of chaos, it is important to realize that these concepts are not particular to the Logistic Map, but rather, are common to almost all chaotic systems regardless of their underlying equation or mode of generation.

Characteristics of the Logistic Map
One of the simplest !-dimensional nonlinear systems is the Logistic Map. It is an iterative map that relates the present sample x[n] with the next sample x[n + 1] by a simple map In functional form, this could be written as (2.5) where g(x) = rx(l -x). Let gn denote the nth iteration of g, so that g 0 (x) = x, g 1 (x) = g(x), g 2 (x) = g(g(x)), .. . gn(x) = g(gn-1 (x)). This is called a sequence of iterations and the iterative function gn is sometimes called the iterated map function [36]. Each iteration of the function g produces an iterate and the sequence of values generated by this procedure, i.e., {gn}~=l will be called the trajectory or orbit. is greater than 1, then the expression diverges to infinity; hence, to observe the rich dynamical behavior of the logistic map, the magnitude I x[n] I must be less than or equal to 1. The logistic map attains a maximum value equal to~ at x[n] = ~. Plugging these 2 2 values into equation (2.4) we see that the map only possess interesting properties when r ~ 4. On the other hand, when r < 1, the iterates of the logistic map converge to zero. Thus, a rich dynamical behavior is only observed when r is between 1 and 4.
The dynamical properties of any chaotic system, and in particular the logistic map, can be well understood when we consider the behavior of the map at fixed points, x•. By fixed points, we mean . (2.6) or equivalently, That is, a fixed point is a convergence point of the map; repeated iterations of the map do not change the outcome. If we substitute the fixed point x• into (2.4) and solve for x•, we obtain with two potential solutions, Although the logistic equation has been determined to have rich dynamical behavior when r is between 1 and 4, the dynamical behaviors are different as r scans the range [1,4). The various dynamical behaviors unfold when the fixed points are examined. Therefore, in the next section, we will carefully consider the fixed points x• as they affect the dynamical behavior of the logistic map for different values of r in the range from 1 to 4.

Importance of fixed points
The stability of a fixed point is determined by the slope of the function g(x) at that fixed point. If the magnitude of the slope of the equation evaluated at a given fixed point, is less than 1, then that fixed point is called a stable fixed point, i.e., the trajectory of any initial condition approaches x• as the iteration proceeds. If the 13 magnitude of the slope when evaluated at the fixed point is greater than 1, then we have an unstable or repelling fixed point. In functional form, the magnitude of the slope of the logistic map is Evaluating the slope at the fixed point x* = 1 -~, we obtain r I 2-r 1. (2.10) When r is between 1 and 3, i.e., 1 < r < 3, the fixed point is stable since the magnitude of the slope in equation (2.11) is less than 1. Evaluating the magnitude of the slope in equation (2.10) at fixed point x* = 0, we obtain I r I, which means that the fixed point x* = 0 is unstable for 1 < r < 3. Thus, in the region where r is between 1 and 3, i.e., 1 < r < 3 the logistic map has only one stable fixed point.
Given any initial condition in this range of r, the iteration of the map converges to that one fixed point, i.e., x* = 1 -~. Hence, the orbit of the logistic map from any r initial condition with this range of r settles to a point. It is therefore referred to as a stable cycle of period 1. The behavior of the logistic map in this range of r is depicted in Fig. 2.2 with r = 2.8. With any initial condition 0 < x[O] < 1, the trajectory or orbit of the logistic map when 1 < r < 3, converges to a stable cycle of period 1 orbit, after the initial transients have passed.
As soon as r > 3, the magnitude of the slope in equation (2.11) at the fixed point 1 x* = 1 --is greater than 1. Therefore, from our definition of stability, this fixed r point is now unstable and two new stable fixed points appear [37,38]. The new stable fixed points are determined by solving for x* in the expression and evaluating these points when the magnitude of the slope of g(g(x)) < 1. The phenomenon by which one stable fixed point becomes unstable and yields two new stable fixed points is referred to as period doubling [2,3,1,4]. When r is slightly greater than 3, the trajectory of any initial condition settles into a stable cycle of period 2 around the two stable fixed points calculated from equation (2.12).
14 The two stable points of g(g(x)) continue to be stable fixed points until r = 3.449.
When r is slightly greater than r = 3.449, then the magnitude of the slope of g(g(x)) at these two stable fixed points is greater than 1. Hence, the two new fixed points become unstable and yield four additional stable fixed points. The four new stable fixed points can again be determined by solving for x• in the expression Again, in this range of r, the trajectory of any initial condition settles into a stable cycle of period 4 around the four stable fixed points of equation (2.13). In contrast to the behavior of the trajectory in the interval, 1 < r < 3, the trajectory in the region r > r c never repea.ts; it is totally unpredictable without infinite precision knowledge of r and the initial conditions. It is the existence of an infinite number of different orbits which never repeat and the lack of apparent predictability that is termed chaos [4,29,2,3]. Due to the dependence of the behavior of the logistic map on r, the parameter r is aptly called the tuning parameter. For the    logistic map, the erratic (chaotic) behavior marked by infinite periodicities starts when T =Tc= 3.57 and ends at T = 4.
In the next section, we formally define bifurcation and how it is produced. We will also comment on periodic windows observed in the bifurcation diagram when T >Tc· The bifurcation diagram for the Logistic map is shown in Fig. 2.4.

Bifurcation
The sudden change in the behavior of the logistic map as T is varied in the range, 3 < T < 4, is called bifurcation. It is the splitting that occurs when one stable fixed point become unstable and yields two new stable fixed points. The process of bifurcating from stable periodic cycles to an infinite number of unstable ones is called the bifurcation route to chaos.
The bifurcation route to chaos is not exclusive to the logistic map or I-dimensional systems [41]. It is common to all non-linear first order difference equations of the form where g(x[n]) has only a single maximum in the interval 0 < x[n] < 1. Recall that the logistic map has a single maximum at :.c[n] = ~· Bifurcation is also possible in higher dimensional systems [3,40]. It is important to notice the presence of various periodic windows in the bifurcation diagram of  [40]. For the Logistic map, the period 3 window around T = 3.83 has been proven by Li and Yorke (42] to imply chaos. The bands of periodic windows in the bifurcation diagram are sometimes called crises [3,40].
For a full discussion of periodic windows and crises, refer to [43].
The bifurcation route to chaos is certainly not the only route leading to chaos.
Two other different routes to chaos have been discussed extensively in chaos literature. These are the intermittency route to chaos and the transition from quasiperiodicity to chaos. For a full discu3sion of these routes to chaos, the reader is referred to Schuster [2] and Newhouse [29]. They are not discussed here because these chaotic concepts are beyond the scope of this dissertation.  [2,3). Basically, as r increases, stable fixed points of period 2n-l become unstable and give rise by bifurcation to new stable fixed points of period 2n, n = 0, 1, ....

Lyapunov Exponent
One of the hallmarks of chaos is depicted in Fig. 2    The divergence in the orbits becomes apparent at about sample number 16. 21 at 4) and is given as 15) which is depicted in Fig. 2.6. In section 2.10, the invariant density will be used to estimate the exponential divergence (Lyapunov exponents) of the orbits of two close initial conditions.

Ergodicity
The time series generated by the Logisitic map in the most chaotic state has many more properties. The most important one that will be discussed here is ergodicity.
To demonstrate this, we compute 1500 samples of a single realization of the map with a random initial condition. Next, we compare the single realization time series with an ensemble time series generated as follows: 1500 initial conditions xi [O] were drawn from a uniform distribution. Each initial condition is used to generate a corresponding time series xi[n], But, for each initial condition, only the 300th iterate property is said to be ergodic. Up until now, we have only examined the temporal evolution of the trajectory of a chaotic system. We will now investigate some of the spatial properties of chaotic systems.

Spatial Properties
Although the trajectories of chaotic systems tend to move exponentially apart with time, the orbits remain within some bounded region by wrapping around each other without intersecting or repeating [1]. The geometry just described is strange and may be fractal in nature. Fractals are characterized by self similarity. That is, they exhibit the same intrinsic form no matter how many times they are magnified. This fractal set is referred to as the strange attractor and the set of initial conditions that evolve to an attractor is called its basin of attraction.
A space which is useful for conceptually visualizing the strange attractor of a chaotic dynamical system is the phase-space or state-space. In chaos literature, phase-space and state-space are used interchangeably. It is an abstract space whose coordinates (which completely specify the state of the system) are the degrees of freedom of the system's motion. Shown in Fig In the last 15 years, extensive study of chaos has provided new concepts and tools to understand and categorize the complex chaotic behavior that were previously described as random noise. In the next section, we describe some of the concepts and theories needed to quantify chaos. Sorae of these measures and their extensions are later used in this dissertation to quantify chaos in the acoustic waveforms generated during the production of unvoiced speech signals.

Measures of Chaos
In this section, we introduce measures such as dimension, entropy, Lyapunov exponents and generalized correlation function as quantitative measures to characterize  That is, the pair (x(t), z(t)) in equation (2.3) form the co-ordinates of a point along the horizontal and vertical axis, respectively. The curve shows the trajectory of this pair as time, t, evolves. 27 either the temporal evolution or spatial distribution of points in the chaotic motion generated by dynamical systems.

Dimensions
Each chaotic system can be characterized by its strange attractor as defined in section 2.4. The dimension of a strange attractor is the dimension of the fractal set to which the long term behavior of the corresponding chaotic system is attracted (settles down) to in phase space. Dimension has been one of the most studied quantifiers of dynamical systems [3,4). This was spawned by the realization that the attractors associated with chaotic dynamics have fractiona1 5 dimension d, in contrast to non-chaotic systems, which always have integer dimensions. Therefore, a strange attractor may, for the purpose of simplicity, be thought of as an attractor in phase-space with dimension d < D, where dis non-integer. D is the number of degrees of freedom of the system's motion. In the example of the Lorenz system in section 2.2, Dis equal to 3, since the system's motion is generated from 3 Cartesian co-ordinates x, y and z. However, its attractor dimension dis equal to 2.06 [1).
For a chaotic system, it is important to emphasize the existence of these two different dimensional spaces in the definition of an attractor. The first dimensional space D, also referred to as the Euclidean dimension, is the space onto which the attractor is projected. The other dimension, d, is a lower-dimensional space to which the trajectory, or orbits of the dynamical system, is attracted. For chaotic attractors, dis always non-integer and less than D [1,3,4).
In the next section, we define a th.~oretical method of evaluating the dimension d. We also discuss some of the problems of implementing this theoretical method.

Hausdorff dimension, d
The relevant definition of dimension d which is due to Hausdorff [44,45) is as follows: Cover the attractor (A) of ad-dimensional object in phase space with a total number N(€.) of d-dimensional volume elements such as spheres, cubes, hyper-cubes etc., where f. is the diameter or width of the volume element. The process of covering 5 The idea of fractional dimension was first suggested in the literature by Mandelbrot [28].
28 the attractor of a chaotic system with volume elements is sometimes referred to as "partitioning" [1]. Let N(€) be the number of non-empty volume elements needed to cover A. As€ is made smaller and smaller, the sum of the occupied volume elements approaches the volume of A. Being a purely geometric measure, d should be independent of the width of the volume elements used in covering the attractor. (2.17) If the limit does not exist, then dis undefined. Theoretically, as € -7 0 and N(€) -7 oo, a log-log plot of logN(€) versus log(~) is a straight line whose slope is the Hausdorff dimension d. The Hausdorff dimension measures the "intertwined" and "wrapped around" structure of the fractal attractor [4].
The calculation of the Hausdorff dimension d by numerically partitioning the phase-space with volume elements and counting the number of points in each volume element is exceedingly hard, cumbersome and impractical for higher dimensional systems [46,47). In the next section, we will discuss the correlation dimension D 2 introduced by Grassberger and Procacda in 1983 [44]. D 2 is also a useful measure of the intricate structure of the strange attractor and its calculation yields an estimate of the Hausdorff dimension d. However, in contrast to the Hausdorff dimension d, the correlation dimension D 2 is relativdy easy to estimate.

Correlation dimension, D 2
If the chaotic attractor is analyzed from a probabilistic context, we can define the probability of finding a point on the attractor in a certain volume element. Assume the trajectory of a dynamical system consists of sequence of M-points. If the points are partitioned in phase space by volume elements, then the probability Pi of finding a point of the attractor in the volume element number i, (i = 1, 2, 3, ... , N(t=)), is given by where Mi is the number of points of the attractor in a particular volume element number i. A graph of this probability Pi of the ith volume element as a function of i gives the probability distribution for the attractor. In many chaotic systems, the probabilities Pi do not depend on any particular starting point on the attractor as long as the trajectory of the chaoti'.:: system is allowed to evolve for a long time.
The limit of the p/s as i tends to infinity is called the invariant density or invariant distribution that was discussed earlier in section 2.3.4.
Grassberger and Procaccia (GP) [44] in 1983 introduced a simple and efficient means of estimating the correlation dimension D 2 from experimental time series.
The GP method of estimating correlation dimension D 2 is related to the spatial correlation between different points on the attractor. In the next section, we will N(E) relate (1) the probability, LP;, that two points on ~he attractor lie within a volume i=l element of diameter fd to (2) the spatial correlation between points on the chaotic attractor. ·-....

XM= XM YM .•. ZM
then the D-dimensional vectors X 1 , X 2 , •.. , XM are now vector points on the attractor of a dynamical system in the phase-space. These vector points are spatially correlated since they lie on the same attractor in phase-space [2].

N(E)
The spatial correlation of a time series can be measured by the expression L p~ This is the probability that two vector points X; and Xj, i # j, on the attractor are separated by a distance smaller than f.. In functional form, this can be written as approximately 1 lim M 2 {number of pairs, X; and Xi, whose distance II X; -Xi II is less than f.} M-+oo where 8 is the Heaviside function 6 , f. is a distance measure limit between vectors and II . 11 is a normed distance measu~·e between two vectors. This corresponds to the correlation integral C2(E) [40,2]. We emphasize that this result is rigorously true only when both Mand f.-1 approach infinity. For a full discussion of this derivation, see Grassberger et al [44] and Ding et al [48). 6

{1 iff>O
The Heaviside or unit step function is defined as 0(f) = 0 if f < 0

32
When the distance between any vector point with itself, i.e. the identical pair x, = Xj, is discarded in order to avoid statistical error [49], the correlation integral is given as The correlation integral C 2 (E) counts the number of point pair (X,, Xi) on the attractor whose distance is smaller than f. llX 1 -Xiii is sometimes referred to as the inter-point distance between two vector points X, and Xi on the attractor. There are three norm metrics commonly used in calculating the correlation integral C 2 (E). They include: the Manhattan or L 1 norm

24)
i and the max or L 00 norm (2.25) [50]. In this dissertation, the max or L 00 norm is used in the chaos analysis of unvoiced speech so as to reduce computational complexity. However, it has been shown historically that the choice of the norm metric does not severely affect the results of the estimated correlation dimension [50,51].
The correlation integral can be used to determine the correlation dimension. The correlation integral C 2 (E) has been shown by Grassberger and Procaccia [44] to be proportional to ED 2 , i.e., or equivalently, the correlation dimension In the next section, we review a generalization of correlation integral C 2 ( f) and correlation dimension D 2 to a hierarchy of generalized correlations and dimensions, / C 9 (f) and D 9 respectively.

7 Generalized Dimension, Dq
The expression for the correlation dimension D 2 in (2.28) can now be generalized to a hierarchy of generalized dimensions indexed by the parameter q, or where 1 1 C 9 (f) is the generalized correlation integral; for q = 2, it is equal to the correlation integral in equation (2.22). When q > 0, the ith volume elements with large probability Pi have a greater influence in the determination of D 9 in equation (2.29) whereas when q < 0, those volume elements with small Pi are accentuated in their contribution towards D 9 • Now by substituting (2.31) into (2.30), the generalized dimensions, D 9 , can be written as [3,52,53) (2.32) In contrast to the correlation dimension D 2 , the generalized qth dimension D 9 determines the various fractal regimes present in the attractor when the attractor is non-uniform. The generalized dimensions Dq in equation (2.32) are the order-q Renyi generalized dimensions (GD) which were formulated in the context of chaotic dynamics by Hentschel and Procaccia in 1983 [54]. Dq is a measures of the inhomogeneity or multi-fractality of an attractor whenever the Dq values are not equal for all q. However, when the Dq are the same, the distribution of vector points on the attractor is fairly homogeneous. A chaotic attractor with non-constant values of the generalized dimension is called a multifractal.

7.1 Box-counting Dimension
Note that when q = 0 in equation (2.29)

7.2 Information Dimension
We can also take the limit as q goes tc1 1 in equation (2.29) to define By applying L'Hospital's rule to equation (2.29), one can obtain - [3] N(t} L Pi log pi Due to the form Pi log pi in equation (2.35), which has been used as a measure of information capacity in information theory [55], D 1 is aptly referred to as the information dimension.
Another property of the generalized dimensions Dq is that they decrease with increasing q. In general, it has been shown in [54] that Thus, for example, D2 provides a lower bound for D 1 , D 1 provides a lower bound for Do and_ so on.
In the next section, we systematically describe . how the generalized dimensions can be estimated given an experimental time series. These steps are very crucial since in most experimental settings, it may be too expensive or almost impossible to obtain all the D-dimensional components of the experimental data.

Taken's Embedding Theorem
Our analysis of the generalized dimensi0ns up to this point is based on the assumption that we have all of the actual D-dimensional components in equations (2.19-2.20) in the manifold of the chaotic system. However, it is possible to analyze from a chaos point of view a scalar or !-dimensional component of this signal. In general, the chaotic properties of any dynamical system can be reconstructed from a single component by using Takens embedding theorem [56]. If the embedding space is where xi is the state of the system ccrresponding to the discrete time index i. xi can now be regarded as an embedded vector point in phase-space. Each Xi is given by where T is a chosen time delay, dE is the embedding dimension and ME = N -dE + 1 is the total number of the embedded vectors. ME is equivalent to M in equations 36 (2.31) and (2.32). The generalized correlation integral is now given as (2.39) where the additional subscript dE is used to denote the level of embedding used in equation (2.38).
In theory, for noise free data, the particular choice of the delay time T plays no role in the reconstruction of the attractor. However, in the absence of noise free data, which is always the case in experimental settings, it is important to optimize the time delay parameter T and then find a practical embedding dimension dE smaller than D which has been proven to be sufficient for proper embedding (56]. Information theory has been of practical value in finding the optimum T (57]. An optimum time delay often chosen, due to Fraser and Swinney, is the first minimum of the mutual information measure [57]. Often, in chaotic time series analysis of experimental data, only one component of the D-dimensional system described in equation (2.19) is available. This is particularly crucial in speech production where any attempt to insert transducers in the vocal cavity for the purpose of recording more than one single component of speech will alter or impair the speech produced [58,26]. For most speech analysis, the single component that is available, i.e., that can be measured, is the acoustic pressure waveform recorded by a microphone in front of the mouth. Therefore, in order to have a chaotic analysis of speech or other experimental data, a method of reconstructing the multi-dimensional dynamics of the system from a scalar or !-dimensional representation is very crucial.
The chaos analysis of speech in this dissertation is performed on experimental data which requires reconstructing the nonlinear dynamics from a single representation or component of a speech signal. Therefore, we shall henceforth parameterize all expressions for estimating the generalized correlation integrals of speech signals as well as the generalized dimensions by using the optimum Taken's [56] embedding dimension dE. In the next section, an important method of determining the proper delay time needed for the reconstruction of experimental data is explained. 37

Mutual Information
If we consider a time series u(t) with its delayed version u(t+T), Fraser and Swinney [57] demonstrated that the optimum delay time T opt corresponds to the delay equal to the occurrence of the first minimum of the mutual information between the time series and its delayed version. If we assume that the time series can be sampled, then we can treat the time series and its delayed version as discrete random variables U and V.
Mutual information is a measure of the amount of information about a random variable V given by the random variable U. If we now define discrete probabilities and then associated with these probabilities are a measur~ of uncertainties Hu(€) and Hv(l) where Similarly, we can define joint probabilities when the discrete time sequence is embedded using (2.38) with dE = 2. i.a., The corresponding joint uncertainty fl uv ( €) between the discrete random variables U and V is then defined as where the sum in equation (2.45) may now be interpreted as the sum taken over all the non-empty 2-dimensional volume elements of width € 2 used in covering the discrete time sequence in the embedded space.
The mutual information between the two random variables is now defined as

I(U, V) = lim Hu(€)+ Hv(€) -Huv(€)
E-tO (2.46) in the limit as € get smaller and smaller. This mutual information gives the average amount of information about the delayed version V ,...., u(t + r) when U ,...., u(t) is observed. In essence, this method can be used to calculate the uncertainties for a series of possible delay times.
We Other methods for determining the optimum delay time have been proposed; they are variations of the one discussed above. In many cases, these methods give comparable results and are only attractive because some are easier to implement than others. The methods include: the variations of the Fraser-Swinney algorithm by Liebert-Schuster [59) and . In this dissertation, however, we use the Fraser-Swinney method to determine the optimum delay time r because we found that the method gives accurate results for conventional attractors.

Numerical Implementation of the Generalized Correlation Integrals GCI
A straightforward implementation of the generalized correlation integral in equation (2.39) for each q is extremely time consuming since it involves counting all the pairs of vector points (Xi, Xi) on the attractor. This is a tedious process which may be on the order of O(M 2 ), where M, which is usually very high, is the number of reconstructed vector points of the attractor in phase-space. However, it has been noted in [60,49] that for large widths, f, a small fraction of the pairs (Xi, Xi) yields sufficient statistics, so it is not absolutely necessary to compute all M 2 pairs. This observation leads to a reduction in the search over all neighbors of a vector point.
The search is now limited to a few Maeed subset of pairs which are either randomly or uniformly selected from M. Algorithmically, the expression in equation (2.39) can now be rewritten as versus log € for finite data, contains four distinct regions [5,49]. In this section, we describe each region and their significance in dimension estimation. In region I, the reference distance, €, is too small compared to the average inter-point spacing of vector points on the attractor. Therefore, there is no information about the attractor in this region and Cq,dE ( €) as well as the local slopes are zero 7 . In region II, which is also called the noise region [49], the local slopes are erratic because noise in the data is smeared out in the phase-space, i.e., the noise is not confined to a fractal self-similar attractor. The noise level in the experimental data can even be estimated since the local slope in this region increases and reaches the noise level at the value of the embedding dimension [5]. Region III, is the "linear region" of the generalized correlation integral and it is marked by a plateau-like structure in the local slopes plot [5,49,48,1]. In this scaling range, the true fractal nature of the underlying attractor is revealed in tht sense that the local slopes are constant for increasing embedding dimension dE. The average of the local slopes as a function of the embedding dimension dE at this plateau are the estimate of the generalized 7 In order to avoid estimating the log (0) which is mathematically undefined for this region, a method which scales the inter-point distance between vector points on the attractor such that the minimum inter-point space £min = 1 and the maximum inter-point spacing Ema:z: = 10000 is adopted. In essence, £ in equation (2.49) can only be integer values between 1 and 10,000. A variation of this method of converting the inter-point spacing into integers may be found in [47]. Al~o in A~pendix A, a completely new derivation of the numerical computation of log Cq,d 8 (£) which av01ds taking the logarithm of zero is provided. of log€ showing the distinct regions that are typical for chaotic attractors (adopted from [5]).
dimension of the attractor. In region IV, we experience the opposite of the lack of inter-point distance of region I. However, in this case, the reference distance is now too large compared with the inter-point distance between vector points on the attractor. As such, the plateau-like structure in region III is lost due to finite data size.
The "plateau" technique above is used for estimating all the generalized dimensions Dq· When q < 0, the duration of the plateau-like structure in region III is reduced to almost a point convergence whereas the duration is increased when Fig. 2.14(a) shows a typical plot of the correlation integral

Other GCI-GD Algo:rithms
In the last five years, a number of algorithms for efficiently calculating generalized correlation integrals and estimating generalized dimensions Dq have been developed [61,47,60]. Most of these methods suitably structure the vector points on the attractor so that optimal neighbor-search algorithm such as K-tree [62], minimal spanning tree [61] and efficient box-counting method [47] may be used to count the inter-point distances between a vector point on the attractor and its neighbors. In some cases, automated procedures fm extracting the generalized dimensions have also been proposed [60,63].
In the next section, we discuss one 0f the major quantitative measures of chaotic behavior. The existence of a positive Lyapunov exponent qualifies any dynamical  the resulting estimated correlation dimension D 2 in the plateau region between 6 < log t < 7.5 is 2.06.
system as being chaotic.

Lyapunov Exponents
While the generalized dimensions D 9 in equation (2.30) characterize the spatial distribution of points on an attractor in phase-space, the Lyapunov Exponents (LE) describe the dynamics of the temporal evolution of the trajectories of chaotic systems [6,64]. The notion of exponential divergence or convergence of two closely spaced orbits of the chaotic system is made formal in the definition of the Lyapunov exponent.
Let two closely spaced trajectories on an attractor start with an initial separation dd(O). Then, the trajectories diverge (or converge) so that their separation after a time t, denoted by dd(t) is given by The parameter >.is the Lyapunov exponent. From equation (2.50), we see that the two orbits will diverge when >. is positive and converge when >. is negative. This exponential separation is a function of the slope of the nonlinear mapping function (such as g in equation (2.14)) that generated the initially closely spaced trajectory points that are now being exponentially separated as a function of time on the attractor.
The average exponential separatiori is given as A more practical formula for the average Lyapunov exponent is where N is the number of samples in t:oe discrete time series. (2.51) In general, for a D-dimensional chaotic system, there will be D LE's governing the behavior of the system. The set of these Lyapunov exponents >.i, i = 1, 2, ... , D, constitutes the Lyapunov spectrum. Lyapunov exponents are usually ordered in a 47 decreasing fashion: A1 ~ A2 ~ A3 ... ~ An. Any dynamical system containing at least one positive LE is defined as being chaotic. For the system to be well behaved in steady state, it must have one LE that is strictly negative. Positive LE are a hallmark of chaotic behavior indicating a strong instability within the attractor [3].
The presence of one positive Lyapunov exponent indicates simple chaos whereas the presence of more than one positive Lyapunov exponent indicates hyper chaos [3].
In the next section, we provide an example of Lyapunov exponents calculation for the Logistic Map. The result will show that, indeed, the Logistic map is chaotic when r = 4.

Logistic Map LE
The Lyapunov exponent for the logistic equation is a function of the tuning parameter Recall that for the case of r = 4, the iterate will be generated from (2.53) Therefore, the functional map, g(x) = 4x(l -x), has the derivative where p(x) is defined in equation (2.15). If we make the substitution x = sin 2 (7r;), we obtain [1]  Since the value of the Lyapunov expon-:!nt is positive, i.e., log 2 > 0, the Logistic map at r = 4 is indeed chaotic [1).
An algorithm to compute only the maximum Lyapunov exponent was developed by Wolf in 1983 [6). Since then, othtr algorithms that estimate all the Lyapunov exponents have been developed. New algorithms that greatly reduce computational complexity have been proposed [65,66,67,68,69,70). In this dissertation, we implement the Wolf algorithm 8 to prove that unvoiced speech possesses positive Lyapunov exponents and, hence, is chaotic. For a complete characterization of unvoiced speech in terms of all Lyapunov exponents, the reader is referred to Narayanan et al [14).
A brief summary of the Wolf algorithm for estimating the largest Lyapunov exponent from experimental data is as follows:

Wolf Algorithm [6]
Given a discrete time sequence x[n], where n = 1, 2, ... N, one can reconstruct the trajectory using Takens embedding theorem of section 2.8, provided one has (2.59) where each Xi is an embedded vector point in phase-space defined in equation (2.38).
To estimate the largest Lyapunov exponent, the Wolf algorithm monitors the longterm evolution of a single pair of nearby orbits. where L is the maximum initial separation between a vector point on the fudicial trajectory and its N neighbors, f. is the radius of hyper-sphere in IRd 8 , and dE is the embedding dimension. If we define L(l) as the initial separation between X 1 and any neighbor within f. radius, then after a certain time t 1 , L(l) will have evolved to L' (2) which is a distance from X 2 larger than f.. We then replace this non-fudicial point with a point closer to the fudicial point X 2 • The replacement point which is at a distance L(2) from X 2 is chosen such that it is closer to the nearest fudicial point X 2 and the angular separation~ between L'(2) and L(2) is small. Again, after a certain time t 2 , this new separation L(2) between X 2 and any neighbor will have evolve to L' (3) which is the distance between the fudicial point X 3 and the new non-fudicial point. If the distance L' (3) is less than f. as shown in Fig. 2.16, the non-fudicial point is not replaced. This point is now re-named L(3) and allowed to evolve at the next time instance. If, however, the distance L' (3) is larger than f. , the point is replaced as defined earlier. The evolution and replacement procedure is repeated until all the embedded vector points along the fudi~ial trajectory have been exhausted.
The largest Lyapunov exponent is then estimated by Amax= ~log L(l -l), (2.61) where N is the total number of replac~ment steps required in a certain time series.
The important point of this algorithm is that, through a simple replacement procedure that attempts to preserve orientation and minimize the size of the replacement vectors, it is possible to monitor the long-term behavior of a single principal axis vector.

Entropies
Entropy is another important measure in chaos analysis; it is similar to generalized dimensions in the way it is estimated in practice. In information theory [71,55), the Shannon entropy S is proportional to the uncertainty H in equation (2.42) earlier used in the derivation of the mutual iniormation between two random variables. The Shannon entropy can be expressed as where Pi is the probability of finding the system in the ith state [72]. S is a measure of the amount of information required on average to describe the system.
The Kolmogorov Entropy K is a measure by which chaotic motion can be characterized in phase-space [1]. When related to Shannon's entropy S, the Kolmogorov Entropy K is proportional to the rate at which information about a dynamical system is lost in the course of time. For a chaotic system, K is equivalent to S when the probability Pi is related to the likelihood of finding a vector point of the attractor in a certain volume element in phase space as defined in equation (2.18 When K equals zero, we have an ordered system whereas when K is infinite, we have a random system. When K is a finite constant that is not equal to zero, we have a chaotic deterministic system [7].
In the next section, we define another entropy measure K 2 which is close to K, but easier to estimate because it can be formulated in terms of the correlation integral of section 2.6.1.

2.11.l Entropy K2
In 1983; Grassberger and Procaccia [7] defined a new quantity K 2 which has the properties that it is less than the Kolmogorov Entropy K for a chaotic system, infinite for a random system, but finite for a chaotic system, i.e., Further, K 2 > 0 is a sufficient condition for chaos [7].
If we now define a generalization of K to the set of order-q Renyi entropies (73] as then with q = 2 and dE = 1, the joint probabilities p(Xi from which it is fairly straightforward to extract K 2 . The most important property of K2, however, is that it can be written in terms of the spatial correlation C 2 (c) between vector points on the attractoi·, as defined earlier in equation (2.22) in the calculation of the generalized correlation integrals. Therefore, an algorithm used for estimating the correlation integral and dimension will also yield the entropy K 2 with minor modifications. In the next section, we discuss a method of estimating K 2 from the correlation integrals at different embedding dimension.

Numerical Implementation of K 2
If we plot log C 2 ,dE ( f) in equation (   The received acoustic signals are converted into neural impulses that are transported to the brain. This suggests that a lot of inherent signal processing is involved in speech communication. Speech communication in humans can be intrinsically described as discrete in nature by a concatenation from a finite set of symbols [77). Every sound can be classified in symbolic form by phonemes. Each language has its own peculiar set of phonemes, usually numbering between 30 to 50. The English language, for example, can be represented by a set of about 42 phonemes.
In the next section, we will examine the different classifications of speech and their mode of production. The classifications will highlight the direction of our work and the class of speech we intend to focus upon in our analysis.

Classification of Speech
The different sounds produced by humans are classified as voiced, unvoiced, and mixed voiced or unvoiced plosives. In the subsequent three subsections, a presentation of these different classifications of speech with particular attention to air flow in the specific organ of the human body involved in speech production is given.

3.2.l Voiced Speech
Voiced sounds, which normally include consonants and vowels such as "e" (the sound of the underlined letter in ~ven), are generally produced when air from the lungs is forced through the trachea in Fig. 3.1. The air is then modulated by the vibration of the vocal folds (also referred to as the vocal cords) located in the larynx. The vocal folds are two folds of tissue stretched across the opening in the larynx in Fig. 3.2 such that the front end of the folds are joined to the thyroid cartilage and the rear end to the arytenoid cartilages. Under muscular control, the arytenoids can move far apart, leaving openings in the vocal fold for activities such as breathing (77]. They can also be tightly closed together when one holds one's breath or during swallowing. It is also possible to hold the arytenoids, such that the vocal folds are almost touching. If air is now forced through this small openings, the vocal folds will start to vibrate, resulting in a modulation of the air. The rate of the vocal fold vibration is commonly termed the pitch or fundamental frequency, F 0 [77].

Unvoiced Speech
Unvoiced sounds, which normally include sustained consonants, are the result of air turbulence that is produced when air is forced through a constriction in the vocal tract at high velocity [77]. Such constrictions may be formed in the region of the larynx, as in the case of the "h" sound (8] or at many other vocal tract locations, such as the roof of the mouth, between the different parts of the tongue, or between the teeth and lips. Sustained consonant sounds that are produced due to air turbulence excitation, such as "s, f, z", are called fricatives [77].
When fricatives are being produced, the air becomes turbulent at a critical Reynolds number 1 , Recritical. The square of the Reynolds number can be expressed as where Uc is the volume velocity at the constriction, ac is the area of constriction, pis the fluid density andµ is the viscosity [78]. For air flow in tubes with rough surfaces, ReCf"itical values are around 2000 [79]. Typical flow rates during fricative production are in the range of 200-500 cm 3 per second and the supraglottal constriction areas, i.e., the trachea, bronchi, lobes of the lung and diaphragm range between 0.075-0.4 cm2. The flow rate and area data in the average human suggests that Re values for speech production ranges from the number 2700 to 12625 [80] .

Mixed Voiced Speech
Mixed voiced and unvoiced plosives, such as "b", "v", are produced primarily from the build-up of pressure that occurs when the vocal tract is closed at some point for

Speech Modeling
The use of speech offers many great advantages in the areas of man-machine interactions, such as telephone, communications and artificial intelligence. However, human speech production processes are so complicated, as alluded to earlier in section 3.2, that it is almost impossible to fully mechanically emulate human speech. Therefore, an ideal man-machine interaction cannot be fully realized. In order to advance man-machine communications, speech models have been developed [77]. Applications of digital signal processing techniques and processors have accelerated speech model developments. The digital representation of speech signals (in terms of speech models) in communication systems offers many advantages in speech data compression, quality, reliability and intelligibility in communication. Consequently, many mathematical models have been proposed to represent the speech production system [78].
Speech production models and their analysis can be generally classified into three main categories. They include: • Linear Models: This is the linear representation of speech in terms of a sourcesystem model. The vocal tract is approximated by a straight tube of varying cross-sectional areas. The linearity of the model is inherent in the assumption that the vocal tract can be modeled over a short time interval by a linear system consisting of an input signal, a transfer function and an output signal.
Over the years, many variations of the source-system model have been proposed. They include the source-system model introduced by Fant [82,83], the fractal-excited speech coders by Maragos [84], the linear combination of harmonic sinewaves of Kumaresan et al [85], and many more [86]. • Nonlinear Dynamical Models: These models analyze and/or describe speech in terms of nonlinear dynamical system (chaos). They include the dynamical approach to speech processing by Tishby [27], analysis of vocal disorder with chaos [17], chaos analysis of fricative consonants [14] and multifractal descrip- We shall now discuss some of these existing speech modeling techniques in relation to their relative advantages, disadvantages, and limitations. We will also introduce the dynamical systems approach to speech processing.

Source-System Model
In this section, we describe the source-system model. We examine each section of the block diagram of Fig. 3.3 to highlight various digital processing algorithms that have been developed to represent different parts and functions of the human speech production mechanism. We discuss the relative advantages as well as problems of the source-system model. We also examine some other models that were later proposed to avert some of the problems encountered in the source-system model.
The block diagram describing the source-system model of a speech production system is shown in LPC is an analysis by synthesis technique. In the analysis stage, the predictor coefficients are extracted from the speech signal. These coefficients are then used to synthesize speech. In most cases, the synthesized speech is close to the original speech signal from which the predictor coefficients were extracted.
The recursive model used in the source-system model assumes that the vocal tract can be modeled over a short time interval by a linear system described by the transfer function Gh Source-system models have been used most extensively for speech analysis and synthesis because of the model's mathematically elegant linear representation of a rather complex natural system. LPC is generally regarded as capable of producing synthetic speech of high quality. Yet, in close listening, it has been found that with LPC, speech loses some of its naturalness during the analysis and synthesis procedures. This occurs even when the speech from which the predictor coefficients are obtained is of high quality [88]. The performance of LPC for speech synthesis becomes poor when some parts of the modeled speech signal contain unvoiced segments or when a speech segment was produced from the nasal cavity [89]. Most importantly, the performance of the LPC is also closely related to some of the supporting algorithms that are needed in conjunction with the LPC. For example, the periodic pulses that excite the vocal tract transfer function in Fig. 3.3 are spaced at one pitch period apart. This implies that in the voiced section of a speech signal, a pitch period extraction algorithm correctly identifies the location as well as the correct distance of separation between the pulses. This can become very tricky and could involve synchronous or asynchronous speech synthesis where the train of pulses are specially placed to excite the voice speech segment. Also, a voicing determination algorithm (VDA) (which could range from a simple thresholding analysis to more complex algorithms based on pattern recognition) [90] is essential for LPC speech synthesis. A voicing determination algorithm VDA is primarily used to segment the speech signal into sections that may require either voice excitation (impulse train) or unvoiced excitation (broadband noise) during LPC synthesis. Therefore, any degradation in the performance of the pitch extraction or voicing determination algorithms ultimately affects the overall performance of the LPC.
In order to avoid some of the problems of LPC highlighted above, many improvements have been developed and new algorithms are being proposed. There have been several attempts to improve the naturalness of LPC synthesis. They include: • The algorithm of Rosenberg et al [89] attributed some of the inadequacies of the LPC method to the high peak factor of the synthetic speech waveform and proposed a time-varying non-impulse source for exciting the LPC to produce voiced sounds. The non-impulse source reduces the peak factor of the signal and at the same time introduces a certain amount of controlled irregularity in the excitation of the voiced speech.
• The recently introduced method of Evangelista [91] proposes replacing the train of pulses in Fig. 3.3 with waveforms generated from pseudo-periodic )-r processes. These waveforms have approximate )-r spectral behavior around the harmonics of a certain assumed fundamental frequency F 0 . The generated, selfsimilar, pseudo-periodic signals have fluctuations or low level amplitude signals between each approximate pitch period which is typical of natural sound [91].

Canonical Autoregressive Model
In order to avoid all the rigorous tasks of pitch period determination, speech segmentation and speech synchronization explained in section 3.3.1, Kay and Nagesh [82] proposed to model speech (voiced and unvoiced) as the sum of sinusoids plus an AR process, i.e.,

Nonlinear Models Fluid Flow Model
In the discussion that follows, the fluid flow approach of Teager and Kaiser [58,87] which addresses the anomalies of the source-system model will be examined. The discussion centers on the inability of the source-system to fully explain other types of sounds that human beings are capable of producing. Examples of the anomalies of the source-system model are given and a model that is capable of resolving the anomalies of the source-system model is proposed.
Teager et al [58,76,87] in their work over the years have discarded the simplified source-system description of speech production. They contend that with our vocal tract, human beings are capable of producing a variety of sounds such as clicks, whistles, snores, barnyard sounds and other noises that do not fit the source-system description, because the source of these sounds are not glottal [58]. A major part of their disregard for the source-system model comes from the model's inability to explain some of the nonlinear interactions between the walls of the vocal cavity.
These nonlinearities, which are largely responsible for other sounds disregarded in the source-system model, are obviously more subtly difficult to physiologically observe or record since in electronic speech analysis, any attempt to insert transducers into the vocal cavity for the purpose of recording speech will alter or impair the speech production.
An example of some the anomalies that the general source-system model cannot explain is the Myna bird which has the ability to mimic repeated noises as well as to produce long passages of speech in a voice that is almost indistinguishable from that of its trainer. Yet, the bird has an anatomically short cochlea that is not suitable for frequency selection. The bird also lacks the conventional vocal tract as described in humans. Teager and Teager [58), also question the linear speech model with respect to: • The inability of humans to produce sound when the mouth and throat are parched or after eating a dry cracker.
• The change in the human voice due to illness or loss of teeth that has absolutely nothing substantial to do with the vocal tract.
In an attempt to provide an explan&.tion for the questions raised by their quest for the ultimate speech production model, Teager and Teager did not propose an overall model. An overall model would be incapable of representing all the phonemic subgroups encountered in human speech vvcalization at once just like the source-system model. Instead, they pose a phonemenological explanation that can be separately applied to each phoneme. This is because each sound is generated at various locations of the vocal cavity by taking advc..ntage of the local shapes, flow instability due to the movement of the larynx, tongue, and lips, as well as the mental state of the speaker.
The proposed Teager model is capable of providing a complete description of the speech production system. However, it is rather cumbersome to implement, since even a simple sentence in the English may be comprised of a fairly large number of phonemes.

Physiological Model
In contrast to the simplified LPC, where the properties of the recorded speech signal are being modeled, Schroeter and Sondhi proposed a model for speech that is dependent on the properties of the vocal apparatus i.e., the pressure in the lungs, tension in the vocal cord, the coupling of the nasal cavity, etc. They proposed a three-part model that separately considers the geometry of the vocal and nasal tracts, wave propagation in the tracts as well as the sound sources (vocal cord vibration and turbulent air flow) and their interactions with the tract. Based on X-ray measurements of the articulatory system, a vocal tract geometry that maps the positions of the elements of the articulatory system, i.e., tongue, lips, glottis, etc. was defined. The nasal tract was modeled by a modification of the method of Maede [93] in which a side cavity, to simulate the nasal sinuses, was attached to an acoustic tube representing the nasal cavity. This model was efficiently implemented in [94] by coupling a Helmholz resonator to the nasal tract (78]. With this model, the acoustic properties of the vocal and nasal tracts were computed under the assumptions that: • the acoustic wave propagation inside the vocal tract can be simulated as plane waves.
• the vocal tract is flexible; it can be straightened out and approximated by a variable-area tube.
The turbulent excitation of the vocal tract is handled by automatically injecting noise of correct amplitude into the appropriate place within the vocal tract.
This Schroeter/Sondhi model has the advantage of automatically adapting to sudden changes in pitch and glottal waveforms since it is derived from a physiological point of view. The extracted model parameters are also suitable for low-bit rate speech coding. The unfortunate part is that although this model discards the linear source-system model, linearity is inherently built into the model. This is due to the assumption that the acoustic wave tube (a model of the vocal tract) can be approximated by a straight variable-area tube in order to solve the otherwise nonlinear equations which would have resulted. Therefore, the assumption in this model contradicts the highly controversial argument that a complete description of 70 speech requires the solution of nonlinear, partial differential equations with varying boundary conditions [76,27,58).

Nonlinear Dynamical Models Teager Model
The evidence of nonlinearity in speech signals has been in debate for over thirty years.
The early models of the vocal tract by Ishizaka and Flanagan [95) have nonlinearities incorporated into them. Teager and Teager in their earlier work described the human voice as a "complex wind instrument that has nonlinear regenerative oscillators whose amplification methodology can be related to the interaction separated flows (along the vocal tract walls) with frequency selective cavities" [58). This suggests that sound is generated throughout the vocal tract, from the lobes of the lungs to the lips, acting as nonlinear coupled system. Recently, evidence of nonlinearity in speech, in support of Teager's work, has been discussed by Maragos, Quatieri and Kaiser [84,18,96).

Dynamic Model
Speech as a nonlinear phenomena has been qualitatively described in terms of fluid flow and turbulence [58,76). In speech modeling literature, the word turbulence is often use in a qualitative sense to mean non-laminar flow. However, in dynam- ical modeling of what is technically defined as turbulence, the dimension of the chaos model is expected to be so high as to discourage such an approach. Yet, the turbulent velocity field of fluid flow has been extensively investigated to reveal a low-dimensional chaotic structure [15,23,97).
Hence, there has been considerable interest towards applying nonlinear dynamical system theory and principles to speech analysis and modeling since speech production involves the turbulent flow of air through the vocal tract. In fact, evidence of lowdimensioned chaos such as strange attractors, positive Lyapunov exponents, and bifurcations, have been found in newborn infant cries [20). Voice fluctuation and vocal tract instabilities have also been modeled with chaos [16,23).
Dynamical speech modeling and analysis can be broadly classified into two groups. 71 They include the following: • global or local approximation of the chaotic attractors of speech signals [27,98].
This, in most cases, ends up being a neural network-like model with chaotic input parameters.
• models or analysis techniques that are only well suited to specific types of speech, i.e., fricatives [14].
We will now introduce some of these nonlinear dynamical systems approach to speech modeling and analysis.

Neural network model
Tishby [27] suggested the possibility of modeling speech as an output of a chaotic

Other Models
Other studies have used nonlinear detE:rministic techniques in the analysis and characterization of the vocal fold vibration in the normal and pathological voiced speech [21,23,16]. Many more studies have also performed chaos analysis on special speech sounds such as fricatives [14,16,19].

Introduction
In chapter 2, we reviewed how the correlation dimension D 2 was generalized to an infinite hierarchy of generalized dimensions, D 9 [100]. Any chaotic attractor with nonconstant, monotonic decreasing generalized dimensions, i.e. D 9 ~ Dq' for q < q' is amultifractal [101,102,61,11,3,1]. In this chapter, we examine an alternative characterization of multifractal behavior in chaotic dynamical systems called the singularity spectrum f(a) [103,101]. We use a generalization of the Koch curve [104] to introduce various concepts and terms related to the scaling behavior of multifractals.
Finally, we shall discuss some of the problems encountered in evaluating the f(a) singularity spectrum from the generalized dimensions D 9 based upon the fact that both /(a) and D 9 are related by a Legendre transformation [100].
The generalized dimensions D 9 , -oo < q < oo, characterize the different fractal regimes present in a chaotic attractor [54].

Singularity Spectrum f(a)
The 3), f(a) is sometimes referred to as the fractal dimension of volume elements with scaling exponent a. 76 In the next section, we will expand on the geometric distribution of points on chaotic a. ttractors by using the Koch curve to describe multifractal scaling properties.
It is important to note that the f (a) singularity spectrum, albeit explained with the aid of the Koch curve, is not unique to this curve. In fact, the singularity spectrum method of characterizing strange attractors has been used extensively with experimental data. Examples include the study of turbulent energy dissipation in fluid flow by Meneveau et al. [101], the hydrological characterization of river basins by Veneziano et al. [105], and zooplankton biomass variability by Pascual et al. [11].

Multifractal characterization of the Koch curve
In this section, we introduce the concept of multifractals in terms of the scaling exponent a and singularity spectrum f(a). We examine how multifractality is produced with the aid of the triadic Koch curve, and discuss the relationship between the f (a) singularity spectrum and and the generalized dimensions Dq that were reviewed in chapter 2.
Consider the uniform triadic Koch curve (104] whose ends are at 0 and 1, i.e.,  The vertical-axis corresponds to the scaled pdf of the length segments and therefore the area below the density curve provides the probability in any subinterval (adopted from (10]). The pdf has been scaled to clearly show the resulting intermittency as the non-uniforn triadic Koch curve is iterated. 79 k increases.
Recall that associated with the line segment at iteration k = 0 is a uniform pdf.
Also, if we now associate the same uniform pdf with each new line segment generated during the k iterations, then all the line segments at the kth iteration will be identical in length, and, therefore, we still have a uniform pdf after each iteration. Notice that the pdfs associated with the iterations k = 1, 2 in Fig. 4.l(b) are all uniform. This is the uniform triadic Koch curve (104]. If we now invoke the expression in equation shows the non-uniform pdf associated with the generalized triadic Koch curve at the eighth iteration.
An example of the singularities just described is found in the invariant density of the Logistic map in Fig. 2.6. The singularities in the invariant density of the Logistic map occur at x = 0 and x = 1. The scaling exponents for the Logistic map are a = ~ and 1, respectively [3].
In the case of non-uniform triadic Koch curve, the probability Pik(E) of the number of points on the attractor that falls in volume element i after the kth iteration is related to the pdf assigned to the different length segments by PlP~-i and Ek = ( ~) k [106]. If we now substitute Pik(E) and ck into equation (4.2), one obtains [1] . . (l)(cr;)k where aik is the scaling exponent for the ith volume element at the kth iteration. Also, the number of volume elements with probability Pik(f.) after k iterations is 3) and use Stirling's approximation (log k= k log k-k) [107] in evaluating the factorial terms, one obtains A plot of aik versus f(aik) for a large number of iterations k will produce the f(a) singularity spectrum for the Koch curve. A plot of a versus f(a) in equations (4.6) and ( 4. 7) with p 1 = 0.35 and k = 200 is shown in Fig. 4.4.
In the next section, we will discuss some of the characteristics of the singularity spectrum f (a) using the plot of a versus f (a) for speech data. Notice the intermittent and self-similar structure (adopted from (10]). Also notice the points in the pdf that tend to increase beyond bound while others shrink towards zero. The pdf has been purposely scaled to show the intermittent nature of the resulting pdf.

Characteristics of f (a)
The plot in Fig. 4 In the next section, we consider the relationship between the f(a) singularity spectrum and the rest of the generalized dimensions Dq. We also describe how the generalized dimension Dq values are estimated from real data. This will become important as we need to compute the transformation from Dq to f(a).

Relationship between J(a) and Dq
Covering the attractor of a multifractal chaotic system with volume elements of diameter€, can be used to evaluate the generalized dimensions Dq in equation (2.29)  as where and Pi is the probability of finding vector points on the attractor in the ith volume element. The sum in equation (4.9) is over all N(€) volume elements of diameter€ needed to cover the attractor.
If we now substitute equations ( 4.2) and ( 4. (4.10) The first factor, c f(o), in the integrand in equation ( 4.10) gives the number of volume elements with scaling index a whereas the second factor, tq 0 , is the qth power of the probability associated with the scaling index a.
In evaluating the integral in equation ( 4.10), we follow the argument in [ 100] that since t is supposed to be very small, the numerical value of the integrand is largest when the combined exponent oft, qa -f(a), is smallest. That is, for each q, there is a corresponding a, termed a•= a(q) for which (4.11) and [ d~' (q<> -!(<>)) l I.=·· > 0.  and (4.14) which reveals that the f (a) singularity spectrum curve must be convex.
If we now substitute the value a = a• = a(q) into equation (4.10) to evaluate the integral, then one obtains [100] ( 4.15) If we also substitute this Bq(t) into equation (4.8), one obtains [100] (ql)Dq = qa -f(a) (4.16) where the subscript * has been dropped. This change from the variables q and Dq on the left hand side of equation (4.16) to a and J(a) on the right hand side is an example of the Legendre transformation [1]. For a rigorous derivation of equation (4.16), see Benzi et al. [97], Meneveau et al. [101] and Halsey et al. [100].
The f(a) singularity spectrum and the generalized dimension Dq are theoretically smooth functions of a and q. The Dq in equation ( 4.8) are related to the singularity spectrum f(a) by the Legendre transformation [100,3] given from equation (4.16) in the following steps: Define Hence, generally, the number of computed samples of Dq, I qmax ;q qmin I, is kept small by making the range I qm= -qmin I small or by coarsely sampling Dq using large 6-q.
In the next section, we examine some of the problems of published procedures for evaluating the singularity spectrum f(a) given sampled generalized dimensions Dq.

Evaluating /(a) from Dq
In this section, we discuss the importance of evaluating the f(a) singularity spectrum for real systems. We also examine some of the problems of existing algorithms using the Legendre transform of Dq to compute J(a) in equation (4.19). We propose a new method that is based on digital filter design theory [109).
Based on the work of Grassberger et al. [44) and others [100), and the development of faster search algorithms mentioned in [47,61), it has become increasingly attractive to evaluate Dq from attractors arising from physical systems. Therefore, f(a) curves are usually determined by the Legendre transform [101) of equation (4.19). However, such an operation involves either finely sampling Dq using small l:::.q defined in the previous section or interpolating the Dq curve since, in most cases, the Dq are empirically calculated or coarsely sampled values along a smooth Dq curve. The Legendre transform also involves the evaluation of the derivative ~ in the expression for a(q) and f(a(q)) in equations (4.18) and (4.19). Accurately approximating the derivative and Legendre transform in equations ( 4.17) and ( 4.19) using a finite range of sampled value::; of Dq is non-trivial. It becomes especially troublesome when the Dq versus q curve has discontinuous derivatives, as can happen when there are "phase transitions': in the attractor [110,111). Phase transitions in a chaotic attractor introduce a small amplitude, high frequency content into the otherwise smooth, low frequency Dq curve. In order to illustrate some of the problems of the existing methods of computing f(a) from samples of Dq, we introduce the classical Cantor set in the next paragraph.

Cantor Set
The Cantor set starts with a unit interval as shown in Fig. 4.1 with k = 0. We make the same assumption that a uniform pdf is associated with the unit interval just as in section 4.3. In the first iteration, the unit interval is replaced with two intervals, €1 and €2, each of length k· Each of these interval is assigned equal fraction of the uniform pdf, i.e., p 1 = ~-At the next iteration, each of the two intervals from the previous iteration is rescaled. Each interval is replaced by another two intervals of 87 length ~ with equal fraction of the uniform pdf. At the next stage of the construction, the rescaling procedure is repeated on each of the newly created sub intervals. The process is repeated ad infinitum. If t 1 = 0.408 and t2 = t~ with p 1 = ! for each of the intervals, then the generalized dimensions Dq of the Cantor set has been derived in closed form [ 10]  The corresponding singularity spectrum f(a) plotted in Fig. 4.5 (b) was evaluated using a 5th order min-max differentiator 1 [109). Notice also that the maximum value 1 The differentiator was designed assuming passband cut-off frequency Fp = 0.10 and stopband   · · · · · · · · · · -~ · · · · · · · · · · · · r · · · · · · · · · · · 1 · · · · · · · · · · · ....

Analysis of existing methods of evaluating f(a) from Dq
In this dissertation, we shall concentrate on the second category of algorithms dis- One source of error is the use of linear interpolation methods. Linear interpolation corresponds to sending the D 9 data through a linear interpolation algorithm, or equivalently, an interpolation filter whose frequency response is a sine-squared type filter [115]. This can be a very poor approximation to the ideal lowpass filter needed for interpolation if the D 9 are not highly oversampled.
An ideal interpolation filter will pass the input signal spectra components whose frequency is between 0 and the cut-off frequency Fe while completely attenuating signal components in the range from Fe to ! where ! is the normalized frequency corresponding to the Nyquist frequency. Fe is aptly called the cut-off frequency. Fig.  4. 7 shows a schematic diagram of the difference between the ideal lowpass needed for interpolation and the approximation that is provided by a sine-squared type filter corresponding to linear interpolation. Notice that the sine-squared frequency response of linear interpolation is only a good approximation to that of the ideal interpolation filter when the cut-off Fe is very small. The sine-squared filter also has high lobes in the stopband region (Fa ~ F < !) which will not properly attenuate high amplitude signal components in that region.
A second source of error is the use of a first order difference equation to approximate differentiation in equation (4.1'/), :iiD 9 ~ Dq1t::..q -D(q'-I)t::..q· The centered first order difference operation on the D 9 is equivalent to a filtering operation that produces a poor sinusoidal approximation to the ideal jw frequency response of a differentiator. Fig. 4.8 also shows the distinction between the frequency response of the ideal jw differentiator and that of the approximation produced by the centered first order difference operation. In this case also, the sinusoidal approximation is only good at very low frequencies, or equivalently for generalized dimension sequences that are highly oversampled. At high frequencies, the sinusoidal approximation deviates significantly from the ideal differentiator.

Example -Effects of Coarse Sampling
In order to show the effects of coarse sampling on the performance of some of the existing algorithms, a test case was simulated. The known Dq samples of the Cantor set, shown in Fig. 4.5 (a), were downsampled or decimated by a factor of 4, that is, they were evaluated at only 19 relatively coarsely spaced (6.q = 4), values of q as shown in Fig. 4.9 (a), q = -36, -32, -28 . . . , 32, 36. The corresponding J(a) spectrum computed using the existing centered difference method of [11 , 101] on the coarsely sampled Dq is depicted in Fig. 4.9 (b). This is an invalid spectrum; it is not convex and it has more than one maximum point.    gives unambiguous estimates of D 9 at q = ±oo is desirable for real systems.
In the next section, we consider a signal processing method for evaluating f(cx.) from D 9 • We applied this method to the generalized dimensions of the Cantor set introduced in section 4.6 to show that it works well, even when some of the existing algorithms fail.

Improved Accuracy in the Singularity Spectrum
In this section, we propose an algorithm that more accurately estimates the J(cx.) singularity spectrum using only coarsely sampled D 9 • We interpolate between the coarsely sampled D 9 points and evaluate the necessary derivatives in equation (4.17- [109] to design the best min-max approximation to an ideal interpolator and differentiator, respectively. This is in sharp contrast to prior algorithms which either linearly interpolate coarsely sampled D 9 , using a sinc 2 interpolation filter or compute, at great expense, finely sampled D 9

4.19) by using the Parks-McClellan algorithm
using equation (2.29) [11]. These existing methods can be computationally intensive and may give spurious results.
In the next section, we discuss the min-max approximation to the ideal interpolator and differentiator. We will show examples of the frequency response of lowpass interpolation filters and differentiators designed using this method. We compare the frequency response of the interpolation filter and differentiation approximation with the ideal cases. Finally, we apply the min-max interpolator and differentiator to the coarsely sampled values of the D 9 of the Cantor set ideal "test case" to show that our proposed method works well when compared with existing methods.

Min-Max Digital Filter Design
The ideal frequency response H 1 (w) of a lowpass filter is shown by the dashed line in Fig. 4.7. It is impossible to realize an ideal lowpass filter in practice. This is because restricted to allow for a one-to-one transformation from Dq to f (a). 99 an ideal lowpass filter is non-causal and has an infinite duration impulse response.
For an indepth study of causality and stability in digital filter design, see Oppenheim et al. [116], Proakis et al. [109] and Jackson [117]. In digital filter design, a transition region is allowed so that there is no sharp discontinuity between the passband and stopband region as depicted in the schematic diagram for an ideal lowpass filter in Fig. 4  In the next section, we describe the design of a typical interpolation filter used in this dissertation. We also describe hov.-the various frequency bands are determined.

Interpolation Filter
Given any multifractal time series and coarsely sampled values of Dq estimated from equation (2.29) in the finite range q E [Qmin, Qmaz], we interpolate between these points by a factor L. This is accomplished by "stretching the signal", i.e. inserting L -1 zeros between each pair of given Dq samples [117]. The effect of "stretching" the given Dq samples is to create L -1 mirror images of the Fourier transform of the original Dq samples. To remove the mirror images, we pass the stretched signal spectrum through a lowpass filter whose passband edge is a function of L and the bandwidth of the original Fourier transform of the Dq· Assuming that the original spectrum of Fig. 4.11 (a) is bandlimited between 0 and F 0 in normalized frequency, the effect of "stretching" for L = 4 is the creation of mirror images of the original spectrum at Ef, F N and 3~N , where F N = ~ is the normalized Nyquist frequency, corresponding to JN = 2~q Hz. To remove the mirror images, we pass the stretched Dq spectrum through a lowpass filter whose passband edge is Fp = lf, whose transition bands between Fp and 2 FN 4 +Fo and whose stopband edge is between 2 FN 4 -Fo, and FN. Thus, zero insertion between Dq samples followed by lowpass filtering reduces to computing the convolution of the lowpass filter impulse response coefficients h[n] with the Dq curve with zeros inserted [117].
In particular, !::::.q = 4 and L = 4 for the coarsely sampled generalized dimensions     The frequency axis is normalized by ~~, for 6.q = 1. 6.q' = 4 6 q.  However, before discussing these measures of chaos, we start the next section with a description of some of the properties of the analyzed speech signals such as the speech recording method and the determination of the speech signal to noise ratio (SNR).

Speech Data
Jn this dissertation, the time series used in the multifractal analysis of speech consists of acoustic pressure waveforms of speech from the ISOLET (standard) database [25].
The ISOLET database consist of letters of the English alphabet spoken in isolation by American males and females. The database has 7800 spoken letters with two productions of each letter given by 150 speakers. Each speaker is identified by a character string specifying his/her gender, and initials followed by a number for uniqueness, i.e., "mbjtO" is the label of a sound pronounced by a male with initials "bjt". The number "O" is uniquely assigned to distinguish between speakers with the same initials.
Each subgroup consists of speech utterances · by 30 speakers. Each data file in the subgroups has a header which contains information about the file, e.g. sampling rate, number of samples, etc., followed by a series of 16 bit integers corresponding to uniformly spaced samples of the speech sound. The sampling rate for speech speech data in the ISOLET database is 16 kHz; equivalently, the sampling period T = 6.25 x 10-5 sec/sample. In the next section, we describe the data acquisition method for speech signals in the ISOLET databa.Se. We also describe a procedure for estimating the signal-tonoise ratio SNR.

Noise in speech signals
Due to errors from microphone movement and human agitation which are likely to occur in the acquisition of experimental speech data, the speech signals in the ISOLET database are assumed to be noisy. However, since the speech signals in the database were recorded under a controlled environment in a speech recognition laboratory, the noise level is very low. The data were recorded with a Sennheiser HMD 224 noise-cancelling microphone. The data acquisition was performed on an AT&T DSP 32 board interfaced to a Sun workstation [25].
The signal-to-noise ratio, SNR, of the speech data is defined as 10 log 10 ( a; ) <Tnv where a; is the signal power and <T~v is the noise power. It was estimated after subtracting the mean -signal value from each sample of the digitized speech signal so that the resulting speech signals have zero mean. The mean SNR for the ISOLET database is 31.5dB with a standard deviation of 5.6dB. This indicates a relatively noise free recording. The SNR for the ISOLET database was calculated as follows: Consider the \ "t"\ utterance from each speaker. The silence before the \ "t"\ burst is assumed to be relatively free of breath noise. Therefore, it reflects the typical The SNR value for the database is the average of the SNR's for all speakers. There is nothing special about the \ "t" \ utterance. Other speech utterances that would reflect the background noise such as \ "p" \ and \ "b" \ would have been appropriate.
In the next section, we highlight some of the questions and concerns that should be addressed in successful chaos analysis of experi~ental data. Satisfactory answers to these questions may be a measure of the reliability of the results provided by the multifractal analysis.

Concerns in chaos analysis of experimental data
In the chaos analysis of experimental data, there are some questions to be answered in order for one to be satisfied that the results of such an analysis are not erroneous and misleading. These questions may include: 114 • Is the data sequence length, N, long enough? Recall the mathematical definitions of dimension, entropy and Lyapunov exponent require theoretically infinite data length?. How long a data sequence N is enough?
• What is the noise level in the experimental data and how does this affect the estimated chaos parameters?
• How many dimensions should we use in the embedding space?
• Is the number of time series samples, N, used in estimating the correlation dimension D 2 also valid for estimating the generalized dimensions D q as I q I-+ oo?
Unfortunately, it is difficult to answer some of these questions definitively, partly because the theoretical foundations of the Taken's embedding technique in section 2.8 in relation to the estimation of generalized dimensions Dq in section 2.7 are not presently fully developed and partly because the answers to some of these questions depend to some extent on the experimental data being analyzed. Therefore, the relevant established criteria we will use as guidelines in the multifractal analysis of unvoiced speech signals are the choice of delay time [57], the optimum embedding dimension [120], the noise levels in chaotic signals [60] and the length of the data sequence required for chaotic analysis [49,50].
In the next section, we examine a few of the criteria listed above to show that the results obtained from the multifractal analysis of unvoiced speech signals are reliable since the sample size and signal-to-noise SNR of speech signals from the ISOLET database are within the tolerable range of values suggested to be used in literature [49].

Limitations of chaos analysis due to the properties of speech data
In order to answer some of the questions posed in the last section, we examine in this section the relationship between the length of the data record, N, and the validity of the estimation of the correlation dimension. We also discuss the effects of noise 115 levels on the generalized dimension D q using the method of generalized correlation integrals (GCI) in section 2.7.

Finite record length effect on Dq
In the chaos analysis of experimental data, if the length N of the time series is too small compared to the dimension of the time series, then the estimated value of the correlation dimension D 2 may be wrong [49]. Several authors have suggested different upper bounds on the correlation dimension D 2 that an attractor may have, given a limited amount of data [50,49,121,122]. In the multifractal analysis of unvoiced speech signals, we use the method of Kantz et al. [49] as a guide in the determination of the appropriate number of time samples needed for chaos analysis.
Consider the schematic diagram of local slopes versus log f in Fig. 2.13. At a point between regions II and III, the noise region ends and the scaling or plateau region begins. Let the value of f at this point of transition be called fmin· Assume that one needs about 100 pair of vector points [49] on the attractor; their inter-point distance is less than or equal to fmin· If we also assume that fmin is about 10% of the maximum possible inter-point distance on the attractor, then the correlation integral at fmin may be approximated as [49] (5.3) From equation (2.26), C 2 (t) , . . . . . , tD 2 , from which one can conclude that we need at least Nmin signal samples to be able to accurately estimate D 2 [49]. Combining equations (2.26) and ( [27,14], the minimum number of samples required for D 2 estimation in speech signals is between 400 and 30,000 samples. This implies that the number of data samples needed to estimate dimensions grows exponentially with the D 2 [122]. The number of data samples requirement may be trivial to some applications in 116 digital signal processing (DSP), but in the next paragraph we will discuss the futile effort of trying to increase the number of data samples Nin DSP applications with short data lengths.
Problems with increasing the sampling rate to increase the record length From a signal processing perspective, longer length data sets can be generated by increasing the sampling rate of the original time series, i.e., reducing the sampling time T or interpolating between the given samples of a time series. In nonlinear dynamics, however, increasing the number of samples in this way has generated no new information since no new phase space trajectories have been created. Therefore, the estimates of the spatial measures of chaos such as Lyapunov exponents, entropy and generalized dimensions D 9 , will remain essentially unchanged.
Alternatively, sustained fricative utterances may be used to generate considerably longer data sets [123]. This is the situation where the speaker tries to hold his/her breath as long as possible to sustain the pronounced fricative. It should be pointed out that sustained fricatives are produced by a relatively static vocal tract shape and do not involve any apparent laminar to turbulent flow transitions, such as those found in isolated or vocalic fricatives [14]. A better alternative which may be the object of future research, is to append as many isolated or vocalic fricatives of the same type from the same speaker together to generate a relatively longer time series.
In this dissertation, however, the multifractal analysis was performed on speech data with record length N ranging from 3500 to 11,000 points. Although these values of N fall within the range suggested by the Kantz criterion, we only use the estimates as a guide since it was designed for D 2 (i.e. D 9 for q = 2 only) and, to the best of our knowledge, there is no criterion relating the sample size N to the generalized dimensions D 9 as I q I-+ oo. We, therefore, estimate the generalized dimensions Dq of unvoiced speech signals with I q I$ 5 1 under the assumption that the upper bound on the minimum number of samples required for D 2 will also give accurate D 9 estimates for I q I$ 5. The range of q values in the estimated generalized dimension Dq of unvoiced speech will be examined to verify the validity of this assumption.

Noise level effect on Dq
Unlike the analysis of adequate data sequence length N, noise analysis has been performed on chaotic attractors where the generalized dimensions Dq were estimated as the noise level is increased [60,124,49]. Several authors have analyzed different chaos attractors and suggest various noise levels. In particular, Meisel and Johnson [60] reported generalized dimensions Dq which are within 1% of their analytical values when noise of variance 0.5% of the numerical range was added to the time series produced from the asymmetric snow-flake fractal [104].
From section 5.2, the average signal to noise ratio SNR in the speech data from the ISOLET database is about 31.5dB. This is a noise level with variance of about 0.005% of the numerical range, i.e., high signal-to-noise ratio SNR. The high signalto-noise SNR and reasonable record length N are good indicators that the estimated generalized dimensions for speech signals are fairly accurate.
In the next section, we describe th~ procedure for estimating the generalized dimensions Dq for unvoiced speech signals. We apply the procedure to all 40 speech data signals manually selected from the ISOLET database. The standard procedure is to increase the embedding dimension dE incrementally until the calculated dimensions appear to have converged [49,110].

Procedure for estimating the generalized dimensions from speech data
In this dissertation, the generalized dimensions [3], A numerical approximation to th~ generalized correlation integral is given in equation (2.47) [60). Also, a numerical implementation algorithm for the generalized correlation integrals, GCI, which avoids taking the logarithm of zero is derived in Appendix A.
A procedure for extracting the generalized dimensions Dq is as follows: For each speech signal data record: • Multiply each sample of the speech time series by a scaling factor such that the inter-point spacing, lij = llXi -Xiii of the embedded vectors, Xi, in equation 2.38 using the highest embedding dimension dEmuz, has a minimum and a maximum inter-point spacing as proposed in [125]. €max ~ 10000 = max lij ij • Estimate the optimum delay time r opt using the mutual information method [57] described in section 2.8.1. An example of the optimum delay time estimation using mutual information on unvoiced speech sound \ "s"\ from the ISOLET database is shown in Fig. 5    • Lacunarity: For many systems, the plateau region is flat and the correlation dimension D 2 can be easily determined from the plot of the local slopes versus log f. However, for some systems, a well defined plateau region is replaced by a region of ragged and almost periodic-like structure referred to as lacunari ty.
This occurs as a result of the fractal nature of the time series. Lacunarity has been studied extensively in the triadic Cantor sets [126,5] . will give a reasonable estimate of the correlation dimension D 2 [5,127].
• Mixed: By "mixed" , we mean that more than one type of local slope structure can be identified for certain speech signals. In most cases for unvoiced speech signals, we have a "mixed" combination of the "normal" and of the "lacunarity" structures. Just as in the case of "multiscale" structure, the mixed .structure also presents some interesting, but difficult problems.

7 Effect of high embedding dimensions on real data
In section 5.4.1, we examined the relationship between the estimated correlation dimension D2 and the number of available data samples N. In practice, one computes the generalized dimensions as a functi::m of the embedding dimension until convergence it reached. However, care must be taken to ensure that the local slopes in the plateau region are not corrupted by artifacts generated when the embedding dimension dE is too high for a fixed number of data points.
Consider the numerical implementation of the generalized correlation integral for         The effects of using overly high embedding dimensions on a fixed length data sequence for male speakers and female speaker were most severe in the unvoiced speech sound\ "s"\. This is due to the relatively small segment length N caused by the removal of the long silence region and as well as the quasi-periodic part in \ "s" \ as compared to the truncation of only the silence in the unvoiced speech "z".
In the next section, we consider the turbulent nature of unvoiced speech signals.
We describe a method by which the nature of the turbulence involved in unvoiced speech production can be characterized.   .. Jn the dynamical modeling of what is technically called turbulence or "hard" turbulence, the estimated correlation dimension D 2 is so high that such modeling is invalid and discouraged [14]. The multifractal analysis of unvoiced speech signal has revealed that what is generically referred to in the speech modeling literature as turbulence is actually "soft" turbulence [128].
The major difference between soft turbulence and "hard" turbulence is their relative value of the Reynolds number Re defined in section 3.2.2. According to Heslot et al. [128] in their study of helium gas at low temperatures, three regimes are possible depending on the value of the Reynolds number Re. There is a transition between the three states. A chaotic state is possible up to Re = 2.5 x 10 5 , a soft turbulent state from Re > 2.5 x 10 5 to Re = 4 x 10 7 , and hard turbulence at higher

Re.
Although the range of Reynolds number Re found in helium gas is higher than that of speech, the transitions from noise to soft turbulence to chaos reported in the helium gas can also be observed in the local slope plot of the "normal" structures of unvoiced speech signals defined in section 5.6. For example, in Figure 5

[14)
which discouraged such dynamical models in the turbulent region.
In the next section, we present the results of applying the procedure for estimating mn:In :'-ppendix D, more examples of the cori-elation dimension D 2 as a function of the embedding ension dE in the soft turbulent region will be given for rest of the unvoiced speech samples analyzed in this dissertation.     In this section, we present and discuss the result of multifractal analysis on unvoiced speech signals. The generalized dimensions Dq were obtained from unvoiced speech data using the procedure discussed in section 5.5. We estimate the generalized dimensions D q for q from -5 to 5 for all the unvoiced speech data.

Result of Dq estimation for unvoiced speech signals
In the multifractal analysis of unvoicE.d speech signals, using the procedures itemized in section 5.5, three different types of generalized dimensions Dq curves were identified. They include: • Normal: For a chaotic dynamical system, which exhibits multifractal behavior, the estimated generalized dimensions give a set of Dq values along a smooth D q curve that satisfies Dq < Dq' for q > q' (5.6) 146 as explained in section 4.7.
2. An example of such a smooth Dq curve is shown in Fig. 4.5 for the triadic Cantor set. In unvoiced speech multifractal analysis, whenever the expression for Dq in equation (5.6) is satisfied for I q I~ 5, the estimated generalized dimensions curve shall be referred to in this thesis as "normal".   6). Therefore, in unvoiced speech multifractal analysis, any estimated Dq curve that violates this expression within -5 ~ q ~ 5 shall be referred to in this thesis as "abnormal". Fig 5.29 shows an example of an abnormal Dq curve encountered in the multifractal analysis of unvoiced speech signals. A non-constant monotonic decreasing Dq curve causes significant errors in the calculation of the singularity spectrum, f(a), and hence, will not be used in the multifractal measure calculations.
In the next section, we use the min-max Parks-McClellan filter design approach of section 4.9 to evaluate the Legendre transform of the "normal" generalized dimensions Dq estimated in the last section in order to obtain an estimate of the f(a) singularity spectrum for unvoiced speech signals.

f(a) singularity spectrum for unvoiced speech
In chapter 4, we discussed the f (a) singularity spectrum formalism related by a Legendre transform to the generalized dimensions Dq. We also introduced the Park/McClellan min-max filter design method that can be used to improve the accuracy of the conventional interpolation and differentiation approximations needed to compute the Legendre transform betw<::en Dq and /(a). We now apply the min-max filter method to the generalized dimensions Dq of unvoiced speech signals.
In order to be consistent, the f (a) singularity spectrum of unvoiced speech signals will only be evaluated using monotonic decreasing Dq values in the range of q between -3 and 3. This limit was chosen because most of the generalized dimensions Dq      decreasing D q has been violated around q < -2 and q < -1.  157 curves were "normal", i.e. monotonic decreasing in the range of I q I< 3. Accurate Dq estimates for q < -3 required large data lengths than were available.
For each speech signal, the generalized dimensions D 9 were obtained using equation (2.32) at integer q values (6q = 1). The resulting sparse generalized dimension D 9 samples were then optimally interpolated by a factor of 4 using a 33rd order min-max lowpass filter with passband between normalized frequencies 0 and Fp = 0.065. The transition region is between the normalized frequencies Fp = 0.065 and F 11 = 0.185 and the stopband region is between normalized frequencies Fa = 0.185 and FN = 0.5. The derivatives in the Legendre transform in equations (4.18) and (4.19) were evaluated with a 5th order min-max differentiator whose frequency response is approximately linear in the passband region with the normalized frequencies between 0 and Fp = 0.15. The transition region is between Fp = 0.15 and Fa= 0.25.
The stopband extends from Fa = 0.25 to FN = 0.5. Fig. 5.32 (a) shows the optimally interpolated generalized dimensions D 9 for an unvoiced speech signal \ "s"\ uttered by a female speaker. This corresponds to the optimally interpolated D 9 curve between q from -3 to 3 for the speech signal whose D 9 curve is shown in Fig. 5.23. The original D 9 samples are shown in asterisks "*" and the interpolated D 9 values are depicted with plus signs "+". The original estimates appear at multiples of 6q = 1, while the interpolated samples appear at multiples of 6q' = 0.25. Fig. 5. 32 (b) shows the corresponding f(a) singularity spectrum. This is a valid spectrum with no spurious end or interior points. It is convex and has one maximum at D 0 = 6.02 which is exactly the same as the Do value from the interpolated D 9 curve of  : + + * : : : : : .
The variability and inadequacy of the correlation dimension D 2 , which may be due to variations in the Reynolds number Re of the underlying soft turbulence in the speech production for different speakers, has now been resolved by the new result that unvoiced speech sounds may be multifractal. From the multifractal analysis, each unvoiced speech sound is characterized by its own set of non-constant generalized dimensions Dq which clearly define the structure of its attractor. In all the unvoiced speech signals analyzed, the set of valid generalized dimensions Dq values were different for the various speakers.
From each group of unvoiced speech signals analyzed, the f(a) singularity spectrum was evaluated from the valid generalized dimensions Dq values with q from -3 to 3. By valid, we mean that the generalized dimensions Dq values estimated from unvoiced speech sound strictly satisfy Dq < Drt for q > q'. (5.7) in the range of q defined above. This is to ensure that f (a) singularity spectrum    Table 5.2: Shows a summary of the values of amin and Clma:z: for the unvoiced fricative speech sound \ "s"\ and \ "z"\. The values correspond to the estimates of D 00 and D_oo obtained by the min-max method described in chapter 4.

Chapter 6
Conclusions and Future work The complexity of speech production and modeling cannot be overemphasized. A lot of progress has been made in the linear modeling of speech [88,86,9]. However, nonlinear dynamical analysis and modeling of speech signals is still at its infancy.
Rapid progress can be made by abstracting the essential features of the nonlinear dynamical models underlying speech production. The extraction of these key features is the right direction towards finding the "ultimate" speech model. Therefore 175 The multifractal analysis also shows that unvoiced speech signals exhibit lowdimensional chaos as well as "soft" turbulence. Contrary to the opinion that unvoiced speech signals are generated from what is technically know as "hard" turbulent flow, in which the dimension of a dynamical model is very high [14,63], unvoiced speech signals may actually be generated from "soft" turbulent flow as discussed in section 5.8. Although, the estimated correlation dimensions D2 in the soft turbulent region increased with increasing embedding dimensions dE, the estimated D2 is low and the increase is gradual. Hence, this presents us with the possibility that dynamical modeling of soft turbulence in unvoiced speech signals may be a reality.
In this dissertation, we also explore the relationship between the estimated gen-  [16], then given an unvoiced speech signal, we can estimate the generalized dimension Dq using the GCI method described in chapter 2. With the J(a:) singularity spectrum evaluated from the Dq curve, it is now possible to obtain a lot of information about the properties of a nonlinear map that gives rise to the evaluated /(a:) curve using the proposed method of Feigenbaum et al. [132]. This could lead to simple maps that may be used to model how unvoiced speech signals are produced in the vocal cavity.
The resulting maps may be useful in understanding the complexity involved in speech production thereby leading to better nonlinear dynamical modeling of speech. Also, symbolic dynamic representation [133,134] of unvoiced speech signals from these nonlinear maps may be possible. With this plausible scenario, this dissertation has presented evidence for the potential of multifractal analysis to become an important tool in characterizing and describing unvoiced speech sounds.
The computation of the generalized dimensions Dq is on the order of O(N log N) for each q [47]. Therefore, as the range between Qmin and Qmax, defined in chapter 4, is increased and the number of speech sample N gets _longer, the estimation of Dq becomes more expensive. But with the advent of faster computers and with better efficient algorithm for finding the nearest neighbors of any arbitrary point on the attractor [61], the computation time required to estimate generalized dimensions Dq will be considerably reduced. With this computational reduction, the generalized dimensions Dq which are different and unique for different speakers may be a useful speaker identification or speaker verification criterion. In the next section, we describe some of the unanswered questions which this dissertation has raised.

Future Research
In this dissertation, the multifractal analysis of unvoiced speech signals has been performed on a few sounds in the alphabet of the English language spoken in isolation. This is not the ideal scenario since human being converse in continuously spoken sentences. The multifractal analysis of unvoiced speech signals in this dissertation is only the tip of the iceberg in the rich nonlinear dynamical study of speech signals. Therefore it would be very instructive to analyze unvoiced speech signals that are part of a spoken sentence. Continuous spoken sentences have more intervocalic structure than alphabets spoken in isolation. Also, in the evaluation of the f (a) singularity spectrum of unvoiced speech signal from their generalized dimensions Dq, many of the Dq values were declared invalid because the curves were not monotonic decreasing. It would also be interesting to extrapolate between this invalid Dq values under a monotonic decrease constraint. This may lead to further reduction in the computation of the generalized dimensions. Finally, we hope to perform multifractal analysis on existing biomechanica1 model of the vocal tract [21] in order to compare theoretical multifractal results from the biomechanical models to results obtained from multifractal analysis of unvoiced speech signals.
Future modeling of speech as a nonlinear dynamical system could profitably consider entropy. Schuster [2] stress the importance of entropy, especially the Kolmogorov-Sinai entropy (K 1 ), as a measure by which chaotic motion in phase space can be characterized. K 1 is the average rate of increase of disorder in a dissipative dynamical system, and contains information both about the time evolution (as the Lyapunov exponents) and the static structure of the asymptotic orbit (as do the generalized fractal dimensions Dq)· When chaotic orbits are multifractals, there exists generalized entropies Kq for which Kq > Kq' if q < q'. Implementing the calculation of Kq poses a number of technical problems, these include stabilization of the Dq at high embedding dimensions dE, choosing appropriate metric, and determining that the formulas for Kq (which are only correct asymptotically) are valid [51]. Unfortunately, there is a paucity of good bench-mark entropy calculations.