Time-Frequency Based Methods for Non-Stationary Signal Analysis with Application To EEG Signals

The analysis of electroencephalogram or EEG plays an important role in diagnosis and detection of brain related disorders like seizures. In this dissertation, we propose three new seizure detection algorithms that can classify seizure from nonseizure data with high accuracy. The first algorithm is based on time-domain features which are the approximate entropy (ApEn), the maximum singular value (MSV) and the median absolute deviation (MAD). These features were fed into the AdaBoost and the Support Vector Machine (SVM) algorithms, which were used to classify the signal as either seizure or non-seizure. The accuracy of these classifications was summarized and compared to different algorithms in the literature. In the second algorithm, the Rényi entropy was extracted from different spectral components after the EEG signal was decomposed using either Empirical Mode Decomposition (EMD) or the Discrete Dyadic Wavelet Transform (DWT). The knearest neighbor (k-NN) classifier was use to classify the seizure segments based on the extracted features. In the third algorithm, we decompose the EEG signal into subcomponents occupying different spectral sub-bands using the EMD. A decomposition energy measure was used to discard those sub-components estimated to contain mostly noise. Different time-frequency representations (TFRs) were computed of the remaining sub-components. Local energy measures were estimated and fed into a linear classifier to determine whether or not the EEG signal contained a seizure. The three algorithms were tested on noisy EEG signals from roaming rats as well as the relatively noise free human seizure from a well-known public dataset provided on-line (Andrzejak et al., 2001). Using Metrics of total Sensitivity, Specificity and Accuracy, it was demonstrated that the proposed algorithms gave either equivalent or superior performance when compared against several other brain seizure algorithms previously reported in the literature. Furthermore, we propose a new warping function to create a new class of warped Time-Frequency Representations (TFRs) that is a generalization of the previously proposed k Power Class and Exponential Class TFRs. The new warping function is k t e t w at / 1 ) ( = ∧ . We provide the formulas for the one-to-one derivative warping function and its inverse defined using the Lambert-W function. Examples are provided demonstrating how the new warping function can be successfully used on wide variety of non-linear FM chirp signals to linearize their support in the warped TimeFrequency plane. An optimization scheme was proposed to find the optimal parameter, “a”, of the new warping function for a given non-linear FM chirp signal; algorithms have previously been proposed for finding the k. The performance of the optimization technique was compared to other warped Time-Frequency Representations; the new warped TFRs achieved better linearization in several cases. The new warping function was used to develop a new algorithm which iteratively isolates and separates nonlinear FM signal components in a multicomponent signal. The isolated components have negligible interference terms and have energy support concentrated along a curve close to the true instantaneous frequency.


Epilepsy and Seizure
Epilepsy is one of the most common chronic neurological disorders that predispose individuals to experiencing recurrent seizures [1]. Seizures are a sudden, paroxysmal alteration of one or more neurological functions such as motor behavior, and/or autonomic functions [2]. Epilepsy is not a singular disease entity, but a variety of disorders reflecting underlying brain dysfunction that may result from many different causes [3]. Approximately 2% of the world populations are diagnosed with epilepsy.
The occurrence of this brain malfunction is unpredictable, and may cause altered perception or behavior as sensory disturbances, or loss of consciousness. The negative influence of uncontrolled seizures, i.e. the patient will experience major limitations in family, social and educational activities, that extend beyond the individual to affect the whole society and may produce irreversible brain damage by time [16].
Seizures are subdivided into two sets: partial and generalized. In partial seizures, a limited brain area is implicated in the epileptic discharge. In contrast, generalized seizures originate from multiple brain regions and are characterized by general neurological symptoms [4]. Epilepsy is commonly treated with anti-epileptic drugs; but for some patients, medications are not enough to restrain their seizures. Thus they are candidates for surgery in order to remove the damaged brain tissue which requires accurate localization. Because of the unknown time of occurrence of seizures, these patients undergo prolonged monitoring during which a variety of clinical examinations are performed [5].
Different types of seizure detectors were developed [4,14,15,17,20]. In this thesis, we will illustrate our new developed algorithms for seizure detection using the brain's electrical activity. Moreover, we will demonstrate the feasibility of using our algorithm for accurate and rapidly detecting seizure.

Electroencephalogram and Laplacian electroencephalogram
The Electroencephalogram or EEG is the recording of electrical activity variations from cortical neuronal activity [6]. The EEG, measured using non-invasive electrodes placed on the scalp, is referred to as a scalp EEG. When an EEG is measured using electrodes placed on the surface of the brain or within the brain it is referred to as intracranial EEG. In this study, scalp EEG signals have been used. The placement of EEG electrodes on the scalp usually follows a standard configuration known as the international 10-20 systems (Figure 1.1) suggested by the International Federation of Societies for Electroencephalography and Clinical Neurophysiology (IFSECN) [18].
The "10" and "20" indicate that the distance between adjacent electrodes is either 10% or 20% of a specified distance measured using for example the total distance between the front and back or left and right of the head. Each electrode is labeled by specific letters and numbers. The placements of these electrodes are labeled according to the adjacent brain areas: F (frontal), C (central), T (temporal), P (posterior), and O (occipital) [18].
The EEG is an important tool in studying and diagnosing neurological disorders such as epilepsy, as it contains valuable information related to the different physiological states of the brain [7]. An abnormal EEG signal displays non-stationary behavior including spikes, sharp waves or spike-and-wave; so patients with epilepsy can have specific features in their EEG [4]. Seizures are manifested in the EEG as paroxysmal events characterized by stereotyped repetitive waveforms that evolve in amplitude, i.e. large amplitude or low amplitude, and high frequency before eventually decaying [8]. Moreover, due to its high temporal resolution and its close relationship to physiological and pathological functions of the brain, the EEG is considered as  [19].
Recently, improvements have been applied to EEG recording techniques, making it more accurate by increasing the spatial resolution. One such improvement is the application of the surface Laplacian to the EEG [9,10]. The surface potential distribution on the scalp reflects functional activities evolving from the brain [11]. The variation in this surface can be recorded using an array of electrodes on the scalp, and then measuring the voltage between pairs of these electrodes. The scalp surface Laplacian is an alternative method for presenting EEG data with higher spatial resolution. It has been shown that the surface Laplacian (the second spatial derivative) is proportional to the cortical potentials and increases the high spatial frequency components of the brain activity near the electrode [12]. To obtain the Laplacian, a new approach was considered by using unique sensors and instrumentation for recording the signal [9,13]. The unique sensor configuration which measures the Laplacian potential directly is the Tripolar Concentric Ring Electrode (TCRE) depicted in Figure 1.3-B. Laplacian EEG or tEEG was defined in Besio et al. [9,13] as are the voltage of the middle ring, outer ring, and central disc of the tripolar electrode, respectively. Figure 1.3-A shows the traditional disc electrode. Koka and Besio [10] showed that TCRE provides approximately four times improvement in the signal-to-noise ratio, three times improvement in spatial resolution, and twelve times improvement in mutual information compared to disc electrode signals. The TCRE also exhibit strong attenuation of common mode artifacts [10]. These findings suggest that tEEG may be useful for seizure detection or other neurological disorders analysis [14,15]. An example of a seizure pattern in Laplacian EEG (tEEG) signal is shown in Figure 1.2.
In this thesis, EEG and tEEG signals were used as a tool for seizure detection and analysis.

Methods for seizure detection
It is unquestionable that a method capable of detecting the occurrence of seizures with high accuracy would significantly improve the therapeutic possibilities and thereby the quality of life for epilepsy patients [16]. scalp EEG where important information that allows seizure detection are measured and extracted from EEG signals. The usual methods for seizure detection are based on data analysis by observing long recordings of continuous EEG signals. Long-term EEG recordings increase the possibility to capture and analyze seizure events and also 4augment detection and diagnosis [4]. However, it is a very monotonous and timeconsuming process [17], because there are large amounts of data to analyze and the presence of artifacts may lead to false positive detections. For this purpose, automated seizure detection methods with high sensitivity and low false positive rates are of great significance in recognizing and reviewing EEG for epileptic seizure detection.
Since the 1970s, automated seizure detection has been a challenge with several algorithms and methods developed [4,7,14,15,[20][21][22] but no detector dominates with excellent sensitivity and specificity. This may be due to noise and artifacts such as eye movement and muscle activity, which make detection more difficult [4]. The first algorithm developed for automatic seizure detection is the original work of Gotman [17]. Since then, there has been a marked increase in seizure-related research using signal processing techniques used for selection of discriminative features related to the presence of seizure.
The general procedure for developing methods for seizure detection is composed into two main steps: features extraction and classification. The purpose of features extraction is to determine which information or patterns are necessary from the EEG signal to identify a seizure event from non-seizure. Extracted features depend significantly on the method used to extract the hidden information in the signal. Some of the methods are based on morphological characteristics of epileptic EEG recordings such as features for seizure detection [4] where epileptic seizures are often visible in EEG recordings as rhythmic discharges or multiple spikes [4]. For example, the Gotman algorithm successfully detects seizures with sustained rhythmic activity with a fundamental frequency below 20 Hz [17]. He developed an algorithm that first breaks down the EEG signal into half-waves. Then morphological characteristics of these half-waves, such as amplitude and duration, were used to determine the existence of epileptic seizures [17,23]. For rhythmic discharges: fast Fourier transform based features [4, 24 and 25], frequency domain based features [4,[26][27][28][29], time-frequency based features [4, 21, 22, and 30], or wavelet based features [4,[31][32][33][34][35] have often been used.

Thesis Contributions
The major contributions of the thesis are summarized as follows: • Developing an algorithm for seizure detection maintaining high sensitivity and specificity with fast execution time using time domain features [14]. The proposed algorithm was tested using human and animal data containing seizure and was compared against popular seizure detection techniques.
• Developing a new technique for seizure detection using time-frequency representation while reducing effects of noise and artifacts [15].
• Developing an algorithm for seizure detection showing the superior performance of the Renyi entropy comparing to other entropies [56].
• Comparing the performance of the Empirical Mode Decomposition (EMD) algorithm and the Discrete Wavelet Transform (DWT) for seizure detection [56].
• Developing a new warping function that can be successfully used to linearize the TF structure of non-linear Frequency Modulated (FM) chirp signal.
• Developing a new warping-based multi-component signal decomposition algorithm to decompose multi-component signals consisting of non-linear FM chirp signals.

Thesis Organization
The thesis consists of six chapters. The organization of the thesis is as follows: Chapter 1 is the introductory chapter that reviews the definition and the basic concepts of epilepsy, seizure, EEG and outlines the previous approaches for seizure detection as well as the major contributions of the thesis.
Chapter 2 briefly reviews the fundamentals of time-frequency (TF) signal processing and summarizes the ability of TF in analyzing non-stationary signals, like electroencephalography. The usefulness of time-frequency Reassignment and Synchrosqueezing for cross-terms reduction will be discussed.
Chapter 3 describes two newly developed algorithms for seizure detection using the scalp EEG signal. The first algorithm was based on time-domain features and the second algorithm was based on the measurement of signal complexity using entropies from different spectral components. Using these sets of features, the proposed algorithms can classify the contaminated data with low computational complexity while still maintaining very good accuracy. The performance of the proposed algorithms was evaluated and compared against several other brain seizure algorithms previously reported in the literature.

Introduction
The objective of many signal processing applications in the real world is to explore the behavior of recorded signals to analyze its components [107,109]. The main goal of signal processing is to extract useful information from the signal by transforming it and involves techniques that improve our understanding of information contained in this signal. For example, the EEG has become one of the most important diagnostic tools in the neurophysiology area, but until now, EEG analysis still relies mostly on its

The need for Time-Frequency Analysis
The traditional way to analyze a signal is time domain analysis or frequency domain analysis. The time domain is a record of the system parameters versus time. In real life, in many applications such as seismic surveying, communications, radar, and sonar, the signals are non-stationary, which means that the frequency content of those signals are varying with time. Time-frequency representations (TFRs) have been applied to analyze, modify and synthesize non-stationary or time-varying signals [40,105,[107][108][109]. Furthermore, TFRs can handle multi-component signals and outperform existing techniques which are only applicable for mono-component signals [36].  In addition to all the advantages of TF analysis cited above, TFR can significantly simplify the interpretation of signals due to their ability to display frequency information that changes over time. Thus, TF analysis has been developed for a wide range of problems with signals that contain highly localized events such as bursts, spikes, and discontinuities, which typically occur in EEG signals during seizures.
Moreover, using TF analysis, the energy content of an EEG signal can be visualized.
This can help understanding the characteristics of the EEG signal in order to determine the best approach to analyze and process the signal.

Types of Time-Frequency Representations
There are many different time-frequency Representations (TFRs) available in the literature for analysis [37][38][39][40][41][42][43][106][107][108][109] and each TFR has its own strengths and weaknesses. Some of these TFRs provide excellent resolution, but take long to calculate. Others provide poor resolution, but can be calculated quickly. Furthermore, some TFRs provide a good balance of speed and resolution, but due to their quadratic nature, have interference terms and cross-terms. TFRs can be divided into three main classes: linear classes, quadratic classes and higher order classes [40,[106][107][108][109]].

Short time Fourier Transform (STFT)
The Fourier Transform (FT) is a reversible, linear transform with many significant properties. The FT decomposes a signal into a set of weighted harmonic components with fixed frequencies. To compute the Fourier transform, the signal must have a finite energy. The Fourier transform of a signal ) (t x is given by: It has been proven that analysis of non-stationary signals by the FT does not bring complete information about spectral content [44,45]. In order to add time-dependency in the FT, a time-domain signal is divided into a series of small pieces, where the signal is windowed into short segments and then the FT is applied to each segment. The new transformation is called the Short-Time Fourier transform (STFT) and it maps the signal into the TF plane. The STFT assumes that the signal is quasistationary, i.e. stationary over the duration of the window. The STFT of a signal ) (t x is given by: whereas the long window used in (C) produces smearing the time direction .Therefore, it is difficult to achieve good localization simultaneously in both the time and the frequency domains since the STFT depends only on one window. This limitation is related to the Heisenberg uncertainty principle [46,47]. It was proven that a Gaussian window is the only window that meets the lower bound of the Heisenberg principle; the Gabor transform [40] is similar to a STFT computed using a Gaussian window.

Wavelet Transform (WT)
The Wavelet transform, a time-scale representation, is another linear TF distribution. The WT is similar to the STFT but instead of a fixed duration window function, the WT uses a varying window length by scaling the axis of the window. The WT of signal ) (t x is defined as [40,108]: The WT is the convolution of a signal ) (t x with a window ) (t g shifted in time by " b " and dilated by a scale parameter " a " [40,48,108]. The parameter " a " can be chosen such that it is inversely proportional to frequency to obtain a TF representation comparable to the STFT [40]. For this reason, at low frequencies the WT provides high spectral resolution but poor temporal resolution. On the other hand, for high frequencies, the WT provides high temporal resolution which enables the WT to "zoom in" on singularities. Unfortunately, high frequencies will have poor spectral resolution. This means that for the WT there is also a tradeoff between time and frequency resolution. The energy density function of a WT is defined as and is called the Scalogram which is a quadratic TFR [40, 108].

Wigner-Ville Distribution (WVD)
The Wigner-Ville distribution (WVD) is considered as a TF representation that attains good tradeoff between time versus frequency resolution [40,105,106]. The WVD of a signal ) (t x is given by: is the complex conjugate of ) (t x . The WVD can provide very good resolution in time and frequency of the underlying signal structure because of its interesting properties such as preserving frequency support, instantaneous frequency, group delay, etc. [40, 48, and 49]. However, because of the bilinear nature of the WVD, and due to the existence of negative values, the WVD has misleading TF results in the case of multi-component signals (e.g., the EEG) due to the presence of cross terms and interference terms [40,48,105]. For example, the WVD of the multi- The first two terms, ) , ( (2.6) Cross WVD terms can be reduced by using proper smoothing kernel functions as well as analyzing the analytic signal (instead of the original signal) to solve the problem of cross terms produced by negative frequency components. The analytic signal is given is the Hilbert transform defined as: . Thus, equation (2.4) will be: where, is the analytic signal associated with the signal ) (t x . Figure

Smoothed-Pseudo Wigner-Ville Distribution (SPWVD)
To address the problem of cross-terms suppression while keeping a high TF resolution, other TFRs have been proposed. Among these is the smoothed pseudo WVD (SPWVD) [50]. The SPWVD permits two independent analysis windows, one in time and the other in the frequency domains to improve the readability of the Wigner-Ville distribution. The SPWVD of a signal ) (t x is given by: where t is the time variable, f is frequency, h is the frequency smoothing window and g is the time smoothing window.

A comparison of the existence of cross-terms in the WVD, the Spectrogram and the Scalogram
It was shown from the previous section that although the WVD has very important properties compared to other TFRs, the presence of cross-terms makes the WVD difficult to interpret. These cross-terms interfere with, and often mask, the true TF information and could lead to misinterpretations of the signal energy concentration and misreading of the TF signature of the corresponding signal. The cross-terms of the WVD have been extensively analyzed [40,48,105]. It has been found that the WD cross-terms lie at the mid-time and mid-frequency of each pair of auto-components; they are highly oscillatory and can have amplitudes twice as large as the product of the magnitudes of the WVD of the two signals under consideration [40,48].
Often practitioners use the STFT spectrogram or WT Scalogram claiming they are free of the problems of cross terms found in the WVD. However, Kadambe and Boudreaux-Bartels [48] showed that the cross-terms comparable to those found in the WVD exist when considering the energy distributions of the STFT (spectrogram) and the WT (Scalogram). By deriving the mathematical expressions for the energy distributions of the STFT the authors deduced that [48]: (1)

Reassignment Time-Frequency Representation
The aim of the Reassignment method is to sharpen the TFR of a signal while keeping the temporal localization correct. This method is well adapted for multicomponent signals [51,108,109]. Moreover, Reassignment TFRs have been used to diminish cross-terms and enhance time-frequency concentration of auto-terms [51,52]. It was proven that the Reassignment method perfectly localizes linear chirps while removing most of the inner interference terms [52]. This method has been generalized to all the bilinear time-frequency and time-scale distributions [52]. The reassignment SPWVD (RSPWVD) offers good TF resolution and good interference reduction. The RSPWVD is given by [52]: ,

Synchrosqueezing Transform
Synchrosqueezing, introduced by Maes and Daubechies [55], is an invertible timefrequency analysis tool designed to decompose signals into constituent components with time-varying oscillatory characteristics. The reassignment method is a good analysis tool as it is a powerful representation of multi-component signals that results in high energy concentration in TF plane. However, it cannot be used for synthesis or reconstruction of the signal's individual sub-components. A recent study [53] shows that the Synchrosqueezing transform is an alternative to the Empirical Mode Decomposition (EMD) method [54], used for analyzing and decomposing natural signals, and developed with more theoretical foundation. This means that the Synchrosqueezing transform can extract and delineate components with time-varying spectra and allow for the reconstruction of these components [51].

Summary
Time It was proven that the existence of cross-terms in these two transforms are comparable to those found in the WVD, they just occur in different regions of the TF plane.
To deal with the problem of cross-terms, the non-linear and iterative Reassignment

Introduction
With a goal of reducing the time-consuming nature of and automating the visual analysis of electroencephalogram or EEG signals during seizure analysis, we developed two different types of algorithms for detecting the occurrence of seizures in data recorded using a scalp EEG. For both algorithms, the detection of seizure events is treated as a classification task to differentiate between data with seizure versus data without seizure. The classification involves two steps. First, figure out which kind of patterns we want to extract from the signal. This step is known as feature extraction; the feature vectors used by the proposed algorithms depend considerably on the method used. Next, a classifier will be used to distinguish the difference between feature vectors and classify the data with seizure from the data without seizure. The achievement of high classification accuracy depends on the features extracted, as well as on the classifier used to determine the class relationship.

EEG data acquisition
The data used for seizure detection are obtained from two different sources -(1) University of Rhode Island (URI), Kingston, RI USA, recording data from rats, and (2) Bonn University, Germany, using Human data. The databases from these sources are described in the next section.

Rats' dataset
The first dataset consists of scalp Laplacian EEG (tEEG) data acquired from rats at the University of Rhode Island (URI). The animal protocol used for recording the  256 Samples/s). The digitized signals were stored on a computer for offline analysis using MATLAB (Mathworks Natick, MA, USA). The two differential signals from the electrode elements (outer ring, inner ring, and center disc) were combined using an algorithm to give a Laplacian derivation of the signal as described by Besio [9,10].
The algorithm weights the difference between the middle ring and center disc sixteen times greater than the difference between the outer ring and the center disc.
The tEEG data of ten rats have been recorded using the procedure described above. The first five minutes from each recorded signal are considered as baseline or non-seizure data. Firstly, the tEEG data were divided into 30 second segments with each segment containing 7680 samples (256 samples per second for 30 s). The selection of the Seizure segments, i.e. segments during which a seizure occurred, was performed by an experienced behavioralist through visual inspections of the video recordings. Because of a large amount of artifacts and noise caused by grooming, chewing, and roaming of the rats during the recording, baseline segments were selected after visual inspection where the tEEG appeared to be calm and artifacts free.
Two sets of tEEG data corresponding to the Baseline data and Seizure data were used as the investigational data set for seizure detection. The database included 70 Seizures segments and 65 Baseline or non-seizure segments comprising the total number of Seizures and Baseline segments recognized by the behavioralist.

Human dataset
The second dataset consists of a subset of data recorded on humans described by Andrzejak et al. [2]. The database consists of five sets (denoted as Z, O, N, F and S) each containing 100 single channel EEG segments of 23.6 s duration. The data were all recorded with the same 128-channel amplifier system and digitized at 173.6 Hz sampling rate and 12 bit A/D resolution. The bandwidth of the acquisition system was from 0.5 to 85 Hz. These segments were selected and cut out from continuous multichannel EEG recordings after visual inspection where the data appear to be artifacts free [2]. Sets Z and O have been recorded extracranially from five healthy volunteers using electrodes placed according to the international 10-20 system locations [18]. were taken from all contacts of the relevant depth electrode [2]. In addition, strip electrodes are implanted onto the lateral and basal regions (middle and bottom) of the neocortex. EEG segments of the subsets S were taken from contacts of all electrodes (depth and strip). One EEG segment from each category is shown in Figure 3.2.

Seizure detection using Time-Domain features
In the EEG recording, seizures appear as a sudden redistribution of spectral energy on a set of EEG channels. To automate the detection of these changes in the scalp EEG signal during seizure activity, we proposed an algorithm which enables us to understand the differences between a seizure period and a non-seizure period of different patients by finding a subset of time-domain features capturing the seizure characteristics of each patient. These features will be used to classify whether a given data segment contained seizures or not. The procedure of the algorithm will be described in the next sections.

Data preprocessing
In this stage, the rats' data were preprocessed before they are used. Before starting the analysis procedure for seizure detection, the tEEG signals are down-sampled from 256 to 128 Hz. This reduced the amount of data samples and thus the computation time of the algorithm. Before down-sampling, the signals were filtered first using an anti-aliasing low-pass filter with a cutoff frequency 64 Hz, to avoid aliasing. The MATLAB function downsample was used for the down-sampling procedure.
Note that the human dataset will be used as it is without any preprocessing step.

Feature extraction
A common approach in seizure detection and also in seizure prediction is to extract information or patterns; in other words, features that can characterize seizure morphologies from EEG recordings. Prior to feature extraction, the tEEG data, rats' data, and EEG data, human data, were segmented into one second duration epochs using non-overlapping Hamming window. During this research, we tested different segment lengths, e.g. 0.5s, 1s, 2s,...,. Among those we tested, the one second EEG epoch achieved optimal detection accuracy. Features used were: (1) the approximate entropy (ApEn) [62] (2) the maximum singular value (MSV) [64], and (3) the median absolute deviation (MAD) [65], which will be described in the next subsection.

Feature 1: Approximate Entropy (ApEn)
Entropy of a signal is a measure of the information contained in that signal [62]. It follows that entropy is also a measure of uncertainty of random variables or a complexity measure of a dynamical system. ApEn, which is the acronym for the approximate entropy [20,61], is a nonlinear dynamical analysis that quantifies the degree of complexity and irregularity in a time series data set such as estimation of regularity in epileptic seizure time series data [62]. ApEn is less sensitive to noise and can be used for short-length data [20,61]. The described methods by Dimbra et al. [59] and Kumar et al. [61] have shown that the value of the ApEn drops abruptly due to the synchronous discharge of neurons during an epileptic activity. Hence, it is a good feature in the automated detection of seizures.
is equal to the negative of the average natural logarithm of the conditional probability that two similar sequences for m points stay analogous, that is, within a tolerance r, at the next point [63]. The process of estimating ApEn is described in the following steps taken from [20, 61]: 1. Let the original signal contain N data points: Each vector represents m consecutive signal values, starting with the th i data point,

The distance between two vectors
is defined as the maximum absolute difference value between their respective scalar components, Then, for : 1 ,..., To compute ApEn values of a signal with length N, the embedding dimension m is set to 5, and a tolerance window r is set to 0.2 times the standard deviation of the original data sequence. Among the several different values for r and m , we tried, gave the best detection accuracy was achieved.

Feature 2: Maximum Singular Value (MSV)
The singular value decomposition (SVD) is a very powerful technique for recognizing and ordering the dimensions along which data points display the most variation. There are some fundamental ideas behind SVD [64]: (1) taking a high dimensional, (2) highly variable set of data points and (3) reducing the data to a lower dimensional space. These can make characteristics of the original data more visible from noise. The SVD decomposes a given matrix A into three different matrices as follows:  [64]. In this work, the maximum of the singular values is taken as a feature extracted for seizure detection.

Feature 3: Median Absolute Deviation (MAD)
The Median Absolute Deviation (MAD) is a measure of statistical dispersion [65].
The absolute deviation of an element of a data set is the absolute difference between that element and a given point. Common measures of statistical dispersion are the variance and standard deviation. The MAD is considered as a simpler way to quantify variation than using variance or standard deviation.
, the MAD is defined as the median of the absolute deviations from the data's median [65]: The MAD provides an absolute measure of dispersion that is not affected by outlier data that can throw off statistical analysis based on the mean and standard deviation.
MAD is more flexible with outliers existing in the data. Since large deviations are possible with seizure signals, a median based measure of dispersion will be a good choice for detection [66].

Classifier 1: Support Vector Machine (SVM)
The basics of the SVM algorithm have been developed by Vapnik [68] to analyze data and recognize patterns. The Support Vector Machine has been used for classification and regression analysis [67]. In this dissertation, it is applied for the . It is clear that the values of MAD and MSV are higher for Seizure data than for Normal data, but the ApEn for Normal data is higher than those for Seizure.
classification of extracted features. The classification is done by constructing an Ndimensional hyper-plane that optimally separates the data into two categories by maximizing the distance from the decision boundary to the nearest data-points (called support vectors) in the training data. The MATLAB implementation of the SVM classifier is used for this study (bioinformatics toolbox statistical learning commands (svmtrain, and svmclassify)). Since Seizure and non-Seizure classes are often not linearly separable, we generate non-linear decision boundaries using the Gaussian Radial Basis Function kernel (RBF) [67]. Furthermore the classification results obtained by our algorithm give better results using the RBF kernel function when compared to other SVM kernels. Figure 3.5-A shows the general diagram of SVM and

Classifier 2: Adaptive Boosting (AdaBoost)
AdaBoost is one of the most popular machine learning algorithms introduced in 1995 by Freund and Schapire [71]; it has surprising good classification results. In theory, boosting can be used to reduce the error of any weak-learning algorithm that constantly creates classifiers which need to be better than random guessing. Similar to SVM, the AdaBoost algorithm works by combining several votes by using weak learners instead of using support vectors. The AdaBoost.M1 algorithm described in Figure 3.6 was adopted for our work [71]. The algorithm takes as input the training set: In the beginning, the EEG data have to be labeled for training, with the Baseline segments labeled as ''-1'' and Seizure segments labeled as ''+1''. The algorithm repeatedly calls a given weak or learning algorithm in a series of Rounds T t ,..., 1 = and produces a distribution or a set of weights over the training set. The weight of this distribution on training data i on round t is denoted ) (i D t . In this work, the decision stumps were used as weak classifiers. A decision stump is a decision tree with one root node and two leaf nodes [71]. It performs a single test on a single attribute with threshold θ . A decision stump can also be used to reduce the error of any Weak-Learning algorithm that constantly creates classifiers which need to be better than random guessing; it is constructed for each feature in the input data as: , the weak learner's goal is to find a hypothesis which minimizes the training error: such that pr is the probability and t ε is the sum of distribution weights of the instances misclassified by the hypothesis t h , (step 3 in the algorithm). And we require that this error be less than ½. This error is measured with respect to the weight of the distribution denoted by ) (i D t . The data is re-weighted to increase the ''importance'' of misclassified samples (rule 5 shown in Figure 3.6). This process continues and at each step the weight of each learner is determined. Finally, the strong classifier is defined as the output of this algorithm.

Performance evaluation
In order to evaluate the performance of the proposed algorithm for seizure detection, the first two-thirds of the feature vectors (for both seizure and non-seizure data) were selected for training while the last one-third was used for testing. The results of classification and the performance of classifiers are expressed in terms of sensitivity, specificity, and accuracy (Table 3.2) which are defined as follows [72]: • Sensitivity: Is the percentage of seizure segments correctly classified by the algorithm.
• Specificity: Is the proportion of segments without seizures correctly classified by the algorithm.
• Accuracy: Is 3. Calculate the error of : be a proper distribution function).
Output the final hypothesis: .

Results and Discussion
The proposed features extraction method is evaluated using two classifiers; SVM and AdaBoost. Each tEEG and EEG segment is divided into epochs of a    Table 3.2 A comparison of classification accuracy obtained using tEEG data, rat's data, and EEG data from human. evaluation of our method using SVM classifier are better than those obtained by Fathima et al. [73] and Guo et al. [20], and are as good as the results obtained by Bedeeuzzaman et al. [66].
The major limitation of our study was that we could not use all the data recorded from rats using the TCRE. We had to pick a small segment of baseline or non-seizure data, 30 seconds duration, which was the least contaminated with artifacts. The artifacts and noise were caused by the rats roaming, grooming, and other behaviors during the recording. Even in the presence of strong artifacts, our methods worked quite well suggesting the algorithm is robust.

Seizure detection using the Rényi entropy
In our second algorithm proposed for human seizure detection, the Rényi entropy [74] was combined with two different methods: the Empirical Mode Decomposition method (EMD) [54] and the discrete wavelet transform (DWT) [48]. A block diagram describing the different steps of the algorithm is shown in Figure 3.7. Initially, the EEG signal was decomposed into sub-signals using the EMD method or the DWT.
Then, the quadratic Rényi entropy was used as a feature. The Rényi entropy was extracted from the first five IMFs when using EMD decomposition or from each subsignal when using the DWT. Finally, the extracted features are input to a k-nearest neighbor (KNN) classifier [27] to separate the EEG segments into either seizure or non-seizure classes. In this study, we will show that high accuracies are obtained using both techniques (EMD and DWT), which indicate good performance of the proposed method.

Feature extraction
Entropy can be used as a complexity measure for analyzing biological signals such as brain signals or EEG, because the brain is a complicated dynamical system with uncertainties [74].

Empirical Mode Decomposition (EMD)
EMD has been proposed for the analysis of nonlinear and non-stationary time series [54]. The EMD is a signal processing technique that adaptively decomposes a signal into oscillating components or Intrinsic Mode Functions (IMFs). Each IMF satisfies two basic conditions [54]: (i) the number of extrema and the number of zero crossings must be equal or differ at most by one; (ii) the mean value of the envelope defined by the local maxima and by the local minima must be zero or close to zero at all points. The spectral content of the IMF extracted at a given iteration is lower than that of the IMF extracted from the previous iteration, which permits analyzing the signal in different frequency bands. The EMD technique has been used in the field of biomedical signal processing, especially for seizure detection in EEG signals [75,76].
The EMD has several advantages: (1) The EMD algorithm is a decomposition method developed for non-linear and non-stationary signals [54], making it suitable in biomedical engineering applications such as detection of epileptic seizure from EEG signals [75,76]; (2) EMD can break down complicated signals, without the need of basis functions, into a finite set of band limited IMFs which can be described in the time-frequency domain; and (3) The EMD algorithm is considered as a type of filter bank decomposition method [78] used to isolate different components from multicomponent signals like EEG. Moreover, The EMD procedure allows time-frequency analysis of transient signals, which is not the case for stationary Fourier Transform based methods [77]. The EMD algorithm for a given signal ) (t x can be summarized in the following steps [78]: (1) Initialize sifting loop: The IMF should have zero local mean; so subtract the local mean from the original signal: Check the two conditions for an IMF described above to decide whether ) (t f j is an IMF or not, (conditions (i) and (ii)), 1 When the first IMF is defined, the smallest temporal scale in the signal ) (t x is defined as: In order to obtain all the other IMF components, the residue ) ( 1 t r of the data is obtained by subtracting ) The sifting process will be continued until the final residue is a constant, monotonic function, or a function with only one extrema from which no more IMFs can be derived. The subsequent basis function and the residues are computed as: where N is the number of IMFs and ) (t r N is the final residue. At the end of the decomposition, the signal ) (t x is given by: The adaptive nature of EMD can often provide a better numerical description of temporal patterns in non-stationary and nonlinear time series than traditional methods such as Wavelet Transform and Fourier Transform methods [54]. The EMD decomposition method is well localized in the time-frequency domain and reveals important characteristics of the signal [54]. The MATLAB code used in this paper to perform EMD is available at [88]. All calculations were implemented using MATLAB software version 7.10. In order to have different rhythms (or frequency components) in the EEG signal, the first five IMFs of each EEG segment were selected and used for further analysis. Figure 3.8 shows the first five IMFs, using the human dataset [2], for non-seizure segments (Z) on the left and seizure segment (S) on the right.

Discrete Wavelet Transform (DWT)
As we already mentioned in Chapter 2, The Continuous WT (CWT) of a signal ) (t x is defined as the affine correlation between ) (t x and a scaled and shifted wavelet function ) (t ψ as follows [48]. The Discrete Wavelet Transform (DWT) is one of the methods used to examine biomedical signals. It provides both time and frequency information of the signal [79].
The DWT is advantageous over the continuous WT since it allows the complete generation of the original signal without redundancy [79]. The DWT uses discretized parameters, "a" and "b", respectively [79]: = .   Figure 3.9

Signal
shows the sub-signals after DWT decomposition for both seizure and non-seizure segments, where each sub-signal represents different frequency sub-bands.

Rényi Entropy
Rényi entropy [74] based measures have recently been applied to different areas of signal processing and information theory. It has been proven that the Rényi entropy is a very robust measure [81]. The Rényi α-entropy is a generalization of the Shannon entropy and is defined as follows [74]: The advantage of Rényi entropy over the Shannon entropy lies in its generality and flexibility due to the parameter α enabling several measurement of uncertainty within a given distribution. Moreover, in contrast to Shannon entropy, there exist efficient methods for computing Rényi entropy for some values of parameter α [83]. In this paper, the Rényi entropy with 2 = α , called the Quadratic Rényi Entropy, was used.
It gave the best classification accuracy of the data analyzed among several values of α (α =2, 3, 4, 5) that were examined.

Ten-fold cross-validation for training and testing
There are various methods to split a given dataset into training and testing for EEG Seizure detection. One of the most known methods is a ten-fold cross-validation [82].
A ten-fold cross-validation technique will be applied throughout the training data and used to estimate how well the classification method will classify the new data during the testing period. In ten-fold cross validation, the data set is split into 10 equal sub-set partitions. In each iteration, one of the 10 subsets is used for testing whereas the other 9 subsets are used for training the dataset. The whole procedure is then repeated ten times using a different testing subset. The final result is the average of all the 10 iterations. The advantage of this method is that all observations are used for both training and validation, and each observation is used for validation exactly once.

Classification using k-nearest neighbor (k-NN)
The k-NN classifier was used to classify the seizure from non-seizure segments.
k-NN is is a simple algorithm that stores all available cases and classifies new cases based on a similarity measure [84]. The k-NN approach has recently been recognized as a very important algorithm, due to its high classification accuracy in problems with unknown and abnormal distributions [84]. The k-NN classification finds a group of k objects in the training set that are closest to the test object, and assigns a label on the predominance of a particular class in this neighborhood [84]. Using the k-NN classifier, three important components must be identified: a set of labeled objects, a distance or similarity metric to compute the distance between objects, and the value of k, the number of nearest neighbors. In this chapter, the number of nearest neighbor k was set to be one, because it achieved the optimal detection accuracy of the different values of k (1, 2, 3, 4 and 5) that were examined.

Performance Evaluation
The performance of the proposed method for seizure detection was evaluated using the human dataset [2]. The same algorithm was tested using the approximate entropy (ApEn). The performance results are summarized in Table 3.6. Comparison of the obtained results, using the quadratic Rényi entropy combined with EMD or DWT, with others in the literature for the process of seizure detection using the same dataset was shown in Table 3.7.

Results and Discussion
The second proposed algorithm for seizure detection was based on the extraction of the quadratic Rényi entropy as a feature to discriminate between non-seizure and seizure segments after decomposing the signal into sub-signals using two different techniques: EMD which decomposes the signal into intrinsic mode functions (IMFs) and DWT which enable a time-frequency view of the signal. To evaluate the proposed method, two sets of human EEG data (available online in [2]) were used; one set having seizure (set S) events and the other set contain non-seizure data (set Z).  After EMD decomposition, the number of bands or IMFs is dependent on the frequency content of the signal to be analyzed (IMF1-IMFN shown in Figure 3.8,

Classification Method
where N is the number of the last IMF which varies from signal to signal depending on the frequency content). Figure 3.8 shows that the frequency content of these IMFs was given from high to low. Furthermore, the lower orders of these IMFs have fast oscillation modes of the signal, whereas the higher order IMFs have the slower oscillation modes for both seizure and non-seizure segments. The first five IMFs were used since they contained significant information with different rhythms.
For the WT, the number of decomposition levels was chosen based on the dominant frequency components of the signal. Similar selection orders as for the EMD was made for DWT. The seizure and non-seizure segments were decomposed into five sub-signals (D1-D4 and A4).
The quadratic Rényi entropy values were calculated for each sub-signal after EMD and DWT decomposition. In order to reduce the dimensionality of the detection algorithm, the mean value of the five quadratic Rényi entropies (for the five IMFs and the five wavelet sub-signals) were computed. The k-nearest neighbor (k-NN) classifier was used to classify the extracted features and calculate the performance of the proposed method. One important concern using k-NN classifier is that the parameter k (number of nearest neighbors) is a variable. In this chapter, good accuracy was achieved with k equal to one. The same algorithm was tested using the approximate entropy (ApEn). The computed accuracies are presented in Table 3.6. These results demonstrate the following; using the Rényi entropy the total classification accuracy reaches 100% using the EMD method and 99.95% using the DWT. These accuracy performances show that the EMD method using the Rényi entropy has some improved results. Thus, the classification accuracy using the EMD method performed slightly better. Comparing the results obtained using the quadratic Rényi entropy to those using the ApEn, with an accuracy of 86.36% using the EMD and 90.30 using the DWT, it is clear that the algorithm shows the superior performance of the Rényi entropy compared to the ApEn in detecting epileptic seizures. There are many other methods proposed for epileptic seizure detection in the literature. Table 3.7 compares our results with the results of other methods proposed in the literature using the same human dataset in [2]. It is shown in  Decomposition method (EMD) and Discrete Wavelet Transform (DWT) were used to decompose the signal into oscillations and trend components. The human dataset in [58] was used to evaluate the performance of the algorithm using the k-nearest neighbor (k-NN) classifier. The results showed that the performance of EMD is slightly better than DWT, which may be because the signals were relatively noise free.

Summary
Our results suggest that the approach of using quadratic Rényi

Introduction
It was shown in the literature that seizure evolution is, in general, a dynamic and non-stationary process [107,109]. Moreover, the signal during seizure is composed of multiple frequency components. In this chapter, we demonstrate the suitability of using the estimated energy from a time-frequency (TF) representation to classify EEG seizure versus non-seizure segments; we also compare our approach to other methods.
Frequency domain analysis of EEG signals shows that EEG signal activity ranges from almost DC to 100 Hz. Since seizure activity may exist in different frequency ranges, the Empirical Mode Decomposition method (EMD) was used to decompose the signal into different components which helps to detect the abnormal energy that appears in the signal during seizure. The algorithm exhibited a high percentage of classification. To test the proposed algorithm, data from rats with induced seizures were recorded using the tripolar concentric ring electrodes (TCREs).

Data Description
In order to evaluate the performance of the proposed algorithm for seizure detection, the rats' dataset shown in chapter 3 was used. The data were recorded using the tripolar concentric ring electrodes (TCRE). Two sets of the Laplacian EEG signal (tEEG) data from ten rats were used. The first set is composed of 65 single-channel tEEG non-seizure segments and the second set is composed of 70 single-channel tEEG Seizure segments.

Model-based seizure detection
Different methods for automatic seizure detection have been discussed in the literature [20, 28, 66, 73, 85, and 87]. Our proposed algorithm is based upon the decomposition of tEEG segments into several oscillating components via the EMD algorithm [54] followed by TFR (time-frequency representation). A localized energy estimate is extracted and considered as a feature for discrimination between seizure and non-seizure data. An overview of the different steps of the detection process is summarized in Figure 4.1, with a more detailed explanation to follow: 1. Downsample the tEEG signal to reduce the sampling rate from 256 Hz to 128 Hz to reduce the amount of data, and hence, computations without losing important information for analysis. The signal was filtered first using an anti-aliasing low-pass filter with a cutoff frequency 64 Hz to meet the Nyquist criteria and avoid aliasing.
The MATLAB function downsample was used for the down-sampling procedure.

2.
Decompose each tEEG segment into IMFs (Intrinsic Mode Functions) using the EMD algorithm [54]. Figure 4.2 shows the IMF bands for both seizure and nonseizure data. The IMF noise distribution algorithm proposed by Flandrin et al. [89] was used to select three IMFs for seizure detection.

3.
The three selected IMFs from each seizure and non-seizure segment were partitioned into one-second epochs using a non-overlapping, sliding Hamming window to avoid redundancy caused by an overlap.

4.
The analytic signal for each epoch was used [90] to eliminate the negative frequencies for better cross term reduction in TFRs. For a given signal ) (t x , the corresponding analytic signal is defined as: , where () HT is the Hilbert transform.

Feature extraction procedure
The procedure for feature extraction was delineated into three steps. First, the data segments were decomposed into the oscillatory components, known as IMFs using the EMD method, and three were adaptively selected. Then, the SPWV distribution was computed for each epoch for selected IMFs (to be described later). Finally, an estimate of the localized TF energy was calculated and used as a feature to discriminate between seizure and non-seizure events. The method is briefly described in the following sections.

Signal decomposition and IMFs selection
The tEEG seizure and non-seizure segments were iteratively decomposed into IMFs using the EMD algorithm [54]. To identify which IMFs to use in the proposed analysis, we need to know whether a specific IMF contains useful information or primarily noise. Flandrin et al. [89] developed a statistical model based on energy distribution of the noise between IMFs.
The method suggests decomposing the noisy signal into IMFs, and then comparing the IMF energies with the theoretical estimated noise-only IMF energies. The model is based on studying the energy in the modes of fractional Gaussian noise (fGn) after EMD decomposition. fGn is a generalization of white noise; it exhibits a flat spectrum and its statistical properties are determined solely by a scalar parameter H known as the Hurst exponent. In this chapter, we take (IV) Compute the EMD of the noisy signal, and compare the IMF energies by using the confidence interval as a threshold.
(V) Compute a partial reconstruction by keeping only the residual and those IMFs whose energy exceeds the threshold (confidence interval).

Non-Seizure Seizure
In this paper, the Flandrin algorithm for IMF selection [89] was run on each 30second segment of the signal to identify which three IMFs should be selected for the next algorithmic step of TFR energy feature classification.
In this study, the Flandrin algorithm determined that IMF3 was the first IMF to cross the threshold for the vast majority of the data set used.

Time-Frequency energy estimation
TF energy distributions are very important in the analysis and processing of nonstationary signals like the EEG [106,107,109]. An energy TFR ) , ( f t T x combines the concepts of a signal's instantaneous power 2 | x(t) | (t) x p = and spectral energy density is the Fourier transform of the signal ) (t x . The energy of TFR satisfies the following marginal properties [40]: Equations (4.4) and (4.5) indicate that if the TF energy density is integrated along one variable, the energy density corresponding to the other variable can be obtained.
The total signal energy condition Ex in equation (4.6) is derived by integrating the TFR ) , ( f t T x over the entire TF plane. Many TFRs, including the SPWV distribution, do not strictly obey the marginal properties in (4.4 and 4.5); that is, the frequency and time integrals of the distribution do not exactly equal the instantaneous signal power and the spectral energy density, respectively [40]. However, depending upon window choice, some TFRs can still be used to generate estimates of localized signal energy [38,40]. The total energy can be a good feature to detect signal events in the TFR because the energy during an EEG seizure event is usually larger than that during normal activity [22]. In this chapter, the approximate localized energy extracted from  , and IMF5 of a non-seizure tEEG rat signal. Shown at the bottom are the corresponding SPWV TF distributions created using two Hamming windows with 64-point length and 128-point length are used respectively, for the time and frequency smoothing windows. TF plots of the three IMFs show that the frequency content of IMF3 is higher than that of IMF4, which in turn, has higher frequency content than that of IMF5.

Classification and Performance calculation
After the selection of IMFs from both seizure and non-seizure tEEG segments, each IMF (of 30 second length) is partitioned into epochs of one second length and the energy is calculated for each epoch. This results in a feature vector set consisting of 90 samples (3 IMFs x 30 x 1 second) for each segment. A ten-fold cross-validation technique was applied during the training periods to estimate how well the classification method will classify the new data which were not used during the testing validation period. In ten-fold cross validation, the data set is split into 10 equal sub-set partitions. In each iteration, one of the 10 subsets is used for testing whereas the other  Shown at the bottom are the corresponding SPWV TF distributions created using two Hamming windows with 64-point length and 128-point length, respectively, for the time and frequency smoothing windows. It is very clear from these SPWVDs, that the frequency content of IMF3 is higher than that of IMF4, which in turn, has higher frequency content than that of IMF5. 9 subsets are used for training the dataset. The total data were randomly split into ten subsets. The whole procedure is repeated ten times using a different testing subset.
The final result is the average of all 10 repetitions. After the training and testing features were selected, they were then applied to a discriminant analysis classifier [91]. Discriminant analysis is a technique used to discriminate a single classification variable using various features. Discriminant analysis also assigns observations to one of the pre-defined groups based on the knowledge of the multi-features [92]. The usefulness of a discriminant model is based on its ability to predict the relationships among known groups in the categories of the dependent variable. In this paper, the MATLAB command "classify" was used and "diagQuadratic" was selected as a discriminant function type [91]. In order to evaluate the performance of the proposed method for seizure detection, the following statistical parameters were calculated [72]: (1) Sensitivity, (2) Specificity, and (3) Accuracy.

Results and discussion
The proposed algorithm for seizure detection is based on TFR analysis of several oscillating components broken down from the original signal via the EMD algorithm [54]. Localized energy estimates were extracted and considered as features fed into a classifier for discrimination between seizure and non-seizure data. The SPWV distribution was used to calculate the localized energy distribution of the signal.
After EMD decomposition, the seizure and non-seizure segments may have different total numbers of IMFs. Figure 4.2 shows that the non-seizure segment had twelve IMFs and the Seizure segment had only ten IMFs; this is because the number of IMFs depends on the spectral content of each signal [6]. Furthermore, it can be observed that the frequency content of these IMFs is organized from high to low, and the lower order IMFs capture fast oscillation modes of the signal, whereas the higher order IMFs capture the slow oscillation modes.
The IMF selection method proposed by Flandrin et al. [89] was used to automate selection of the IMFs used in order to reduce the impact of noise.     Accuracy (Acc)) of the proposed method using different combinations for the length of time and frequency smoothing windows.
achieving 98.61% accuracy shows that the estimated time-frequency localized energy is a better feature to detect the presence of a seizure in a tEEG signal. Furthermore, the classifiers (SVM and AdaBoost) used in our previous work [14] using the same dataset were tested in this study with the new selected features. The obtained results with the classifier used in this chapter gave better performance than SVM and AdaBoost.
Furthermore, there are other studies based on EMD decomposition for seizure detection such as Tafreshi et al. [76], in which non-Seizure data was distinguished from Seizure data with success rates up to 95.42%. Pachori and Bajaj [75] used the area measured from the trace of the analytic IMFs as features to analyze normal and epileptic human seizure EEG signals and they showed that those calculated areas gave good discrimination performance. Comparing our results to those in the literature which used EMD and TFRs for seizure detection, the high accuracy obtained from our method, with 98.61% accuracy, using data from freely moving rats shows that our proposed method for seizure detection is competitive with other methods disclosed.
Again the human data used in other researcher's work often has fewer interfering artifacts than the data recorded while the rats were moving, grooming, etc. which often renders the rat's data more difficult to analyze and classify.

Summary
In this chapter, we proposed an algorithm for seizure detection based on the recognition of relevant changes of the signal's frequency content in different sub-band levels. For this purpose, time-frequency localization of signal energy was essential.
The analysis is performed in three stages: (i) decomposition of tEEG seizure and nonseizure segments into oscillation IMF components using the EMD method; this process helps the analysis of the signal using different frequency levels, (ii) TF analysis of each selected IMF. (iii) Feature extraction. An energy estimate was applied to each selected IMF on specific selected windows length (one second duration) and (iv) classification of the EEG segment (to check the existence of seizure or not on each segment and classify them), was performed using a diagonal-quadratic discriminant analysis classifier. The methods were evaluated using a rat dataset recorded with the new electrodes TCRE. Sensitivity, specificity and the accuracy results were presented and shown that the proposed algorithm is effective to detect seizures from rat's tEEG signals.

Introduction
Frequency modulated (FM) signals, also known as chirp signals, are nonstationary signals that may serve as a paradigm for some non-stationary deterministic signals [93].

Fractional Fourier Transform (FrFT)
The Fractional Fourier Transform [93] was first introduced by Namias in the field of quantum mechanics for solving some classes of differential equations efficiently.
Since then, a number of applications of FrFT have been developed. It was discussed in chapter 2 about the importance and ubiquity of the ordinary Fourier transform (FT).
The FrFT is a generalization of the ordinary Fourier Transform with more flexibility, rich in theory and comparable computational cost. The FFT has many applications in filter design [96], communications [97], and TFRs [98]. Ozaktas et al. [94,95] where, α is a real number called the order of the FrFT, where n is an integer.
The kernel ) , ( f t K φ has the following properties: That is why the FrFT is considered as a generalization of the FT. Moreover, the FrFT is linear, commutative and associative. In general, the FrFT computation can be interpreted as multiplication by a chirp signal in one domain followed by a FT, then multiplication by a chirp signal in the transform domain and finally a complex scaling.   When the FrFT axis of rotation matches the chirp rate of the signal, the FrFT of the chirp signal will produce an impulse and the magnitude of the FrFT will reach its maximum magnitude [93][94][95][96][97]. The corresponding order of the FrFT is called the optimum α : with Fs equal to the sampling frequency (in Hz) and N is the number of samples in the chirp signal. Equation (5.5) shows that the appropriate choice of the FrFT fraction order depends on the chirp rate. However, the FrFT optimum order α cannot be found

Property Order FrFT
Identity operator  [96]. "I" is the identity matrix.
analytically, especially if we don't have any information about the chirp signal. For this reason, a one dimensional search algorithm must be applied to select the optimal order [ ] 1 , 1 − ∈ α using a fine spacing to get a good estimate of this order. The optimum value of α will be selected when the energy of the linear FM chirp focuses well in the time-frequency plane and the FrFT spectrum is highly concentrated and reaches a maximum magnitude.

The relation between the FrFT and the Wigner-Ville Distribution (WVD)
The rotation of the time-frequency plane by an angle φ in the transformed coordinates ) , ( v u used in the FrFT is given by:   FrFT and the WVD is [115,116]: It is shown in Figure 5.2-C that the Fourier Transform spectrum of the chirp signal is spread out over many frequencies whereas its FrFTs in Figure 5 interpreted it as a rotation operator in the time-frequency plane [97]. This rotation will appear as a horizontal chirp when the optimum value of α is used in the FrFT. Hence, it can be said that the WD is rotationally covariant to the effect of the FrFT [99].

Multi-component FM chirp signals decomposition using the FrFT
One important advantage of time-frequency representations (TFRs) is the capability of analyzing non-stationary signals [100][101][102][103][104]. Among these TFRs, it was proven that the WVD is a very effective representation with high energy resolution in both time and frequency [40]. However, with multi-component signals the WVD     To decompose the second FM chirp signal, this process will be repeated as shown in Figure 5.6. In this case the concentrated impulse with the highest FrFT peak value will appear when the optimum order value is equal to 155 . 0 2 − = opt α . After selecting the optimum order for the second chirp, a rectangular window was used to filter out the concentrated narrow pulse of this chirp from the Fractional Fourier domain. Only the peak and two bins on either side of the peak are retained as shown in Figure 5.6-C.

Example 4:
The FrFT-based decomposition algorithm was tested on the real-life bat echolocation chirp signal [122]. The signal is a 400 sample long recording of a bat chirp signal sampled with a sampling period of 7 microseconds. The WVD ( Figure   5.12-A) shows this signal to contain three strong, quasi-linear FM chirp components.

Limitations of FrFT-based multi-component signals decomposition algorithm
In the previous section, we have illustrated the methodology of the FrFT in decomposing multi-component FM chirp signals. We also shows how this algorithm can successfully decompose some non-linear and non-crossing FM chirp signals. The algorithm was also applied on a bat echolocation signal which consisted of multiple chirp signals with approximately linearly changing frequency behavior.
The FrFT-based multi-component signal decomposition algorithm described in  To solve the problem of decomposing multi-component signals composed of nonlinear FM chirp components, we propose to use the principle of warping-based Time-Frequency Representations [105,106,108,109]. In signal processing, warping operators have been used to build Time-Frequency Representations with reduced inner-interference terms [105,106,108,109]. In this chapter, the principle of time warping will be used to linearize the time-frequency support of non-linear chirp signals. Definitions of time warping with all the equations will be given in the next section.
The principle of the proposed warping-based multi-component FM chirp decomposition algorithm is to apply time warping-based Time-Frequency Representations on the multi-component signal to linearize as much as possible each occurrence of a non-linear FM chirp component we want to extract. As a result of the linearization, the corresponding spectrum will have a narrower, higher amplitude and the WVD will show a linear time-frequency structure. The next step is to filter out the spectrum peak using a rectangular window. The last step is to apply the inverse transform, unwarp the filtered signal and plot the WVD of the extracted non-linear FM chirp component, using the original (unwarped) time-frequency axis.
Comparing the proposed warping-based multi-component signal decomposition algorithm to the FrFT-based multi-component signal decomposition, one can say that the general framework of both algorithms is the same. The principle is to filter out in each iteration the spectrum with the higher amplitude corresponding to a "linearized" FM chirp component.
Another drawback of the FrFT-based decomposition is the existence of the innerinterference terms in the time-frequency plots of the extracted component as seen in . This problem appears when the components are non-linear chirps which means that the FrFT-based decomposition cannot reduce the innerinterference terms, which we will demonstrate, is not the case of the proposed time warping-based decomposition.
In the next sections of this chapter, we will explain the new warping function and we will provide some examples demonstrating that the new warping function can work better than the power warping function used in the literature to linearize the time-frequency structure of certain non-linear FM chirp signals. We will also show that the time-frequency distribution of the extracted component using the warpingbased decomposition algorithm exhibits reduced inner-interference terms.

Time-warping principle
One simple and powerful tool used to generate new classes of TFRs is the unitary similarity transformation [105,106,108,109,118]. It permits construction of new TFRs to match closely any FM chirp signal with one-to-one group delay or instantaneous frequency characteristics [105,106,108].
be a signal that is a non-linear function of "t". Mathematically, if ) ( ) ( : is the warping operator, its effect on ) (t x is given by [105,106]: ) (t w is the time-warping function, which is assumed to be a smooth one-to-one function [105], and ) ( , provide examples of useful warping operators [117,118]. Different power classes are obtained by varying the parameter k . The affine class of TFRs [108] is obtained when 1 = k . The th k exponential class quadratic time-frequency representations (QTFRs) are obtained when , the th k exponential power yields the exponential class [109,118]. Using the warping , Iaona Cornel et al. [108] proposed a new method to solve the noise robustness performance of classical high-order methods polynomial signal processing. G. Le Touzé et al. [111] used an axis warping operator to realize mode filtering adapted to guided waves. To avoid the problems due to non-linear timefrequency structures, they used a classical unitary operator [105]. For the th m mode, where C1 is the velocity in the water layer, D the depth guide and R the source-sensor distance, their warping function is Let the signal ) (t x be given by: is the non-negative amplitude, ) (t ϕ is the phase and is the instantaneous frequency. Ideally, the warping function for the FM signal is defined as the functional inverse of its phase [105]: exists and is one-to-one, we can use it to warp the signal in equation (5.11) to obtain the following form: The following FM signal with constant amplitude demonstrates the property of warping operators for time-frequency linearization. Let: be a power chirp signal with the phase power parameter k . The associated warping operator can be defined as [119]: 14) The first-order derivative of the function ) (t w is given by: And the inverse function is define by: is given by: The WVD of the signal ) ( 1 t x is depicted in Figure 5.14-A. The WVD of the warped signal ) ( 1 t W x and its FT spectrum are shown in Figures 5.14-C and 5.14-D, respectively. Figure 5.14-B shows the time-warping function in Figure 5.14-C. This example demonstrates the capability of the warping operator concept proposed in [105,118] to "linearize" the time-frequency content of a power signal.

New warping function
One goal of a warping operator is to linearize the TF behavior of a signal [105,106].
To warp the signal, the time-warping function associated with the monomial of highest order, 7 . 0 t , is as follows: Time  the results produced by warping using one more degree of freedom, the parameter "a", used to "tune" the new warping function to provide an optimal linearization for the given signal. Let's suppose, for the purposes of illustration, that we somehow know what is the best "a" for the next few examples. The optimization of the parameter "a" will be described clearly in section 5.5.2. In Figure 5.16 we assume the parameter "a" is already optimized and it is equal to a=5.321e-5.  Metrics for comparing the performances of the two warping function will be discussed in the next section (section 5.5.2).

Performance analysis and parameter optimization
It is known that an ideal TFR has few inner-interference terms or cross-terms, high energy concentration and good TF resolution and readability [40,123]. One ad hoc way to evaluate cross-term suppression and energy concentration in the TF plane is by visual inspection. Alternatively, to estimate the optimum warping parameter "a" and to compare the performance of linearization (or warping-based TFRs) between the two warping functions = , we propose using numerical metrics such as the Rényi entropy [74,114], the ratio of norm-based measurement [121] and the maximum value of the warped signal spectrum.

Rényi entropy of a TFR
Entropy is a quantative measure proposed for computing the information content of a signal [74]. By the analogy to probability distributions, minimizing the complexity or information in a particular TFR is similar to maximizing the TFR concentration, or peakiness and often, readability [120]. The Rényi entropies make excellent measures of the information extraction performance of TFRs [74,113,114,120]. The Rényi entropy of a TFR ( ) , ) is defined as [74,114]: where σ is the order of entropy. Since the TFR cross-term oscillatory structure cancels under the integration with odd powers, "σ" should be an odd integer value for a general quadratic TFD [58]. In our case the value of the order σ was chosen to be 3 = σ [113,58]. In the case of TFRs, high entropy indicates that the energy concentration of an auto-component is low and vice versa, low entropy indicates that the energy concentration of an auto-component is high. In other words, a highly concentrated TFR reduces the Rényi entropy.

Ratio of Norm
An efficient quantitative criterion to evaluate the performance of any TFR can be obtained by measuring the energy concentration. The Ratio of Norm (RN) based measures proposed by Jones and Parks [121] was used to discriminate a low resolution TFR from a high resolution TFR. The RN was used to adapt the parameters of an analysis window to maximize the measure of concentration created by dividing the fourth power norm of a TFR by its second power norm. RN is given by [121]:

Estimation of optimum warping parameter "a"
The parameter "a" is used to tune the new warping function k t e t w at / 1 ) ( = to provide an optimal TF linearization for a given signal. To find the optimum value of the parameter "a", a one dimensional search was done based upon quantitative measures of concentration. In the search algorithm, warping was computed for different values of "a" along a grid and the value that yields the maximum peak value of the FT spectrum, along with the maximum value of the Ratio of Norm and the minimum value of the Rényi entropy was selected. Using the selected optimum value of "a", it is proposed that the warped signal will have higher energy concentration (higher Ratio of Norm) with minimal inner interference terms (minimum Rényi entropy value) as compared to time-warping using the power function.
To show how the process for choosing the optimum value for the parameter "a" is done, the optimum parameter "a" for the signal selected (from Example 7). Notice that the phase is not a monomial, nor are the powers equal to integers as is typically required for polynomial phase estimation algorithms. For k=0.7, we tested 10 6 values of "a" on an interval ( ) 1 0 ∈ a , using a fine spacing to get a good estimate. If "a=0" we will have the same effect as the power warping function  Rényi entropy and the maximum value of Ratio of Norm was equal to a=5.321e-5.
The same process will be used to select the optimum value of parameter "a" for the other examples discussed in this chapter (Examples 6, 8 and 9). These values are shown in Table 5.4.

Performance comparison
To compare the performance of the new proposed warping function to a traditional warping function, the Rényi entropy, the Ratio of Norm and the maximum amplitude of the FT of the warped signal were calculated for several signals. The results are shown in Table 5.4. Discussing the other advantages of the new proposed time-warping function, we can say that the new warping function offers one more degree of freedom, the parameter "a", which helps to improve the linearization. Moreover, the new warping function can linearize with better performance many types of signals whether the order of the power term in the phase function is either integer or real, which is not the case of the polynomial Wigner-Ville distribution (PWVD) method [112,123]

Ratio of Norm
(higher value is better)

Max(|FT(warped(sig))|)
(higher value is better)  In practical applications, the use of the new warping function requires setting up a search for the parameter "a". In this chapter, we assumed that the order of the signal "k" was known.

Warping-based multi-component non-linear FM chirp signal decomposition
The principle of the new algorithm used to decompose a multi-component signal composed of the sum of non-linear FM chirp signals is based on the concept of warping. In this case, time warping was used to linearize the time-frequency behavior of the non-linear FM chirp components. The result will be a FT spectrum with larger, more concentrated peak amplitudes and a more linear time-frequency structure. After that, a rectangular window will be applied to filter out the peak in the spectrum with the largest amplitude. An inverse transform will be applied.
We already discussed in Example 5 how the FrFT-based decomposition algorithm failed to decompose this signal.
Applying the new warping-based signal decomposition algorithm, shown in Figure   5.19, the different steps to extract the two components ) ( 1 t . The result is a spectrum with larger amplitude. This corresponds to the FT seen in Figure 5.21-A.
A rectangular window of six bins was used to filter the spectrum with the largest amplitude, the result is shown in Figure 5.21-B. The WVD of the extracted component after applying the inverse transform is shown in Figure 5.21-C.
. (E) The FT spectrum after filtering the spectrum peak with larger amplitude. (F) The WVD of the reconstructed chirp signal ) The same process was applied to decompose the second chirp component ) . The result is a spectrum with larger amplitude corresponding to the second signal as seen in Figure 5.21-D. A rectangular window of four bins was used to filter the spectrum peak with the largest amplitude; the result is shown in Figure 5.21-E. The WVD of the new reconstructed signal ) ( 2 t x is plotted after applying the inverse transform as shown in Figure 5.21-F. ; and for the signal , the IF law is: The estimated IF is calculated as:   The mean squared error (MSE) [129]

Future works
The new proposed warping function compared to the power warping function improved linearizing signals with polynomial phase. However, in some cases this linearization is imperfect as seen in Example 9. We also showed how the readability of the unwarped signals is improved with much reduced inner-interferences as seen in Examples 10 and 11. Nevertheless, we mention that this new algorithm requires the and ) (t x C in subplots A, B, and C, respectively. knowledge of the signal power order "k" which characterizes the signals being analyzed.
From Examples 10 and 11, we have shown that in each step in which we want to decompose a component from the multi-component signal, we need to match its highest power order "k" and the corresponding warping parameter "a". This means that there is a relation between the optimization of the time warping and the signal order. Some works have already been done in the literature using the warping function k t t w / 1 ) ( = with unknown k [86]. In their work, the authors used an unsupervised approach based on the Hilbert transform to estimate a non-linear monomial power warping function which characterizes the time frequency content of the signal [86].
Since we already demonstrated with some examples in this work the dependence of the warping function to the order of the signal, we propose in the future to develop new techniques which help to estimate the parameters such as the order of non-linear chirp signals using the warping principle.
In the next example, we will explain some preliminary results to show that the new warping can be a useful technique to estimate the signal order. We already discussed in this chapter that the purpose of the warping the signal was to linearize the time-frequency structure of this signal. This leads to a Fourier spectrum with larger amplitude and a narrow peak compared to other non-linear signals. In the next example, we consider the signal ) ( 2 3 3

Summary
In this chapter, a review of how the FrFT has been used to decompose multi- It was discussed earlier in this chapter that the length of the rectangular window used to filter the peak spectrum with higher amplitude was set depending on the minimum MSE calculated between the original signal and the retrieved signal. It was also assumed that the user knows a priori the number of signal components. However, when one does not know any information about the original signal, this method of determining the length of the window is not practical. So in our future work we will try to automate the selection of this length by finding a relationship between the type of the chirp signal and the width of the window used. Future research also includes applying rank estimation techniques to determine the number of signal components, as well as algorithms to estimate the power parameter, k.

CHAPTER 6
CONCLUSION AND FUTURE PERSPECTIVES

Conclusion
Signal Processing is a topic that was used long time ago and still is used in almost all areas including biomedical and electrical domains. One focus of this dissertation was to develop and test new algorithms for epileptic seizure detection and compare the efficacy of the algorithms in data recorded using conventional disc electrodes (EEG) versus novel tripolar concentric ring disc electrodes (tEEG). These algorithms involved different signal processing techniques which showed that the proposed algorithms are competitive to those in the literature and some of them outperform those techniques.
The design process of the developed algorithms involves the determination of discriminating boundaries between seizure and non-seizure EEG data by parametric representation of seizure EEG in classification algorithms. In this thesis, the first proposed algorithm used time-domain features, such as which were extracted to characterize the seizure from non-seizure segments. The algorithm was evaluated using human and rat data. The results showed the robustness of time-domain features; high accuracy was achieved using the rats' dataset even though the data were highly contaminated with motion artifacts.
It has been shown that entropy can be used as a complexity measure for analyzing biological signals such as brain signals or EEG because the brain is a complicated dynamical system with uncertainties. We have developed a two stage seizure detection technique to classify seizure from non-seizure segments. In the first stage, a signal was decomposed into different frequency sub-bands using either the Empirical Mode Decomposition (EMD) method or the discrete dyadic Wavelet transform (DWT). In the second stage, the Rényi entropy was computed and input into a classifier to discriminate between seizure and non-seizure activities. According to the high accuracy obtained, Rényi entropy combined with EMD or DWT is effective and applicable for discriminating Seizures from Normal EEG recordings.
Because of the non-stationary behavior of EEG signal during seizure activities, The main limitations of our proposed algorithms used for seizure detection can be summarized as follow: (1) The datasets used to prove the performance of the proposed algorithms for seizure detection. For example it was mentioned in chapter 3 that the rats' dataset was minimized into just seizure and non-seizure segments because of the grooming and moving of rats which infect the data by noise. (2)

Future works
In this study, the algorithms proposed for seizure detection were focused on an offline analysis of EEG signals recorded during seizure events. Prediction and detection of seizures from an online recorded EEG can significantly improve the life of patients with seizure by preventing the negative influence of uncontrolled seizures.
Hence, an automated seizure warning algorithm is desired that can analyze the EEG signals directly from patients with the objective of predicting the appearance of seizure during real time recording.
To analyze non-linear chirp signals, the FrFT equation for linear chirp signals has been adopted. In this case the FrFT spectrum of non-linear chirps were spread out which means more bins have to be retained on both sides of the FrFT peak. Future works will consider how to adapt the FrFT for non-linear chirp signals.
It was discussed in chapter 5 that the length of the rectangular window used to mask out the spectrum with higher amplitude was set depending on the minimum MSE calculated between the original signal and the retrieved signal. However, when one does not know any information about the original signal, this method of determining the length of the window is practical. So in our future work we will try to automate this length by finding a relationship between the type of the chirp signal and the width of the window used. Future research also includes applying warping-based analysis to estimate the signal order, k, and the number of signal components, n, as well as finding new warping operators that can exploit the Lambert-W function.