MODELING BRAIN DESYNCHRONIZATION BY EEG SENSOR VARIANCE IN EPILEPTIC PATIENTS

The use of electroencephalogram (EEG) recordings for seizure prediction in epileptic patients has grown steadily with access to greater computing power. Methods of seizure analysis to date have focused on modeling and computer-aided machine learning to increase the sensitivity and specificity of seizure detection. Desynchronization of various brain areas at the onset of a seizure tends to be a common feature, followed by a resynchronization of those areas at the end of the seizure. While previous methods have looked at the cross-correlation or lag-correlation of only two areas of the brain, most EEG data today come from sensor arrays that can easily cover 15-20 areas of the brain. The goal of this research is to take a relatively new approach to statistical modeling of multivariate EEG data, using the variance across multiple sensors as an extended, time series measure of brain desynchronization. The Children's Hospital Boston and Massachusetts Institute of Technology (CHB-MIT) scalp database from PhysioNet is used to demonstrate the potential effectiveness of modeling multivariate EEG data by assessing the overall variance between sensors for three young patients with intractable seizures, by use of a DCC-GARCH model, a Bayesian regime switching mixture model, and a Bayesian change-point model.

ACKNOWLEDGMENTS

I would like to thank my wonderful advisor Dr. Gavino Puggioni for all his time and patience over the past years. Without his insights this research wouldn't have been possible. Thanks also to my committee members Dr. Natallia Katenka and Dr. Kunal Mankodiya for their resourcefulness. Special thanks to Dr. Walter Besio for his EEG knowledge and continual assistance. Thanks to my Behavioral Science advisor Dr. Lisa Harlow for her years of support and for being my Statistics defense chair. Last, but certainly not least, thank you to Dr. Liliana Gonzalez for accepting me into the Statistics department and for her invaluable input and support.


Epilepsy is a prevalent disease; the World Health Organization (WHO) estimates it affects around 50 million individuals [1]. A seizure itself is classified as an overabundance of neuronal electrical activity in the brain, resulting in moderate to severe shaking along with the potential loss of consciousness and of bowel or bladder control [1]. The impact a seizure can have on an individual is not only physical, but also a matter of mental state and well-being.
While individuals can have an isolated seizure at one point and never have one again, patients with epilepsy are those that suffer more than one seizure in their lifetime, and in most cases more than one in a single day when not properly treated [1].
Intractable seizures can occur in patients with epilepsy, meaning they do not respond to medication-based treatment. It is estimated that about 70% of patients with epilepsy respond to medication; however, this also means approximately 3 in 10 patients do not [1]. Surgical options are then considered, with sensors implanted in the brain to determine the actual area of seizure activity to be resected [2].
Commonly, scalp electroencephalogram (EEG) is used to measure the electrical brain activity at various areas of the brain simultaneously. A common EEG recording of many simultaneous sensors is displayed in Figure 1.1 [3,4]. Recent work has extended the use of EEG recordings to ambulatory monitoring, which allows patients to go about their daily routines without being wired to a machine [2,4,5,6]. Real-time monitoring of EEG provides a much larger amount of available data, as patients can wear the ambulatory headsets throughout the day during their normal routines. Post-hoc time series analysis of ambulatory EEG in recent years has led to many alternative approaches to the analysis of seizures and their onset [7]. The use of battery-powered ambulatory devices for monitoring EEG in real time or post hoc allows for a much richer and more complex amount of data [6]. One common study, for example, typically uses 22 sensors recording simultaneously at a resolution of 256 Hz: for one hour of recording, a total of 921,600 data points is available for each of the 22 sensors, or 20,275,200 data points in all [3,4]. Supervised multivariate machine learning algorithms have dominated current research in seizure analysis; however, the power required for real-time analysis also needs to be considered [5,6,7]. Bayesian modeling approaches have also been applied to EEG recordings, with a focus on modeling EEG behavior rather than computer-extracted decisions based on maximum likelihood estimation [8,9].
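The data-volume figures quoted above follow from simple arithmetic; a quick sketch (sampling rate and sensor count as given in the CHB-MIT description):

```python
# Data volume for one hour of scalp EEG recorded at 256 Hz on 22 sensors.
SAMPLE_RATE_HZ = 256
SECONDS_PER_HOUR = 3600
N_SENSORS = 22

points_per_sensor = SAMPLE_RATE_HZ * SECONDS_PER_HOUR  # samples per channel per hour
total_points = points_per_sensor * N_SENSORS           # samples across all channels

print(points_per_sensor)  # 921600
print(total_points)       # 20275200
```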
For consistency, EEG recordings are referenced by scalp locations as defined by the international 10-20 system of electrode placement [10,11,12]. Figure 1.2 highlights the areas located within the 10-10 system. It is worth noting that not all areas are necessarily included, as EEG electrode set-ups vary in number and positioning. The 10-20 system is a necessity for consistency in noting electrode placement. For example, the CHB-MIT scalp database utilizes almost two dozen electrodes [3,4], while other studies have used varying numbers of electrodes [7] and configurations [13]. EEG recording allows not only for time series analysis of multivariate data, but also for the potential use of spatio-temporal monitoring [8,13]. Localization of seizures is another issue in itself, and an important one; however, the actual modeling of seizures is still a developing field. The benefit of the 10-20 system is that it allows not only for analysis across time, but also for the inclusion of spatial location.

Brain Synchronization
A common feature of seizures is the desynchronization and resynchronization of synaptic activity in the brain [14,15,16,17]. Typically, the electrical activity of the brain will show high levels of desynchronization at the onset of a seizure, followed by strong resynchronization of the signals at the end of the seizure [17]. The scalp electrodes of the EEG monitor the synaptic activity of neurons in the brain, with a minimum of roughly 10⁸ neurons over 6 cm² of cortex necessary to produce a response at an EEG sensor [15]. Some research has suggested a strong synchronization between various areas of the brain potentially hours leading into a seizure; however, this seems to be extrapolated only when knowledge of an occurring seizure is also present [16].
Levels of neuronal synchronization can be measured either against a set threshold or across time, in which case they can be interpreted as increasing or decreasing trends [17]. Because cortical changes in synchronization are so often present in EEG recordings, synchronization offers strong potential for seizure localization and detection [14].
Typical approaches to addressing brain synchronization have focused on the correlation between two sensors. A common measure is the mean phase coherence R, also referred to as the first Fourier mode or the phase-locking value [14].
Another method is to utilize the correlation matrix of the normalized sensor data at each time point and analyze the eigenvalues of the matrix across time [16]. Other autoregressive measures and transformations have been utilized, but generally are only beneficial when the knowledge of an ictal state occurring is included in a post-hoc analysis [16].

A Look Forward
Investigation of EEG recordings to help monitor and potentially prevent seizures in epileptic patients has a tremendous potential impact on the health of such patients. Pre-ictal seizure states have been a relative mystery, but recent increases in data and computing ability have allowed a much more thorough focus on seizure prediction. Modeling EEG data offers many benefits, challenges, and possibilities, particularly because of the massive amount of data combined in every patient's record. Various multivariate and univariate analytic methods have shown predictive potential, as have frequentist and Bayesian approaches. Melding these ideas into a solid modeling approach with the potential for low-power ambulatory monitoring and care becomes a main goal for any seizure-based research.
The Children's Hospital Boston and Massachusetts Institute of Technology (CHB-MIT) scalp database from PhysioNet [3,4] is accessed to demonstrate the potential effectiveness of modeling multivariate EEG data by assessing the overall variance between sensors for young patients with intractable seizures. Recent articles using the same data set have addressed the question of real-life practical use of EEG analysis and power reduction approaches to make acquisition and possible prevention of seizures more plausible [5,6]. Several other recent articles have addressed the CHB-MIT dataset through machine learning, with good predictive success but at the cost of lower potential ambulatory usage [2,18]. What is lacking in the research on this dataset is a thorough statistical modeling approach, particularly one aimed at being less computationally expensive and more practical.
By focusing on the idea of brain desynchronization through modeling EEG sensor variance, it is anticipated that predictions will display both a strong sensitivity and specificity to detecting seizures before they actually occur. Time series modeling will be utilized to delineate the processes preceding the seizure onset.
Early detection of seizures is useful for patients such as those in the CHB-MIT database, who have not responded to medication and could benefit from implanted devices that counteract seizures before they occur. To do this, natural patterns of the levels of synchrony in the brain will be analyzed.

ARCH & GARCH Models
The use of multivariate measures with a large number of data points can be tedious; however, the dynamic conditional correlation generalized autoregressive conditional heteroskedasticity (DCC-GARCH) model allows for a streamlined approach to the task. Initially, the autoregressive conditional heteroskedasticity (ARCH) model was introduced to deal with volatile financial data [5]. The ARCH(p) model relates a zero-mean series to its autoregressive volatility under an assumption of normality s.t.:

$$ y_t \mid \psi_{t-1} \sim N(0, h_t) $$

where $\psi_t$ denotes the information available up to time $t$ and $h_t$ represents the variance function of the data, which can be stated in terms of $p$ autoregressive parameters estimated as $\alpha$:

$$ h_t = \alpha_0 + \sum_{i=1}^{p} \alpha_i y_{t-i}^2 $$

The $p$ ARCH model parameters are estimated using maximum likelihood, which results in the values of $\alpha$ that best optimize the model.
The ARCH model was further extended a few years later to the generalized autoregressive conditional heteroskedasticity (GARCH) model, which allows not only the autoregressive parameters $p$ but also moving average parameters $q$ in an ARMA-style model for the error variance [6]. The stochastic GARCH model also assumes normality and is specified as:

$$ y_t \mid \psi_{t-1} \sim N(0, h_t), \qquad h_t = \alpha_0 + \sum_{i=1}^{p} \alpha_i y_{t-i}^2 + \sum_{j=1}^{q} \beta_j h_{t-j} $$

The GARCH(p, q) model can also be expressed in another variation as follows [6]:

$$ y_t = \sqrt{h_t}\,\eta_t, \qquad y_t^2 = \alpha_0 + \sum_{i=1}^{m} (\alpha_i + \beta_i)\, y_{t-i}^2 + \nu_t - \sum_{j=1}^{q} \beta_j \nu_{t-j} $$

where $\eta_t \sim N(0, 1)$ and $\nu_t = y_t^2 - h_t$ is uncorrelated across time with a mean of zero. The latter expression gives a time series ARMA model for $y_t^2$ with orders $m = \max(p, q)$ and $q$.
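As a concrete illustration of the first specification, the following sketch simulates a GARCH(1,1) series $y_t = \sqrt{h_t}\,\eta_t$ directly from the variance recursion; the parameter values are arbitrary illustrative choices, not estimates from the EEG data:

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_garch11(omega, alpha, beta, n):
    """Simulate y_t = sqrt(h_t)*eta_t with h_t = omega + alpha*y_{t-1}^2 + beta*h_{t-1}."""
    y, h = np.zeros(n), np.zeros(n)
    h[0] = omega / (1.0 - alpha - beta)          # start at the unconditional variance
    y[0] = np.sqrt(h[0]) * rng.standard_normal()
    for t in range(1, n):
        h[t] = omega + alpha * y[t - 1] ** 2 + beta * h[t - 1]
        y[t] = np.sqrt(h[t]) * rng.standard_normal()
    return y, h

# alpha + beta < 1, so the series is stationary with unconditional
# variance omega / (1 - alpha - beta) = 1.0 here.
y, h = simulate_garch11(omega=0.1, alpha=0.1, beta=0.8, n=5000)
```

Volatility clustering appears as runs of large |y_t|, and the sample variance of y hovers near the unconditional value.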

Multivariate DCC-GARCH Models
A multivariate version of the GARCH model relies on the conditional correlation of the sensor values across time. The DCC-GARCH model [7] focuses on the time-varying covariance matrix $H_t$ such that:

$$ H_t = D_t R_t D_t, \qquad D_t = \mathrm{diag}\big(\sqrt{h_{1,t}}, \ldots, \sqrt{h_{N,t}}\big) $$

where the $h_{i,t}$ are given by univariate GARCH models and $R_t$ is the conditional correlation matrix of the standardized disturbances

$$ \epsilon_t = D_t^{-1} r_t $$

which are standard normal. In the DCC model, the correlation matrix $R_t$ is allowed to vary with time, and the conditional variances of the standardized disturbances equal unity [7]. The correlation dynamics can be represented as a GARCH(1,1)-type recursion [7,8]:

$$ Q_t = (1 - \alpha - \beta)\,\bar{Q} + \alpha\, \epsilon_{t-1}\epsilon_{t-1}' + \beta\, Q_{t-1} $$

where $\bar{Q}$ holds the expected cross-products $\bar{\rho}_{i,j}$. The correlations can then be estimated as:

$$ \rho_{i,j,t} = \frac{q_{i,j,t}}{\sqrt{q_{i,i,t}\, q_{j,j,t}}} $$

and the result is positive definite, since the covariance matrix $Q_t$ is a weighted average of positive definite and positive semidefinite matrices. The requirement $\alpha + \beta < 1$ must be met for the model to remain stationary and not become explosive [7,8].
The DCC estimation is computed through a maximum likelihood procedure.
The DCC extension by Engle [7] helps to attain a more parsimonious (yet still very high) number of parameters. The multivariate GARCH is an extension of the univariate GARCH proposed by Bollerslev [6]. Autoregressive (AR) and moving average (MA) parameters can also be included in the univariate models to help increase multivariate model fit, though this adds even more parameters to the model.
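A minimal sketch of the DCC(1,1) recursion, assuming the univariate GARCH stage has already produced standardized residuals (simulated white noise stands in for them here):

```python
import numpy as np

def dcc_correlations(eps, a, b):
    """DCC(1,1) recursion: Q_t = (1-a-b)*Qbar + a*eps_{t-1}eps_{t-1}' + b*Q_{t-1},
    then rescale Q_t to the conditional correlation matrix R_t."""
    T, N = eps.shape
    Qbar = np.cov(eps, rowvar=False, bias=True)  # unconditional cross-products
    Q = Qbar.copy()
    R = np.empty((T, N, N))
    for t in range(T):
        if t > 0:
            e = eps[t - 1][:, None]
            Q = (1 - a - b) * Qbar + a * (e @ e.T) + b * Q
        d = 1.0 / np.sqrt(np.diag(Q))
        R[t] = Q * np.outer(d, d)                # unit diagonal by construction
    return R

rng = np.random.default_rng(1)
eps = rng.standard_normal((500, 3))              # stand-in standardized residuals
R = dcc_correlations(eps, a=0.05, b=0.90)        # a + b < 1 keeps the recursion stationary
```

Each slice `R[t]` is a valid correlation matrix because Q_t stays positive definite as a weighted average of positive (semi)definite matrices.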

Basics of Bayesian Inference
Often in Bayesian modeling we need to simulate from the posterior distributions. In contrast to frequentist statistics, Bayesian statistics focuses on probabilities given the available information and prior beliefs, which can be strong when evidence supports them or weak when there is little convincing evidence available. Bayes' rule can be applied to update probabilities of interest as new information is added [9]. Iterative simulation methods are used to approximate the joint posterior distribution by drawing samples from the full conditional distributions via processes such as a Gibbs sampler [9].

Inference of the Conditional Posterior Mean
Bayesian analysis can be utilized to determine posterior probability densities by Monte Carlo simulation, which sufficiently approximates the actual posterior density when enough consecutive iterations $i$ of sampling provide convergence. For example, estimation of a parameter $\theta$ from a sample $y_1, \ldots, y_n$ drawn from the distribution $\pi(y_1, \ldots, y_n \mid \theta)$ can be achieved via Markov chain Monte Carlo (MCMC) methods [9]. For independent iterations $i$ of sampling, the posterior distribution for $\theta$ is simulated s.t.:

$$ \theta^{(1)}, \ldots, \theta^{(i)} \overset{iid}{\sim} \pi(\theta \mid y_1, \ldots, y_n) $$

where increased values of $i$ give a better approximation of the true posterior distribution [9]. Joint posterior inference for the mean conditioned on the variance can be derived for a normally distributed model with mean $\theta$ and variance $\sigma^2$, i.e., $(y_1, \ldots, y_n \mid \theta, \sigma^2) \sim$ i.i.d. $N(\theta, \sigma^2)$:

$$ \pi(y_1, \ldots, y_n \mid \theta, \sigma^2) = (2\pi\sigma^2)^{-n/2} \exp\Big[ -\tfrac{1}{2\sigma^2} \sum_{i=1}^{n} (y_i - \theta)^2 \Big] $$

From the above equation derives a sufficient statistic based on $(\sum y_i^2, \sum y_i)$, which can also be expressed as $(\bar{y}, s^2)$ since $\bar{y} = \sum y_i / n$ and $s^2 = \sum (y_i - \bar{y})^2 / (n - 1)$ [9]. Posterior inference for $\theta$ conditioned on a known value of $\sigma^2$ with a conjugate prior for $\theta$ results in terms of the form $\exp\big[-c_1(\theta - c_2)^2\big]$; since such terms are proportional to a normal density, we can surmise that a normal $\pi(\theta \mid \sigma^2)$ is an appropriate conjugate prior [9]. If $\theta \sim N(\mu_0, \tau_0^2)$, then:

$$ \pi(\theta \mid \sigma^2, y_1, \ldots, y_n) \propto \exp\Big[ -\tfrac{1}{2}\big(a\theta^2 - 2b\theta\big) \Big] $$

when the following are substituted:

$$ a = \frac{1}{\tau_0^2} + \frac{n}{\sigma^2}, \qquad b = \frac{\mu_0}{\tau_0^2} + \frac{\sum y_i}{\sigma^2} $$

From this a normal density can be extrapolated for $\pi(\theta \mid \sigma^2, y_1, \ldots, y_n)$ s.t.:

$$ \pi(\theta \mid \sigma^2, y_1, \ldots, y_n) \propto \exp\Big[ -\tfrac{1}{2}\, a\, (\theta - b/a)^2 \Big] $$

where $1/\sqrt{a}$ represents the standard deviation and $b/a$ the mean of a normal distribution; thus $\pi(\theta \mid \sigma^2, y_1, \ldots, y_n)$ is normally distributed with mean $\mu_n$ and variance $\tau_n^2$. The data and prior information combine to form the joint posterior precision s.t.:

$$ \frac{1}{\tau_n^2} = \frac{1}{\tau_0^2} + \frac{n}{\sigma^2} $$

Similarly, the posterior mean can be determined from the amalgamation of the data and the prior:

$$ \mu_n = \frac{\mu_0 / \tau_0^2 + n\bar{y} / \sigma^2}{1/\tau_0^2 + n/\sigma^2} $$

If the prior is interpreted as $\kappa_0$ prior observations, the prior precision can be adjusted to $\kappa_0 / \sigma^2$. Based on this, the posterior mean would then be:

$$ \mu_n = \frac{\kappa_0 \mu_0 + n \bar{y}}{\kappa_0 + n} $$

Prediction of a new point $\tilde{Y} = \theta + \tilde{\epsilon}$ from previous observations $y_1, \ldots, y_n$ based on the normal posterior distribution thus follows. Since $\theta$ and $\tilde{\epsilon}$ are conditionally independent given the normally distributed data in $y_1, \ldots, y_n$, it follows that:

$$ \tilde{Y} \mid \sigma^2, y_1, \ldots, y_n \sim N\big(\mu_n, \tau_n^2 + \sigma^2\big) $$
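The closed-form posterior for the mean is easy to compute directly; a small sketch with made-up numbers (the data and the prior hyperparameters are illustrative only):

```python
import numpy as np

def normal_posterior(y, sigma2, mu0, tau2_0):
    """Posterior of theta | sigma^2, y for the conjugate normal model:
    precisions add, and the posterior mean is the precision-weighted
    average of the prior mean and the sample mean."""
    y = np.asarray(y)
    n, ybar = len(y), y.mean()
    prec_n = 1.0 / tau2_0 + n / sigma2               # posterior precision 1/tau_n^2
    mu_n = (mu0 / tau2_0 + n * ybar / sigma2) / prec_n
    return mu_n, 1.0 / prec_n                        # posterior mean and variance

# Vague prior (tau_0^2 = 100) and known sigma^2 = 1: the posterior
# mean sits very close to ybar = 5.
mu_n, var_n = normal_posterior([4.9, 5.2, 5.1, 4.8], sigma2=1.0, mu0=0.0, tau2_0=100.0)
```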

Joint Conditional Inference of Mean and Variance
In its simplest form, Bayes' rule for the posterior of the joint distribution of $\theta$ and $\sigma^2$ states that:

$$ \pi(\theta, \sigma^2 \mid y_1, \ldots, y_n) = \frac{\pi(y_1, \ldots, y_n \mid \theta, \sigma^2)\, \pi(\theta, \sigma^2)}{\pi(y_1, \ldots, y_n)} $$

The axioms of probability for a joint distribution also state:

$$ \pi(\theta, \sigma^2) = \pi(\theta \mid \sigma^2)\, \pi(\sigma^2) $$

When $\sigma^2$ is known and $\tau_0^2 = \sigma^2 / \kappa_0$, where $\mu_0$ and $\kappa_0$ are the mean and sample size from the prior observations, we can then state:

$$ \theta \mid \sigma^2 \sim N\big(\mu_0, \sigma^2 / \kappa_0\big) $$

Use of the gamma distribution is common for the prior of the precision, or $1/\sigma^2$.
When supplied with data $y_1, \ldots, y_n$ and $\sigma^2$, the conditional distribution of $\theta$ becomes:

$$ \theta \mid y_1, \ldots, y_n, \sigma^2 \sim N\Big( \mu_n, \frac{\sigma^2}{\kappa_n} \Big) $$

when:

$$ \kappa_n = \kappa_0 + n, \qquad \mu_n = \frac{\kappa_0 \mu_0 + n\bar{y}}{\kappa_n} $$

The conditional posterior of $\sigma^2$ can be derived mathematically in the same fashion. With the gamma prior $1/\sigma^2 \sim \text{Gamma}(\nu_0/2,\, \nu_0\sigma_0^2/2)$, it thus follows that:

$$ 1/\sigma^2 \mid \theta, y_1, \ldots, y_n \sim \text{Gamma}\Big( \frac{\nu_n}{2}, \frac{\nu_n \sigma_n^2(\theta)}{2} \Big) $$

where:

$$ \nu_n = \nu_0 + n, \qquad \sigma_n^2(\theta) = \frac{\nu_0\sigma_0^2 + \sum_{i=1}^{n}(y_i - \theta)^2}{\nu_n} $$

Gibbs Sampling for Bayesian Posterior Inference
Iterative Monte Carlo methods can be used to estimate posterior distributions, with convergence occurring after $I$ total iterations. Gibbs sampling is a commonly used Monte Carlo approach to estimating the full conditionals of the mean $\theta$ and precision $1/\sigma^2$ from data $y_1, \ldots, y_n$ and any supplied prior information [9,10,11,12]. Given the parameter values at the current iteration $i$ of the Gibbs sampler, we can draw $\theta$ conditional on $1/\sigma^2$ and then $1/\sigma^2$ conditional on $\theta$ as follows:

$$ \theta^{(i)} \sim \pi\big(\theta \mid (1/\sigma^2)^{(i-1)}, y_1, \ldots, y_n\big), \qquad (1/\sigma^2)^{(i)} \sim \pi\big(1/\sigma^2 \mid \theta^{(i)}, y_1, \ldots, y_n\big) $$

More generally, the flow of a Gibbs sampler follows a Markov chain Monte Carlo (MCMC) series of steps in which the $p$ estimated parameters $\phi = (\phi_1, \ldots, \phi_p)$ at iteration $i$ are related only to the parameters at iteration $i - 1$ through:

$$ \phi_1^{(i)} \sim \pi\big(\phi_1 \mid \phi_2^{(i-1)}, \ldots, \phi_p^{(i-1)}, y_1, \ldots, y_n\big) $$
$$ \vdots $$
$$ \phi_p^{(i)} \sim \pi\big(\phi_p \mid \phi_1^{(i)}, \ldots, \phi_{p-1}^{(i)}, y_1, \ldots, y_n\big) $$

The Gibbs sampling steps produce sequentially dependent vectors $\phi^{(1)}, \ldots, \phi^{(I)}$, each depending on the previous vector through the Markov property. The sampling distribution of $\phi^{(i)}$ becomes closer to the target joint distribution as $i$ gets larger. This can be stated, as $i \to \infty$, as:

$$ \Pr\big(\phi^{(i)} \in A\big) \to \pi(\phi \in A \mid y_1, \ldots, y_n) $$

and can be expanded to a function $g$ as $I \to \infty$:

$$ \frac{1}{I} \sum_{i=1}^{I} g\big(\phi^{(i)}\big) \to E\big[g(\phi) \mid y_1, \ldots, y_n\big] $$

Thus the mean of the complete set of iterative samples is a good approximation of $E[g(\phi)]$. It is important to note that Gibbs sampling is an approximate description of the posterior distribution, not a form of the model or an exact integral [9]; in other words, Gibbs sampling is an MCMC approximation of $\pi(\phi \mid y_1, \ldots, y_n)$.

Markov Switching Models

For the observations $y_1, \ldots, y_n$, a probabilistic statement can be made for the latent state variables $S = (S_1, \ldots, S_n)$, which depend only on the state-specific parameters $\theta_k$ and $\sigma^2_k$ for the states $k = 1, \ldots, K$ [13]. The Bayesian Gibbs sampler can iteratively approximate the distributions of these parameters, and thus we will also need a prior transition probability $\xi$. The priors for $\xi$ must meet the following:
3. Prior transition beliefs can allow e p > e q if deemed appropriate.
Bayesian posterior analysis for a Markov switching model based on y 1 , . . . , y n from the unconstrained space A results in a posterior of P A (• | y 1 , . . . , y n ).
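The alternation between the two full conditionals of the normal model described above can be sketched as a small Gibbs sampler; the prior hyperparameters here (μ₀, the κ₀ prior-observations weight, and a gamma prior on the precision) are illustrative choices, not values used in this research:

```python
import numpy as np

rng = np.random.default_rng(2)

def gibbs_normal(y, mu0, kappa0, a0, b0, iters=2000):
    """Gibbs sampler alternating theta | 1/sigma^2 and 1/sigma^2 | theta
    for y_i ~ N(theta, sigma^2), using the kappa0 prior-observations form
    for the mean and a Gamma(a0, b0) prior for the precision."""
    y = np.asarray(y)
    n, ybar = len(y), y.mean()
    theta, prec = ybar, 1.0 / y.var()            # crude starting values
    draws = np.empty((iters, 2))
    for i in range(iters):
        # theta | precision: conjugate normal update
        prec_n = (kappa0 + n) * prec
        mu_n = (kappa0 * mu0 + n * ybar) / (kappa0 + n)
        theta = rng.normal(mu_n, 1.0 / np.sqrt(prec_n))
        # 1/sigma^2 | theta: conjugate gamma update (scale = 1/rate)
        ss = np.sum((y - theta) ** 2)
        prec = rng.gamma(a0 + n / 2.0, 1.0 / (b0 + ss / 2.0))
        draws[i] = theta, prec
    return draws

y = rng.normal(3.0, 2.0, size=200)               # synthetic data: theta = 3, sigma = 2
draws = gibbs_normal(y, mu0=0.0, kappa0=0.01, a0=0.5, b0=0.5)
```

After discarding an initial burn-in, the mean of the θ draws approximates the posterior mean and the precision draws concentrate near 1/σ².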

Dirichlet Prior
Markov switching models have been defined to estimate the hidden states $S = (S_1, \ldots, S_n)$ by placing a Dirichlet prior on each row $\xi_j$ of the transition matrix, where $\alpha_j = (\alpha_{(j,1)}, \ldots, \alpha_{(j,K)})'$ and $j = 1{:}K$. At each iteration $i$, $\xi$ is sampled row-wise for $j = 1{:}K$, where $N_{(j,k)}(S)$ represents the number of state transitions from $S_j$ to $S_k$ in $S$, s.t.:

$$ \xi_j^{(i)} \mid S \sim \text{Dirichlet}\big(\alpha_{(j,1)} + N_{(j,1)}(S), \ldots, \alpha_{(j,K)} + N_{(j,K)}(S)\big) $$

The Gibbs steps then proceed as follows:

1. Sample $\xi^{(i)}$ row-wise from the Dirichlet posterior above.
2. Sample the state-specific parameters $\theta^{(i)}$ from their full conditionals given the current state assignments.
3. Sample the value of $S^{(i)}$ from $S \mid \xi^{(i)}, \theta^{(i)}, D_T$, which is accomplished by a forward filtering backward sampling (FFBS) routine [14,15]: (a) filter forward for $t = 1{:}T$ and sample $S_T^{(i)}$; (b) sample the latent state at time $t$ for values of $t = (T-1){:}1$ from the filtered distribution conditioned on the sampled $S_{t+1}^{(i)}$.
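The row-wise Dirichlet update from the transition counts described above can be sketched as follows; the prior counts α are arbitrary illustrative values that favor state persistence:

```python
import numpy as np

rng = np.random.default_rng(3)

def sample_transition_matrix(S, K, alpha):
    """Draw each row xi_j of the transition matrix from a Dirichlet whose
    parameters add the transition counts N(j,k) observed in the state
    sequence S to the prior counts alpha[j]."""
    counts = np.zeros((K, K))
    for s_prev, s_next in zip(S[:-1], S[1:]):
        counts[s_prev, s_next] += 1
    return np.vstack([rng.dirichlet(alpha[j] + counts[j]) for j in range(K)])

S = [0, 0, 0, 1, 1, 0, 0, 1, 0, 0]                 # toy two-state sequence
alpha = np.array([[8.0, 2.0], [2.0, 8.0]])         # "sticky" prior counts
xi = sample_transition_matrix(S, K=2, alpha=alpha) # each row sums to one
```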

Label Switching Problem
Because the state variable S is dummy coded, label switching poses a potential issue in the Gibbs sampling for Markov switching models. Use of a random permutation sampling (RPS) scheme has been advised to explore the full sampling space and determine if there is adequate separation for K [13,14]. At each step of the Gibbs sampler, an RPS step can be added as follows: 1. Randomly select a permutation ρ(1), . . . , ρ(K) of the state labels at the current iteration i. 2. Apply the random permutation to the state-specific parameters, the transition matrix, and the labels in S.

Random Permutation Sampling
Following the use of an unconstrained RPS scheme with proper separation of K, a constrained sampling step can be included at each iteration i of the Gibbs sampler. A common approach is to relabel the indicators S with an identifiability constraint s.t. (θ₁ < θ₂ < . . . < θ_K) [13,14]. Gibbs sampling with a constraint to prevent label switching can be done through the standard Gibbs sampling procedure with the following included at each i: 1. If the identifiability constraint on θ is violated, reorder the values of (θ₁, . . . , θ_K) as ordered pairs (1, θ₁), . . . , (K, θ_K), where the first component of each pair represents the new labelling of the classes, so that the identifiability constraint is met.
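The relabeling step under the identifiability constraint amounts to sorting the component means and remapping the state indicators; a minimal sketch:

```python
import numpy as np

def enforce_order(theta, sigma2, S):
    """Relabel mixture components so that theta_1 < ... < theta_K,
    carrying the variances along and remapping the indicators S."""
    order = np.argsort(theta)             # new ordering of the components
    relabel = np.empty_like(order)
    relabel[order] = np.arange(len(theta))
    return theta[order], sigma2[order], relabel[S]

theta, sigma2, S = enforce_order(
    np.array([2.0, -1.0]), np.array([0.5, 1.5]), np.array([0, 1, 1, 0])
)
# theta is now sorted, and every indicator in S points at the same
# component it did before the relabeling.
```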

Bayesian Change Point Modeling
Another adapted Bayesian approach is that of change point analysis.
Simply stated, change points are those points at which the underlying distribution suddenly changes, whether at one time or at multiple times. An example would be the change in state from pre-seizure to seizure and back to post-seizure. Gibbs sampling can again be used to estimate the posterior distributions of the parameters of the model, which now include the change points [16]. Assume we have a partition $\rho = (i_0, i_1, \ldots, i_b)$ s.t.:

$$ 0 = i_0 < i_1 < \cdots < i_b = n $$

and $\theta_i = \theta_{i_r}$ for $i_{r-1} < i \leq i_r$, $r = 1, 2, \ldots, b$, with change points at $i_1 + 1, i_2 + 1, \ldots, i_{b-1} + 1$. The block of times $i + 1, \ldots, j$ is denoted $ij$ and the corresponding observations from the series $Y_{ij}$ [16]. It is assumed that the points in $Y_{ij}$ are independent and normally distributed about a common block mean. In the conditional probabilities used below, $W_0$ and $W_1$ are within-block sums of squares (SS) while $B_0$ and $B_1$ represent the between-block SS for the cases $U_i = 0$ and $U_i = 1$ under $\rho$. Two tuning parameters bounded by 0 and 1 assist in model convergence when necessary [16,17]. Because of potential divergence within a long series, the probability of a change point at $i$ can conversely be calculated through ratios of incomplete beta integrals, which remain stable for longer series of observations in $Y$ [16].
The Gibbs sampling routine can be achieved for change point models with finite variables $U$ and $V$ which have a joint density $f(u, v)$ and conditionals $f(u \mid v)$ and $f(v \mid u)$. With starting values of $U^0$ and $V^0$ we can simulate new states $U^i$ and $V^i$ from $U^{i-1}$ and $V^{i-1}$ as follows:

$$ V^{i} \sim f\big(v \mid U^{i-1}\big), \qquad U^{i} \sim f\big(u \mid V^{i}\big) $$

Assuming the states have a stationary distribution with density $f(u, v)$, it can then be supposed that if $V^0$ has the proper marginal density, then $U^1$ has the proper conditional density given $V^0$, and thus $U^1$ and $V^0$ have a proper joint density. It follows that $U^1$ and $V^1$ have a proper joint density which mimics that of $U^0$ and $V^0$, and therefore $f(u, v)$ is a stationary distribution; it is unique if there is a finite chain of positive-probability transitions between any two states, i.e., between any $(u, v)$ and $(u', v')$ there must be a sequence of moves each with positive probability. With a unique stationary distribution, sampling the points $U^i$ and $V^i$ requires only the density $f(u, v)$. In a model with $n$ variables $U_1, \ldots, U_n$ where the transition probabilities between the change points are positive, a sample is drawn from $f(u_1, u_2, \ldots, u_n)$ with initial values $u_1^0, \ldots, u_n^0$, with transitions made one variable at a time conditioned on the other variables. In general, $U_k^i$ is sampled from the density

$$ f\big(u_k \mid u_1^{i}, \ldots, u_{k-1}^{i}, u_{k+1}^{i-1}, \ldots, u_n^{i-1}\big) $$

[16]. In terms of partitions between change points with $n$ observations, we can set $U_i = 1$ when a change point occurs after position $i$ and $U_i = 0$ otherwise. We are left with a transition of $U_i$ to 0 or 1 based on the relative probabilities of the partitions $f(U_1, \ldots, U_{i-1}, 1, U_{i+1}, \ldots, U_{n-1})$ and $f(U_1, \ldots, U_{i-1}, 0, U_{i+1}, \ldots, U_{n-1})$ [16].
The Bayes estimate is used to assess the change point probabilities in a Bayesian framework. A sweep through the data in $Y$ iteratively draws a value of $U_i$ for every $i$ in the sweep, given the data and the values of $U_j$, $j \neq i$. With $b$ blocks and $U_i = 0$, the conditional probability $f(U_i = 0 \mid Y, U_j, j \neq i)$ depends on $W_0$ and $B_0$, the within- and between-block SS respectively when $U_i = 0$; in a similar manner, $W_1$ and $B_1$ represent the corresponding SS when $U_i = 1$. The resulting conditional odds reduce to ratios of incomplete beta integrals, so we can calculate $f(U_i = 1 \mid Y, U_j, j \neq i)$ and sample $U_i$ [16].
For each sweep of the observations, the posterior mean $\mu_r$ can be determined for each block $ij \in \rho$ from the partition $\rho$ and $Y$. For $M$ sweeps through the data, the mean of the $M$ estimates is a valid approximation of $\mu_i$ given the data in $Y$. The same idea is applied to calculate the posterior mean of $\sigma^2$ over the $M$ passes through the data. It has been suggested that $50 \leq M \leq 500$ gives a fair approximation of the posteriors of $\mu$ and $\sigma^2$ [16].
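The within-block estimation step of each sweep can be illustrated with a simplified sketch in which, for a fixed partition, each block mean is estimated by the block's sample mean (the full Bayesian estimate additionally shrinks these values; this sketch omits that):

```python
import numpy as np

def block_means(y, changepoints):
    """Step-function estimate implied by a partition: within each block the
    mean is estimated by the block's sample mean (simplified illustration;
    the full Bayes estimate shrinks block means toward the overall mean)."""
    y = np.asarray(y)
    bounds = [0] + sorted(changepoints) + [len(y)]
    mu = np.empty(len(y))
    for lo, hi in zip(bounds[:-1], bounds[1:]):
        mu[lo:hi] = y[lo:hi].mean()
    return mu

y = np.array([0.0, 0.2, -0.1, 5.1, 4.9, 5.0])   # one obvious change point
mu = block_means(y, changepoints=[3])            # blocks: y[0:3] and y[3:6]
```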
Bayesian change point analysis has been utilized in biostatistics to model zero-mean heteroskedastic data such as electromyography (EMG) data [18]. The use of change point analysis in EMG logically suggests that its application to EEG data might also be warranted, especially given the change between seizure and non-seizure states.

CHAPTER 3
Modeling EEG Data

CHB-MIT Scalp Database
To demonstrate the use of a DCC-GARCH model, to be followed by modeling of the univariate measure of sensor variance, the CHB-MIT scalp database is accessed from PhysioNet [1,2]. Recent articles using the same data set have addressed the question of real-life ambulatory use of EEG analysis and power reduction approaches to make acquisition and possible early detection of seizures more plausible [3,4]. Several other recent articles have addressed the CHB-MIT dataset through machine learning, with good sensitivity and specificity but at the cost of lower potential ambulatory usage [5,6]. What is lacking in the research on this dataset is a thorough statistical modeling approach, particularly one aimed at being less computationally expensive and more practical.
Three select patients and clips from the CHB-MIT scalp database are used for demonstration and analysis going forward (chb05.13, chb08.02, and chb22.38).
All three of the clips are one hour in duration and follow the international 10-20 placement of the sensor data. The full recordings constitute a massive amount of data, and thus the use of a univariate summary measure of the data certainly has its benefits. Because of the high resolution and autocorrelation of so many data points, one point per second will be utilized, spaced equally 256 points apart from start to end, leaving 3,600 total points in each clip with which to sequentially assess sensor variance as well as for use in DCC-GARCH modeling. Patients chb05, chb08, and chb22 were ages 7, 3.5, and 9 respectively at the time of data collection. Patients chb05 and chb22 are both female, whereas chb08 is male. The ages of these individuals seem beneficial in that they are much less likely to have seizures influenced by other sources, such as alcohol, tobacco, or drugs, than an older teenager or young adult might be. The CHB-MIT database is very large and rather complex, with hundreds more seizure and non-seizure clips available for future analysis. Some clips are longer than one hour and others contain multiple seizures within one clip. The three clips chosen for this analysis are analogous in the sensors used, in that only one seizure occurs during each one-hour clip, and in that the seizure locations aren't too close to the beginning or end of the clips.
Analysis of the individual sensors is commonplace in EEG research, as is assessing differences between various spatial areas of the brain. Use of the international 10-20 EEG sensor placement in the CHB-MIT scalp dataset allows for varied sensors that can highlight changes in brain synchronization at the onset of a seizure. Thus the use of multivariate measures of the sensors is beneficial, and will be examined more in depth with the use of the DCC-GARCH model shortly. Figure 3.2 displays three individual sensors in the rostral area of the brain, running from left to right, from clip chb08.02. In a similar manner, Figure 3.5 displays the variance of the sensors, the logarithm of the sensor variance, and the corresponding densities for clip chb08.02; Figure 3.6 presents the same items for clip chb22.38. All three of these clips seem to suggest a mixture of normal densities, where the seizure data points follow a different mean and standard deviation than the non-seizure points. This knowledge will be helpful as we start modeling mixtures later in the chapter. The sensor variance can also be predicted by a DCC-GARCH model, which will help serve as a validation for the univariate measure of sensor variance derived from the multivariate EEG data available for each clip. This univariate measure of the sensor variance will be discussed shortly, and offers a potentially strong way to efficiently and computationally downsize EEG data while still keeping the overall integrity of the data.

DCC-GARCH Modeling
A relatively unused approach to modeling EEG sensor data is the use of a DCC-GARCH model. The multivariate DCC-GARCH can not only serve as a method all its own, but can also help serve as validation for the univariate modeling of sensor variance. The DCC-GARCH in this research will assess all 22 sensors during each one-hour clip, with a one-point-per-second specification as will be used in the forthcoming Bayesian models. The major difference is that the DCC-GARCH will not use the sensor variance at one measure per second, but will utilize the actual standardized EEG recordings. Thus there will be 22 sensors modeled at a total of 3,600 data points, or correspondingly one point for every second.
Specification for DCC-GARCH models allows for varied parameters of not only the ARMA(p, q) parameters but also for the GARCH(p, q) model parameters.

The Akaike information criterion (AIC) and Bayesian information criterion (BIC) will be used to determine the best model fit along with the most parsimonious fit. Typically, GARCH(1,1) models fit volatile data best, so the GARCH parameters will be set at (1,1); however, the autoregressive and moving average time series parameters will be varied to find the best model fit. Therefore each clip analyzed will be run with a GARCH(1,1) specification and varying ARMA orders, with model diagnostics presented in the corresponding figures. The covariance matrices of the actual and model-predicted values can be plotted as heat-map images to help visualize the matrices for chb08.02 (see Figure 3.8), with brighter colors closer to white representing higher covariance values in the matrices. To visually aid the difference between the actual and predicted covariance matrices for the sensor variance, the absolute value of the difference between the matrices is also presented in Figure 3.8. Ideally, we would like to see darker colors in the absolute difference of the covariances, since the model should be a relatively close predictor of the actual covariance of the 22 sensors and we would expect values closer to 0, which are represented as dark colors in the image plots. The DCC parameter estimates are presented in Table 3.2, along with the values for ω (see Table 3.3), α (see Table 3.4), and β (see Table 3.5). The parameter estimates are presented with standard errors (SE), t-values, and corresponding p-values for each of the sensors individually, based on the DCC-GARCH model used. Results for the remaining clips, including an ARMA(0,0)-GARCH(1,1) DCC model for chb22.38, are illustrated in Figure 3.11, and the corresponding covariance matrices are displayed in Figure 3.12. For compactness, the parameters of both models will be excluded.

EEG Sensor Variance
Common measures of EEG data regarding synchronization, desynchronization, and re-synchronization in seizures have focused on the correlations between only a few sensors [7,8,9,10]. It could then be expected that the patterns of synchrony (and the lack thereof) could be seen over a much larger spectrum, covering more spatial area in the brain, by utilizing the variance of the sensors at each time point. The sample variance $s^2$ addresses the amount of variation within a vector of sample values, commonly as:

$$ s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (y_i - \bar{y})^2 $$

where $n$ represents the sample size and $y_i$ represents the data in $y_1, \ldots, y_n$ with a sample mean of $\bar{y}$. This gives an unbiased variance of the sample of values, which is a valid representation of the expected population variance:

$$ E[s^2] = \sigma^2 $$

Iteratively, one can treat each time point $r$ among the $M$ per-second time points as an independent sample of the EEG state at that moment, and let $Y'$ be a storage vector of length $M$ where $Y'_r$ is the variance across the 22 sensors at time point $r$. This is the same as the sensor variance plotted in Figure 3.5 for chb08.02. The obvious benefit of this approach is that it can highlight desynchronization more simply than the complex multivariate measures available. Figure 3.13 displays a commonplace occurrence in not only the clip used for this example, but in many if not most of the seizure clips in the CHB-MIT database: a strong persistence of sensor variance primarily during the seizure. The persistence of the variance away from 0 seen in Figure 3.13 suggests that the variance between the 22 sensors is consistently larger during the seizure than during the non-seizure points. Some occasional rises in variance with notably high peaks in the non-seizure portions can also be seen, but these are presumed to be movement of the headset or other non-seizure phenomena, such as a sneeze, that might cause some disturbance. The long-term persistence during the 171 seconds of the professionally marked seizure occurrence is stark and quite easy to see simply by eye.
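The construction of the sensor-variance series can be sketched directly: take one multichannel sample per second from the 256 Hz recording and compute the cross-sensor sample variance at each retained time point (synthetic white noise stands in for the EEG matrix here):

```python
import numpy as np

def sensor_variance(X, step=256):
    """Cross-sensor sample variance taken once per second: X holds samples in
    rows and sensors in columns at 256 Hz, so keeping every `step`-th row
    gives one multichannel snapshot per second."""
    snapshots = X[::step]
    return snapshots.var(axis=1, ddof=1)     # unbiased variance across sensors

rng = np.random.default_rng(4)
X = rng.standard_normal((256 * 60, 22))      # one simulated minute of 22-channel EEG
v = sensor_variance(X)                       # 60 per-second variance values
log_v = np.log(v)                            # log transform used for later modeling
```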
As with any time-series data, it is important to observe the autocorrelation between the time points. Figure 3.14 shows the autocorrelation function (ACF) of the univariate variance measure extracted from the clip chb08.02. Because variance is used, all values are bounded below by 0, and thus a logarithmic transformation can be applied to make the truncated data more Gaussian (see Figure 3.15). Using the log of the variance allows a mixture of normals to be utilized, which would not be advisable for the original sensor variance seen in Figure 3.14 because of the heavy density of the data at and near 0. Future avenues could utilize a mixture of Gamma distributions; however, this research will opt to work with normal mixtures based on the log-transformation of the sensor variance. Notably, the density appears to be bimodal, with the right-hand side of the density in Figure 3.16 showing a slight increase relative to its left-hand equivalent, a result of the seizure sensor-variance persistence. The box-plots of the points from clip chb08.02 in Figure 3.17 highlight the one extremely low point that can also be seen in Figure 3.15; the rest of the points are, in general, well distributed. There are a handful of potential outliers in the non-seizure box-plot corresponding to higher peaks in the data, which again are not actual seizure points but movement noise. Notably, all of the seizure points fall
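The log transform and a sample ACF of the kind plotted in Figures 3.14–3.15 can be sketched as follows. This is an illustrative numpy version with a synthetic variance series standing in for the chb08.02 data.

```python
import numpy as np

def sample_acf(x, max_lag=20):
    """Sample autocorrelation function of a univariate series, lags 0..max_lag."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    denom = np.dot(x, x)
    return np.array([1.0] + [np.dot(x[:-k], x[k:]) / denom
                             for k in range(1, max_lag + 1)])

# Stand-in for the sensor-variance series: chi-square scaled values, which
# (like a real variance) are bounded below by 0.
rng = np.random.default_rng(2)
variance_series = rng.chisquare(df=21, size=3600) / 21
log_var = np.log(variance_series)   # removes the lower bound at 0
acf = sample_acf(log_var, max_lag=10)
print(acf[0])  # 1.0
```

For the stand-in white-noise series the ACF values beyond lag 0 are near zero; the real chb08.02 variance series shows substantial autocorrelation, which is exactly why the time-series models below are needed.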

Markov Regime Switching Mixture Models
The initial analyses for the three clips (Figures 3.4-3.6) have so far suggested that two-component models would be adequate, which fits ideally with the idea of a non-seizure state and a seizure state. The two states will be referred to in binary format as 0 (non-seizure) and 1 (seizure). Modeling of the two states, or regimes, can be done by assessing the point probabilities of being in a seizure, approximated with a Gibbs sampling algorithm. A Dirichlet prior will be utilized for the transition probabilities, and the number of transitions across the 3,600 sequential time points will be needed. The transitions for a standard model require four prior values: transitioning from non-seizure to seizure (p01), from seizure to non-seizure (p10), and staying in the non-seizure (p00) or seizure (p11) state. Unconstrained random permutation sampling (RPS) for all three clips is highlighted in Figure 3.18. Allowing the dummy-coded 0 and 1 labels to fluctuate randomly and plotting the two µ values on the x and y axes helps determine whether there is proper separation in the mixture densities. Because of the low frequency of seizure points, the separation of the points in Figure 3.18 is not terribly exaggerated; however, it is clear that two components (k = 2) are at work. This is a good indicator that a constrained RPS should be used. Therefore, the Markov regime switching models will be run with a constraint on the two component means such that µ0 < µ1.
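The transition-count and Dirichlet-posterior pieces of the sampler can be sketched directly. This is a minimal illustration of one conditional step of such a Gibbs sampler, not the full algorithm used in the thesis; the state path below is a stylized stand-in for a sampled seizure labeling.

```python
import numpy as np

def transition_counts(states):
    """Count the four transitions (00, 01, 10, 11) along a binary state path."""
    states = np.asarray(states, dtype=int)
    counts = np.zeros((2, 2), dtype=int)
    for a, b in zip(states[:-1], states[1:]):
        counts[a, b] += 1
    return counts

def sample_transition_probs(counts, prior=1.0, rng=None):
    """Draw each row of the 2x2 transition matrix from its Dirichlet
    posterior: Dirichlet(prior + observed transition counts)."""
    rng = np.random.default_rng(rng)
    return np.vstack([rng.dirichlet(counts[i] + prior) for i in range(2)])

# Stylized path: 2,000 non-seizure points, a 170-point seizure, then non-seizure.
path = np.array([0] * 2000 + [1] * 170 + [0] * 1430)
counts = transition_counts(path)
P = sample_transition_probs(counts, rng=3)
print(counts[0, 1], counts[1, 0])  # 1 1
```

With one entry and one exit from the seizure, the off-diagonal counts are both 1, so the sampled p01 and p10 stay small while p00 and p11 stay near 1, matching the strong persistence seen in the data.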
Each of the three clips was run with a Gibbs sampler of 20,000 iterations and a burn-in of 2,000, which removes some of the instability that can occur at the initialization of the model. To reduce the autocorrelation of the sampling, a thinning of 1 point per 18 will be used to generate 1,000 total points to be analyzed. The constrained permutation sampling requires that µ0 < µ1 for all iterations of the sampler. An added benefit of Bayesian modeling is that further research can better adjust the prior information going into the model, enhancing it even more.
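The retained-sample arithmetic implied by these settings is:

```python
iterations, burn_in, thin = 20_000, 2_000, 18
kept = (iterations - burn_in) // thin
print(kept)  # 1000
```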

Change Points Models
Gibbs sampling was used to determine the most likely change points in the models, that is, where the state goes from non-seizure to seizure (change point 1) and from seizure back to non-seizure (change point 2). The iterations of the model converge relatively quickly compared to the Markov regime switching models, and thus only 1,000 iterations are assessed after a burn-in of 100.
Because of the specific application of these clips, only two change points will be determined; however, initial starting points are necessary as well. To keep good separation of the two change points while preventing them from starting at the very beginning and end, moderately informative starting points are utilized: change point 1 was set at 1,000 and change point 2 was initialized at 2,600, based on the 3,600 possible points available. Again, convergence can occur quickly; however, poor starting points for the change points can cause convergence issues.
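The core of a two-change-point model can be illustrated with a simplified likelihood search rather than the full Gibbs sampler. The sketch below evaluates a Gaussian segment-mean log-likelihood over a coarse grid on synthetic data; the segment lengths, means, and grids are all hypothetical, chosen only to mimic a 171-second seizure inside a 3,600-point clip.

```python
import numpy as np

def two_changepoint_loglik(y, c1, c2):
    """Gaussian log-likelihood (up to constants, shared unit variance) of a
    series with separate means before c1, between c1 and c2, and after c2."""
    segments = [y[:c1], y[c1:c2], y[c2:]]
    return sum(-0.5 * np.sum((s - s.mean()) ** 2) for s in segments)

# Synthetic log-variance series with an elevated seizure-like segment.
rng = np.random.default_rng(4)
y = np.concatenate([rng.normal(0, 1, 1500),
                    rng.normal(3, 1, 171),    # seizure: markedly higher mean
                    rng.normal(0, 1, 1929)])

# Crude grid search in place of Gibbs sampling over the change points.
grid1 = range(1300, 1701, 10)
grid2 = range(1500, 1901, 10)
best = max(((c1, c2) for c1 in grid1 for c2 in grid2 if c1 < c2),
           key=lambda cp: two_changepoint_loglik(y, *cp))
print(best)
```

A Gibbs sampler replaces the grid search with conditional draws of c1 and c2, but the likelihood being explored is the same, which is why poor initial placements (e.g., both points inside one segment) can slow convergence.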
The Bayesian priors for the model were set at µ0 = µ1 = 0 for the means.
The Markov regime switching model was the first univariate Bayesian analysis presented. Results for all three clips were again very promising. The initial step was to use unconstrained RPS to help verify that k = 2 was adequate for the models. Figure 3.18 suggests that a mixture of two components is valid, and thus the constraint µ0 < µ1 was employed, meaning the mean of the non-seizure points should be less than that of the seizure points. This also makes sense conceptually, as the plots of the logarithms of the sensor variances showed the seizure points as markedly higher than the non-seizure points (Figures 3.4-3.6).
Because of potential autocorrelation in the Gibbs sampling for a Markovian model, a large number of iterations (20,000) was used, with a burn-in of 2,000 and a thinning of one point per eighteen. A total of 1,000 samples could then be used to calculate the point probabilities for all 3,600 time points based on their averages, as displayed in Figure 3.19. Clip chb05.13 had very strong predictive power during the entire seizure and very few large spikes elsewhere. Clips chb08.02 and chb22.38 also had good predictive power during seizures; however, they only neared 0.8, whereas chb05.13 came very close to 1.0. Also noticeably different are the higher probabilities for the non-seizure points of the two bottom clips in Figure 3.19, while chb05.13 had most non-seizure points much closer to 0. This suggests that the models behave slightly differently, yet the main results indicate that the seizures generally occurred when the point probabilities exceeded 0.6, with a few exceptions.
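Averaging the retained state draws into per-time-point probabilities is straightforward. In the sketch below the 1,000 "thinned Gibbs draws" are simulated as the true path with 5% label noise; this is purely illustrative, not the thesis output.

```python
import numpy as np

def point_probabilities(state_samples):
    """Average sampled binary state paths (rows) into per-time-point
    seizure probabilities, as plotted in Figure 3.19."""
    return np.asarray(state_samples, dtype=float).mean(axis=0)

rng = np.random.default_rng(5)
true_path = np.array([0] * 1500 + [1] * 171 + [0] * 1929)
# 1,000 hypothetical thinned draws: the true path with 5% label noise.
draws = np.abs(true_path - (rng.random((1000, 3600)) < 0.05))
probs = point_probabilities(draws)
flagged = probs > 0.6          # the threshold suggested in the text
print(flagged[1500:1671].mean())
```

With draws this well behaved, every point inside the seizure window exceeds the 0.6 threshold and essentially none outside it do, which is the behavior the thresholding discussion below relies on.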
Finally, the change point analysis utilized the univariate sensor variance, similar to the Markov regime switching model. The goal of the change point analysis was to determine the most likely change points without the model actually knowing them. Figure 3.20 highlights the predicted and actual change points from non-seizure to seizure and from seizure to non-seizure for all three clips. The first change point, indicating the change from the non-seizure to the seizure state, was fairly accurate for all three models. The second change point, representing the change of state from seizure back to non-seizure, was not as promising for the three clips. It is worth mentioning that even in the Markov regime switching models (see Figure 3.19), there is still a slightly higher predicted seizure probability after the seizure ends than directly before it starts. This could in turn be reflected in the change point analysis, particularly with the second change point occurring later than the actual change point for all three models.
All three models showed fair to good results in modeling seizure location for the clips presented from the CHB-MIT database. The upside is that the multivariate DCC-GARCH supplied sufficient modeling of the data without summation, and also gave credence to using the sensor variance as opposed to all 22 channels simultaneously. The Bayesian Markov regime switching model showed a strong capability for modeling the sensor variance and could be a potentially useful tool for online seizure detection. Because clips chb08.02 and chb22.38 had higher point probabilities in general, a threshold of perhaps 0.5 or 0.6 could serve as a signal detector for the start of a seizure. In terms of implanted EEG stimulation to prevent seizures, an occasional small probability spike in the clips would not harm the patients. Another option would be a threshold requiring several seconds' worth of probabilities at 0.6 or higher, to prevent sporadic electrical stimulation. Clip chb05.13 had low to almost 0 point probabilities in the pre-seizure and post-seizure areas, with the probabilities during the seizure reaching almost 1, so even a threshold of, say, 0.6 would still be applicable.
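The "several seconds above threshold" idea can be sketched as a sustained-run detector. The threshold and run length below are illustrative choices, not values validated in this research.

```python
import numpy as np

def sustained_alarm(probs, threshold=0.6, min_run=5):
    """Raise an alarm only once `min_run` consecutive points exceed
    `threshold`, suppressing sporadic single-point probability spikes."""
    above = np.asarray(probs) > threshold
    run = 0
    alarm = np.zeros(len(above), dtype=bool)
    for t, a in enumerate(above):
        run = run + 1 if a else 0
        alarm[t] = run >= min_run
    return alarm

# One isolated spike, then a sustained run: only the run triggers the alarm.
probs = np.array([0.1, 0.7, 0.2, 0.7, 0.7, 0.7, 0.7, 0.7, 0.8, 0.3])
print(sustained_alarm(probs, min_run=5).tolist())  # True only at indices 7 and 8
```

The single spike at index 1 never fires the alarm, which is exactly the behavior wanted before triggering electrical stimulation.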
The Bayesian change point models required a little more precision in the initialization of values, which could certainly be a hindrance to online usage. Moreover, the presence of a seizure in each clip, and thus of two change points, was assumed, which most certainly would not always be the case.
That notwithstanding, the change points from non-seizure to seizure were very close to the actual seizure start, which could certainly help in an ambulatory electrical stimulation situation where the beginning of the seizure is much more important than the end. The change points from seizure to non-seizure were delayed considerably compared to the actual end of the seizure; however, there seems to be some carryover persistence in the sensor variance directly after the seizures. This can also be seen in the Markov regime switching models, where the point probabilities decline slowly back to the approximate level of the non-seizure points, particularly for clips chb08.02 and chb22.38.
Overall, in a post-hoc setting, the DCC-GARCH models might be a valid approach if an EEG researcher is willing to work with a large number of parameters and slow computation. For faster, more ambulatory use, the Markov regime switching model seems more logical and demonstrated better results than the Bayesian change point models. Time-wise, the regime switching models were the fastest of the three to compute.
While the change point models are simpler in scope, they require constant point-by-point iteration back and forth, which made for slower computation than the regime switching models. The Markov regime switching models did converge rather quickly, and it is worth noting that they could be run efficiently without 20,000 iterations and thinning if needed.

Future Directions
The goal of this thesis was to assess several models and their potential application to ambulatory EEG usage. The models were all run in a post-hoc fashion with knowledge of a seizure being present, so this research points toward models that could be used in the future. Modifications to all of the models would be necessary to apply the techniques in a real-time environment, but they certainly seem possible.
Primarily, the Bayesian Markov regime switching model might have the best application potential. What would be needed is a particle filtering routine that constantly updates the probabilities using both the incoming and the already-collected information [1]. A threshold would also need to be applied to determine when the probabilities are high enough to trigger electrical stimulation, if that is the purpose of the ambulatory research.
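One step of such an online routine can be sketched as a generic bootstrap particle filter for a two-state Gaussian regime model. This is only an illustration of the idea, not the filter proposed for this application: the transition matrix, emission means, and the short stream of "log-variance" readings below are all hypothetical.

```python
import numpy as np

def particle_filter_step(states, weights, y, trans, means, sd, rng):
    """One bootstrap-particle-filter update for a 2-state Gaussian regime
    model: propagate particles through the transition matrix, reweight by
    the likelihood of the new observation, then resample."""
    n = len(states)
    # Propagate: each particle switches state according to `trans`.
    u = rng.random(n)
    states = np.where(u < trans[states, 1], 1, 0)
    # Reweight by the N(means[state], sd) likelihood of y.
    lik = np.exp(-0.5 * ((y - means[states]) / sd) ** 2)
    weights = weights * lik
    weights = weights / weights.sum()
    # Multinomial resampling to fight weight degeneracy.
    idx = rng.choice(n, size=n, p=weights)
    return states[idx], np.full(n, 1.0 / n)

rng = np.random.default_rng(6)
trans = np.array([[0.99, 0.01], [0.05, 0.95]])   # hypothetical persistence
means, sd = np.array([0.0, 3.0]), 1.0             # hypothetical regime means
states = np.zeros(2000, dtype=int)
weights = np.full(2000, 1.0 / 2000)
seizure_prob = []
for y in [0.1, -0.2, 2.9, 3.1, 3.0]:              # stylized incoming readings
    states, weights = particle_filter_step(states, weights, y, trans, means, sd, rng)
    seizure_prob.append(states.mean())
print(seizure_prob[-1])
```

The fraction of particles in state 1 plays the role of the point probability: it stays near 0 while the readings sit at the non-seizure level and climbs toward 1 after sustained elevated readings, and it is this quantity that a stimulation threshold would monitor.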
For research interested in ambulatory online EEG data collection without treatment of stimulation, the regime switching model could be highly beneficial.
Again, a particle-filter-type routine would be necessary to replace the FFBS algorithm used in this study. While prior information is needed for the Bayesian models, it is assumed that those parameters could be fine-tuned through analysis of hundreds of various EEG clips. Keeping the priors informative but not too strong would be advised, to prevent the EEG readings from spiking excessively or not at all.
What does seem to be a viable and so far unused diagnostic for EEG data, regardless of model, is the sensor variance. The initial plots of the data displayed stark persistence in the sensor variance when seizures were occurring versus when they were not. This makes it a potentially useful univariate measure with plenty of applications and room for further types of analysis. Because of the grand scale of data points in EEG studies, sensor variance as a measure of brain synchronization has merit.
The DCC-GARCH model has a benefit in that it doesn't require downsizing the data into a univariate measure and it also doesn't require Bayesian priors.
Potential work could always look at the use of only several sensors in an online recording scenario; however, the parameters would need to be constantly updated, which again seems somewhat tedious. The option does exist, however, and the DCC-GARCH does model the volatility of the 22 sensors sequentially quite well.
The change point analysis models might be improved with tuning parameters that were not specified in this study. Regardless, the change point models share a downside with the DCC-GARCH in that they really work with discrete data in a post-hoc manner. Application of this type of model seems less versatile; however, its ability to capture the start of the seizure via the change point from non-seizure to seizure might make it of interest for future work, with added model specifications adapted for online data acquisition.
In terms of analysis, future work could certainly look at intracranial EEG research. It is known that scalp measures can be more a measure of the physical muscle contractions during a seizure. Intracranial measures, which capture synaptic functioning much better than scalp EEG recordings, would also need investigation to determine the appropriateness of sensor variance in EEG modeling. Future work could also address whether 22 or more sensors are even necessary in terms of variance; perhaps only a few sensors are needed to capture brain desynchronization via sensor variance.
The age of the patients used could also be an issue of contention, in that a child's brain might not function the same as an adult's. There are some patients in their teens and even early 20s in the CHB-MIT database, so they could be examined as well. However, it might be more advantageous to apply the modeling techniques presented here to several databases and a much larger age range to determine the overall usefulness of the models.

Limitations
The models presented all have the major limitation of post-hoc usage. It is known which clips in the CHB-MIT scalp database contain seizures, which makes prediction the goal here, with potential ambulatory usage only an ideal for the future.
Many studies, including this research, utilize statistical modeling or machine learning techniques that have very few realistic applications. The DCC-GARCH and change point models utilized in this study both served very useful purposes with generally good results, but would likely not work well in a real-time environment. On the other hand, models such as the Markov regime switching model have potential real-time application, though further additions would be needed. It is thus suggested that EEG seizure prediction research should always keep potential ambulatory usage in mind.
A downfall of the data used in this research is that they are post-hoc, leaving us to rely on the reliability of others. The actual seizure boundaries were marked by trained doctors, but that does not guarantee the placement of the seizures is exact. The data were also collected in a hospital, which does not reflect a potential ambulatory situation where patients are free to go about their day while still being recorded. Ambulatory usage could introduce more noise into the data through increased motion and daily activities, and the models used might not be wholly reflective of that. The CHB-MIT data are all scalp-EEG based, and thus the modeling done in this thesis still lacks evidence of working with intracranial EEG recordings.
No knowledge of the patients in the CHB-MIT database is available other than age and gender. We do not know the history or background of the patients, and thus we do not know the actual scope and severity of their seizures.
We get some idea of the frequency of their seizures from their clips, but we are still left without a more complete history. It is also unknown whether the patients suffer from any other ailments or diseases that could be at work. This is a downfall of any post-hoc analysis of readily available data; such research lacks some practicality until put into a real-time collection setting where more of the patient information is available and can be accounted for.

Conclusions
Three models for three clips in which a seizure occurs have been examined in this thesis. In an approach to predict the actual start and end of a seizure, a Bayesian change point model was used. With an ability to capture the beginning of the seizure quite well, the change point models also hold some interest going forward.
For use with discrete post-hoc data, the change point analysis might be beneficial with some added model specifications and modifications. This type of model could be extremely useful for locating the seizure boundaries when they may not be known.
Since the ideal going forward is to have online ambulatory use, the Markov regime switching model seems the most viable of the three models presented. To make the switching model applicable to online recording, a particle filtering algorithm will need to be put into place to keep the point probabilities continuously updating. The use of particle filtering is not new to EEG data acquisition, so it seems logical that the incorporation of modeling sensor variance through a Markov regime switching model might be a worthwhile next step.
The modeling techniques used in this research also highlight alternatives to the more commonplace machine learning methods utilized in EEG research. Modeling offers a more streamlined approach with real-time implications than machine learning does, and no training data is required. Some AR and/or MA time-series parameters could also be introduced to the Markov regime switching model to make it even more effective at modeling EEG sensor variance for seizure prediction. These ARMA parameters could in turn help fine-tune the particle filtering of the incoming data for use in ambulatory EEG modeling.
Ultimately, the goal is to utilize a mix of statistics, engineering, and computing to help prevent seizures before they occur in epileptic patients. A synergy among the three will be needed to make this viable and practical. While computers continue to process faster and engineering increases the ability to capture EEG in real time, statistical modeling will be a necessity to actually locate the occurrence of seizure activity. While many methods for prediction exist, the goal should be to pinpoint models that would viably work in an ambulatory scenario; the prediction of seizures without actual application to reality seems wasteful. Thus, statistical models for seizure activity should focus a little less on near-perfect predictive power and more on strong sensitivity and specificity combined with real application potential. The use of sensor variance, modeled with a Markov regime switching model as demonstrated in this research, seems a good example of a solid mix of prediction and real-time use, and both should be explored further.