Automated High Frequency Oscillation Detection and Seizure Onset Zone Estimation Using TCRES

High frequency oscillations (HFOs) have become a predominant topic in neurology research in recent years as they are considered markers of epileptic activity, potential indicators of seizure onset zone (SOZ), and possibly hint at an oncoming seizure in epileptic patients. These oscillatory signals are frequently seen in intracranial electroencephalography (iEEG) but have proven difficult to record from conventional scalp EEG electrodes. However, with the advent of tripolar concentric ring electrodes (TCREs) it is becoming easier to record HFOs without the need of a surgical procedure. While several different methods have been proposed for detection of HFOs in iEEG signals, no single method has been identified as a best detector and it is unknown how any of these methods will perform on the activity recorded from TCREs (tEEG). In this study, a novel detection design is derived and evaluated on real and simulated tEEG and compared to two of the more popular HFO detection routines designed for iEEG. Subsequently, an estimate of SOZ based strictly on HFO event occurrence is made on a few patients and compared to the markings of a clinician.


Introduction
Epilepsy is a brain disorder characterized by recurrent seizures and affects an estimated 1% of the world's population [1]. Nerve cell communication in the brain is made up of electrical activity balanced by excitatory and inhibitory signalsdiseased brain tissue can cause an imbalance of these neurotransmitter signals leading to a higher likelihood of seizures [2,3]. Antiepileptic drugs are often prescribed to treat this imbalance but are only effective in controlling symptoms for 70% of patients [1,2]. Those suffering from partial seizures-epileptic activity restricted to a certain area of the brain-can often be helped with a surgical procedure that removes the part of the cortex responsible for seizure propagation [2]. However, it can be difficult to find the location of such diseased brain tissue [4,5] and it is not possible to determine if a resection of the epileptic zone will be successful in preventing future seizures before the operation has been performed [4].
Electroencephalography (EEG) is a monitoring of the brain's electrical activity from the scalp surface and is a routine practice for epileptic patients-it is often the first step in epilepsy diagnosis [2,3]. While routine surface EEG (recording from electrodes placed directly on the scalp) is typically concerned with frequencies below 70 Hz [3], modern EEG research commonly uses intracranial or depth electrodes in order to record higher frequency signals that were previously difficult or impossible to find [3,6,7]. A craniotomy, open brain surgery, is required to place the intracranial electrodes. The advantage is better localization and less environmental noise than scalp recordings, yet determining the SOZ is still a challenge [4]. In epileptic cases where a lesionectomy is chosen as a remedy, multiple surgical procedures consisting of iEEG recordings and functional brain mapping with a cortical stimulator are often needed [3,4]. iEEG is associated with a risk to patients and only obtained when required [8]. Therefore, the ability to accurately detect neurophysiological signals of interest and localize areas responsible for seizure propagation without requiring additional complex procedures has the potential to revolutionize how advanced neurological studies are performed.
TCREs are a newer technology capable of recording signals with better spatial resolution than standard EEG by generating a Laplacian montage at a single scalp recording site [9,10]. The appeal is the ability to investigate highly localized brain activity without requiring a surgical procedure. This technology applies to the study at hand as tripolar electroencephalography (tEEG) recordings have been used to find HFOs preceding seizure onset in epileptic patients [11,12]. Pairing an effective HFO detection routine with TCREs has the potential to reduce the number of open-brain procedures necessary when intervening with high-risk epileptic patients.
The tradeoff is a higher background noise in tEEG than iEEG, making automatic signal detection more difficult. Therefore, using detectors that have been tried with success on iEEG data may not be a fruitful endeavor as iEEG has more physiological data contributing to the overall signal and less influence due to environmental noise. This brings us to the purpose of the study: to develop a detection method designed for the signal characteristics of tEEG and to compare to detectors in the literature that have had success evaluating the HFO data type in iEEG studies.
It should be noted that HFOs have been found in recordings from both healthy and epileptic brains, yet they are more commonly seen in patients with epilepsy and their frequency of occurrence appears to be correlated with ictal state [13,14,15,5].
Recent studies have used HFOs recorded from iEEG to try and determine the SOZ [14,15]. It is known that resection of brain areas with ripples correlates with a good postsurgical outcome when compared to those areas that do not generate oscillatory activity and several studies have shown the correlation between fast ripples and SOZ, though it appears most likely it is the rate of occurrence or channels with an abnormally high number of HFOs that indicate epileptic zones and not the individual characteristics of the events themselves [15,16,8,17].
Detecting HFOs when recording using standard scalp electrodes has proven difficult but HFOs can occasionally be seen [5]. However, the typical amplitude of ripples detected from the scalp is very low and only a small proportion of HFOs has a broad enough spatial extent to be recorded from a standard cup electrode [8]. As previously discussed, the advantage of tEEG is better source localization than EEG and a less invasive routine than intracranial, making high-quality recordings of neural activity, such as HFOs, easier to obtain. Therefore, it should be theoretically possible to not only design an automated detector that can locate HFOs without requiring a surgical procedure to do so, but to subsequently use an event detection count and provide an initial estimate of which area of brain tissue contributes to seizure propagation.

Neurophysiological Signal Analysis
Neurophysiological signal processing imposes new complexities over linear time-invariant systems upon which the majority of signal processing techniques have been designed. The signal recorded from any electrode designed to measure electrical properties of the cortex is a summation of activity from many neurons, each of which can be considered an independent random process [3]. Some research suggests EEG can be treated as a nonlinear deterministic process by describing the firing of individual neurons; however, the electrical signals from the brain are often considered stochastic in nature and it is generally understood that if the signals recorded are the result of a deterministic system, then that system is so complex that it is best modeled as a random process [2]. In order to accurately reconstruct neurological signals we turn to a parametric approach.
There are two types of noise to consider when working with human data: TCREs record further from the neuronal source and closer to obstacles in the patient environment when compared with intracranial contacts for which the majority of HFO detectors are designed. Naturally, the HFO amplitude range is highly dependent on the proximity to the HFO generators [8]. This complicates the design of an HFO detection routine for tEEG as there is apt to be more environmental noise and a lesser magnitude of the physiological data under observation.
The EEG signal is nonstationary long-term due to its slowly time-varying signal properties; abrupt state changes; and transient events, such as spikes, that appear superimposed on the background signal [2,19,20]. Neurological data has a 1/f , or pink, spectrum but due to the lack of stationarity, estimating the noise covariance and applying a pre-whitening filter for the length of the recording is not feasible [21]. Fortunately, as much as 90% of EEG less than 4 seconds in duration can be modeled as a Gaussian wide-sense stationary (WSS) process [21].
It is important to ensure any signal processing techniques that rely on a stationary signal, such as frequency spectrum analyses, must be adapted to consider shortterm stationarity.
It has been shown that short, locally stationary EEG signals can be described with an autoregressive (AR) model as an all-pole filter is able to describe the spectral parameters of several different EEG rhythms [2]. The AR model also has the advantage of no spectral leakage offering a better frequency resolution when compared to nonparametric methods [20]. Fourier analysis is limited for EEG since it is designed for linear, stationary systems [16], and requires a tradeoff between short record length and accuracy in the frequency domain [22]. The AR model on the other hand, offers the advantage of relatively few samples for accurate frequency analysis and computational efficiency [22] due to the analytically derived AR spectrum having an infinite frequency resolution with a less restrictive dependence on the length of analyzed data [20]. It is also good for the analysis of signals with sharp spectral features at the cost of poor spectral estimation when inappropriate model orders are used [20].
It is easy to understand the suitability of an all pole model to neurological data as the EEG signal is described by dominant frequencies, such as sharply defined peaks and rhythmic bands describing different brain states, opposed to spectral notches (absence of power at certain frequency bands) which are better described by a moving average (MA) model [22].
Typical, non-adaptive signal analysis models require that the signal is stationary which we know neurophysiological data is not. Performing a spectral analysis on a data record exhibiting a non-stationarity will result in a severely biased estimate [23]. Thus, it is necessary to either use an adaptive model or divide the signal into short data segments that can be assumed wide-sense stationary [22]. In this study, non-stationarities will be found in the data records followed by signal analysis on short windows located at the events of interest believed to exhibit an anomaly.

High Frequency Oscillations
The HFO is a signal found in brain wave activity, typically described by duration of less than 125 ms and frequency in the 80-500 Hz range [8,5]; however, some studies have found HFOs at frequencies well above 500 Hz [6,24]. They can be further categorized into ripples: frequencies between 80 and 200 Hz, and fast ripples: frequencies greater than 200 Hz [25,5]. HFOs have a minimum of four oscillations and an interval of greater than 25 ms between events [14,8]. Although HFOs can be found by reviewing recorded patient data, performing a manual search is time consuming and automated analyses are preferred when adequate [26].
It should be noted due to the pink noise spectrum of neurological data, the amplitude of ripples is higher than that of fast ripples for the same signal-to-noise (SNR) ratio [27]. Similarly, the overall duration is dependent on the frequency of the signal. HFOs must have a minimum of four oscillations [8], and past studies attempting to use strictly a signal duration to decide between HFOs or spikes have failed since the duration of fast ripples will be shorter than that of ripples [27,28]. Therefore, it is important when using a duration threshold to base it on the frequency of the event [27].
In setting thresholds for event detectors it is important to understand that HFOs occur in an on-off intermittent pattern [16] and rates are not consistent across channels or patients [29]. The pathological significance of HFOs is they reflect fields of synchronized action potentials within small neuron clusters which are responsible for seizure generation [16]. Therefore the electrodes contributing the most HFOs in a study should be indicative of the SOZ.

Epileptic Spikes
Epileptic spikes are a non-oscillatory signal characterized by an abrupt change in voltage polarity of less than 70 ms duration [3,8] which can incidentally be selected by HFO detection methods due to the spurious oscillations they exhibit following the use of a digital filter [25,30]. It is known that filtering spikes or artifacts could produce sustained oscillations at high amplitudes due to Gibbs' phenomenon [8,27], and that sharp transients with high amplitudes will contribute more to generating spurious oscillations [28]. However, oscillations resulting from filtering will have a low correlation with the raw signal in the frequency band of interest [8] and result in relatively few oscillations [28]. While epileptic spikes and sharp waves are a highly specific marker of epilepsy, they do not carry the same physiological significance as HFOs since they are not markers of the SOZ; therefore, it is important to distinguish between the two when designing a detection method [8,5]. Fortunately, spikes are broadband in the frequency domain whereas HFOs are sinusoidal and exhibit a narrow frequency band, making it theoretically possible to classify the two with an automated method.

HFO Detection
The short-term Fourier transform (STFT) has been used on TCRE data, visualized with spectrograms, to find areas where high-frequency narrow-band activity is present [12] as this may be a spectral indicator of HFOs. However, there is need for examining other techniques for several reasons. First, there is no way to determine from the spectrogram alone if the high-frequency, narrow-band activity represents a HFO or is due to some other physiological or environmental noise.
Second, this is a manual technique for making follow-up review easier and not a detection routine on its own. Knowing that the statistical properties of EEG signals, and in the case of this problem the underlying noise, varies over time is reason to believe a better estimator exists than the STFT.
It is important to design a detector based on valid assumptions about the data. There is no shortage of ad hoc approaches in the literature, such as a basing on the eyeballing of a frequency spectrum so that it looks similar to what other events have shown [31,28]. While some of these methods will work, the result is likely to be a biased detector and performance demonstrated on limited data subsets. A majority of the detectors available today are limited in clinical practice due to their large number of false detections [27]. Additionally, many detectors are strictly compared against the markings as done by a clinician, which does not show how each algorithm works on different data types and can be a fairly subjective analysis [27]. Similarly, many others are compared only during slow-wave sleep when datasets are relatively clean and therefore is not a valid performance evaluation of the detector [29]. It is important to evaluate each detection routine on controlled computer simulated data, designed to be realistic to the actual environment, in order to verify the efficacy of the algorithm and gain a meaningful performance metric [18].
It is also unknown if the methods that work well for iEEG or EEG will have similar results for tEEG. As previously discussed, while information conveyed is similar, the signals being analyzed are not identical and statistical properties may differ. Of the methods that have been previously evaluated for HFO detection no single detector has been established as a best method [30,26]; therefore a comparison of detectors is worthwhile.
A majority of the successful approaches consist of a two-stage approach: separate events of interest from the background and perform a subsequent test to decide if the event of interest consists of an HFO. Two such methods will be compared to the novel detection routine established here. The first, which will be called the Staba approach, was developed by R. J. Staba et al. in [6] and is most often used as a baseline method for comparison in the literature since it was one of the first publications to offer automated HFO detection with good results. The second, which will be called the Birot method, was developed by G. Birot et al. in [32] and has gained traction recently for its great performance on capturing fast ripples from iEEG.
The Staba routine is a filtering of the signal followed by a short-term energy detector for marking events of interest (EOIs) and the setting of subsequent amplitude and duration thresholds for HFO classification. The first stage of the detector uses a band-pass filter for the HFO frequency range (80-500 Hz) and computes a three ms sliding window of the RMS signal. Successive RMS values > 5 standard deviation above the mean RMS signal with a minimum of six ms duration are flagged as EOIs. The second stage uses the rectified signal and classifies an EOI as an HFO if a minimum of six peaks > 3 standard deviation above the mean amplitude are present.
The first stage of the Birot routine uses a Hanning window to find the local energy of the high-pass filtered signal which is compared to a threshold for stage 2 follow-up. The real contribution of this work is in the second stage which uses a ratio of the power in the periodogram (for full-band data) for high and lowfrequency bands for each EOI. The intent is to reject spikes which are broadband in nature and therefore will result in a lower test statistic than HFOs, which are described by high frequencies only. Results are compared in [32] using both the Fourier and wavelet transforms. Best results are achieved using a low-band of 32-64 Hz and a high-band of 256-512 Hz, although the comparison between different frequency band sets tested does not vary greatly.

Both methods use a different implementation of what is essentially an energy
detector as the first stage, for which there are known drawbacks with neurophysiological data. When detectors rely on thresholds calculated using a standard deviation or percentile, the event count will be based on the distribution of actual events; i.e., it will cause a high number of false alarms on inactive channels and missed events on active ones [27]. The energy detector is known to fail for HFOs in active channels [8].
Though the Staba method is generally accepted as the baseline detector to which the majority of new routines are compared, it is designed for use with short, relatively noiseless datasets and is prone to detecting artifacts as HFOs [29].
Each method the author believes to be a fair implementation and baseline comparison.

Statistical Signal Processing
There are multiple statistical signal processing approaches used in this thesis.
A high level overview of some of them is provided here.

Autoregressive Processes
It has been previously mentioned that neurological data can be well-described by an AR process. This is because the data is defined by its strength of power in certain frequency bands, opposed to its lack thereof which would be better described by a MA process in order to capture spectral notches [22]. The AR process is for model order p, where the data w[n] is modeled as the output of a causal all-pole discrete filter excited at the input by white Gaussian noise (WGN). It is a simple model in which the current value is generated by a weighted sum of previous values plus a noise term [33]. The input to the filter, u[n], is termed the excitation noise and is necessary to ensure the output describes a WSS random process [34]. By using this model to describe EEG data, the pink noise assumption becomes inherent to the filter design and passing generated WGN as input results in simulated data characteristic of EEG.
A single sample of WGN with mean µ takes the form [34] Making the approximate probability density function (PDF) of an AR process excited by WGN for record length N A more intuitive way to think of the AR model is a coloring of the white noise input to allow the modeling of a wide variety of power spectral densities (PSD).
The theoretical PSD for data which fits an AR model can be described bŷ where the "hat" denotes the parameters as estimated from the data, or In order to solve for the AR parameters from a data set, the following AR term must be minimized The partial derivative with respect to the kth AR parameter is Resulting in Using matrix notation, the MLE of the set of AR parameters can be expressed . . .

a[p]
The AR parameters can now be solved with a Cholesky decomposition. This result is an estimate of the optimal forward predictor, often termed the covariance method [33].
It is possible to instead base the estimate on the optimal forward and backward predictors of the AR process using which is termed the modified covariance method [33].
The derivation above is one of multiple methods commonly used to estimate AR parameters from a dataset. The entirety of AR parameter estimation in this thesis utilizes the modified covariance method as it is a common choice in mixedspectrum problems and generally results in a lesser shifting of spectral estimate peaks from their true frequency locations due to additive noise [35,33].

Generalized Likelihood Ratio Test
The generalized likelihood ratio test (GLRT) is a statistical hypothesis test considered best suited for problems where the signal PDF is not entirely known, i.e., may be known with the exception of certain parameters. The GLRT replaces the unknown parameters by their maximum likelihood estimates (MLEs). The advantage of using the MLE is it produces an efficient parameter estimate if one exists [34]. The neurological data being analyzed in this thesis consists of several signal and noise parameters that are quite variable in practice; with the GLRT, no prior knowledge of the unknown signal parameters is necessary. Additionally, the GLRT can be shown to be the uniformly most powerful test asymptotically when compared to other invariant detectors [36]. While no optimality can be claimed with using the GLRT, it works well in practice.
The GLRT decides a signal is present based on the following test statistic where H 1 is the case when a signal is present, H 0 is the null hypothesis, andθ i represents the MLE of the unknown parameters. The benefit of using the GLRT for detection problems where information about the unknown parameters is desired should be clear: the first step in calculating the test statistic is to find the maximum likelihood estimate.

Canonical Autoregressive Decomposition
The canonical autoregressive decomposition (CARD) is a technique for estimating unknown sinusoidal parameters when obscured by an unknown AR process.
The intent of utilizing this approach for HFO detection should now be clear: neurophysiological data can be described by an AR background and we are searching for an unknown brief sinusoidal burst. CARD offers a simplistic way of addressing mixed spectral problems as a sum of an AR process and sinusoids so that both signal components can be estimated simultaneously. The intuitive, yet rudimentary, explanation of the method is a prewhitening of data followed by a periodogram peak search. A full derivation is outside the scope of this document but can be found in [35] where it is shown to require a numerical maximization with respect to only the sinusoidal frequencies.
For implementation here, the CARD method uses a fixed sinusoidal model order of s = 1 and is run on full-band data to reject spurious oscillations due to the filtering of spikes or sharp signal artifacts. It is believed this is an adequate approach as CARD should accurately estimate a pink pre-whitener for the AR background and find the sinusoidal burst representing an HFO.
A background has been provided on signals of interest and techniques that may aid us in event detection from tEEG. The following chapters provide a description and performance evaluation of a novel HFO detection routine and subsequent SOZ estimator based on the assumptions provided above.

CHAPTER 2 Methodology
This chapter starts with a high-level concept on the HFO detection routine used and follows with an algorithm description for each of the two detector stages derived as part of this routine. Then an overview on the tEEG simulator design created for performance evaluation is discussed, how the detection algorithms are to be compared, and how the SOZ is to be estimated based on the results.

Detector Overview
Similar to the majority of the detection routines in the literature, the detection approach generated here starts by filtering out the unwanted background, locating events of interest, and deciding based on signal parameters if an HFO is present.
Novel approaches based on the GLRT are used for locating the events of interest and detecting the signal. The detection routine is run on each tEEG channel individually.
The HFO signal can be thought of as containing a transient component and a rhythmic component [1]. A two-stage detector is therefore designed to first locate transients via an anomaly detector and second find oscillations via a sinusoid detector.
The tEEG signal consisting of an HFO is modeled by a sinusoid embedded in AR noise.
To locate the events of interest, an anomaly detector is derived using the GLRT in order to find stationarity changes described by a change in AR filter coefficients or excitation white noise variance. The derivation is based on finding a stationarity change in a single window of tEEG data and is implemented by a sliding window (recursively calculated for each sample shift) so that all non-stationarities are located in the entire tEEG data record. The use of a sliding window estimator in order to find a transition point is preferred when interested in an abrupt change in data and not any gradual changes in the background [2].
The second stage decides if the events of interest found from stage 1 contain HFOs. This is called the sinusoid detector and is again derived using the GLRT; the result is essentially a normalized replica correlator using pre-whitened estimates.
A required input to this detector is an estimate of the sinusoidal frequency and AR process filter and noise parameters. Two suggestions for estimating these parameters are offered: CARD parameter estimates based on the full-band signal and use of the AR spectrum as already estimated from stage 1, the latter of which requires a subsequent period count to distinguish between HFO and filtered artifact.
Corresponding block diagram for the overall detection routine is shown in Before stepping into the algorithm descriptions, it is worth discussing why this is believed to be a valid approach. In the introduction an overview was provided on neurological signal analysis and the signals of interest. We have essentially built a list of prior assumptions about our data, each of which needs to be tackled by the designed detector. EEG signals are slowly time-varying and non-stationary long term; the anomaly detection stage is designed to reject slowly time-varying signals due to AR estimates being similar between consecutive windows. This same stage will reject any gradual increases in noise throughout the recording due to the noise estimates being slowly time-varying. Any abrupt changes in environmental noise or signal artifacts will be detected by the anomaly detector, but subsequently rejected by the sinusoid detector in stage 2. The data record can be assumed stationary in small segments-this allows the use of a window size much less than a second and Figure 1. Detection Routine Block Diagram subsequent spectrum analysis to be performed since the window can be assumed WSS. EEG has a 1/f , or pink, noise spectrum requiring any signal analysis to use a pre-whitener-this is inherent to the AR filter that is estimated from the data.
Similar arguments can be made for the signals to be detected. The HFO is a transient event occurring above 80 Hz and can be separated from the background; therefore, an 80 Hz high pass filter is applied to the data first to isolate high frequencies, the anomaly detection stage finds transient events, and the subsequent stage is based on the frequency peak. The signal is also less than 125 ms in duration, and the detector window can be adjusted based on the sample rate to run on a quarter-second of data at a time (allowing for a max-duration HFO to be captured in half of the window). Because the frequency of event occurrence varies between channels, the test statistic threshold is constant for all channels and there is not an influence due to, say, an energy detector threshold that is based on the mean of the channel. In order to reject epileptic spikes, which are broadband events and can resemble oscillations due to high-pass filtering, two implementations of the sinusoid detector are employed. The first, to count the number of detected oscillations from the high-pass band based on estimated signal duration and frequency and use this as a subsequent test statistic as HFOs have a minimum of 4 oscillations whereas filtered spikes show fewer [3,4]. The second, to instead perform sinusoid detection and AR estimation on the full-band signal using the CARD method.

Overview
The goal of the first stage is to detect a stationarity change in the PSD of a locally wide sense stationary random process. The PSD is modeled as an AR process of model order p with where a[j] represents the AR parameters and σ 2 u represents the noise variance.
It is known that the signal of interest is a transient signal, separable from the background, so it is desired to only detect significant changes in the AR filter coefficients or driving white noise variance and reject slowly time-varying signals.
For the use of HFO detection, it is known that the signal of interest is above 80 Hz. Therefore, a digital 80 Hz high-pass filter is applied before anomaly detection is performed.

Derivation
For a single window length M , the data is broken into x 0 , the full window length; x 1 , the first half of the data; and x 2 , the second half of the data. Similarly, the data is described by AR parameters a 0 , a 1 , and a 2 each of model order p and excitation white noise variances σ 2 0 , σ 2 1 , and σ 2 2 . This can be shown as where the bold parameters represent column or row-vectors of data.

The hypothesis test is
The GLRT decides H 1 if The PDF of an AR process excited by WGN of length M can be written as or equivalently The test statistic can now be written as To solve the MLE of σ 2 1 we maximize ln p( Setting equal to zero and solving results in Equivalently,σ To find the MLE of the AR parameter sets we must minimize This is solved using k=1 as an example It then becomes clear that the AR parameter estimates arê When implementing, we will use the modified covariance method to solve for the AR parameters.
When plugging in the estimates for σ 2 in the test statistic, the summations in the exponent will cancel. This is shown below for the null hypothesis.
The resulting test statistic is Or equivalently

Explanation
Three sets of AR filter coefficients and σ 2 values are estimated: for the full window length, the first half of the window, and the second half of the window. The By choosing the threshold correctly, the test statistic should identify transient signals but not those that may be described by a time-varying autoregressive (TVAR) model or an environment with variant noise over time. It is expected that the output of this stage, when run on tEEG data, will be a set of events consisting of HFOs; epileptic spikes; and instances of sudden increases in environmental noise, such as electromagnetic interference or a disconnected electrode.

Expected Performance
The asymptotic performance of the GLRT statistic is given by [5] 2 ln Where and a performance evaluation must be run on simulated tEEG data-this is shown in figure 9 in the results section.

Implementation
The detector derived above is based on detecting a stationarity change in a single window of M samples. M ideally should be chosen to be equal to two times the length of a stationary segment so that the stationary signal is captured in half the window length. To find stationarity changes in a data record of N samples, the test statistic must be calculated for each sample delay n 0 , from 0 to N − M − 1.
This can be thought of as a sliding window of length M : recursively compute the test statistic and shift window by one sample. When the test statistic exceeds the threshold, sample location n 0 + M/2 is considered a stationarity change.
A window duration of 250 milliseconds is used for the detection of HFOs in this study in order to ensure the full HFO signal can be captured in half the window length.
In addition to M , only the AR model order (p) and the threshold (γ) need to be defined for implementation. γ is based on a Monte Carlo simulation of data and corresponds to a P F A of 0.05. It should be noted that the threshold will vary based on window size and sample rate. EEG data has successfully been modeled with an AR process of model order 6 previously [6]; however, filtering the data at 80 Hz will remove the strongest contributing frequency components from the signal. For implementation in this study, a model order of 4 is used for anomaly detection on the filtered data.
Implementing this routine programmatically is straightforward. The modified covariance method is used to obtain the estimates of the AR parameters. These estimates are then plugged into equations (23) - (25) and (36) to get the excitation white noise variance estimates and resulting test statistic.
It is established that a threshold crossing signifies the area of a stationarity change. Since the detector is implemented by recursively computing the test statistic for single-sample shifted windows, the results between windows will be highly correlated. In other words, each transient event should be marked by several threshold crossings. To avoid double-counting events, a local peak search with a buffer of M/2 samples is performed on the test statistic results. The event window is determined to be the location of the local peak ±M/2.

Overview
The input signals of the second stage are discrete data subsets of fixed duration assumed to contain an HFO, epileptic spike, or noise characterized by an abrupt change in signal stationarity. The goal is to decide if an oscillatory signal characteristic of an HFO is present: a sinusoid of frequency above 80 Hz with a minimum of 4 cycles. The signal of interest may or may not take up the entire duration of the window. Because HFOs are transient events separable from the background activity, we model the signal to be detected as a sinusoid added to AR noise of model order p, or where f 0 is the signal frequency, A is its amplitude, n 0 the signal onset, M the signal duration, and is the AR noise model of order p, all of which are unknown.
The previous detection stage uses the modified covariance method to estimate the AR parameters and the MLE to find σ 2 . However, if the stationarity change includes a sinusoidal signal, the estimates for the current window may be inaccurate since the AR model order, p, should be greater than originally assumed. It is also unknown if a frequency analysis of the window to find the peak is adequate to estimate f 0 due to the interference of the AR background on the frequency spectrum. Therefore, other methods may need to be utilized to obtain estimates of these parameters.

Derivation
To start, we will assume the sinusoidal frequency and parameters of the AR filter are already known and derive the rest of the problem. The collection and use of these parameters will be discussed in the Implementation section for this detector.

Our hypothesis test is
The noise is the result of an autoregressive process excited by WGN with excitation noise variance σ 2 . It is easily shown that the whitening filter for an AR (44) The GLRT decides H 1 if The approximate distribution under the null hypothesis is And the approximate H 1 distribution is We can expand so that p(x;Â,n 0 ,M , H 1 ) = 1 (2πσ 2 ) N/2 This allows the likelihood function to reduce to The test statistic can now be re-written as Which allows us to make a substitution to get Or equivalentlŷ Which must be maximized with respect to the unknown onset time, n 0 , and duration, M . This can be done by computing the test statistic for each n 0 , M pair.
It should also be noted that summations should start at n 0 + p to prevent out-of-bounds limitations. The effect should be negligible for M >> p.
The test statistic is then which is compared to γ .

Explanation
The resulting detector is a signal pre-whitener followed by a normalized replica correlator using the pre-whitened signal as the signal replica.
The test statistic is calculated for all possible n 0 (between 0 and N − M ) and M (between pre-defined limits m 1 and m 2 ) combinations and the maximum value is compared to a threshold. If the test statistic exceeds the threshold then it is decided that a signal is present and the max n 0 ,M location represents the signal onset and duration length in samples.
It is possible to include phase as an unknown parameter in the GLRT derivation with the assumption that the signal amplitude is positive. However, the purpose of this application is to detect >80 Hz signals and the worst-case for an incorrect phase estimate would cause the signal onset estimation to be off by a half-cycle at 80 Hz, or approximately 6 milliseconds. This is considered negligible and therefore the phase is assumed equal to zero.

Expected Performance
The asymptotic performance discussion is similar to that of the anomaly detector as this is another sample-shifting detector based on the GLRT. Again the windows being compared are highly correlated, in this case for each (n 0 , M ) pair, so IID cannot be assumed. A P F A must be chosen, with threshold based on the result from a Monte Carlo simulation. Performance of this detector can be found in figures 11 and 12 in chapter 3.

Implementation
The derivation assumes the AR parameters and sinusoidal frequency are already known. Two suggestions are offered for the implementation of this second stage. The first is based on CARD as it should allow the estimation of the AR parameters and sinusoidal frequency of the full-band data so that the filtering of spikes does not cause false alarms. The second is to use AR and frequency estimates from the filtered data.
The CARD method is utilized to get an estimate of the AR model order, AR filter coefficients, white excitation noise variance, and sinusoidal signal frequency.
In utilizing this method, it is assumed that a single sinusoid is present. CARD estimates use the full-band data subset in order to avoid detecting spurious oscillations due the filtering of sharp transients. This approach should be able to resolve the HFO frequency peak and AR background from the unfiltered data, and the same unfiltered data must be passed to the sinusoid detector as the AR estimates from CARD act as a pre-whitener for the full-band data. An added test for this stage is to first confirm CARD returns a frequency >80 Hz; otherwise it can be assumed no HFO is present.
An alternate option is to base the AR pre-whitener on the previous window In computing the test statistic it can be seen that

tEEG Simulator
It is difficult to assess performance of a detector based on observed data alone.
For a stationary process it is often possible to use averaging should the underlying process be ergodic [7,2]. Clearly, this is not the case with neurophysiological data. In order to avoid developing a biased detector based on a limited subset of data, detector performance should be evaluated on realistic simulations [4,2]. The following describes the tEEG simulator design created for the evaluation of HFO detection routines.
Multiple sets of AR parameters, and corresponding white noise variances, have been extracted from real, full-band tEEG and saved to file using the modified covariance method. Data subsets corresponding to different tEEG rhythms were hand-selected in order to allow for a varying background, i.e., such that delta / theta (<7 Hz), alpha / beta , gamma (>30 Hz) and seizure (broadband spiking) frequency components contribute to the overall signal. The estimated model order was used for each data subset resulting in model order ranges from p = 10 to p = 15. The range in model orders is preferred for the simulator to ensure detector performance is not biased by selecting a model order for the detector that is identical to each set of filter coefficients used to generate the data. To simplify implementation, each AR parameter set has additional filter coefficients added with value equal to zero so that a fixed model order of p = 15 can be used. By observing the AR model it should be clear that this does not affect the actual filter model order but merely impacts array size programmatically.
In order to simulate tEEG data based on a single AR parameter set, WGN is generated and filtered based on the AR process formula used in equation (1).
Unfortunately, once a set of AR parameters is chosen and used to generate data, the output of the simulator is stationary which is not true to the nature of neurological signals. The purpose of retaining multiple sets of AR parameters is to have the ability to cycle through different types of tEEG background data before adding in physiological events (HFOs and epileptic spikes). However, simulating from an AR parameter set for a duration of time and immediately changing to a different filter parameter set would be equivalent to a neuron cluster shutting down abruptly and another one gaining full effect at the same time which is not a realistic scenario [7]. In order to overcome these impracticalities, the simulator randomly selects a subsequent AR parameter set and linearly spaces the filter coefficients over a period of one to two seconds, such that each filtered sample is effectively described by a slightly different AR filter. At no time does the simulator dwell on a single set of AR parameters and the result is a truly time-varying background for realistic performance estimation of the developed detection routines.
An example of simulated TVAR tEEG is shown in figure 5.
HFOs and spike events can be described as appearing superimposed on background EEG. The simulator design first generates the background data and the physiological events are subsequently added to the TVAR data. The simulator is able to randomize (between set limits) onset times and types of events, which are then outputted with the data so the accuracy of the simulator can be tested without needing to manually find and mark events.
The HFO signal is simply a Gaussian-modulated sinusoidal pulse in which the duration, frequency, and amplitude can be modified. Each parameter is randomized on a uniform distribution between defined limits representing the physiological characteristics of an HFO to ensure the detectors are not biased for certain event features.
The spike signal is designed as an exponential growth function concatenated with a negated exponential decay function. The simulator uses a coin flip to decide if the simulated waveform is positive or negative before adding to the tEEG background since the polarity in practice is dependent on the electrode's position relative to the neural source and the type of polarization shift in the corresponding neuron cluster [8]. Although spike polarity typically only varies on a channel-tochannel basis, this allows the testing of detection routines without assuming a certain type of deflection.
Example simulations of an HFO and a spike signal that get added to the TVAR background are shown in figure 6.
It is important to have the ability to modify the SNR of the tEEG simulator so that an evaluation of detection routines against different qualities of recordings can be performed. In this case we consider the signal an HFO or a spike (as spikes are expected to be detected in the first detection stage) and noise the TVAR tEEG background. For this, we define a new measure termed the Event-to-Noise Ratio (EvNR). EvNR is an event amplitude multiplier based on the most recent halfsecond of tEEG data. With a unity EvNR, the spike max value will be equal to the amplitude of the recent full-band signal; for an HFO, the peak-to-peak max of the sinusoidal pulse is equal to the amplitude of the recent signal high-pass filtered at 80 Hz. An EvNR equal to one appears realistic for these events as recorded from TCREs, though this is a subjective measure. It is believed that as EvNR approaches two the events are unrealistically large but may be closer to iEEG in which more physiological data contributes than environmental noise. Naturally, as EvNR approaches zero the signal becomes indistinguishable.

Comparison Detectors
The Staba and Birot detection algorithms are discussed in the introduction.
Both routines have been implemented in MATLAB for comparison to the detection routine derived above. The Staba routine remains unmodified from its original publication in [9]. The Birot routine has been modified slightly from [10] to work with HFOs in both the ripple and fast ripple band for use on tEEG data (where the sample rate is not identical to that used in the original study). The frequency range is therefore expanded for the Birot routine to ratio the power of frequencies above and below 80 Hz. This should be considered a fair adaptation of the detection routine as the original assumptions of the detector remain unmodified.
All detection routines are evaluated on simulated tEEG in order to obtain performance metrics. Receiver operating characteristic (ROC) curves are provided for the first and second stage of each detection routine to show detection performance against a fixed EvNR. In order to show how the performance varies with SNR, a first stage comparison of the probability of detection (P D ) to EvNR is made.
Similarly, a stage two probability of error (P e ) vs. EvNR comparison is done to see how each detector classifies simulated HFOs against simulated spikes.
An assessment of each routine is then made on a 10 minute subset of real tEEG data, for which actual HFOs were manually identified, in order to get a more realistic detection count; and a comparison of time to run each algorithm is provided for discussion on clinical applications of each routine.

SOZ Estimation
It is believed that the count of HFO events on a channel and not the individual HFO parameters can be an indicator of SOZ [11,12,13]. In [14] an epileptogenicity index is created based on which channels a certain ratio of high-to-low frequency activity shows up immediately preceding a seizure. Since we know the HFO is a higher frequency signal, basing an SOZ estimate on the count of HFO events during the pre-ictal stage is reasonable.
The difficulty with SOZ estimation based on the count or frequency of detected HFO events is there will always be a channel with the most events and one with the least. This alone is not enough to quantify which channels are indicative of the SOZ since HFOs will be detected in patients who do not suffer from seizures at all.
Therefore, it is important to select only those channels that contain a statistically significant number of events as compared with the rest of the channels. Due to the risk factor with making incorrect estimations, no decision should be made over a wrong decision.
In [12] a measure was done to see if there was statistical significance of HFO channels to the SOZ using a comparison of the difference between detections inside and outside the SOZ to the sum of the same detection counts. This has been adapted in an attempt to find channels contributing to the SOZ by the following which is computed for each channel, where D CH is the number of detections on the current channel and D AV G is the average of the number of detections on all other channels. φ CH is the result, between ±1. Values closer to +1 have a higher count of detected HFOs and therefore should be more likely to be in the SOZ. This is to be tested after the HFO detector has been run on 5 tEEG studies for which a clinician has already identified the SOZ. Results will also be compared with output of the first stage detector alone to see if a statistical significance is available for all events of interest, opposed to only those that have been identified as HFOs by the sinusoid detector

Findings
Performance metrics are provided based on the tEEG simulator design. The detector and SOZ estimator are also evaluated based on real tEEG data subsets.
A discussion on the data shown here is provided in chapter 4.   In order to generate, 10,000 seconds of TVAR tEEG data are generated as H 0 The results show that the GLRT anomaly detector is superior for simulated tEEG data at all signal-to-noise ratios.
Two ROCs are shown for the second stage in order to estimate how each detection method performs against different type of data. In both cases, 1000 one-second simulations of TVAR tEEG data are generated and under H 1 HFOs are added to each simulation with frequency and duration randomized and EvNR equal to one. The null hypothesis in figure 11 is based on the background TVAR tEEG noise only whereas in figure 12 it is based on the same background with a simulated epileptic spike signal added with EvNR equal to one.
As described in chapter 2, the sinusoid detector is implemented in two different ways: the first with estimation from the CARD method and the second via the modified covariance method (MCM). The results of the MCM approach are also fed to a Period Counter to determine if there are enough oscillations for the signal to be considered an HFO. Therefore, both the MCM and Period Counter thresholds must contribute to the performance evaluation for the latter implementation. As can be seen from the results in figure 11, the sinusoid detector as implemented via the CARD method is best at distinguishing HFOs from tEEG background. Ideally, these results are less useful than those in figure 12 since the first detection stage should reject cases of background noise.
When the null hypothesis is instead a simulated spike signal, the period counter is clearly the best performer followed closely by the second Birot detec-tion stage [see figure 12]. The sinusoid detection based on MCM does poorly as the magnitude of the test statistic for a spike signal resulting in spurious oscillations is greater than that of an HFO. However, figure 11 shows that the MCM sinusoid detector does well against signals without oscillations and spikes are ultimately rejected since it is paired with the period counter. The CARD and Staba implementations are clearly inconsistent.
The intent of the final performance metric in figure 13 is to show how each routine does at properly classifying an EOI, assumed to contain either an HFO or a spike signal. Five-thousand seconds of TVAR tEEG background data are generated. The first iteration adds a spike at each second and has the second-stage of each routine decide if an HFO is present. Then, the same background data is used but a simulated HFO is added at every second and again each routine decides if an HFO is present. The resulting P e is the number of incorrect classifications over 10,000. This is done for each EvNR from zero to two with P F A fixed at 0.05.
For this example, the period counter and sine detector using MCM results are combined since the two work in series. The results of the P e curve shows that the sine detector paired with a count of the number of periods is better than the comparison methods at all SNRs. The sine detector based on CARD implementation with full-band data is unable to adequately distinguish between HFO and spike signals.

Real Data
A ten-minute subset of real tEEG data from a 1600 Hz study provided by CRE Medical has been annotated for HFO events by a manual review in MATLAB.
This study consists of 23 tEEG channels and the ten-minute segment contains 1535 HFOs. Each detection routine has been run on the data subset with results of the analysis is shown in table 1.  The voltage recorded at each contact site can vary based on a high number of factors, including electrode impedance. Therefore, the threshold of energy detectors are typically based on some average of the measured voltage for each channel.
The issue with using an energy detector for HFO detection is the threshold will be influenced by channels with a lot of events vs. those with none. A large number of EOIs will drive up the mean and cause more events to be missed. A channel with no such events will still cause detections. This effect worsens with surface recordings over iEEG due to the higher, and more frequently sporadic, environmental noise. The GLRT implementation on the other hand, compares consecutive windows to detect a change in the composition of a signal, regardless of event count.
As shown, the GLRT generally performs well in practice.
The difficulty with automatic HFO detection using TCREs is there is a higher environmental noise when compared with intracranial recordings for which the HFO detectors in the literature are designed. The Staba algorithm was used as a baseline detector as it is compared to the most often and was the first automated method to demonstrate some success. The Birot algorithm has gained traction recently and is arguably the best method for detecting fast ripples from iEEG.
It has been shown that the implementation of the GLRT presented here outperforms both of these methods asymptotically for detection of HFOs from surface recordings.
A significant finding from using the tEEG simulator at low EvNR values is that the sinusoid GLRT detector is able to locate oscillations that are not as obviously discernible from a manual review. This has the potential to pick up on more real events that would otherwise be missed by most detectors and some clinicians.
It should be noted that the comparison detectors were developed for iEEG whereas the detector presented here is derived with surface recordings in mind.
Since recording HFOs from surface potentials is a fairly new discovery, there are not many routines in the literature to compare to. This is why the author selected the detectors he thought would best adapt to tEEG data. As HFO recording from scalp electrodes gains more traction, finding the best HFO detector will become a more popular pursuit. To level the playing field with the comparisons presented here, a next step should be to use this same detector on real and simulated iEEG to further evaluate performance and determine if the GLRT-based algorithms offer any advantage over other detectors.
The tradeoff with the detector presented is run-time. Both stages operate by comparing single-sample shifted windows which can take a long time to process.
Run time can be reduced in stage 1 by using less of an overlap between windows and in stage 2 by defining upper and lower limits on the duration of the signal, both at the cost of a lower probability of detection. However, even with these adaptations the processing performance will not rival the detectors presented as comparisons.
In practice, EEG recordings can run anywhere from several hours to several days in duration. For this detector to be clinically tractable, the clinician would not need immediate results and its implementation would have to be clever enough to run a post-study batch analysis on a server or several PCs operating on a single channel at a time. Faster PCs are available today than the one used in this study and more RAM and higher processing speeds continue to be offered. Implementation for the comparison here is also performed in MATLAB; implementing in a language that is able to take advantage of multiple threads and better memory allocation has the potential to improve processing speeds. Although this detection routine is slow to run today, the algorithms presented will only become more practical as technology advances.
An argument could be made against the performance evaluation that the simulator design is based on a TVAR process which is an assumption of the derived anomaly detector. However, this implementation is based on the nature of neurophysiological data and the simulator was designed to be as accurate as possible.
High accuracy is expected if the observed signal is in agreement with the model [3]. The MATLAB simulator is a good representation of time-varying tEEG signals that contain spontaneous epileptic spikes and HFOs. In practice, there will be more types of neurophysiological and environmental noise that are difficult to account for in a simulation. It should also be understood that detectors evaluated on computer simulated data will almost always out-perform an equivalent in-field study; however, simulation is necessary for an algorithm implementation to define a meaningful performance metric and compare to competitor algorithms [4]. The results show that the detection routine derived as part of this thesis have good performance on both real and simulated tEEG data.
It was found that the CARD estimation is an unnecessary step in the majority of the simulations. High-pass filtering reduces, but does not eliminate, the background EEG described by the AR model. When an HFO signal is present and the window size is kept short the filtered signal window chosen by the anomaly detector should be mostly dominated by the sinusoidal frequency. Therefore using strictly a frequency peak estimate and basing the pre-whitener on the AR filter described by preceding data works well when compared to the CARD method which assumes the signal is overshadowed by AR noise. The conclusion here is that this approach is the best detector to use when overall run-time is not a concern.
The GLRT-based detection routines performed well on the small subset of real tEEG data at the cost of a high number of false alarms. In both implementations it was found that the active HFO channels contributed more to the false alarm count than those channels with no events. The Birot routine contributed a fewer number of false alarms but consequently missed more than half of the actual HFOs. The Staba method is clearly unfit for HFO detection using scalp recordings.
Unfortunately, the SOZ estimation method offered here did not work to positively identify electrode contacts corresponding to the brain regions responsible for propogating seizures in the patients studied. Clinician-identified SOZ channels can be fairly subjective making it a difficult item to test against; however it is important that any SOZ estimation method be well validated since incorrectly identifying the SOZ can have dire consequences in practice [5]. The latter goal of determining SOZ from the anomaly detector only is now understood to be flawedepileptic spike signals appear at a higher rate than HFOs in all examined studies and therefore the resulting phi estimate will always be low.
A typical 10-20 EEG montage has 21 electrodes whereas intracranial can often use hundreds of patient contacts at a time [3]. Therefore, even if it is possible to get a rough estimate of the SOZ from a tEEG surface recording it is not going to be as accurate, nor as localized, as an intracranial recording. However, it can be a first step to determining the SOZ and offer the clinician an idea of the general area to look before an open-brain procedure is necessary to perform a functional brain mapping or lesionectomy. The assumption that HFOs can be used to identify the SOZ remains a common topic in neurology research and the efforts of doing so using tEEG data should not be abandoned. It is likely that SOZ estimation from HFO detection will only work for epileptic patients with well-localized epileptic zones and not those suffering from generalized onset seizures. The study in [6] suggests channels in which high frequency activity shows up earlier preceding seizure onset are a positive indicator of SOZ. Therefore, next efforts should be made with TCRE HFO detection to not only base the SOZ estimate on event occurrence in the preictal state, but to add a greater weight to positively-identified HFOs the sooner they occur in an epileptic study.