LONGITUDINAL DATA PREDICTION IN EHR: COMPARISON OF GLMM AND MACHINE LEARNING METHODS

In this study, we aimed to develop and compare models to predict individuals with suicidal ideation using the Generalized Linear Mixed Model (GLMM) and Machine Learning (ML) algorithms. We conducted a secondary data analysis with data collected by an online clinical measurement company. The sample included 402 individuals aged over 18 years who had received more than three psychiatric treatments since 2017. The data were split randomly into a training set (70%) and a testing set (30%). In the training set, the GLMM, an RF model, and a GBDT model were trained with all the features. Conditional RF and GBDT models, with variables selected based on the GLMM, were trained next. Subsequently, the fitted models were used to predict suicidal ideation in the test set. All analyses were conducted in R and Python. The prediction models based on ML algorithms (R² from 0.260 to 0.409, MSE from 1.761 to 2.202, MAE from 0.942 to 0.985) performed better than the GLMM (R² = 0.115, MSE = 2.880, MAE = 1.013). The insights gained from this study may help broaden the application of ML algorithms to the massive data in EHRs to enhance suicide risk prediction and to improve prediction accuracy relative to traditionally employed GLMM approaches.

Risk behavior prediction can help to avert disadvantageous outcomes in mental health and health behavior. The determination of antecedent risks for a specific behavior problem or mental disorder can be used to guide preventive decisions by clinicians, policymakers, and educators.
Claiming approximately 800,000 lives annually, suicide is a leading cause of unnatural death and has drawn worldwide attention [1]. Arsenault-Lapierre, Kim, and Turecki [2] found in a meta-analysis that most suicide completers had been diagnosed with mental disorders or illnesses, such as depression and substance use disorder. There is evidence that suicidal ideation plays a crucial role in predicting future suicide attempts and behaviors. Therefore, assessing suicidal ideation among individuals receiving psychiatric treatment is an essential strategy in suicide prevention.
Although risk prediction models are very prevalent in commercially relevant areas such as finance and the physical sciences, mental health data archives have only recently begun to be sufficiently robust for this kind of modeling. Electronic Health Records (EHRs) are commonly used in health care situations, which provide access to longitudinal data that can be used to analyze change over time and predict future outcomes. Machine learning is a branch of artificial intelligence (AI), in which a computer learns from the raw data to generate the underlying rules. The increasing volume of person-specific multivariate trend data in mental health EHRs makes it possible to enhance prediction by utilizing machine learning methods.
The goal of the current research was to compare the viability and performance of suicidal ideation prediction using two approaches on a longitudinal mental health EHR database: a) the Generalized Linear Mixed Model (GLMM) and, b) machine learning algorithms. We expected that machine learning algorithms could provide higher accuracy than GLMM when used to predict individuals at high risk of suicidal ideation.

CHAPTER 2
Literature Review

Importance of Suicidal Ideation Prediction
Mental disorders are common in the United States. In 2016, approximately 18.5% of U.S. adults, about 44.7 million people, lived with a mental disorder, and about 9.8 million, roughly 20% of this group, suffered from severe mental illnesses [3]. Within this population, individuals are at an increased risk of engaging in risk behaviors (e.g., substance use, suicide risk, unsafe sex).
Previous studies reported higher rates of suicidal thoughts and attempts in psychotic populations [4]. Psychosis is found to contribute directly or indirectly to suicide risk. Alcoholism, drug abuse, and self-injurious behaviors are all risky behaviors that individuals may use to obtain temporary relief from intense stress or emotional pain; however, they almost always result in greater feelings of loneliness and hopelessness. These vicious cycles can cause severe mental illness, worsen an existing mental health disorder [5], or even threaten one's life. This underscores the urgency and importance of suicide prevention among psychotherapy patients.
Considered a significant predictor of suicide, the term 'suicidal ideation' is used in this thesis to refer to thoughts about killing oneself.
Even though not all individuals with suicidal ideation subsequently engage in suicidal behaviors or attempts, patients with persistent suicidal ideation are at higher risk of committing suicide [6]. Previous research has established that among individuals who have or had suicidal ideation, the probability of future suicidal behaviors or attempts was approximately 55% [7]. About 60% of the transitions from suicidal ideation to suicide plan, and then to suicide attempt, occurred within the first year after the onset of suicidal ideation [7]. Predicting individuals' levels of suicidal ideation might be an effective approach to target the patients at high risk of suicide and provide relevant interventions.

Electronic Health Records
Electronic Health Records (EHRs) are real-time, patient-centered records that make health-related information available instantly and securely to authorized users [8]. EHRs have replaced paper-based systems in many healthcare organizations and are commonly used to capture and utilize a vast amount of detailed clinical information; they offer many advantages for clinical research, such as cost efficiency, the considerable amount of data, and the ability to analyze data over time [9]. A recent report of the American Medical Informatics Association stated that [10]: Secondary uses of health data can enhance individuals' health care experiences, expand knowledge about diseases and treatments, strengthen understanding of health care system effectiveness and efficiency, support public health and security goals, and aid businesses in meeting customers' needs.
In the United States, due to its broad implementation, accumulated EHR data have become a crucial resource for clinical studies. Notably, access to longitudinal data that can be used to predict future outcomes opens opportunities to support decision making or clinical judgment for patients. Simon et al. [11] used EHRs to develop prediction models for suicide attempt and suicide death over 90 days for both mental health and primary care visits. Their models showed that c-statistics (equivalent to the area under the curve) for suicide attempt prediction ranged from 0.833 to 0.861, a significant improvement in predictive performance over existing prediction tools [11]. EHRs offer a data-rich environment for scientists to conduct essential research and connect with practice in the future.

Generalized Linear Mixed Model
Longitudinal data refers to data obtained by observing and recording each participant at successive time points over a period of time. Compared with cross-sectional data, the main advantage of longitudinal data is that it can more effectively estimate person-specific and group trends of change over time, both within and between samples. Longitudinal data analysis supports modeling of the relations between response variables and covariates while also accounting for the correlation among repeated measurements. The generalized linear mixed model (GLMM) can be used to assess predictive relations between selected covariates and random coefficients reflecting the variability of person-specific intercepts or trends [12]. GLMM combines two basic and widely used statistical methods, the linear mixed model (LMM) and the generalized linear model (GLM). Therefore, it can be used for categorical longitudinal variables and provides a flexible framework for analyzing grouped data while accounting for within-group correlation. It can handle non-normal data by using link functions and exponential-family distributions, involving both fixed and random effects [13].

Limitations of GLMM
GLMM is a routine option to explore longitudinal prediction for categorical outcomes in behavioral science [14,15]. However, there are two main limitations while applying GLMM in EHRs to develop predictive models.
First, similar to other linear parametric models, GLMM is predicated on multiple statistical assumptions, including additivity of the linear predictors, independence of errors, equal variance of errors (homoscedasticity), and normality of errors [16][17][18]. Under those assumptions, we can estimate statistical tests and effect magnitudes according to standard criteria. However, these are all based on assumptions about the data distribution and models. If these assumptions are not met, the statistical criteria become meaningless and the Type I error rate increases [17]. Therefore, the traditional method depends strongly on assumptions and theories, which are difficult to verify in some cases.
Second, a challenge in deploying this model is that the use of a large dataset with hundreds of independent variables introduces the possibility of overfitting [19]. In predictive modeling, we regard the real underlying factors as the signal, which we want to learn from the data. Noise, conversely, refers to the unnecessary detail or randomness in a specific dataset. When researchers include too much irrelevant information in a regression equation to increase the effect size, the prediction model becomes vulnerable to overfitting, resulting in lower suitability for application to other databases.

Advantage and Feasibility of Machine Learning Methods
Machine learning is a compelling predictive method for large-scale, high-dimensional data, enabling computers to "learn by themselves" from data (e.g., progressively improve their performance on a specific task) without the model being directly specified [20]. This method can be used to minimize the problem of overfitting by searching for stable data patterns based on algorithmic rules [21].
The increasing accessibility of big data in health care creates great potential to enhance health services with data mining and machine learning methods. In addition, common analysis models in psychological studies, such as regression and classification models, can alternatively be pursued via machine learning algorithms to identify linear and nonlinear patterns without predefined underlying assumptions [17]. Identified patterns can be utilized to make predictions about future events and then be continuously used to improve model performance for better prediction.
Several recent papers have pointed to some success stories when behavioral scientists have employed a predictive approach involving pattern detection [22].
Therefore, we explored the application of both traditional GLMM and machine learning algorithms to assess the relative performance of these modeling strategies in attaining a stable, accurate, and efficient suicidal ideation prediction model. The readers should bear in mind that the purpose of the current study was to compare the methods used to build prediction models, rather than to explain or support a theoretical model. We expected this research would provide researchers with some critical guidance on model selection, through a fully worked pair of examples involving the same database. A future goal is to standardize our application of machine learning methods on EHR data to inform a real-time, data-driven clinical decision support system.

CHAPTER 3
Methodology

Data
The current study involved secondary data analysis, which was classified by the University of Rhode Island Institutional Review Board (IRB) as non-human-subjects research (HU1718-124). The thesis engaged with only part of the clinical EHR data. A software company, Mirah [27], collected all information and data, much of which had been formally approved for research purposes at other academic institutions. Mirah offers routine measurement of patient symptoms to multiple clinic sites across the United States. Their software collects a hybrid of clinical observations and provides clinical data-tracking and clinician-feedback features that assist clinicians in improving health care. Patients were asked to finish the measurements before every psychotherapy session. The measurements include a Computer Adaptive Multidimensional Scale (CAMS) [28] and some commonly used psychotherapy scales for common disorders, such as the Generalized Anxiety Disorder (GAD) scale, the Patient Health Questionnaire (PHQ), and the Post-Traumatic Stress Disorder Checklist (PCL).
The primary data used to demonstrate the analysis in the study consist of 402 participants [27]. Each scale contains a screening question. On the first visit or treatment, a patient is requested to finish all the questions. For subsequent visits, the screening question is asked, but the follow-up questions are asked only if the screening question has a high score or if a running average of the previous scores is above a threshold. The adaptive feature of this measurement led to a large number of missing values for many questions. Hence, only the 17 screening questions listed in Table 1, for which data were more complete, were selected for the analysis in the current study. The data were split randomly into a training set (70%) and a test set (30%).
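The random 70/30 split can be sketched in Python with scikit-learn; the data below are synthetic stand-ins (the real feature matrix from the Mirah database is not reproduced here):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical matrix: 402 participants x 17 screening questions (7-point Likert)
rng = np.random.default_rng(2017)
X = rng.integers(1, 8, size=(402, 17))
y = X[:, 0]  # stand-in response variable

# Random 70/30 train/test split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)
print(X_train.shape, X_test.shape)  # (281, 17) (121, 17)
```

Note that with longitudinal data, splitting by patient rather than by individual visit prevents one patient's other visits from leaking into the training set; which unit to split on depends on the prediction target.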

Response (Dependent) Variable - Suicidal Ideation
For each patient, the degree of agreement to the statement, "I think it would be better if I were dead.", was used as a response measure (dependent variable) of suicidal ideation and represented by Likert points, ranging from 1 to 7.

Independent Variables
Baseline demographic patient characteristics used for analysis included gender and age. The 7-point Likert scores on the other 15 questions at each time point were also included as independent variables. Considering that the response variable varies over time, time was included as an independent variable, measured as the number of weeks since the patient's first visit.

Exploratory data analysis
To find a probability distribution that best fits the data, descriptive analyses were conducted on the continuous variables, as shown in Table 2.
Based on the descriptive analysis, the distribution of the scores for suicidal ideation was examined. To conduct GLMM, the distribution of the response variable needs to meet the assumption of normality. When the histogram looks roughly bell-shaped and symmetric, or the points in the Q-Q plot generally fall close to a diagonal line, we can conclude that the variable is approximately normally distributed. However, based on Figure 1, the variable Suicidal Ideation failed to meet this requirement.
A log-normal (or lognormal) distribution was used to test whether the logarithm of the variable was normally distributed, as shown in Figure 2. The y-axis represents the observations and the x-axis represents the quantiles modeled by the distribution. The solid line represents a perfect distribution fit and the dashed lines are the confidence intervals of that fit. In Figure 2, the observations fell within the dashed lines. Therefore, the response variable was log-transformed for use in the GLMMs. To select the variables to be included in the model, a correlation matrix was calculated, as shown in Table 3.
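This kind of distribution check can be sketched in Python with SciPy; the scores below are simulated and only stand in for the real Likert responses:

```python
import numpy as np
from scipy import stats

# Simulated right-skewed 7-point scores standing in for suicidal ideation
rng = np.random.default_rng(0)
scores = np.clip(np.round(rng.lognormal(mean=0.3, sigma=0.5, size=400)), 1, 7)

# Shapiro-Wilk normality test on the raw and the log-transformed scores
_, p_raw = stats.shapiro(scores)
_, p_log = stats.shapiro(np.log(scores))

# Theoretical vs. ordered sample quantiles for a normal Q-Q plot of log scores
(osm, osr), (slope, intercept, r) = stats.probplot(np.log(scores), dist="norm")
print(f"p_raw={p_raw:.4f}, p_log={p_log:.4f}, qq_r={r:.3f}")
```

The closer the Q-Q correlation `r` is to 1, the closer the log-transformed scores are to normality; in practice the visual Q-Q plot (as in Figure 2) is usually inspected alongside any formal test.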

Generalized Linear Mixed Model
We utilized R [29] and lme4 [30] to fit a generalized linear mixed model and perform the prediction with the test set. There are essentially two ways to fit a GLMM: 1. starting with a small model and building up, or, 2. starting with a big model and trimming down.
Considering the slated comparison with machine learning methods, we used the latter strategy because it is more analogous to the ML approach. All the variables that showed significant correlations with Suicidal Ideation were added to the first baseline model as fixed effects, as well as the interaction terms of Time and Substance Use, Time and Resilience, Quadratic Time and Substance Use, and Quadratic Time and Resilience. The between-patient differences were included as random effects: the intercept for each patient was set as random, i.e., we included a vector of random intercepts for subjects.
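The study fit the model in R with lme4; as a minimal Python analogue, a mixed model on the log-transformed response with a per-patient random intercept can be sketched with statsmodels on synthetic data (all variable names and effect sizes here are illustrative, not the study's):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulate 50 patients with 6 visits each
rng = np.random.default_rng(0)
n_patients, n_visits = 50, 6
df = pd.DataFrame({
    "patient": np.repeat(np.arange(n_patients), n_visits),
    "weeks": np.tile(np.arange(n_visits) * 2.0, n_patients),
    "resilience": rng.integers(1, 8, n_patients * n_visits).astype(float),
})

# Log-transformed suicidal-ideation score with a random intercept per patient
intercepts = rng.normal(0, 0.5, n_patients)
df["log_si"] = (0.8 + intercepts[df["patient"].to_numpy()]
                - 0.05 * df["resilience"] + 0.01 * df["weeks"]
                + rng.normal(0, 0.3, len(df)))

# Fixed effects for time and resilience; random intercept grouped by patient
model = smf.mixedlm("log_si ~ weeks + resilience", df, groups=df["patient"])
result = model.fit()
print(result.params)
```

The `groups` argument plays the role of lme4's `(1 | patient)` term; random slopes and interaction terms would be added via `re_formula` and the fixed-effects formula, respectively.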

Machine Learning Methods: Random Forest (RF) and Gradient Boosting Decision Tree (GBDT)
The Random Forest (RF) model was selected for its robustness and its performance in both accuracy and ease of implementation [31][32]. RF is a supervised machine learning algorithm for both regression and classification that uses multiple decision trees and a technique called bagging [33]. Bagging, in the RF method, involves training each decision tree on a different data sample, where sampling is done with replacement. Then, the multiple decision trees are combined to determine the final result, as shown in Figure 3.

Figure 3: Random Forest Algorithm
Several recent studies have shown that classification models based on RF had relatively high accuracy when predicting suicidal ideation or suicide attempts [25] [34]. We deployed the RF regression in our study to predict the scores for suicidal ideation.
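An RF regression of this kind can be sketched with scikit-learn; the data below are synthetic stand-ins for the 17 screening-question features:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Synthetic regression problem with 17 features, mimicking the study's setup
X, y = make_regression(n_samples=400, n_features=17, noise=10.0, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

# Each tree is trained on a bootstrap sample (bagging); predictions are averaged
rf = RandomForestRegressor(n_estimators=200, random_state=42)
rf.fit(X_train, y_train)
print(rf.score(X_test, y_test))  # R^2 on the held-out set
```

Averaging over bootstrapped trees is what gives RF its robustness to the overfitting concerns raised for GLMM above.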
Gradient boosting is one of the most powerful techniques for building predictive models [35]. Depending on the type of problem, a loss function is chosen and optimized. The basic building block of GBDT, as shown in Figure 4, is also the decision tree. We start with one decision tree; then trees are added one at a time, stepwise. At each step, the parameters are adjusted and the results are combined until the loss is minimized to an acceptable level.

Figure 4: Gradient Boosting Decision Tree Algorithm
We used Python [36] and the Scikit-learn package [37] to perform the RF and GBDT regressions. All the fixed-effect variables were used to train the models on the training set. We then used the grid search method from the Scikit-learn package [37] to determine the optimal values for the hyperparameters of our models. Grid search is the process of performing hyperparameter tuning in order to determine the optimal values for a given model [38]. This method was also applied when training the conditional RF and GBDT models, i.e., models with variables selected from the final GLMM. Then, the four best models were deployed on the test set to predict the score for suicidal ideation.
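A hedged sketch of such hyperparameter tuning with scikit-learn's `GridSearchCV`, applied to a GBDT regressor on synthetic data (the actual grids searched in the study are not listed here), might look like:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV

X, y = make_regression(n_samples=400, n_features=17, noise=10.0, random_state=0)

# Illustrative hyperparameter grid; every combination is evaluated by cross-validation
param_grid = {
    "n_estimators": [100, 300],
    "learning_rate": [0.05, 0.1],
    "max_depth": [2, 3],
}
search = GridSearchCV(GradientBoostingRegressor(random_state=0),
                      param_grid, cv=5, scoring="neg_mean_squared_error")
search.fit(X, y)
print(search.best_params_)
```

The best combination (by cross-validated MSE) is then refit on the full training set and exposed as `search.best_estimator_`.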

CHAPTER 4
Results
The fit of the models and the estimators of the predictors were compared among all 14 Generalized Linear Mixed Models, as shown in Table 4. We applied the best-fitting models (the final GLMM, RF model, GBDT model, conditional RF model, and conditional GBDT model) to the test data set to predict suicidal ideation. The performance of the GLMM is displayed in the residual plots in Figure 5, with residuals on the y-axis; the distance from the line at 0 indicates how poor the prediction was for that value. The points did not cluster towards the middle of the plot, as shown in Figure 5(a). Figure 5(b) compares the actual values of the dependent variable with the predicted values; there was not a strong correlation between the model's predictions and the actual values. Overall, the fit of the GLMM was poor. Then, the performance of all five models was evaluated using standard goodness-of-fit measures (mean squared error (MSE), R², and mean absolute error (MAE)). Table 6 shows that, overall, the ML models performed better than the GLMM when predicting suicidal ideation in the test set. The GBDT models performed marginally better than the RF models, and the GBDT with all features included had the highest relative accuracy, with R² = .409, MSE = 1.761, and MAE = .942.
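The three evaluation metrics can be computed directly with scikit-learn; the values below are made up solely to illustrate the calls:

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score, mean_absolute_error

# Hypothetical observed vs. predicted suicidal-ideation scores
y_true = np.array([1, 2, 1, 4, 3, 1, 2, 5], dtype=float)
y_pred = np.array([1.2, 1.8, 1.5, 3.4, 2.6, 1.1, 2.3, 4.2])

mse = mean_squared_error(y_true, y_pred)   # mean of squared errors
r2 = r2_score(y_true, y_pred)              # proportion of variance explained
mae = mean_absolute_error(y_true, y_pred)  # mean of absolute errors
print(f"MSE={mse:.3f}, R2={r2:.3f}, MAE={mae:.3f}")  # MSE=0.199, R2=0.900, MAE=0.388
```

Lower MSE and MAE and higher R² indicate better predictive fit, which is the ordering used to compare the five models above.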
RF and GBDT both consist of multiple decision trees. Each node in the decision tree is a condition of a single feature to separate the dataset into two sets.
Within each set, the values of the dependent variable are similar. The measure by which the optimal condition is chosen is called impurity, which is the variance in regression trees [39]. Therefore, how much each feature decreases the weighted impurity is computed when training a tree; when training a forest, the average impurity decrease from each feature can be calculated. We used the term 'feature importance' to present the ranking of the features computed from this measure, as shown in Table 6. There are also some practical problems we need to be concerned about. First, the sample used in the study was collected from one state in the US. In addition to assessing the prediction performance within this sample, we should consider generalizability to other care-provider locations or patient populations. Second, we need to be aware that prediction models cannot replace clinical judgment; they can only provide an alert message to help with decision making and treatment modification. The models developed in this study were designed to address the question of who has thoughts of killing himself/herself, not when he/she will commit suicide. Decision-makers should be cautious when interpreting the predicted results with respect to patient safety.
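This impurity-based importance is exposed by scikit-learn as the fitted model's `feature_importances_` attribute; a minimal sketch on synthetic data:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Synthetic data where only 2 of 5 features carry signal
X, y = make_regression(n_samples=300, n_features=5, n_informative=2, random_state=1)
rf = RandomForestRegressor(n_estimators=100, random_state=1)
rf.fit(X, y)

# Mean decrease in impurity (variance) per feature, normalized to sum to 1
for i, imp in enumerate(rf.feature_importances_):
    print(f"feature {i}: {imp:.3f}")
```

Ranking features by these values yields a table analogous to the feature-importance ranking reported for the study's RF and GBDT models.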
In conclusion, this initial applied demonstration showed that Machine Learn-