Master of Science in Statistics
Computer Science and Statistics
Sparse regression models are a rapidly growing area of statistical learning research. A subset of these models seeks to separate significant, non-trivial main effects from noise effects within the regression framework by imposing penalty terms on a likelihood-based estimator. This yields so-called “sparse” coefficient estimates, in which many estimated effects are exactly zero. Because this area of the field is relatively recent, many published techniques have not yet been investigated across a wide range of applications. Our goal is to fit several penalty-based estimators for the Cox semiparametric survival model to breast cancer survival data with genomic covariates, where the covariates potentially far outnumber the observations. We use the elastic net family of estimators, which includes the LASSO and ridge regression as special cases. Simultaneously, we investigate whether the finer resolution of next-generation genetic sequencing techniques improves the predictive power of breast cancer patient survival models. Models are compared using estimates of concordance, namely the c-statistic and a variant we refer to as Uno’s C. We find that ridge regression models best fit our dataset. Concordance estimates suggest that finer-resolution genetic covariates improve model predictions, though further work with more observations is required.
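For readers unfamiliar with the pieces named above, the following is a minimal illustrative sketch in Python with NumPy, not the thesis's own code. It shows three ingredients: the elastic net penalty in one common parametrization (the one used by R's glmnet), the Cox partial log-likelihood (with tied event times handled by the Breslow approximation), and Harrell's c-statistic. The function names are hypothetical. Scaling the loss by 1/n is an assumption made here. Uno's C, which additionally reweights comparable pairs by the inverse probability of censoring, is not shown.

```python
import numpy as np

def elastic_net_penalty(beta, lam, alpha):
    """lam * (alpha * ||beta||_1 + (1 - alpha)/2 * ||beta||_2^2).

    alpha = 1 gives the LASSO penalty; alpha = 0 gives the ridge penalty.
    """
    beta = np.asarray(beta, dtype=float)
    l1 = np.abs(beta).sum()
    l2_sq = np.square(beta).sum()
    return lam * (alpha * l1 + 0.5 * (1.0 - alpha) * l2_sq)

def cox_partial_loglik(beta, X, time, event):
    """Cox partial log-likelihood; tied event times use the Breslow approximation.

    Sum over observed events i of: x_i'beta - log(sum_{j: t_j >= t_i} exp(x_j'beta)).
    """
    X = np.asarray(X, dtype=float)
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    eta = X @ np.asarray(beta, dtype=float)  # linear predictors
    ll = 0.0
    for i in np.flatnonzero(event):
        at_risk = time >= time[i]  # risk set at the i-th event time
        ll += eta[i] - np.log(np.exp(eta[at_risk]).sum())
    return ll

def penalized_objective(beta, X, time, event, lam, alpha):
    """Objective to minimize: negative partial log-likelihood / n plus the penalty
    (the 1/n scaling is an illustrative assumption)."""
    n = len(time)
    return -cox_partial_loglik(beta, X, time, event) / n + elastic_net_penalty(beta, lam, alpha)

def harrell_c(time, event, risk):
    """Harrell's c-statistic.

    A pair (i, j) is comparable when t_i < t_j and subject i had an observed event.
    It is concordant when the subject who failed first has the higher predicted risk.
    Ties in predicted risk count 1/2; pairs with tied times are skipped.
    """
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    risk = np.asarray(risk, dtype=float)
    concordant, comparable = 0.0, 0
    for i in range(len(time)):
        if not event[i]:
            continue  # a censored subject cannot be the earlier failure in a pair
        for j in range(len(time)):
            if time[i] < time[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    return concordant / comparable
```

For example, with survival times `[2, 4, 6, 8]`, event indicators `[1, 1, 0, 1]`, and predicted risks `[4, 3, 2, 1]`, `harrell_c` returns `1.0`: every comparable pair is ordered correctly. Reversing the risks gives `0.0`, and a model whose risks are unrelated to survival would score near `0.5`.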
Amin, Daven, "Risk Classification in High Dimensional Survival Models" (2016). Open Access Master's Theses. Paper 958.