Date of Award
2024
Degree Type
Thesis
Degree Name
Master of Science in Statistics
Department
Computer Science and Statistics
First Advisor
Haihan Yu
Abstract
This thesis addresses the challenge of nonlinear feature selection in datasets that include categorical features. Conventional feature selection methods often struggle with nonlinear relationships and handle categorical variables poorly. These limitations lead to suboptimal model performance and interpretability issues. Therefore, there is an urgent need to develop methodologies that can robustly handle nonlinearities and categorical features simultaneously.
To tackle this problem, this thesis proposes and explores novel knockoff methods. Knockoff methods have shown promise in feature selection tasks by generating "knockoff" features that mimic the statistical properties of the original features, enabling robust variable selection while controlling the false discovery rate (FDR). In this work, knockoff methods are applied to datasets with categorical features, leveraging advanced statistical techniques to handle the unique challenges posed by categorical variables in nonlinear feature selection.
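As a rough illustration of how knockoff-based selection controls the FDR, the sketch below implements the standard knockoff+ threshold from the general knockoff filter literature. It is not taken from the thesis. The function names and the statistic vector W are hypothetical, and the sketch assumes that a statistic W_j has already been computed for each feature by comparing it with its knockoff copy, so that a large positive W_j favors the original feature and the signs of the null W_j are symmetric.

```python
import numpy as np


def knockoff_plus_threshold(W, q):
    """Return the knockoff+ threshold for statistics W at target FDR level q.

    The threshold is the smallest t among the nonzero |W_j| such that
    (1 + #{j: W_j <= -t}) / max(1, #{j: W_j >= t}) <= q.
    Returns np.inf when no candidate satisfies the bound.
    """
    W = np.asarray(W, dtype=float)
    # Candidate thresholds are the distinct nonzero magnitudes, ascending.
    candidates = np.unique(np.abs(W[W != 0]))
    for t in candidates:
        # Negative statistics estimate how many false positives sit above t.
        fdp_estimate = (1 + np.sum(W <= -t)) / max(1, np.sum(W >= t))
        if fdp_estimate <= q:
            return float(t)
    return np.inf


def knockoff_select(W, q):
    """Return the indices of features whose statistic meets the threshold."""
    W = np.asarray(W, dtype=float)
    t = knockoff_plus_threshold(W, q)
    return np.flatnonzero(W >= t)
```

How the statistics W are built (for example, from a nonlinear model's importance scores for each original and knockoff feature) and how knockoffs are generated for categorical columns are the parts specific to the thesis, and are not sketched here.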
The findings of this thesis demonstrate the efficacy of the proposed knockoff methods in addressing linear and nonlinear feature selection tasks that involve categorical data. Through comprehensive simulations, we show that the knockoff methods outperform traditional approaches in terms of both FDR and power. Additionally, the methods are robust across linear and nonlinear relationships and across different categorical feature distributions, highlighting their versatility and effectiveness in real-world data analysis scenarios.
Recommended Citation
Khalil Loo, Behrooz, "KNOCKOFF METHODS FOR NONLINEAR FEATURE SELECTION IN DATA WITH CATEGORICAL FEATURES" (2024). Open Access Master's Theses. Paper 2507.
https://digitalcommons.uri.edu/theses/2507