Date of Award
2024
Degree Type
Thesis
Degree Name
Master of Science in Statistics
Department
Computer Science and Statistics
First Advisor
Yichi Zhang
Abstract
With the advancement of artificial intelligence and deep learning, effective learning processes have become increasingly crucial. This thesis explores methods for logistic regression analysis of grouped or clustered data, where only the number or proportion of positive binary responses within each group is observed. The primary focus is on developing and evaluating a proposed method for such grouped data structures. The study compares the performance of the proposed method with that of several established techniques: Generalized Linear Models (GLM), Empirical Risk Minimization (ERM), Easy Learning from Label Proportions (Easy LLP), and Proportion Matching (PropMatch).
The data used for these comparisons were simulated to reflect real-world scenarios with varying bag sizes, covariate distributions, and label proportions. The study also uses unequal bag sizes to assess how different grouping criteria affect model performance. The models were evaluated on the bias, standard deviation (SD), coverage rate, and standard error (SE) of the coefficient estimates, along with their predictive accuracy.
The results demonstrate that the proposed method performs comparably to traditional methods in estimating model coefficients, particularly for clustered data. Although its standard errors deviate slightly from those of GLM, a difference attributed to the availability of individual-level labels, the proposed method exhibits competitive accuracy. These findings suggest that the proposed method is a viable alternative for analyzing data with inherent grouping structures, providing a robust framework for prediction and inference in complex, real-world data settings.
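To make the grouped-data setting in the abstract concrete, the following is a minimal Python sketch. It is not the thesis's proposed method and not a faithful reimplementation of PropMatch. It simulates bags of unequal size, keeps only each bag's proportion of positive labels, and fits a logistic regression by plain gradient descent on a simple squared-error proportion-matching loss. The bag-size range, the covariate design, the true coefficients, the step size, and all function names are assumptions chosen for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def simulate_grouped_data(n_bags, beta, rng):
    """Simulate individual-level data, then keep only bag-level label proportions."""
    sizes = rng.integers(5, 21, size=n_bags)          # unequal bag sizes in [5, 20] (assumed)
    bag_id = np.repeat(np.arange(n_bags), sizes)      # bag index of each individual
    centers = rng.normal(size=n_bags)                 # bag-specific covariate centers (assumed)
    x = centers[bag_id] + rng.normal(size=bag_id.size)
    X = np.column_stack([np.ones_like(x), x])         # intercept plus one covariate
    y = rng.binomial(1, sigmoid(X @ beta))            # individual labels, not used for fitting
    props = np.bincount(bag_id, weights=y, minlength=n_bags) / sizes
    return X, bag_id, sizes, props

def loss_and_grad(beta, X, bag_id, sizes, props):
    """Mean squared gap between each bag's average predicted probability and its observed proportion."""
    p = sigmoid(X @ beta)
    resid = np.bincount(bag_id, weights=p, minlength=sizes.size) / sizes - props
    loss = np.mean(resid ** 2)
    # Chain rule: d(mean of p in bag)/d(beta) = average of p_i * (1 - p_i) * x_i over the bag.
    w = p * (1.0 - p) * resid[bag_id] / sizes[bag_id]
    grad = 2.0 * (X.T @ w) / sizes.size
    return loss, grad

def fit_from_proportions(X, bag_id, sizes, props, n_iter=3000, lr=1.0):
    """Plain gradient descent from a zero start (illustrative optimizer, not tuned)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        _, grad = loss_and_grad(beta, X, bag_id, sizes, props)
        beta = beta - lr * grad
    return beta

rng = np.random.default_rng(0)
true_beta = np.array([-0.5, 1.0])                     # hypothetical true coefficients
X, bag_id, sizes, props = simulate_grouped_data(500, true_beta, rng)
beta_hat = fit_from_proportions(X, bag_id, sizes, props)

initial_loss, _ = loss_and_grad(np.zeros(2), X, bag_id, sizes, props)
final_loss, _ = loss_and_grad(beta_hat, X, bag_id, sizes, props)
print("estimated coefficients:", beta_hat)
print("loss at zero start:", initial_loss, "loss at fit:", final_loss)
```

Repeating the simulation over many random seeds and summarizing `beta_hat - true_beta` would give Monte Carlo estimates of bias and SD of the kind the abstract describes. Coverage rates would additionally require a standard error estimator, which this sketch does not attempt.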
Recommended Citation
Dey, Anamika, "ESTIMATING THE LOGISTIC REGRESSION FROM LABEL PROPORTIONS" (2024). Open Access Master's Theses. Paper 2558.
https://digitalcommons.uri.edu/theses/2558