The comparative efficacy of imputation methods for missing data in structural equation modeling
Missing data is a problem that permeates much of the research being done today. Traditional techniques for handling missing values include listwise deletion, pairwise deletion, and mean substitution. However, these techniques may have serious limitations. Recent developments in computing allow more sophisticated techniques to be used. The aim of this dissertation is to compare the efficacy of five current methods for handling missing data: expectation maximization (EM), full information maximum likelihood (FIML), mean substitution (Mean), multiple imputation (MI), and regression imputation (Regression). Mean substitution is included as a baseline against which the other methods can be compared. Multiple imputation is a promising method that has not seen extensive use because of its complexity and computational demands. Fortunately, the recent increase in affordable computing power makes methods such as multiple imputation feasible, and software companies are beginning to include them in their current products.

In addition, the focus of this dissertation is on structural equation modeling (SEM), a popular statistical technique that subsumes many traditional statistical procedures. To carry out the comparison, this dissertation examines two models that have been used in prior research. One applies confirmatory factor analysis (CFA) to an actual data set. The other is a full structural equation model whose data are generated by simulation in accordance with previous research.

After extensive bootstrapping and simulation, the results indicate that FIML is superior in estimating most types of parameters in a structural equation modeling framework. However, multiple imputation is superior in estimating standard errors. Multiple imputation is also an excellent estimator, except for data sets with more than 24% missing information.
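Why mean substitution serves only as a baseline can be seen in a small simulation. The sketch below is not the dissertation's design: it assumes a bivariate normal pair (x, y) with correlation 0.6, deletes 30% of the y values completely at random, and treats "regression imputation" as the deterministic variant (predicted values with no residual noise added), which may differ from the version used in the study. All function names are hypothetical.

```python
import math
import random
import statistics


def corr(a, b):
    """Pearson correlation of two equal-length sequences."""
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    sab = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    saa = sum((u - ma) ** 2 for u in a)
    sbb = sum((v - mb) ** 2 for v in b)
    return sab / math.sqrt(saa * sbb)


def simulate(n=2000, rho=0.6, p_missing=0.3, seed=1):
    """Assumed setup: y has unit variance and correlation rho with x."""
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(n)]
    y = [rho * xi + math.sqrt(1 - rho ** 2) * rng.gauss(0, 1) for xi in x]
    # MCAR: each y value is deleted independently with probability p_missing
    observed = [rng.random() >= p_missing for _ in range(n)]
    return x, y, observed


def mean_substitution(y, observed):
    # Every missing y is replaced by the mean of the observed y values
    fill = statistics.fmean([yi for yi, o in zip(y, observed) if o])
    return [yi if o else fill for yi, o in zip(y, observed)]


def regression_imputation(x, y, observed):
    # OLS fit of y on x using complete cases only
    xs = [xi for xi, o in zip(x, observed) if o]
    ys = [yi for yi, o in zip(y, observed) if o]
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    slope = sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / sum(
        (a - mx) ** 2 for a in xs
    )
    intercept = my - slope * mx
    # Deterministic: missing y becomes the predicted value, no noise added
    return [yi if o else intercept + slope * xi for xi, yi, o in zip(x, y, observed)]


x, y, observed = simulate()
y_mean = mean_substitution(y, observed)
y_reg = regression_imputation(x, y, observed)

results = {
    "full data": (statistics.variance(y), corr(x, y)),
    "mean substitution": (statistics.variance(y_mean), corr(x, y_mean)),
    "regression imputation": (statistics.variance(y_reg), corr(x, y_reg)),
}
for name, (var_y, r) in results.items():
    print(f"{name:>22}: var(y) = {var_y:.3f}, corr(x, y) = {r:.3f}")
```

Under these assumptions, mean substitution shrinks both the variance of y and its correlation with x, while deterministic regression imputation also shrinks the variance but inflates the correlation. Because both treat filled-in values as if they had been observed, they also tend to understate uncertainty, which is the kind of distortion that EM, FIML, and MI are designed to avoid.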
Another consideration is that FIML is a direct method and does not actually impute the missing data, whereas multiple imputation does. The author therefore concludes that FIML is an excellent method and that multiple imputation, because of its theoretical and distributional underpinnings, is probably the most promising method for future applications in this field.
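Part of the reason multiple imputation can produce good standard errors is the way results from the m imputed data sets are combined. Below is a minimal sketch of Rubin's rules, the usual combining procedure, assuming each completed data set has already been analyzed (for example, by fitting the SEM) and has yielded one parameter estimate and its standard error. The function name `pool_rubin` is hypothetical, and the abstract does not state the exact pooling details used in the study.

```python
import math
import statistics


def pool_rubin(estimates, std_errors):
    """Combine m completed-data results for one parameter with Rubin's rules."""
    m = len(estimates)
    if m < 2 or m != len(std_errors):
        raise ValueError("need at least two imputations with matching standard errors")
    q_bar = statistics.fmean(estimates)                        # pooled point estimate
    within = statistics.fmean([se ** 2 for se in std_errors])  # mean within-imputation variance
    between = statistics.variance(estimates)                   # between-imputation variance (m - 1 denominator)
    total = within + (1 + 1 / m) * between                     # total variance
    return q_bar, math.sqrt(total)
```

With three imputations whose estimates are 1.0, 1.2, and 0.8, each with a standard error of 0.10, the pooled estimate is 1.0 and the pooled standard error is roughly 0.25, because the spread between imputations is added to the within-imputation variance. A single-imputation method has no between-imputation term, which is one reason such methods tend to understate standard errors.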
Alan David Olinsky, "The comparative efficacy of imputation methods for missing data in structural equation modeling," Dissertations and Master's Theses (Campus Access).