Weaknesses of goodness-of-fit tests for evaluating propensity score models: The case of the omitted confounder

Sherry Weitzen, The Warren Alpert Medical School
Kate L. Lapane, The Warren Alpert Medical School
Alicia Y. Toledano, The Warren Alpert Medical School
Anne L. Hume, University of Rhode Island
Vincent Mor, The Warren Alpert Medical School


Purpose: Propensity scores are used in observational studies to adjust for confounding, but they provide no control for confounders omitted from the propensity score model. We sought to determine whether tests used to evaluate logistic model fit and discrimination would help detect the omission of an important confounder from the propensity score model.

Methods: Using simulated data, we estimated propensity scores under two scenarios: (1) including all confounders and (2) omitting a binary confounder. We compared propensity score model fit and discrimination under each scenario, using the Hosmer-Lemeshow goodness-of-fit (GOF) test and the c-statistic. We also measured residual confounding in treatment effect estimates adjusted by the propensity score that omitted the confounder.

Results: Propensity score models that excluded an important predictor of treatment showed the same GOF statistic and discrimination as the full propensity score model. The GOF test failed to detect poor model fit for the propensity score model omitting the confounder. C-statistics under both scenarios were similar. Treatment effect estimates adjusted by the propensity score that excluded the confounder showed residual confounding (range: 1-30%).

Conclusions: Omission of important confounders from the propensity score leads to residual confounding in estimates of treatment effect. However, tests of GOF and discrimination do not provide information to detect missing confounders in propensity score models. Our findings suggest that it may not be necessary to compute GOF statistics or model discrimination when developing propensity score models.

Copyright © 2004 Wiley & Sons, Ltd.
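
The comparison described in the Methods can be illustrated with a minimal Python sketch. It is not the authors' simulation: the sample size, the covariate distributions, the coefficients of the treatment model, and every function name below are hypothetical choices made for illustration. The sketch fits a logistic propensity score model with and without a binary confounder `u`, then reports the c-statistic and the Hosmer-Lemeshow statistic for each. It does not estimate the treatment effect or the residual confounding.

```python
import math
import random

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def fit_logistic(X, y, iters=15):
    """Fit logistic regression by Newton-Raphson; rows of X include the intercept."""
    p = len(X[0])
    beta = [0.0] * p
    for _ in range(iters):
        grad = [0.0] * p
        hess = [[0.0] * p for _ in range(p)]
        for row, yi in zip(X, y):
            mu = 1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, row))))
            w = mu * (1.0 - mu)
            for j in range(p):
                grad[j] += (yi - mu) * row[j]
                for k in range(p):
                    hess[j][k] += w * row[j] * row[k]
        beta = [b + s for b, s in zip(beta, solve(hess, grad))]
    return beta

def predict(X, beta):
    """Return estimated propensity scores for each row of X."""
    return [1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, row)))) for row in X]

def c_statistic(p, y):
    """Area under the ROC curve via the rank-sum formula (ties are not averaged)."""
    order = sorted(range(len(p)), key=lambda i: p[i])
    rank_sum = sum(rank for rank, i in enumerate(order, start=1) if y[i] == 1)
    n1 = sum(y)
    n0 = len(y) - n1
    return (rank_sum - n1 * (n1 + 1) / 2) / (n1 * n0)

def hosmer_lemeshow(p, y, groups=10):
    """Hosmer-Lemeshow statistic over equal-size risk groups, with its p-value."""
    order = sorted(range(len(p)), key=lambda i: p[i])
    stat = 0.0
    for g in range(groups):
        idx = order[g * len(order) // groups:(g + 1) * len(order) // groups]
        observed = sum(y[i] for i in idx)
        expected = sum(p[i] for i in idx)
        stat += (observed - expected) ** 2 / (expected * (1.0 - expected / len(idx)))
    # Chi-square survival function with (groups - 2) df; this closed form
    # holds only when the degrees of freedom are even, as with 10 groups.
    half = stat / 2.0
    k = (groups - 2) // 2
    p_value = math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(k))
    return stat, p_value

# Hypothetical data-generating process (not taken from the paper).
random.seed(2004)
n = 4000
x = [random.gauss(0, 1) for _ in range(n)]              # measured covariate
u = [1 if random.random() < 0.5 else 0 for _ in range(n)]  # binary confounder
treat = [1 if random.random() < 1 / (1 + math.exp(-(-0.5 + 0.8 * xi + 1.0 * ui))) else 0
         for xi, ui in zip(x, u)]

designs = {
    "full": [[1.0, xi, float(ui)] for xi, ui in zip(x, u)],
    "omitting u": [[1.0, xi] for xi in x],
}
results = {}
for name, X in designs.items():
    ps = predict(X, fit_logistic(X, treat))
    results[name] = (c_statistic(ps, treat),) + hosmer_lemeshow(ps, treat)
    print(f"{name}: c = {results[name][0]:.3f}, "
          f"HL = {results[name][1]:.2f}, p = {results[name][2]:.3f}")
```

Running the sketch prints one line per scenario, so the two c-statistics and Hosmer-Lemeshow p-values can be compared side by side. How far they differ depends on the assumed coefficients, and the figures from this toy setup should not be read as a reproduction of the paper's results.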