Abstract
Objectives
Propensity score (PS) weighting methods are commonly used to adjust for confounding in observational treatment comparisons. However, in the setting of substantial covariate imbalance, PS values may approach 0 or 1, yielding extreme weights and inflated variance of the estimated treatment effect. Adaptations of the standard inverse probability of treatment weights (IPTW) can reduce the influence of extremes, including trimming methods that exclude people with PS values near 0 or 1. Alternatively, overlap weighting (OW) optimizes criteria related to bias and variance, and performs well compared to other PS weighting and matching methods. However, it has not been compared to propensity score stratification (PSS), which shares some of the same potential advantages, notably insensitivity to extreme PS values. We sought to compare these methods in the setting of substantial covariate imbalance to generate practical recommendations.
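To make the contrast between the weighting schemes concrete, the following is a minimal sketch in Python, not the authors' implementation. It assumes the usual definitions (IPTW assigns 1/e to treated and 1/(1-e) to control units; OW assigns 1-e to treated and e to control units, where e is the PS). The function names, the example PS values and the trimming thresholds of 0.05 and 0.95 are hypothetical and chosen only for illustration.

```python
import numpy as np

def iptw_weights(ps, treated):
    """IPTW: 1/e for treated units, 1/(1-e) for controls (unbounded near 0 or 1)."""
    ps = np.asarray(ps, dtype=float)
    treated = np.asarray(treated, dtype=bool)
    return np.where(treated, 1.0 / ps, 1.0 / (1.0 - ps))

def overlap_weights(ps, treated):
    """OW: 1-e for treated units, e for controls (always between 0 and 1)."""
    ps = np.asarray(ps, dtype=float)
    treated = np.asarray(treated, dtype=bool)
    return np.where(treated, 1.0 - ps, ps)

def trim_mask(ps, lower=0.05, upper=0.95):
    """Symmetric trimming: keep units whose PS lies in [lower, upper].
    The thresholds are illustrative assumptions, not values from the paper."""
    ps = np.asarray(ps, dtype=float)
    return (ps >= lower) & (ps <= upper)

# Hypothetical PS values, including some near 0 and 1.
ps = np.array([0.02, 0.50, 0.98, 0.99])
treated = np.array([1, 1, 0, 1])

w_iptw = iptw_weights(ps, treated)   # a treated unit with e = 0.02 gets weight 50
w_ow = overlap_weights(ps, treated)  # the same unit gets weight 0.98
keep = trim_mask(ps)                 # only the unit with e = 0.50 is retained
```

In this toy example, two units receive IPTW weights of about 50 while every overlap weight stays at or below 1, which is the mechanism behind the variance inflation described above.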
Methods
Analytical derivations were used to establish connections between methods, and simulation studies were conducted to assess bias and variance of alternative methods.
Results
We find that OW is generally superior, particularly as covariate imbalance increases. In addition, a common method for implementing PSS based on Mantel–Haenszel weights (PSS-MH) is equivalent to a coarsened version of OW and can perform nearly as well. Finally, trimming methods increase bias across methods (IPTW, PSS and PSS-MH) unless the PS model is re-fit to the trimmed sample and weights or strata are re-derived. After trimming with re-fitting, all methods perform similarly to OW.
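One way to see the connection between PSS-MH and OW is the following sketch, which assumes the standard Mantel–Haenszel risk-difference weights n1k·n0k/nk for stratum k and a "coarsened" OW in which each person's PS is replaced by the treated proportion in their stratum. Under those assumptions the two estimators coincide algebraically, because n1k·n0k/nk = nk·pk·(1-pk). The simulated data, the use of quintiles and all function names are hypothetical and are not taken from the paper.

```python
import numpy as np

def mh_risk_difference(y, treated, stratum):
    """Stratified risk difference with Mantel-Haenszel weights n1*n0/(n1+n0)."""
    num, den = 0.0, 0.0
    for k in np.unique(stratum):
        in_k = stratum == k
        n1 = np.sum(in_k & treated)
        n0 = np.sum(in_k & ~treated)
        if n1 == 0 or n0 == 0:
            continue  # a stratum without both groups has MH weight 0
        w = n1 * n0 / (n1 + n0)
        diff = y[in_k & treated].mean() - y[in_k & ~treated].mean()
        num += w * diff
        den += w
    return num / den

def coarsened_ow_difference(y, treated, stratum):
    """OW difference in weighted means, using the stratum treated proportion as the PS."""
    p = np.empty(len(y))
    for k in np.unique(stratum):
        in_k = stratum == k
        p[in_k] = treated[in_k].mean()
    w = np.where(treated, 1.0 - p, p)
    mean_treated = np.average(y[treated], weights=w[treated])
    mean_control = np.average(y[~treated], weights=w[~treated])
    return mean_treated - mean_control

# Hypothetical simulated data with one confounder x.
rng = np.random.default_rng(0)
n = 500
x = rng.normal(size=n)
ps = 1.0 / (1.0 + np.exp(-1.5 * x))
treated = rng.random(n) < ps
y = (rng.random(n) < 0.2 + 0.1 * treated + 0.1 * (x > 0)).astype(float)

# Five strata defined by PS quintiles (labels 0 to 4).
stratum = np.digitize(ps, np.quantile(ps, [0.2, 0.4, 0.6, 0.8]))

rd_mh = mh_risk_difference(y, treated, stratum)
rd_ow = coarsened_ow_difference(y, treated, stratum)
```

In this sketch `rd_mh` and `rd_ow` agree up to floating-point error, which illustrates why PSS-MH can be read as OW applied to a stratum-level PS.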
Conclusions
These results may guide the selection, implementation and reporting of PS methods for observational studies with substantial covariate imbalance.
Funding source: Patient-Centered Outcomes Research Institute
Award Identifier / Grant number: ME-2018C2-13289
Funding source: Agency for Healthcare Research and Quality
Award Identifier / Grant number: RFA-HS-14-006
Acknowledgments
We appreciate the clinical input and motivating questions from COMPARE-UF PI Evan Myers and COMPARE-UF investigators.
- Research ethics: This study involves primarily simulated data. The analysis of COMPARE-UF was conducted within the COMPARE-UF project approved by the local Institutional Review Board under Protocol 00057883 with PI Evan Myers.
- Informed consent: Not applicable.
- Author contributions: All authors have accepted responsibility for the entire content of this manuscript and approved its submission.
- Competing interests: The authors state no conflict of interest.
- Research funding: This research is supported in part by the Patient-Centered Outcomes Research Institute (PCORI) contract ME-2018C2-13289. The COMPARE-UF study and analysis were supported by the Agency for Healthcare Research and Quality (AHRQ) grant RFA-HS-14-006. The contents of this article are solely the responsibility of the authors and do not necessarily represent the views of PCORI or AHRQ.
- Data availability: The COMPARE-UF data will be made publicly available through the Patient-Centered Outcomes Data Repository (PCODR) but have not yet been transferred as of this submission.
© 2023 Walter de Gruyter GmbH, Berlin/Boston
Articles in the same Issue
- Research Articles
- Development and application of an evidence-based directed acyclic graph to evaluate the associations between metal mixtures and cardiometabolic outcomes
- Addressing substantial covariate imbalance with propensity score stratification and balancing weights: connections and recommendations
- Tutorial
- On some pitfalls of the log-linear modeling framework for capture-recapture studies in disease surveillance