Addressing substantial covariate imbalance with propensity score stratification and balancing weights: connections and recommendations

Laine E. Thomas; Steven M. Thomas; Fan Li; Roland A. Matsouaka

doi:10.1515/em-2022-0131

Published/Copyright: November 13, 2023

From the journal Epidemiologic Methods Volume 12 Issue s1

Abstract

Objectives

Propensity score (PS) weighting methods are commonly used to adjust for confounding in observational treatment comparisons. However, in the setting of substantial covariate imbalance, PS values may approach 0 and 1, yielding extreme weights and inflated variance of the estimated treatment effect. Adaptations of the standard inverse probability of treatment weights (IPTW) can reduce the influence of extremes, including trimming methods that exclude people with PS values near 0 or 1. Alternatively, overlap weighting (OW) optimizes criteria related to bias and variance, and performs well compared to other PS weighting and matching methods. However, OW has not been compared to propensity score stratification (PSS), which has some of the same potential advantages, such as being insensitive to extreme values. We sought to compare these methods in the setting of substantial covariate imbalance to generate practical recommendations.
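As a rough illustration of why PS values near 0 or 1 are a problem for IPTW but not for OW, the sketch below computes both weights for a few treated patients. It assumes the usual textbook definitions (IPTW: 1/PS for treated and 1/(1 - PS) for controls; OW: 1 - PS for treated and PS for controls). The function names, the example PS values and the trimming threshold `alpha=0.05` are hypothetical choices made for illustration and are not taken from the article.

```python
def iptw_weight(ps, treated):
    """Inverse probability of treatment weight (unbounded as ps -> 0 or 1)."""
    return 1.0 / ps if treated else 1.0 / (1.0 - ps)


def overlap_weight(ps, treated):
    """Overlap weight: probability of the opposite assignment, always in [0, 1]."""
    return 1.0 - ps if treated else ps


def keep_after_trimming(ps, alpha=0.05):
    """Symmetric trimming rule: keep a patient only if alpha <= ps <= 1 - alpha.

    The threshold alpha is a hypothetical value chosen for this sketch.
    """
    return alpha <= ps <= 1.0 - alpha


# Hypothetical PS values for four treated patients, from well balanced to extreme.
example_ps = [0.5, 0.1, 0.01, 0.001]

for ps in example_ps:
    print(
        f"ps={ps:<6} "
        f"IPTW={iptw_weight(ps, treated=True):8.1f} "
        f"OW={overlap_weight(ps, treated=True):.3f} "
        f"kept_after_trimming={keep_after_trimming(ps)}"
    )
```

In this sketch the IPTW weight grows without bound as the PS of a treated patient approaches 0, whereas the overlap weight never exceeds 1. Trimming instead removes the extreme patients altogether, which changes the analysis sample.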

Methods

Analytical derivations were used to establish connections between methods, and simulation studies were conducted to assess bias and variance of alternative methods.

Results

We find that OW is generally superior, particularly as covariate imbalance increases. In addition, a common method for implementing PSS based on Mantel–Haenszel weights (PSS-MH) is equivalent to a coarsened version of OW and can perform nearly as well. Finally, trimming methods increase bias across methods (IPTW, PSS and PSS-MH) unless the PS model is re-fit to the trimmed sample and weights or strata are re-derived. After trimming with re-fitting, all methods perform similarly to OW.
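One way to see the stated connection between PSS-MH and OW is sketched below. It assumes that PSS-MH combines stratum-specific mean differences using the standard Mantel-Haenszel risk-difference weights n1k * n0k / nk, and that the "coarsened" PS is the observed treated proportion within each stratum. Both are assumptions of this sketch rather than details confirmed by the abstract, and the function names and toy outcome data are hypothetical.

```python
def mh_estimate(strata):
    """Stratified estimate with Mantel-Haenszel weights n1 * n0 / (n1 + n0).

    strata: list of (treated_outcomes, control_outcomes), one pair per PS stratum.
    Every stratum must contain at least one treated and one control patient.
    """
    num = den = 0.0
    for y1, y0 in strata:
        n1, n0 = len(y1), len(y0)
        w = n1 * n0 / (n1 + n0)
        num += w * (sum(y1) / n1 - sum(y0) / n0)
        den += w
    return num / den


def coarsened_ow_estimate(strata):
    """Overlap-weighted difference in means using a stratum-level PS.

    Each patient's PS is replaced by e_k, the treated proportion in the stratum.
    Treated patients get weight 1 - e_k and controls get weight e_k.
    """
    t_num = t_den = c_num = c_den = 0.0
    for y1, y0 in strata:
        e_k = len(y1) / (len(y1) + len(y0))
        for y in y1:
            t_num += (1.0 - e_k) * y
            t_den += 1.0 - e_k
        for y in y0:
            c_num += e_k * y
            c_den += e_k
    return t_num / t_den - c_num / c_den


# Hypothetical binary outcomes in three PS strata with increasing treated share.
toy_strata = [
    ([1, 0], [1, 0, 0, 0, 0, 0, 0, 0]),
    ([1, 1, 0, 0, 1], [1, 0, 0, 0, 1]),
    ([1, 1, 1, 0, 1, 1, 1, 0], [0, 1]),
]

print(mh_estimate(toy_strata))
print(coarsened_ow_estimate(toy_strata))
```

Under these assumptions the two functions agree up to floating-point error, because the overlap weights summed over the treated (or the controls) in stratum k equal n1k * n0k / nk. Strata with very few treated or very few control patients are therefore down-weighted automatically, much as OW down-weights patients with extreme PS values.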

Conclusions

These results may guide the selection, implementation and reporting of PS methods for observational studies with substantial covariate imbalance.

Keywords: propensity score; positivity; overlap weighting; propensity score stratification; inverse probability of treatment weighting; trimming

Corresponding author: Laine E. Thomas, Department of Biostatistics and Bioinformatics, Duke University School of Medicine, Durham, USA, E-mail: laine.thomas@duke.edu.

Funding source: Patient-Centered Outcomes Research Institute

Award Identifier / Grant number: ME-2018C2-13289

Funding source: Agency for Healthcare Research and Quality

Award Identifier / Grant number: RFA-HS-14-006

Acknowledgments

We appreciate the clinical input and motivating questions from COMPARE-UF PI Evan Myers and COMPARE-UF investigators.

Research ethics: This study involves primarily simulated data. The analysis of COMPARE-UF was conducted within the COMPARE-UF project approved by the local Institutional Review Board under Protocol 00057883 with PI Evan Myers.
Informed consent: Not applicable.
Author contributions: All authors have accepted responsibility for the entire content of this manuscript and approved its submission.
Competing interests: Authors state no conflict of interest.
Research funding: This research is supported in part by the Patient-Centered Outcomes Research Institute (PCORI) contract ME-2018C2-13289. The COMPARE-UF study and analysis were supported by the Agency for Healthcare Research and Quality (AHRQ) grant RFA-HS-14-006. The contents of this article are solely the responsibility of the authors and do not necessarily represent the views of PCORI or AHRQ.
Data availability: The COMPARE-UF data will be made publicly available through the Patient-Centered Outcomes Data Repository (PCODR), but have not yet been transferred as of this submission.

Received: 2022-08-26

Accepted: 2023-08-25

Published Online: 2023-11-13
