Abstract
In finding the effects of a binary treatment, practitioners mostly use either propensity score matching (PSM) or inverse probability weighting (IPW). However, many newer treatment effect estimators based on the propensity score and the “prognostic score” are now available, and some of them are much better than PSM and IPW in several respects. In this paper, we review these recent treatment effect estimators to show how they relate to one another and why they are better than PSM and IPW. We compare 26 estimators in total through extensive simulation and empirical studies. Based on these comparisons, we recommend recent treatment effect estimators using the “overlap weight” and “targeted MLE” with statistical/machine learning, as well as a simple regression imputation/adjustment estimator using linear prognostic score models.
Acknowledgment
The authors are grateful to the Editor and the reviewers for their helpful comments.
- Author contribution: All the authors have accepted responsibility for the entire content of this submitted manuscript and approved submission.
- Research funding: The research of Myoung-jae Lee has been supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. 2020R1A2C1A01007786), and by a Korea University fund.
- Conflict of interest statement: The authors declare no conflicts of interest regarding this article.
Supplementary Material
The online version of this article offers supplementary material (https://doi.org/10.1515/ijb-2021-0005).
© 2022 Walter de Gruyter GmbH, Berlin/Boston
Articles in the same Issue
- Frontmatter
- Research Articles
- Doubly robust adaptive LASSO for effect modifier discovery
- Increasing the efficiency of randomized trial estimates via linear adjustment for a prognostic score
- Review
- Review and comparison of treatment effect estimators using propensity and prognostic scores
- Research Articles
- Error rate control for classification rules in multiclass mixture models
- Regression trees and ensembles for cumulative incidence functions
- Causal inference under over-simplified longitudinal causal models
- Causal inference under interference with prognostic scores for dynamic group therapy studies
- Bayesian multi-response nonlinear mixed-effect model: application of two recent HIV infection biomarkers
- A Bayesian semiparametric accelerate failure time mixture cure model
- Quantifying the extent of visit irregularity in longitudinal data
- An improved method for analysis of interrupted time series (ITS) data: accounting for patient heterogeneity using weighted analysis
- A robust hazard ratio for general modeling of survival-times
- Penalized likelihood estimation of the proportional hazards model for survival data with interval censoring
- A parametric approach to relaxing the independence assumption in relative survival analysis
- The number of response categories in ordered response models
- A comparison of joint dichotomization and single dichotomization of interacting variables to discriminate a disease outcome
- Spike detection for calcium activity