Abstract
A major focus of causal inference is the estimation of heterogeneous average treatment effects (HTE): average treatment effects within strata of another variable of interest, such as biomarker level, education, or age. Inference involves estimating a stratum-specific regression and integrating it over the distribution of confounders in that stratum, which itself must be estimated. Standard practice involves estimating these stratum-specific confounder distributions independently (e.g., via the empirical distribution or Rubin’s Bayesian bootstrap), which becomes problematic for sparsely populated strata with few observed confounder vectors. In this paper, we develop a nonparametric hierarchical Bayesian bootstrap (HBB) prior over the stratum-specific confounder distributions for HTE estimation. The HBB partially pools the stratum-specific distributions, thereby allowing principled borrowing of confounder information across strata when sparsity is a concern. We show that posterior inference under the HBB can yield efficiency gains over standard marginalization approaches while avoiding strong parametric assumptions about the confounder distribution. We use our approach to estimate the adverse event risk of proton versus photon chemoradiotherapy across various cancer types.
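To make the "standard practice" mentioned above concrete, the following is a minimal sketch of stratum-by-stratum marginalization with Rubin's Bayesian bootstrap. It is not the HBB developed in the paper and does not pool information across strata. The function name `bayesian_bootstrap_cate` and the inputs `mu1` and `mu0` (per-subject predictions under treatment and control from some already-fitted outcome regression) are assumptions made for illustration. The sketch also treats those predictions as fixed, so it ignores uncertainty in the regression itself.

```python
import numpy as np


def bayesian_bootstrap_cate(mu1, mu0, strata, n_draws=1000, seed=0):
    """Draw stratum-specific average treatment effects, one stratum at a time.

    mu1, mu0 : per-subject predicted outcomes under treatment and control
               (hypothetical inputs from any fitted outcome regression).
    strata   : per-subject stratum label.
    Returns a dict mapping each stratum label to an array of n_draws values.
    """
    rng = np.random.default_rng(seed)
    diff = np.asarray(mu1, dtype=float) - np.asarray(mu0, dtype=float)
    strata = np.asarray(strata)

    draws = {}
    for v in np.unique(strata):
        d = diff[strata == v]
        # Bayesian bootstrap: Dirichlet(1, ..., 1) weights over the confounder
        # vectors observed in this stratum only (no borrowing across strata).
        w = rng.dirichlet(np.ones(d.size), size=n_draws)
        # Each draw is a weighted average of the subject-level contrasts.
        draws[v] = w @ d
    return draws


# Toy usage: the second stratum has only two subjects, so its draws
# are spread widely between the two observed contrasts.
example = bayesian_bootstrap_cate(
    mu1=[1.0, 1.2, 0.9, 3.0, 5.0],
    mu0=[0.0, 0.1, 0.0, 1.0, 1.0],
    strata=["a", "a", "a", "b", "b"],
)
```

With very few subjects in a stratum, the Dirichlet weights are spread over only a handful of observed confounder vectors. That is the sparsity problem which motivates partial pooling across strata.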
Funding source: School of Medicine
Award Identifier / Grant number: Unassigned
Funding source: University of Pennsylvania
Award Identifier / Grant number: Unassigned
Acknowledgement
We would like to thank James Metz and Justin Bekelman (Department of Radiation Oncology, Perelman School of Medicine, University of Pennsylvania) for data support.
- Author contribution: All the authors have accepted responsibility for the entire content of this submitted manuscript and approved submission.
- Research funding: None declared.
- Conflict of interest statement: The authors declare no conflicts of interest regarding this article.
Supplementary Material
This article contains supplementary material (https://doi.org/10.1515/ijb-2022-0051).
© 2022 Walter de Gruyter GmbH, Berlin/Boston