Highly robust causal semiparametric U-statistic with applications in biomedical studies

  • Anqi Yin, Ao Yuan and Ming T. Tan
Published/Copyright: November 28, 2022

Abstract

With our increased ability to capture large data sets, causal inference has received renewed attention and plays an increasingly important role in biomedicine and economics. One major methodological hurdle, however, is that existing methods rely on many unverifiable model assumptions. Robust modeling is therefore a critically important approach, complementary to sensitivity analysis, which compares results under various model assumptions. The more robust a method is to model assumptions, the more useful it is. The doubly robust estimator (DRE) is a significant advance in this direction. In practice, however, many outcome measures are functionals of multiple distributions, and so are the associated estimands, which can only be estimated via U-statistics; most existing DREs therefore do not apply. This article proposes a broad class of highly robust U-statistic estimators (HREs), which use semiparametric specifications for both the propensity score and outcome models in constructing the U-statistic, making the HRE more robust than existing DREs. We derive comprehensive asymptotic properties of the proposed estimators and perform extensive simulation studies to evaluate their finite-sample performance, comparing them with the corresponding parametric U-statistics and the naive estimators; the proposed estimators show significant advantages. We then apply the method to a clinical trial from the AIDS Clinical Trials Group.
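
To make the notion of an estimand that is a functional of two distributions concrete, the following minimal Python sketch computes a Mann-Whitney-type quantity, P(Y1 > Y0) + 0.5 P(Y1 = Y0), first as a naive two-sample U-statistic and then with inverse probability weights. It is only an illustration and is not the HRE proposed in the article: the function names are hypothetical, the propensity scores `ps` are assumed to be supplied by the user rather than estimated semiparametrically, and no outcome model is used.

```python
import numpy as np


def mann_whitney_u(y1, y0):
    """Naive two-sample U-statistic with kernel h(a, b) = 1{a > b} + 0.5 * 1{a == b}."""
    y1 = np.asarray(y1, dtype=float)
    y0 = np.asarray(y0, dtype=float)
    diff = y1[:, None] - y0[None, :]          # all treated-vs-control differences
    kernel = (diff > 0) + 0.5 * (diff == 0)   # ties count one half
    return kernel.mean()


def ipw_mann_whitney_u(y, t, ps):
    """Inverse-probability-weighted version with normalized weights.

    y  : outcomes for all subjects
    t  : treatment indicators (1 = treated, 0 = control)
    ps : propensity scores P(T = 1 | X), assumed known here (illustrative assumption)
    """
    y = np.asarray(y, dtype=float)
    t = np.asarray(t, dtype=int)
    ps = np.asarray(ps, dtype=float)
    w1 = t / ps                # nonzero only for treated subjects
    w0 = (1 - t) / (1 - ps)    # nonzero only for control subjects
    diff = y[:, None] - y[None, :]
    kernel = (diff > 0) + 0.5 * (diff == 0)
    weights = w1[:, None] * w0[None, :]   # zero unless row is treated and column is control
    return (weights * kernel).sum() / weights.sum()
```

With a constant propensity score the weighted version reduces to the naive statistic, which gives a quick sanity check; a doubly or highly robust estimator would additionally bring in an outcome model, which this sketch deliberately omits.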


Corresponding authors: Ao Yuan and Ming T. Tan, Department of Biostatistics, Bioinformatics and Biomathematics, Georgetown University, Washington, DC 20057, USA

Acknowledgments

This research is part of the Ph.D. dissertation of the first author, who would like to thank the members of her dissertation committee for their helpful comments. This research was supported in part by NIH grant R21CA270585.

  1. Author contributions: All the authors have accepted responsibility for the entire content of this submitted manuscript and approved submission.

  2. Research funding: None declared.

  3. Conflict of interest statement: The authors declare no conflicts of interest regarding this article.

Supplementary Material

The online version of this article offers supplementary material (https://doi.org/10.1515/ijb-2022-0047). We provide additional material to support the results of this paper: proofs of the theorems, the regularity conditions, further examples, further simulation results, and a zip file with the R code and datasets used for the simulations and applications.


Received: 2022-04-19
Accepted: 2022-10-31
Published Online: 2022-11-28

© 2022 Walter de Gruyter GmbH, Berlin/Boston
