Highly robust causal semiparametric U-statistic with applications in biomedical studies

Anqi Yin; Ao Yuan; Ming T. Tan

doi:10.1515/ijb-2022-0047

Article

Published/Copyright: November 28, 2022

From the journal The International Journal of Biostatistics, Volume 20, Issue 1

Abstract

With our increased ability to capture large data, causal inference has received renewed attention and plays an increasingly important role in biomedicine and economics. However, one major methodological hurdle is that existing methods rely on many unverifiable model assumptions. Robust modeling is therefore a critically important approach, complementary to sensitivity analysis, which compares results under various model assumptions. The more robust a method is to model assumptions, the more useful it is. The doubly robust estimator (DRE) is a significant advance in this direction. In practice, however, many outcome measures are functionals of multiple distributions, and so are the associated estimands, which can only be estimated via U-statistics; most existing DREs therefore do not apply. This article proposes a broad class of highly robust U-statistic estimators (HREs), which use semiparametric specifications for both the propensity score and the outcome models in constructing the U-statistic. The HRE is thus more robust than existing DREs. We derive comprehensive asymptotic properties of the proposed estimators and perform extensive simulation studies to evaluate their finite-sample performance. Compared with the corresponding parametric U-statistics and the naive estimators, the proposed estimators show significant advantages. We then apply the method to analyze a clinical trial from the AIDS Clinical Trials Group.
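
As a rough illustration of what an estimand that is a functional of two distributions looks like, the sketch below computes a Mann-Whitney-type two-sample U-statistic for P(Y1 > Y0), together with a simple inverse-probability-weighted variant. This is not the HRE proposed in the article: the function names, the tie-handling kernel, and the assumption that propensity scores are supplied as an array `e` are all illustrative choices made here.

```python
import numpy as np


def mann_whitney_u(y1, y0):
    """Two-sample U-statistic with kernel h(a, b) = 1{a > b} + 0.5 * 1{a == b}.

    Averages the kernel over all (treated, control) pairs.
    Hypothetical helper for illustration only.
    """
    y1 = np.asarray(y1, dtype=float)
    y0 = np.asarray(y0, dtype=float)
    diff = y1[:, None] - y0[None, :]          # all pairwise differences
    kernel = (diff > 0) + 0.5 * (diff == 0)   # ties count one half
    return kernel.mean()


def ipw_mann_whitney(y, t, e):
    """Normalized inverse-probability-weighted version of the same kernel.

    y: outcomes, t: 0/1 treatment indicators, e: propensity scores
    P(T = 1 | X), assumed given here (an assumption of this sketch).
    Pair (i, j) gets weight t_i / e_i * (1 - t_j) / (1 - e_j), so only
    (treated i, control j) pairs contribute.
    """
    y = np.asarray(y, dtype=float)
    t = np.asarray(t, dtype=float)
    e = np.asarray(e, dtype=float)
    w1 = t / e
    w0 = (1.0 - t) / (1.0 - e)
    diff = y[:, None] - y[None, :]
    kernel = (diff > 0) + 0.5 * (diff == 0)
    weights = w1[:, None] * w0[None, :]
    return (weights * kernel).sum() / weights.sum()


y = [3, 4, 5, 1, 2, 3]
t = [1, 1, 1, 0, 0, 0]
e = [0.5] * 6  # constant propensity, as in a balanced randomized trial
u_plain = mann_whitney_u(y[:3], y[3:])
u_ipw = ipw_mann_whitney(y, t, e)
```

With a constant propensity score the weighted and unweighted versions coincide. When the propensity score varies with covariates, the weighted version depends on that model being correctly specified, which is the kind of sensitivity that doubly and highly robust constructions aim to reduce.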

Keywords: causal effect; highly robust estimation; semiparametric model; U-statistic

Corresponding authors: Ao Yuan and Ming T. Tan, Department of Biostatistics, Bioinformatics and Biomathematics, Georgetown University, Washington, DC 20057, USA, E-mail: ay312@georgetown.edu (A. Yuan) and mtt34@georgetown.edu (M. T. Tan)

Acknowledgments

This research is part of the Ph.D. dissertation of the first author who would like to thank members of her dissertation committee for their helpful comments. This research is supported in part by NIH grant R21CA270585.

Author contributions: All the authors have accepted responsibility for the entire content of this submitted manuscript and approved submission.
Research funding: None declared.
Conflict of interest statement: The authors declare no conflicts of interest regarding this article.

Supplementary Material

The online version of this article offers supplementary material (https://doi.org/10.1515/ijb-2022-0047) that supports the results of this paper. It includes the regularity conditions and the proofs of the theorems, further examples, further simulation results, and a zip file with the R code and datasets used for the simulations and applications.

Received: 2022-04-19

Accepted: 2022-10-31

Published Online: 2022-11-28
