
Super Learner for Survival Data Prediction

  • Marzieh K. Golmakani and Eric C. Polley
Published/Copyright: February 22, 2020

Abstract

Survival analysis is a widely used method for relating a time-to-event outcome to a set of potential covariates. Accurately predicting the time of an event of interest is of primary importance in survival analysis. Many algorithms have been proposed for survival prediction, but for a given prediction problem it is rarely, if ever, possible to know in advance which algorithm will perform best. In this paper we propose two algorithms for constructing super learners for survival data prediction in which the individual candidate algorithms are based on proportional hazards. A super learner is a flexible approach to statistical learning that finds the best weighted ensemble of the individual algorithms; finding the optimal combination by minimizing cross-validated risk controls for over-fitting of the final ensemble learner. Candidate algorithms may range from a basic Cox model to tree-based machine learning algorithms, provided all candidates are based on the proportional hazards framework. The ensemble weights are estimated by minimizing the cross-validated negative log partial likelihood. We compare the performance of the proposed super learners with existing models through extensive simulation studies. In all simulation scenarios the proposed super learners are either the best fit or near the best fit. The performance of the newly proposed algorithms is also demonstrated with clinical data examples.
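As a rough illustration of the weight estimation described above (a sketch only; the exact construction and the two proposed algorithms are detailed in the full paper), let $\hat{f}_k^{(-v(i))}(x)$ denote the linear predictor from candidate algorithm $k$, $k = 1, \ldots, K$, fit on the training data with the fold containing observation $i$ held out. Assuming the weights are constrained to a convex combination, as is common for super learners, the ensemble weights minimize the cross-validated negative log Cox partial likelihood:

\[
\hat{\alpha} \;=\; \underset{\alpha_k \ge 0,\ \sum_k \alpha_k = 1}{\arg\min}\;
-\sum_{i:\,\delta_i = 1}\left\{\sum_{k=1}^{K}\alpha_k \hat{f}_k^{(-v(i))}(x_i)
\;-\;\log\sum_{j \in R(t_i)}\exp\!\Big(\sum_{k=1}^{K}\alpha_k \hat{f}_k^{(-v(j))}(x_j)\Big)\right\},
\]

where $t_i$ is the observed time, $\delta_i$ the event indicator, and $R(t_i) = \{j : t_j \ge t_i\}$ the risk set at $t_i$. The cross-validated linear predictors are held fixed during this optimization, so only the $K$ weights are fit against the held-out predictions, which is what protects the final ensemble against over-fitting.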

Conflict of Interest: None.


Received: 2019-06-01
Revised: 2020-01-22
Accepted: 2020-01-24
Published Online: 2020-02-22

© 2020 Walter de Gruyter GmbH, Berlin/Boston
