Abstract
While there are many well-developed data science methods for classification and regression, there are relatively few methods for working with right-censored data. Here, we review survival stacking, a method for casting a survival regression analysis problem as a classification problem, thereby allowing the use of general classification methods and software in a survival setting. Inspired by the Cox partial likelihood, survival stacking collects features and outcomes of survival data in a large data frame with a binary outcome. We show that survival stacking with logistic regression is approximately equivalent to the Cox proportional hazards model. We further illustrate survival stacking on real and simulated data. By reframing survival regression problems as classification problems, survival stacking removes the reliance on specialized tools for survival regression, and makes it straightforward for data scientists to use well-known learning algorithms and software for classification in the survival setting. This in turn lowers the barrier for flexible survival modeling.
Funding source: National Science Foundation
Award Identifier / Grant number: 19 DMS1208164
Funding source: National Institutes of Health
Award Identifier / Grant number: 5R01 EB001988-16
Acknowledgments
The authors thank Terry Therneau for the argument in Section 2.3, and Terry Therneau, Thomas Gerds, Lu Tian, Trevor Hastie and Stephen Pfohl for helpful discussions. We also thank the peer reviewers for their valuable suggestions.
- Research ethics: Not applicable.
- Informed consent: Not applicable.
- Author contributions: All authors have accepted responsibility for the entire content of this manuscript and approved its submission.
- Use of Large Language Models, AI and Machine Learning Tools: None declared.
- Conflict of interest: The authors state no conflict of interest.
- Research funding: National Science Foundation, award number 19 DMS1208164; National Institutes of Health, award number 5R01 EB001988-16.
- Data availability: Not applicable.
Appendix A: The Cox partial and profile likelihoods
Recall, the Cox partial likelihood is:
For simplicity, we have assumed (and will continue to assume) that there are no tied times: no two subjects have the event at the exact same time. Once fitted, the coefficients β are often used to describe the relative risk between subjects for different values of x. Optimizing the partial likelihood, however, does not allow us to say anything about the absolute risk for any individual subject: the baseline hazard $\lambda_0(t)$ does not appear anywhere in the partial likelihood.
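For reference, the partial likelihood is usually written in the form below. The notation is an assumption and may differ from that of Equation (2): $\delta_i$ is the event indicator for subject $i$ (1 for an event, 0 for censoring), $t_i$ is the subject's final observation time, and $R(t_i) = \{j : t_j \ge t_i\}$ is the set of subjects still at risk at $t_i$.

```latex
PL(\beta) \;=\; \prod_{i:\,\delta_i = 1}
  \frac{\exp(x_i^\top \beta)}{\sum_{j \in R(t_i)} \exp(x_j^\top \beta)}
```

Each factor compares the subject who had the event at $t_i$ with everyone in the risk set at that time, which is why only relative risks are identified.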
To jointly model the baseline hazard, we can look at the full log-likelihood for the Cox model. We will assume that the baseline hazard is discrete: the function $\lambda_0(t)$ takes nonzero values only at the observed times $t_1, \dots, t_n$, where $t_i$ is the final observation time for subject $i$.
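We can use the full likelihood to estimate the baseline hazard as a function of β. We optimize the full likelihood (Equation (11)) for the baseline hazard values $\lambda_0(t_i)$ with β held fixed, which yields Equation (12).
Equation (12) is known as Breslow’s estimate of the baseline hazard [38], and it is the most common method of estimating the baseline hazard. We first estimate β by maximizing the partial likelihood, and then substitute that estimate into Equation (12).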
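As a minimal sketch of Breslow's estimate, the Python function below computes the baseline hazard at each event time for a given linear predictor. It assumes the standard form of the estimator, $\hat\lambda_0(t_i) = \delta_i / \sum_{j \in R(t_i)} \exp(x_j^\top \beta)$, and no tied times. The function name `breslow_baseline_hazard` and the toy data are hypothetical and introduced only for illustration.

```python
import math


def breslow_baseline_hazard(times, events, linear_predictor):
    """Return {event_time: baseline hazard}, assuming no tied event times.

    times            : final observation time for each subject
    events           : 1 if the subject had the event, 0 if censored
    linear_predictor : x_i' beta for each subject (beta is taken as given)
    """
    hazards = {}
    for t_i, d_i in zip(times, events):
        if d_i != 1:
            continue  # censored times contribute no baseline hazard mass
        # Risk set at t_i: every subject still under observation at t_i.
        denominator = sum(
            math.exp(eta_j)
            for t_j, eta_j in zip(times, linear_predictor)
            if t_j >= t_i
        )
        hazards[t_i] = 1.0 / denominator
    return hazards


# Toy data (hypothetical): four subjects, one censored at time 3.
times = [2.0, 3.0, 5.0, 7.0]
events = [1, 0, 1, 1]
eta = [0.0, 0.0, 0.0, 0.0]  # beta = 0, so every subject has relative risk 1

hazards = breslow_baseline_hazard(times, events, eta)
print(hazards)
```

With β = 0 each hazard is simply one over the number of subjects at risk, which matches the increments of the Nelson-Aalen estimator.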
Lastly, if we plug Equation (12) back into the full likelihood, we obtain the profile likelihood:
which coincides with the partial likelihood (Equation (2)), up to a constant.
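The argument of this appendix can be sketched in three steps. The notation is an assumption and may not match Equations (11) and (12) exactly: $\delta_i$ is the event indicator, $R(t_j) = \{k : t_k \ge t_j\}$ is the risk set at $t_j$, $d = \sum_i \delta_i$ is the number of events, and $0 \cdot \log 0$ is read as $0$ for censored subjects.

```latex
% Step 1: full log-likelihood with a discrete baseline hazard,
% after exchanging the order of summation in the cumulative-hazard term.
\ell(\beta, \lambda_0)
  = \sum_{i=1}^{n} \delta_i \bigl( \log \lambda_0(t_i) + x_i^\top \beta \bigr)
  - \sum_{j=1}^{n} \lambda_0(t_j) \sum_{k \in R(t_j)} \exp(x_k^\top \beta)

% Step 2: maximize over each \lambda_0(t_j) with \beta held fixed.
\hat\lambda_0(t_j; \beta)
  = \frac{\delta_j}{\sum_{k \in R(t_j)} \exp(x_k^\top \beta)}

% Step 3: substitute back to obtain the profile log-likelihood.
\ell\bigl(\beta, \hat\lambda_0(\cdot\,; \beta)\bigr)
  = \sum_{i:\,\delta_i = 1}
      \Bigl[ x_i^\top \beta - \log \sum_{k \in R(t_i)} \exp(x_k^\top \beta) \Bigr]
  - d
  = \log PL(\beta) - d
```

The constant $d$ does not depend on β, so maximizing the profile likelihood and maximizing the partial likelihood give the same coefficients.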
Appendix B: Software: learning methods for survival data
Table 2: Common R software packages for fitting survival models, and their support for attributes of survival data.
| Package | Function | Time dep. covs. | Truncation | Sample weights | Time-varying effects | Non-linear | Missing data |
|---|---|---|---|---|---|---|---|
| Linear models | |||||||
| glmnet [39] | glmnet | ✓ | ✓ | ✓ | |||
| survival [40] | coxph | ✓ | ✓ | ✓ | |||
| Parametric models | |||||||
| flexsurv [41] | flexsurvreg | ✓ | ✓ | ✓ | ✓ | ||
| survival | survreg | ✓ | ✓ | ✓ | |||
| Trees & random forests | |||||||
| aorsf [42] | orsf | ✓ | ✓ | ||||
| grf [43] | survival_forest | ✓ | ✓ | ✓ | |||
| LTRCforests [44] | ltrccif | ✓ | ✓ | ✓ | ✓ | ✓ | |
| partykit [45] | ctree | ✓ | ✓ | ✓ | |||
| randomForestSRC [34] | rfsrc | ✓ | ✓ | ✓ | |||
| ranger [46] | ranger | ✓ | ✓ | ✓ | |||
| rpart [47] | rpart | ✓ | ✓ | ✓ | |||
| Boosting | |||||||
| CoxBoost [48] | CoxBoost | ✓ | |||||
| gbm [49] | gbm | ✓ | ✓ | ✓ | |||
| mboost [50] | glmboost | ✓ | |||||
| | gamboost | ✓ | ✓ | ||||
| xgboost [51] | xgboost | ✓ | ✓ | ✓ | |||
| SVMs | |||||||
| survivalsvm [52] | survivalsvm | ✓ | |||||
| Neural nets | |||||||
| survivalmodels [53] | coxtime | ✓ | ✓ | ||||
| | deephit | ✓ | ✓ | ||||
| | deepsurv | ✓ | ✓ | ||||
| | dnnsurv | ✓ | ✓ | ||||
| | loghaz (Nnet-survival) | ✓ | ✓ | ||||
| | pchazard | ✓ | ✓ | ||||
| Ecosystems | |||||||
| mlr3proba [54] | | ✓ | ✓ | ||||
Table 3: Common Python software packages, and their support for attributes of survival data.
| Package | Function | Time dep. covs. | Truncation | Sample weights | Time-varying effects | Non-linear | Missing data |
|---|---|---|---|---|---|---|---|
| Linear models | |||||||
| lifelines [55] | CoxPHFitter | ||||||
| | CoxTimeVaryingFitter | ✓ | ✓ | ||||
| PySurvival [56] | CoxPHModel | ||||||
| | LinearMultiTaskModel | ✓ | |||||
| scikit-survival [57] | CoxPHSurvivalAnalysis | ||||||
| | CoxnetSurvivalAnalysis | ||||||
| Trees & random forests | |||||||
| PySurvival | RandomSurvivalForestModel | ✓ | ✓ | ✓ | |||
| | ExtraSurvivalTreesModel | ✓ | ✓ | ✓ | |||
| | ConditionalSurvivalForestModel | ✓ | ✓ | ✓ | |||
| scikit-survival | RandomSurvivalForest | ✓ | ✓ | ||||
| Boosting | |||||||
| scikit-survival | GradientBoostingSurvivalAnalysis | ✓ | ✓ | ||||
| Neural nets | |||||||
| pycox [58] | CoxPH (DeepSurv) | ✓ | ✓ | ||||
| | LogisticHazard (Nnet-survival) | ✓ | ✓ | ||||
| | DeepHit | ✓ | ✓ | ||||
| | N-MTLR | ✓ | ✓ | ||||
| PySurvival | NeuralMultiTaskModel | ✓ | ✓ | ||||
Table 4: Common R software packages to reshape survival data, and their support for attributes of survival data. Though there exist packages to fit survival models in Python, we were unable to identify packages with support for reshaping data.
| Language | Package | Function | Time dep. covs. | Truncation |
|---|---|---|---|---|
| R | discSurv [59] | dataLong | | |
| | | dataLongTimeDep | ✓ | |
| | survival [40] | survSplit | ✓ | ✓ |
| | pammtools [60] | as_ped | ✓ | ✓ |
Survival stacking allows the modeling of survival data – with time-varying covariates and truncation – using linear and non-linear models, and it naturally enables the modeling of time-varying effects. Moreover, this is now possible using familiar, well-developed software for classification and regression. This is important, as there are very few survival software packages that are equally flexible.
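To make the reshaping step concrete, here is a minimal Python sketch of how a stacked data set might be built. It is not the authors' implementation: the function name `stack_survival`, the row layout, and the toy data are hypothetical. It assumes no tied event times and treats a subject as at risk at event time $t_k$ when `entry < t_k <= exit`, which is one common convention for handling left truncation.

```python
def stack_survival(exit_times, events, features, entry_times=None):
    """Build a stacked binary-outcome data set from survival data.

    For every distinct event time t_k, each subject at risk at t_k adds one
    row: its features, a one-hot indicator of the risk set k, and an outcome
    that is 1 only for the subject who had the event at t_k.
    """
    n = len(exit_times)
    if entry_times is None:
        entry_times = [float("-inf")] * n  # no left truncation
    event_times = sorted({t for t, d in zip(exit_times, events) if d == 1})

    stacked_rows = []
    for k, t_k in enumerate(event_times):
        risk_set_indicator = [1 if j == k else 0 for j in range(len(event_times))]
        for i in range(n):
            at_risk = entry_times[i] < t_k <= exit_times[i]
            if not at_risk:
                continue
            outcome = 1 if (events[i] == 1 and exit_times[i] == t_k) else 0
            stacked_rows.append(
                {
                    "subject": i,
                    "x": list(features[i]) + risk_set_indicator,
                    "y": outcome,
                }
            )
    return event_times, stacked_rows


# Toy data (hypothetical): four subjects, one covariate, one censored at time 3.
exit_times = [2.0, 3.0, 5.0, 7.0]
events = [1, 0, 1, 1]
features = [[0.5], [1.0], [-0.3], [0.2]]

event_times, stacked = stack_survival(exit_times, events, features)
for row in stacked:
    print(row)
```

The `x` and `y` columns of the stacked rows can then be passed to any binary classifier, for example scikit-learn's `LogisticRegression`. With time-varying covariates, the feature lookup inside the loop would use the covariate values in effect at $t_k$ rather than a fixed vector per subject.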
Here, we examine the support for various features of survival data in common survival software packages in R and Python, with a focus on methods discussed in this work (Tables 2–4). We note that, though individual methods may support a particular feature of survival data (e.g. Nnet-survival supports time-varying covariates), it is not always the case that the corresponding software follows suit.
References
1. Cox, DR. Regression models and life-tables. J R Stat Soc B: Stat Methodol 1972;34:187–202. https://doi.org/10.1111/j.2517-6161.1972.tb00899.x.
2. Wei, L-J. The accelerated failure time model: a useful alternative to the Cox regression model in survival analysis. Stat Med 1992;11:1871–9. https://doi.org/10.1002/sim.4780111409.
3. Gray, RJ. Flexible methods for analyzing survival data using splines, with applications to breast cancer prognosis. J Am Stat Assoc 1992;87:942–51. https://doi.org/10.2307/2290630.
4. Royston, P, Altman, DG. External validation of a Cox prognostic model: principles and methods. BMC Med Res Methodol 2013;13:1–15. https://doi.org/10.1186/1471-2288-13-33.
5. D’Agostino, RB, Lee, M-L, Belanger, AJ, Adrienne Cupples, L, Anderson, K, Kannel, WB. Relation of pooled logistic regression to time dependent Cox regression analysis: the Framingham Heart Study. Stat Med 1990;9:1501–15. https://doi.org/10.1002/sim.4780091214.
6. Ingram, DD, Kleinman, JC. Empirical comparisons of proportional hazards and logistic regression models. Stat Med 1989;8:525–38. https://doi.org/10.1002/sim.4780080502.
7. Lim, M, Hastie, T. Learning interactions via hierarchical group-lasso regularization. J Comput Graph Stat 2015;24:627–54. https://doi.org/10.1080/10618600.2014.938812.
8. Harrell, FE, Califf, RM, Pryor, DB, Lee, KL, Rosati, RA. Evaluating the yield of medical tests. JAMA 1982;247:2543–6. https://doi.org/10.1001/jama.1982.03320430047030.
9. Uno, H, Cai, T, Lu, T, Wei, L-J. Evaluating prediction rules for t-year survivors with censored regression models. J Am Stat Assoc 2007;102:527–37. https://doi.org/10.1198/016214507000000149.
10. Mogensen, UB, Ishwaran, H, Gerds, TA. Evaluating random forests for survival analysis using prediction error curves. J Stat Software 2012;50:1. https://doi.org/10.18637/jss.v050.i11.
11. Wu, M, Ware, JH. On the use of repeated measurements in regression analysis with dichotomous responses. Biometrics 1979:513–21. https://doi.org/10.2307/2530355.
12. Adrienne Cupples, L, D’Agostino, RB, Anderson, K, Kannel, WB. Comparison of baseline and repeated measure covariate techniques in the Framingham Heart Study. Stat Med 1988;7:205–18. https://doi.org/10.1002/sim.4780070122.
13. Therneau, TM, Grambsch, PM. Modeling survival data: extending the Cox model, 1st ed. New York: Springer; 2000:50–3 pp. https://doi.org/10.1007/978-1-4757-3294-8_1.
14. Tutz, G, Schmid, M. Modeling discrete time-to-event data. Cham: Springer; 2016. https://doi.org/10.1007/978-3-319-28158-2.
15. Allison, PD. Discrete-time methods for the analysis of event histories. Socio Methodol 1982;13:61–98. https://doi.org/10.2307/270718.
16. Polley, EC, van der Laan, MJ. Super learning for right-censored data. In: Targeted learning: causal inference for observational and experimental data. New York: Springer; 2011:249–58 pp. https://doi.org/10.1007/978-1-4419-9782-1_16.
17. Fahrmeir, L. Discrete survival-time models. In: Wiley StatsRef: statistics reference online; 2014. https://doi.org/10.1002/9781118445112.stat06012.
18. Moore, KL, van der Laan, MJ. Covariate adjustment in randomized trials with binary outcomes: targeted maximum likelihood estimation. Stat Med 2009;28:39–64. https://doi.org/10.1002/sim.3445.
19. Stitelman, OM, De Gruttola, V, van der Laan, MJ. A general implementation of tmle for longitudinal data applied to causal inference in survival analysis. Int J Biostat 2012;8. https://doi.org/10.1515/1557-4679.1334.
20. Cai, W, van der Laan, MJ. One-step targeted maximum likelihood estimation for time-to-event outcomes. Biometrics 2020;76:722–33. https://doi.org/10.1111/biom.13172.
21. Rytgaard, HCW, van der Laan, MJ. Targeted maximum likelihood estimation for causal inference in survival and competing risks analysis. In: Lifetime data analysis; 2022:1–30 pp. https://doi.org/10.1007/s10985-022-09576-2.
22. Fewell, Z, Hernán, MA, Wolfe, F, Tilling, K, Choi, H, Sterne, JAC. Controlling for time-dependent confounding using marginal structural models. Stata J 2004;4:402–20. https://doi.org/10.1177/1536867x0400400403.
23. Benkeser, D, Gilbert, PB, Carone, M. Estimating and testing vaccine sieve effects using machine learning. J Am Stat Assoc 2019;114:1038–49. https://doi.org/10.1080/01621459.2018.1529594.
24. Ching, T, Zhu, X, Garmire, LX. Cox-nnet: an artificial neural network method for prognosis prediction of high-throughput omics data. PLoS Comput Biol 2018;14:e1006076. https://doi.org/10.1371/journal.pcbi.1006076.
25. Katzman, JL, Shaham, U, Cloninger, A, Bates, J, Jiang, T, Kluger, Y. DeepSurv: personalized treatment recommender system using a Cox proportional hazards deep neural network. BMC Med Res Methodol 2018;18:1–12. https://doi.org/10.1186/s12874-018-0482-1.
26. Giunchiglia, E, Nemchenko, A, van der Schaar, M. Rnn-surv: a deep recurrent model for survival analysis. In: International conference on artificial neural networks. Springer; 2018:23–32 pp. https://doi.org/10.1007/978-3-030-01424-7_3.
27. Gensheimer, MF, Narasimhan, B. A scalable discrete-time survival model for neural networks. PeerJ 2019;7:e6257. https://doi.org/10.7717/peerj.6257.
28. Caruana, R. Multitask learning. Mach Learn 1997;28:41–75. https://doi.org/10.1023/a:1007379606734.
29. Yu, C-N, Greiner, R, Lin, H-C, Baracos, V. Learning patient-specific cancer survival distributions as a sequence of dependent regressors. Adv Neural Inf Process Syst 2011;24:1845–53.
30. Fotso, S. Deep neural networks for survival analysis based on a multi-task framework. arXiv preprint arXiv:1801.05512 2018.
31. Alexander Gerds, T, Sebastian Ohlendorff, J, Ozenne, B. riskRegression: risk regression models and prediction scores for survival analysis with competing risks. R Package Version 2023.03.22; 2023.
32. Gerds, TA, Kattan, MW. Medical risk prediction models: with ties to machine learning, 1st ed. New York: Chapman and Hall/CRC; 2021. https://doi.org/10.1201/9781138384484-1.
33. Rindt, D, Hu, R, Steinsaltz, D, Sejdinovic, D. Survival regression with proper scoring rules and monotonic neural networks. In: International conference on artificial intelligence and statistics. PMLR; 2022:1190–205 pp.
34. Ishwaran, H, Kogalur, UB, Blackstone, EH, Lauer, MS. Random survival forests. Ann Appl Stat 2008;2:841–60. https://doi.org/10.1214/08-aoas169.
35. Freund, Y, Schapire, R, Abe, N. A short introduction to boosting. J Jpn Soc Artif Intell 1999;14:1612.
36. Friedman, JH. Greedy function approximation: a gradient boosting machine. Ann Stat 2001:1189–232. https://doi.org/10.1214/aos/1013203451.
37. Brilleman, SL, Wolfe, R, Moreno-Betancur, M, Crowther, MJ. Simulating survival data using the simsurv R package. J Stat Software 2020;97:1–27. https://doi.org/10.18637/jss.v097.i03.
38. Breslow, NE. Discussion of the paper by DR Cox. J Roy Stat Soc B 1972;34:216–17.
39. Simon, N, Friedman, J, Hastie, T, Tibshirani, R. Regularization paths for Cox’s proportional hazards model via coordinate descent. J Stat Software 2011;39:1–13. https://doi.org/10.18637/jss.v039.i05.
40. Therneau, TM. A package for survival analysis in R. R Package Version 3.2-7; 2020.
41. Jackson, C. flexsurv: a platform for parametric survival modeling in R. J Stat Software 2016;70:1–33. https://doi.org/10.18637/jss.v070.i08.
42. Jaeger, BC, Welden, S, Lenoir, K, Pajewski, NM. aorsf: an R package for supervised learning using the oblique random survival forest. J Open Source Softw 2022;7:4705. https://doi.org/10.21105/joss.04705.
43. Tibshirani, J, Athey, S, Wager, S. grf: generalized random forests. R Package Version 1.2.0; 2020.
44. Yao, W, Frydman, H, Larocque, D, Simonoff, JS. LTRCforests: ensemble methods for survival data with time-varying covariates. R Package Version 0.5.5; 2021. https://doi.org/10.32614/CRAN.package.LTRCforests.
45. Hothorn, T, Zeileis, A. partykit: a modular toolkit for recursive partytioning in R. J Mach Learn Res 2015;16:3905–9.
46. Wright, MN, Ziegler, A. ranger: a fast implementation of random forests for high dimensional data in C++ and R. J Stat Software 2017;77:1–17. https://doi.org/10.18637/jss.v077.i01.
47. Therneau, T, Atkinson, B. rpart: recursive partitioning and regression trees. R Package Version 4.1.16; 2022.
48. Binder, H. CoxBoost: Cox models by likelihood based boosting for a single survival endpoint or competing risks. R Package Version 1.5; 2023.
49. Greenwell, B, Boehmke, B, Cunningham, J, Developers, GBM. gbm: generalized boosted regression models. R Package Version 2.1.8; 2020.
50. Hothorn, T, Buehlmann, P, Kneib, T, Schmid, M, Hofner, B. mboost: model-based boosting. R Package Version 2.9-7; 2022.
51. Chen, T, He, T, Benesty, M, Khotilovich, V, Tang, Y, Cho, H, et al. xgboost: extreme gradient boosting. R Package Version 2.0.0.1; 2022.
52. Fouodo, CJK. survivalsvm: survival support vector analysis. R Package Version 0.0.5; 2018.
53. Sonabend, R. survivalmodels: models for survival analysis. R Package Version 0.1.8; 2021. https://doi.org/10.32614/CRAN.package.survivalmodels.
54. Sonabend, R, Király, FJ, Bender, A, Bischl, B, Lang, M. mlr3proba: an R package for machine learning in survival analysis. Bioinformatics 2021;37:2789–91. https://doi.org/10.1093/bioinformatics/btab039.
55. Davidson-Pilon, C. lifelines: survival analysis in python. J Open Source Softw 2019;4:1317. https://doi.org/10.21105/joss.01317.
56. Fotso, S, et al. PySurvival: open source package for survival analysis modeling; 2019. Available from: https://www.pysurvival.io/.
57. Pölsterl, S. scikit-survival: a library for time-to-event analysis built on top of scikit-learn. J Mach Learn Res 2020;21:1–6. https://doi.org/10.1007/978-1-4842-5373-1_1.
58. Kvamme, H, Borgan, Ø, Scheel, I. Time-to-event prediction with neural networks and Cox regression. arXiv preprint arXiv:1907.00825 2019.
59. Welchowski, T, Schmid, M. discsurv: discrete time survival analysis. R Package Version; 2015, vol 1:1 p.
60. Bender, A, Groll, A, Scheipl, F. A generalized additive model approach to time-to-event analysis. Stat Model Int J 2018;18:299–321. https://doi.org/10.1177/1471082x17748083.
© 2025 Walter de Gruyter GmbH, Berlin/Boston
Articles in this issue
- Frontmatter
- Research Articles
- Prognostic adjustment with efficient estimators to unbiasedly leverage historical data in randomized trials
- Homogeneity test and sample size of response rates for AC 1 in a stratified evaluation design
- A review of survival stacking: a method to cast survival regression analysis as a classification problem
- DsubCox: a fast subsampling algorithm for Cox model with distributed and massive survival data
- A hybrid hazard-based model using two-piece distributions
- Regression analysis of clustered current status data with informative cluster size under a transformed survival model
- Bayesian covariance regression in functional data analysis with applications to functional brain imaging
- Risk estimation and boundary detection in Bayesian disease mapping
- An improved estimator of the logarithmic odds ratio for small sample sizes using a Bayesian approach
- Short Communication
- A multivariate Bayesian learning approach for improved detection of doping in athletes using urinary steroid profiles
- Research Articles
- Guidance on individualized treatment rule estimation in high dimensions
- Weighted Euclidean balancing for a matrix exposure in estimating causal effect
- Penalized regression splines in Mixture Density Networks