A review of survival stacking: a method to cast survival regression analysis as a classification problem

Erin Craig; Chenyang Zhong; Robert Tibshirani

doi:10.1515/ijb-2022-0055


Published/Copyright: March 28, 2025


From the journal The International Journal of Biostatistics, Volume 21, Issue 1

Abstract

While there are many well-developed data science methods for classification and regression, there are relatively few methods for working with right-censored data. Here, we review survival stacking, a method for casting a survival regression analysis problem as a classification problem, thereby allowing the use of general classification methods and software in a survival setting. Inspired by the Cox partial likelihood, survival stacking collects features and outcomes of survival data in a large data frame with a binary outcome. We show that survival stacking with logistic regression is approximately equivalent to the Cox proportional hazards model. We further illustrate survival stacking on real and simulated data. By reframing survival regression problems as classification problems, survival stacking removes the reliance on specialized tools for survival regression, and makes it straightforward for data scientists to use well-known learning algorithms and software for classification in the survival setting. This in turn lowers the barrier for flexible survival modeling.

Keywords: survival regression analysis; Cox regression; censored data

Corresponding author: Erin Craig, Department of Biomedical Data Science, Stanford University, Stanford, USA, E-mail: erincr@stanford.edu

Funding source: National Science Foundation

Award Identifier / Grant number: 19 DMS1208164

Funding source: National Institutes of Health

Award Identifier / Grant number: 5R01 EB001988-16

Acknowledgments

The authors thank Terry Therneau for the argument in Section 2.3, and Terry Therneau, Thomas Gerds, Lu Tian, Trevor Hastie and Stephen Pfohl for helpful discussions. We also thank the peer reviewers for their valuable suggestions.

Research ethics: Not applicable.
Informed consent: Not applicable.
Author contributions: All authors have accepted responsibility for the entire content of this manuscript and approved its submission.
Use of Large Language Models, AI and Machine Learning Tools: None declared.
Conflict of interest: The authors state no conflict of interest.
Research funding: National Science Foundation, award number 19 DMS1208164; National Institutes of Health, award number 5R01 EB001988-16.
Data availability: Not applicable.

Appendix A: The Cox partial and profile likelihoods

Recall, the Cox partial likelihood is:

$$L_{\text{partial}}(\beta) = \prod_{i:\, d_i = 1} P\bigl(i \text{ has the event at } t_i \mid \text{risk set } R(t_i),\ \text{one subject has the event at } t_i\bigr) = \prod_{i:\, d_i = 1} \frac{\exp(x_i^T \beta)}{\sum_{j \in R(t_i)} \exp(x_j^T \beta)}$$

$$\ell_{\text{partial}}(\beta) = \log\bigl(L_{\text{partial}}(\beta)\bigr) = \sum_{i:\, d_i = 1} \Bigl[ x_i^T \beta - \log \sum_{j \in R(t_i)} \exp(x_j^T \beta) \Bigr]. \tag{10}$$

For simplicity, we have assumed (and will continue to assume) that there are no tied times: no two subjects have the event at the exact same time. Once fitted, the coefficients $\beta$ are often used to describe the relative risk between subjects for different values of $x$. Optimizing the partial likelihood, however, does not allow us to say anything about the absolute risk for any individual subject: the baseline hazard $\lambda_0(t)$ does not appear anywhere in the partial likelihood.
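As a concrete illustration of Equation (10), the following is a minimal sketch in plain Python. It assumes no tied event times and takes the risk set to be R(t_i) = {j : t_j ≥ t_i}, the usual choice for right-censored data without truncation; the function and argument names are hypothetical and chosen only for this example.

```python
import math


def partial_log_likelihood(times, events, X, beta):
    """Cox partial log-likelihood of Equation (10), assuming no tied event times.

    times  : final observation time t_i of each subject
    events : d_i, 1 if subject i had the event at t_i, 0 if censored
    X      : list of feature vectors x_i (lists of floats)
    beta   : coefficient vector (list of floats)
    """
    # Linear predictor x_i^T beta for every subject.
    eta = [sum(x_k * b_k for x_k, b_k in zip(x, beta)) for x in X]
    total = 0.0
    for i, (t_i, d_i) in enumerate(zip(times, events)):
        if d_i != 1:
            continue  # censored subjects enter only through the risk sets
        # Assumed risk set R(t_i): subjects still under observation at t_i.
        risk_sum = sum(math.exp(eta[j]) for j, t_j in enumerate(times) if t_j >= t_i)
        total += eta[i] - math.log(risk_sum)
    return total
```

With β = 0 every subject in a risk set is equally likely to have the event, so each event contributes minus the log of the risk set size. Note that no baseline hazard appears anywhere in the computation.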

To jointly model the baseline hazard, we can look at the full log-likelihood for the Cox model. We will assume that the baseline hazard is discrete: the function $\lambda_0(t)$ takes values $\lambda_{t_1}, \lambda_{t_2}, \dots, \lambda_{t_k}$ at observed event times $t_1, \dots, t_k$, and $\lambda_0(t) = 0$ at all other times. The full log-likelihood for the Cox model is then:

$$\ell_{\text{full}}\bigl(\{\lambda_{t_i}\}_{i=1}^{k}, \beta\bigr) = \sum_{i:\, d_i = 1} \Bigl[ \log(\lambda_{t_i}) + x_i^T \beta - \lambda_{t_i} \sum_{j \in R(t_i)} \exp(x_j^T \beta) \Bigr], \tag{11}$$

where $t_i$ is the final observation time for subject $i$.
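Equation (11) can be evaluated directly once a value of the discrete baseline hazard is supplied at each event time. The sketch below is illustrative only: it assumes no tied event times and the risk set R(t_i) = {j : t_j ≥ t_i}, and the dictionary `hazard` (mapping each event time to its hazard value) is a representation chosen here for convenience.

```python
import math


def full_log_likelihood(times, events, X, beta, hazard):
    """Full Cox log-likelihood of Equation (11) with a discrete baseline hazard.

    hazard : dict mapping each observed event time t_i to lambda_{t_i} > 0
    Other arguments: observation times, event indicators d_i, feature vectors,
    and the coefficient vector beta.
    """
    # Linear predictor x_i^T beta for every subject.
    eta = [sum(x_k * b_k for x_k, b_k in zip(x, beta)) for x in X]
    total = 0.0
    for i, (t_i, d_i) in enumerate(zip(times, events)):
        if d_i != 1:
            continue  # the sum in Equation (11) runs over observed events only
        # Assumed risk set R(t_i): subjects still under observation at t_i.
        risk_sum = sum(math.exp(eta[j]) for j, t_j in enumerate(times) if t_j >= t_i)
        lam = hazard[t_i]
        total += math.log(lam) + eta[i] - lam * risk_sum
    return total
```

Unlike the partial likelihood, this quantity depends on both β and the hazard values, which is what makes it usable for estimating absolute risk.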

We can use the full likelihood to estimate the baseline hazard as a function of $\beta$. We optimize the full likelihood (Equation (11)) for $\lambda_{t_i}$ to obtain:

$$\lambda_{t_i}(\beta) = \frac{1}{\sum_{j \in R(t_i)} \exp(x_j^T \beta)}. \tag{12}$$

Equation (12) is known as Breslow’s estimate of the baseline hazard [38], and it is the most common method of estimating the baseline hazard. We first estimate $\hat{\beta}$ by maximizing the partial likelihood, and then estimate the baseline hazard as $\{\lambda_{t_i}(\hat{\beta})\}_{i=1}^{k}$.
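The second step of this two-stage procedure, evaluating Equation (12) at a given coefficient vector, is short enough to sketch in plain Python. As before, this assumes no tied event times and the risk set R(t_i) = {j : t_j ≥ t_i}; the function name and the dictionary return type are choices made for this example, and the coefficients are taken as given rather than fitted here.

```python
import math


def breslow_baseline_hazard(times, events, X, beta):
    """Breslow's estimate (Equation (12)) at each observed event time.

    Returns a dict {t_i: lambda_{t_i}(beta)} with one entry per event.
    """
    # Linear predictor x_i^T beta for every subject.
    eta = [sum(x_k * b_k for x_k, b_k in zip(x, beta)) for x in X]
    hazard = {}
    for t_i, d_i in zip(times, events):
        if d_i != 1:
            continue  # the hazard is zero away from observed event times
        # Assumed risk set R(t_i): subjects still under observation at t_i.
        risk_sum = sum(math.exp(eta[j]) for j, t_j in enumerate(times) if t_j >= t_i)
        hazard[t_i] = 1.0 / risk_sum
    return hazard
```

With β = 0 the estimate at each event time reduces to one over the number of subjects at risk.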

Lastly, if we plug Equation (12) back into the full likelihood, we obtain the profile likelihood:

$$\ell_{\text{profile}}(\beta) = \sum_{i:\, d_i = 1} \Bigl[ x_i^T \beta - \log \sum_{j \in R(t_i)} \exp(x_j^T \beta) - 1 \Bigr], \tag{13}$$

which coincides with the log partial likelihood (Equation (2)) up to an additive constant.
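The two steps behind Equations (12) and (13) can be written out explicitly. This is a sketch in the notation of Equation (11); the shorthand $S_i(\beta)$ is introduced here for convenience and is not part of the original text. Because the second derivative in $\lambda_{t_i}$ is $-1/\lambda_{t_i}^2 < 0$, the stationary point is a maximum.

```latex
% Shorthand introduced for this sketch: S_i(\beta) = \sum_{j \in R(t_i)} \exp(x_j^T \beta)
\begin{align*}
\frac{\partial \ell_{\text{full}}}{\partial \lambda_{t_i}}
  &= \frac{1}{\lambda_{t_i}} - S_i(\beta) = 0
  \quad\Longrightarrow\quad
  \lambda_{t_i}(\beta) = \frac{1}{S_i(\beta)}, \\
\ell_{\text{full}}\bigl(\{\lambda_{t_i}(\beta)\}_{i=1}^{k}, \beta\bigr)
  &= \sum_{i:\, d_i = 1} \Bigl[ -\log S_i(\beta) + x_i^T \beta - 1 \Bigr]
   = \sum_{i:\, d_i = 1} \Bigl[ x_i^T \beta - \log S_i(\beta) \Bigr] - \#\{i : d_i = 1\}.
\end{align*}
```

The constant is therefore minus the number of observed events, which does not depend on $\beta$, so both objectives are maximized by the same coefficients.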

Appendix B: Software: learning methods for survival data

Table 2: Common R software packages for fitting survival models, and their support for attributes of survival data.

Package	Function	Time dep. covs.	Truncation	Sample weights	Time-varying effects	Non-linear	Missing data
Linear models
glmnet [39]	glmnet	✓	✓	✓
survival [40]	coxph	✓	✓	✓
Parametric models
flexsurv [41]	flexsurvreg	✓	✓	✓		✓
survival	survreg	✓	✓	✓
Trees & random forests
aorsf [42]	orsf					✓	✓
grf [43]	survival_forest			✓		✓	✓
LTRCforests [44]	ltrccif	✓	✓		✓	✓	✓
partykit [45]	ctree			✓		✓	✓
randomForestSRC [34]	rfsrc			✓		✓	✓
ranger [46]	ranger			✓		✓	✓
rpart [47]	rpart			✓		✓	✓
Boosting
CoxBoost [48]	CoxBoost			✓
gbm [49]	gbm			✓		✓	✓
mboost [50]	glmboost			✓
	gamboost			✓		✓
xgboost [51]	xgboost			✓		✓	✓
SVMs
survivalsvm [52]	survivalsvm					✓
Neural nets
survivalmodels [53]	coxtime				✓	✓
	deephit				✓	✓
	deepsurv				✓	✓
	dnnsurv				✓	✓
	loghaz (Nnet-survival)				✓	✓
	pchazard				✓	✓
Ecosystems
mlr3proba [54]						✓	✓

Table 3: Common Python software packages, and their support for attributes of survival data.

Package	Function	Time dep. covs.	Truncation	Sample weights	Time-varying effects	Non-linear	Missing data
Linear models
lifelines [55]	CoxPHFitter
	CoxTimeVaryingFitter	✓	✓
PySurvival [56]	CoxPHModel
	LinearMultiTaskModel				✓
scikit-survival [57]	CoxPHSurvivalAnalysis
	CoxnetSurvivalAnalysis
Trees & random forests
PySurvival	RandomSurvivalForestModel			✓		✓	✓
	ExtraSurvivalTreesModel			✓		✓	✓
	ConditionalSurvivalForestModel			✓		✓	✓
scikit-survival	RandomSurvivalForest					✓	✓
Boosting
scikit-survival	GradientBoostingSurvivalAnalysis					✓	✓
Neural nets
pycox [58]	CoxPH (DeepSurv)				✓	✓
	LogisticHazard (Nnet-survival)				✓	✓
	DeepHit				✓	✓
	N-MTLR				✓	✓
PySurvival	NeuralMultiTaskModel				✓	✓

Table 4: Common R software packages to reshape survival data, and their support for attributes of survival data. Though there exist packages to fit survival models in Python, we were unable to identify packages with support for reshaping data.

Language	Package	Function	Time dep. covs.	Truncation
R	discSurv [59]	dataLong
		dataLongTimeDep	✓
	survival [40]	survSplit	✓	✓
	pammtools [60]	as_ped	✓	✓

Survival stacking allows the modeling of survival data – with time-varying covariates and truncation – using linear and non-linear models, and it naturally enables the modeling of time-varying effects. Moreover, this is now possible using familiar, well-developed software for classification and regression. This is important, as there are very few survival software packages that are equally flexible.

Here, we examine the support for various features of survival data in common survival software packages in R and Python, with a focus on methods discussed in this work (Tables 2–4). We note that, though individual methods may support a particular feature of survival data (e.g. Nnet-survival supports time-varying covariates), it is not always the case that the corresponding software follows suit.
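To make the reshaping step concrete in Python, the following is a minimal sketch of one way to build a stacked data set with a binary outcome from right-censored data. It is an illustration under stated assumptions, not a definitive implementation: it assumes no tied event times, no truncation, time-fixed covariates, and the risk set R(t) = {j : t_j ≥ t}. The function name `survival_stack` and the row keys are hypothetical.

```python
def survival_stack(times, events, X):
    """Stack right-censored data into rows with a binary outcome.

    For each observed event time t, emit one row per subject in the assumed
    risk set R(t) = {j : t_j >= t}. The outcome y is 1 only for the subject
    who has the event at t, and 0 for everyone else still at risk.
    """
    # Observed event times, in increasing order.
    event_times = sorted(t for t, d in zip(times, events) if d == 1)
    rows = []
    for t in event_times:
        for j, (t_j, d_j) in enumerate(zip(times, events)):
            if t_j >= t:  # subject j is still at risk at time t
                rows.append({
                    "subject": j,
                    "risk_set_time": t,
                    "features": X[j],
                    "y": int(d_j == 1 and t_j == t),
                })
    return rows
```

A subject censored before an event time simply stops contributing rows from that point on. Before fitting a classifier, the `risk_set_time` value would typically need to be encoded as a feature (for example as indicator columns); that encoding is left out of this sketch.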

References

1. Cox, DR. Regression models and life-tables. J R Stat Soc B: Stat Methodol 1972;34:187–202. https://doi.org/10.1111/j.2517-6161.1972.tb00899.x.

2. Wei, L-J. The accelerated failure time model: a useful alternative to the Cox regression model in survival analysis. Stat Med 1992;11:1871–9. https://doi.org/10.1002/sim.4780111409.

3. Gray, RJ. Flexible methods for analyzing survival data using splines, with applications to breast cancer prognosis. J Am Stat Assoc 1992;87:942–51. https://doi.org/10.2307/2290630.

4. Royston, P, Altman, DG. External validation of a Cox prognostic model: principles and methods. BMC Med Res Methodol 2013;13:1–15. https://doi.org/10.1186/1471-2288-13-33.

5. D’Agostino, RB, Lee, M-L, Belanger, AJ, Adrienne Cupples, L, Anderson, K, Kannel, WB. Relation of pooled logistic regression to time dependent Cox regression analysis: the Framingham Heart Study. Stat Med 1990;9:1501–15. https://doi.org/10.1002/sim.4780091214.

6. Ingram, DD, Kleinman, JC. Empirical comparisons of proportional hazards and logistic regression models. Stat Med 1989;8:525–38. https://doi.org/10.1002/sim.4780080502.

7. Lim, M, Hastie, T. Learning interactions via hierarchical group-lasso regularization. J Comput Graph Stat 2015;24:627–54. https://doi.org/10.1080/10618600.2014.938812.

8. Harrell, FE, Califf, RM, Pryor, DB, Lee, KL, Rosati, RA. Evaluating the yield of medical tests. JAMA 1982;247:2543–6. https://doi.org/10.1001/jama.1982.03320430047030.

9. Uno, H, Cai, T, Lu, T, Wei, L-J. Evaluating prediction rules for t-year survivors with censored regression models. J Am Stat Assoc 2007;102:527–37. https://doi.org/10.1198/016214507000000149.

10. Mogensen, UB, Ishwaran, H, Gerds, TA. Evaluating random forests for survival analysis using prediction error curves. J Stat Software 2012;50:1. https://doi.org/10.18637/jss.v050.i11.

11. Wu, M, Ware, JH. On the use of repeated measurements in regression analysis with dichotomous responses. Biometrics 1979:513–21. https://doi.org/10.2307/2530355.

12. Adrienne Cupples, L, D’Agostino, RB, Anderson, K, Kannel, WB. Comparison of baseline and repeated measure covariate techniques in the Framingham Heart Study. Stat Med 1988;7:205–18. https://doi.org/10.1002/sim.4780070122.

13. Therneau, TM, Grambsch, PM. Modeling survival data: extending the Cox model, 1st ed. New York: Springer; 2000:50–3 pp. https://doi.org/10.1007/978-1-4757-3294-8_1.

14. Tutz, G, Schmid, M. Modeling discrete time-to-event data. Cham: Springer; 2016. https://doi.org/10.1007/978-3-319-28158-2.

15. Allison, PD. Discrete-time methods for the analysis of event histories. Socio Methodol 1982;13:61–98. https://doi.org/10.2307/270718.

16. Polley, EC, van der Laan, MJ. Super learning for right-censored data. In: Targeted learning: causal inference for observational and experimental data. New York: Springer; 2011:249–58 pp. https://doi.org/10.1007/978-1-4419-9782-1_16.

17. Fahrmeir, L. Discrete survival-time models. In: Wiley StatsRef: statistics reference online; 2014. https://doi.org/10.1002/9781118445112.stat06012.

18. Moore, KL, van der Laan, MJ. Covariate adjustment in randomized trials with binary outcomes: targeted maximum likelihood estimation. Stat Med 2009;28:39–64. https://doi.org/10.1002/sim.3445.

19. Stitelman, OM, De Gruttola, V, van der Laan, MJ. A general implementation of tmle for longitudinal data applied to causal inference in survival analysis. Int J Biostat 2012;8. https://doi.org/10.1515/1557-4679.1334.

20. Cai, W, van der Laan, MJ. One-step targeted maximum likelihood estimation for time-to-event outcomes. Biometrics 2020;76:722–33. https://doi.org/10.1111/biom.13172.

21. Rytgaard, HCW, van der Laan, MJ. Targeted maximum likelihood estimation for causal inference in survival and competing risks analysis. In: Lifetime data analysis; 2022:1–30 pp. https://doi.org/10.1007/s10985-022-09576-2.

22. Fewell, Z, Hernán, MA, Wolfe, F, Tilling, K, Choi, H, Sterne, JAC. Controlling for time-dependent confounding using marginal structural models. Stata J 2004;4:402–20. https://doi.org/10.1177/1536867x0400400403.

23. Benkeser, D, Gilbert, PB, Carone, M. Estimating and testing vaccine sieve effects using machine learning. J Am Stat Assoc 2019;114:1038–49. https://doi.org/10.1080/01621459.2018.1529594.

24. Ching, T, Zhu, X, Garmire, LX. Cox-nnet: an artificial neural network method for prognosis prediction of high-throughput omics data. PLoS Comput Biol 2018;14:e1006076. https://doi.org/10.1371/journal.pcbi.1006076.

25. Katzman, JL, Shaham, U, Cloninger, A, Bates, J, Jiang, T, Kluger, Y. DeepSurv: personalized treatment recommender system using a Cox proportional hazards deep neural network. BMC Med Res Methodol 2018;18:1–12. https://doi.org/10.1186/s12874-018-0482-1.

26. Giunchiglia, E, Nemchenko, A, van der Schaar, M. Rnn-surv: a deep recurrent model for survival analysis. In: International conference on artificial neural networks. Springer; 2018:23–32 pp. https://doi.org/10.1007/978-3-030-01424-7_3.

27. Gensheimer, MF, Narasimhan, B. A scalable discrete-time survival model for neural networks. PeerJ 2019;7:e6257. https://doi.org/10.7717/peerj.6257.

28. Caruana, R. Multitask learning. Mach Learn 1997;28:41–75. https://doi.org/10.1023/a:1007379606734.

29. Yu, C-N, Greiner, R, Lin, H-C, Baracos, V. Learning patient-specific cancer survival distributions as a sequence of dependent regressors. Adv Neural Inf Process Syst 2011;24:1845–53.

30. Fotso, S. Deep neural networks for survival analysis based on a multi-task framework. arXiv preprint arXiv:1801.05512 2018.

31. Alexander Gerds, T, Sebastian Ohlendorff, J, Ozenne, B. riskRegression: risk regression models and prediction scores for survival analysis with competing risks. R Package Version 2023.03.22; 2023.

32. Gerds, TA, Kattan, MW. Medical risk prediction models: with ties to machine learning, 1st ed. New York: Chapman and Hall/CRC; 2021. https://doi.org/10.1201/9781138384484-1.

33. Rindt, D, Hu, R, Steinsaltz, D, Sejdinovic, D. Survival regression with proper scoring rules and monotonic neural networks. In: International conference on artificial intelligence and statistics. PMLR; 2022:1190–205 pp.

34. Ishwaran, H, Kogalur, UB, Blackstone, EH, Lauer, MS. Random survival forests. Ann Appl Stat 2008;2:841–60. https://doi.org/10.1214/08-aoas169.

35. Freund, Y, Schapire, R, Abe, N. A short introduction to boosting. J Jpn Soc Artif Intell 1999;14:1612.

36. Friedman, JH. Greedy function approximation: a gradient boosting machine. Ann Stat 2001:1189–232. https://doi.org/10.1214/aos/1013203451.

37. Brilleman, SL, Wolfe, R, Moreno-Betancur, M, Crowther, MJ. Simulating survival data using the simsurv R package. J Stat Software 2020;97:1–27. https://doi.org/10.18637/jss.v097.i03.

38. Breslow, NE. Discussion of the paper by DR Cox. J Roy Stat Soc B 1972;34:216–17.

39. Simon, N, Friedman, J, Hastie, T, Tibshirani, R. Regularization paths for Cox’s proportional hazards model via coordinate descent. J Stat Software 2011;39:1–13. https://doi.org/10.18637/jss.v039.i05.

40. Therneau, TM. A package for survival analysis in R. R Package Version 3.2-7; 2020.

41. Jackson, C. flexsurv: a platform for parametric survival modeling in R. J Stat Software 2016;70:1–33. https://doi.org/10.18637/jss.v070.i08.

42. Jaeger, BC, Welden, S, Lenoir, K, Pajewski, NM. aorsf: an R package for supervised learning using the oblique random survival forest. J Open Source Softw 2022;7:4705. https://doi.org/10.21105/joss.04705.

43. Tibshirani, J, Athey, S, Wager, S. grf: generalized random forests. R Package Version 1.2.0; 2020.

44. Yao, W, Frydman, H, Larocque, D, Simonoff, JS. LTRCforests: ensemble methods for survival data with time-varying covariates. R Package Version 0.5.5; 2021. https://doi.org/10.32614/CRAN.package.LTRCforests.

45. Hothorn, T, Zeileis, A. partykit: a modular toolkit for recursive partytioning in R. J Mach Learn Res 2015;16:3905–9.

46. Wright, MN, Ziegler, A. ranger: a fast implementation of random forests for high dimensional data in C++ and R. J Stat Software 2017;77:1–17. https://doi.org/10.18637/jss.v077.i01.

47. Therneau, T, Atkinson, B. rpart: recursive partitioning and regression trees. R Package Version 4.1.16; 2022.

48. Binder, H. CoxBoost: Cox models by likelihood based boosting for a single survival endpoint or competing risks. R Package Version 1.5; 2023.

49. Greenwell, B, Boehmke, B, Cunningham, J, Developers, GBM. gbm: generalized boosted regression models. R Package Version 2.1.8; 2020.

50. Hothorn, T, Buehlmann, P, Kneib, T, Schmid, M, Hofner, B. mboost: model-based boosting. R Package Version 2.9-7; 2022.

51. Chen, T, He, T, Benesty, M, Khotilovich, V, Tang, Y, Cho, H, et al. xgboost: extreme gradient boosting. R Package Version 2.0.0.1; 2022.

52. Fouodo, CJK. survivalsvm: survival support vector analysis. R Package Version 0.0.5; 2018.

53. Sonabend, R. survivalmodels: models for survival analysis. R Package Version 0.1.8; 2021. https://doi.org/10.32614/CRAN.package.survivalmodels.

54. Sonabend, R, Király, FJ, Bender, A, Bischl, B, Lang, M. mlr3proba: an R package for machine learning in survival analysis. Bioinformatics 2021;37:2789–91. https://doi.org/10.1093/bioinformatics/btab039.

55. Davidson-Pilon, C. lifelines: survival analysis in Python. J Open Source Softw 2019;4:1317. https://doi.org/10.21105/joss.01317.

56. Fotso, S, et al. PySurvival: open source package for survival analysis modeling; 2019. Available from: https://www.pysurvival.io/.

57. Pölsterl, S. scikit-survival: a library for time-to-event analysis built on top of scikit-learn. J Mach Learn Res 2020;21:1–6. https://doi.org/10.1007/978-1-4842-5373-1_1.

58. Kvamme, H, Borgan, Ø, Scheel, I. Time-to-event prediction with neural networks and Cox regression. arXiv preprint arXiv:1907.00825 2019.

59. Welchowski, T, Schmid, M. discSurv: discrete time survival analysis. R Package Version; 2015, vol 1:1 p.

60. Bender, A, Groll, A, Scheipl, F. A generalized additive model approach to time-to-event analysis. Stat Model Int J 2018;18:299–321. https://doi.org/10.1177/1471082x17748083.

Received: 2022-05-06

Accepted: 2025-01-07

Published Online: 2025-03-28
