Regression analysis of interval-censored failure time data under semiparametric transformation models with missing covariates

Yichen Lou; Mingyue Du

doi:10.1515/ijb-2024-0016

Published/Copyright: August 29, 2025

From the journal The International Journal of Biostatistics

Abstract

This paper discusses regression analysis of interval-censored failure time data arising from semiparametric transformation models in the presence of covariates that are missing at random (MAR). We define a specific formulation of the MAR mechanism tailored to interval censoring, where the timing of observation adds complexity to handling missing covariates. To overcome the limitations and computational challenges of existing methods, we propose a multiple imputation procedure that can be easily implemented with standard software. The proposed method makes use of two predictive scores for each individual and the distance defined by these scores. Furthermore, it utilizes partial information from incomplete observations and thus yields more efficient estimators than the complete-case analysis and the inverse probability weighting approach. An extensive simulation study is conducted to assess the performance of the proposed method and indicates that it performs well in practical situations. Finally, we apply the proposed approach to an Alzheimer’s disease study that motivated this work.

Keywords: interval-censoring; multiple imputation; predictive score matching; semiparametric transformation model

Corresponding author: Mingyue Du, School of Mathematics, Jilin University, Changchun, China, E-mail: mingydu@jlu.edu.cn

Acknowledgement

The authors want to thank Prof. Olivier Bouaziz, the Associate Editor and two anonymous referees for their many insightful comments and suggestions that greatly improved the paper.

Research ethics: Not applicable.
Informed consent: Not applicable.
Author contributions: The authors have accepted responsibility for the entire content of this manuscript and approved its submission.
Use of Large Language Models, AI and Machine Learning Tools: None declared.
Conflict of interest: The authors state no conflict of interest.
Research funding: None declared.
Data availability: Not applicable.

Appendix A: Imputation algorithm for situations of q > 1

In this Appendix, we extend the algorithm given in Section 4 for the situation of $q = 1$ to the case of $q > 1$. For simplicity, we focus on the case of $q = 2$; cases with larger $q$ can be handled similarly. First, let $X = (X_1, X_2)^T$ denote the covariates subject to missingness, with the missing indicator denoted by $\chi = (\chi_1, \chi_2)^T$. Based on the MAR assumptions in Section 2, we impute $X_1$ and $X_2$ in turn, then perform the analysis using the imputed $X_1$ and $X_2$.

With $\tilde{W}$ defined as in Section 4, we fit two linear/generalized linear regression models (Model $H_{1,(1)}(\cdot)$ and $H_{1,(2)}(\cdot)$) using $\tilde{W}$ as the covariate for the complete cases to obtain the predictive scores for the values of $X_1$ and $X_2$, denoted as $\tilde{\zeta}_{(1)}$ and $\tilde{\zeta}_{(2)}$, respectively. Simultaneously, we fit logistic regression models (Model $H_{2,(1)}(\cdot)$ and $H_{2,(2)}(\cdot)$) using $\tilde{W}$ as the covariate to predict the missing indicators $\chi_1$ and $\chi_2$, with the resulting predictive scores denoted as $\bar{\zeta}_{(1)}$ and $\bar{\zeta}_{(2)}$, respectively.

The detailed multiple imputation procedures are outlined as follows.

[Step 1] Bootstrap Sampling.

Obtain a bootstrap sample $B^{(s)} = \{B_1^{(s)}, \ldots, B_n^{(s)}\}$ from the original dataset $O$. The complete cases within this bootstrap sample are denoted as $B_{\mathrm{obs}}^{(s)} = \{B_{[1]}^{(s)}, \ldots, B_{[n^{(s)}]}^{(s)}\}$.
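As a minimal sketch of Step 1, the following resamples records with replacement and extracts the complete cases. The record layout (a dictionary with keys `x1` and `x2`, and `None` marking a missing value) and the function names are assumptions made for illustration; a complete case is taken here to be a record with both covariates observed.

```python
import random

def bootstrap_sample(data, rng):
    """Draw len(data) records with replacement from the original data set O."""
    n = len(data)
    return [data[rng.randrange(n)] for _ in range(n)]

def complete_cases(sample):
    """Keep records in which both covariates are observed (assumed coding: None = missing)."""
    return [rec for rec in sample if rec["x1"] is not None and rec["x2"] is not None]

rng = random.Random(2024)  # fixed seed so the sketch is reproducible
O = [
    {"id": 1, "x1": 0.4, "x2": 1},
    {"id": 2, "x1": None, "x2": 0},
    {"id": 3, "x1": -1.2, "x2": None},
    {"id": 4, "x1": 0.9, "x2": 1},
    {"id": 5, "x1": 0.1, "x2": 0},
]
B_s = bootstrap_sample(O, rng)      # plays the role of B^(s)
B_obs_s = complete_cases(B_s)       # plays the role of B_obs^(s)
```

The size of `B_obs_s` varies from one bootstrap replicate to the next, which is why the text writes it as $n^{(s)}$.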

[Step 2] Self-Consistency Algorithm.

Apply Turnbull’s self-consistency algorithm to $B^{(s)}$ to obtain $\tilde{W}^{(s)}$.
Perform the same procedure on $O$ to obtain $\tilde{W}$.
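For readers unfamiliar with the self-consistency iteration, a simplified sketch is given below. It is not the authors' implementation: it places probability mass on the distinct right endpoints of the observed intervals $(L_i, R_i]$ rather than on Turnbull's innermost intervals, and all names are hypothetical. How $\tilde{W}$ is then constructed from the resulting estimate follows Section 4 and is not reproduced here.

```python
import math

def turnbull_self_consistency(intervals, tol=1e-10, max_iter=10000):
    """Self-consistency (EM) iteration for interval-censored data (L_i, R_i].

    Simplification: candidate support points are the distinct right
    endpoints (math.inf for right-censored records).
    """
    support = sorted({right for _, right in intervals})
    n, m = len(intervals), len(support)
    # member[i][j] is 1.0 when support point j lies in (L_i, R_i]
    member = [[1.0 if left < s <= right else 0.0 for s in support]
              for left, right in intervals]
    p = [1.0 / m] * m  # start from equal mass on every support point
    for _ in range(max_iter):
        new_p = [0.0] * m
        for row in member:
            # probability currently assigned to this record's interval
            denom = sum(a * pj for a, pj in zip(row, p))
            for j in range(m):
                # each record spreads weight 1/n over the points in its interval
                new_p[j] += row[j] * p[j] / (denom * n)
        converged = max(abs(a - b) for a, b in zip(new_p, p)) < tol
        p = new_p
        if converged:
            break
    return support, p

intervals = [(0.0, 2.0), (1.0, 3.0), (2.0, math.inf), (0.5, 1.5)]
support, mass = turnbull_self_consistency(intervals)
# estimated survival function evaluated at each support point
survival = [1.0 - sum(mass[: j + 1]) for j in range(len(support))]
```

Each update redistributes every record's weight of $1/n$ over the support points inside its interval in proportion to the current masses, so the masses sum to one after every iteration.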

[Step 3] Model Fitting.

Fit models $H_{1,(1)}(\cdot)$ and $H_{1,(2)}(\cdot)$, as well as $H_{2,(1)}(\cdot)$ and $H_{2,(2)}(\cdot)$, using $B_{\mathrm{obs}}^{(s)}$ and $B^{(s)}$, respectively. This yields the estimates $\hat{H}_{1,(1)}^{(s)}(\cdot)$, $\hat{H}_{1,(2)}^{(s)}(\cdot)$, $\hat{H}_{2,(1)}^{(s)}(\cdot)$ and $\hat{H}_{2,(2)}^{(s)}(\cdot)$.
Calculate the initial predictive scores $\tilde{\zeta}_{(1)0}^{(s)}$ and $\tilde{\zeta}_{(2)0}^{(s)}$ for each individual in $B_{\mathrm{obs}}^{(s)}$, and $\bar{\zeta}_{(1)0}^{(s)}$ and $\bar{\zeta}_{(2)0}^{(s)}$ for each individual in $B^{(s)}$.
Standardize the scores by subtracting the sample mean and dividing by the standard deviation, resulting in standardized scores $(\tilde{\zeta}_{(1)}^{(s)}, \tilde{\zeta}_{(2)}^{(s)}, \bar{\zeta}_{(1)}^{(s)}, \bar{\zeta}_{(2)}^{(s)})$ for the individuals in $B_{\mathrm{obs}}^{(s)}$.
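The model-fitting step can be sketched for a single covariate as follows. This is an illustration under simplifying assumptions rather than the authors' code: $\tilde{W}$ is treated as a scalar, the outcome model is an ordinary least-squares line, the logistic model is fitted by Newton-Raphson, $\chi = 1$ is assumed to mark an observed value, and each score is standardized over the set on which it was computed. All function and variable names are hypothetical.

```python
import math
from statistics import mean, stdev

def expit(u):
    return 1.0 / (1.0 + math.exp(-u))

def fit_linear(w, x):
    """Least-squares intercept and slope for x regressed on a scalar w."""
    w_bar, x_bar = mean(w), mean(x)
    slope = (sum((wi - w_bar) * (xi - x_bar) for wi, xi in zip(w, x))
             / sum((wi - w_bar) ** 2 for wi in w))
    return x_bar - slope * w_bar, slope

def fit_logistic(w, chi, n_iter=25):
    """Newton-Raphson for a logistic regression of chi on a scalar w."""
    b0 = b1 = 0.0
    for _ in range(n_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for wi, ci in zip(w, chi):
            p = expit(b0 + b1 * wi)
            g0 += ci - p                  # score for the intercept
            g1 += (ci - p) * wi           # score for the slope
            v = p * (1.0 - p)
            h00 += v; h01 += v * wi; h11 += v * wi * wi
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

def standardize(scores):
    """Return standardized scores together with the mean and SD used."""
    m, s = mean(scores), stdev(scores)
    return [(v - m) / s for v in scores], m, s

# Toy bootstrap sample: w plays the role of W-tilde, chi = 1 means x is observed.
w = [-1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0]
chi = [0, 1, 0, 1, 1, 0, 1, 1]
x = [None, 0.2, None, 0.9, 1.4, None, 2.1, 2.4]

w_obs = [wi for wi, ci in zip(w, chi) if ci == 1]
x_obs = [xi for xi, ci in zip(x, chi) if ci == 1]
a0, a1 = fit_linear(w_obs, x_obs)         # working model H_1, complete cases only
b0, b1 = fit_logistic(w, chi)             # working model H_2, whole bootstrap sample

raw_tilde = [a0 + a1 * wi for wi in w_obs]
raw_bar = [expit(b0 + b1 * wi) for wi in w]
zeta_tilde, m_tilde, s_tilde = standardize(raw_tilde)
zeta_bar_all, m_bar, s_bar = standardize(raw_bar)
zeta_bar = [z for z, ci in zip(zeta_bar_all, chi) if ci == 1]  # complete cases
```

The means and standard deviations returned by `standardize` are kept because the imputation steps standardize the scores of the original individuals against the bootstrap scores.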

[Step 4] Impute Missing Values for $X_1$.

For each missing value $X_{1,k}$ associated with individual $O_k$ in $O$, apply the estimated working models $\hat{H}_{1,(1)}^{(s)}(\cdot)$ and $\hat{H}_{2,(1)}^{(s)}(\cdot)$ to compute its initial predictive scores.
Standardize these scores by subtracting the sample mean of the bootstrap predictive scores and dividing by the standard deviation of the bootstrap predictive scores, resulting in standardized scores $(\tilde{\zeta}_{k(1)}, \bar{\zeta}_{k(1)})$.
Use the pre-specified $\omega_1$ to calculate the distance $D_{k,(1)} = \{D(k,[1]), \ldots, D(k,[n^{(s)}])\}$ between $O_k$ and the individuals in $B_{\mathrm{obs}}^{(s)}$ as
$$D(k,[j]) = \left\{ \omega_1 \left( \tilde{\zeta}_{k(1)} - \tilde{\zeta}_{[j](1)}^{(s)} \right)^2 + (1 - \omega_1) \left( \bar{\zeta}_{k(1)} - \bar{\zeta}_{[j](1)}^{(s)} \right)^2 \right\}^{1/2}, \quad j = 1, \ldots, n^{(s)}.$$
Define a neighborhood set consisting of the nearest neighbor (NN) subjects from $B^{(s)}$ by sorting the distances in ascending order. Then, randomly draw a pseudo observation value for $X_{1,k}$ from the observed values of $X_1^{(s)}$ within this neighborhood set.
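The distance and the nearest-neighbor draw in Step 4 can be sketched as below. The donor layout (a pair of standardized scores together with an observed covariate value), the neighborhood size `nn` and the function names are assumptions for illustration only.

```python
import math
import random

def score_distance(score_k, score_j, omega):
    """Weighted distance between two pairs of standardized predictive scores."""
    tilde_k, bar_k = score_k
    tilde_j, bar_j = score_j
    return math.sqrt(omega * (tilde_k - tilde_j) ** 2
                     + (1.0 - omega) * (bar_k - bar_j) ** 2)

def impute_from_neighbors(score_k, donors, omega, nn, rng):
    """Draw one observed value at random from the nn closest donors.

    donors: list of ((tilde_score, bar_score), observed_value) pairs
    taken from the complete cases of the bootstrap sample.
    """
    ranked = sorted(donors, key=lambda d: score_distance(score_k, d[0], omega))
    neighborhood = ranked[:nn]
    return rng.choice(neighborhood)[1]

rng = random.Random(7)
donors = [((0.0, 0.0), 1.0), ((1.0, 1.0), 2.0), ((-2.0, 0.5), 3.0)]
score_k = (0.9, 0.8)  # standardized scores of the individual with a missing value

nearest_value = impute_from_neighbors(score_k, donors, omega=0.5, nn=1, rng=rng)
random_draw = impute_from_neighbors(score_k, donors, omega=0.5, nn=3, rng=rng)
```

With `nn=1` the draw is deterministic and returns the value of the closest donor; larger neighborhoods add randomness across imputations, which corresponds to the NN-1, NN-3, NN-5 and NN-10 settings reported in Table 7.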

[Step 5] Impute Missing Values for $X_2$.

For each missing value $X_{2,c}$ associated with individual $O_c$ in $O$, apply the estimated working models $\hat{H}_{1,(2)}^{(s)}(\cdot)$ and $\hat{H}_{2,(2)}^{(s)}(\cdot)$ to compute its initial predictive scores.
Standardize these scores by subtracting the sample mean of the bootstrap predictive scores and dividing by the standard deviation of the bootstrap predictive scores, resulting in standardized scores $(\tilde{\zeta}_{c(2)}, \bar{\zeta}_{c(2)})$.
Use the pre-specified $\omega_1$ to calculate the distance $D_{c,(2)}$ between $O_c$ and the individuals in $B_{\mathrm{obs}}^{(s)}$ similarly.
Define the neighborhood set from $B^{(s)}$, and randomly draw a pseudo observation value for $X_{2,c}$ from the observed values of $X_2^{(s)}$ within this neighborhood set.

[Step 6] Final Estimation.

After completing the imputation for all missing covariates, proceed with the sieve maximum likelihood procedure using the log-likelihood function to obtain the final estimator $\hat{\theta}^{(s)} = (\hat{\alpha}^{(s)}, \hat{\beta}^{(s)}, \hat{\Lambda}_0^{(s)})$.

The entire imputation procedure is repeated $K$ times, generating $K$ imputed data sets. The overall estimates and their variances are obtained by applying Rubin’s combination rule, similarly to the method described in Section 4.
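For reference, the standard form of Rubin's combination rule for a scalar parameter is sketched below: the pooled estimate is the average of the $K$ estimates, and the total variance is the average within-imputation variance plus $(1 + 1/K)$ times the between-imputation variance. The function name and the numbers are illustrative, and any variant used in Section 4 may differ in detail.

```python
from statistics import mean, variance

def rubin_combine(estimates, variances):
    """Pool K point estimates and their variance estimates with Rubin's rule."""
    k = len(estimates)
    pooled = mean(estimates)
    within = mean(variances)        # average within-imputation variance
    between = variance(estimates)   # sample variance across imputations (divisor K - 1)
    total = within + (1.0 + 1.0 / k) * between
    return pooled, total

# Illustrative numbers only, not results from the paper.
estimates = [0.5, 0.6, 0.7]
variances = [0.04, 0.05, 0.06]
pooled, total_var = rubin_combine(estimates, variances)
```

For a vector parameter the same rule is applied component by component, or with covariance matrices in place of scalar variances.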

Appendix B: Additional details about numerical studies

We also consider a scenario with $q > 1$. In this case, we maintain $Z$ as previously defined, while letting $X = (X_1, X_2)^T$. The variable $X_1$ is generated from the normal distribution with mean $Z_1$ and variance 1, and $X_2$ is generated from a Bernoulli distribution with success probability $1/\{1 + \exp(-0.3 + 0.9Z_1) + \exp(0.3 - 0.6Z_2)\}$. The missing mechanisms for $X_1$ and $X_2$ are defined as $P(\chi_1 = 1 \mid O) = 1/\{1 + \exp(-2.5 - 0.5Z_1 + 0.5L + 0.5R)\}$ and $P(\chi_2 = 1 \mid O) = 1/\{1 + \exp(-2.0 - 0.5Z_1 + 0.5L + 0.5R)\}$, respectively, leading to missing rates of approximately 25 % for $X_1$ and 35 % for $X_2$ and a total missing rate of around 50 %. The other settings remain the same as those in Section 5, and the results for $n = 200$ under the PO model are presented in Table 7. These results support the same conclusions as before.
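The generating formulas above can be written out directly, as in the sketch below. The inputs `z1`, `z2`, `left` and `right` stand for $Z_1$, $Z_2$, $L$ and $R$, whose generation is described in Section 5 and is not reproduced here; the values passed in are placeholders, finite $L$ and $R$ are assumed, and the function names are hypothetical. The code leaves the coding of $\chi$ exactly as the formulas state it.

```python
import math
import random

def x2_success_prob(z1, z2):
    """Success probability of X2, transcribed from the text."""
    return 1.0 / (1.0 + math.exp(-0.3 + 0.9 * z1) + math.exp(0.3 - 0.6 * z2))

def indicator_probs(z1, left, right):
    """P(chi_1 = 1 | O) and P(chi_2 = 1 | O), transcribed from the text."""
    shared = -0.5 * z1 + 0.5 * left + 0.5 * right
    p1 = 1.0 / (1.0 + math.exp(-2.5 + shared))
    p2 = 1.0 / (1.0 + math.exp(-2.0 + shared))
    return p1, p2

def simulate_record(z1, z2, left, right, rng):
    """Generate (X1, X2) and the two indicators for one individual."""
    x1 = rng.gauss(z1, 1.0)  # normal with mean Z1 and variance 1
    x2 = 1 if rng.random() < x2_success_prob(z1, z2) else 0
    p1, p2 = indicator_probs(z1, left, right)
    chi1 = 1 if rng.random() < p1 else 0
    chi2 = 1 if rng.random() < p2 else 0
    return {"x1": x1, "x2": x2, "chi1": chi1, "chi2": chi2}

rng = random.Random(11)
# Placeholder covariate and interval values, for illustration only.
record = simulate_record(z1=0.0, z2=1.0, left=0.5, right=1.5, rng=rng)
p1_at_zero, p2_at_zero = indicator_probs(0.0, 0.0, 0.0)
```

Because the two indicator probabilities share the same linear term and differ only in the intercept, the first is always larger than the second for a given individual.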

Table 7: Simulation results under q > 1.

Method	Parameter	Bias	SD	ESE	CP
FULL	α₁ = 0.8	0.0173	0.1583	0.1515	0.948
FULL	α₂ = −0.8	−0.0132	0.2943	0.2909	0.952
FULL	β₁ = 0.5	0.0036	0.2710	0.2738	0.948
FULL	β₂ = −0.6	−0.0063	0.3158	0.3103	0.946
CCA	α₁ = 0.8	−0.0528	0.2286	0.2167	0.930
CCA	α₂ = −0.8	0.0705	0.4378	0.4212	0.940
CCA	β₁ = 0.5	−0.0252	0.4107	0.3916	0.929
CCA	β₂ = −0.6	−0.1939	0.4572	0.4492	0.942
IPW-I	α₁ = 0.8	0.0373	0.2694	0.2827	0.963
IPW-I	α₂ = −0.8	−0.0201	0.5242	0.5476	0.951
IPW-I	β₁ = 0.5	0.0422	0.4944	0.5147	0.958
IPW-I	β₂ = −0.6	−0.1190	0.5607	0.5766	0.951
IPW-II	α₁ = 0.8	−0.0526	0.2355	0.2492	0.943
IPW-II	α₂ = −0.8	0.0639	0.4482	0.4806	0.967
IPW-II	β₁ = 0.5	−0.0288	0.4196	0.4361	0.951
IPW-II	β₂ = −0.6	−0.2008	0.4773	0.5102	0.958
NN-1	α₁ = 0.8	0.0006	0.1960	0.1978	0.949
NN-1	α₂ = −0.8	0.0564	0.4077	0.4867	0.972
NN-1	β₁ = 0.5	−0.0064	0.2876	0.2959	0.954
NN-1	β₂ = −0.6	0.0206	0.3556	0.3541	0.949
NN-3	α₁ = 0.8	−0.0020	0.1921	0.1952	0.954
NN-3	α₂ = −0.8	0.0522	0.4030	0.4604	0.964
NN-3	β₁ = 0.5	−0.0049	0.2860	0.2940	0.960
NN-3	β₂ = −0.6	0.0301	0.3486	0.3505	0.949
NN-5	α₁ = 0.8	−0.0046	0.1918	0.1934	0.952
NN-5	α₂ = −0.8	0.0579	0.4000	0.4469	0.955
NN-5	β₁ = 0.5	−0.0070	0.2861	0.2927	0.958
NN-5	β₂ = −0.6	0.0378	0.3464	0.3476	0.946
NN-10	α₁ = 0.8	−0.0146	0.1905	0.1914	0.955
NN-10	α₂ = −0.8	0.0620	0.3960	0.4302	0.955
NN-10	β₁ = 0.5	−0.0089	0.2836	0.2904	0.954
NN-10	β₂ = −0.6	0.0594	0.3423	0.3423	0.944

References

1. Sun, J. The statistical analysis of interval-censored failure time data. New York: Springer; 2006.

2. Sun, J, Chen, D. Emerging topics in modeling interval-censored survival data. Switzerland: Springer Nature; 2022. https://doi.org/10.1007/978-3-031-12366-5.

3. Kalbfleisch, JD, Prentice, RL. The statistical analysis of failure time data. Hoboken: John Wiley & Sons; 2011.

4. Fine, J, Ying, Z, Wei, L. On the linear transformation model for censored data. Biometrika 1998;85:980–6. https://doi.org/10.1093/biomet/85.4.980.

5. Chen, K, Jin, Z, Ying, Z. Semiparametric analysis of transformation models with censored data. Biometrika 2002;89:659–68. https://doi.org/10.1093/biomet/89.3.659.

6. Zhang, Z, Zhao, Y. Empirical likelihood for linear transformation models with interval-censored failure time data. J Multivariate Anal 2013;116:398–409. https://doi.org/10.1016/j.jmva.2013.01.003.

7. Zeng, D, Mao, L, Lin, D. Maximum likelihood estimation for semiparametric transformation models with interval-censored data. Biometrika 2016;103:253–71. https://doi.org/10.1093/biomet/asw013.

8. Zhou, Q, Hu, T, Sun, J. A sieve semiparametric maximum likelihood approach for regression analysis of bivariate interval-censored failure time data. J Am Stat Assoc 2017;112:664–72. https://doi.org/10.1080/01621459.2016.1158113.

9. Weiner, MW, Veitch, DP, Aisen, PS, Beckett, LA, Cairns, NJ, Green, RC, et al. The Alzheimer’s disease neuroimaging initiative: a review of papers published since its inception. Alzheim Dement 2013;9:e111–e1 https://doi.org/10.1016/j.jalz.2013.05.1769.

10. Weiner, MW, Veitch, DP, Aisen, PS, Beckett, LA, Cairns, NJ, Cedarbaum, J, et al. Impact of the Alzheimer’s disease neuroimaging initiative, 2004 to 2014. Alzheim Dement 2015;11:865–84. https://doi.org/10.1016/j.jalz.2015.04.005.

11. Little, A, Rubin, DB. Statistical analysis with missing data. New York: Wiley; 2002. https://doi.org/10.1002/9781119013563.

12. Hsu, C-H, Yu, M. Cox regression analysis with missing covariates via nonparametric multiple imputation. Stat Methods Med Res 2019;28:1676–88. https://doi.org/10.1177/0962280218772592.

13. Lou, Y, Ma, Y, Du, M. A new and unified method for regression analysis of interval-censored failure time data under semiparametric transformation models with missing covariates. Stat Med 2024a;43:2062–82. https://doi.org/10.1002/sim.10035.

14. Wen, C-C, Lin, C-T. Analysis of current status data with missing covariates. Biometrics 2011;67:760–9. https://doi.org/10.1111/j.1541-0420.2010.01505.x.

15. Li, H, Zhang, H, Zhu, L, Li, N, Sun, J. Estimation of the additive hazards model with interval-censored data and missing covariates. Can J Stat 2020a;48:499–517. https://doi.org/10.1002/cjs.11544.

16. Zhou, R, Li, H, Sun, J, Tang, N. A new approach to estimation of the proportional hazards model based on interval-censored data with missing covariates. Lifetime Data Anal 2022;28:335–55. https://doi.org/10.1007/s10985-022-09550-y.

17. Long, Q, Hsu, C-H, Li, Y. Doubly robust nonparametric multiple imputation for ignorable missing data. Stat Sin 2012;22:149. https://doi.org/10.5705/ss.2010.069.

18. Turnbull, BW. The empirical distribution function with arbitrarily grouped, censored and truncated data. J Roy Stat Soc B 1976;38:290–5. https://doi.org/10.1111/j.2517-6161.1976.tb01597.x.

19. Rubin, DB. Inference and missing data. Biometrika 1976;63:581–92. https://doi.org/10.2307/2335739.

20. Little, RJ, Rubin, DB. Statistical analysis with missing data. Hoboken: John Wiley & Sons; 2019. https://doi.org/10.1002/9781119482260.

21. Allison, P. Missing data. Thousand Oaks, CA: Sage; 2001.

22. Du, M, Lou, Y, Sun, J. Estimation and variable selection for interval-censored failure time data with random change point and application to breast cancer study. J Am Stat Assoc 2025:1–12. https://doi.org/10.1080/01621459.2024.2441522.

23. White, IR, Royston, P. Imputing missing covariate values for the Cox model. Stat Med 2009;28:1982–98. https://doi.org/10.1002/sim.3618.

24. Heitjan, DF, Little, RJ. Multiple imputation for the fatal accident reporting system. J Roy Stat Soc Series C Appl Stat 1991;40:13–29. https://doi.org/10.2307/2347902.

25. Lou, Y, Ma, Y, Xiang, L, Sun, J. A multiple imputation approach for flexible modelling of interval-censored data with missing and censored covariates. Comput Stat Data Anal 2025:108177. https://doi.org/10.1016/j.csda.2025.108177.

26. Nielsen, SF. Proper and improper multiple imputation. Int Stat Rev 2003;71:593–607. https://doi.org/10.1111/j.1751-5823.2003.tb00214.x.

27. Rubin, DB. Multiple imputation for nonresponse in surveys. New York: John Wiley & Sons; 1987. https://doi.org/10.1002/9780470316696.

28. Huang, J, Rossini, A. Sieve estimation for the proportional-odds failure-time regression model with interval censoring. J Am Stat Assoc 1997;92:960–7. https://doi.org/10.2307/2965559.

29. Chen, D-GD, Sun, J, Peace, KE. Interval-censored time-to-event data: methods and applications. Boca Raton: CRC Press; 2012. https://doi.org/10.1201/b12290.

30. Li, K, Chan, W, Doody, RS, Quinn, J, Luo, S, Initiative, ADN, et al. Prediction of conversion to Alzheimer’s disease with longitudinal measures and time-to-event data. J Alzheim Dis 2017;58:361–71. https://doi.org/10.3233/jad-161201.

31. Li, S, Wu, Q, Sun, J. Penalized estimation of semiparametric transformation models with interval-censored data and application to Alzheimer’s disease. Stat Methods Med Res 2020b;29:2151–66. https://doi.org/10.1177/0962280219884720.

32. Leffondré, K, Touraine, C, Helmer, C, Joly, P. Interval-censored time-to-event and competing risk with death: is the illness-death model more accurate than the Cox model? Int J Epidemiol 2013;42:1177–86. https://doi.org/10.1093/ije/dyt126.

33. Anderson-Bergman, C. icenReg: regression models for interval censored data in R. J Stat Software 2017;81:1–23. https://doi.org/10.18637/jss.v081.i12.

34. Du, M, Zhou, Q. Analysis of informatively interval-censored case-cohort studies with application to HIV vaccine trials. Commun Math Stat 2023;13:195–215. https://doi.org/10.1007/s40304-022-00322-6.

35. Lou, Y, Sun, J, Wang, P. Semiparametric cure regression models with informative case k interval-censored failure time data. Stat Sin 2026;36:1–22. https://doi.org/10.5705/ss.202022.0343.

36. Du, M, Li, H, Sun, J. Regression analysis of censored data with nonignorable missing covariates and application to Alzheimer disease. Comput Stat Data Anal 2021;157:107157. https://doi.org/10.1016/j.csda.2020.107157.

Supplementary Material

This article contains supplementary material (https://doi.org/10.1515/ijb-2024-0016).

Received: 2024-02-14

Accepted: 2025-05-14

Published Online: 2025-08-29
