Abstract
In this paper, we study inference methods for regression analysis of clustered current status data with informative cluster sizes. When the correlated failure times of interest arise from a general class of semiparametric transformation frailty models, we develop a method based on nonparametric maximum likelihood estimation and implement it with an expectation-maximization (EM) algorithm. The asymptotic properties of the proposed estimators, including consistency and asymptotic normality, are established. Extensive simulation studies indicate that the proposed method works well. The approach is applied to a real-life data set from a tumorigenicity study.
Acknowledgments
The authors thank the Editor, the Associate Editor, and the two reviewers for their insightful comments and suggestions that greatly improved the article. We thank the Supercomputing Center of Wuhan University for computing support.
- Research ethics: Not applicable.
- Informed consent: Not applicable.
- Author contributions: The authors have accepted responsibility for the entire content of this manuscript and approved its submission.
- Use of Large Language Models, AI and Machine Learning Tools: None declared.
- Conflict of interest: The authors state no conflict of interest.
- Research funding: None declared.
- Data availability: Not applicable.
Appendix: Some supplementary derivations and proofs of the asymptotic properties
A Some supplementary derivations about the proposed EM algorithm
Denote by $\theta^{(k)}$ the estimate of $\theta$ obtained at the $k$-th iteration of the EM algorithm. Using the law of iterated expectations, we have
and by Bayes' theorem, we have
where $f(\cdot \mid \cdot)$ denotes a conditional probability density function. Thus, we obtain that
Similarly, we have
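As a rough numerical illustration of the two tools used in this section (iterated expectations and Bayes' theorem), the sketch below evaluates a posterior expectation of a function of a cluster-level frailty given the current status observations of one cluster. It is not the authors' algorithm. It assumes, purely for illustration, a normal frailty $b$ with standard deviation `sigma`, the proportional hazards special case of the transformation model, and Gauss-Hermite quadrature for the integrals. The function name `posterior_expectation` and all argument names are hypothetical.

```python
import numpy as np


def posterior_expectation(g, delta, cum_hazard, lin_pred, sigma, n_nodes=30):
    """Approximate E[g(b) | data of one cluster] by Bayes' theorem.

    Illustrative assumptions (not taken from the paper):
      * b ~ N(0, sigma^2) is a shared cluster-level frailty;
      * S_ij(t | b) = exp(-Lambda(t) * exp(lin_pred_ij + b)) (proportional hazards);
      * delta_ij = 1 if the failure occurred by the monitoring time C_ij, else 0;
      * cum_hazard[j] holds Lambda(C_ij) for the j-th cluster member.
    """
    # Gauss-Hermite nodes/weights for the weight function exp(-x^2).
    x, w = np.polynomial.hermite.hermgauss(n_nodes)
    # Change of variables so the nodes integrate against the N(0, sigma^2) density.
    b = np.sqrt(2.0) * sigma * x

    # Conditional survival at each node and each member: shape (n_nodes, n_members).
    surv = np.exp(-cum_hazard[None, :] * np.exp(lin_pred[None, :] + b[:, None]))
    # Current status likelihood given b: (1 - S)^delta * S^(1 - delta), multiplied over members.
    lik = np.prod(np.where(delta[None, :] == 1, 1.0 - surv, surv), axis=1)

    # Bayes' theorem: posterior weights are proportional to likelihood times prior.
    # The constant 1/sqrt(pi) cancels in the ratio below.
    post_w = w * lik
    return np.sum(post_w * g(b)) / np.sum(post_w)


# Example inputs for one cluster of three subjects (made-up numbers).
example_cum_hazard = np.array([0.5, 1.0, 1.5])
example_lin_pred = np.array([0.2, -0.1, 0.0])
example_delta = np.array([1, 0, 1])
posterior_mean_b = posterior_expectation(
    lambda b: b, example_delta, example_cum_hazard, example_lin_pred, sigma=1.0
)
```

With `g(b) = b` the function returns the posterior mean of the frailty. A cluster in which every subject has failed by its monitoring time pulls this mean above zero, and a cluster with no observed failures pulls it below zero, which is a quick sanity check on the weights.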
B Proofs of the asymptotic properties
Let
Lemma 1.
Under conditions (A4) and (A5), with probability one, we have
where $c$ is a generic constant that may vary from place to place and is independent of $\boldsymbol{\beta}$, $\gamma$ and $\Lambda$.
The proof of this lemma follows from arguments similar to those in Li et al. [8].
Now we are ready to prove the asymptotic properties of
Proof of Theorem 4.1:
To establish the consistency, we first define
where $f_1(t)$ is the Radon-Nikodym derivative of $E[I(C_{ij} < t)]$. Because
Define
and
where $BV_{\omega}[\tau_1, \tau_2]$ denotes functions which have total variation in $[\tau_1, \tau_2]$ bounded by a given constant $\omega$. Note that any function in
Note from conditions (A4) and (A5) that $\ell_w$ in (A.1) is bounded away from zero. Therefore,
So we obtain that
for a constant $c_2$. Combining Condition (A5) and inequality (A.2), we obtain that
Define $p(\boldsymbol{\beta}, \gamma, \Lambda) = \exp\{n_i \ell_{wi}(\boldsymbol{\beta}, \gamma, \Lambda)\}$, and let
which yields that
Next, we set $\delta_{ij} = 0$ for $j = 1, 2, \ldots, n_i$ and integrate $s$ from 0 to $t_j \in [\tau_1, \tau_2]$; for any $j = 1, \ldots, n_i$, we set $t_{j'} = 0$ if $j' \neq j$. Since $p(\boldsymbol{\beta}_0, \gamma_0, \Lambda_0) = p(\boldsymbol{\beta}^*, \gamma^*, \Lambda^*)$, we have
By the arguments in the proof of Theorem 1 in Elbers and Ridder [30], we find that $\gamma^* = \gamma_0$, and
Since both $\exp(\cdot)$ and $G(\cdot)$ are monotonically increasing functions, we have
We differentiate both sides of the above equation with respect to $t_j$ and take the logarithm to obtain
for $t_j \in [\tau_1, \tau_2]$ and $j = 1, \ldots, n_i$. Based on Condition (A3), we conclude that $\boldsymbol{\beta}_0 = \boldsymbol{\beta}^*$, $\gamma_0 = \gamma^*$ and $\Lambda_0 = \Lambda^*$. This completes the proof of consistency.
Proof of Theorem 4.2:
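Similar to Li et al. [8], to prove the asymptotic normality, it is sufficient to verify the four conditions stated in Theorem 2 of Murphy [31]. First, we consider parameter submodels $\boldsymbol{\beta}_{\varepsilon} = \boldsymbol{\beta} + \varepsilon \boldsymbol{h}_1$, $\gamma_{\varepsilon} = \gamma + \varepsilon h_2$ and
where $S_{n,\boldsymbol{\beta}}(\boldsymbol{\beta}, \gamma, \Lambda)(\boldsymbol{h}_1)$, $S_{n,\gamma}(\boldsymbol{\beta}, \gamma, \Lambda)(h_2)$ and $S_{n,\Lambda}(\boldsymbol{\beta}, \gamma, \Lambda)(h_3)$ are score functions along the submodels, which have the following specific expressions: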
and
where
with
As long as we can verify the four conditions stated in Theorem 2 of Murphy [31], the required asymptotic normality will hold. The first one that
Let
where
and
where $B_1$ is a $p \times (p + 1)$ constant matrix, and $B_2$ and $B_3$ are $(p + 1)$-vectors. The elements in matrices
It can be seen that the invertibility of
References
1. Sun, J. The statistical analysis of interval-censored failure time data. New York: Springer; 2006.
2. Huang, J. Efficient estimation for the proportional hazards model with interval censoring. Ann Stat 1996;24:540–68. https://doi.org/10.1214/aos/1032894452.
3. Wang, N, Wang, L, McMahan, CS. Regression analysis of bivariate current status data under the gamma-frailty proportional hazards model using the EM algorithm. Comput Stat Data Anal 2015;83:140–50. https://doi.org/10.1016/j.csda.2014.10.013.
4. Hu, T, Zhou, Q, Sun, J. Regression analysis of bivariate current status data under the proportional hazards model. Can J Stat 2017;45:410–24. https://doi.org/10.1002/cjs.11344.
5. Lin, D, Oakes, D, Ying, Z. Additive hazards regression with current status data. Biometrika 1998;85:289–98. https://doi.org/10.1093/biomet/85.2.289.
6. Kulich, M, Lin, DY. Additive hazards regression for case-cohort studies. Biometrika 2000;87:73–87. https://doi.org/10.1093/biomet/87.1.73.
7. Feng, Y, Sun, J, Ma, L. Regression analysis of current status data under the additive hazards model with auxiliary covariates. Scand J Stat 2015;42:118–36. https://doi.org/10.1111/sjos.12098.
8. Li, H, Zhang, H, Sun, J. Estimation of the additive hazards model with current status data in the presence of informative censoring. Stat Interface 2019;12:321–30. https://doi.org/10.4310/sii.2019.v12.n2.a12.
9. Cong, X, Yin, G, Shen, Y. Marginal analysis of correlated failure time data with informative cluster sizes. Biometrics 2007;63:663–72. https://doi.org/10.1111/j.1541-0420.2006.00730.x.
10. Chen, L, Feng, Y, Sun, J. Regression analysis of clustered failure time data with informative cluster size under the additive transformation models. Lifetime Data Anal 2017;23:651–70. https://doi.org/10.1007/s10985-016-9384-x.
11. Feng, Y, Lin, S, Li, Y. Semiparametric regression of clustered current status data. J Appl Stat 2019;46:1724–37. https://doi.org/10.1080/02664763.2018.1564022.
12. Feng, Y, Prasangika, KD, Zuo, G. Regression analysis of multivariate current status data under a varying coefficients additive hazards frailty model. Can J Stat 2023;51:216–34. https://doi.org/10.1002/cjs.11689.
13. Li, J, Ma, S. Interval-censored data with repeated measurements and a cured subgroup. J R Stat Soc C-Appl 2010;59:693–705. https://doi.org/10.1111/j.1467-9876.2009.00702.x.
14. Shao, F, Li, J, Ma, S, Lee, M-LT. Semiparametric varying-coefficient model for interval censored data with a cured proportion. Stat Med 2014;33:1700–12. https://doi.org/10.1002/sim.6054.
15. Rosner, B, Bay, C, Glynn, RJ, Ying, G, Maguire, MG, Lee, MLT. Estimation and testing for clustered interval-censored bivariate survival data with application using the semi-parametric version of the Clayton-Oakes model. Lifetime Data Anal 2023;29:854–87. https://doi.org/10.1007/s10985-022-09588-y.
16. Finkelstein, DM. A proportional hazards model for interval-censored failure time data. Biometrics 1986;42:845–54. https://doi.org/10.2307/2530698.
17. Pan, W. A multiple imputation approach to Cox regression with interval-censored data. Biometrics 2000;56:199–203. https://doi.org/10.1111/j.0006-341x.2000.00199.x.
18. Sun, J, Feng, Y, Zhao, H. Simple estimation procedures for regression analysis of interval-censored failure time data under the proportional hazards model. Lifetime Data Anal 2015;21:138–55. https://doi.org/10.1007/s10985-013-9282-4.
19. Wen, C, Chen, Y. Nonparametric maximum likelihood analysis of clustered current status data with the gamma-frailty Cox model. Comput Stat Data Anal 2011;55:1053–60. https://doi.org/10.1016/j.csda.2010.08.013.
20. Su, Y, Wang, J. Semiparametric efficient estimation for shared-frailty models with doubly censored clustered data. Ann Stat 2016;44:1298–331. https://doi.org/10.1214/15-aos1406.
21. Li, S, Hu, T, Zhao, S, Sun, J. Regression analysis of multivariate current status data with semiparametric transformation frailty models. Stat Sin 2020;30:1117–34. https://doi.org/10.5705/ss.202017.0156.
22. Lou, Y, Wang, P, Sun, J. A semi-parametric weighted likelihood approach for regression analysis of bivariate interval-censored outcomes from case-cohort studies. Lifetime Data Anal 2023;29:628–53. https://doi.org/10.1007/s10985-023-09593-9.
23. Chen, K, Jin, Z, Ying, Z. Semiparametric analysis of transformation models with censored data. Biometrika 2002;89:659–68. https://doi.org/10.1093/biomet/89.3.659.
24. Zeng, D, Mao, L, Lin, DY. Maximum likelihood estimation for semiparametric transformation models with interval-censored data. Biometrika 2016;103:253–71. https://doi.org/10.1093/biomet/asw013.
25. Williamson, JM, Datta, S, Satten, GA. Marginal analyses of clustered data when cluster size is informative. Biometrics 2003;59:36–42. https://doi.org/10.1111/1541-0420.00005.
26. Nelson, KP, Lipsitz, SR, Fitzmaurice, GM, Ibrahim, J, Parzen, M, Strawderman, R. Use of the probability integral transformation to fit nonlinear mixed-effects models with nonnormal random effects. J Comput Graph Stat 2006;15:39–57. https://doi.org/10.1198/106186006x96854.
27. National Toxicology Program. Toxicology and carcinogenesis studies of chloroprene (CAS No. 126-99-8) in F344/N rats and B6C3F1 mice (inhalation studies). Technical Report 467. Bethesda, Maryland: U.S. Department of Health and Human Services, Public Health Service, National Institutes of Health; 1998.
28. Wang, L, Sun, J, Tong, X. Efficient estimation for the proportional hazards model with bivariate current status data. Lifetime Data Anal 2008;14:134–53. https://doi.org/10.1007/s10985-007-9058-9.
29. van der Vaart, AW, Wellner, JA. Weak convergence and empirical processes. New York: Springer; 1996. https://doi.org/10.1007/978-1-4757-2545-2.
30. Elbers, C, Ridder, G. True and spurious duration dependence: the identifiability of the proportional hazard model. Rev Econ Stud 1982;49:403–9. https://doi.org/10.2307/2297364.
31. Murphy, SA. Asymptotic theory for the frailty model. Ann Stat 1995;23:182–98. https://doi.org/10.1214/aos/1176324462.
32. Bickel, PJ, Klaassen, CAJ, Ritov, Y, Wellner, JA. Efficient and adaptive estimation for semiparametric models. Baltimore, MD: Johns Hopkins University Press; 1993.
33. Rudin, W. Functional analysis. New York, NY: McGraw-Hill; 1973.
© 2025 Walter de Gruyter GmbH, Berlin/Boston
Articles in the same Issue
- Frontmatter
- Research Articles
- Prognostic adjustment with efficient estimators to unbiasedly leverage historical data in randomized trials
- Homogeneity test and sample size of response rates for AC 1 in a stratified evaluation design
- A review of survival stacking: a method to cast survival regression analysis as a classification problem
- DsubCox: a fast subsampling algorithm for Cox model with distributed and massive survival data
- A hybrid hazard-based model using two-piece distributions
- Regression analysis of clustered current status data with informative cluster size under a transformed survival model
- Bayesian covariance regression in functional data analysis with applications to functional brain imaging
- Risk estimation and boundary detection in Bayesian disease mapping
- An improved estimator of the logarithmic odds ratio for small sample sizes using a Bayesian approach
- Short Communication
- A multivariate Bayesian learning approach for improved detection of doping in athletes using urinary steroid profiles
- Research Articles
- Guidance on individualized treatment rule estimation in high dimensions
- Weighted Euclidean balancing for a matrix exposure in estimating causal effect
- Penalized regression splines in Mixture Density Networks