
Semiparametric Regression Estimation for Recurrent Event Data with Errors in Covariates under Informative Censoring

  • Hsiang Yu, Yu-Jen Cheng and Ching-Yun Wang
Published/Copyright: August 9, 2016

Abstract

Recurrent event data arise frequently in longitudinal follow-up studies, and evaluating covariate effects on the rates of occurrence of such events is commonly of interest. Examples include repeated hospitalizations, recurrent infections of HIV, and tumor recurrences. In this article, we consider semiparametric regression methods for the occurrence rate function of recurrent events when the covariates may be measured with errors. In contrast to existing work, the conventional assumption of independent censoring is violated in our setting because the recurrent event process is interrupted by correlated events, which is called informative drop-out. To accommodate both informative censoring and measurement error, the occurrence of recurrent events is modelled through an unspecified frailty distribution and accompanied by a classical measurement error model. We propose two corrected approaches based on different ideas, and we show that they are numerically identical when estimating the regression parameters. The asymptotic properties of the proposed estimators are established, and the finite sample performance is examined via simulations. The proposed methods are applied to the Nutritional Prevention of Cancer trial to assess the effect of the plasma selenium treatment on the recurrence of squamous cell carcinoma.

1 Introduction

In many longitudinal follow-up studies, recurrent event data are collected when subjects experience an event multiple times. For example, patients with superficial bladder cancer may experience tumor recurrence many times; patients with cystic fibrosis may experience repeated lung exacerbations; patients with chronic granulomatous disease may experience repeated pyogenic infections [1, 2]. Models for recurrent event data can be categorized into two classes: time-to-event models and gap time models. In time-to-event models, interest focuses on the occurrence rate of an event over time [3–5]. In gap time models, interest lies in the gap time between two consecutive events [6].

In this study, we focus on the time-to-event models. The time-to-event models may be constructed on the basis of an intensity function [7] or a rate function [3, 8]. The intensity function uniquely determines the probability structure of the recurrent event process. However, it requires that the occurrence of an event given the prior event history be specified correctly. On the other hand, the rate function allows for arbitrary dependence among the recurrent events and provides a direct interpretation of the occurrence rate without conditioning on the prior event history. Our primary focus is to assess the average effects of treatments or risk factors; that is, we are mainly interested in inference on the rate function. Lawless and Nadeau [9] estimated the cumulative rate function nonparametrically, and applied their approach to industrial warranty data. In addition, Hu and Lagakos [8] proposed a nonparametric method to study the rate function of the viral load changing process for HIV infected patients. Nevertheless, all of the above approaches need to assume non-informative censoring, or that the observation mechanism is independent of the recurrent event process. In practice, this assumption is often violated, for example when the recurrent event process is interrupted by terminal events that are related to the recurrent events. A potential remedy is to consider a frailty model, which allows dependence between the recurrent event process and the informative drop-out through a non-negative frailty variable. In general, the distribution of the frailty variable is assumed to be known [10] and thus the likelihood-based approach [11] is preferred. More recently, Kalbfleisch et al. [12] proposed a weighted estimating equation approach with the weight specified by a gamma frailty distribution. However, it is generally not easy to verify the frailty distribution because the frailty variables are unobserved. To avoid specification of the frailty distribution, Wang et al. [13] and Wang and Huang [14] considered a conditional likelihood approach, where the unobserved frailty variables are “conditioned away” in their proposed estimating equations.

The aforementioned approaches, nevertheless, require that the covariates are correctly measured. In many epidemiologic or medical studies, the covariates may suffer from measurement errors. For example, baseline plasma selenium level is an important predictor for the occurrence of skin cancers in the Nutritional Prevention of Cancer (NPC) trial study [15]. However, the true value of plasma selenium level can never be measured because of intrinsic biological variability or limited instrumental precision. Instead, the values we observe are contaminated with measurement errors. The most convenient approach is to treat the observed covariates as the true covariates in the regular estimating procedure, which is also referred to as the naive approach. However, the naive estimator obtained from this approach is generally known to be inconsistent ([16], Chapter 3). In survival and longitudinal data analysis, extensive research has been done to deal with measurement error problems. For Cox regression, Prentice [17] proposed likelihood approaches with normal measurement error and rare disease assumptions. Wang et al. [18] applied regression calibration to the partial score function, and investigated the performance of the regression calibration estimator through simulation studies, whereas Nakamura [19] constructed unbiased estimating equations based on the concept of corrected scores. For nonlinear mixed models, Wu [20], Liu and Wu [21] and Wu et al. [22] proposed estimating approaches for longitudinal response data when the covariates are measured with errors, which can also handle censoring in the response and missing data. In recurrent event analysis, nonetheless, little has been done to address measurement error problems. Under a normal measurement error assumption, Jiang et al. [23] proposed a moment corrected method to adjust for the bias of a naive estimator under a semi-parametric model. However, their approach not only requires the assumption of non-informative censoring but also assumes that the censoring distribution is independent of the covariates.

The present study is motivated by the NPC trial study, which aimed to assess the efficacy of oral supplement of plasma selenium in preventing the development of skin cancers such as squamous cell carcinoma (SCC). This clinical trial began in 1983 and included approximately 1,300 patients with dermatologic cancer histories. The patients were randomly assigned, in roughly equal numbers, to the placebo group or the treatment group. Patients in the treatment arm were supposed to take 200 μg of plasma selenium supplement per day. During the study period, the patients in the trial might experience SCC events repeatedly. Each incidence of a new SCC was diagnosed and recorded by certified doctors. The medical records were reviewed by the clinical coordinators at the semi-annual visit, at the annual contact, or by self-report to ensure the completeness of the data. At the time of randomization, many prognostic risk factors of SCC were recorded, including the baseline plasma selenium level. As mentioned above, the plasma selenium level may be measured with errors. In the original study, Clark et al. [15] did not take measurement error into account and found a nonsignificant negative plasma selenium effect on developing SCC. The result contradicted the evidence of previous studies, which showed high correlation between plasma selenium level and several kinds of cancer. Later, many studies focused on the effect of plasma selenium level on the recurrences of SCC under an independent censoring assumption, some of which also took measurement error into account [23]. However, we found a significant negative relationship between the censoring time and the SCC occurrence rate. This implies that the independent censoring assumption is not satisfied. Therefore, the existing methods are not appropriate for the NPC trial data.

This paper is organized as follows. In Section 2, statistical models for recurrent events and measurement errors are given. In Section 3, we propose a regression calibration method and a moment corrected method to correct for the measurement errors in the presence of informative censoring. Simulation results are given in Section 4 to investigate the finite sample performance of the proposed methods. In Section 5, we apply the proposed methods to the NPC trial data to evaluate the effect of selenium on the recurrence of SCC, and we conclude with a discussion in Section 6. The regularity conditions and technical proofs are provided in the Appendix and the Supplementary Information.

2 Model illustration

2.1 Recurrent event model

Assume that there are $n$ independent individuals in the cohort. Let subscript $i$ be the index for a subject, $i=1,\ldots,n$. For the $i$th subject, let $N_i(t)$ denote the number of recurrent events occurring up to $t$ within a fixed time period $[0,\tau]$, where the recurrent event process could be observed beyond $\tau$. Let $Z_i$ be a $q\times 1$ vector of covariates that is precisely measured and $X_i$ be a $p\times 1$ vector of covariates that can be measured with errors. Let $E$ denote expectation over the samples, $\nu_i$ be the unobserved frailty variable with mean $E(\nu_i\mid X_i,Z_i)=\mu_\nu$, which does not depend on $(X_i,Z_i)$, and $C_i$ be the informative censoring time, $i=1,\ldots,n$. Suppose that conditional on $(\nu_i,X_i,Z_i)$, $N_i(t)$ follows a Poisson process with a multiplicative intensity function

$$\lambda(t\mid\nu_i,X_i,Z_i)=\nu_i\,\lambda_0(t)\,e^{\beta_X^\top X_i+\beta_Z^\top Z_i}, \tag{1}$$

where λ0(t) is a baseline function and (βX,βZ) is a vector of regression parameters. Note that when ν is given, model (1) is also a rate function due to the assumption of the Poisson process. In general, regression parameters can be estimated by either a likelihood-based approach or by solving a set of unbiased estimating equations. If the distribution of the frailty variable, ν, is assumed and the true covariates can be observed, then the standard procedure of the likelihood-based approaches can be conducted by integrating ν out ([24], Chapter 3). There are several popular choices for the frailty distribution such as gamma, log-normal, and positive stable distribution. Balakrishnan and Peng [25] advocated using the generalized gamma distribution as the frailty distribution since it includes many distributions (e. g., Weibull, log-normal, gamma, positive stable distribution) as special cases. Recently, Mazroui et al. [26] and Zeng et al. [27] proposed a joint frailty model with two independent frailty variables to distinguish the dependence within the recurrent events and the association between the recurrent event process and terminal events. However, the determination of the frailty distribution usually depends on computational convenience instead of biological reasons or data characteristics. Further, Balakrishnan and Peng [25] pointed out that an inappropriate frailty distribution may result in large bias in the estimation.

Alternatively, we can construct a set of unbiased estimating equations based on the cumulative rate function. According to model (1), the cumulative rate function up to time t is

$$E(N_i(t)\mid X_i,Z_i)=E\{E(N_i(t)\mid\nu_i,X_i,Z_i)\mid X_i,Z_i\}=\Lambda_0(t)\,e^{\alpha_0+\beta_X^\top X_i+\beta_Z^\top Z_i},\quad t\in[0,\tau], \tag{2}$$

where $\Lambda_0(t)=\int_0^t\lambda_0(u)\,du$ and $\alpha_0=\log(\mu_\nu)$. It should be noted that an advantage of using estimating equations over a likelihood-based approach is that misspecification of the frailty distribution is avoided. However, to solve estimating equations based on eq. (2), $\Lambda_0(t)$ needs to be known and the true covariates need to be observed. Both deficiencies motivate us to consider the recurrent event process with an unspecified distribution of the frailty variable and an unknown $\Lambda_0(t)$ in this article.

2.2 Measurement error model

For subject i, let Wij be the jth replicated surrogate measurement of the true covariate vector Xi, and ki be the number of the replicates of Wi. Assume that the surrogate measurement satisfies the classical measurement error model,

$$W_{ij}=X_i+U_{ij},\quad i=1,\ldots,n,\quad j=1,\ldots,k_i,$$

where $U_{ij}$ are random errors. Suppose that $U_{ij}$ are independent of $(\nu_i,X_i,Z_i)$ and $C_i$, which implies that the measurement errors are non-differential. In other words, $W_i$ provides no additional information about the event process when the true covariate $X_i$ is given ([16], Chapter 2). Let $\mu_s$ and $\Sigma_s$ be the mean and covariance matrix of a random vector $s$, $\Sigma_{sh}$ be the covariance matrix of two random vectors $(s,h)$, and $\gamma=(\mu_X,\mu_Z,\Sigma_U,\Sigma_X,\Sigma_Z,\Sigma_{XZ})$ be the parameter of the distribution of $X$ given $(W,Z)$. We assume that $X$ given $(W,Z)$ follows a multivariate normal distribution with mean

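$$E(X\mid W,Z,\gamma)=\mu_X+\begin{pmatrix}\Sigma_X & \Sigma_{XZ}\end{pmatrix}\begin{pmatrix}\Sigma_X+\Sigma_U/k & \Sigma_{XZ}\\ \Sigma_{ZX} & \Sigma_Z\end{pmatrix}^{-1}\begin{pmatrix}W-\mu_X\\ Z-\mu_Z\end{pmatrix}$$

and variance

$$\Sigma(\gamma)=\Sigma_X-\begin{pmatrix}\Sigma_X & \Sigma_{XZ}\end{pmatrix}\begin{pmatrix}\Sigma_X+\Sigma_U/k & \Sigma_{XZ}\\ \Sigma_{ZX} & \Sigma_Z\end{pmatrix}^{-1}\begin{pmatrix}\Sigma_X\\ \Sigma_{ZX}\end{pmatrix}.$$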

As in Carroll et al. ([16], Chapter 4), the formula given above is the best linear approximation of E(X|W,Z,γ), and it can also be applied when Z is discrete.
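The conditional mean and variance above are ordinary best-linear-predictor formulas, so they can be evaluated with a few lines of linear algebra. The following is a minimal NumPy sketch, not the authors' implementation; the function name `conditional_moments` and its argument names are hypothetical, `w_bar` is taken to be the average of the `k` replicates of a subject, and all moments in $\gamma$ are assumed to be already estimated.

```python
import numpy as np

def conditional_moments(w_bar, z, mu_x, mu_z, sig_x, sig_z, sig_xz, sig_u, k):
    """Return E(X | W, Z, gamma) and Sigma(gamma) for one subject.

    w_bar, mu_x : length-p vectors; z, mu_z : length-q vectors
    sig_x, sig_u: p x p; sig_z: q x q; sig_xz: p x q; k: number of replicates
    (Illustrative helper; names are assumptions, not from the paper.)
    """
    # Covariance matrix of (W, Z): the 2 x 2 block matrix in the formulas.
    joint = np.vstack([
        np.hstack([sig_x + sig_u / k, sig_xz]),
        np.hstack([sig_xz.T, sig_z]),
    ])
    # Covariance of X with (W, Z): the row block (Sigma_X, Sigma_XZ).
    cross = np.hstack([sig_x, sig_xz])
    # gain = cross @ inv(joint); joint is symmetric, so solve and transpose.
    gain = np.linalg.solve(joint, cross.T).T
    resid = np.concatenate([w_bar - mu_x, z - mu_z])
    mean = mu_x + gain @ resid
    cov = sig_x - gain @ np.vstack([sig_x, sig_xz.T])
    return mean, cov
```

With a scalar $X$ that is uncorrelated with $Z$, the gain reduces to $\sigma_X^2/(\sigma_X^2+\sigma_U^2/k)$, which is a convenient sanity check.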

3 Correction for errors-in-variable

Assume the observed data $\{(C_i,(T_{i1},\ldots,T_{im_i}),\{W_{i1},\ldots,W_{ik_i}\},Z_i),\ i=1,\ldots,n\}$ are independent and identically distributed (iid), where $T_{ij}$ denotes the observed event times for $j=1,\ldots,m_i$, and $m_i$ denotes the number of recurrent events that occurred before $C_i$ for subject $i$. As we mentioned in Section 2.1, $C$ is conditionally independent of the recurrent event process $N(t)$ given $(\nu,X,Z)$. Then, by eq. (2) we have

$$E\{N(C)\Lambda_0^{-1}(C)\mid X,Z\}=E\big[E\{N(C)\Lambda_0^{-1}(C)\mid\nu,C,X,Z\}\mid X,Z\big]=e^{\alpha_0+\beta_X^\top X+\beta_Z^\top Z}.$$

If $\Lambda_0(t)$ and $X$ are known, the estimating equations $\sum_{i=1}^n(1,X_i^\top,Z_i^\top)^\top\{m_i\Lambda_0^{-1}(C_i)-e^{\alpha_0+\beta_X^\top X_i+\beta_Z^\top Z_i}\}=0$ for $(\beta_X,\beta_Z)$ are unbiased. In practice, they cannot be implemented since $X_i$ is unobserved and $\Lambda_0(t)$ is unknown. To deal with the unknown function $\Lambda_0(t)$, we start with the conditional likelihood function of $(T_{i1},\ldots,T_{im_i})$ given $(C_i,\nu_i,m_i,X_i,Z_i)$. Under the Poisson process assumption, such a conditional likelihood can be constructed from a set of iid random variables with truncated density $\prod_{j=1}^{m_i}\lambda_0(T_{ij})/\Lambda_0(C_i)\,I(0\le T_{ij}\le C_i)$. Define a rescaled baseline function $\phi(t)\equiv\lambda_0(t)/\Lambda_0(\tau)$ and $\Phi(t)=\int_0^t\phi(u)\,du=\Lambda_0(t)/\Lambda_0(\tau)$ for $t\in[0,\tau]$, where $\Phi(\tau)=1$. The conditional likelihood is given by $\prod_{i=1}^n P(T_{i1},\ldots,T_{im_i}\mid C_i,\nu_i,m_i,X_i,Z_i)$, which is proportional to $\prod_{i=1}^n\prod_{j=1}^{m_i}\phi(T_{ij})/\Phi(C_i)$. As pointed out by Wang et al. [13], the conditional likelihood shares the same form as the nonparametric likelihood for right-truncated data. Thus, $\Phi(t)$ can be consistently estimated by the product limit estimator

$$\hat\Phi(t)=\prod_{T_{(l)}>t}\left(1-\frac{n_{(l)}}{N_{(l)}}\right),$$

where $\{T_{(l)}\}$ are the ordered and distinct values of $\{T_{ij}\}_{i=1,\ldots,n;\,j=1,\ldots,m_i}$, $n_{(l)}$ is the number of events occurring at $T_{(l)}$, and $N_{(l)}$ is the number of events that satisfy $T_{ij}\le T_{(l)}\le C_i$. Note that the non-parametric estimation of $\Phi$ does not require any information from the covariates or the unobserved frailty variable. Hence, $\hat\Phi(t)$ is a consistent estimator even if $X$ is measured with errors or the frailty distribution is unspecified.
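The product limit estimator is straightforward to compute directly from its definition. The sketch below is illustrative only: the function name `phi_hat` and the input layout (one array of event times per subject plus a vector of censoring times) are assumptions, every event is assumed to satisfy $T_{ij}\le C_i$, and at least one subject list is assumed to be supplied.

```python
import numpy as np

def phi_hat(event_times, censor_times, t):
    """Product-limit estimate of Phi(t) = Lambda0(t) / Lambda0(tau).

    event_times  : list with one array of observed event times per subject
    censor_times : censoring time C_i for each subject
    """
    # Flatten to one record per event, carrying the owner's censoring time.
    times = np.concatenate([np.asarray(e, dtype=float) for e in event_times])
    cens = np.concatenate(
        [np.full(len(e), c, dtype=float) for e, c in zip(event_times, censor_times)]
    )
    estimate = 1.0
    for s in np.unique(times):  # ordered distinct event times T_(l)
        if s > t:
            n_l = np.sum(times == s)                      # events exactly at T_(l)
            big_n_l = np.sum((times <= s) & (s <= cens))  # T_ij <= T_(l) <= C_i
            estimate *= 1.0 - n_l / big_n_l
    return float(estimate)
```

For example, with three subjects whose event times are `[1, 3]`, `[2]` and `[]` and whose censoring times are `4`, `5` and `2`, the factors for $T_{(l)}=2$ and $T_{(l)}=3$ are $1/2$ and $2/3$, so the estimate at $t=1$ is $1/3$. Subjects without events contribute nothing, which mirrors the remark that only the event times and their censoring times are needed.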

For the issue of identifiability, let μν=1 without loss of generality. The expectation of the event number divided by the rescaled baseline function before time C is

$$E\{N(C)\Phi^{-1}(C)\mid X,Z\}=E\big[E\{N(C)\Phi^{-1}(C)\mid C,\nu,X,Z\}\mid X,Z\big]=e^{\beta_0+\beta_X^\top X+\beta_Z^\top Z},$$

where $\beta_0=\log(\Lambda_0(\tau))$. With the above equation, we can construct the unbiased estimating equations using $\Phi(t)$ instead of the unknown $\Lambda_0(t)$. After replacing the unknown $X$ with the average of the replicates $W_i=\sum_{j=1}^{k_i}W_{ij}/k_i$, we can obtain the naive estimating equations

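$$U_N(b)=n^{-1}\sum_{i=1}^n\begin{pmatrix}1\\ W_i\\ Z_i\end{pmatrix}\left\{m_i\hat\Phi^{-1}(C_i)-e^{b_0+b_X^\top W_i+b_Z^\top Z_i}\right\}=0. \tag{3}$$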

Then, the naive estimator $\hat\beta_N=(\hat\beta_{N,0},\hat\beta_{N,X},\hat\beta_{N,Z})$ is obtained by solving eq. (3), and $\Lambda_0(t)$ can be estimated by $\hat\Lambda_0^N(t)=\hat\Phi(t)\exp(\hat\beta_{N,0})$. Due to the measurement error in $W$, it can be shown that $\hat\beta_N$ does not converge to the true parameter $\beta$, where $\beta=(\beta_0,\beta_X,\beta_Z)$. Based on eq. (3), we develop a regression calibration method and a moment corrected method to adjust for covariate measurement errors in the following subsections.
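Equation (3) has the same form as a log-linear (Poisson-type) score equation with response $m_i\hat\Phi^{-1}(C_i)$, so a Newton-Raphson iteration is one natural way to solve it. The sketch below is an assumption about how one might do this, not the authors' code; `solve_rate_equations` is a hypothetical name, `design` holds the rows $(1,W_i^\top,Z_i^\top)$, and `y` holds $m_i/\hat\Phi(C_i)$.

```python
import numpy as np

def solve_rate_equations(design, y, n_iter=50, tol=1e-10):
    """Solve n^{-1} sum_i v_i {y_i - exp(v_i' b)} = 0 by Newton-Raphson.

    design : n x d matrix whose rows are v_i = (1, W_i, Z_i)
    y      : length-n vector of responses m_i / Phi_hat(C_i)
    """
    n, d = design.shape
    b = np.zeros(d)
    b[0] = np.log(max(y.mean(), 1e-8))  # start the intercept at log of mean response
    for _ in range(n_iter):
        mu = np.exp(design @ b)
        score = design.T @ (y - mu) / n
        # Negative Jacobian of the score: n^{-1} sum_i mu_i v_i v_i'
        info = (design * mu[:, None]).T @ design / n
        step = np.linalg.solve(info, score)
        b = b + step
        if np.max(np.abs(step)) < tol:
            break
    return b
```

Passing $E(X_i\mid W_i,Z_i,\hat\gamma)$ in place of $W_i$ in the design matrix gives the same equations with calibrated covariates. No step-size control is included here, so a safeguarded solver may be preferable for badly scaled data.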

3.1 Regression calibration approach

The regression calibration (RC) method is based on the assumption that the induced model of the response conditional on $(W,Z)$ can be well approximated by the underlying model with $X$ replaced by the conditional mean $E(X\mid W,Z)$. The RC estimator is obtained by treating $E(X\mid W,Z)$ as the true covariate $X$ in the standard estimating procedure ([16], Chapter 4). Although the RC method generally yields an inconsistent estimator in non-linear models, it is still valuable because of its computational efficiency and its limited bias under some conditions [16, 17].

Under our framework, the RC method substitutes $W$ with $E(X\mid W,Z,\gamma)$ in eq. (3). If the measurement error covariance matrix $\Sigma_U$ is known, we can estimate the other components of $\gamma$ by using the observed data without replicates. If not, replicated data are needed to estimate $\Sigma_U$ [16, 18, 28]. By the method of moments, the estimator $\hat\gamma$ of $\gamma$ can be obtained by solving the equations $n^{-1}\sum_{i=1}^n\Psi_i(\gamma)=0$, where $\Psi_i(\gamma)$ is given in Appendix A. Then, the RC estimator $\hat\beta_R=(\hat\beta_{R,0},\hat\beta_{R,X},\hat\beta_{R,Z})$ can be obtained by solving the equations

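$$U_R(b)=n^{-1}\sum_{i=1}^n\begin{pmatrix}1\\ E(X_i\mid W_i,Z_i,\hat\gamma)\\ Z_i\end{pmatrix}\left\{m_i\hat\Phi^{-1}(C_i)-e^{b_0+b_X^\top E(X_i\mid W_i,Z_i,\hat\gamma)+b_Z^\top Z_i}\right\}=0. \tag{4}$$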

Coincidentally, the conditional expectation of $m\Phi^{-1}(C)$ given the observed covariate $(W,Z)$ is $\exp(\beta_0+\beta_X^\top\Sigma(\gamma)\beta_X/2+\beta_X^\top E(X\mid W,Z,\gamma)+\beta_Z^\top Z)$. Thus, the RC estimator $\hat\beta_R$ converges to a limit $\beta_R=(\beta_0+\beta_X^\top\Sigma(\gamma)\beta_X/2,\beta_X,\beta_Z)$. The result implies that the RC estimator is consistent for the regression coefficients but not for the intercept.
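The form of this limit follows from the moment generating function of the normal distribution. A short derivation, assuming as in Section 2.2 that $X$ given $(W,Z)$ is multivariate normal with mean $E(X\mid W,Z,\gamma)$ and variance $\Sigma(\gamma)$, and using the non-differential error assumption for the first equality:

```latex
\begin{align*}
E\{m\Phi^{-1}(C)\mid W,Z\}
  &= E\big[E\{m\Phi^{-1}(C)\mid X,Z\}\mid W,Z\big] \\
  &= e^{\beta_0+\beta_Z^\top Z}\,E\big(e^{\beta_X^\top X}\mid W,Z\big) \\
  &= \exp\big\{\beta_0+\beta_X^\top E(X\mid W,Z,\gamma)
       +\tfrac12\,\beta_X^\top\Sigma(\gamma)\beta_X+\beta_Z^\top Z\big\}.
\end{align*}
```

The extra term $\beta_X^\top\Sigma(\gamma)\beta_X/2$ does not depend on $(W,Z)$, which is why it is absorbed into the intercept and leaves the slopes unaffected.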

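Note that $\hat\beta_{R,0}$ converges to $\beta_0+\beta_X^\top\Sigma(\gamma)\beta_X/2$. Let $\hat\Sigma$ be the estimator of $\Sigma(\gamma)$, which is calculated as $\hat\Sigma_X-(\hat\Sigma_X-\hat\Sigma_{XZ}\hat\Sigma_Z^{-1}\hat\Sigma_{ZX})(\hat\Sigma_W-\hat\Sigma_{XZ}\hat\Sigma_Z^{-1}\hat\Sigma_{ZX})^{-1}(\hat\Sigma_X-\hat\Sigma_{XZ}\hat\Sigma_Z^{-1}\hat\Sigma_{ZX})-\hat\Sigma_{XZ}\hat\Sigma_Z^{-1}\hat\Sigma_{ZX}$, where $\hat\Sigma_W=\hat\Sigma_X+\hat\Sigma_U\sum_{i=1}^n(nk_i)^{-1}$. The RC estimator of $\Lambda_0(t)$ can be adjusted as $\hat\Lambda_0^R(t)=\hat\Phi(t)\exp(\hat\beta_{R,0}-\hat\beta_{R,X}^\top\hat\Sigma\hat\beta_{R,X}/2)$, which converges to $\Lambda_0(t)$. In the Supplementary Information, we show that $\sqrt{n}(\hat\beta_R-\beta_R)$ is asymptotically normally distributed with mean zero and variance $A^{-1}\Sigma_g\{A^{-1}\}^\top$, where $A$ and $\Sigma_g$ are defined in Proposition 1 in Appendix A. The covariance matrix estimation of the RC estimator is given in Appendix B.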

3.2 Moment corrected approach

The moment corrected (MC) method is motivated by the bias-correction method proposed by Stefanski [29]. Under the classical measurement error model, Stefanski [29] showed that the naive estimator converges to a limit which is a function of the true parameter and the error variance. Accordingly, the bias of the naive estimator can be corrected based on the relationship between the limit of the naive estimator and the true parameter.

Based on this idea, we can show that the naive estimator βˆN converges to a limit βN=(βN,0,βN,X,βN,Z) which satisfies

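$$E\{U_N(\beta_N)\mid W,Z\}=E\left[\begin{pmatrix}1\\ W\\ Z\end{pmatrix}\left\{E(m\Phi^{-1}(C)\mid W,Z)-e^{\beta_{N,0}+\beta_{N,X}^\top W+\beta_{N,Z}^\top Z}\right\}\right]=0. \tag{5}$$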

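In the Supplementary Information, we show that the root of eq. (5) is unique. As described in Section 2.2, we assume that $X$ given $(W,Z)$ follows a multivariate normal distribution. For convenience of derivation, we re-parametrize the conditional mean as $E(X\mid W,Z,\gamma)=\eta_0+\eta_W W+\eta_Z Z$, where $I_p$ denotes an identity matrix of size $p$, $\eta_0=(I_p-\eta_W)\mu_X-\eta_Z\mu_Z$, $\eta_W=(\Sigma_X-\Sigma_{XZ}\Sigma_Z^{-1}\Sigma_{ZX})(\Sigma_W-\Sigma_{XZ}\Sigma_Z^{-1}\Sigma_{ZX})^{-1}$, and $\eta_Z=\{I_p-(\Sigma_X-\Sigma_{XZ}\Sigma_Z^{-1}\Sigma_{ZX})(\Sigma_W-\Sigma_{XZ}\Sigma_Z^{-1}\Sigma_{ZX})^{-1}\}\Sigma_{XZ}\Sigma_Z^{-1}$. By the non-differential error assumption, it follows that $E(m\Phi^{-1}(C)\mid W,Z)=E(E(m\Phi^{-1}(C)\mid X,Z)\mid W,Z)=\exp(\beta_0+\beta_X^\top E(X\mid W,Z,\gamma)+\beta_X^\top\Sigma(\gamma)\beta_X/2+\beta_Z^\top Z)$. Thus, we can easily show that the unique root $\beta_N$ of eq. (5) is related to the true parameter $\beta$ as $\beta_{N,0}=\beta_0+\beta_X^\top\eta_0+\beta_X^\top\Sigma(\gamma)\beta_X/2$, $\beta_{N,X}=\eta_W^\top\beta_X$ and $\beta_{N,Z}=\beta_Z+\eta_Z^\top\beta_X$. Specifically, $\beta_N=D(\beta,\eta)$ is a one-to-one function of the true parameter $\beta=(\beta_0,\beta_X,\beta_Z)$ when the nuisance parameter $\eta$ is given. Therefore, substituting the estimators of $\beta_N$ and $\eta$ into the inverse function $D^{-1}$ results in the moment corrected estimator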

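$$\hat\beta_M=D^{-1}(\hat\beta_N,\hat\eta)=\begin{pmatrix}\hat\beta_{N,0}-\hat\beta_{N,X}^\top\hat\eta_W^{-1}\hat\eta_0-\hat\beta_{N,X}^\top\hat\eta_W^{-1}\hat\Sigma\{\hat\eta_W^\top\}^{-1}\hat\beta_{N,X}/2\\[2pt] \{\hat\eta_W^\top\}^{-1}\hat\beta_{N,X}\\[2pt] \hat\beta_{N,Z}-\hat\eta_Z^\top\{\hat\eta_W^\top\}^{-1}\hat\beta_{N,X}\end{pmatrix},$$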

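where $\hat\beta_M=(\hat\beta_{M,0},\hat\beta_{M,X},\hat\beta_{M,Z})$ and $\hat\eta_0=(I_p-\hat\eta_W)\hat\mu_X-\hat\eta_Z\hat\mu_Z$, $\hat\eta_W=(\hat\Sigma_X-\hat\Sigma_{XZ}\hat\Sigma_Z^{-1}\hat\Sigma_{ZX})(\hat\Sigma_W-\hat\Sigma_{XZ}\hat\Sigma_Z^{-1}\hat\Sigma_{ZX})^{-1}$, $\hat\eta_Z=\{I_p-(\hat\Sigma_X-\hat\Sigma_{XZ}\hat\Sigma_Z^{-1}\hat\Sigma_{ZX})(\hat\Sigma_W-\hat\Sigma_{XZ}\hat\Sigma_Z^{-1}\hat\Sigma_{ZX})^{-1}\}\hat\Sigma_{XZ}\hat\Sigma_Z^{-1}$. Since $\hat\beta_{M,0}$ is consistent for the true intercept $\beta_0$, $\Lambda_0(t)$ can also be consistently estimated by $\hat\Lambda_0^M(t)=\hat\Phi(t)\exp(\hat\beta_{M,0})$. In summary, the estimating procedure of the MC method is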

  1. Solve eq. (3) and $\sum_{i=1}^n\Psi_i(\gamma)=0$, given in Appendix A, to obtain the naive estimator $\hat\beta_N$ and $\hat\gamma$.

  2. Apply $\hat\beta_N$ and $\hat\eta=\eta(\hat\gamma)$ to the function $D^{-1}$ to obtain the MC estimator $\hat\beta_M=D^{-1}(\hat\beta_N,\hat\eta)$.
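The second step is a closed-form back-transformation, which can be sketched as follows. This is an illustrative NumPy version under the stated relationships between the naive limit and the true parameter; the function names `naive_limit` (the map $D$) and `moment_correct` (the map $D^{-1}$) are hypothetical, and `eta0`, `eta_w`, `eta_z`, `sigma` stand for already computed values of $\hat\eta_0$, $\hat\eta_W$, $\hat\eta_Z$ and $\hat\Sigma$.

```python
import numpy as np

def naive_limit(beta0, beta_x, beta_z, eta0, eta_w, eta_z, sigma):
    """Forward map D: limit of the naive estimator given the true parameter."""
    b0 = beta0 + beta_x @ eta0 + 0.5 * beta_x @ sigma @ beta_x
    bx = eta_w.T @ beta_x
    bz = beta_z + eta_z.T @ beta_x
    return b0, bx, bz

def moment_correct(b0, bx, bz, eta0, eta_w, eta_z, sigma):
    """Inverse map D^{-1}: recover (beta0, beta_X, beta_Z) from naive values.

    eta0: length p; eta_w: p x p; eta_z: p x q; sigma: p x p
    """
    beta_x = np.linalg.solve(eta_w.T, bx)   # {eta_W'}^{-1} b_X
    beta_z = bz - eta_z.T @ beta_x
    beta0 = b0 - beta_x @ eta0 - 0.5 * beta_x @ sigma @ beta_x
    return beta0, beta_x, beta_z
```

Applying `moment_correct` to the output of `naive_limit` returns the original parameter, which is a quick check that the two maps are inverses of each other for an invertible `eta_w`.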

In the Supplementary Information, we show that $\sqrt{n}(\hat\beta_M-\beta)$ is asymptotically normally distributed with mean zero and covariance matrix $B^{-1}\Sigma_h\{B^{-1}\}^\top$, where $B$ and $\Sigma_h$ are defined in Proposition 2 in Appendix A. The covariance matrix estimation of the MC estimator is given in Appendix C.

An important feature of the MC estimator is that it is numerically identical to the RC estimator for the regression parameter $(\beta_X,\beta_Z)$ but not for the intercept $\beta_0$. That is, the estimating equations for the two estimators have exactly the same root for the regression parameters. The proof of $\hat\beta_{M,X}=\hat\beta_{R,X}$ and $\hat\beta_{M,Z}=\hat\beta_{R,Z}$ is provided in Appendix D.

4 Simulation study

In this section, we compare the performance of the RC and MC methods with that of the naive approach under the semi-parametric model via simulation studies. Additionally, the corrected partial likelihood (CPL) approach proposed by Jiang et al. [23] is included for comparison. The CPL estimator takes measurement error into account but assumes non-informative and covariate-independent censoring.

We consider a regression model with a continuous covariate $X$ and a discrete covariate $Z$. Let $X\sim N(0,\sigma_X^2=1/3)$ be the error-prone covariate, which is unobserved, and let $Z\sim\mathrm{Bin}(0.5)$ be a random treatment assignment that is precisely observed. For subject $i$, we generate $k_i$ repeated surrogates $W_{ij}=X_i+U_{ij}$ for $X_i$, where $k_i$ is generated from a discrete uniform distribution ranging from 1 to 4 and $U_{ij}\sim N(0,\sigma_U^2)$. With the repeated surrogates, we estimate the nuisance parameter $\gamma$ by solving $\sum_{i=1}^n\Psi_i(\gamma)/n=0$, where $\Psi$ is shown in the Appendix. We conduct the simulations with reliability ratio (RR) $\sigma_X^2/(\sigma_X^2+\sigma_U^2)=0.8$ and $0.5$. The reliability ratio represents the magnitude of the error contamination, and a lower reliability ratio indicates higher error contamination. We generate $\nu_i^{*}$ from a mixture model in which $\nu_i^{*}$ follows a uniform distribution ranging from 0.5 to 1.5 when $Z_i=0$, and a uniform distribution ranging from 1.5 to 4 otherwise. Then, the frailty variable is $\nu_i=\exp(-Z_i\log(2.75))\,\nu_i^{*}$, so that its mean equals 1 in both groups. When $(\nu_i,X_i,Z_i)$ is given, the recurrent event process $\{N_i(t)\}$ is generated with intensity function $\lambda(t\mid\nu_i,X_i,Z_i)=\nu_i\lambda_0(t)\exp(\beta_X X_i+\beta_Z Z_i)$, in which $\lambda_0(t)=(t-6)^3/360+0.6$, $t\in[0,\tau]$, $\tau=10$, for $i=1,\ldots,n$. We consider two distinct coefficient parameters, $(\beta_X,\beta_Z)=(\log(1.5),\log(1.5))$ and $(\beta_X,\beta_Z)=(\log(3),\log(1.5))$.

To show the robustness of the proposed estimators, the first two scenarios are conducted under different censoring time settings. In Scenario 1, we let the censoring time $C$ depend on $W$. When $W_{i1}>0$, $C_i$ is generated from an exponential distribution with mean $10\nu_i^{-1}$ and is truncated after $\tau=10$; otherwise, $C_i$ is generated from an exponential distribution with mean $0.5\nu_i^{-1}$ and is truncated after $\tau=10$. In Scenario 2, we let the censoring time $C$ depend on $X$. We generate $C_i$ from the mixed exponential distribution in the same way as in Scenario 1, with $W_{i1}$ replaced by $X_i$.

In addition, we conduct two cases to investigate the sensitivity to the conditional normal assumption imposed on the covariate $X$. In Scenario 3, $X$ is uniformly distributed over the interval $(-\sqrt{3\sigma_X^2},\sqrt{3\sigma_X^2})$ and $Z$ is allowed to be correlated with $X$. Let $Z^{*}=X+\varepsilon$ where $\varepsilon\sim N(0,\sigma_X^2)$, and $Z=1$ if $Z^{*}\ge 0$ and $Z=0$ otherwise. The other variables are generated in the same way as in Scenario 2. Further, a non-normal measurement error case is considered in Scenario 4. We generate the measurement error $U$ from a skew normal distribution with mean 0, variance $\sigma_U^2$ and skewness parameter $\alpha=2$, and $X$ from $N(0,\sigma_X^2=1/3)$. The remaining variables are generated in the same way as in Scenario 3.

A total of 200 replicates with sample sizes $n=300$ and $n=600$ are generated in each simulation configuration. In the tables, BIAS denotes the average bias, ASE denotes the average estimated standard error, ESD denotes the empirical sample standard deviation, and CP and CL denote the coverage probability and average interval length of the 95% confidence interval based on 200 runs. The standard errors of the proposed estimators are obtained by taking the square roots of the diagonal elements of the sandwich variance estimators given in Appendices B and C.
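To make the data-generating mechanism concrete, the following sketch simulates one subject under the Scenario 2 settings. It is an illustration under several assumptions rather than the authors' simulation code: the frailty is read as $\nu_i=\nu_i^{*}/2.75^{Z_i}$ (mean one in both groups), "truncated after $\tau$" is read as capping $C_i$ at $\tau$, the Poisson process is generated by thinning, and the names `lambda0`, `Lambda0` and `simulate_subject` are hypothetical.

```python
import numpy as np

TAU = 10.0

def lambda0(t):
    """Baseline rate (t - 6)^3 / 360 + 0.6, non-negative and increasing on [0, 10]."""
    return (t - 6.0) ** 3 / 360.0 + 0.6

def Lambda0(t):
    """Cumulative baseline rate: integral of lambda0 from 0 to t."""
    return ((t - 6.0) ** 4 - 6.0 ** 4) / 1440.0 + 0.6 * t

def simulate_subject(rng, beta_x, beta_z, sigma_u2, sigma_x2=1.0 / 3.0):
    x = rng.normal(0.0, np.sqrt(sigma_x2))           # unobserved true covariate
    z = int(rng.integers(0, 2))                      # treatment indicator
    k = int(rng.integers(1, 5))                      # 1 to 4 replicates
    w = x + rng.normal(0.0, np.sqrt(sigma_u2), size=k)
    # Assumed reading of the frailty: uniform mixture rescaled to mean one.
    nu_star = rng.uniform(1.5, 4.0) if z == 1 else rng.uniform(0.5, 1.5)
    nu = nu_star / 2.75 ** z
    # Scenario 2: the censoring mean depends on the sign of X and on the frailty.
    mean_c = 10.0 / nu if x > 0 else 0.5 / nu
    c = min(rng.exponential(mean_c), TAU)            # assumed: capped at tau
    # Thinning: candidates from a homogeneous process with rate `bound`,
    # each kept with probability (true intensity) / bound.
    rate = nu * np.exp(beta_x * x + beta_z * z)
    bound = rate * lambda0(TAU)
    cand = rng.uniform(0.0, c, size=rng.poisson(bound * c))
    keep = rng.uniform(size=cand.size) < rate * lambda0(cand) / bound
    return {"x": x, "z": z, "w": w, "c": c, "events": np.sort(cand[keep])}
```

Under these definitions $\Lambda_0(10)=95/18\approx 5.28$, so a subject with unit frailty, $X=0$, $Z=0$ and full follow-up would be expected to have about five events.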

The results of Scenarios 1 to 4 are reported in Tables 1 to 4. In general, the naive estimator for the error-prone covariate X has large biases and very poor coverage probabilities in all tables. This phenomenon is due to the familiar attenuation effect. The bias becomes severe when the effect of the error-prone covariate is large and the reliability ratio is low. In Scenarios 1 and 2, the naive estimation of the effect of Z is not affected by the measurement errors since X and Z are generated to be mutually independent. When X and Z are correlated, as in Scenario 3, the naive estimator for βZ also has low coverage probabilities, as shown in Tables 3 and 4. Further, the numerical equivalence of the RC and MC estimators is also seen in the simulation results.

Table 1:

Censoring time depends on W; X follows a normal distribution, and X and Z are independent.

                        n=300                       n=600
               Naive     RC     MC    CPL  Naive     RC     MC    CPL
(βX,βZ) = (log(1.5), log(1.5)); RR = 0.8
βX BIAS×10³      -83     -1     -1     24    -71     13     13     39
   ASE×10³       137    172    172    127     96    120    120     87
   ESD×10³       133    167    167    120     94    117    117     86
   CP           0.93   0.97   0.97   0.96   0.91   0.94   0.94   0.91
   CL×10³        537    675    676    497    375    470    470    343
βZ BIAS×10³       -2     -2     -2     13     -4     -4     -4      5
   ASE×10³       164    164    164    106    114    114    114     74
   ESD×10³       157    157    157    102    120    120    120     75
   CP           0.97   0.96   0.96   0.96   0.96   0.96   0.96   0.95
   CL×10³        643    644    644    416    446    447    447    292
(βX,βZ) = (log(1.5), log(1.5)); RR = 0.5
βX BIAS×10³     -216    -20    -20    -89   -198     11     11     49
   ASE×10³       102    209    209    150     78    156    156    117
   ESD×10³       103    211    211    139     69    142    142    115
   CP           0.43   0.92   0.92   0.89   0.23   0.96   0.96  0.925
   CL×10³        424    868    869    650    304    612    613    460
βZ BIAS×10³        5      7      7     14      7      8      8     12
   ASE×10³       162    163    164    108    117    117    117     77
   ESD×10³       168    169    169    107    115    117    117     79
   CP           0.94   0.94   0.94   0.94   0.96   0.96   0.96   0.93
   CL×10³        650    655    655    427    458    460    460    300
(βX,βZ) = (log(3), log(1.5)); RR = 0.8
βX BIAS×10³     -241    -24    -24      3   -214      7      7     23
   ASE×10³       130    163    163    136     92    115    115    100
   ESD×10³       129    161    161    139     94    119    119     99
   CP           0.53   0.95   0.95   0.94   0.34   0.96   0.96   0.96
   CL×10³        508    639    639    532    360    451    451    392
βZ BIAS×10³       22     25     25     10      8      9      9      2
   ASE×10³       146    146    146    110    103    103    103     79
   ESD×10³       147    148    148    116     99    102    102     79
   CP           0.95   0.95   0.95   0.94   0.95   0.95   0.95   0.94
   CL×10³        573    574    574    433    402    403    403    311
(βX,βZ) = (log(3), log(1.5)); RR = 0.5
βX BIAS×10³     -539     34     34     67   -551     -4     -4     12
   ASE×10³       106    221    222    226     75    154    155    158
   ESD×10³       111    228    228    234     76    146    146    151
   CP           0.00   0.94   0.94   0.95   0.00   0.96   0.96   0.96
   CL×10³        417    867    868    887    295    606    606    619
βZ BIAS×10³        7      8      8     15     -7     -8     -8      2
   ASE×10³       155    159    159    130    110    112    112     95
   ESD×10³       157    163    163    138    119    121    121     92
   CP           0.95   0.94   0.94   0.94   0.94   0.94   0.94   0.95
   CL×10³        607    622    623    511    430    438    439    371

Note: BIAS denotes the average of $\hat\beta-\beta$ from 200 samplings, ASE denotes the average standard error from 200 samplings, ESD denotes the empirical standard deviation from 200 samplings, CP denotes the coverage probability of the Wald 95% confidence interval, and CL denotes the average length of the Wald 95% confidence interval from 200 samplings.

Table 2:

Censoring time depends on X; X follows a normal distribution, and X and Z are independent.

                        n=300                       n=600
               Naive     RC     MC    CPL  Naive     RC     MC    CPL
(βX,βZ) = (log(1.5), log(1.5)); RR = 0.8
βX BIAS×10³      -80      2      3    -14   -102    -25    -20    -30
   ASE×10³       133    167    167    118     96    120    120     84
   ESD×10³       136    171    171    131     93    118    117     87
   CP           0.93   0.94   0.95   0.92   0.73   0.97   0.97   0.90
   CL×10³        523    655    653    462    375    470    471    331
βZ BIAS×10³       16     15     15     13    -20    -19    -21     -6
   ASE×10³       161    161    161    105    115    115    115     75
   ESD×10³       161    162    161    118    109    108    110     74
   CP           0.95   0.95   0.95   0.89   0.97   0.97   0.97   0.97
   CL×10³        632    632    632    412    451    451    451    294
(βX,βZ) = (log(1.5), log(1.5)); RR = 0.5
βX BIAS×10³     -216    -20    -20    -89   -198     11     11     49
   ASE×10³       102    209    209    150     78    156    156    117
   ESD×10³       103    211    211    139     69    142    142    115
   CP           0.43   0.92   0.92   0.89   0.23   0.96   0.96  0.925
   CL×10³        401    821    821    587    289    586    586    409
βZ BIAS×10³        5      7      7     14      7      8      8     12
   ASE×10³       162    163    164    108    117    117    117     77
   ESD×10³       168    169    169    107    115    117    117     79
   CP           0.94   0.94   0.94   0.94   0.96   0.96   0.96   0.93
   CL×10³        523    655    653    462    375    470    471    331
(βX,βZ) = (log(3), log(1.5)); RR = 0.8
βX BIAS×10³     -225     -8     -8    -91   -216      4      4    -95
   ASE×10³       126    159    159    133     88    111    111     94
   ESD×10³       124    157    157    141     84    106    106     89
   CP           0.58   0.96   0.96   0.87   0.33   0.95   0.95   0.83
   CL×10³        496    622    622    520    346    434    434    368
βZ BIAS×10³        3      4      4      7      3      3      3      4
   ASE×10³       143    144    144    109    101    102    102     78
   ESD×10³       147    146    146    109     98     97     97     82
   CP           0.95   0.94   0.94   0.94   0.98   0.98   0.98   0.93
   CL×10³        562    563    563    429    398    398    398    305
(βX,βZ) = (log(3), log(1.5)); RR = 0.5
βX BIAS×10³     -558     -6     -6   -238   -552     -1     -1   -229
   ASE×10³        99    207    208    195     70    144    144    139
   ESD×10³       100    202    202    186     68    147    147    133
   CP           0.00   0.97   0.97   0.74   0.00   0.95   0.95   0.59
   CL×10³        389    812    814    763    273    565    565    546
βZ BIAS×10³        6      6      6     11    -10    -12    -12      1
   ASE×10³       151    154    154    125    106    108    108     88
   ESD×10³       154    158    158    115    110    115    115     95
   CP           0.93   0.95   0.95   0.96   0.97   0.94   0.94   0.92
   CL×10³        591    603    604    489    416    424    424    345

Note: BIAS denotes the average of $\hat\beta-\beta$ from 200 samplings, ASE denotes the average standard error from 200 samplings, ESD denotes the empirical standard deviation from 200 samplings, CP denotes the coverage probability of the Wald 95% confidence interval, and CL denotes the average length of the Wald 95% confidence interval from 200 samplings.

Table 3:

Censoring time depends on X; X follows a uniform distribution, and X and Z are correlated.

                        n=300                       n=600
               Naive     RC     MC    CPL  Naive     RC     MC    CPL
(βX,βZ) = (log(1.5), log(1.5)); RR = 0.8
βX BIAS×10³     -114      6      6    -71   -119     -2     -2    -78
   ASE×10³       151    213    213    132    108    152    152     93
   ESD×10³       163    230    230    142    104    147    147     85
   CP           0.88   0.94   0.94   0.88   0.83   0.95   0.95   0.91
   CL×10³        591    834    834    519    422    595    595    363
βZ BIAS×10³       98     12     12     81     81     -3     -3     88
   ASE×10³       200    223    223    138    141    157    157     94
   ESD×10³       216    239    239    138    139    157    157     94
   CP           0.92   0.93   0.93   0.92   0.91   0.96   0.96   0.84
   CL×10³        785    875    875    541    552    616    616    370
(βX,βZ) = (log(1.5), log(1.5)); RR = 0.5
βX BIAS×10³     -253      2      2   -170   -244     24     24   -152
   ASE×10³       109    297    298    152     77    207    207    107
   ESD×10³       125    340    340    153     74    207    207    100
   CP           0.37   0.93   0.93   0.76   0.10   0.95   0.95   0.69
   CL×10³        426   1166   1168    595    302    811    811    419
βZ BIAS×10³      178     -7     -7    178    180    -13    -13    153
   ASE×10³       195    275    275    134    137    190    190     95
   ESD×10³       208    303    303    122    134    186    186     91
   CP           0.85   0.92   0.92   0.75   0.76   0.94   0.94   0.59
   CL×10³        766   1077   1077    524    537    744    744    372
(βX,βZ) = (log(3), log(1.5)); RR = 0.8
βX BIAS×10³     -351    -42    -42   -306   -364    -66    -66   -324
   ASE×10³       138    197    197    141     98    139    139    100
   ESD×10³       141    199    199    140     96    133    133     87
   CP           0.25   0.93   0.93   0.42   0.05   0.95   0.95   0.08
   CL×10³        542    771    772    552    385    544    545    391
βZ BIAS×10³      237     15     15    209    240     26     26    215
   ASE×10³       189    209    209    148    134    147    147    101
   ESD×10³       215    235    235    142    138    149    149    102
   CP           0.75   0.93   0.93   0.70   0.58   0.94   0.94   0.46
   CL×10³        743    819    819    581    527    577    577    398
(βX,βZ) = (log(3), log(1.5)); RR = 0.5
βX BIAS×10³     -722    -89    -89   -548   -715    -85    -85   -554
   ASE×10³        96    273    274    169     67    187    187    121
   ESD×10³        87    254    254    159     69    191    191    117
   CP           0.00   0.94   0.93   0.11   0.00   0.91   0.92   0.01
   CL×10³        377   1071   1073    664    263    732    733    476
βZ BIAS×10³      470     11     11    349    487     35     35    368
   ASE×10³       188    261    261    162    133    180    180    115
   ESD×10³       190    248    248    138    150    193    193    107
   CP           0.31   0.95   0.96   0.40   0.05   0.94   0.94   0.10
   CL×10³        738   1021   1021    636    522    707    705    449

Note: BIAS denotes the average of $\hat\beta-\beta$ from 200 samplings, ASE denotes the average standard error from 200 samplings, ESD denotes the empirical standard deviation from 200 samplings, CP denotes the coverage probability of the Wald 95% confidence interval, and CL denotes the average length of the Wald 95% confidence interval from 200 samplings.

Table 4:

Censoring time depends on X; U follows a skew normal distribution, and X and Z are correlated.

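| Parameter | Statistic | Naive (n=300) | RC (n=300) | MC (n=300) | CPL (n=300) | Naive (n=600) | RC (n=600) | MC (n=600) | CPL (n=600) |
|---|---|---|---|---|---|---|---|---|---|
| $(\beta_X,\beta_Z)=(\log(15),\log(15))$; RR = 0.8 | | | | | | | | | |
| $\beta_X$ | BIAS×10³ | –118 | –10 | –10 | –43 | –99 | 16 | 16 | –44 |
| | ASE×10³ | 150 | 207 | 207 | 127 | 104 | 142 | 142 | 91 |
| | ESD×10³ | 150 | 208 | 208 | 118 | 103 | 142 | 142 | 84 |
| | CP | 0.89 | 0.95 | 0.95 | 0.95 | 0.84 | 0.95 | 0.95 | 0.94 |
| | CL×10³ | 589 | 811 | 811 | 499 | 406 | 558 | 559 | 357 |
| $\beta_Z$ | BIAS×10³ | 97 | 26 | 26 | 84 | 50 | –25 | –25 | 75 |
| | ASE×10³ | 198 | 217 | 217 | 130 | 138 | 150 | 150 | 89 |
| | ESD×10³ | 197 | 217 | 217 | 124 | 133 | 146 | 146 | 89 |
| | CP | 0.89 | 0.95 | 0.95 | 0.90 | 0.94 | 0.94 | 0.94 | 0.88 |
| | CL×10³ | 777 | 851 | 851 | 511 | 541 | 588 | 588 | 349 |
| $(\beta_X,\beta_Z)=(\log(15),\log(15))$; RR = 0.5 | | | | | | | | | |
| $\beta_X$ | BIAS×10³ | –244 | 5 | 5 | –109 | –231 | 32 | 32 | –115 |
| | ASE×10³ | 111 | 285 | 285 | 155 | 77 | 195 | 195 | 110 |
| | ESD×10³ | 114 | 296 | 296 | 137 | 79 | 202 | 202 | 103 |
| | CP | 0.42 | 0.94 | 0.94 | 0.92 | 0.16 | 0.93 | 0.94 | 0.82 |
| | CL×10³ | 437 | 1115 | 1118 | 606 | 302 | 763 | 764 | 432 |
| $\beta_Z$ | BIAS×10³ | 179 | 16 | 16 | 149 | 134 | –36 | –36 | 139 |
| | ASE×10³ | 191 | 254 | 255 | 130 | 133 | 174 | 174 | 88 |
| | ESD×10³ | 187 | 252 | 252 | 120 | 128 | 174 | 174 | 88 |
| | CP | 0.84 | 0.94 | 0.94 | 0.83 | 0.85 | 0.95 | 0.95 | 0.68 |
| | CL×10³ | 747 | 997 | 998 | 508 | 522 | 681 | 682 | 346 |
| $(\beta_X,\beta_Z)=(\log(3),\log(15))$; RR = 0.8 | | | | | | | | | |
| $\beta_X$ | BIAS×10³ | –271 | 40 | 40 | –115 | –261 | 51 | 51 | –121 |
| | ASE×10³ | 137 | 189 | 190 | 142 | 96 | 133 | 133 | 100 |
| | ESD×10³ | 144 | 195 | 195 | 144 | 92 | 129 | 129 | 93 |
| | CP | 0.50 | 0.93 | 0.92 | 0.87 | 0.21 | 0.94 | 0.94 | 0.78 |
| | CL×10³ | 536 | 742 | 743 | 558 | 377 | 522 | 522 | 394 |
| $\beta_Z$ | BIAS×10³ | 157 | –46 | –46 | 148 | 173 | –29 | –29 | 166 |
| | ASE×10³ | 185 | 201 | 201 | 136 | 131 | 142 | 142 | 95 |
| | ESD×10³ | 172 | 188 | 188 | 122 | 131 | 144 | 144 | 93 |
| | CP | 0.89 | 0.96 | 0.96 | 0.83 | 0.74 | 0.95 | 0.95 | 0.59 |
| | CL×10³ | 725 | 790 | 790 | 534 | 512 | 556 | 556 | 373 |
| $(\beta_X,\beta_Z)=(\log(3),\log(15))$; RR = 0.5 | | | | | | | | | |
| $\beta_X$ | BIAS×10³ | –623 | 98 | 98 | –295 | –615 | 99 | 99 | –273 |
| | ASE×10³ | 109 | 290 | 291 | 202 | 79 | 215 | 216 | 148 |
| | ESD×10³ | 112 | 305 | 305 | 172 | 88 | 220 | 220 | 150 |
| | CP | 0.00 | 0.95 | 0.95 | 0.70 | 0.00 | 0.93 | 0.93 | 0.48 |
| | CL×10³ | 428 | 1139 | 1142 | 791 | 309 | 805 | 808 | 582 |
| $\beta_Z$ | BIAS×10³ | 419 | –45 | –45 | 342 | 411 | –56 | –56 | 335 |
| | ASE×10³ | 185 | 255 | 255 | 147 | 130 | 179 | 180 | 103 |
| | ESD×10³ | 189 | 269 | 269 | 140 | 128 | 185 | 185 | 88 |
| | CP | 0.39 | 0.93 | 0.93 | 0.36 | 0.11 | 0.93 | 0.93 | 0.08 |
| | CL×10³ | 725 | 998 | 1000 | 576 | 509 | 703 | 705 | 405 |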

Note: BIAS denotes the average of $\hat\beta-\beta$ over 200 simulated samples, ASE the average standard error, ESD the empirical standard deviation of the estimates, CP the coverage probability of the Wald 95 % confidence interval, and CL the average length of the Wald 95 % confidence interval, all computed over the same 200 samples.
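To make the five summary measures in the table notes concrete, the following minimal Python sketch computes them for one parameter from a list of simulated estimates and their standard errors. The function name `summarize`, the toy inputs, the normal quantile 1.96 for the Wald interval, and the use of the $n-1$ divisor for ESD are assumptions made for illustration; they are not taken from the simulation code behind the tables.

```python
import statistics


def summarize(estimates, std_errors, true_value, z=1.96):
    """Summarize simulation output for one parameter.

    estimates  : point estimates, one per simulated sample
    std_errors : estimated standard errors, one per simulated sample
    true_value : value of the parameter used to generate the data
    z          : normal quantile for the Wald interval (assumed 1.96)
    """
    n_rep = len(estimates)
    bias = statistics.fmean(estimates) - true_value      # BIAS
    ase = statistics.fmean(std_errors)                   # ASE
    esd = statistics.stdev(estimates)                    # ESD (n - 1 divisor, an assumption)
    # A Wald interval est +/- z * se covers the truth when |est - truth| <= z * se.
    covered = [abs(est - true_value) <= z * se
               for est, se in zip(estimates, std_errors)]
    cp = sum(covered) / n_rep                            # CP
    cl = statistics.fmean([2 * z * se for se in std_errors])  # CL
    return {"BIAS": bias, "ASE": ase, "ESD": esd, "CP": cp, "CL": cl}


# Hypothetical toy input: four replications of an estimator of a true value 0.4.
toy = summarize([0.40, 0.50, 0.30, 0.45], [0.05, 0.05, 0.05, 0.05], 0.4)
```

Because CL is the average of $2\times1.96\times\text{SE}$, it should be close to $3.92\times\text{ASE}$ in every column, which gives a quick consistency check on the tabulated values.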

Table 1 reports the results when $C$ depends on $W$. Compared with the proposed estimators, the CPL estimator generally has larger, though not significant, biases when the reliability ratio is low (RR = 0.5). However, when $C$ depends on $X$, the coverage probabilities of the CPL estimator for $\beta_X$ decline dramatically because of the substantial bias shown in Table 2. The bias becomes more serious as the coefficient $\beta_X$ increases or the reliability ratio decreases. Table 3 shows the results when $X$ follows a uniform distribution. The coverage probabilities of the CPL estimators for $(\beta_X,\beta_Z)$ are both nearly zero in the setting with a large coefficient, a large sample size and a low reliability ratio. In contrast, the proposed methods perform well, with coverage probabilities of at least 91 % and limited biases, even though the conditional normal assumption on $X$ is violated. Table 4 shows that the proposed estimators also perform well in terms of bias and coverage probability compared with the CPL estimator. However, when the sample size increases to $n=2000$, the coverage probabilities of the 95 % confidence intervals for the proposed estimators may fall below 90 %.

To summarize, the simulation study reveals that the proposed methods can effectively correct the bias due to measurement errors even when the conditional normal assumption on $X$ is violated. The CPL estimator, however, is sensitive both to the assumption that the censoring time is independent of the covariates and to the distributional assumption imposed on the covariates. Neither assumption can easily be verified, since $X$ is unobservable. The simulation study also shows that the naive approach, which ignores measurement errors in the covariates, generally produces a large bias. We note that the proposed estimators are not consistent in Scenarios 3 and 4, because the normal assumption imposed on $X$ given $(W,Z)$ is violated. Hence, the coverage probabilities of the corresponding 95 % confidence intervals may be lower than 90 % when the sample size is large (such as $n=2000$), especially under a skewed measurement error distribution.

5 Data analysis

In this section, we apply the proposed methods to the NPC trial dataset to assess the effect of plasma selenium treatment on SCC recurrences. This randomized, double-blinded clinical trial recruited 1312 patients with histories of skin cancer, 653 in the treatment group and 659 in the placebo group, and the study period lasted up to 12 years.

Many critical risk factors for SCC were recorded at baseline, particularly the plasma selenium level. As mentioned earlier, the plasma selenium level is measured with error because of the measuring instrument or temporary biological fluctuation. Some patients in the placebo group had more than one plasma selenium measurement, and these can be treated as replicates. However, the repeated plasma selenium measurements of the treatment group patients cannot represent the baseline value. Therefore, the treatment group patients had only one baseline plasma selenium measurement. Multiple occurrences of SCC can be observed for each patient because each new incidence of SCC was diagnosed and recorded during follow-up.

In this analysis, we consider two covariates: the baseline plasma selenium measurement and the treatment assignment indicator. The latter is our primary covariate of interest, whereas the former is an important predictor for adjusting the model but is contaminated with measurement error. After a logarithmic transformation, the plasma selenium measurement is approximately normally distributed (Figure 1). Thus, we let $X$ be the logarithm of the baseline plasma selenium measurement (abbreviated as log(selenium)) and $Z$ be the treatment assignment. We assume that the recurrence of SCC follows a non-homogeneous Poisson process with intensity function $\lambda(t\mid\nu,X,Z)=\nu\lambda_0(t)\exp(\beta_X X+\beta_Z Z)$, where the frailty variable $\nu$ accounts for the correlations among the SCC recurrences and between the SCC event process and the informative censoring time. Here, $X$ is independent of $Z$ since the NPC trial is a randomized clinical trial. We assume that $X$ given $W$ follows a conditional normal distribution. Using the replicate data, the variance of $X$ given $W$ is estimated by $\hat\sigma^2=\hat\sigma_U^2\hat\sigma_X^2/\hat\sigma_W^2=0.156^2\times0.133^2/0.205^2=0.101^2$.
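The arithmetic behind the conditional variance can be checked with a few lines of Python. This is only a sketch: the function name `conditional_sd` is hypothetical, and the code assumes the classical additive error model $W=X+U$ with $X$ independent of $U$, so that $\sigma_W^2=\sigma_X^2+\sigma_U^2$.

```python
import math


def conditional_sd(sd_u, sd_x):
    """Return (sd of W, sd of X given W) under W = X + U with X independent of U.

    Under joint normality, Var(X | W) = var_u * var_x / var_w.
    """
    var_u, var_x = sd_u ** 2, sd_x ** 2
    var_w = var_u + var_x  # additive, independent error (assumption)
    return math.sqrt(var_w), math.sqrt(var_u * var_x / var_w)


# Values reported in the text: sd of U is 0.156 and sd of X is 0.133.
sd_w, sd_x_given_w = conditional_sd(0.156, 0.133)
```

With the reported inputs, the sketch reproduces the standard deviation of $W$ (0.205) and the conditional standard deviation (0.101) to three decimals.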

Figure 1: Histograms of the estimated true covariate $\{\widehat X_i\}$ and the estimated error terms $\{\widehat U_i\}$, based on 292 placebo-group patients with more than 10 plasma selenium measurements.

To verify the distributional assumptions imposed on the covariates, we use a subset of 292 placebo-group patients with 10 or more selenium measurements. Because these patients have many replicates, the average of the replicates should be very close to the true plasma selenium level. Thus, we estimate $X_i$ by $\widehat X_i=\sum_{j=1}^{k_i}W_{ij}/k_i$ and $U_i$ by $\widehat U_i=W_{i1}-\widehat X_i$, where $X_i$ is the true plasma selenium level of the $i$th patient in the subset. Figure 1 shows the histograms of $\widehat X$ and $\widehat U$, which suggest marginal normal distributions for $X$ and $U$. The correlation between $\widehat X$ and $\widehat U$ is only –0.069, with P-value = 0.234. Under normality, zero correlation is equivalent to independence, so the non-significant correlation supports the independence of the two variables. Therefore, the conditional normal assumption on $X$ is appropriate for the NPC dataset.
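The diagnostic described above can be sketched as follows. The helper names and the toy replicate values are hypothetical; only the two formulas, $\widehat X_i=\sum_j W_{ij}/k_i$ and $\widehat U_i=W_{i1}-\widehat X_i$, come from the text, and the NPC data themselves are not reproduced here.

```python
import math


def estimate_x_and_u(replicates):
    """replicates[i] holds the k_i repeated measurements W_i1, ..., W_ik_i."""
    x_hat = [sum(w) / len(w) for w in replicates]         # replicate mean
    u_hat = [w[0] - x for w, x in zip(replicates, x_hat)]  # first replicate minus mean
    return x_hat, u_hat


def pearson(a, b):
    """Sample Pearson correlation between two equally long lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    s_ab = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    s_aa = sum((x - mean_a) ** 2 for x in a)
    s_bb = sum((y - mean_b) ** 2 for y in b)
    return s_ab / math.sqrt(s_aa * s_bb)


# Hypothetical toy replicates for three subjects (not NPC values).
toy_replicates = [[4.0, 5.0, 6.0], [5.0, 5.0, 5.0], [7.0, 5.0, 6.0]]
x_hat, u_hat = estimate_x_and_u(toy_replicates)
r = pearson(x_hat, u_hat)
```

In an actual analysis the correlation would be computed over all subjects in the validation subset and accompanied by a significance test, which this sketch omits.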

The patients in the trial were scheduled to receive dermatologic examinations periodically. Let $C_i$ denote the time from randomization to the last examination for subject $i$, and let $\tau=149.5$ (months) denote the maximum of the $C_i$'s. Fifty-five patients without any record of dermatologic examination or SCC event are excluded from the data analysis. The existing recurrent event studies [15, 23] of the NPC data assumed that the censoring is non-informative, which might be improper. Figure 2 shows the weighted average of the SCC recurrences versus time for subjects in the four selected risk sets ($t_1=54.9$, $t_2=86.3$, $t_3=115.5$, $t_4=135.2$). Note that for subject $i$ the number of SCC recurrences by time $t$ is calculated as $N_i(t\wedge C_i)$, where $a\wedge b=\min(a,b)$. If the censoring time were independent of the SCC recurrence process, all lines should be close to each other. However, the subjects who stayed in the trial longer (censoring times after 115.5 months and 135.2 months) tended to have fewer SCC recurrences in the early and middle stages. This result implies that the independent censoring assumption is not satisfied and that the proposed methods are necessary.

Figure 2: Weighted average of the SCC recurrences versus time (months since randomization) for subjects in the four selected risk sets ($t_1=54.9$, $t_2=86.3$, $t_3=115.5$, $t_4=135.2$), where the weighted average of the SCC recurrences for subjects in the $r$th risk set at time $t$ is calculated by $\sum_{i=1}^n N_i(t\wedge C_i)I(C_i>t_r)/\sum_{i=1}^n I(C_i>t_r)$, $0\le t\le\tau=149.5$, where $r=1,2,3,4$.
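The curve plotted for each risk set can be computed directly from the event and censoring times. The sketch below is illustrative only: `risk_set_average` and the toy data are hypothetical, and the function simply evaluates $\sum_i N_i(t\wedge C_i)I(C_i>t_r)/\sum_i I(C_i>t_r)$ at a single time point.

```python
def count_events(event_times, t):
    """N_i(t): number of recorded events at or before time t."""
    return sum(1 for s in event_times if s <= t)


def risk_set_average(events, censor, t, t_r):
    """Average number of events by time min(t, C_i) among subjects with C_i > t_r."""
    in_set = [i for i, c in enumerate(censor) if c > t_r]
    if not in_set:
        raise ValueError("no subject is still under observation after t_r")
    total = sum(count_events(events[i], min(t, censor[i])) for i in in_set)
    return total / len(in_set)


# Hypothetical toy data: event times and last examination times (months).
toy_events = [[10, 30], [20], [], [5, 15, 25]]
toy_censor = [40, 60, 90, 120]
avg_all = risk_set_average(toy_events, toy_censor, t=25, t_r=0)
avg_late = risk_set_average(toy_events, toy_censor, t=25, t_r=50)
```

Evaluating the function on a grid of $t$ values for each $t_r$ would trace out one line of the figure. Under independent censoring the lines for different $t_r$ should roughly coincide.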

After excluding 55 patients without any record of examination or SCC event and 2 patients without baseline plasma selenium measurements, 1,255 patients are included in the analysis to fit the semiparametric model for the SCC recurrences. Among these patients, 473 had at least one SCC occurrence. The results of the fitted model are presented in Table 5. Since the RC and MC estimates are identical, only the RC estimates are shown in the table. As illustrated, the treatment effect estimates of all approaches are positive but statistically non-significant. That is, selenium supplementation has no significant effect on preventing the recurrence of SCC. This result agrees with the previous studies [15, 23]. Table 5 also shows the attenuation phenomenon in the naive estimate of the plasma selenium effect. At the 95 % confidence level, the adjusted estimates obtained from the RC and MC methods are significant, with values equal to –1.502. The result implies that patients with a higher plasma selenium level at baseline have fewer SCC recurrences.

Table 5:

Regression analysis of the SCC recurrences in the NPC trial.

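| Covariate | Statistic | Naive | RC | CPL |
|---|---|---|---|---|
| log(Selenium) | EST | –0.555 | –1.502 | –1.109 |
| | SE | 0.292 | 0.790 | 0.842 |
| | Z-value | –1.897 | –1.902 | –1.317 |
| Treatment | EST | 0.185 | 0.223 | 0.125 |
| | SE | 0.140 | 0.141 | 0.125 |
| | Z-value | 1.317 | 1.581 | 1.002 |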

Note: EST denotes the estimate, SE denotes the standard error which is estimated by the square root of the asymptotic variance estimator. The MC estimates are identical to the RC estimates.
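As a reading aid for Table 5, the Z-values are Wald statistics, that is, the estimate divided by its standard error. The short sketch below recomputes them from the rounded EST and SE entries; the dictionary layout and names are hypothetical, and small discrepancies from the tabulated Z-values are expected because the inputs are rounded to three decimals.

```python
# (estimate, standard error, reported Z-value) copied from Table 5.
TABLE5 = {
    ("log(Selenium)", "Naive"): (-0.555, 0.292, -1.897),
    ("log(Selenium)", "RC"): (-1.502, 0.790, -1.902),
    ("log(Selenium)", "CPL"): (-1.109, 0.842, -1.317),
    ("Treatment", "Naive"): (0.185, 0.140, 1.317),
    ("Treatment", "RC"): (0.223, 0.141, 1.581),
    ("Treatment", "CPL"): (0.125, 0.125, 1.002),
}


def wald_z(estimate, std_error):
    """Wald statistic: estimate divided by its standard error."""
    return estimate / std_error


recomputed = {key: wald_z(est, se) for key, (est, se, _) in TABLE5.items()}
max_gap = max(abs(recomputed[key] - TABLE5[key][2]) for key in TABLE5)
```

The recomputed statistics agree with the reported ones up to rounding of the inputs.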

6 Discussion

To identify population risk factors in recurrent event analysis, inference on the rate function is commonly preferred. The existing methods depend on the assumptions of either accurately measured covariates or independent censoring, which may not always be realistic. In this article, we consider statistical methods for recurrent event data with measurement error and informative censoring. Under informative censoring and the normal error assumption, our proposed estimators are consistent. Our estimating procedure does not require any additional assumptions on the frailty distribution or on the censoring time. The numerical results show that the naive method, which ignores measurement errors in the covariates, leads to a severely biased estimator and that the CPL method depends strongly on the independence between the covariates and the censoring time. In contrast, our proposed methods correct for measurement errors effectively and give accurate confidence intervals under different scenarios.

The corrected methods considered in this paper are developed under parametric distributions for the covariates and measurement errors. In the NPC data example, the distributional assumptions for the error model can be validated because adequate replicates are available. In practice, we may not have enough information to validate these distributional assumptions. To relax them, a nonparametric correction method, similar to that of Huang and Wang [30] for Cox regression with measurement error, might be developed. However, the extension of nonparametric correction to the regression analysis of recurrent event data is not straightforward, and hence future research is warranted. The idea of measurement error correction can be applied not only to recurrent event data but also to panel count data, in which the number of events can only be observed at several random times.

Acknowledgements

We thank the editor and referees for their very helpful comments and suggestions that greatly improved the paper. This research was partially supported by Taiwan Ministry of Science and Technology MOST 104-2118-M-007-002 (Cheng and Yu), National Institutes of Health grants CA53996, ES017030, HL121347, and MH105857 (Wang), and a MOST travel award from the Mathematics Research Promotion Center (Wang).

Appendices

A Asymptotic properties

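Let $\mathcal{B}_R$ and $\mathcal{B}$ be any compact neighborhoods of $\beta_R$ and $\beta$, which are the roots of the limits of the RC and MC estimating equations, respectively. Also, denote $\mathcal{W}_i=(1,W_i^\top,Z_i^\top)^\top$ and $\mathcal{X}_i=(1,E(X_i\mid W_i,Z_i,\gamma)^\top,Z_i^\top)^\top$. To prove the asymptotic properties of the proposed estimators, we impose the following regularity conditions:

  (a1) $\Lambda_0(\tau)>0$;

  (a2) $\Pr(C\ge\tau,\nu>0)>0$;

  (a3) $G(u)\equiv E[\nu I(C\ge u)]$ is a continuous function for $u\in[0,\tau]$;

  (a4) $E\{\sup_{b\in\mathcal{B}}\mathcal{W}\mathcal{W}^\top\exp(D(b,\eta)^\top\mathcal{W})\}$ and $E\{\sup_{b\in\mathcal{B}_R}\mathcal{X}\mathcal{X}^\top\exp(b^\top\mathcal{X})\}$ are bounded. Moreover, $E\{\mathcal{W}\mathcal{W}^\top\exp(D(\beta,\eta)^\top\mathcal{W})\}$ and $E\{\mathcal{X}\mathcal{X}^\top\exp(\beta_R^\top\mathcal{X})\}$ are non-singular.

Note that condition (a4) can be satisfied under the normality assumption imposed on the covariates.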

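Define $Q_1(t)\equiv G(t)\Lambda_0(t)$ and $Q_2(t)\equiv\int_0^t G(u)\,d\Lambda_0(u)$. Under conditions (a1) through (a3), Wang et al. [13] showed that

$$\hat\Phi(t)-\Phi(t)=\frac{1}{n}\sum_{i=1}^n\Phi(t)d_i(t)+o_p(n^{-1/2}),\qquad \inf\{s:\Lambda_0(s)>0\}<t\le\tau, \tag{6}$$

where

$$d_i(t)\equiv\sum_{j=1}^{m_i}\left\{\int_t^\tau\frac{I(T_{ij}\le u\le C_i)}{Q_1^2(u)}\,dQ_2(u)-\frac{I(t<T_{ij}\le\tau)}{Q_1(T_{ij})}\right\}$$

are iid terms with zero expectation. By the central limit theorem, $\sqrt{n}(\hat\Phi(t)-\Phi(t))$ converges to a normal distribution with mean zero and variance $\Phi^2(t)E[d_i^2(t)]$.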

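By the method of moments, the nuisance parameter estimator $\hat\gamma$ is obtained by solving

$$n^{-1}\sum_{i=1}^n\Psi_i(\gamma)=n^{-1}\sum_{i=1}^n\begin{pmatrix}k_i(W_i-\mu_X)\\ Z_i-\mu_Z\\ \sum_{j=1}^{k_i}(W_{ij}-\bar W_i)(W_{ij}-\bar W_i)^\top-(k_i-1)\Sigma_U\\ k_i(W_i-\mu_X)(W_i-\mu_X)^\top-\Sigma_U-k_i\Sigma_X\\ (Z_i-\mu_Z)(Z_i-\mu_Z)^\top-\Sigma_Z\\ (W_i-\mu_X)(Z_i-\mu_Z)^\top-\Sigma_{XZ}\end{pmatrix}=0,$$

where $\Psi_i(\gamma)$ are iid terms. With the same techniques as those used for M-estimators [31], it can be shown that $\hat\gamma$ converges in probability to $\gamma$. Let $R\equiv-E\{\partial\Psi_i(\gamma)/\partial\gamma^\top\}$, where $R$ is non-singular under condition (a4), and thus by a Taylor expansion,

$$\hat\gamma-\gamma=R^{-1}n^{-1}\sum_{i=1}^n\Psi_i(\gamma)+o_p(n^{-1/2}). \tag{7}$$

By the central limit theorem, $\sqrt{n}(\hat\gamma-\gamma)$ converges to a normal distribution with mean zero and covariance matrix $R^{-1}E\{\Psi_i(\gamma)\Psi_i(\gamma)^\top\}\{R^{-1}\}^\top$.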
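For a scalar error-prone covariate, the moment equations for $\mu_X$, $\Sigma_U$ and $\Sigma_X$ have closed-form solutions, which the following sketch evaluates. It is a simplified illustration under stated assumptions: the function name and toy data are hypothetical, $W_i$ in the estimating function is taken to be the replicate mean $\bar W_i$, at least one subject must have two or more replicates, and the components involving $Z$ ($\mu_Z$, $\Sigma_Z$, $\Sigma_{XZ}$) are omitted.

```python
def moment_estimates(replicates):
    """Closed-form roots of the scalar moment equations for (mu_X, Sigma_U, Sigma_X).

    replicates[i] holds the k_i replicate measurements of subject i.
    """
    n = len(replicates)
    k = [len(w) for w in replicates]
    w_bar = [sum(w) / len(w) for w in replicates]
    total_k = sum(k)
    # sum_i k_i (w_bar_i - mu_X) = 0
    mu_x = sum(ki * wb for ki, wb in zip(k, w_bar)) / total_k
    # sum_i [ sum_j (W_ij - w_bar_i)^2 - (k_i - 1) Sigma_U ] = 0
    within = sum((wij - wb) ** 2 for w, wb in zip(replicates, w_bar) for wij in w)
    sigma_u2 = within / sum(ki - 1 for ki in k)
    # sum_i [ k_i (w_bar_i - mu_X)^2 - Sigma_U - k_i Sigma_X ] = 0
    between = sum(ki * (wb - mu_x) ** 2 for ki, wb in zip(k, w_bar))
    sigma_x2 = (between - n * sigma_u2) / total_k
    return mu_x, sigma_u2, sigma_x2


# Hypothetical toy data: two subjects with two replicates, one with a single value.
mu_x, sigma_u2, sigma_x2 = moment_estimates([[1.0, 3.0], [2.0, 4.0], [6.0]])
```

In small samples the estimate of $\Sigma_X$ obtained this way can be negative. The sketch does not guard against that case.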

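With the consistency of $\hat\gamma$ and $\hat\Phi(t)$, $t\in[0,\tau]$, we can prove the following propositions, whose proofs are given in the Supplementary Information. Define $V$ as the joint distribution function of $(W,Z,m,C)$, $\Pi\equiv\partial\mathcal{X}/\partial\gamma^\top$, and $\Gamma\equiv\partial D/\partial\gamma^\top$. Let

$$g_i=\mathcal{X}_i\left\{\frac{m_i}{\Phi(C_i)}-e^{\beta_R^\top\mathcal{X}_i}\right\}-\int\frac{\mathcal{X}\,m\,d_i(C)}{\Phi(C)}\,dV+\int\left[\left\{\frac{m}{\Phi(C)}-e^{\beta_R^\top\mathcal{X}}\right\}I_{1+p+q}-e^{\beta_R^\top\mathcal{X}}\left(\beta_R^\top\otimes\mathcal{X}\right)\right]\Pi\,dV\,R^{-1}\Psi_i(\gamma),$$

and

$$h_i=\mathcal{W}_i\left\{\frac{m_i}{\Phi(C_i)}-e^{D(\beta,\eta)^\top\mathcal{W}_i}\right\}-\int\frac{m\,\mathcal{W}\,d_i(C)}{\Phi(C)}\,dV-\left\{\int\mathcal{W}\mathcal{W}^\top e^{D(\beta,\eta)^\top\mathcal{W}}\,dV\right\}\Gamma R^{-1}\Psi_i(\gamma),$$

where $\otimes$ denotes the Kronecker product and $I_{1+p+q}$ the identity matrix of size $1+p+q$.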
Proposition 1

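Under conditions (a1) through (a4), $\hat\beta_R$ converges in probability to $\beta_R$. Further, $\sqrt{n}(\hat\beta_R-\beta_R)$ asymptotically follows a normal distribution with mean zero and covariance matrix $A^{-1}\Sigma_g\{A^{-1}\}^\top$, where $A=-E(\partial g_i/\partial\beta_R^\top)$ and $\Sigma_g=E(g_ig_i^\top)$.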

Proposition 2

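Under conditions (a1) through (a4), $\hat\beta_M$ converges in probability to $\beta$. Further, $\sqrt{n}(\hat\beta_M-\beta)$ is asymptotically normally distributed with mean zero and covariance matrix $B^{-1}\Sigma_h\{B^{-1}\}^\top$, where $B=-E(\partial h_i/\partial\beta^\top)$ and $\Sigma_h=E(h_ih_i^\top)$.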

B Covariance estimation of RC

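To develop the covariance estimator of the RC estimator, we first describe the covariance estimation of $\sqrt{n}(\hat\gamma-\gamma)$ and $\sqrt{n}(\hat\Phi(t)-\Phi(t))$, $t\in[0,\tau]$.

Let $R_n=-n^{-1}\sum_{i=1}^n\partial\Psi_i(\gamma)/\partial\gamma^\top|_{\gamma=\hat\gamma}$ and $\hat\Pi_i=\partial\mathcal{X}_i/\partial\gamma^\top|_{\gamma=\hat\gamma}$. The covariance matrix of $\sqrt{n}(\hat\gamma-\gamma)$ can be estimated by $R_n^{-1}n^{-1}\sum_{i=1}^n\Psi_i(\hat\gamma)\Psi_i(\hat\gamma)^\top\{R_n^{-1}\}^\top$. Define $\hat Q_1(u)=n^{-1}\sum_{i=1}^n\sum_{j=1}^{m_i}I(T_{ij}\le u\le C_i)$, $d\hat Q_2(u)=n^{-1}\sum_{i=1}^n\sum_{j=1}^{m_i}I(T_{ij}=u)$, and

$$\hat d_i(t)=\sum_{j=1}^{m_i}\left[\sum_{T_{(l)}\in[t,\tau]}\frac{I(T_{ij}\le T_{(l)}\le C_i)\,d\hat Q_2(T_{(l)})}{\hat Q_1(T_{(l)})^2}-\frac{I(t<T_{ij}\le\tau)}{\hat Q_1(T_{ij})}\right],$$

where $T_{(l)}$ are the ordered and distinct values of $\{T_{ij}\}_{i=1,\ldots,n;\,j=1,\ldots,m_i}$. Following Wang et al. [13], the covariance matrix of $\sqrt{n}(\hat\Phi(t)-\Phi(t))$ can be consistently estimated by $\hat\Phi^2(t)n^{-1}\sum_{i=1}^n\hat d_i^2(t)$.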
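The quantities $\hat Q_1$, $d\hat Q_2$ and $\hat d_i(t)$ can be evaluated directly from the observed event and censoring times. The sketch below is a plain, unoptimized illustration; the function names and toy data are hypothetical, and the estimator $\hat\Phi$ itself, which is defined earlier in the paper, is not implemented here.

```python
def make_q1(events, censor):
    """Return Q1_hat(u) = n^{-1} sum_i sum_j I(T_ij <= u <= C_i)."""
    n = len(events)

    def q1(u):
        return sum(1 for ev, c in zip(events, censor) for s in ev if s <= u <= c) / n

    return q1


def d_hat(i, t, events, censor, tau):
    """Evaluate d_hat_i(t) for subject i from the definition above."""
    n = len(events)
    q1 = make_q1(events, censor)
    all_times = [s for ev in events for s in ev]
    distinct = sorted(set(all_times))                      # ordered distinct T_(l)
    dq2 = {u: all_times.count(u) / n for u in distinct}    # jumps of Q2_hat
    total = 0.0
    for t_ij in events[i]:
        # Sum over distinct event times in [t, tau] with T_ij <= T_(l) <= C_i.
        integral = sum(dq2[u] / q1(u) ** 2
                       for u in distinct
                       if t <= u <= tau and t_ij <= u <= censor[i])
        jump = 1.0 / q1(t_ij) if t < t_ij <= tau else 0.0
        total += integral - jump
    return total


# Hypothetical toy data: three subjects, event times observed before censoring.
toy_events = [[1.0, 3.0], [2.0], []]
toy_censor = [4.0, 3.0, 5.0]
d_values = [d_hat(i, 0.5, toy_events, toy_censor, tau=5.0) for i in range(3)]
```

For these toy data, with $t$ not equal to any event time, the values $\hat d_i(t)$ sum to zero across subjects, which mirrors the zero-expectation property of $d_i(t)$ and serves as a simple sanity check.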

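Denote $\otimes$ as the Kronecker product and $I_a$ as the identity matrix of size $a$. Let $\hat{\mathcal{X}}_i=(1,E(X_i\mid W_i,Z_i,\hat\gamma)^\top,Z_i^\top)^\top$. Finally, the covariance matrix of $\sqrt{n}(\hat\beta_R-\beta_R)$ can be consistently estimated by $A_n^{-1}\hat\Sigma_g\{A_n^{-1}\}^\top$, where $A_n=n^{-1}\sum_{i=1}^n\hat{\mathcal{X}}_i\hat{\mathcal{X}}_i^\top e^{\hat\beta_R^\top\hat{\mathcal{X}}_i}$ and $\hat\Sigma_g=n^{-1}\sum_{i=1}^n\hat g_i\hat g_i^\top$ with

$$\hat g_i=\hat{\mathcal{X}}_i\left\{\frac{m_i}{\hat\Phi(C_i)}-e^{\hat\beta_R^\top\hat{\mathcal{X}}_i}\right\}-n^{-1}\sum_{j=1}^n\frac{\hat{\mathcal{X}}_j\,m_j\,\hat d_i(C_j)}{\hat\Phi(C_j)}+n^{-1}\sum_{j=1}^n\left[\left\{\frac{m_j}{\hat\Phi(C_j)}-e^{\hat\beta_R^\top\hat{\mathcal{X}}_j}\right\}I_{1+p+q}-e^{\hat\beta_R^\top\hat{\mathcal{X}}_j}\left(\hat\beta_R^\top\otimes\hat{\mathcal{X}}_j\right)\right]\hat\Pi_jR_n^{-1}\Psi_i(\hat\gamma).$$

Here the averages $n^{-1}\sum_{j=1}^n$ are the empirical counterparts of the integrals with respect to $V$ in the definition of $g_i$.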

C Covariance estimation of MC

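Let $\hat D=D(\hat\beta_M,\hat\eta)$ and $\hat\Gamma=\partial D/\partial\gamma^\top|_{\gamma=\hat\gamma}$. The covariance matrix of $\sqrt{n}(\hat\beta_M-\beta)$ can be consistently estimated by $B_n^{-1}\hat\Sigma_h\{B_n^{-1}\}^\top$, where

$$B_n=n^{-1}\sum_{i=1}^n\mathcal{W}_i\left\{\mathcal{W}_i^\top\left.\frac{\partial D(\beta,\hat\eta)}{\partial\beta^\top}\right|_{\beta=\hat\beta_M}\right\}e^{\hat D^\top\mathcal{W}_i},$$

and $\hat\Sigma_h=n^{-1}\sum_{i=1}^n\hat h_i\hat h_i^\top$ with

$$\hat h_i=\mathcal{W}_i\left\{\frac{m_i}{\hat\Phi(C_i)}-e^{\hat D^\top\mathcal{W}_i}\right\}-n^{-1}\sum_{j=1}^n\frac{m_j\,\mathcal{W}_j\,\hat d_i(C_j)}{\hat\Phi(C_j)}-\left\{n^{-1}\sum_{j=1}^n\mathcal{W}_j\mathcal{W}_j^\top e^{\hat D^\top\mathcal{W}_j}\right\}\hat\Gamma R_n^{-1}\Psi_i(\hat\gamma).$$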

D Proof of RC = MC for regression parameters

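Recall that $E(X_i\mid W_i,Z_i,\gamma)=\eta_0+\eta_WW_i+\eta_ZZ_i$, where $\eta_0$, $\eta_W$ and $\eta_Z$ are functions of $\gamma$. Let $0_{r\times s}$ be an $r\times s$ matrix of zeros. With simple algebra, we can write $\mathcal{X}_i=H\mathcal{W}_i$, $i=1,\ldots,n$, where

$$H=\begin{pmatrix}1&0_{1\times p}&0_{1\times q}\\ \eta_0&\eta_W&\eta_Z\\ 0_{q\times1}&0_{q\times p}&I_q\end{pmatrix}.$$

Since $H$ is the same for $i=1,\ldots,n$ and $\beta_R^\top\mathcal{X}_i=(H^\top\beta_R)^\top\mathcal{W}_i$, for any fixed $\gamma$, eq. (4) can be written as

$$n^{-1}\sum_{i=1}^n\mathcal{W}_i\left\{m_i\hat\Phi^{-1}(C_i)-e^{(H^\top\hat\beta_R)^\top\mathcal{W}_i}\right\}=0. \tag{8}$$

Recall that $\hat b_N$ is the unique root of the equations of the form

$$n^{-1}\sum_{i=1}^n\mathcal{W}_i\left\{m_i\hat\Phi^{-1}(C_i)-e^{b^\top\mathcal{W}_i}\right\}=0.$$

It is easy to see that eq. (8) has the same form as the above equation. Thus, we have $H^\top\hat\beta_R=\hat b_N$. Moreover, by definition $\hat b_N=D(\hat\beta_M,\gamma)$ for any fixed $\gamma$. Therefore, we have

$$\begin{pmatrix}\hat\beta_{R,0}+\eta_0^\top\hat\beta_{R,X}\\ \eta_W^\top\hat\beta_{R,X}\\ \eta_Z^\top\hat\beta_{R,X}+\hat\beta_{R,Z}\end{pmatrix}=H^\top\hat\beta_R=D(\hat\beta_M,\gamma)=\begin{pmatrix}\hat\beta_{M,0}+\eta_0^\top\hat\beta_{M,X}+\frac{1}{2}\hat\beta_{M,X}^\top\Sigma\hat\beta_{M,X}\\ \eta_W^\top\hat\beta_{M,X}\\ \eta_Z^\top\hat\beta_{M,X}+\hat\beta_{M,Z}\end{pmatrix},$$

for any fixed $\gamma$. Provided that $\eta_W$ is non-singular, the second block gives $\hat\beta_{R,X}=\hat\beta_{M,X}$, and the remaining blocks then give $\hat\beta_{R,Z}=\hat\beta_{M,Z}$ and $\hat\beta_{R,0}=\hat\beta_{M,0}+\hat\beta_{M,X}^\top\Sigma\hat\beta_{M,X}/2$. Hence the proof is complete.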
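The identity can be illustrated numerically in the scalar case $p=q=1$. Given a naive coefficient vector $\hat b_N$ and calibration coefficients $(\eta_0,\eta_W,\eta_Z)$, the sketch below inverts $H^\top\beta_R=\hat b_N$ and $D(\beta_M,\gamma)=\hat b_N$ separately and compares the results. All function names and numerical values are hypothetical, $\eta_W\neq0$ is assumed, and `sigma2` stands for the scalar conditional variance $\Sigma$.

```python
def rc_from_naive(b_n, eta0, eta_w, eta_z):
    """Solve H^T beta_R = b_N for scalar X and Z (requires eta_w != 0)."""
    b0, b_w, b_z = b_n
    beta_x = b_w / eta_w
    return (b0 - eta0 * beta_x, beta_x, b_z - eta_z * beta_x)


def mc_from_naive(b_n, eta0, eta_w, eta_z, sigma2):
    """Solve D(beta_M, gamma) = b_N for scalar X and Z (requires eta_w != 0)."""
    b0, b_w, b_z = b_n
    beta_x = b_w / eta_w
    beta_0 = b0 - eta0 * beta_x - 0.5 * sigma2 * beta_x ** 2
    return (beta_0, beta_x, b_z - eta_z * beta_x)


# Hypothetical naive estimate and calibration coefficients.
b_n = (0.2, 0.4, -0.1)
rc = rc_from_naive(b_n, eta0=0.5, eta_w=0.8, eta_z=0.1)
mc = mc_from_naive(b_n, eta0=0.5, eta_w=0.8, eta_z=0.1, sigma2=0.01)
```

The two solutions share the same slopes for $X$ and $Z$ and differ only in the intercept, by $\Sigma\hat\beta_{M,X}^2/2$, in line with the result above.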

References

1. Fleming TR, Harrington DP. Counting processes and survival analysis. New York: John Wiley & Sons, 1991.

2. Morgan WJ, Butler SM, Johnson CA, Colin AA, FitzSimmons SC, Geller DE, et al. Epidemiologic study of cystic fibrosis: design and implementation of a prospective, multicenter, observational study of patients with cystic fibrosis in the US and Canada. Pediatr Pulmonol 1999;28:231–41. doi:10.1002/(SICI)1099-0496(199910)28:4<231::AID-PPUL1>3.0.CO;2-2

3. Hu XJ, Lagakos SW, Lockhart RA. Generalized least squares estimation of the mean function of a counting process based on panel counts. Stat Sinica 2009;19:561–80.

4. Hu XJ, Lawless JF. Estimation of rate and mean functions from truncated recurrent event data. J Am Stat Assoc 1996;91:300–10. doi:10.1080/01621459.1996.10476689

5. Lawless JF, Hu J, Cao J. Methods for the estimation of failure distributions and rates from automobile warranty data. Lifetime Data Anal 1995;1:227–40. doi:10.1007/BF00985758

6. Lin DY, Sun W, Ying Z. Nonparametric estimation of the gap time distribution for serial events with censored data. Biometrika 1999;86:59–70. doi:10.1093/biomet/86.1.59

7. Prentice RL, Williams BJ, Peterson AV. On the regression analysis of multivariate failure time data. Biometrika 1981;68:373–9. doi:10.1093/biomet/68.2.373

8. Hu XJ, Lagakos SW. Nonparametric estimation of the mean function of a stochastic process with missing observations. Lifetime Data Anal 2007;13:51–73. doi:10.1007/s10985-006-9030-0

9. Lawless JF, Nadeau C. Some simple robust methods for the analysis of recurrent events. Technometrics 1995;37:158–68. doi:10.1080/00401706.1995.10484300

10. Lancaster T, Intrator O. Panel data with survival: hospitalization of HIV-positive patients. J Am Stat Assoc 1998;93:46–53. doi:10.1080/01621459.1998.10474086

11. Nielsen GG, Gill RD, Andersen PK, Sørensen TI. A counting process approach to maximum likelihood estimation in frailty models. Scand J Stat 1992;19:25–43.

12. Kalbfleisch JD, Schaubel DE, Ye Y, Gong Q. An estimating function approach to the analysis of recurrent and terminal events. Biometrics 2013;69:366–74. doi:10.1111/biom.12025

13. Wang MC, Qin J, Chiang CT. Analyzing recurrent event data with informative censoring. J Am Stat Assoc 2001;96:1057–65. doi:10.1198/016214501753209031

14. Wang MC, Huang CY. Statistical inference methods for recurrent event processes with shape and size parameters. Biometrika 2014;101:553–66. doi:10.1093/biomet/asu016

15. Clark LC, Combs GF, Turnbull BW, Slate EH, Chalker DK, Chow J, et al. Effects of selenium supplementation for cancer prevention in patients with carcinoma of the skin: a randomized controlled trial. J Am Med Assoc 1996;276:1957–63. doi:10.1001/jama.1996.03540240035027

16. Carroll RJ, Ruppert D, Stefanski LA, Crainiceanu CM. Measurement error in nonlinear models: a modern perspective. London: Chapman & Hall, 2006. doi:10.1201/9781420010138

17. Prentice RL. Covariate measurement errors and parameter estimation in a failure time regression model. Biometrika 1982;69:331–42. doi:10.1093/biomet/69.2.331

18. Wang CY, Hsu L, Feng ZD, Prentice RL. Regression calibration in failure time regression. Biometrics 1997;53:131–45. doi:10.2307/2533103

19. Nakamura T. Proportional hazards model with covariates subject to measurement error. Biometrics 1992;48:829–38. doi:10.2307/2532348

20. Wu L. A joint model for nonlinear mixed-effects models with censoring and covariates measured with error, with application to AIDS studies. J Am Stat Assoc 2002;97:955–64. doi:10.1198/016214502388618744

21. Liu W, Wu L. Simultaneous inference for semiparametric nonlinear mixed-effects models with covariate measurement errors and missing responses. Biometrics 2007;63:342–50. doi:10.1111/j.1541-0420.2006.00687.x

22. Wu L, Liu W, Hu XJ. Joint inference on HIV viral dynamics and immune suppression in presence of measurement errors. Biometrics 2010;66:327–35. doi:10.1111/j.1541-0420.2009.01308.x

23. Jiang W, Turnbull BW, Clark LC. Semiparametric regression models for repeated events with random effects and measurement error. J Am Stat Assoc 1999;94:111–24. doi:10.1080/01621459.1999.10473828

24. Cook RJ, Lawless JF. The statistical analysis of recurrent events. New York: Springer, 2007.

25. Balakrishnan N, Peng Y. Generalized gamma frailty model. Stat Med 2006;25:2797–816. doi:10.1002/sim.2375

26. Mazroui Y, Mathoulin-Pelissier S, Soubeyran P, Rondeau V. General joint frailty model for recurrent event data with a dependent terminal event: application to follicular lymphoma data. Stat Med 2012;31:1162–76. doi:10.1002/sim.4479

27. Zeng D, Ibrahim J, Chen M, Hu K, Jia C. Multivariate recurrent events in the presence of multivariate informative censoring with applications to bleeding and transfusion events in myelodysplastic syndrome. J Biopharm Stat 2014;24:429–42. doi:10.1080/10543406.2013.860159

28. Wang CY. Robust sandwich covariance estimation for regression calibration estimator in Cox regression with measurement error. Stat Probab Lett 1999;45:371–8. doi:10.1016/S0167-7152(99)00079-6

29. Stefanski LA. The effects of measurement error on parameter estimation. Biometrika 1985;72:583–92. doi:10.1093/biomet/72.3.583

30. Huang Y, Wang CY. Cox regression with accurate covariates unascertainable: a nonparametric-correction approach. J Am Stat Assoc 2000;95:1209–19. doi:10.1080/01621459.2000.10474321

31. Huber PJ. Robust statistics. New Jersey: John Wiley & Sons, 2009. doi:10.1002/9780470434697


Supplemental Material

The online version of this article (DOI: 10.1515/ijb-2016-0001) offers supplementary material, available to authorized users.


Published Online: 2016-8-9
Published in Print: 2016-11-1

© 2016 Walter de Gruyter GmbH, Berlin/Boston
