Startseite Joint Model for Mortality and Hospitalization
Artikel Öffentlich zugänglich

Joint Model for Mortality and Hospitalization

  • Yuqi Chen , Wensheng Guo , Peter Kotanko , Len Usvyat und Yuedong Wang EMAIL logo
Veröffentlicht/Copyright: 9. November 2016
Veröffentlichen auch Sie bei De Gruyter Brill

Abstract:

Modeling hospitalization is complicated because the follow-up time can be censored due to death. In this paper, we propose a shared frailty joint model for survival time and hospitalization. A random effect semi-parametric proportional hazard model is assumed for the survival time and conditional on the follow-up time, hospital admissions or total length of stay is modeled by a generalized linear model with a nonparametric offset function of the follow-up time. We assume that the hospitalization and the survival time are correlated through a latent subject-specific random frailty. The proposed model can be implemented using existing software such as SAS Proc NLMIXED. We demonstrate the feasibility through simulations. We apply our methods to study hospital admissions and total length of stay in a cohort of patients on hemodialysis. We identify age, albumin, neutrophil to lymphocyte ratio (NLR) and vintage as significant risk factors for mortality, and age, gender, race, albumin, NLR, pre-dialysis systolic blood pressure (preSBP), interdialytic weight gain (IDWG) and equilibrated Kt/V (eKt/V) as significant risk factors for both hospital admissions and total length of stay. In addition, hospitalization admissions is positively associated with vintage.

1 Introduction

Hospitalization is a main contributor to the total cost of care and identification of the related risk factors is of interest in many health care studies. The main difficulty in modeling hospitalization data is due to the fact that the frequency of hospitalization and the total length of hospitalizations are functions of follow-up time that can be informatively censored due to death. Since both the hospitalization outcome and time-to-death are related to the underlying health, it is desirable to jointly model them as bivariate outcomes. Mixed types of multivariate outcomes are common in many fields of science and social science. Various statistical models and methods have been proposed to deal with different types of mixed outcomes [1]. For example, Fitzmaurice and Laird [2] proposed regression models for continuous and binary outcomes. They focused on marginal regression models with a set of covariates and treated the association between continuous and binary response as a nuisance characteristic of the data. Sammel, Ryan, and Legler [3] proposed latent variable models for mixed discrete and continuous outcomes. They modeled the associations among the outcomes by an unobserved latent variable which depends on a set of covariates. Catalano [4] proposed a latent variable model for continuous and ordinal outcomes, and extended it to allow for clustering of the bivariate outcomes. Dunson and Herring [5] proposed latent variable models for mixed discrete outcomes including count, binary and discrete event time. A Bayesian approach was introduced for inference where conditionally-conjugate priors were chosen to facilitate posterior computation. However, these methods can not handle censored data which is needed for joint modeling of survival time and hospitalization in health studies.

Our research is motivated by the need for improvement in care for end-stage renal disease (ESRD) patients. Hemodialysis (HD) is the most frequently used treatment modality for ESRD patients. In general, HD patients suffer from multiple comorbidities, such as diabetes and cardiovascular diseases, resulting in frequent hospitalizations and substantial mortality. In spite of improvements over the years, hospitalization and mortality rates of ESRD patients on HD remain much higher than those of the general population [6]. In this article we are interested in identifying risk factors for hospitalization and mortality. The data come from an observational study of patients on HD in Fresenius Medical Care. Covariates at baseline and outcomes including survival time, hospital admissions and total length of hospital stay at follow-up were collected. Approximately 20 % of patients died during the follow-up period and observational times for hospitalization outcomes of these patients are censored due to death. Since both survival time and hospitalization are associated with the underlying health condition, it is likely that these outcomes from the same subject are correlated. Therefore, it is necessary to develop a joint model for survival time and hospitalization. Details of the data are given in Section 5.

In this article we propose a semi-parametric latent variable model for joint modeling of a survival time and an outcome from exponential family. The survival time is modeled by a semi-parametric proportional hazard model with a subject-specific random effect. The hospitalization related endpoint, such as the number of admissions, length of stay or whether a subject has ever been hospitalized, can be modeled by a generalized linear mixed effects model. Since the hospitalization outcome may only be observed before death, an offset function will be included in the generalized linear model to take into account the follow-up time. To allow a flexible relationship between the hospitalization endpoint and the follow-up time, we introduce a nonparametric smooth offset function that includes parametric functions, such as logarithm, as special cases. When the offset function is parametric, these models reduce to the standard generalized mixed effects models and parameters of interest may be interpreted in terms of the constant conditional means such as incident rate, mean duration and average probability. The smooth offset function allows deviation from this rigid assumption. The forms of the baseline hazard function and the offset function are usually unknown. They will be modeled non-parametrically using spline functions with non-negative and, when appropriate, monotone constraints. A latent random variable will be used to model potential correlation between survival time and hospitalization outcome from the same subject [7].

We note that there is a large body of literature on the joint modeling of survival hazard function and hospitalization rate. See for example Lancaster and Intrator [8], Wang, Qin, and Chiang [9], Huang and Wolfe [10], Liu, Wolfe, and Huang [11], Huang, Qin, and Wang [12], and the references therein. These studies treated hospitalizations as recurrent events and focused on modeling the intensity function of the recurrent process. In this article, our main interest is on the expected number of hospitalizations and expected total length of stays which account for a major part of the total cost of care. We also note that there have been various proposals on the joint modeling of survival time and longitudinal data [13, 14]. We are interested in identifying risk factors at the baseline for the bivariate cross-sectional outcomes of hospitalization and time-to-death in the follow-up. Therefore methods for the joint modeling of longitudinal and survival data do not apply to our situation.

The rest of this article is organized as follows. Section 2 introduces the semi-parametric latent variable model. Section 3 provides details about our estimation procedure. Section 4 and Section 5 present simulation results and applications to patients on HD. The article ends with a discussion in Section 6.

2 The semi-parametric latent variable model

2.1 The overall model

For subject i, we denote Di as the death time, Ci as the censoring time, Ti=min{Ci,Di} as the observed time, Δi=I(Di<Ci) as the event indicator and hi(t) as the hazard function. Let Yi be another outcome variable from exponential family. For example, it could be the number of hospitalizations or the total length of hospital stays of subject i. Let ZiD and ZiY be covariates associated with the outcomes Di and Yi respectively. We will consider the following joint model:

(1)hi(t)=h0(t)exp(βZiD+νi),gE(Yi|Ti,νi)=w(Ti)+αZiY+ηνi,

where h0 is the baseline hazard, g is the link function, νiiidN(0,σ2) is a shared frailty for subject i, α, β and η are unknown parameters, and w is an offset function. The first equation in (1) is a Cox proportional hazard model for survival time while the second equation in (1) is a generalized linear model for Yi. The shared frailty is introduced to model heterogeneity among subjects and correlation between Di and Yi within a subject. For simplicity we consider a normal distribution for the shared frailty. Extensions to other distributions are straightforward. The offset term w(Ti) is introduced to account for the fact that Yi is only observed prior to time Ti.

2.2 A spline model for the baseline hazard

The form of the baseline hazard function h0(t) is generally unknown in practice. We will assume that h0(t) is a smooth function and model it using B-spline basis functions:

h0(t)=k=1K+1+LhdkBk(t|K,τh),

where Bk(t|K,τh) denote the evaluation at t of the K-degree B-spline basis functions generated with Lh internal knots τh={th1,th2,,thLh}. We will use the constraints dk0 to enforce the non-negativity constraint of the function h0(t). The function h0(t) is decided by coefficients dk as well as the number and locations of knots. The estimation of coefficients and the selection of knots will be discussed in Section 3.

2.3 A spline or monotone spline model for the offset function

When Yi represents counts such as hospital admissions, one possible assumption is that Yi is generated from a homogeneous Poisson process. Under this assumption and canonical link for Poisson data, the offset function w(t)=log(t). However in practice Yi may be generated from a non-homogeneous Poisson process [15]. It is therefore desirable to leave the functional form of w unspecified. Again we model w nonparametrically using B-spline basis functions:

w(t)=k=1K+1+LwckBk(t|K,τw),

where Bk(t|K,τw) denote the evaluation at t of the K-degree B-spline basis functions generated with Lw internal knots τw={tw1,tw2,,twLw}.

For Poisson data, it is natural to assume that the expectation of Yi increase with the observational time Ti. In this case we assume that w(t) is a smooth non-decreasing function. Ramsay [16] used integrated M-splines to fit a monotone spline. We will adopt a similar approach using integrated B-splines. Specifically, denote integrated B-splines as Ik(t|K,τ)=0tBk(u|K,τ)du for k=1,,K. Since Bk’s are non-negative, Ik’s provide a set of non-decreasing basis functions. We model w using integrated B-spline basis functions:

w(t)=k=1K+1+LwckIk(t|K,τw)+c,

where c is an unknown constant and ck’s are coefficients with constraints ck0.

3 Estimation

The full likelihood is

(2)L=i=1nf(Yi|Ti,Δi,νi)li(Ti,Δi|νi)fν(νi)dνi,

where n is the total number of subject, f(Yi|Ti,Δi,νi) is the conditional density of Yi in the exponential family, fν(νi) is the density function of the latent random variable ν, and

li(Ti,Δi|νi)=h0(Ti)exp(βZiD+νi)Δiexp0Tih0(t)exp(βZiD+νi)dt.

Our goal is then to obtain parameter estimates by maximizing the likelihood. Since there is no closed form solution, we apply the Newton-Raphson methods to compute parameter estimates numerically. For stability, we apply the Newton-Raphson ridge optimization where a pure Newton step is used when the Hessian is positive definite and when the Newton step successfully increases the value of the likelihood, otherwise a multiple of the identity matrix is added to the Hessian matrix [17]. To calculate the gradient and Hessian matrix, we need to evaluate integrals derived from the likelihood function. The Gaussian quadrature method is used to approximate these integrals. We estimate random effects νi by their empirical Bayes estimators νˆi that maximize f(yi|Ti,Δi,νi)li(Ti,Δi|νi)fν(νi).

Numerically stable implementations of these methods can be obtained from a variety of publicly available softwares [18]. In our simulation and example, we employed SAS procedure Proc NLMIXED to perform the computation. Proc NLMIXED has an appealing feature which allows a user-specified log likelihood functions with respect to the random effects. See Littell et al. [17] for details on this procedure.

The number and location of knots are fixed in the above discussion. While increasing the number of knots has the capability to model a more flexible function, having too many knots will increase the complexity of the model and result in over-fitting. A data-driven procedure for the selection of number and location of knots is desirable. We allow h0(t) and w(t) to have different numbers and locations of knots. In practice one may place knots evenly in a range or at equally spaced quantiles of data. We select the numbers of knots by minimizing the following AIC Akaike [19]:

(3)AIC(Lh,Lw)=2logL+2(Lh+Lw+8).

4 Simulations

We generate simulation samples from the following model

(4)hi(t|νi)=h0(t)exp(βZi+νi),log(E(Yi|Ti,νi))=w(Ti)+αZi+ηνi,

where Zi’s are iid random variables with P(Zi=0)=P(Zi=1)=0.5, νiiidN(0,0.5), and conditional on Ti and νi, Yi follows a Poisson distribution with mean exp(w(Ti)+αZi+ηνi). The censoring time Ci=min{Ei,4} where EiiidExp(0.1). The true parameters are set to be (α,β,η)=(0.5,0.5,1). We consider two baseline hazard functions, Exponential baseline h0(t)=1/2 and Weibull baseline h0(t)=t/2, and two offset functions, linear function w(t)=t/2 and log function w(t)=log(t). The censoring rates in all 4 cases are about 20 %.

The baseline hazard h0(t) is estimated using cubic B-spline basis functions. The offset function w(t) is estimated using cubic integrated B-spline basis functions under the monotone constraint. Interior knots are equally spaced within the time period (0,4], and the number of knots for h0(t) and w(t) range from 2 to 4 respectively. The optimal combination of number of knots is selected by minimizing the AIC (3).

Simulation under each setting is repeated 500 times. For the estimation of parameters, we compute bias, mean squared error (MSE) and coverage probability of 95 % confidence intervals (CP). The 95 % confidence interval is constructed as the MLE plus-minus 1.96 times the standard errors obtained from the variance-covariance matrix. For the estimation of functions h0(t) and w(t), we compute the integrated mean square error (IMSE)

IMSE(fˆ)=04(fˆ(t)f(t))2dt

for each replicate, where f is either h0 or w.

Table 1:

Bias, mean squared error (MSE) and coverage probability of 95 % confidence intervals (CP) based on the joint model when h0(t)=1/2 and w(t)=t/2.

h0(t)=1/2w(t)=t/2αβησ2
n=300Bias0.0070.045−0.0640.337
MSE0.0170.0370.0660.65
CP0.9380.9810.8090.965
n=500Bias0.0020.014−0.0080.149
MSE0.0100.0220.8710.936
CP0.9460.9460.8710.936
n = 1,000Bias0.0020.0080.0030.063
MSE0.0050.010.0310.062
CP0.940.9480.9160.94
Table 2:

Bias, mean squared error (MSE) and coverage probability of 95 % confidence intervals (CP) based on the joint model when h0(t)=1/2 and w(t)=log(t).

h0(t)=1/2w(t)=log(t)αβησ2
n=300Bias0.0330.0840.1060.779
MSE0.0250.0640.1092.833
CP0.9660.9680.7740.957
n = 500Bias0.0160.0460.030.381
MSE0.0160.0300.0880.912
CP0.9550.9730.8420.953
n = 1,000Bias0.0040.0170.0050.127
MSE0.0070.0110.0530.156
CP0.9470.9660.8900.951
Table 3:

Bias, mean squared error (MSE) and coverage probability of 95 % confidence intervals (CP) based on the joint model when h0(t)=t/2 and w(t)=t/2.

h0(t)=t/2w(t)=t/2αβηησ2
n=300Bias−0.0030.0.0160.056
MSE0.0110.0250.060.075
CP0.9680.9630.9250.951
n = 500n=500Bias−0.0060.0080.0070.044
MSE0.0070.0150.0380.052
CP0.9440.9620.9120.930
n=1000Bias0.0020.0030.0110.017
MSE0.0030.0080.0220.025
CP0.9500.9460.9420.928
Table 4:

Bias, mean squared error (MSE) and coverage probability of 95 % confidence intervals (CP) based on the joint model when h0(t)=t/2 and w(t)=log(t).

h0(t)=t/2w(t)=log(t)αβησ2
n=300Bias0.0330.0640.0250.346
MSE0.0200.064−0.0250.346
CP0.9580.9730.8590.936
n=500Bias0.0140.036−0.0140.227
MSE0.0130.0270.0700.386
CP0.9450.9550.8500.951
n=1000Bias0.0090.015−0.0090.117
MSE0.0060.0110.0400.100
CP0.9540.9500.8920.942
Table 5:

Integrated Mean Square Error (IMSE) of the baseline hazard h0(t) and offset function w(t) fitted by the joint model.

h0(t)w(t)
h0(t)=t/2n=3000.0780.079
w(t) = t/2n=5000.0500.052
n=10000.0270.027
h0(t)=t/2n=3000.1090.151
w(t)=log(t)n = 5000.0630.097
n=10000.0330.052
h0(t)=t/2n=3000.6650.066
w(t)=t/2n=5000.4560.043
n=10000.2300.025
h0(t)=t/2n=3000.8560.165
w(t)=log(t)n=5000.6620.114
n=10000.3400.057

Table 1Table 5 summarize performances of parameter and function estimates under four simulation settings. Overall the proposed estimation procedure perform well: bias and MSE are small, and the coverages of 95 % confidence intervals are close to the nominal value except for η. The coverages of 95 % confidence intervals for η are below the nominal value. This is not surprising because η is associated with the variance within subject and only limited information contributes to its estimations. One way to improve the coverage probability is to construct a confidence region for both η and σ2 since the two estimates are highly correlated. The performances improve as sample size increases.

As an illustration, Figure 1 shows the 5th, 25th, 50th, 75th and 95th best estimates of hˆ0(t) and wˆ(t) ordered by the IMSE under the simulation setting when h0(t)=t/2, w(t)=log(t) and n=500. Overall, the estimates are close to the true functions except for the baseline hazard with large t. The poor estimation of the baseline hazard with large t is likely caused by censoring.

Figure 1: True function (solid lines) and estimates (dashed lines) of h0(t) = t/2 (left) and w(t) = log(t) (right) correspond to the 5th, 25th, 50th, 75th and 95th percentiles of the IMSE when h0(t) = t/2, w(t) = log(t) and n = 500.
Figure 1:

True function (solid lines) and estimates (dashed lines) of h0(t) = t/2 (left) and w(t) = log(t) (right) correspond to the 5th, 25th, 50th, 75th and 95th percentiles of the IMSE when h0(t) = t/2, w(t) = log(t) and n = 500.

We have also evaluated performance of our estimation procedure in a more complicated simulation setting. The data was generated from the following model

(5)hi(t|νi)=h0(t)exp(β1Z1i+β2Z2i+νi),log(E(Yi|Ti,νi))=w(Ti)+α1Z1i+α2Z2i+ηνi,

where Z1i’s are iid random variables with P(Z1i=0)=P(Z1i=1)=0.5, Z2i is a continuous random variable generated from Uniform(0,1), and νiiidN(0,0.2). The sample size n=1000 and the true parameters are set to be (α1,α2,β1,β2,η)=(0.5,1,0.5,1,1). We consider Weibull baseline hazard h0(t)=t2 and w(t)=log(t). The censoring rate is about 15 %.

We summarize bias, MSE and coverage probability of 95 % CP for the estimations of parameters in Table 6. The 5th, 25th, 50th, 75th and 95th best estimated baseline hazard and offset function are shown in Figure 2. Overall the proposed estimation method performs well.

Table 6:

Bias, mean squared error (MSE) and coverage probability of 95 % confidence intervals (CP) based on the joint model when h0(t)=t2 and w(t)=log(t).

h0(t)=t2w(t)=log(t)α1α2β1β2ησ2
n=1000Bias−0.020−0.0270.037−0.068−0.1130.146
MSE0.0060.0180.0100.0370.0830.086
CP0.9490.9650.9630.9460.8390.979
Table 7:

Summary statistics of covariates.

(Min, Max)Mean (Std)
Age (year)(1.00, 96.62)62.39 (14.84)
BMI (kg/m2)(13.75, 49.51)27.65 (6.46)
Albumin (g/dL)(1.60, 4.74)3.84 (0.37)
IDWG (%)(0.41, 7.99)3.48 (1.05)
PreSBP (mmHg)(81.88, 219.29)149.38 (18.86)
eKt/V(0.68, 3.77)1.46 (0.26)
NLR(0.51, 31.18)3.70 (2.32)
Vintage (year)(0.08, 7.90)2.56 (1.92)
Figure 2: True function (solid lines) and estimates (dashed lines) of h0(t)=t2${h_0}(t) = {t^2}$ (left) and w(t)=log(t)$w(t) = \log (t)$ (right) correspond to the 5th$5th$, 25th$25th$, 50th$50th$, 75th$75th$ and 95th$95th$ percentiles of the IMSE when h0(t)=t2${h_0}(t) = {t^2}$, w(t)=log(t)$w(t) = \log (t)$ and n=1000$n = 1000$.
Figure 2:

True function (solid lines) and estimates (dashed lines) of h0(t)=t2 (left) and w(t)=log(t) (right) correspond to the 5th, 25th, 50th, 75th and 95th percentiles of the IMSE when h0(t)=t2, w(t)=log(t) and n=1000.

5 Application

We now apply the proposed method to model mortality and hospitalization outcomes for patients on HD. Baseline covariates are collected from 1999 HD patients from 1 January 2007 to 31 December 2007. Survival time, the number of hospital admissions and total length of stay of these patients during the period of 1 January 2008 and 31 December 2009 are collected. 1078 (53.93 %) patients are male. 984 (49.22 %) patients are black, 834 (41.72 %) patients are white, the rest are from other races. Time-varying covariates are calculated as the averages in baseline period for each patient. The summary statistics for these covariates are listed in Table 7.

In previous studies, albumin and systolic blood pressure before dialysis (preSBP) have been found as significant risk factors for mortality [2022]. Erdem, Kaya, Karatas, Dilek, and Akpolat [23] observed that HD patients with high neutrophil to lymphocyte ratio (NLR) levels have increased risk of short term mortality. Our preliminary analysis indicates that time in years since initiation of dialysis (vintage), inter-dialytic weight gain (IDWG) and a measure of dialysis capability eKt/V also have significant effect on mortality. In addition, we will include gender, race and BMI.

In modeling the hospitalization, the number of hospital admissions is usually the primary outcome which will be studied in Section 5.1 using a Poisson model. We are sometimes also interested in whether a patient has ever been hospitalized as a binary outcome. Since the probability of ever been hospitalized can be derived from the Poisson model, we omit the details of modeling the binary outcome in this paper. Given the subject has been hospitalized, a further goal is to identify the risk factors that lead to longer total length of stay which will be studied in Section 5.2 using a Gamma model. For simplicity we will consider the same set of covariates for all models.

5.1 Joint analysis of mortality and hospital admission

359 (17.96 %) patients died during the follow-up period. The number of hospital admissions in the data ranges from 0 to 37 with mean 2.53. We consider the following joint model:

(6)hi(t|νi)=h0(t)exp{β1Agei+β2Albumini+β3PreSBPi+β4NLRi+β5BMIi+β6Malei+β7IDWGi+β8eKt/Vi+β9Vintagei+β10RaceWhitei+β11RaceBlack+νi},g(E(Yi|Ti,νi))=w(Ti)+α1Agei+α2Albumini+α3PreSBPi+α4NLRi+α5BMIi+α6Malei+α7IDWGi+α8eKt/Vi+α9Vintagei+α10RaceWhitei+α11RaceBlacki+ηνi,

where Yi represents the number of hospital admissions of patient i and is assumed to follow a Poisson distribution, and νiiidN(0,σ2).

As in the previous section we set the interior knots for baseline hazard and offset function equally spaced within the time period. The number of knots ranges from 2 to 4. Among all the combinations, the AIC selects 2 knots for the baseline hazard and 2 knots for the offset function.

We summarize the estimation results in Table 8. Tests are constructed based on asymptotic properties of the MLEs after selection of the knots. All covariates except BMI are significantly associated with the expected number of hospital admissions, while age, albumin, NLR, eKt/V and vintage are significantly associated with the hazard function. Overall age, NLR and vintage are positively associated with both hazard and the number of hospital admissions, while albumin and eKt/V are negatively associated with the outcomes. Furthermore, pre-dialysis SBP and IDWG are positively associated with the number of hospital admissions, and female patients tend to have more hospital admissions.

The latent random variable is significant (σˆ2=0.6008, p=0.0057), which supports the model with random effect. Furthermore ηˆ is significantly larger than 0 (p<0.0001). It implies that the survival time and the number of hospital admissions are positive correlated. The estimated baseline function h0(t) and offset function w(t) are shown in Figure 3 with 95 % point-wise confidence intervals. The confidence intervals are constructed based on asymptotic variances of the MLEs of coefficients associated with the B-spline bases. While our model allows for inhomogeneous Poison model, the logarithm function is close to the estimated offset function and well within the 95 % confidence intervals, suggesting that it is reasonable to model the offset function by the logarithm function in this case.

Table 8:

Joint modeling of mortality and hospitalization of ESRD data.

CovariatesEstimateSEp-value
MortalityAge0.03550.0048<0.0001
Albumin−1.27360.1681<0.0001
PreSBP−0.00040.00310.8990
NLR0.10610.0217<0.0001
BMI−0.01980.01060.0619
Male0.07480.12100.5365
IDWG0.06840.06250.2737
eKt/V−0.59360.24740.0165
Vintage0.12440.0307<0.0001
Race(White)0.13410.21510.5329
Race(Black)−0.24460.21840.2629
HospitalizationAge0.00890.0022<0.0001
Albumin−0.81260.0856<0.0001
PreSBP0.00720.0015<0.0001
NLR0.07760.0129<0.0001
BMI−0.00260.00490.5974
Male−0.16120.06000.0073
IDWG0.10180.03070.0009
eKt/V−0.23600.11700.0437
Vintage0.03860.01570.0140
Race(White)0.26340.10940.0162
Race(Black)0.31300.10690.0035
σ20.60080.21720.0057
η1.22250.2039<0.0001
Figure 3: The estimated baseline function h0(t)${h_0}(t)$ and offset function w(t)$w(t)$ for the joint model of mortality and number of hospitalization.
Figure 3:

The estimated baseline function h0(t) and offset function w(t) for the joint model of mortality and number of hospitalization.

5.2 Joint analysis of mortality and total length of stay

To further investigate the features of patients with hospitalizations, another interesting application is to model mortality and total length of hospital stay. We will focus on the patients who had positive length of stays (1396 patients). The total length of stay ranges from 1 to 368 with mean 26.13. We consider the following joint model:

(7)hi(t|νi)=h0(t)exp{β1Agei+β2Albumini+β3PreSBPi+β4NLRi+β5BMIi+β6Malei+β7IDWGi+β8eKt/Vi+β9Vintagei+β10RaceWhitei+β11RaceBlack+νi},g(μi|Ti,νi)=w(Ti)+α1Agei+α2Albumini+α3PreSBPi+α4NLRi+α5BMIi+α6Malei+α7IDWGi+α8eKt/Vi+α9Vintagei+α10RaceWhitei+α11RaceBlacki+ηνi,

where Yi represents the total length of stay of patient i and is assumed to follow a Gamma distribution, and νiiidN(0,σ2).

Similar process for knots selection applies, which results in 2 knots for the baseline hazard and 2 knots for the offset function. The estimation results are summarized in Table 9. All covariates except BMI and vintage are significantly associated with the expectation of total length of stay, while age, albumin, NLR and vintage are significantly associated with the hazard function. We note that conclusions about risk factors are consistent with those in the previous subsection except for race: the total length of hospital stays of white patients is not significantly different from that of other races while white patients have significantly larger number of hospitalizations than other races. The latent random variable is borderline significant (σˆ2=0.2108, p=0.0542). The estimated baseline function h0(t) and offset function w(t) are shown in Figure 4. The estimated offset function w(t) is quite different from the logarithm function in this case.

Table 9:

Joint modeling of mortality and hospitalization of ESRD data.

CovariatesEstimateSEp-value
MortalityAge0.03020.0053<0.0001
Albumin−1.02370.1730<0.0001
PreSBP−0.00400.00350.2472
NLR0.07510.02280.0010
BMI−0.02040.01180.0843
Male0.08300.13400.5359
IDWG0.09390.07040.1826
eKt/V−0.49200.07040.0812
Vintage0.09440.03320.0045
Race(White)−0.03610.23030.8754
Race(Black)−0.33730.23350.1489
Length of StayAge0.00700.00220.0017
Albumin−0.53350.0870<0.0001
PreSBP0.00470.00160.0036
NLR0.04840.01370.0004
BMI−0.00450.00510.3788
Male−0.12690.06260.0430
IDWG0.07400.03190.0205
eKt/V−0.24950.12340.0433
Vintage0.02750.01650.0955
Race(White)0.13380.11230.2336
Race(Black)0.29330.11100.0084
σ20.21080.10940.0542
η1.88830.55500.0007
Figure 4: The estimated baseline function h0(t)${h_0}(t)$ and offset function w(t)$w(t)$ for the joint model of mortality and total length of stay.
Figure 4:

The estimated baseline function h0(t) and offset function w(t) for the joint model of mortality and total length of stay.

6 Discussion

In this article, we propose a semi-parametric joint model for survival time and hospitalization. In particular, we consider the number of hospital admissions and total length of stay as hospitalization outcomes. A shared random effect is introduced to account for the within subject correlation between the two outcomes. The baseline hazard and offset functions are modeled non-parametrically through B-spline or monotone B-spline bases in order to gain flexibility. With fixed number of knots, the techniques to numerically obtain maximum likelihood estimation are presented. We have also discussed the AIC method for selecting the number of knots. Standard large sample properties of maximum likelihood estimation apply when knots are fixed. Simulation results indicate that the proposed estimation method performs well.

Throughout this article, we assume Normal distribution for the random effect. Our method can be easily generalized to other parametric distributions for the random effect. We used B-spline bases with non-negative coefficients to model the non-negative baseline hazard. An alternative approach is to model the logarithm of the baseline hazard using B-spline bases without constraints on coefficients. However the approach cannot be implemented using the SAS NLMIXED procedure since the likelihood involves an intractable integral. We have analyzed different aspects of the hospitalization separately. One future research is to build a joint model for survival time, hospital admission and length of stay. Our methodology may also be extended to the case of the zero-inflated Poisson model.

Funding statement: National Science Foundation, Grant DMS-1507620; National Institutes of Health, Grant R01GM104470.

Acknowledgements

We thank the associated editor and two referees for constructive comments that substantially improved an earlier draft.

References

1. De Leon A, Chough KC. Analysis of mixed data: methods & applications. Boca Raton, FL: Chapman and Hall/CRC, 2013.10.1201/b14571Suche in Google Scholar

2. Fitzmaurice G, Laird N. Regression models for a bivariate discrete and continuous outcome with clustering. J Am Stat Assoc 1995;90:845–852.10.1080/01621459.1995.10476583Suche in Google Scholar

3. Sammel M, Ryan L, Legler J. Latent variable models for mixed discrete and continuous outcomes. J R Stat Soc Ser B (Stat Method) 1997;59:667–678.10.1111/1467-9868.00090Suche in Google Scholar

4. Catalano P. Bivariate modelling of clustered continuous and ordered categorical outcomes. Stat Med 1997;16:883–900.10.1002/(SICI)1097-0258(19970430)16:8<883::AID-SIM542>3.0.CO;2-ESuche in Google Scholar

5. Dunson D, Herring A. Bayesian latent variable models for mixed discrete outcomes. Biostatistics 2005;6:11–25.10.1093/biostatistics/kxh025Suche in Google Scholar

6. Collins A, Foley R, Chavers B, Gilbertson D, Herzog C, Ishani A, US renal data system 2013 annual data report. American journal of kidney diseases 2014;63(1 Suppl):A7.10.1053/j.ajkd.2013.11.001Suche in Google Scholar

7. McCulloch C. Joint modelling of mixed outcome types using latent variables. Stat Methods Med Res 2008;17:53–73.10.1177/0962280207081240Suche in Google Scholar

8. Lancaster T, Intrator O. Panel data with survival: hospitalization of HIV-positive patients. J Am Stat Assoc 1998;93:46–53.10.1080/01621459.1998.10474086Suche in Google Scholar

9. Wang M-C, Qin J, Chiang C-T. Analyzing recurrent event data with informative censoring. J Am Stat Assoc 2001;96:1057–1065.10.1198/016214501753209031Suche in Google Scholar

10. Huang X, Wolfe RA. A frailty model for informative censoring. Biometrics 2002;58:510–520.10.1111/j.0006-341X.2002.00510.xSuche in Google Scholar

11. Liu L, Wolfe RA, Huang X. Shared frailty models for recurrent events and a terminal event. Biometrics 2004;60:747–756.10.1111/j.0006-341X.2004.00225.xSuche in Google Scholar

12. Huang C-Y, Qin J, Wang M-C. Semiparametric analysis for recurrent event data with time-dependent covariates and informative censoring. Biometrics 2010;66:39–49.10.1111/j.1541-0420.2009.01266.xSuche in Google Scholar

13. Rizopoulos D. JM: An R package for the joint modelling of longitudinal and time-to-event data. J Stat Softw 2010;35:1–33.10.18637/jss.v035.i09Suche in Google Scholar

14. Tsiatis A, Davidian M. Joint modeling of longitudinal and time-toevent data: an overview. Stat Sin 2004;14:809–834.Suche in Google Scholar

15. Usvyat L, Kooman J, van der Sande F, Wang Y, Maddux F, Levin N, et al. Dynamics of hospitalizations in hemodialysis patients: results from a large US provider. Nephrol Dial Transplant 2014;29:442–448.10.1093/ndt/gft219Suche in Google Scholar

16. Ramsay J. Monotone regression splines in action. Stat Sci 1988;4:425–441.10.1214/ss/1177012761Suche in Google Scholar

17. Littell R, Milliken G, Stroup W, Wolfinger R, Schabenberger O. SAS for mixed models, 2nd . Cary, NC: SAS Institute Inc., 2006.Suche in Google Scholar

18. Press W, Teukolsky S, Vetterling W, Flannery B. Numerical recipes 3rd edition: the art of scientific computing. New York, NY: Cambridge University Press, 2007.Suche in Google Scholar

19. Akaike H. Information theory and an extension of the maximum likelihood principle. Second Int Symp Inf Theory 1973;:267–281.Suche in Google Scholar

20. He J, Whelton P. Elevated systolic blood pressure and risk of cardiovascular and renal disease: overview of evidence from observational epidemiologic studies and randomized controlled trials. Am Heart J 1999;138:S211–S219.10.1016/S0002-8703(99)70312-1Suche in Google Scholar

21. Hsu C, McCulloch C, Iribarren C, Darbinian J, Go A. Body mass index and risk for end-stage renal disease. Ann Intern Med 2006;144:21–28.10.7326/0003-4819-144-1-200601030-00006Suche in Google Scholar PubMed

22. Phelan P, O’Kelly P, Walshe J, Conlon P. The importance of serum albumin and phosphorous as predictors of mortality in ESRD patients. Ren Fail 2008;30:423–429.10.1080/08860220801964236Suche in Google Scholar PubMed

23. Erdem E, Kaya C, Karatas A, Dilek M, Akpolat T. Neutrophil to lymphocyte ratio in predicting short-term mortality in hemodialysis patients. J Exp Clin Med 2013;30:129–132.10.5835/jecm.omu.30.02.008Suche in Google Scholar

Published Online: 2016-11-9
Published in Print: 2016-11-1

© 2016 Walter de Gruyter GmbH, Berlin/Boston

Heruntergeladen am 5.11.2025 von https://www.degruyterbrill.com/document/doi/10.1515/ijb-2016-0002/html
Button zum nach oben scrollen