Abstract
In this paper we extend the bivariate hazard ratio to multivariate competing risks data and show that it is equivalent to the cause-specific cross hazard ratio. Two approaches are proposed to estimate these two equivalent association measures. One extends the plug-in estimator, and the other adapts the pseudo-likelihood estimator for bivariate survival data to multivariate competing risks data. The asymptotic properties of the extended estimators are established by using empirical processes techniques. The extended plug-in and pseudo-likelihood estimators have comparable performance with the existing U-statistic when the data have no tied events. However, in many applications, there are tied events in which all the three estimators are found to produce biased results. To our best knowledge, we are not aware of any association analysis for multivariate competing risks data that has considered tied events. Hence we propose a modified U-statistic to specifically handle tied observations. The modified U-statistic clearly outperforms the other estimators when there are rounding errors. All methods are applied to the Cache County Study to examine mother–child and sibship associations in dementia among this aging population, where the event times are rounded to the nearest integers. The modified U performs consistently with our simulation results and provides more reliable results in the presence of tied events.
1 Introduction
Multivariate competing risks data occur often in genetic familial studies and demography as there are usually more than one risk involved. As a motivating example, we consider a study conducted in Cache County, Utah on dementia in an aging population. One goal of this study is to measure the association in dementia onset among family members. For each family member, the researchers recorded his or her onset age of dementia, age at death, or age at termination of the study, whichever occurred first, along with the corresponding cause indicator. This falls into the paradigm of multivariate competing risks, where there is within-subject dependence between the risk of interest and competing risks, as well as between-subject dependence among family members induced by common genetic or environmental factors.
Many researchers have studied association analysis for bivariate survival data with independent right censoring. An overview of their work was given by Hougaard [1]. The previous work, however, cannot be applied directly to multivariate competing risks data because the event of interest may not be observed due to the occurrence of a competing event. One limitation of competing risks data is that the within-subject dependence between the target event and competing events is nonparametrically untestable, and their marginal distributions are nonparametrically nonidentifiable [2, 3]. The nonidentifiability issue raises difficulties in analyzing the association in dementia in the presence of possibly dependent censoring by death, which occurred frequently among this aging population.
Some of the association analyses for bivariate survival data with independent censoring have been recently adapted to bivariate competing risks data Bandeen-Roche and Liang [4] modified Oakes’s [5] cross hazard ratio to the bivariate competing risks setting. They defined a cause-specific association measure
More recently, Shih and Albert [8] examined familial association in breast cancer based on a cross hazard ratio for paired times of first event and an odds ratio for two competing causes (cancer and non-cancer deaths) and estimated the two association measures by adopting the two-stage estimation method [9]. They noted that the cause-specific hazard ratio
Most of the previous works related to
Therefore, we first extend the association measure
The rest of the paper is organized as follows. We extend the plug-in estimator to multivariate competing risks data in Section 2.2, briefly describe the notation and the method for the existing U-estimator [10] in Section 2.3, and propose the pseudo-likelihood estimator in Section 2.4. In Section 2.5 tied event data are considered, and a modified U-statistic is proposed to specifically handle rounding errors. In Section 2.6 we will discuss some formal tests on whether the associations are constant over time. The extended plug-in estimators and pseudo-maximum-likelihood estimators are compared with the U-statistics for untied data via simulation studies in Sections 3.1–3.3. In Section 3.4 we investigate the impact of rounding errors on the performance of the three estimators in comparison to the modified U-estimator. All methods are applied to the Cache County Study in Section 4, and some remarks are given in Section 5.
2 Methods
2.1 Multivariate competing risks data
In this paper, we consider clustered competing risks data. Suppose that in each cluster there are
2.2 Bivariate cause-specific hazard ratio for clustered data and its plug-in estimator
Cheng and Fine [7] proposed a cause-specific association measure based on bivariate cause-specific hazard functions and demonstrated its equivalence to the conditional cross hazard ratio [4]. We now extend their association measure to the clustered competing risks data. To simplify the notation we focus on association analysis of cause 1 events, though the methods proposed here are applicable to other causes. Analogous to Cheng and Fine [7], for any child–mother pair
and define bivariate hazard functions of a single cause 1 event from the child or mother with the other member being at risk as the following:
Then the association measure for cause 1 events based on these bivariate hazard functions is
It quantifies the elevated risk that both a child and his/her mother failed at times
A simple plug-in estimator of
where E is the expectation operator and
Define the empirical process
where
The consistency and asymptotic normality of
The association measure
2.3 Cause-specific cross hazard ratio for clustered data
For mother and all children data, under the exchangeability assumption for the children, the child–mother cause-specific density was defined as
for
Cheng et al. [10] defined a child–mother cause-specific association measure denoted by
which shares the same interpretation as the conditional cross hazard ratio proposed by Bandeen-Roche and Liang [4]. That is, the risk of a child experiencing a cause k event by time s would be accelerated for
Let
and the discordant indicator
The extended child–mother association measure was estimated by
which was the ratio of the number of concordanst pairs to that of discordant pairs, among those pairs where concordance status was determinable, and the times and causes of failure matched those in
For any
The estimation of the sibship association
The estimators (4) and (6) are U-statistics whose asymptotic properties can be obtained with U-statistic theory [10].
2.4 Maximum-pseudo (composite)-likelihood estimator
One can show that
We first divide mother–multiple children data into several strata, each containing pairs of mothers and their dth children. Here d ranges from 1 to
We now construct the pseudo-likelihood function based on the observed mother–multiple children data
The maximum-pseudo-likelihood estimator
The above framework can be further extended to clustered sibship data for which exchangeability can be assumed among all siblings. The adaptation of the maximum-pseudo-likelihood estimation to the sibship data is more complicated as there are two meaningful ways to form a pair,
We now construct the pseudo-log-likelihood function based on the stratum
The maximum-likelihood estimator
For each extended association measures
2.5 Modified U-statistic for tied data
In practice, the data are often rounded to the nearest integers, resulting in tied events. The adaptation of the plug-in estimator to the tied data is straightforward. For the U-statistic, we followed Cheng et al. [10] and excluded those pairs where one or both event times were tied. For the pseudo-likelihood function (7), the tied observations do not affect concordant pairs much, since a mother–child pair with double events is still counted as a concordant pair after rounding. However, rounding tends to underestimate the number of discordant pairs. Suppose we have two discordant pairs
We will show through simulation studies in Section 3.4 that the plug-in estimator tends to underestimate while the other two tend to overestimate the true positive association with the presence of tied data. For the pseudo-likelihood estimator, the ties decrease the number of discordant pairs, and hence increase the estimated strength of association. For the U-statistic, rounding causes the removal of roughly similar amounts of concordant and discordant pairs. When the association is positive, i.e. the numerator is larger than the denominator, subtracting the same positive value from both the numerator and the denominator makes the ratio larger. Therefore, the U-statistic overestimates the positive association when ties are present. The similar arguments can be used to show that the U-statistic results in an estimate that is smaller than the true value when the association is negative (
To address potential loss of accuracy in the three estimators for tied data, we propose a modified U-statistic which essentially adds back those removed concordant and discordant pairs due to tied events. When two mother–child pairs have tied observations – for example, when both mothers had the same event time – we consider their corresponding cause indicators. If mother 1 died and mother 2 had dementia at the same age, we assume that mother 1 would have dementia later than mother 2, had mother 1 not died. If child event times were not tied, we could determine concordant status. For those tied pairs that can be equally concordant or discordant, we add half a pair to both concordant and discordant pairs. It becomes more complicated for sibship data under the exchangeability assumption. The section “Formula for corrected U” in Appendix provides details on how to add back all missing concordant and discordant pairs for sibship data.
The idea may be used analogously for the pseudo-likelihood estimator. In the section “Asymptotic properties of
2.6 Grid selection
The selection of grids plays an important role in capturing the local association structure. If we use too few grid points, the resulting estimates may be too smooth, missing crucial local associations. On the other hand, if there are too many grid points, the number of events would be too few and there would be too much variability in the estimators. In practice, if there is no particular scientific definition of time regions in place, we recommend that we start with some crude divisions and then further refine the grid points. We will carry out formal hypothesis tests on two neighboring associations being the same, based on which we decide whether we should keep the finer grid points. That is, we would like to test the following hypotheses:
where
where
3 Simulation studies
3.1 Constant association
We conducted simulation studies to evaluate small-sample performance of the plug-in estimator
For each family, we first drew the familywise propensity A from a Gamma
We first adopted the same frailty model as those used in Bandeen-Roche and Liang [4] and Cheng and Fine [7]. The model assumes the same strength of association between causes and results in constant association over time. Hence the estimation is carried out over the entire observation region with uniform weight. We applied the three methods to each of the 500 simulated data sets. The mean of estimates, mean of standard error (for the plug-in and U-estimators), or bootstrap standard error (for the pseudo-likelihood estimator) using family-based resampling with replacement with 500 replications, empirical standard error, and coverage rate of 95% confidence intervals are summarized in the upper left panel of Table 1. With moderate sample size, light censoring, and 50–80% of cause 1 events, all the three estimators perform well. The estimates are very close to the true values. The model-based or bootstrap standard errors agree well with the empirical standard errors. The coverage rates are close to the nominal level 0.95. There is some bias in the pseudo-likelihood estimate and U-statistic when
Simulation results for mother–multiple children data (CM) and sibship data (CC)
| STAT | ||||||||
| Mother–Children | Sibship | |||||||
| 0.2 | 6 | EST | 6.03 | 6.16 | 6.15 | 5.98 | 6.23 | 6.22 |
| MSE | 0.82 | 0.84 | 0.84 | 1.11 | 1.15 | 1.18 | ||
| ESE | 0.83 | 0.79 | 0.86 | 1.21 | 1.23 | 1.31 | ||
| COV | 0.93 | 0.96 | 0.95 | 0.90 | 0.93 | 0.92 | ||
| 0.5 | 3 | EST | 2.99 | 3.01 | 3.01 | 3.00 | 3.04 | 3.05 |
| MSE | 0.24 | 0.23 | 0.24 | 0.32 | 0.33 | 0.33 | ||
| ESE | 0.23 | 0.23 | 0.23 | 0.32 | 0.33 | 0.33 | ||
| COV | 0.95 | 0.95 | 0.96 | 0.95 | 0.94 | 0.96 | ||
| 0.8 | 2.25 | EST | 2.24 | 2.26 | 2.25 | 2.24 | 2.26 | 2.26 |
| MSE | 0.13 | 0.13 | 0.13 | 0.18 | 0.18 | 0.18 | ||
| ESE | 0.13 | 0.13 | 0.13 | 0.16 | 0.16 | 0.17 | ||
| COV | 0.94 | 0.96 | 0.94 | 0.97 | 0.96 | 0.97 | ||
| 0.2 | 6 | EST | 6.11 | 6.49 | 6.45 | 6.08 | 6.41 | 6.42 |
| MSE | 1.60 | 2.03 | 1.77 | 1.51 | 1.59 | 1.64 | ||
| ESE | 1.73 | 1.80 | 1.94 | 1.64 | 1.66 | 1.81 | ||
| COV | 0.94 | 0.98 | 0.95 | 0.91 | 0.95 | 0.94 | ||
| 0.5 | 3 | EST | 3.02 | 3.08 | 3.08 | 3.00 | 3.07 | 3.06 |
| MSE | 0.46 | 0.47 | 0.48 | 0.43 | 0.41 | 0.45 | ||
| ESE | 0.47 | 0.44 | 0.49 | 0.44 | 0.41 | 0.45 | ||
| COV | 0.95 | 0.97 | 0.95 | 0.94 | 0.96 | 0.95 | ||
| 0.8 | 2.25 | EST | 2.22 | 2.26 | 2.25 | 2.23 | 2.25 | 2.26 |
| MSE | 0.25 | 0.24 | 0.26 | 0.23 | 0.22 | 0.24 | ||
| ESE | 0.24 | 0.23 | 0.25 | 0.23 | 0.22 | 0.24 | ||
| COV | 0.94 | 0.95 | 0.94 | 0.95 | 0.94 | 0.95 | ||
| 0.2 | 6 | EST | 6.30 | 7.15 | 7.16 | 6.05 | 6.79 | 6.81 |
| MSE | 2.25 | 4.44 | 2.83 | 2.03 | 2.66 | 2.40 | ||
| ESE | 2.58 | 3.05 | 3.39 | 2.38 | 2.73 | 2.93 | ||
| COV | 0.89 | 0.98 | 0.92 | 0.87 | 0.93 | 0.90 | ||
| 0.5 | 3 | EST | 3.02 | 3.17 | 3.15 | 2.97 | 3.10 | 3.10 |
| MSE | 0.64 | 0.75 | 0.70 | 0.60 | 0.60 | 0.64 | ||
| ESE | 0.63 | 0.66 | 0.70 | 0.64 | 0.64 | 0.69 | ||
| COV | 0.93 | 0.97 | 0.94 | 0.90 | 0.93 | 0.92 | ||
| 0.8 | 2.25 | EST | 2.25 | 2.32 | 2.30 | 2.23 | 2.28 | 2.28 |
| MSE | 0.35 | 0.37 | 0.37 | 0.33 | 0.32 | 0.34 | ||
| ESE | 0.35 | 0.34 | 0.37 | 0.33 | 0.33 | 0.35 | ||
| COV | 0.94 | 0.96 | 0.95 | 0.93 | 0.92 | 0.95 | ||
Next, we evaluated the performances of the three estimators when the number of clusters was smaller and censoring was heavier. We reduced the cluster sizes for mother–children data and sibship data to 200 or 100 families and generated censoring times from an exponential function with mean 4 yielding ~35% censoring. The results are summarized in the middle and bottom panels of Table 1. The bias is more noticeable in the estimates as compared to the true values. Both the empirical standard error and the model-based or bootstrap standard error increase, though they in general agree with each other. The coverage rates are still pretty close to the nominal level 0.95, except when the number of cause 1 event is very small (
3.2 Time-varying association
In practice, familial association may not be constant over each time grid. In this case,
where we chose
with
We generated 500 clusters of mother and children data, each containing 500 or 100 families, and obtained the mean estimates, the model-based or bootstrap standard errors, the empirical standard errors, and the coverage rates where the true values were computed by averaging the estimates from 200 replications of 2,000 mother–child pairs. We examined the performance of the three estimators over a four-region grid, as well as over the entire observed time region, with uniform weight. The results are summarized in Table 2. The true values for the plug-in estimator and the U-estimator are identical and slightly different from those for the pseudo-likelihood estimator, since the latter weights the time-varying associations at various time points differently. The true values change across time regions. When we employ the entire region, the true values fall somewhere between the smallest and largest associations among all subregions. In general, all the three estimators still have good practical performance, except for the pseudo-likelihood estimator when the cluster size is
Simulation results for mother–multiple children data with time-varying association
| n = 500 | ||||||
| TRUE | 1.40 | 1.13 | 1.13 | 1.06 | 1.26 | |
| EST | 1.40 | 1.13 | 1.13 | 1.07 | 1.27 | |
| MSE | 0.13 | 0.14 | 0.15 | 0.17 | 0.08 | |
| ESE | 0.13 | 0.14 | 0.16 | 0.18 | 0.08 | |
| COV | 0.93 | 0.93 | 0.94 | 0.95 | 0.90 | |
| n = 100 | ||||||
| TRUE | 1.40 | 1.13 | 1.13 | 1.06 | 1.26 | |
| EST | 1.41 | 1.17 | 1.16 | 1.10 | 1.27 | |
| MSE | 0.28 | 0.30 | 0.32 | 0.37 | 0.18 | |
| ESE | 0.30 | 0.34 | 0.36 | 0.42 | 0.18 | |
| COV | 0.94 | 0.92 | 0.89 | 0.90 | 0.94 | |
| n = 500 | ||||||
| TRUE | 1.40 | 1.13 | 1.13 | 1.06 | 1.26 | |
| EST | 1.40 | 1.13 | 1.13 | 1.07 | 1.27 | |
| MSE | 0.13 | 0.13 | 0.15 | 0.17 | 0.08 | |
| ESE | 0.13 | 0.14 | 0.15 | 0.18 | 0.08 | |
| COV | 0.93 | 0.93 | 0.94 | 0.94 | 0.91 | |
| n = 100 | ||||||
| TRUE | 1.40 | 1.13 | 1.13 | 1.06 | 1.26 | |
| EST | 1.41 | 1.17 | 1.16 | 1.11 | 1.28 | |
| MSE | 0.29 | 0.31 | 0.34 | 0.40 | 0.19 | |
| ESE | 0.31 | 0.35 | 0.38 | 0.45 | 0.16 | |
| COV | 0.94 | 0.92 | 0.89 | 0.89 | 0.95 | |
| n = 500 | ||||||
| TRUE | 1.36 | 1.12 | 1.12 | 1.05 | 1.19 | |
| EST | 1.36 | 1.12 | 1.12 | 1.06 | 1.21 | |
| BSE | 0.12 | 0.13 | 0.14 | 0.16 | 0.07 | |
| ESE | 0.12 | 0.13 | 0.15 | 0.16 | 0.07 | |
| COV | 0.92 | 0.93 | 0.94 | 0.94 | 0.88 | |
| n = 100 | ||||||
| TRUE | 1.36 | 1.12 | 1.12 | 1.05 | 1.19 | |
| EST | 1.38 | 1.17 | 1.15 | 1.12 | 1.21 | |
| BSE | 0.13 | 0.13 | 0.15 | 0.20 | 0.07 | |
| ESE | 0.30 | 0.33 | 0.35 | 0.45 | 0.16 | |
| COV | 0.62 | 0.59 | 0.61 | 0.61 | 0.60 | |
3.3 Tied observations
Real data are often rounded as in the Cache County Study, resulting in tied observations. To evaluate how the three estimators perform when there are tied events, we rounded simulated mother–children data and sibship data in Section 3.1 to one decimal place containing 100 time points from 0 to 9.9. This is roughly comparable to 53 age points from 55 to 107 in the Cache County Study. The results are summarized in Table 3. The plug-in estimator
Simulation results for mother–multiple children data (CM) and sibship data (CC) with tied events
| STAT | ||||||||||
| Mother–Children | Sibship | |||||||||
| 0.2 | 6 | EST | 5.74 | 6.47 | 6.45 | 5.98 | 5.66 | 6.54 | 6.50 | 6.12 |
| MSE | 0.75 | 0.91 | 0.93 | 0.81 | 1.01 | 1.23 | 1.28 | 1.14 | ||
| ESE | 0.76 | 0.89 | 0.94 | 0.81 | 0.99 | 1.23 | 1.29 | 1.14 | ||
| COV | 0.90 | 0.95 | 0.96 | 0.96 | 0.88 | 0.97 | 0.96 | 0.95 | ||
| 0.5 | 3 | EST | 2.84 | 3.24 | 3.15 | 2.98 | 2.89 | 3.32 | 3.24 | 2.99 |
| MSE | 0.21 | 0.26 | 0.26 | 0.23 | 0.30 | 0.36 | 0.38 | 0.32 | ||
| ESE | 0.21 | 0.26 | 0.27 | 0.22 | 0.33 | 0.40 | 0.41 | 0.31 | ||
| COV | 0.87 | 0.88 | 0.93 | 0.94 | 0.89 | 0.89 | 0.92 | 0.94 | ||
| 0.8 | 2.25 | EST | 2.14 | 2.49 | 2.37 | 2.23 | 2.13 | 2.49 | 2.36 | 2.23 |
| MSE | 0.12 | 0.15 | 0.15 | 0.13 | 0.16 | 0.20 | 0.20 | 0.17 | ||
| ESE | 0.12 | 0.15 | 0.15 | 0.13 | 0.17 | 0.21 | 0.21 | 0.18 | ||
| COV | 0.82 | 0.67 | 0.91 | 0.93 | 0.84 | 0.81 | 0.92 | 0.94 | ||
| 0.2 | 6 | EST | 5.63 | 6.72 | 6.66 | 6.21 | 5.71 | 6.73 | 6.77 | 6.20 |
| MSE | 1.40 | 2.14 | 1.91 | 1.70 | 1.34 | 1.69 | 1.80 | 1.56 | ||
| ESE | 1.45 | 1.79 | 0.89 | 1.80 | 1.45 | 1.82 | 2.04 | 1.64 | ||
| COV | 0.87 | 0.98 | 0.94 | 0.92 | 0.87 | 0.96 | 0.95 | 0.91 | ||
| 0.5 | 3 | EST | 2.83 | 3.33 | 3.23 | 3.01 | 2.85 | 3.36 | 3.26 | 3.05 |
| MSE | 0.40 | 0.52 | 0.54 | 0.47 | 0.39 | 0.47 | 0.51 | 0.45 | ||
| ESE | 0.41 | 0.50 | 0.55 | 0.48 | 0.33 | 0.49 | 0.55 | 0.45 | ||
| COV | 0.90 | 0.96 | 0.96 | 0.94 | 0.89 | 0.94 | 0.95 | 0.93 | ||
| 0.8 | 2.25 | EST | 2.12 | 2.54 | 2.40 | 2.27 | 2.13 | 2.55 | 2.41 | 2.27 |
| MSE | 0.22 | 0.29 | 0.30 | 0.26 | 0.21 | 0.26 | 0.28 | 0.24 | ||
| ESE | 0.22 | 0.26 | 0.29 | 0.26 | 0.23 | 0.29 | 0.31 | 0.24 | ||
| COV | 0.87 | 0.90 | 0.95 | 0.95 | 0.86 | 0.85 | 0.92 | 0.95 | ||
| 0.2 | 6 | EST | 5.74 | 7.45 | 7.43 | 6.69 | 5.65 | 7.21 | 7.15 | 6.42 |
| MSE | 1.96 | 4.24 | 3.07 | 2.57 | 1.83 | 2.89 | 2.70 | 2.26 | ||
| ESE | 2.51 | 4.01 | 4.26 | 3.09 | 2.02 | 2.87 | 3.09 | 2.75 | ||
| COV | 0.84 | 0.98 | 0.93 | 0.89 | 0.84 | 0.97 | 0.94 | 0.88 | ||
| 0.5 | 3 | EST | 2.82 | 3.40 | 3.30 | 3.15 | 2.85 | 3.43 | 3.34 | 3.11 |
| MSE | 0.57 | 0.84 | 0.79 | 0.70 | 0.54 | 0.70 | 0.75 | 0.65 | ||
| ESE | 0.57 | 0.76 | 0.80 | 0.75 | 0.57 | 0.73 | 0.81 | 0.69 | ||
| COV | 0.89 | 0.97 | 0.96 | 0.95 | 0.89 | 0.98 | 0.97 | 0.93 | ||
| 0.8 | 2.25 | EST | 2.16 | 2.66 | 2.49 | 2.29 | 2.14 | 2.61 | 2.45 | 2.31 |
| MSE | 0.32 | 0.45 | 0.44 | 0.37 | 0.30 | 0.38 | 0.41 | 0.35 | ||
| ESE | 0.31 | 0.43 | 0.43 | 0.38 | 0.31 | 0.41 | 0.43 | 0.37 | ||
| COV | 0.90 | 0.94 | 0.97 | 0.94 | 0.88 | 0.91 | 0.96 | 0.93 | ||
3.4 Negative association
The proposed plug-in and pseudo-likelihood estimators, as well as the original and modified U-estimators, can be naturally applied to the clustered data where the paired events have a negative association. To evaluate their practical performance under negative association, we simulated 500 or 100 clusters of mother–multiple children data where the mother–child paired event times satisfy a Clayton model with copula parameter
Simulation results for mother–multiple children data (CM) for negative association
| Stat | Stat | ||||||||||
| Untied data n = 500 | |||||||||||
| 0.75 | EST | 0.75 | 0.75 | 0.75 | – | 0.45 | EST | 0.45 | 0.45 | 0.45 | – |
| MSE | 0.06 | 0.06 | 0.04 | – | MSE | 0.04 | 0.03 | 0.04 | – | ||
| ESE | 0.06 | 0.06 | 0.04 | – | ESE | 0.04 | 0.03 | 0.04 | – | ||
| COV | 0.95 | 0.95 | 0.95 | – | COV | 0.94 | 0.94 | 0.94 | – | ||
| Untied data n = 100 | |||||||||||
| EST | 0.78 | 0.77 | 0.77 | – | EST | 0.48 | 0.46 | 0.46 | – | ||
| MSE | 0.13 | 0.12 | 0.13 | – | MSE | 0.09 | 0.08 | 0.09 | – | ||
| ESE | 0.14 | 0.12 | 0.15 | – | ESE | 0.09 | 0.07 | 0.10 | – | ||
| COV | 0.93 | 0.95 | 0.92 | – | COV | 0.94 | 0.97 | 0.92 | – | ||
| Tied data n = 500 | |||||||||||
| EST | 0.81 | 1.02 | 0.70 | 0.79 | EST | 0.53 | 0.87 | 0.39 | 0.50 | ||
| MSE | 0.06 | 0.05 | 0.06 | 0.06 | MSE | 0.04 | 0.04 | 0.04 | 0.04 | ||
| ESE | 0.06 | 0.06 | 0.05 | 0.06 | ESE | 0.05 | 0.04 | 0.04 | 0.04 | ||
| COV | 0.86 | 0 | 0.82 | 0.92 | COV | 0.56 | 0 | 0.6 | 0.80 | ||
| Tied data n = 100 | |||||||||||
| EST | 0.82 | 0.99 | 0.70 | 0.79 | EST | 0.54 | 0.77 | 0.40 | 0.50 | ||
| MSE | 0.13 | 0.14 | 0.13 | 0.14 | MSE | 0.09 | 0.10 | 0.08 | 0.08 | ||
| ESE | 0.13 | 0.14 | 0.13 | 0.14 | ESE | 0.09 | 0.09 | 0.09 | 0.08 | ||
| COV | 0.95 | 0.66 | 0.88 | 0.95 | COV | 0.90 | 0.04 | 0.83 | 0.91 | ||
4 Cache County Study
The data from the Cache County Study on Memory in Aging have been analyzed in previous works. Bandeen-Roche and Liang [4] and Bandeen-Roche and Ning [6] applied their U-type estimator, and Cheng and Fine [7] used a plug-in estimator to investigate the child–mother association in dementia based on 3,635 pairs of mothers and their eldest children. In order to utilize the information on other siblings, Cheng et al. [10] adapted the U-estimator for bivariate competing risks data to multivariate cases, assuming exchangeability among all siblings. In this paper, we were interested in both the child–mother and the sibship associations. Hence we extended the plug-in estimator from bivariate competing risks data and the pseudo-likelihood estimator from a typical bivariate survival setting to more complicated data structures such as mother–multiple children data and sibship data. The two extended estimators were applied to 3,635 families containing mothers and all her children for the mother–child association and to the sibship data containing 4,770 families of siblings for the sibship association. There are ~60% censored cases among mothers and 70% censored cases among children. The number of children in each family ranges from 1 to 14 in the mother–children data with mean and median both around 5. The distribution is similar for sibship data before we excluded single child families from the sibship analysis.
More importantly, the data had tied observations due to ages being rounded to the nearest integers, which has been ignored in the previous works. We computed the modified U-statistic along with the two extended estimators and the original U-statistic. To facilitate comparison with previous work, we examined the strengths of association over the same time regions as reported in Cheng et al. [10]. We first divided the age ranges into three intervals:
Child–mother association and sibship association in the Cache County Study estimated within 3
| Child–mother association | |||||||||||||
| Obs | |||||||||||||
| N | EST | MSE | BSE | EST | BSE | EST | MSE | BSE | EST | MSE | BSE | ||
| 1,338 | 10 | 4.03 | 1.33 | 1.40 | 2.57 | 1.22 | 4.21 | 1.42 | 1.51 | 4.06 | 1.36 | 1.34 | |
| 2,171 | 12 | 2.27 | 0.69 | 0.72 | 3.13 | 1.16 | 2.34 | 0.73 | 0.76 | 2.29 | 0.70 | 0.69 | |
| 3,907 | 7 | 1.24 | 0.46 | 0.48 | 1.31 | 0.56 | 1.30 | 0.49 | 0.51 | 1.25 | 0.47 | 0.49 | |
| 1,258 | 18 | 4.86 | 1.26 | 1.30 | 4.83 | 1.51 | 5.14 | 1.39 | 1.44 | 4.93 | 1.30 | 1.36 | |
| 2,271 | 23 | 2.64 | 0.58 | 0.60 | 3.00 | 0.84 | 2.76 | 0.64 | 0.66 | 2.68 | 0.60 | 0.58 | |
| 3,757 | 15 | 2.13 | 0.63 | 0.64 | 1.85 | 0.63 | 2.27 | 0.70 | 0.72 | 2.15 | 0.65 | 0.67 | |
| 650 | 3 | 0.96 | 0.49 | 0.48 | 1.83 | 0.71 | 0.94 | 0.49 | 0.49 | 0.96 | 0.49 | 0.51 | |
| 961 | 10 | 2.44 | 0.89 | 0.95 | 2.55 | 0.89 | 2.55 | 0.98 | 1.08 | 2.47 | 0.92 | 0.96 | |
| 1,715 | 9 | 2.40 | 1.04 | 1.12 | 1.88 | 0.91 | 2.66 | 1.19 | 1.15 | 2.43 | 1.08 | 1.07 | |
| Sibship association | |||||||||||||
| Obs | |||||||||||||
| N | EST | MSE | BSE | EST | BSE | EST | MSE | BSE | EST | MSE | BSE | ||
| 22,478 | 26 | 3.40 | 1.01 | 1.08 | 3.16 | 1.23 | 3.42 | 1.03 | 1.12 | 3.44 | 1.03 | 1.09 | |
| 17,525 | 44 | 3.35 | 0.58 | 0.56 | 3.87 | 0.76 | 3.45 | 0.61 | 0.59 | 3.37 | 0.59 | 0.57 | |
| 5,256 | 27 | 2.42 | 0.60 | 0.60 | 2.42 | 0.63 | 2.56 | 0.66 | 0.68 | 2.44 | 0.62 | 0.66 | |
| 21,500 | 104 | 2.95 | 0.49 | 0.48 | 3.16 | 0.55 | 3.11 | 0.53 | 0.53 | 3.00 | 0.51 | 0.52 | |
| 10,531 | 56 | 2.58 | 0.44 | 0.40 | 2.41 | 0.42 | 2.80 | 0.50 | 0.47 | 2.61 | 0.45 | 0.44 | |
| 9,984 | 60 | 1.65 | 0.45 | 0.42 | 1.43 | 0.27 | 1.81 | 0.52 | 0.52 | 1.67 | 0.45 | 0.46 | |
The original U-estimates for mother–child associations are identical to those reported in Cheng et al. [10]. However, there are some that differ in the sibship analysis because we are now using not only the pairs whose event times belong to the region of interest but also the pairs whose switched times under exchangeability belong to the region. For mother–child associations, the four estimates are coherent on most regions. The modified U-estimates almost always lie between the plug-in estimates and the original U-estimates except for the region
However, when the number of double cause 1 events
Child–mother association and sibship association in the Cache County Study estimated within 2
| Child–mother association | |||||||||||||
| Obs | |||||||||||||
| N | EST | MSE | BSE | EST | BSE | EST | MSE | BSE | EST | MSE | BSE | ||
| 5,319 | 41 | 3.41 | 0.60 | 0.59 | 3.57 | 0.68 | 3.55 | 0.65 | 0.65 | 3.44 | 0.61 | 0.62 | |
| 5,891 | 12 | 1.30 | 0.39 | 0.40 | 1.25 | 0.40 | 1.36 | 0.42 | 0.44 | 1.30 | 0.39 | 0.45 | |
| 3,330 | 35 | 2.40 | 0.48 | 0.52 | 2.45 | 0.49 | 2.49 | 0.53 | 0.58 | 2.43 | 0.49 | 0.50 | |
| 3,488 | 19 | 2.71 | 0.87 | 0.55 | 1.80 | 0.41 | 2.90 | 0.99 | 0.63 | 2.75 | 0.90 | 0.95 | |
| Sibship association | |||||||||||||
| Obs | |||||||||||||
| N | EST | MSE | BSE | EST | BSE | EST | MSE | BSE | EST | MSE | BSE | ||
| 60,922 | 128 | 3.28 | 0.52 | 0.51 | 3.17 | 0.50 | 3.36 | 0.55 | 0.54 | 3.33 | 0.53 | 0.52 | |
| 9,663 | 44 | 2.61 | 0.47 | 0.52 | 2.41 | 0.45 | 2.76 | 0.52 | 0.58 | 2.63 | 0.48 | 0.49 | |
| 33,893 | 168 | 2.94 | 0.35 | 0.34 | 2.77 | 0.34 | 3.11 | 0.39 | 0.39 | 2.97 | 0.36 | 0.36 | |
| 16,108 | 94 | 2.03 | 0.34 | 0.33 | 1.67 | 0.25 | 2.24 | 0.40 | 0.40 | 2.06 | 0.35 | 0.34 | |
When we examine the mother–child associations on broader regions, as given in Table 6, the plug-in estimates, the modified U-estimates, and the original U become closer and always follow the same order. A similar trend is observed in the sibship analysis. However, the discrepancy between the likelihood-based estimates and the other three still exists, especially on the region
In this application, we consider and contrast the associations over a
5 Remarks
In this paper we discuss three types of nonparametric approaches to estimating two equivalent and easily interpreted association measures for multivariate competing risks data. The pseudo-likelihood estimator lacks an explicit variance estimator and relies on bootstrapping, hence it is most computationally intensive. For example, it took about half a second to obtain the plug-in estimate or the U-estimate as well as the corresponding standard error and about 0.5(500)=250 seconds to obtain the pseudo-likelihood estimate and its bootstrap standard error based on 500 bootstrap samples, for the case of
On the other hand, the pseudo-likelihood estimator has the potential to be most efficient. In this paper we use a stratified approach which always matches up children with the same sibling order. It is possible to check concordance status for two pairs, for example, a mother and her eldest child from one family and another mother and her second child from another family. There are actually many more combinations than those used in eqs (7) and (8). Based on our simulation studies, the stratified pseudo-likelihood estimator has comparable efficiency to the other two estimators. If we were able to use all possible combinations, which of course would be very computationally intensive, we would further improve the efficiency of the pseudo-likelihood estimator. Moreover, the pseudo-likelihood estimator weights the time-varying associations differently from the plug-in estimator and the U-estimator. We hence recommend using both the pseudo-likelihood estimator and the plug-in (or U-) estimator to capture different aspects of the underlying association, when we suspect the association to vary over time but cannot refine the grids due to the limited sample size. In addition, the likelihood-based approach is more adaptive to a regression setting or to a parametric model. For example, one may extend the regression model for bivariate competing risks data [19] to multivariate competing risks data based on the pseudo-likelihood functions in eqs (7) and (8). Recently, Hu et al. [23] used a polynomial function to approximate cross hazard ratio and estimated the unknown parameters based on a pseudo-partial likelihood. Similar parametric approximation may be used for the cause-specific cross hazard ratio, and the pseudo-likelihood functions considered in this paper can again be used as the base of estimation. These will be future research topics.
More importantly, we investigate how rounding errors affect the performance of the three piecewise estimators and propose a modified U-statistic to handle tied multivariate competing risks data. For the modified U-statistic, we simply adjust for concordant and discordant pairs for the tied events. It is also fast to compute the modified U-estimate, e.g. taking about a second to obtain the modified U-estimate and standard error for the case of
It is worth pointing out that our method can also handle other-cause association and across-cause association. However, the estimators may not have satisfactory performance under settings in which it is relatively rare to observe two different types of outcomes for two relatives. This is also true for the primary cause. In the Cache County Study, the onset of dementia is relatively rare. Thus, a large sample is needed to warrant reliable estimation of the familial association in dementia, and it is more crucial to use the information from all children in the estimation. If a large sample is not available, some parametric or semiparametric models such as the frailty model considered in Gorfine and Hsu [11] may be more efficient in estimating the familial association of a rare event among family members.
Appendix
Asymptotic properties of ζ ˆ C M
Recall the extended association measure for mother–children data within region
where
as given in eq. (6).
Following the arguments in Cheng and Fine [7], we establish consistency and asymptotic normality as below. For simplicity, in the sequel, we omit the integration region
where
where
as
as
Asymptotic normality of
Note that
since we assume constant
By the asymptotic linearity, and assuming
Asymptotic properties of θ ˜ C M L
Without confusion we simply denote
where
Define
Then
Let
Next, we will establish the asymptotic normality of the estimator using a master theorem of Z-estimators. For any
since
Formula for corrected U
We now define concordant and discordant pairs for tied sibship observations as follows:
Similarly we define concordant and discordant pairs for the reversed tied observations under exchangeability
Acknowledgments
The authors would like to thank the two referees and Dr. Nancy Pfenning for their constructive comments. We are grateful to the Cache County Study Steering Committee for providing the dementia dataset. This research was supported by the National Science Foundation grants DMS-0906449 and DMS-1207711 to Cheng.
References
1. Hougaard P. Analysis of multivariate survival data. New York: Springer, 2000.10.1007/978-1-4612-1304-8Search in Google Scholar
2. Pruitt RC. Identifiability of bivariate survival curves from censored data. J Am Stat Assoc 1993;88:573–9.10.1080/01621459.1993.10476309Search in Google Scholar
3. Tsiatis A. A nonidentifiability aspect of the problem of competing risks. Proc Natl Acad Sci 1975;72:20–2.10.1073/pnas.72.1.20Search in Google Scholar PubMed PubMed Central
4. Bandeen-Roche K, Liang K. Modelling multivariate failure time associations in the presence of a competing risk. Biometrika 2002;89:299–314.10.1093/biomet/89.2.299Search in Google Scholar
5. Oakes D. Bivariate survival models induced by frailties. J Am Stat Assoc 1989;84:487–93.10.1080/01621459.1989.10478795Search in Google Scholar
6. Bandeen-Roche K, Ning J. Nonparametric estimation of bivariate failure time associations in the presence of a competing risk. Biometrika 2008;95:221–32.10.1093/biomet/asm091Search in Google Scholar PubMed PubMed Central
7. Cheng Y, Fine JP. Nonparametric estimation of cause-specific cross hazard ratio with bivariate competing risks data. Biometrika 2008;95:233–40.10.1093/biomet/asm089Search in Google Scholar
8. Shih JH, Albert PS. Modeling familial association of ages at onset of disease in the presence of competing risk. Biometrics 2010;66:1012–923.10.1111/j.1541-0420.2009.01372.xSearch in Google Scholar PubMed PubMed Central
9. Shih JH, Louis TA. Inferences on the association parameter in copula models for bivariate survival data. Biometrics 1995;51:1384–99.10.2307/2533269Search in Google Scholar
10. Cheng Y, Fine JP, Bandeen-Roche K. Association analyses of clustered competing risks data via cross hazard ratio. Biostatistics 2010;11:82–92.10.1093/biostatistics/kxp039Search in Google Scholar PubMed PubMed Central
11. Gorfine M, Hsu L. Frailty-based competing risks model for multivariate survival data. Biometrics 2011;67:415–26.10.1111/j.1541-0420.2010.01470.xSearch in Google Scholar PubMed PubMed Central
12. Katki HA, Blackford A, Chen S, Parmigiani G. Multiple diseases in carrier probability estimation: accounting for surviving all cancers other than breast and ovary in BRCAPRO. Stat Med 2008;27:4532–48.10.1002/sim.3302Search in Google Scholar PubMed PubMed Central
13. Clayton DG. A model for association in bivariate life tables and its application in epidemiological studies of familial tendency in chronic disease incidence. Biometrika 1978;65:141–52.10.1093/biomet/65.1.141Search in Google Scholar
14. Oakes D. A model for association in bivariate survival data. J R Stat Soc Ser B Methodological 1982;44:414–22.10.1111/j.2517-6161.1982.tb01222.xSearch in Google Scholar
15. Kosorok MR. Introduction to empirical processes and semiparametric inference. New York: Springer, 2007.10.1007/978-0-387-74978-5Search in Google Scholar
16. van der Vaart AW. Asymptotic statistics.Cambridge: Cambridge University Press, 1998.10.1017/CBO9780511802256Search in Google Scholar
17. Fine JP, Jiang H. On association in a copula with time transformations. Biometrika 2000;87:559–71.10.1093/biomet/87.3.559Search in Google Scholar
18. Lindsay BG. Composite likelihood methods. In: Prabhu NU, editor. Statistical inference from stochastic processes. American Mathematical Society, 1988:221–39.10.1090/conm/080/999014Search in Google Scholar
19. Ning J, Bandeen-Roche K. Estimation of time-dependent association for bivariate failure times in the presence of a competing risk. Biometrics 2014;70:10–20.10.1111/biom.12110Search in Google Scholar PubMed PubMed Central
20. Cheng Y, Fine JP, Kosorok MR. Nonparametric analysis of bivariate competing risks data. J Am Stat Assoc 2007;102:1407–16.10.1198/016214507000001157Search in Google Scholar
21. Cheng Y, Fine JP. Cumulative incidence association models for bivariate competing risks data. J R Stat Soc Ser B 2012. DOI: DOI: 10.1111/j.1467-9868.2011.01012.x.DOI: 10.1111/j.1467-9868.2011.01012.xSearch in Google Scholar
22. Cheng Y. Modeling cumulative incidences of dementia and dementia-free death using a novel three-parameter logistic function. Int J Biostat 2009. DOI: DOI: 10.2202/1557-4679.1183.DOI: 10.2202/1557-4679.1183Search in Google Scholar
23. Hu T, Nan B, Lin X, Robins JM. Time-dependent cross ratio estimation for bivariate failure times. Biometrika 2011;98:341–54.10.1093/biomet/asr005Search in Google Scholar PubMed PubMed Central
© 2014 by Walter de Gruyter Berlin / Boston
Articles in the same Issue
- Frontmatter
- Research Articles
- On the Correction of the Asymptotic Distribution of the Likelihood Ratio Statistic If Nuisance Parameters Are Estimated Based on an External Source
- Large Sample Bounds on the Survivor Average Causal Effect in the Presence of a Binary Covariate with Conditionally Ignorable Treatment Assignment
- Estimation of a Predictor’s Importance by Random Forests When There Is Missing Data: RISK Prediction in Liver Surgery using Laboratory Data
- Optimal Design Strategies for Sibling Studies with Binary Exposures
- Piecewise Cause-Specific Association Analyses of Multivariate Untied or Tied Competing Risks Data
- A Comparison of Exact Tests for Trend with Binary Endpoints Using Bartholomew’s Statistic
- Semiparametric Odds Rate Model for Modeling Short-Term and Long-Term Effects with Application to a Breast Cancer Genetic Study
- Classification in Postural Style Based on Stochastic Process Modeling
- Fuzzy Set Regression Method to Evaluate the Heterogeneity of Misclassifications in Disease Screening with Interval-Scaled Variables: Application to Osteoporosis (KCIS No. 26)
- Improving Dietary Exposure Models by Imputing Biomonitoring Data through ABC Methods
- Multiple Curve Comparisons with an Application to the Formation of the Dorsal Funiculus of Mutant Mice
Articles in the same Issue
- Frontmatter
- Research Articles
- On the Correction of the Asymptotic Distribution of the Likelihood Ratio Statistic If Nuisance Parameters Are Estimated Based on an External Source
- Large Sample Bounds on the Survivor Average Causal Effect in the Presence of a Binary Covariate with Conditionally Ignorable Treatment Assignment
- Estimation of a Predictor’s Importance by Random Forests When There Is Missing Data: RISK Prediction in Liver Surgery using Laboratory Data
- Optimal Design Strategies for Sibling Studies with Binary Exposures
- Piecewise Cause-Specific Association Analyses of Multivariate Untied or Tied Competing Risks Data
- A Comparison of Exact Tests for Trend with Binary Endpoints Using Bartholomew’s Statistic
- Semiparametric Odds Rate Model for Modeling Short-Term and Long-Term Effects with Application to a Breast Cancer Genetic Study
- Classification in Postural Style Based on Stochastic Process Modeling
- Fuzzy Set Regression Method to Evaluate the Heterogeneity of Misclassifications in Disease Screening with Interval-Scaled Variables: Application to Osteoporosis (KCIS No. 26)
- Improving Dietary Exposure Models by Imputing Biomonitoring Data through ABC Methods
- Multiple Curve Comparisons with an Application to the Formation of the Dorsal Funiculus of Mutant Mice