
Group MCP for Cox Models with Time-Varying Coefficients

Xiaodong Xie and Shaozhi Zheng
Published/Copyright: October 25, 2016

Abstract

Cox’s proportional hazards model with time-varying coefficients offers much flexibility for modeling the dynamics of covariate effects. Although many variable selection procedures have been developed for Cox’s proportional hazards model, the study of such models with time-varying coefficients appears to be limited. Variable selection methods involving nonconvex penalty functions, such as the minimax concave penalty (MCP), introduce numerical challenges, but they have attractive theoretical properties and have been shown to be worthwhile alternatives to other competitive methods. We propose a group MCP method that expands each coefficient over a B-spline basis and maximizes the log partial likelihood with nonconvex penalties on the resulting groups of regression coefficients. A fast, iterative group shooting algorithm is used for model selection and estimation. Under appropriate conditions, the simulated example shows that our method performs competitively with the group lasso method. In our comparison, the group MCP and group lasso methods select the same number of important covariates, but the group MCP method tends to outperform the group lasso in excluding unimportant covariates.

1 Introduction

Cox’s proportional hazards model, introduced by Cox[1,2], has become the most common method for analyzing time-to-event data. In the classical setting, where the number of covariates p is fixed and the sample size n is large, many variable selection techniques for linear regression models have been extended to the Cox model, such as best-subset selection, stepwise selection, asymptotic procedures based on score tests, Wald tests and other approximate chi-square testing procedures, bootstrap procedures[3], and Bayesian variable selection[4,5]. However, the theoretical properties of these methods are generally unknown[6]. Recently, a number of regularization methods have been proposed for the Cox model with time-independent coefficients. The lasso, proposed for simultaneous coefficient estimation and variable selection[7], was extended to Cox models[8]. Fan and Li[6] extended the smoothly clipped absolute deviation (SCAD) penalty[9] to Cox models. Zhang and Lu[10] proposed the adaptive lasso with an adaptively weighted L1 penalty on each coefficient. For Cox models with time-varying coefficients[11], which provide more flexibility, and more challenges, than parametric linear models, the literature appears to be limited. Lin and Zhang[12] proposed a component selection and smoothing operator in the framework of smoothing spline analysis of variance, and Leng and Zhang[13] extended this approach to varying-coefficient Cox models. For nonparametric additive models, Huang, Horowitz, and Wei[14] proposed the adaptive group lasso method, which approximates the additive components with B-spline expansions and selects the nonzero components by selecting the groups of coefficients in the expansion. This approach was later extended to Cox models with time-varying coefficients by Yan and Huang[15]; their method not only selects important variables but also chooses between time-independent and time-varying specifications.

Recent technological advances have made it possible to collect a huge amount of covariate information. However, due to the high dimensionality of the predictor space, the standard maximum Cox partial likelihood method cannot be applied directly to obtain the parameter estimator. In the high-dimensional, low-sample-size setting for Cox models, Li and Luan[16] investigated L2-penalized estimation, using the kernel trick to reduce the computation to the inversion of a matrix whose dimension is the sample size. Li and Gui[17] proposed to use the LARS algorithm to obtain the solution for the Cox model with an L1 penalty. Tibshirani[18] used a univariate shrinkage method, which assumes independence of the covariates in each risk set so that the partial likelihood factors into a product; variables are then entered into the model based on the size of their Cox score statistics. Bradic, Fan and Jiang[19] proposed a penalized partial likelihood approach employing a class of folded-concave penalties for the Cox parametric relative risk model with non-polynomial (NP) dimensionality. Fan, Feng and Wu[20] extended sure independence screening (SIS) and iterative SIS (ISIS) to the Cox model and showed its encouraging performance.

In this paper, we study Cox models with time-varying coefficients and apply the group minimax concave penalty (group MCP)[21] method, which uses smooth functions for the coefficients and maximizes the log partial likelihood with nonconvex penalties on groups of regression coefficients[22]. Although variable selection methods involving nonconvex penalty functions, such as SCAD[9] and MCP[23], may introduce numerical challenges in fitting these models, they are worthwhile alternatives to the lasso in many applications and have attractive theoretical properties[24]. Under appropriate conditions, Yang, Huang and Zhou[21] showed that the group MCP has the oracle selection property: it correctly selects the important variables with probability converging to one. Furthermore, these concave penalties can be applied to high-dimensional problems. In this paper, each time-varying coefficient is expanded over a B-spline basis and thereby characterized by a set of basis coefficients, which are treated as a group. We select significant variables by applying the group MCP approach over these groups of basis coefficients[25] with a fast, iterative group shooting algorithm.

The rest of the paper is organized as follows. Section 2 proposes a group MCP method with a penalized partial likelihood based on B-splines. Computational details of the proposed model selection procedure, together with variance estimation for the estimated coefficients, are described in Section 3. Section 4 gives numerical comparisons between the group MCP and the group lasso. Section 5 gives a discussion.

2 Group MCP with B-Splines

Suppose a random sample of n individuals is observed. Let $T_i$ and $C_i$ be the failure time and censoring time of subject i, i = 1, 2, · · · , n, respectively. Define the observed time $\tilde{T}_i = \min\{T_i, C_i\}$ and the censoring indicator $\delta_i = I(T_i \leq C_i)$, i = 1, 2, · · · , n. Let $X_i = (X_{i1}, X_{i2}, \cdots, X_{ip})^T$ be the covariate vector of subject i. Assume that $T_i$ and $C_i$ are conditionally independent given $X_i$, and that the censoring mechanism is uninformative. Our data are independent and identically distributed triplets $\{\tilde{T}_i, \delta_i, X_i\}$, i = 1, 2, · · · , n.

The Cox model with time-varying coefficients is

(1)  $h(t \mid X_i) = h_0(t) \exp\{X_i^T \beta(t)\}$,

where $h_0(t)$ is an unspecified baseline hazard function and $\beta(t)$ is a p × 1 vector of time-varying coefficients. Let $(B_1(t), B_2(t), \cdots, B_q(t))^T$ be a set of B-spline basis functions with q degrees of freedom, without intercept, on a predetermined time interval [0, τ]. This B-spline basis can be obtained from the function bs in the package splines in base R[26]. In our simulation and analysis, we used the function bs with quadratic splines with q degrees of freedom (df = q), equally spaced interior knots, and intercept = FALSE. Further assume that $\beta(t)$ is expanded in the B-spline basis, that is, $\beta(t) = \Theta F(t)$, where $F(t) = (B_1(t), B_2(t), \cdots, B_q(t))^T$ and Θ is a p × q matrix of parameters to be estimated. Therefore, each time-varying coefficient $\beta_j(t) = \Theta_j F(t)$, j = 1, 2, · · · , p, is determined by the jth row $\Theta_j$ of the parameter matrix Θ.
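To make the basis construction concrete, here is a minimal R sketch using bs from the splines package cited above; the time grid, q = 5, and the placeholder Θ are our illustrative choices, not values from the paper.

    # Quadratic B-spline basis F(t) on [0, tau]; q = 5 and tau = 2 are illustrative.
    library(splines)
    q     <- 5
    tau   <- 2
    tgrid <- seq(0, tau, length.out = 200)
    # df = q, degree = 2, no intercept; bs() places interior knots at quantiles
    # of tgrid, which are equally spaced for an equally spaced grid.
    Fbasis <- bs(tgrid, df = q, degree = 2, intercept = FALSE)   # 200 x q matrix
    # Given a p x q coefficient matrix Theta, beta(t) = Theta F(t):
    p      <- 3
    Theta  <- matrix(rnorm(p * q), nrow = p)   # placeholder coefficients
    beta_t <- Theta %*% t(Fbasis)              # p x 200: beta_j(t) on the grid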

For simplicity, assume that there are no ties among the observed failure times. In the presence of ties, we may use the technique of Breslow[27]. Let θ = vech(Θ), the vectorization of Θ by row, and let $R_i = \{j : \tilde{T}_j \geq \tilde{T}_i\}$ be the risk set at time $\tilde{T}_i$. The log partial likelihood function is then given by

(2)  $\ell_n(\theta) = \sum_{i=1}^{n} \delta_i \Big[ X_i^T \Theta F(\tilde{T}_i) - \log\Big\{ \sum_{j \in R_i} \exp\big(X_j^T \Theta F(\tilde{T}_i)\big) \Big\} \Big]$.
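A direct R transcription of (2) may help fix ideas. This is a sketch under the no-ties assumption; the argument names (time, status, X, Theta, Ffun) are ours.

    # Log partial likelihood (2) for the B-spline expanded Cox model.
    # time:  observed times; status: censoring indicators (1 = event)
    # X:     n x p covariate matrix; Theta: p x q coefficient matrix
    # Ffun:  function returning the q-vector F(t) of basis values at t
    log_partial_lik <- function(time, status, X, Theta, Ffun) {
      ll <- 0
      for (i in which(status == 1)) {
        eta  <- drop(X %*% Theta %*% Ffun(time[i]))  # X_j' Theta F(T_i), all j
        risk <- time >= time[i]                      # risk set R_i
        ll   <- ll + eta[i] - log(sum(exp(eta[risk])))
      }
      ll
    }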

Following Tibshirani[8] and Fan and Li[6], we estimate θ by minimizing the penalized negative log partial likelihood function

(3)  $Q_{\lambda,\gamma}(\theta) = -\frac{1}{n} \ell_n(\theta) + P(\theta; \lambda, \gamma)$,

where $P(\theta; \lambda, \gamma)$ is a penalty function that penalizes coefficient estimates in groups, with a tuning penalty parameter λ[25] and a regularization parameter γ.

To select significant variables, we first partition Θ. A straightforward and useful way is to put each row of Θ into its own group, which leads to the penalty function

(4)  $P(\theta; \lambda, \gamma) = \sum_{j=1}^{p} \rho_{\lambda,\gamma}(\|\Theta_j\|)$.

Here ρ applies a penalty to each group of parameters as a whole. In general, any penalty function that works well for individual variable selection can be extended to such a group penalty. For instance, the resulting criterion is the group lasso[25] if we use the L2 penalty, while it becomes the group MCP[21] if the MCP penalty function is used.

For the group lasso method[25], the group penalty is

(5)  $\rho_{\lambda,\gamma}(\|\Theta_j\|) = \lambda \|\Theta_j\|$.

For the group MCP, the penalty is defined as

(6)  $\rho_{\lambda,\gamma}(\|\Theta_j\|) = \lambda \int_0^{\|\Theta_j\|} \Big(1 - \frac{x}{\lambda\gamma}\Big)_+ \, dx$,

with gradient

(7)  $\nabla_{\Theta_j}\, \rho_{\lambda,\gamma}(\|\Theta_j\|) = \lambda \Big(1 - \frac{\|\Theta_j\|}{\lambda\gamma}\Big)_+ \times \frac{\Theta_j}{\|\Theta_j\|}$

for λ ≥ 0 and γ > 1, where $(a)_+ = a I\{a \geq 0\}$ is the nonnegative part of a. The regularization parameter γ controls the degree of concavity: a larger value of γ makes ρ less concave[24]. A detailed discussion of the MCP can be found in Zhang[23]. Note that the group lasso can be considered a special case of the group MCP obtained by taking γ → ∞. This penalty treats $\Theta_j$ as a whole group, and each covariate coefficient is penalized via a single penalty: when $\Theta_j$ is nonzero, the corresponding covariate coefficient $\beta_j$ is nonzero, and $\beta_j$ is zero if $\Theta_j$ is zero.
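For reference, (6) and (7) have simple closed forms; a small R sketch (function names are ours):

    # Group MCP penalty (6) and its derivative, as functions of u = ||Theta_j||.
    rho_mcp <- function(u, lambda, gamma) {
      # lambda * integral_0^u (1 - x/(lambda*gamma))_+ dx in closed form
      ifelse(u <= lambda * gamma,
             lambda * u - u^2 / (2 * gamma),
             lambda^2 * gamma / 2)                 # penalty is flat beyond lambda*gamma
    }
    drho_mcp <- function(u, lambda, gamma) {
      lambda * pmax(1 - u / (lambda * gamma), 0)   # -> lambda as gamma -> Inf (group lasso)
    }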

3 Computation

A convex objective is desirable because convexity ensures that the algorithm converges to the unique global minimum once it converges to a critical point of the objective function[24]. The MCP penalty function is concave, which introduces the challenge that, in general, the algorithm is not guaranteed to converge to a global minimum. However, Breheny and Huang[24] proposed a convexity diagnostic measure and showed that the objective function may still be convex even though it contains a nonconvex penalty component. In particular, the MCP objective function is convex if γ > 1/c*, where c* is the minimum eigenvalue of $n^{-1} X^T X$, and then the algorithm can converge to the global minimum.
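This diagnostic is cheap to compute; a sketch in R (X is the working design matrix, and the function name is ours):

    # Convexity check of Breheny and Huang [24]: the penalized objective is
    # convex when gamma > 1/c*, with c* the smallest eigenvalue of X'X/n.
    convexity_ok <- function(X, gamma) {
      cstar <- min(eigen(crossprod(X) / nrow(X), symmetric = TRUE,
                         only.values = TRUE)$values)
      cstar > 0 && gamma > 1 / cstar   # fails when c* <= 0, e.g. if p > n
    }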

3.1 Iterative Group Shooting Algorithm

In this paper, we minimize $Q_{\lambda,\gamma}(\theta)$ by a modified iterative group shooting algorithm[15]. The algorithm approximates the negative log partial likelihood by a quadratic form through a Newton-Raphson update and applies an iterative least squares procedure[10] to the group penalty. Define $G = \nabla \ell_n(\theta) = \partial \ell_n(\theta)/\partial \theta$ and $H = -\nabla^2 \ell_n(\theta) = -\partial^2 \ell_n(\theta)/\partial \theta \partial \theta^T$. Consider the Cholesky decomposition $X^T X = H$ and set the pseudo response vector $Y = (X^T)^{-1}\{G + H\theta^{(0)}\}$, where $\theta^{(0)}$ is the current value. Then $-\ell_n(\theta)$ can be approximated, up to a constant, by the quadratic form $\frac{1}{2}(Y - X\theta)^T(Y - X\theta)$ via a second-order Taylor expansion, and the minimization of $Q_{\lambda,\gamma}(\theta)$ is approximated by the minimization of

(8)  $\frac{1}{2}(Y - X\theta)^T (Y - X\theta) + \sum_{j=1}^{p} \rho_{\lambda,\gamma}(\|\Theta_j\|)$.

According to Yuan and Lin[25], a necessary and sufficient condition for θ to be a solution of this penalized least squares problem is

(9)  $-X_j^T (Y - X\theta) + \lambda \Big(1 - \frac{\|\theta_j\|}{\lambda\gamma}\Big)_+ \frac{\theta_j}{\|\theta_j\|} = 0, \quad \theta_j \neq 0$,
(10)  $\big\|X_j^T (Y - X\theta)\big\| \leq \lambda, \quad \theta_j = 0$.

Let $S_j = X_j^T (Y - X\theta_{-j})$, where $\theta_{-j} = (\theta_1^T, \cdots, \theta_{j-1}^T, 0^T, \theta_{j+1}^T, \cdots, \theta_p^T)^T$; then condition (9) is equivalent to

(11)  $S_j = \Big( X_j^T X_j + \lambda \Big( \|\theta_j\|^{-1} - \frac{1}{\lambda\gamma} \Big)_+ I_q \Big) \theta_j$.

Rearranging (11) slightly, we obtain the iteration

(12)  $\theta_j^{(1)} = \Big( X_j^T X_j + \lambda \Big( \|\theta_j^{(0)}\|^{-1} - \frac{1}{\lambda\gamma} \Big)_+ I_q \Big)^{-1} S_j$.

Note that this reduces to the closed-form solution of Yuan and Lin[25] when $X_j^T X_j = I_q$. For any fixed tuning penalty parameter, the complete algorithm is as follows.

1) Initialize with $\theta^{(0)}$.

2) Calculate G, H, X, Y and $S_j$ based on the current value $\theta^{(0)}$, j = 1, 2, · · · , p.

3) For each j, obtain $\theta_j^{(1)}$ from

$\theta_j^{(1)} = \begin{cases} \Big( X_j^T X_j + \lambda \big( \|\theta_j^{(0)}\|^{-1} - \frac{1}{\lambda\gamma} \big)_+ I_q \Big)^{-1} S_j, & \|S_j\| > \lambda, \\ 0, & \|S_j\| \leq \lambda. \end{cases}$

4) Let $\theta_j^{(0)} = \theta_j^{(1)}$ and go back to steps 2) and 3) until the convergence criterion is met.

Note that each coefficient is computed based on the most recent value of $\theta_j^{(0)}$ in each iteration, so a coefficient stays at zero once it is shrunk to zero[6]. As mentioned earlier, although the penalty function is nonconvex, the algorithm converges to a global minimum if γ > 1/c*. The algorithm can also be viewed as a special case of the block coordinate descent method, so it is guaranteed to converge to a local minimum[28]. It converges quickly under a moderate tolerance in our simulation studies.
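A compact R sketch of one sweep of steps 2)-3) for fixed λ and γ may be useful; Xmat and Y stand for the Cholesky-based design and pseudo response defined above, and all names are ours.

    # One sweep of the group shooting update (12) over the p groups.
    # theta is the current p*q coefficient vector, grouped by rows of Theta.
    group_shoot_sweep <- function(Xmat, Y, theta, p, q, lambda, gamma) {
      for (j in 1:p) {
        idx <- ((j - 1) * q + 1):(j * q)
        Xj  <- Xmat[, idx, drop = FALSE]
        th  <- theta; th[idx] <- 0                       # theta_{-j}
        Sj  <- drop(crossprod(Xj, Y - Xmat %*% th))
        if (sqrt(sum(Sj^2)) <= lambda) {
          theta[idx] <- 0                                # group shrunk to zero
        } else {
          nrm <- max(sqrt(sum(theta[idx]^2)), 1e-8)      # guard against a zero start
          pen <- lambda * max(1 / nrm - 1 / (lambda * gamma), 0)
          theta[idx] <- solve(crossprod(Xj) + pen * diag(q), Sj)
        }
      }
      theta
    }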

3.2 Variance Estimation and Parameter Tuning

Fan and Li[6] proposed standard error formulae based on approximated solutions. Following their method, when our algorithm converges, the group MCP solution can be approximated by

$\hat{\theta}^{(1)} = \hat{\theta}^{(0)} - \big\{ \nabla^2 \ell_n(\hat{\theta}^{(0)}) + \Sigma(\hat{\theta}^{(0)}; \lambda) \big\}^{-1} \times \big\{ \nabla \ell_n(\hat{\theta}^{(0)}) + U(\hat{\theta}^{(0)}; \lambda) \big\}$,

where

$\Sigma(\hat{\theta}^{(0)}; \lambda) = \mathrm{diag}\Big\{ \lambda \Big( \|\hat{\theta}_1^{(0)}\|^{-1} - \frac{1}{\lambda\gamma} \Big)_+ I_{(p_1)}, \cdots, \lambda \Big( \|\hat{\theta}_p^{(0)}\|^{-1} - \frac{1}{\lambda\gamma} \Big)_+ I_{(p_p)} \Big\}$,

$p_i$ is the dimension of $\hat{\theta}_i^{(0)}$ and $I_{(p_i)}$ is the identity matrix of dimension $p_i$ (in our simulation, all $p_i = q$), and

$U(\hat{\theta}^{(0)}; \lambda) = \Sigma(\hat{\theta}^{(0)}; \lambda)\, \hat{\theta}^{(0)} = \Big( \lambda \Big( \|\hat{\theta}_1^{(0)}\|^{-1} - \frac{1}{\lambda\gamma} \Big)_+ \hat{\theta}_1^{(0)T}, \cdots, \lambda \Big( \|\hat{\theta}_p^{(0)}\|^{-1} - \frac{1}{\lambda\gamma} \Big)_+ \hat{\theta}_p^{(0)T} \Big)^T$.

Using methods similar to those in Fan and Li[6], the covariance of $\hat{\theta}_{NZ}$, the nonzero components of the group MCP estimator, can be approximated by the sandwich formula $\widehat{\mathrm{cov}}(\hat{\theta}_{NZ}) = A\, \widehat{\mathrm{cov}}\{\nabla \ell_n(\hat{\theta}_{NZ}, 0)\}\, A$, where $A = \{\nabla^2 \ell_n(\hat{\theta}_{NZ}, 0) + \Sigma(\hat{\theta}_{NZ}; \lambda)\}^{-1}$, and then the covariance estimator of a nonzero coefficient is

$\widehat{\mathrm{cov}}\{\hat{\beta}_j(t)\} = F^T(t)\, \widehat{\mathrm{cov}}(\hat{\Theta}_j)\, F(t)$,

which can be used to construct pointwise confidence intervals for the corresponding coefficient, where $\widehat{\mathrm{cov}}(\hat{\Theta}_j)$ is the covariance estimator of $\hat{\Theta}_j$.
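Concretely, the pointwise intervals plotted in Section 4 can be formed as in this sketch (cov_Theta_j stands for the q x q block of the sandwich covariance for $\hat{\Theta}_j$; all names are ours):

    # Pointwise 95% confidence interval for beta_j(t) from cov(Theta_j-hat).
    beta_ci <- function(t, Theta_j_hat, cov_Theta_j, Ffun) {
      Ft  <- Ffun(t)                        # q-vector of basis values at t
      est <- sum(Theta_j_hat * Ft)          # beta_j(t) = Theta_j F(t)
      se  <- sqrt(drop(t(Ft) %*% cov_Theta_j %*% Ft))
      c(lower = est - 1.96 * se, estimate = est, upper = est + 1.96 * se)
    }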

The MCP method contains two tuning parameters, λ and γ. In practice, we could search for the best pair (λ, γ) over a two-dimensional grid using an information criterion such as AIC or BIC. However, such an implementation can be computationally expensive. To determine the regularization parameter γ, Zhang[23] suggested using $\gamma = 2/(1 - \max_{j \neq k} |x_j^T x_k|/n)$ for standardized covariates. Breheny and Huang[24] suggested that γ = 3 is a reasonable choice. In our studies, we experimented with different γ values and reached similar results. Therefore, we set γ = 3 for simplicity. More importantly, γ = 3 meets the condition γ > 1/c*, so the algorithm can converge to the global minimum.

We use generalized cross-validation[29] to estimate the tuning parameter λ. Note that the minimizer of (8) can be approximated by a ridge solution $(H + \lambda D)^{-1} X^T Y$, where

$D = \mathrm{diag}\Big\{ \Big( \|\hat{\theta}_1\|^{-1} - \frac{1}{\lambda\gamma} \Big)_+ I_{(p_1)}, \cdots, \Big( \|\hat{\theta}_p\|^{-1} - \frac{1}{\lambda\gamma} \Big)_+ I_{(p_p)} \Big\}$.

Therefore, the number of effective parameters of the group MCP estimator can be approximated by $p(\lambda) = \mathrm{tr}\{(H + \lambda D)^{-1} H\}$, and the generalized cross-validation function is

$\mathrm{GCV}(\lambda) = \frac{-\ell_n(\hat{\theta})}{n\{1 - p(\lambda)/n\}^2}$.

The optimal λ is chosen as the minimizer of the GCV function over a grid of λ values for fixed γ.
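In code, the criterion is only a few lines; a sketch under the notation above (H, D, and the fitted log partial likelihood are assumed available for each candidate λ, and all names are ours):

    # GCV criterion for selecting lambda at fixed gamma.
    gcv <- function(loglik, H, D, lambda, n) {
      p_lambda <- sum(diag(solve(H + lambda * D) %*% H))  # effective parameters
      -loglik / (n * (1 - p_lambda / n)^2)
    }
    # lambda_opt <- lambdas[which.min(sapply(fits, function(f)
    #                 gcv(f$loglik, f$H, f$D, f$lambda, n)))]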

3.3 Likelihood Derivatives Evaluation

In our algorithm, an important issue is the computation of the derivatives of the log partial likelihood function, which involve all cases in the risk sets. Perperoglou et al.[30] proposed a fast and efficient routine based on a Newton-Raphson algorithm. Following their methods, the gradient vector is

$\nabla \ell_n(\theta) = \frac{\partial \ell_n(\theta)}{\partial \theta} = \sum_{i=1}^{n} \delta_i \big( X_i - \bar{X}_i(\Theta) \big) \otimes F(\tilde{T}_i)$,

where ⊗ is the Kronecker product and

$\bar{X}_i(\Theta) = \frac{\sum_{j \in R_i} X_j \exp\big( X_j^T \Theta F(\tilde{T}_i) \big)}{\sum_{j \in R_i} \exp\big( X_j^T \Theta F(\tilde{T}_i) \big)}$

is the mean of the covariate vectors $X_j$ in risk set $R_i$, weighted by $\exp(X_j^T \Theta F(\tilde{T}_i))$. Similarly, the Hessian matrix is

$\nabla^2 \ell_n(\theta) = \frac{\partial^2 \ell_n(\theta)}{\partial \theta \partial \theta^T} = -\sum_{i=1}^{n} \delta_i\, C_i(\Theta) \otimes \{ F(\tilde{T}_i) F^T(\tilde{T}_i) \}$,

where $C_i(\Theta)$ is the covariance matrix of the covariate vectors $X_j$ in risk set $R_i$, again weighted by $\exp(X_j^T \Theta F(\tilde{T}_i))$.
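A sketch of the gradient computation in this weighted-mean form (the Hessian follows analogously with the weighted covariance; all names are ours):

    # Gradient of the log partial likelihood via the weighted means Xbar_i.
    grad_lpl <- function(time, status, X, Theta, Ffun) {
      G <- numeric(ncol(X) * ncol(Theta))
      for (i in which(status == 1)) {
        Ft   <- Ffun(time[i])
        w    <- exp(drop(X %*% Theta %*% Ft)) * (time >= time[i])  # weights on R_i
        Xbar <- drop(crossprod(X, w)) / sum(w)   # weighted mean of X_j over R_i
        G    <- G + kronecker(X[i, ] - Xbar, Ft) # (X_i - Xbar_i) (x) F(T_i)
      }
      G
    }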

4 Numerical Studies

In this section, we compare the performance of our group MCP method and group lasso method under the Cox model with time-varying coefficients.

In our simulation design, we consider three factors: sample size (100 and 150), number of covariates (10 and 20), and censoring rate (20% and 30%).

Event times are generated from the varying-coefficient Cox model with covariate vector X and coefficient vector β(t), whose nonzero components are β2(t) = {1 + cos(πt)}I(0 < t < 1), β3(t) = 1 + 0.5 sin(πt), and β8(t) = 1, for t ∈ (0, 2); the baseline hazard function is $h_0(t) = \exp\{\sin(\pi t/2)\}$. That is, the second and third covariates have time-varying coefficients, the eighth has a constant coefficient, and the remaining of the 10 or 20 covariates have coefficient zero. Note that β2(t) diminishes to zero when t ≥ 1, which makes model selection and estimation harder. The covariate vector X is generated from a multivariate normal distribution whose marginals are all N(0, 0.5) and whose pairwise correlation coefficient for the pair (j, k) is $\rho^{|j-k|}$; we take ρ = 0.5, so the covariates have moderate correlation. Censoring times are generated from a mixture of a uniform distribution over (0, 2) and a point mass at 2, with the mixing probability calibrated to yield the desired censoring percentage cp; this approach was used by Yan and Huang[15].
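The data-generating mechanism can be reproduced along the following lines; this is our sketch, with the cumulative hazard inverted numerically on a fine grid and the mixing probability mix_prob left as an input (the calibration to a target censoring rate is not shown).

    # Simulate one dataset from the varying-coefficient Cox model above.
    simulate_cox_tv <- function(n, p, rho = 0.5, mix_prob = 0.5) {
      stopifnot(p >= 8)
      Sigma <- 0.5 * rho^abs(outer(1:p, 1:p, "-"))     # N(0, 0.5) marginals
      X <- matrix(rnorm(n * p), n) %*% chol(Sigma)
      beta <- function(t) {
        b <- numeric(p)
        b[2] <- (1 + cos(pi * t)) * (t < 1)
        b[3] <- 1 + 0.5 * sin(pi * t)
        b[8] <- 1
        b
      }
      grid <- seq(0, 2, by = 0.001)
      h0   <- exp(sin(pi * grid / 2))                  # baseline hazard
      B    <- sapply(grid, beta)                       # p x length(grid)
      Tev  <- sapply(1:n, function(i) {
        H <- cumsum(h0 * exp(drop(X[i, ] %*% B)) * 0.001)  # cumulative hazard
        u <- rexp(1)
        if (u > max(H)) Inf else grid[which(H >= u)[1]]    # invert H numerically
      })
      C <- ifelse(runif(n) < mix_prob, 2, runif(n, 0, 2))  # censoring mixture
      list(time = pmin(Tev, C), status = as.numeric(Tev <= C), X = X)
    }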

For each scenario, 100 datasets are generated. Given a simulated dataset, we use quadratic B-splines with 5 degrees of freedom and equally spaced knots in the time window (0, 2) for each covariate coefficient. This gives two equally spaced interior knots in (0, 2). Generalized cross-validation is used to estimate the tuning parameter λ for both the group lasso and group MCP methods, as described above.

To measure selection and prediction accuracy, we report the average number of groups selected (NG) and the average mean squared error (MSE) over the 100 runs in each scenario. The "correct" NG is 3 for both group MCP and group lasso. Following Tibshirani[8], the MSE is calculated as $\{\hat{\beta}(t) - \beta(t)\}^T V \{\hat{\beta}(t) - \beta(t)\}$, where V is the population covariance matrix of the covariates, and we report the average of the pointwise MSE.
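For completeness, the reported quantity can be computed as in this sketch (beta_hat and beta_true are functions of t returning the p-vector of coefficients; all names are ours):

    # Average pointwise MSE {beta_hat(t) - beta(t)}' V {beta_hat(t) - beta(t)}.
    mse_pointwise <- function(beta_hat, beta_true, V, grid) {
      mean(sapply(grid, function(t) {
        d <- beta_hat(t) - beta_true(t)
        drop(t(d) %*% V %*% d)
      }))
    }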

In our simulation study, the group MCP and group lasso methods behave similarly on the important covariates X2, X3 and X8. Due to its diminishing effect, covariate X2 is difficult to select, especially with a small sample size. The group lasso clearly selects unimportant covariates more frequently than the group MCP method. However, the group lasso method has smaller MSE than the group MCP method.

Several observations, such as the variable selection results, NG and MSE from the 100 runs with 10 covariates, can be made from Table 1. Covariates X3 and X8 are selected most of the time, and covariate X2 is selected less often. When the sample size increases, both methods perform better, as expected. For example, in the scenario with sample size n = 100 and censoring rate cp = 30%, covariate X2 is selected about half of the time by both methods. As the sample size increases to 150, covariate X2 is selected most of the time, with smaller MSE and less overselection. The MSE of the group MCP method in the scenario n = 100, cp = 30% with 10 covariates is 0.526, and it is reduced to 0.349 in the scenario n = 150, cp = 30%, which means that the group MCP method selects important variables more accurately as the sample size increases. The unimportant covariates are also selected less often overall. As the censoring rate increases, both methods perform worse, especially with small samples. In the scenario n = 100 and cp = 20%, both methods select covariate X2 about three-quarters of the time, but only about half of the time when cp = 30%. Covariates X3 and X8 are also selected less often at the same time.

Table 1

Model selection results with 10 covariates for the group MCP and group lasso methods (counts of selection out of 100 runs; within each scenario, the first row is group MCP and the second is group lasso)

n    cp   X1   X2   X3   X4  X5  X6  X7   X8   X9  X10   NG    MSE
100  0.2   3   74   99    5   2   3  13  100    9    3   3.11  0.421
           3   74   99   10   2   3  13  100   11    4   3.19  0.405
     0.3   0   53   98    6   3   2  10   99    6    3   2.80  0.526
           0   53   98    6   4   2  12   99    6    5   2.85  0.554
150  0.2   4  100  100    3   2   1   3  100    2    1   3.16  0.331
           5  100  100    4   3   2   5  100    4    2   3.25  0.286
     0.3   4   99  100    3   2   1   3  100    2    1   3.15  0.349
           5   99  100    5   2   3   3  100    4    2   3.23  0.295

It is worth noting the following phenomenon: on the one hand, the MSE for a small sample size with a high censoring rate is significantly higher than in the other settings; on the other hand, the group MCP has a smaller MSE than the group lasso method in that setting. For example, although the group lasso performs better than the group MCP in terms of mean squared error in most situations, in the scenario n = 100, cp = 30% the MSE of the group lasso method is 0.554, while the MSE of the group MCP method is 0.526. That is, the group MCP method performs better than the group lasso method in this low-sample-size, high-censoring-rate setting. We conducted more simulations with higher censoring rates to study the performance of the two methods; Table 2 summarizes the results. Both scenarios indicate that the group MCP and group lasso methods do poorly, but the group MCP performs better than the competing method. When the censoring rate increases, the observed data carry less complete information. Moreover, the number of parameters to be estimated in Θ is large, so a large sample size is necessary.

Table 2

Model selection results with 10 covariates in the high-censoring-rate setting (within each scenario, the first row is group MCP and the second is group lasso)

n    cp   X1  X2  X3  X4  X5  X6  X7  X8  X9  X10   NG    MSE
100  0.5   2  18  86   8   0   0   8  82  20   4    2.88  0.951
           3  16  86   6   0   2   6  83  14   5    2.21  0.978
     0.7   1  13  22   0   1   4   2   5  20   0    0.68  1.332
           1  13  22   0   1   4   2   5   2   9    0.68  1.337

Figure 1 shows the 100 estimated coefficient curves overlaid with the true curves for the scenarios of n = 100 with 10 covariates, cp = 20% and cp = 30%, respectively. To compare the group MCP and group lasso methods in recovering the coefficients, we also plot the average of the 100 estimated coefficient curves and their pointwise 95% confidence intervals constructed using the method in Section 3.2. In these scenarios, both methods perform well, and the estimated curves are close to the true curves for all three coefficients, especially for coefficients β3 and β8. For coefficient β2, the estimated curves are also close to the true curve in its nonzero part; where it diminishes to zero, the estimated curves diverge slightly. This is not a surprise, because β2(t) remains zero afterward, which makes estimation harder. The group MCP method performs well in that its estimated curves for coefficient β2 are tighter around the true curve than those of the group lasso method. As we observe in Figure 1, the standard errors seem to underestimate the true variation. Overall, the group MCP fits better than the group lasso.

Table 3 shows the model selection results from 100 runs with 20 covariates, and the conclusions are similar to those in Table 1. However, in the low-sample-size, high-censoring-rate setting, the group MCP method seems to select covariate X2 more often and covariates X3 and X8 less often compared to the same setting with 10 covariates.

Table 3

Model selection results with 20 covariates for the group MCP and group lasso methods (within each scenario, the first row is group MCP and the second is group lasso)

n    cp   X1   X2   X3   X4  X5  X6  X7   X8  X9  X10  X11
100  0.2   3   75  100    5   2   1   6  100   9    3    2
           7   75  100    7   2   1   7  100   9    3    2
     0.3   6   60   97    7   3   3  10   95   6    1    1
           6   60   97    9   3   3  10   95   9    1    1
150  0.2   2  100  100    5   1   1   3  100   2    1    1
           3  100  100    7   2   1   4  100   3    1    1
     0.3   3   96  100    5   2   1   5  100   3    1    1
           4   96  100    6   1   2   5  100   4    1    1

n    cp   X12  X13  X14  X15  X16  X17  X18  X19  X20   NG    MSE
100  0.2    2    3    1    0    0    0    0    0    0   3.12  0.395
            2    2    1    1    0    0    0    0    1   3.20  0.381
     0.3    1    1    0    1    0    0    0    0    0   2.92  0.535
            1    1    0    1    0    0    0    0    0   2.97  0.565
150  0.2    0    1    0    0    0    0    0    0    0   3.17  0.314
            1    1    0    0    0    0    0    0    0   3.24  0.274
     0.3    0    1    0    0    0    0    0    0    0   3.18  0.345
            1    1    0    0    0    0    0    0    0   3.22  0.300

Figure 2 shows the 100 estimated coefficient curves overlaid with the true curves for the scenarios of n = 100 with 20 covariates, cp = 20% and cp = 30%, respectively. The conclusions are similar to those from Figure 1: the group MCP fits better than the group lasso.

Figure 1  Estimated curves (gray) of the three nonzero coefficients from 100 runs in the scenario of sample size n = 100 with 10 covariates, cp = 20% and cp = 30%, respectively. The dark lines are the true curves, the dashed lines are the averages of the 100 estimated coefficient curves, and the dotted lines are the pointwise 95% confidence intervals

Figure 2  Estimated curves (gray) of the three nonzero coefficients from 100 runs in the scenario of sample size n = 100 with 20 covariates, cp = 20% and cp = 30%, respectively. The dark lines are the true curves, the dashed lines are the averages of the 100 estimated coefficient curves, and the dotted lines are the pointwise 95% confidence intervals

These simulation results indicate that both the group MCP and the group lasso have good selection and estimation performance in the simulated model. Overall, the performance of the two methods is similar. However, the group MCP method selects fewer unimportant covariates than the group lasso method, yet the MSE of the group lasso is smaller; this was also observed in the study of Yang, Huang and Zhou[21]. In the low-sample-size, high-censoring-rate setting, the group MCP has better selection and estimation performance. On the whole, the group MCP can improve the selection and estimation results over the group lasso.

5 Discussion

In this article, we studied model selection and estimation with the group lasso and the group MCP in Cox models with time-varying coefficients for right-censored failure times. The nonparametric coefficients are fitted with smooth functions expanded over a B-spline basis, and each set of basis coefficients is treated as a group to which the group MCP and group lasso penalties are applied. Our simulation studies indicate that the group MCP outperforms the group lasso in terms of selection and estimation.

Although the selection consistency and asymptotic oracle properties of the group MCP have been proved by Yang, Huang and Zhou[21] for high-dimensional varying-coefficient models, a rigorous proof of the properties of our group MCP estimator in the Cox model is not given in this article. Establishing them is a challenging problem when extending this approach to Cox models with time-varying coefficients, and research into the theoretical properties of the group MCP for such models is needed.

The methods in this article raise several questions. As shown in Figure 1 and Figure 2, although the coefficient β8 is time-independent, it is selected as having a time-varying effect. This may be explained by the fact that we put each row of Θ into a single group, so that every covariate coefficient is treated as time-varying. Yan and Huang[15] penalized a time-independent part and a time-varying part separately for each coefficient, so that their method selects both the significant variables and the temporal dynamics of their effects. This is an efficient and desirable approach that could be adopted to address our problem.

In principle, the group MCP method can be applied to high-dimensional Cox models with time-varying or time-independent coefficients. However, the iterative group shooting algorithm involves matrix computations, and its computational cost in high-dimensional settings is relatively high. Therefore, more work is needed on the theoretical properties and computational procedures for these more complicated models.

References

[1] Cox D R. Regression models and life-tables (with discussion). Journal of the Royal Statistical Society, Series B, 1972, 34(2): 187-220.

[2] Cox D R. Partial likelihood. Biometrika, 1975, 62(2): 269-276. DOI: 10.1093/biomet/62.2.269.

[3] Sauerbrei W, Schumacher M. A bootstrap resampling procedure for model building: Application to the Cox regression model. Statistics in Medicine, 1992, 11(16): 2093-2109. DOI: 10.1002/sim.4780111607.

[4] Faraggi D, Simon R. Bayesian variable selection method for censored survival data. Biometrics, 1998, 54(4): 1475-1485. DOI: 10.2307/2533672.

[5] Ibrahim J G, Chen M H, MacEachern S N. Bayesian variable selection for proportional hazards models. The Canadian Journal of Statistics, 1999, 27(4): 701-717. DOI: 10.2307/3316126.

[6] Fan J Q, Li R Z. Variable selection for Cox's proportional hazards model and frailty model. The Annals of Statistics, 2002, 30(1): 74-99.

[7] Tibshirani R. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society, Series B, 1996, 58(1): 267-288. DOI: 10.1111/j.2517-6161.1996.tb02080.x.

[8] Tibshirani R. The lasso method for variable selection in the Cox model. Statistics in Medicine, 1997, 16(4): 385-395. DOI: 10.1002/(SICI)1097-0258(19970228)16:4<385::AID-SIM380>3.0.CO;2-3.

[9] Fan J Q, Li R Z. Variable selection via nonconcave penalized likelihood and its oracle properties. Journal of the American Statistical Association, 2001, 96(456): 1348-1360. DOI: 10.1198/016214501753382273.

[10] Zhang H H, Lu W B. Adaptive lasso for Cox's proportional hazards model. Biometrika, 2007, 94(3): 691-703. DOI: 10.1093/biomet/asm037.

[11] Hastie T, Tibshirani R. Varying-coefficient models. Journal of the Royal Statistical Society, Series B, 1993, 55(4): 757-796. DOI: 10.1111/j.2517-6161.1993.tb01939.x.

[12] Lin Y, Zhang H H. Component selection and smoothing in multivariate nonparametric regression. The Annals of Statistics, 2006, 34(5): 2272-2297. DOI: 10.1214/009053606000000722.

[13] Leng C L, Zhang H H. Model selection in nonparametric hazard regression. Journal of Nonparametric Statistics, 2006, 18(7-8): 417-429. DOI: 10.1080/10485250601027042.

[14] Huang J, Horowitz J L, Wei F R. Variable selection in nonparametric additive models. The Annals of Statistics, 2010, 38(4): 2282-2313. DOI: 10.1214/09-AOS781.

[15] Yan J, Huang J. Model selection for Cox models with time-varying coefficients. Biometrics, 2012, 68(2): 419-428. DOI: 10.1111/j.1541-0420.2011.01692.x.

[16] Li H Z, Luan Y H. Kernel Cox regression models for linking gene expression profiles to censored survival data. Pacific Symposium on Biocomputing, 2003, 8: 65-76. DOI: 10.1142/9789812776303_0007.

[17] Li H Z, Gui J. Partial Cox regression analysis for high-dimensional microarray gene expression data. Bioinformatics, 2004, 20(1): 208-215. DOI: 10.1093/bioinformatics/bth900.

[18] Tibshirani R. Univariate shrinkage in the Cox model for high dimensional data. Statistical Applications in Genetics and Molecular Biology, 2009, 8(1): 3498-3528. DOI: 10.2202/1544-6115.1438.

[19] Bradic J, Fan J Q, Jiang J C. Regularization for Cox's proportional hazards model with NP-dimensionality. The Annals of Statistics, 2011, 39(6): 3092-3120. DOI: 10.1214/11-AOS911.

[20] Fan J Q, Feng Y, Wu Y C. High-dimensional variable selection for Cox's proportional hazards model. IMS Collections, 2010, 6: 70-86. DOI: 10.1214/10-IMSCOLL606.

[21] Yang G R, Huang J, Zhou Y. Concave group methods for variable selection and estimation in high-dimensional varying coefficient models. Science China Mathematics, 2014, 57(10): 2073-2090. DOI: 10.1007/s11425-014-4842-y.

[22] Zucker D M, Karr A F. Nonparametric survival analysis with time-dependent covariate effects: A penalized partial likelihood approach. The Annals of Statistics, 1990, 18(1): 329-353. DOI: 10.1214/aos/1176347503.

[23] Zhang C H. Nearly unbiased variable selection under minimax concave penalty. The Annals of Statistics, 2010, 38(2): 894-942. DOI: 10.1214/09-AOS729.

[24] Breheny P, Huang J. Coordinate descent algorithms for nonconvex penalized regression, with applications to biological feature selection. The Annals of Applied Statistics, 2011, 5(1): 232-253. DOI: 10.1214/10-AOAS388.

[25] Yuan M, Lin Y. Model selection and estimation in regression with grouped variables. Journal of the Royal Statistical Society, Series B, 2006, 68(1): 49-67. DOI: 10.1111/j.1467-9868.2005.00532.x.

[26] R Development Core Team. R: A Language and Environment for Statistical Computing. Vienna: R Foundation for Statistical Computing, 2011.

[27] Breslow N. Covariance analysis of censored survival data. Biometrics, 1974, 30(1): 89-99. DOI: 10.2307/2529620.

[28] Tseng P, Yun S. Block-coordinate gradient descent method for linearly constrained nonsmooth separable optimization. Journal of Optimization Theory and Applications, 2009, 140(3): 513-535. DOI: 10.1007/s10957-008-9458-3.

[29] Craven P, Wahba G. Smoothing noisy data with spline functions. Numerische Mathematik, 1978, 31(4): 377-403. DOI: 10.1007/BF01404567.

[30] Perperoglou A, le Cessie S, van Houwelingen H C. A fast routine for fitting Cox models with time varying effects of the covariates. Computer Methods and Programs in Biomedicine, 2006, 81(2): 154-161. DOI: 10.1016/j.cmpb.2005.11.006.

Received: 2016-1-19
Accepted: 2016-4-3
Published Online: 2016-10-25

© 2016 Walter de Gruyter GmbH, Berlin/Boston
