
Multiple Comparisons Using Composite Likelihood in Clustered Data

  • Mahdis Azadbakhsh, Xin Gao and Hanna Jankowski
Published/Copyright: December 6, 2016

Abstract

We study the problem of multiple hypothesis testing for correlated clustered data. As existing multiple comparison procedures based on maximum likelihood estimation can be computationally intensive, we propose to construct multiple comparison procedures based on the composite likelihood method. The new test statistics account for the correlation structure within the clusters and are convenient to compute. Simulation studies show that the composite likelihood based procedures maintain good control of the familywise type I error rate in the presence of intra-cluster correlation, whereas ignoring the correlation leads to erratic performance.

1 Introduction

The prevalence of depression in seniors estimated by the World Health Organization varies between 10% and 20% [1]. Understanding the relationship between depression and other health factors can help prevent the disease and alleviate its symptoms. The Health and Retirement Study (HRS) conducted by the University of Michigan is a longitudinal study measuring various aspects of health, retirement, and aging, as well as depression status. In this study, seniors were measured every two years from 1994 to 2012. The objective of our analysis is to estimate the effects of several health factors known to be associated with depression status and to compare the effect sizes of the different factors. Multiple comparisons on the effect sizes will clarify the relative importance of the factors for the disease. For example, sleeplessness and smoking both contribute to the occurrence rate of depression, and one might ask whether one factor is more important than the other for development of the disease. Therefore, to fully understand the effects of the health factors, we perform multiple comparisons on their effect sizes. It is important to note that we only observe associations and do not establish causal relationships of health factors with the disease.

The repeated binary measurements of depression status observed in this data set are correlated within individuals. These repeated measurements can be viewed as “clustered” data since they are recorded from the same experimental unit multiple times. Examples of clustered data arise in many other situations, including measurements coming from siblings or same pedigrees, or measurements taken in close proximity to each other in spatial data. Ignoring existing correlations within clusters leads to invalid individual or multiple inferences.

When performing multiple comparisons in clustered data, one should therefore take into account the correlation structure within the clusters. However, full likelihood analyses on such data often encounter computational challenges. Composite likelihood methods are extensions of the likelihood method that project high-dimensional likelihood functions to low-dimensional ones [2, 3]. This dimension reduction is achieved by compounding valid marginal or conditional densities. It has been shown that, under regularity conditions, the composite likelihood estimator has desirable properties, such as consistency and asymptotic normality [2, 3, 4, 5]. This makes it an appealing alternative in inferential procedures. Furthermore, composite likelihood is more computationally convenient than full likelihood at a cost of some loss of efficiency. The magnitude of this loss depends on the dimension of the multivariate vector and its dependency structure. Composite likelihood methodology has been applied to numerous statistical problems [6, 7, 8], however, the potential of composite likelihood in multiple testing has yet to be explored.

Multiple testing procedures have been developed to control the overall type I error rate when the number of tests is greater than one [9, 10]. The family-wise error rate (FWER) is defined as the probability of falsely rejecting at least one true null hypothesis. The classical Bonferroni method is the simplest procedure for adjusting the FWER. It allows for arbitrary, even maximally negative, correlation between the test statistics, and the resulting FWER does not exceed the significance level in any setting, provided each per-comparison error rate is controlled correctly. However, it is well known to be very conservative. The Dunn-Šidák procedure [11] uses a slightly less conservative p-value threshold for each comparison. This procedure assumes uncorrelated test statistics, which explains its power increase over the Bonferroni method. It provides exact control of the FWER under independence, is conservative under positive dependence, and is liberal under negative dependence; for example, testing $T_i$ and $-T_i$ simultaneously is not possible with Šidák. Scheffé [12] established a method for testing all possible linear comparisons among a set of normally distributed variables, which tends to be over-conservative for a finite family of multiple comparisons. Because Scheffé's method controls all possible linear combinations, a comparison with the other methods is difficult: the methods test different sets of hypotheses, and a smaller set of hypotheses of interest will in most cases result in higher power. Several stage-wise procedures have also been proposed to improve the power. These methods provide shortcuts that avoid testing all intersection hypotheses in closed testing. Simes [13] modified the Bonferroni procedure based on ordered p-values. Holm [14] proposed a multi-stage procedure based on the closed testing principle, adjusting the FWER at each step using the number of remaining null hypotheses. Hommel [15] suggested a stage-wise rejective multiple test based on the Simes inequality. All of these methods are less conservative and therefore more powerful than the Bonferroni method. However, it is difficult to construct simultaneous confidence intervals based on a stage-wise approach. As another alternative, Hothorn et al. [16] proposed using equi-coordinate quantiles of the multivariate normal and multivariate t distributions to perform multiple comparisons in parametric settings. This corresponds to controlling the FWER based on the maximum of the test statistics; we refer to it as the multivariate normal quantile (MNQ) method. The MNQ method allows for any correlation between the test statistics, but requires an asymptotic multivariate normal distribution for the test statistics, which is typically justified by the multivariate central limit theorem. This method offers sharp control of the FWER. The approach has been employed in many parametric and nonparametric settings to provide both multiple inferences and simultaneous intervals [17, 18].

In this paper, we propose a new procedure in which multiple testing methods are combined with composite likelihood inference. This is motivated by multiple testing problems for which full likelihood inference can be computationally complex. We show that the composite likelihood test statistics for multiple hypotheses offer strong control of the FWER when used with closed testing procedures. To illustrate our multiple comparisons approach, we explore in detail different multivariate models for correlated clustered data, including the multivariate normal, multivariate probit, and quadratic exponential models. Among the methods considered, the multivariate normal quantile threshold appears to give the best control of the FWER in most simulation settings.

The structure of this paper is as follows: In Section 2, we develop our composite likelihood based test statistics for multiple inferences and establish their asymptotic properties. In Section 3, we provide details on how to apply the general approach to several multivariate models. In Section 4, we conduct simulation studies to evaluate the empirical performance of the proposed method. Finally, we analyze the depression data set to demonstrate the practical utility of the method.

2 Multiple comparisons procedures based on composite likelihood

Let $\{f(Y;\theta), \theta \in \Theta\}$, where $\theta = (\theta_1, \ldots, \theta_p)^T$, be a parametric statistical model with parameter space $\Theta \subseteq \mathbb{R}^p$. Let $Y = (y_1^T, \ldots, y_n^T)$ denote the response variables, where $y_i = (y_{i1}, \ldots, y_{im_i})^T$ is the vector of observations from cluster $i$, $i = 1, \ldots, n$, from a study population. It is assumed that observations across different clusters are independent, whereas observations within the same cluster may be dependent. Note that the overall sample size is $\sum_{i=1}^n m_i$. We assume that the cluster sizes $m_i$ are uniformly bounded.

Let

$$C = C_{c \times p} = (C^{(1)}, C^{(2)}, \ldots, C^{(c)})^T$$

denote the contrast matrix. A family of $c$ linear combinations of the parameters can then be specified by $C\theta$. Let $H_{0l}$ denote the hypothesis that $C^{(l)}\theta = 0$, for $l = 1, \ldots, c$. We focus here on jointly testing the family of hypotheses $H_{0l}$, $l = 1, \ldots, c$. A multiple testing procedure has weak control of the FWER if the FWER $\leq \alpha$ when all of the null hypotheses are true, whereas it has strong control of the FWER if the FWER $\leq \alpha$ under any combination of null and alternative hypotheses. Composite likelihood was proposed as a pseudo-likelihood inference method and has attracted much attention in recent years [3, 4, 5, 19]. Two popular examples of composite likelihood are the "univariate marginal likelihood" and the "univariate conditional likelihood". In the former, we have the composite likelihood function

$$CL(\theta; Y) = \prod_{i=1}^n \prod_{j=1}^{m_i} f(y_{ij}),$$

where any dependence structure is ignored. In the latter, we take

$$CL(\theta; Y) = \prod_{i,j} f(y_{ij} \mid y_{i(-j)}) = \prod_{i,j} f(y_i)/f(y_{i(-j)}),$$

where $y_{i(-j)}$ denotes the sub-vector of $y_i$ with its $j$th element removed. We consider both types of approaches in our examples. In general, composite likelihood is a compounded form of marginal or conditional likelihoods, which is often easier to maximize than the full likelihood. For $A_k \subseteq \{(i,j): j = 1, \ldots, m_i,\ i = 1, \ldots, n\}$, let $Y_{A_k} = \{Y_{ij}, (i,j) \in A_k\}$ denote a subset of the data, where $k = 1, \ldots, K$. The composite likelihood function is then defined as

$$CL(\theta; Y) = \prod_{k=1}^K f(y_{A_k}; \theta)^{w_{A_k}},$$

where $f(y_{A_k}; \theta)$ is the density for the subset vector $y_{A_k}$, and the $w_{A_k}$ are suitably chosen weights. In practice, the type of composite likelihood should be chosen so that the resulting composite score equation yields consistent estimates of the parameters and the computational complexity remains manageable.
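As a toy numerical illustration (not from the paper), a univariate-marginal composite log-likelihood simply sums univariate log-densities over all observations, ignoring any within-cluster dependence. The sketch below assumes clustered Gaussian data with a common mean and standard deviation; the data values and the helper name `marginal_composite_loglik` are ours.

```python
import numpy as np
from scipy.stats import norm

def marginal_composite_loglik(y, mu, sigma):
    """Univariate-marginal composite log-likelihood: sum the log N(mu, sigma^2)
    densities over all observations, ignoring within-cluster dependence."""
    return sum(norm.logpdf(yi, loc=mu, scale=sigma).sum() for yi in y)

# Two clusters of different sizes; values are made up for illustration.
y = [np.array([0.1, -0.2, 0.3]), np.array([1.0, 0.8])]
cl0 = marginal_composite_loglik(y, mu=0.0, sigma=1.0)
cl1 = marginal_composite_loglik(y, mu=0.5, sigma=1.0)
```

Although the within-cluster dependence is left unmodelled, this function can be maximized over the mean parameter exactly like an ordinary log-likelihood.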

The maximum composite likelihood estimate (MCLE) is defined as

$$\hat\theta_{nc} = \arg\max_{\theta \in \Theta} CL(\theta; Y).$$

Xu and Reid [20] give precise conditions under which $\hat\theta_{nc}$ is consistent for $\theta$. Under appropriate assumptions, $\sqrt{n}(\hat\theta_{nc} - \theta)$ is also asymptotically normally distributed with mean zero and limiting variance given by the inverse of the Godambe information matrix [3, 21], where

(1) $$G^{-1}(\theta) = H^{-1}(\theta)\, J(\theta)\, H^{-1}(\theta),$$

with $H(\theta) = -\lim_{n\to\infty} E(cl^{(2)}(\theta; Y))/n$ and $J(\theta) = \lim_{n\to\infty} \mathrm{var}(cl^{(1)}(\theta; Y))/n$. Here, $cl^{(1)}$ is the vector of first derivatives and $cl^{(2)}$ is the matrix of second-order derivatives of $cl(\theta; Y) = \log CL(\theta; Y)$ with respect to $\theta$. The matrix $H(\theta)$ can be estimated as the negative Hessian matrix evaluated at the maximum composite likelihood estimator, whereas the matrix $J(\theta)$ can be estimated as the sample covariance matrix of the composite score vectors. Both estimators, which we denote by $\hat H_n$ and $\hat J_n$, are consistent [21].

Consider the hypothesis test on a family of linear combinations of the parameters: $\{H_0: C\theta = 0\}$. Denote by $\Gamma = G^{-1}(\theta)$ the inverse Godambe information matrix, and let $\hat\Gamma_n = \hat H_n^{-1} \hat J_n \hat H_n^{-1}$ denote its consistent estimator. We propose the following test statistics for our hypothesis test:

(2) $$T_{l,n} = \frac{C^{(l)T}\hat\theta_{nc}}{\sqrt{C^{(l)T}\hat\Gamma_n C^{(l)}/n}}, \qquad l = 1, \ldots, c.$$

The limiting distribution of $T_n = (T_{1,n}, \ldots, T_{c,n})^T$ is multivariate normal $\mathrm{MVN}(0, V)$, where

(3) $$V = \mathrm{diag}(D)^{-1/2}\, D\, \mathrm{diag}(D)^{-1/2}, \qquad D = C\, G^{-1}(\theta)\, C^T.$$

Furthermore, since $V_{ll} = 1$, the marginal asymptotic distribution of each individual $T_{l,n}$ is standard normal. In practice, we estimate $V$ by plugging $\hat\Gamma_n$, a consistent estimator of $G^{-1}(\theta)$, into eq. (3). This results in a consistent estimator of $V$.
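The statistics in eq. (2) and the plug-in correlation matrix $V$ of eq. (3) are straightforward to compute once $\hat\theta_{nc}$ and $\hat\Gamma_n$ are available. A minimal sketch with toy values (the helper name `wald_stats` and all numbers are ours, not the paper's):

```python
import numpy as np

def wald_stats(theta_hat, Gamma_hat, C, n):
    """Composite-likelihood Wald statistics of eq. (2) and the plug-in
    correlation matrix V of eq. (3). Gamma_hat estimates the inverse
    Godambe information; C is the c x p contrast matrix."""
    D = C @ Gamma_hat @ C.T                 # c x c matrix of eq. (3)
    se = np.sqrt(np.diag(D) / n)            # standard errors of C theta_hat
    T = (C @ theta_hat) / se                # test statistics T_{l,n}
    d = 1.0 / np.sqrt(np.diag(D))
    V = d[:, None] * D * d[None, :]         # diag(D)^{-1/2} D diag(D)^{-1/2}
    return T, V

# Toy inputs: 3 parameters, many-to-one contrasts against theta_1.
theta_hat = np.array([0.0, 0.2, 0.1])
Gamma_hat = np.eye(3)
C = np.array([[-1.0, 1.0, 0.0], [-1.0, 0.0, 1.0]])
T, V = wald_stats(theta_hat, Gamma_hat, C, n=100)
```

By construction the diagonal of $V$ is 1, so each $T_{l,n}$ can be compared to standard normal quantiles marginally.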

We illustrate a few multiple comparison procedures as follows.

  1. The Bonferroni procedure: The global intersection hypothesis $\cap_{l=1}^c H_{0l}$ is rejected if $\max_l |T_{l,n}| > Z_{1-\alpha/(2c)}$, where $Z_{1-\alpha/(2c)}$ denotes the critical value for a standard normal random variable with $P(Z > Z_{1-\alpha/(2c)}) = \alpha/(2c)$. Each individual hypothesis $H_{0l}$ is rejected if $|T_{l,n}| > Z_{1-\alpha/(2c)}$.

  2. Holm's procedure: For each $H_{0l}$, evaluate the p-value $p_l = 2P(Z > |T_{l,n}|)$. Order the p-values from smallest to largest as $p_{(1)}, \ldots, p_{(c)}$, with the corresponding hypotheses reordered as $H_{0(1)}, \ldots, H_{0(c)}$. The global intersection hypothesis $\cap_{l=1}^c H_{0l}$ is rejected if $p_{(1)} \leq \alpha/c$. Let $k$ denote the smallest $l$ such that $p_{(l)} > \alpha/(c - l + 1)$. If $k > 1$, then the individual hypotheses $H_{0(1)}, \ldots, H_{0(k-1)}$ are rejected.

  3. The MNQ procedure: The global intersection hypothesis $\cap_{l=1}^c H_{0l}$ is rejected if $\max_l |T_{l,n}| > Q_{1-\alpha,V}$, where $Q_{1-\alpha,V}$ denotes the equi-coordinate quantile for a multivariate normal random vector $Z = (Z_1, \ldots, Z_c) \sim N(0, V)$, satisfying $P(\max_l |Z_l| > Q_{1-\alpha,V}) = \alpha$. Each individual hypothesis $H_{0l}$ is rejected if $|T_{l,n}| > Q_{1-\alpha,V}$.
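For a concrete comparison of the cutoffs, the sketch below computes the Bonferroni critical value in closed form and approximates the MNQ equi-coordinate quantile by Monte Carlo. The paper obtains this quantile with the R package mvtnorm, so the simulation here is only a stand-in, and the correlation matrix is a made-up example.

```python
import numpy as np
from scipy.stats import norm, multivariate_normal

alpha, c = 0.05, 2
V = np.array([[1.0, 0.5], [0.5, 1.0]])      # assumed correlation of the statistics

z_bonf = norm.ppf(1 - alpha / (2 * c))      # Bonferroni critical value

# Monte Carlo stand-in for the equi-coordinate quantile that the paper
# computes with mvtnorm: the (1 - alpha) quantile of max_l |Z_l|.
rng = np.random.default_rng(0)
Z = multivariate_normal(mean=np.zeros(c), cov=V).rvs(size=200_000, random_state=rng)
q_mnq = np.quantile(np.abs(Z).max(axis=1), 1 - alpha)
```

With positive correlation the MNQ cutoff falls strictly between the single-test normal quantile and the Bonferroni cutoff, which is the source of the MNQ method's power gain.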

As the variance estimators are consistent regardless of the configuration of the hypotheses, the multivariate distribution of any subset of the test statistics still follows a multivariate normal distribution. Therefore, if a closed testing procedure is applied on the test statistics, the procedure has a strong control of the FWER [22, 23].

It is worth pointing out that the test statistics we propose here are Wald-type statistics, which are not invariant under re-parametrization. Under re-parametrization, the new statistics follow the same type of limiting distribution, but the values of the statistics are not the same. This is a standard limitation of Wald-type statistics.

The multivariate distribution of $T_n$ can also be approximated by a multivariate t distribution. By Slutsky's theorem, the denominator $\sqrt{C^{(l)T}\hat\Gamma_n C^{(l)}/n}$ has the same asymptotic distribution as $\sqrt{C^{(l)T} H^{-1} \hat J_n H^{-1} C^{(l)}/n}$. Furthermore, with the per-cluster score vectors stacked as the rows of a matrix $U$, $n\hat J_n = U^T U$ asymptotically follows a Wishart$(J, n)$ distribution. This entails that $C^{(l)T} H^{-1} \hat J_n H^{-1} C^{(l)}$ asymptotically follows $\sigma_l^2 \chi_n^2 / n$, where $\sigma_l^2 = C^{(l)T} H^{-1} J H^{-1} C^{(l)}$. Reformulate the test statistics as

$$T_{l,n} = \frac{\sqrt{n}\, C^{(l)T}\hat\theta_{nc}/\sigma_l}{\sqrt{C^{(l)T}\hat\Gamma_n C^{(l)}/\sigma_l^2}},$$

where the numerator vector is asymptotically multivariate normal $\mathrm{MVN}(0, V)$ and the denominator is asymptotically $\sqrt{\chi_n^2/n}$. Therefore, the multivariate distribution of $T_n$ can be approximated by a multivariate $t(V, n)$ distribution, where $V$ is the covariance matrix and $n$ is the degrees of freedom.

The MNQ method also facilitates the construction of simultaneous confidence intervals. The simultaneous $(1-\alpha)100\%$ confidence intervals for $C\theta$ are

(4) $$\left( C^{(l)T}\hat\theta_{nc} \pm Q_{1-\alpha,V}\sqrt{C^{(l)T}\hat\Gamma_n C^{(l)}/n} \right).$$

In some applications, the effect sizes of interest are nonlinear monotone transformations of the parameters. For example, an odds ratio is obtained from a log odds ratio by applying the exponential function. Let $G(\cdot)$ denote a nonlinear monotone transformation. Then the simultaneous $(1-\alpha)100\%$ confidence intervals for $G[C^{(l)T}\theta]$, $l = 1, \ldots, c$, are

(5) $$\left( G\!\left[ C^{(l)T}\hat\theta_{nc} \pm Q_{1-\alpha,V}\sqrt{C^{(l)T}\hat\Gamma_n C^{(l)}/n} \right] \right).$$
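Eqs. (4) and (5) translate directly into code once the equi-coordinate quantile is available. A minimal sketch with toy inputs (the function name `simultaneous_ci` is ours, and the quantile is hard-coded rather than derived from $V$):

```python
import numpy as np

def simultaneous_ci(theta_hat, Gamma_hat, C, n, q, transform=None):
    """Simultaneous confidence intervals of eq. (4); q plays the role of the
    equi-coordinate quantile Q_{1-alpha,V}. An optional monotone transform
    (e.g. np.exp for odds ratios) is applied endpoint-wise as in eq. (5)."""
    est = C @ theta_hat
    se = np.sqrt(np.diag(C @ Gamma_hat @ C.T) / n)
    lo, hi = est - q * se, est + q * se
    if transform is not None:
        lo, hi = transform(lo), transform(hi)
    return lo, hi

# Toy example: one contrast on a 2-parameter model, exponentiated
# to give a simultaneous interval on the odds-ratio scale.
theta_hat = np.array([0.0, 0.2])
Gamma_hat = np.eye(2)
C = np.array([[-1.0, 1.0]])
lo, hi = simultaneous_ci(theta_hat, Gamma_hat, C, n=100, q=1.96, transform=np.exp)
```

Because $G$ is monotone, transforming the two endpoints is enough; no delta-method variance adjustment is needed for the interval itself.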

3 Three multivariate models

To demonstrate the application of our methodology, we consider three specific multivariate distributions: the multivariate normal, multivariate probit, and quadratic exponential distributions. In this section, we give the details of each distribution and the computation of the test statistics; Section 4 examines the behavior of the test statistics via simulations.

For the first two distributions, the composite likelihood is constructed as a product of univariate marginal likelihoods. We choose this type of composite likelihood because the univariate marginal likelihoods of these two distributions belong to the univariate exponential family, for which parameter estimation is straightforward. For the quadratic exponential distribution, the composite likelihood is constructed as a conditional likelihood: the normalizing constant of the quadratic exponential model is computationally intensive to obtain, and in this composite likelihood the normalizing constant cancels out, greatly reducing the computational burden of estimation.

In order to include covariates in our modelling scheme, let $X_i$ denote an $m_i \times p$ matrix containing the values of $p$ covariates for the $m_i$ individuals in the $i$th cluster, and let $\beta = (\beta_1, \ldots, \beta_p)^T$ denote the vector of regression coefficients. Let $x_{ij}$ denote the $j$th row of the matrix $X_i$ (the vector of covariates for individual $j$ in cluster $i$).

3.1 Multivariate Gaussian distribution

Let $\{(y_i, X_i), i = 1, \ldots, n\}$ denote the responses and covariates arising from a multivariate normal model, with $y_i = X_i\beta + \epsilon_i$, $i = 1, \ldots, n$, and $m_i = m$. We assume that $\epsilon_i \sim N_m(0, \Sigma)$, where $\Sigma = (\sigma_{jk})$, $j, k = 1, \ldots, m$, is an arbitrary covariance matrix. The univariate composite log-likelihood is thus equal to

(6) $$cl(\beta) = \sum_{i=1}^n \sum_{j=1}^m \left( -\frac{1}{2}\log(2\pi\sigma_{jj}) - \frac{1}{2\sigma_{jj}}(y_{ij} - x_{ij}\beta)^2 \right),$$

where the $\sigma_{jj}$'s are nuisance parameters. The Hessian matrix and variability matrix are, respectively, $H(\beta) = n^{-1}\sum_{i=1}^n X_i^T W X_i$ and $J(\beta) = n^{-1}\sum_{i=1}^n X_i^T W \Sigma W X_i$, with $W = \mathrm{diag}(\Sigma)^{-1}$. To estimate the regression coefficients, we employ an iterative algorithm: given the current estimates of the nuisance parameters $\sigma_{jj}$, we maximize the composite likelihood to obtain $\hat\beta_{nc} = (\sum_{i=1}^n X_i^T W X_i)^{-1} \sum_{i=1}^n X_i^T W y_i$; given a current estimate of $\beta$, we use the sample covariance matrix of the residuals to estimate $\Sigma$. Based on the estimates $\hat\beta_{nc}$ and $\hat\Sigma$, we obtain estimates of $H(\beta)$ and $J(\beta)$ with $W$ replaced by its estimate $\hat W = \mathrm{diag}(\hat\Sigma)^{-1}$.
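The iterative scheme just described alternates a weighted least-squares step for $\beta$ with a residual-covariance step for $\Sigma$. A minimal sketch under assumed simulated data (an exchangeable correlation structure and made-up parameter values; the helper name `cmle_normal` is ours):

```python
import numpy as np

def cmle_normal(Y, X, n_iter=20):
    """Iterative CMLE sketch for the multivariate normal model: alternate a
    weighted least-squares update of beta (weights W = diag(Sigma)^{-1})
    with a residual-covariance update of Sigma."""
    n, m = Y.shape
    Sigma = np.eye(m)
    for _ in range(n_iter):
        W = np.diag(1.0 / np.diag(Sigma))
        A = sum(Xi.T @ W @ Xi for Xi in X)
        b = sum(Xi.T @ W @ yi for Xi, yi in zip(X, Y))
        beta = np.linalg.solve(A, b)
        R = Y - np.stack([Xi @ beta for Xi in X])   # residuals, n x m
        Sigma = R.T @ R / n
    return beta, Sigma

# Made-up simulated data (not the paper's simulation design).
rng = np.random.default_rng(1)
n, m, p = 400, 4, 2
beta_true = np.array([1.0, -0.5])
X = [rng.normal(size=(m, p)) for _ in range(n)]
L = np.linalg.cholesky(0.5 * np.eye(m) + 0.5)       # exchangeable correlation
Y = np.stack([X[i] @ beta_true + L @ rng.normal(size=m) for i in range(n)])
beta_hat, Sigma_hat = cmle_normal(Y, X)
```

The two updates typically stabilize after a handful of iterations, since each $\beta$-step is a closed-form weighted least-squares solve.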

3.2 Multivariate probit model

Let $y_i^* = X_i\beta + \epsilon_i$ with $\epsilon_i \sim N_m(0, \Sigma)$ and $\Sigma = \sigma^2 R$, where $R$ is an $m \times m$ correlation matrix. The variables $y_i^*$ are latent response variables; dichotomizing them as $y_{ij} = I(y_{ij}^* > 0)$, $j = 1, \ldots, m$, yields the multivariate probit model. We therefore have $P(y_{ij} = 1 \mid X_i) = \Phi(x_{ij}\beta/\sigma)$, where $\Phi$ denotes the univariate standard normal cumulative distribution function. It follows that the parameters $\beta$ and $\sigma$ are not fully identifiable in the model, and we can only estimate the ratio $\beta/\sigma$. To simplify notation, $\sigma$ is set equal to 1 in what follows. The univariate composite log-likelihood function of the probit model is then

$$cl(\beta; Y) = \sum_{i=1}^n \sum_{j=1}^m \left[ y_{ij}\log\Phi(x_{ij}\beta) + (1 - y_{ij})\log\left(1 - \Phi(x_{ij}\beta)\right) \right].$$

Denoting $\mu_{ij} = P(y_{ij} = 1 \mid X_i)$ and $\mu_i = (\mu_{i1}, \ldots, \mu_{im})^T$, we have

$$cl^{(1)}(\beta; Y) = \sum_{i=1}^n \left(\frac{\partial \mu_i}{\partial \beta}\right)^T \Pi_i^{-1}(y_i - \mu_i),$$

where $\Pi_i = \mathrm{diag}(\mathrm{var}(y_{i1}), \ldots, \mathrm{var}(y_{im}))$ and $\mathrm{var}(y_{ij}) = \mu_{ij}(1 - \mu_{ij})$. This yields

$$H(\beta) = n^{-1}\sum_{i=1}^n \left(\frac{\partial \mu_i}{\partial \beta}\right)^T \Pi_i^{-1} \frac{\partial \mu_i}{\partial \beta}, \qquad J(\beta) = n^{-1}\sum_{i=1}^n \left(\frac{\partial \mu_i}{\partial \beta}\right)^T \Pi_i^{-1}\,\mathrm{cov}(y_i)\,\Pi_i^{-1} \frac{\partial \mu_i}{\partial \beta}.$$

To find the estimate $\hat\beta_{nc}$, we use the Newton-Raphson algorithm. Denote $\hat\mu_i^n = (\hat\mu_{i1}^n, \hat\mu_{i2}^n, \ldots, \hat\mu_{im}^n)^T$, where $\hat\mu_i^n = \Phi(X_i\hat\beta_{nc})$, and let $\hat\Pi_i^n$ denote the estimator of $\Pi_i$ obtained by substituting $\hat\mu_{ij}^n$ for $\mu_{ij}$. We estimate $H(\beta)$ and $J(\beta)$ as

$$\hat H_n = n^{-1}\sum_{i=1}^n \left(\frac{\partial \mu_i}{\partial \beta}\Big|_{\hat\beta_{nc}}\right)^T (\hat\Pi_i^n)^{-1} \left(\frac{\partial \mu_i}{\partial \beta}\Big|_{\hat\beta_{nc}}\right), \qquad \hat J_n = n^{-1}\sum_{i=1}^n \left(\frac{\partial \mu_i}{\partial \beta}\Big|_{\hat\beta_{nc}}\right)^T (\hat\Pi_i^n)^{-1}\, \widehat{\mathrm{cov}}_n(y_i)\, (\hat\Pi_i^n)^{-1} \left(\frac{\partial \mu_i}{\partial \beta}\Big|_{\hat\beta_{nc}}\right),$$

calculating the empirical variance as $\widehat{\mathrm{cov}}_n(y_i) = (y_i - \hat\mu_i^n)(y_i - \hat\mu_i^n)^T$.
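The plug-in estimators $\hat H_n$ and $\hat J_n$ above can be assembled cluster by cluster. The sketch below does this for a given $\hat\beta_{nc}$, using made-up independent binary data, under which $\hat H_n$ and $\hat J_n$ should roughly agree by the information identity (the helper name `probit_HJ` is ours):

```python
import numpy as np
from scipy.stats import norm

def probit_HJ(beta_hat, X, Y):
    """Plug-in estimates of H and J for the univariate-marginal composite
    likelihood of the multivariate probit model (a sketch of Section 3.2,
    not the authors' code)."""
    n = len(X)
    p = X[0].shape[1]
    H = np.zeros((p, p))
    J = np.zeros((p, p))
    for Xi, yi in zip(X, Y):
        mu = norm.cdf(Xi @ beta_hat)
        D = norm.pdf(Xi @ beta_hat)[:, None] * Xi   # d mu_i / d beta, m x p
        Pinv = np.diag(1.0 / (mu * (1.0 - mu)))     # Pi_i^{-1}
        r = yi - mu
        H += D.T @ Pinv @ D
        J += D.T @ Pinv @ np.outer(r, r) @ Pinv @ D
    return H / n, J / n

# Made-up independent data, under which E[J] = H (information identity).
rng = np.random.default_rng(2)
n, m, p = 2000, 4, 2
beta = np.array([0.5, -0.5])
X = [rng.normal(size=(m, p)) for _ in range(n)]
Y = [(rng.uniform(size=m) < norm.cdf(Xi @ beta)).astype(float) for Xi in X]
H, J = probit_HJ(beta, X, Y)
```

With genuinely correlated clusters, $\hat J_n$ and $\hat H_n$ separate, and the sandwich $\hat H_n^{-1}\hat J_n\hat H_n^{-1}$ is what restores valid standard errors.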

3.3 Quadratic exponential model

The quadratic exponential model is a popular tool for modelling clustered binary data with intra-cluster interactions [6]. In this model, the binary observations take values $y_{ij} \in \{-1, 1\}$ and the joint distribution is given by

(7) $$f_Y(y_i) \propto \exp\left\{ \sum_{j=1}^{m_i} \mu_{ij} y_{ij} + \sum_{j < j'} w_{ijj'} y_{ij} y_{ij'} \right\},$$

where $\mu_{ij}$ is a parameter describing the main effect of the measurements and $w_{ijj'}$ describes the association between pairs of measurements within the cluster $y_i$. Independence corresponds to the case $w_{ijj'} = 0$, and positive or negative correlation corresponds to $w_{ijj'} > 0$ or $w_{ijj'} < 0$, respectively. For simplicity, we consider the case where $\mu_{ij} = \mu_i$ and $w_{ijj'} = w_i$, noting that our methodology can be readily applied in the general scenario as well. Under this simplification, Molenberghs and Ryan [24] showed that the joint distribution can be written equivalently in terms of $z_i = \sum_{j=1}^{m_i} 1(y_{ij} = 1)$ (the number of successes in the $i$th cluster) as $f_Y(y_i) \propto \exp\{\mu_i' z_i - w_i' z_i(m_i - z_i)\}$, where $\mu_i' = 2\mu_i$ and $w_i' = 2w_i$; with a slight abuse of notation, we write $\mu_i$ and $w_i$ for these transformed parameters in what follows.

Specifying the normalizing constant in eq. (7) is computationally difficult, but necessary for full likelihood evaluation. It is therefore desirable to use an alternative approach that does not involve such an intensive calculation. Replacing the joint distribution with the conditional distributions leads to a conditional composite likelihood function $cl(\mu, w; Y) = \sum_{i=1}^n \sum_{j=1}^{m_i} \log f(y_{ij} \mid \{y_{ij'}\}, j' \neq j)$, which does not require computation of the normalizing constant. We now define two conditional probabilities

$$p_i^s = \frac{\exp\{\mu_i - w_i(m_i - 2z_i + 1)\}}{1 + \exp\{\mu_i - w_i(m_i - 2z_i + 1)\}}, \qquad p_i^f = \frac{\exp\{-\mu_i + w_i(m_i - 2z_i - 1)\}}{1 + \exp\{-\mu_i + w_i(m_i - 2z_i - 1)\}}.$$

Heuristically, $p_i^s$ is the conditional probability of one more success, given $z_i - 1$ successes and $m_i - z_i$ failures, while $p_i^f$ is the conditional probability of one more failure, given $z_i$ successes and $m_i - z_i - 1$ failures. Note that $p_i^f \neq 1 - p_i^s$, because of the term $m_i - 2z_i \pm 1$. The composite likelihood can now be expressed as $cl(\mu, w; Y) = \sum_{i=1}^n z_i \log p_i^s + (m_i - z_i)\log p_i^f$. This special form of the composite likelihood means that a logistic regression algorithm can be used to estimate the parameters. We model a covariate effect by using the linear model $\mu_i = X_i\beta$, with $w_i = w$ treated as an additional parameter. That is, for the parameter $w$, the value of the covariate is set to $-(m_i - 2z_i + 1)$ when $y_{ij} = 1$ and $m_i - 2z_i - 1$ when $y_{ij} = -1$. This allows us to obtain maximum composite likelihood estimates of both $\beta$ and $w$ using iteratively re-weighted least squares, commonly used to solve logistic regression maximization problems. To estimate the covariance of $\hat\beta_{nc}$, we compute $\hat J_n$ as the empirical variance of the score vector, plugging in estimates of $\mu_i$ and $w$ throughout. The Hessian matrix $\hat H_n$ is estimated using the result from fitting the logistic model in R; see Geys et al. [6].
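For a single pair of parameters $(\mu, w)$ and toy success counts, the conditional composite log-likelihood above can also be maximized directly with a general-purpose optimizer instead of the logistic-regression route used in the paper. This is a sketch, not the authors' implementation; the counts are made up.

```python
import numpy as np
from scipy.optimize import minimize

def neg_cl(par, z, m):
    """Negative conditional composite log-likelihood of Section 3.3 with a
    common mu and w; z = successes per cluster, m = cluster sizes."""
    mu, w = par
    eta_s = mu - w * (m - 2 * z + 1)      # logit of p_i^s
    eta_f = -mu + w * (m - 2 * z - 1)     # logit of p_i^f
    log_ps = -np.logaddexp(0.0, -eta_s)   # log sigmoid, numerically stable
    log_pf = -np.logaddexp(0.0, -eta_f)
    return -np.sum(z * log_ps + (m - z) * log_pf)

# Made-up success counts for ten clusters of size 5 (illustration only).
z = np.array([0, 1, 2, 2, 3, 4, 5, 1, 2, 3])
m = np.full_like(z, 5)
fit = minimize(neg_cl, x0=np.array([0.0, 0.0]), args=(z, m), method="BFGS")
mu_hat, w_hat = fit.x
```

Because the normalizing constant of eq. (7) cancels in the conditional probabilities, this objective is cheap to evaluate even for moderately large clusters.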

4 Simulation results

We evaluate the validity of our proposed approach using simulations of the three multivariate models from Section 3. We randomly simulate covariates from independent normal distributions and consider different values of the regression parameters and correlation structures. We simulate multivariate clustered data under varying cluster sizes and varying numbers of clusters, and perform multiple comparisons on the regression parameters using the proposed composite likelihood method. For the multivariate normal case, we also compare the proposed method with the full likelihood approach.

We test two different global null hypotheses on the regression coefficients $\beta_1, \ldots, \beta_p$: many-to-one comparisons, $H_0: \cap_{i=2}^p \{\beta_1 = \beta_i\}$, and all pairwise comparisons, $H_0: \cap_{1 \leq i < j \leq p} \{\beta_i = \beta_j\}$. The results for many-to-one comparisons are summarized here, while the results for all pairwise comparisons are provided in the supplementary material. We consider a collection of multiple testing methods, including the one-step Bonferroni method and the MNQ method based on the multivariate distribution of the test statistics. The equi-coordinate critical values for the multivariate normal and multivariate t distributions are obtained using the R package mvtnorm [25]. We also examine the performance of the Dunn-Šidák method, Holm's stage-wise procedure, and Scheffé's projection method; these results are provided in the supplementary material.

Part of our goal is to show practitioners what happens if the correlation structure in the clustered data is ignored. To this end, we also include a "misspecified" scenario, where independence is erroneously assumed within the clusters. Due to the specific composite likelihood methods we use (univariate marginals and univariate conditionals), such a misspecification is equivalent to setting $H(\theta) = J(\theta)$ in eq. (1), which results in the estimate $\hat\Gamma_n = \hat H_n^{-1}$. This misspecified scenario is included for comparison, and we consider it only with the MNQ multiple comparison method (that is, the MNQ cutoff is calculated based on $V$ estimated by plugging in $\hat\Gamma_n = \hat H_n^{-1}$). Throughout, it is referred to as the "naive" approach.

In our simulations, we study the three models described in the previous section. For each model, a different sample size is needed for our asymptotic approximations to be valid. We determine this sample size with an initial simulation before proceeding with our more in-depth investigations. For each simulation setting, 10,000 simulated data sets are generated and the nominal FWER is set to 0.05. The standard deviation of the observed FWER is hence approximately 0.002. These preliminary simulation results are given in Table 1. We observe that $n = 200$, $500$, and $700$ are required for the multivariate normal, multivariate probit, and quadratic exponential models, respectively, to maintain the FWER within two standard deviations of 0.05. These are the sample sizes used for the simulation results that follow.
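The quoted standard deviation follows from the binomial variability of an observed rejection rate over 10,000 replicates:

```python
import math

# Binomial standard error of an observed FWER at the nominal 0.05 level,
# based on 10,000 simulated data sets per setting.
se = math.sqrt(0.05 * 0.95 / 10_000)
band = (0.05 - 2 * se, 0.05 + 2 * se)   # "within two standard deviations" of 0.05
```

Observed rates inside roughly (0.0456, 0.0544) are therefore consistent with exact nominal control.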

Table 1

FWER under different sample sizes.

| Model | n=50 | n=200 | n=500 | n=700 | n=1,000 | n=4,000 |
|---|---|---|---|---|---|---|
| Multivariate normal | 0.0544 | 0.0509 | 0.0492 | 0.0483 | 0.0495 | 0.0500 |
| Multivariate probit | 0.1140 | 0.0576 | 0.0501 | 0.0511 | 0.0506 | 0.0511 |
| Quadratic exponential | 0.1180 | 0.0580 | 0.0543 | 0.0519 | 0.0520 | 0.0504 |

To evaluate the power of each method, we consider two different alternative scenarios: one alternative configuration, a1, with only one non-zero parameter having a large effect size, and a second alternative configuration, a2, with five non-zero parameters all having small effect sizes. We are interested in the ability of the test to reject the global null hypothesis, but also in its ability to reject the individual null hypotheses. Under alternative a1, we calculate the power to reject the global hypothesis (denoted "a1" in the tables); under alternative a2, we calculate both the power to reject the global null hypothesis (denoted "a2" in the tables) and the average of the individual powers for the five true alternatives (denoted "ind a2" in the tables).

Table 2

Simulation results for the three multivariate models.

| ρ | m | p | Metric | Normal MNQ | Normal naive | Normal Bonf | Probit MNQ | Probit naive | Probit Bonf | Quad Exp MNQ | Quad Exp naive | Quad Exp Bonf |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 4 | 10 | FWER | 0.0545 | 0.0553 | 0.0419 | 0.0530 | 0.0506 | 0.0413 | 0.0514 | 0.0562 | 0.0400 |
| | | | a1 | 0.8164 | 0.8166 | 0.7894 | 0.8700 | 0.8705 | 0.8477 | 0.5390 | 0.5534 | 0.5010 |
| | | | a2 | 0.8057 | 0.8053 | 0.7617 | 0.9114 | 0.9109 | 0.8885 | 0.7067 | 0.7240 | 0.6573 |
| | | | ind a2 | 0.1816 | 0.1816 | 0.1683 | 0.2166 | 0.2156 | 0.2039 | 0.1956 | 0.2057 | 0.1765 |
| | | 20 | FWER | 0.0511 | 0.0502 | 0.0352 | 0.0528 | 0.0503 | 0.0389 | 0.0561 | 0.0767 | 0.0404 |
| | | | a1 | 0.7487 | 0.7476 | 0.7062 | 0.8258 | 0.8232 | 0.7902 | 0.4551 | 0.4853 | 0.3990 |
| | | | a2 | 0.7150 | 0.7134 | 0.6687 | 0.8547 | 0.8511 | 0.8149 | 0.6040 | 0.6365 | 0.5237 |
| | | | ind a2 | 0.1540 | 0.1535 | 0.1416 | 0.1887 | 0.1878 | 0.1769 | 0.1546 | 0.1669 | 0.1303 |
| | 10 | 10 | FWER | 0.0479 | 0.0471 | 0.0375 | 0.0526 | 0.0515 | 0.0423 | 0.0491 | 0.0549 | 0.0381 |
| | | | a1 | 0.9983 | 0.9983 | 0.9979 | 0.9996 | 0.9996 | 0.9995 | 0.5391 | 0.5535 | 0.5010 |
| | | | a2 | 0.9993 | 0.9993 | 0.9986 | 1.0000 | 1.0000 | 1.0000 | 0.7066 | 0.7239 | 0.6573 |
| | | | ind a2 | 0.2964 | 0.2963 | 0.2844 | 0.3330 | 0.3319 | 0.3168 | 0.1956 | 0.2057 | 0.1765 |
| | | 20 | FWER | 0.0487 | 0.0485 | 0.0363 | 0.0527 | 0.0508 | 0.0364 | 0.0561 | 0.0767 | 0.0404 |
| | | | a1 | 0.9981 | 0.9980 | 0.9967 | 0.9993 | 0.9995 | 0.9985 | 0.4548 | 0.4849 | 0.3989 |
| | | | a2 | 0.9978 | 0.9977 | 0.9963 | 1.0000 | 0.9999 | 0.9997 | 0.5971 | 0.6309 | 0.5255 |
| | | | ind a2 | 0.2688 | 0.2681 | 0.2552 | 0.2973 | 0.2956 | 0.2811 | 0.1536 | 0.1663 | 0.1305 |
| 0.5 | 4 | 10 | FWER | 0.0513 | 0.0977 | 0.0390 | 0.0508 | 0.0793 | 0.0393 | 0.0521 | 0.0000 | 0.0417 |
| | | | a1 | 0.7235 | 0.8129 | 0.6867 | 0.8102 | 0.8601 | 0.7808 | 0.7864 | 0.0307 | 0.7546 |
| | | | a2 | 0.6922 | 0.8042 | 0.6615 | 0.8530 | 0.9038 | 0.8305 | 0.9050 | 0.0444 | 0.8800 |
| | | | ind a2 | 0.1514 | 0.1878 | 0.1442 | 0.1970 | 0.2206 | 0.1864 | 0.3027 | 0.0090 | 0.2820 |
| | | 20 | FWER | 0.0510 | 0.1029 | 0.0377 | 0.0585 | 0.0915 | 0.0406 | 0.0509 | 0.0000 | 0.0377 |
| | | | a1 | 0.6420 | 0.7526 | 0.5950 | 0.7578 | 0.8196 | 0.7082 | 0.7214 | 0.0158 | 0.6739 |
| | | | a2 | 0.6031 | 0.7322 | 0.5437 | 0.7891 | 0.8534 | 0.7365 | 0.8460 | 0.0148 | 0.7998 |
| | | | ind a2 | 0.1274 | 0.1607 | 0.1135 | 0.1727 | 0.1942 | 0.1571 | 0.2580 | 0.0030 | 0.2306 |
| | 10 | 10 | FWER | 0.0520 | 0.2079 | 0.0410 | 0.0513 | 0.1437 | 0.0402 | 0.0521 | 0.0000 | 0.0417 |
| | | | a1 | 0.9570 | 0.9936 | 0.9466 | 0.9900 | 0.9979 | 0.9871 | 0.7864 | 0.0307 | 0.7546 |
| | | | a2 | 0.9555 | 0.9982 | 0.9438 | 0.9952 | 0.9997 | 0.9966 | 0.9141 | 0.0407 | 0.8800 |
| | | | ind a2 | 0.2383 | 0.3381 | 0.2273 | 0.2815 | 0.3585 | 0.2704 | 0.3065 | 0.0083 | 0.2820 |
| | | 20 | FWER | 0.0459 | 0.2271 | 0.0328 | 0.0543 | 0.1622 | 0.0382 | 0.0509 | 0.0000 | 0.0378 |
| | | | a1 | 0.9403 | 0.9938 | 0.9224 | 0.9862 | 0.9974 | 0.9784 | 0.7202 | 0.0161 | 0.6731 |
| | | | a2 | 0.9222 | 0.9948 | 0.8932 | 0.9935 | 0.9998 | 0.9894 | 0.8460 | 0.0148 | 0.7998 |
| | | | ind a2 | 0.2118 | 0.3072 | 0.1981 | 0.2575 | 0.3250 | 0.2450 | 0.2580 | 0.0030 | 0.2306 |
Table 3

Comparison of CMLE and MLE for multivariate normal model with exchangeable Σ.

| Method | m | p | MNQ (ρ=0.2) | naive (ρ=0.2) | Bonf (ρ=0.2) | MNQ (ρ=0.5) | naive (ρ=0.5) | Bonf (ρ=0.5) |
|---|---|---|---|---|---|---|---|---|
| FWER, CMLE | 4 | 10 | 0.0494 | 0.0670 | 0.0389 | 0.0513 | 0.0977 | 0.0390 |
| a1, CMLE | | | 0.7760 | 0.8113 | 0.7453 | 0.7235 | 0.8129 | 0.6867 |
| FWER, MLE | | | 0.0492 | 0.0493 | 0.0443 | 0.0551 | 0.0110 | 0.0431 |
| a1, MLE | | | 0.8319 | 0.8200 | 0.8053 | 1.0000 | 0.8423 | 0.9309 |
| FWER, CMLE | | 20 | 0.0533 | 0.0734 | 0.0390 | 0.0510 | 0.1029 | 0.0377 |
| a1, CMLE | | | 0.7044 | 0.7490 | 0.6591 | 0.6420 | 0.7526 | 0.5950 |
| FWER, MLE | | | 0.0481 | 0.0475 | 0.0393 | 0.0591 | 0.0109 | 0.0445 |
| a1, MLE | | | 0.7867 | 0.7581 | 0.7344 | 1.0000 | 0.7743 | 0.9070 |
| FWER, CMLE | 10 | 10 | 0.0467 | 0.1019 | 0.0357 | 0.0520 | 0.2079 | 0.0410 |
| a1, CMLE | | | 0.9912 | 0.9974 | 0.9875 | 0.9570 | 0.9936 | 0.9466 |
| FWER, MLE | | | 0.0536 | 0.0381 | 0.0364 | 0.0578 | 0.0071 | 0.0479 |
| a1, MLE | | | 0.9994 | 0.9991 | 0.9992 | 1.0000 | 0.9998 | 1.0000 |
| FWER, CMLE | | 20 | 0.0468 | 0.1057 | 0.0320 | 0.0459 | 0.2271 | 0.0328 |
| a1, CMLE | | | 0.9868 | 0.9970 | 0.9813 | 0.9403 | 0.9938 | 0.9224 |
| FWER, MLE | | | 0.0490 | 0.0388 | 0.0347 | 0.0674 | 0.0060 | 0.0529 |
| a1, MLE | | | 0.9991 | 0.9987 | 0.9986 | 1.0000 | 0.9997 | 1.0000 |
Table 4

Simulation results with small sample sizes.

| n | Metric | Normal MNQ | Normal naive | Normal Bonf | Probit MNQ | Probit naive | Probit Bonf | Quad Exp MNQ | Quad Exp naive | Quad Exp Bonf |
|---|---|---|---|---|---|---|---|---|---|---|
| 50 | FWER | 0.0531 | 0.0962 | 0.0419 | 0.0839 | 0.0837 | 0.0674 | 0.1139 | 0.0003 | 0.0912 |
| | a1 | 0.1750 | 0.2645 | 0.1502 | 0.1353 | 0.1411 | 0.1139 | 0.1551 | 0.0015 | 0.1273 |
| | a2 | 0.1760 | 0.2628 | 0.1496 | 0.2849 | 0.3034 | 0.2398 | 0.1810 | 0.0034 | 0.1480 |
| 100 | FWER | 0.0474 | 0.0931 | 0.0379 | 0.0631 | 0.0808 | 0.0496 | 0.0704 | 0.0000 | 0.0551 |
| | a1 | 0.3751 | 0.4836 | 0.3382 | 0.1832 | 0.2203 | 0.1594 | 0.1458 | 0.0004 | 0.1190 |
| | a2 | 0.3563 | 0.4754 | 0.3181 | 0.5159 | 0.5835 | 0.4604 | 0.2016 | 0.0007 | 0.1703 |
Table 5

Simulation results using the multivariate t approximation with n=50.

| Model | Metric | MNQ | naive | Bonf |
|---|---|---|---|---|
| Normal | FWER | 0.0509 | 0.0896 | 0.0509 |
| | a1 | 0.1513 | 0.2245 | 0.1547 |
| | a2 | 0.1470 | 0.2241 | 0.1503 |
| Probit | FWER | 0.0667 | 0.0660 | 0.0673 |
| | a1 | 0.1112 | 0.1130 | 0.1152 |
| | a2 | 0.1206 | 0.1218 | 0.1143 |
| | ind a2 | 0.0227 | 0.0237 | 0.0214 |
| Quad exp | FWER | 0.0934 | 0.0001 | 0.0912 |
| | a1 | 0.1305 | 0.0008 | 0.1269 |
| | a2 | 0.1629 | 0.0013 | 0.1600 |

4.1 Multivariate normal model

We consider the multivariate normal model with $n = 200$ clusters, cluster size $m = 4$ or $10$, and the number of covariates set to $p = 10$ or $20$. Two types of $\Sigma$ are considered: (1) exchangeable structures with $\sigma^2 = 0.8$ and $\rho = \mathrm{cov}(y_{ij}, y_{ik}) = 0$ or $0.5$; (2) one arbitrary structure, with columns $(1.3, 0.9, 0.5, 0.3)^T$, $(0.9, 1.9, 1.3, 0.3)^T$, $(0.5, 1.3, 1.3, 0.1)^T$, and $(0.3, 0.9, 0.1, 0.7)^T$. In each simulation, the $m \times p$ covariate matrix $X_i$ is obtained by randomly sampling from normal distributions.

We consider here many-to-one comparisons in which the first parameter is taken as the baseline. Under the global null hypothesis $H_0$, the true value of the regression parameter vector is set to $\beta^T = 0$, and the power is calculated under two alternative configurations, $\beta_{a1}^T = (0, 0, 0, 0.032, 0, \ldots, 0)$ and $\beta_{a2}^T = (0, 0.008, 0.01, 0.03, 0.005, 0.01, 0, \ldots, 0)$. Under $\beta_{a1}$, there is only one true alternative, and we evaluate the power to reject the global null hypothesis. Under $\beta_{a2}$, there are five true alternatives, and we evaluate both the power to reject the global null and the average of the five powers to reject the five true alternatives.

Table 2 summarizes the results of our simulations. Overall, the method that utilizes MNQ and correctly accounts for the intra-cluster correlations has the best performance. A comparison of MNQ and naive MNQ clearly shows the cost of ignoring these correlations: the FWER control of MNQ is superior to that of naive MNQ for $\rho \neq 0$ (when $\rho = 0$ the two methods are almost identical). Notably, the power of the naive MNQ is occasionally higher than that of MNQ; however, this is only due to the over-inflation of the naive MNQ's FWER. Overall, MNQ exhibits the best performance among all of the multiple comparison procedures.

We also evaluate the efficiency of the maximum composite likelihood estimator relative to the maximum likelihood estimator. In Table 3, it is observed that both types of statistics maintain the type I error rate in a very similar manner, while the method based on the composite likelihood estimator suffers some loss of power. For example, when $m = 10$, $p = 20$, $\rho = 0.5$, the power of MNQ based on the composite likelihood estimator is 0.94, compared to a power of 1.00 based on the maximum likelihood estimator. The MLE is obtained by treating the variance-covariance matrix as a nuisance parameter: with some initial estimate of $\Sigma$, we maximize the log-likelihood for $\beta$; we then estimate $\Sigma$ using the residual covariance matrix, and we iterate between the two steps until the estimates of $\beta$ converge. For the maximum likelihood estimate, we use $(\sum_{i=1}^n X_i^T\hat\Sigma^{-1}X_i)^{-1}$ as the estimated covariance matrix of $\hat\beta$. In contrast, the naive method based on the maximum likelihood estimate uses $(\sum_{i=1}^n X_i^TX_i)^{-1}$, an incorrect covariance matrix for $\hat\beta$.

It is observed that, with increasing $\rho$ and $p$ in the multivariate distribution of the clustered data, the power of Bonferroni is not substantially smaller. An increase in $\rho$ increases the variability of each estimate $C^{(l)T}\hat\beta$ and hence decreases the power: when $\rho$ increases from 0 to 0.5, we observe about a 10% increase in the variability of the estimates, which is compatible with the 5-10% power loss that we observe. We also conduct simulations with smaller sample sizes, $n = 50$ and $n = 100$. With $n$ greater than 50, the statistics based on the plug-in estimate of the Godambe information matrix have satisfactory performance. Table 4 shows that for $n = 50$, MNQ and Bonferroni maintain the FWER only for the normal distribution, whereas for the other two distributions MNQ and Bonferroni tend to be liberal. The control of the FWER is greatly improved with $n = 100$ for all three multivariate distributions. As the multivariate distribution of $T_n$ can be approximated by a multivariate t distribution with $n$ degrees of freedom, we conduct simulations to investigate the multivariate t approximation. Table 5 shows that the multivariate t approximation provides improved control of the FWER for the normal, probit, and quadratic exponential distributions compared to the multivariate normal approximation (Table 4) with the same sample size of $n = 50$.

4.2 Multivariate probit model

Here, we consider n=500 clusters with cluster size m=4 or 10. The binary variables are generated by dichotomizing latent multivariate normal variables at a threshold of zero. For each cluster, an m×p covariate matrix X_i, with p=10 or 20, is obtained by random sampling from normal distributions. The regression coefficient vector under the global null hypothesis is set to β^T=0, and the two alternative configurations are β_{a1}^T=(0,0,0,0.03,0,…,0) and β_{a2}^T=(0,0.008,0.01,0.03,0.005,0.01,0,…,0). The latent multivariate random vector has mean X_i β and a correlation matrix with ρ on the off-diagonals and σ=1. Here, we consider ρ=0 or 0.5. The empirical results are given in Table 2. Similar to the multivariate normal setting, the naive MNQ for the multivariate probit model has a large FWER when ρ=0.5. The MNQ method has better performance than the Bonferroni and naive methods.
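The data-generating step for this setting can be sketched as follows, assuming the exchangeable latent correlation matrix described above; the function and variable names are ours:

```python
import numpy as np

def gen_probit_clusters(n=500, m=4, p=10, rho=0.5, beta=None, seed=0):
    """Generate clustered binary data by dichotomizing latent
    multivariate normal variables at a threshold of zero."""
    rng = np.random.default_rng(seed)
    beta = np.zeros(p) if beta is None else beta
    corr = (1 - rho) * np.eye(m) + rho     # exchangeable correlation, sigma = 1
    X = rng.normal(size=(n, m, p))         # m x p covariate matrix per cluster
    latent = (np.einsum('imp,p->im', X, beta)
              + rng.multivariate_normal(np.zeros(m), corr, size=n))
    Y = (latent > 0).astype(int)           # dichotomize at zero
    return X, Y
```

Under the global null (β = 0) each binary outcome has marginal probability 1/2, while ρ controls the within-cluster dependence of the latent variables.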

4.3 Quadratic exponential model

Here, we take a total of n=700 clusters and p=10 or 20 predictors. The number of observations within each cluster, m_i, varies between clusters and is uniformly sampled from {4,5,6,7,8}. The m_i×p covariate matrix X_i is sampled from a standard normal distribution. We also consider two different values for the interaction parameter: w=0 or 0.5. The null value of the regression coefficients is β^T=0, and the two alternative configurations are β_{a1}^T=(0,0,0,0.12,0,…,0) and β_{a2}^T=(0,0.08,0.12,0.03,0.05,0.08,0,…,0). The empirical FWER and power are computed and summarized in Table 2. Overall, MNQ clearly has the best performance.
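Sampling from a quadratic exponential model can be sketched by enumerating the 2^m possible binary outcomes of a cluster, which is feasible for the small cluster sizes used here (m_i ≤ 8 gives at most 256 outcomes). The density below is our reading of the exchangeable Molenberghs-Ryan form [24] with a common interaction parameter w; it is an illustration, not the authors' code:

```python
import itertools
import numpy as np

def sample_quad_exp(Xi, beta, w, rng):
    """Draw one cluster response y from a quadratic exponential model
    f(y) proportional to exp(sum_j (x_ij' beta) y_j + w * sum_{j<k} y_j y_k),
    by enumerating all 2^m binary outcome vectors."""
    m = Xi.shape[0]
    theta = Xi @ beta                                    # main-effect parameters
    outcomes = np.array(list(itertools.product([0, 1], repeat=m)))
    s = outcomes.sum(axis=1)
    pair_sum = s * (s - 1) / 2          # sum_{j<k} y_j y_k for binary y
    log_f = outcomes @ theta + w * pair_sum
    prob = np.exp(log_f - log_f.max())  # stabilize before normalizing
    prob /= prob.sum()
    return outcomes[rng.choice(len(outcomes), p=prob)]
```

Setting w=0 removes the interaction term, so the components become independent Bernoulli variables, which is a convenient sanity check.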

5 Analysis of depression data

Table 6

Composite likelihood estimates of the health factors’ regression coefficients.

                        Estimate   SE        Unadjusted p-value
Intercept               -2.1751    0.0508    <2e-16
Sleeplessness            1.3330    0.0290    <2e-16
Smoking                  0.2826    0.0439    <2e-16
High blood pressure      0.0764    0.0219    2.07e-11
Diabetes                 0.0710    0.0296    8.96e-07
Difficulty in walking    0.0695    0.0054    <2e-16
Age                      0.0093    0.00003   <2e-16
Activity                -0.0156    0.0064    2.35e-05
w                        0.2877    0.0094    <2e-16

We apply our proposed method to the health and retirement study (HRS) dataset. Information about health, financial situation, family structure, and health factors was collected by the RAND center at the University of Michigan. Depression status is recorded as a binary response variable. Seven health factors, namely “age” (in months), “smoking”, “restless sleep”, “diabetes”, “high blood pressure”, “activity”, and “difficulty in walking”, are considered as predictors, and each predictor is highly significant, with unadjusted p-values all less than 10^{-4}. The effect size estimates and unadjusted p-values are reported in Table 6. As we are interested in comparing the importance of different predictors, we perform multiple comparisons on the effects of these seven health factors. For each individual, we include only the years for which all of the factors were recorded. In total, 33,636 people are included in the analysis, and the number of repeated measurements varies across individuals. As the response variable is binary and the cluster sizes vary, we propose a quadratic exponential model for this data set. The w parameter in the quadratic exponential model allows us to account for the interaction effect among the repeated measurements for the same individual. The full likelihood approach is computationally challenging for this model, and hence we use the proposed composite likelihood method to perform inference.

To compare the effect sizes of all seven health factors, we perform both all pairwise comparisons and many-to-one comparisons on the seven parameters. For all pairwise comparisons, the MNQ approach rejects 15 of the 21 hypotheses. The results are given in Table 7. Based on the estimates of the effect sizes (Table 6), we note that “restless sleep” and “smoking” are the two health factors with the largest effect sizes. The pairwise comparisons of restless sleep with all other health factors and of smoking with all other factors are rejected. This shows that “restless sleep” and “smoking” are the two leading health factors for the occurrence of depression. “High blood pressure”, “diabetes”, and “difficulty in walking” have less influence on the occurrence rate of depression. When we examine the three pairwise comparisons among these three factors, the three null hypotheses are not rejected, indicating that these three health factors have similar effects on the disease. Furthermore, when we compare “high blood pressure” with “age” and “activity”, the test rejects both comparisons, indicating that “high blood pressure” is more important than “age” and “activity” with regard to the disease development.

To show how different the results would be if the within-patient correlations were ignored, we also compare the result of the MNQ with the naive MNQ method. Both the MNQ and the naive MNQ reject the global null hypothesis that all pairs of health factors have equal effects on the depression status. The MNQ method rejects 15 hypotheses, whereas the naive MNQ method rejects 18 out of the total 21 hypotheses. MNQ and naive MNQ are in agreement in all the aforementioned comparisons. However, for the comparisons of age versus diabetes, diabetes versus activity, and age versus activity, the MNQ method cannot reject these three null hypotheses while the naive method rejects all three. The difference between the two methods is due to the correlation among the repeated measurements, which is estimated as ŵ=0.2877. By ignoring this correlation, as in the naive method, the standard errors are underestimated, leading to more rejections.

We also conduct many-to-one comparisons on the seven health factors using “diabetes” as the baseline factor. Table 8 shows that the comparisons are consistent with our findings in Table 7. “Restless sleep” and “smoking” are two factors which are much more influential than “diabetes” in terms of increasing the risk of depression, whereas “activity” is more important than “diabetes” in terms of decreasing the risk of depression. “Age”, “high blood pressure” and “difficulty in walking” seem to have similar effect sizes as “diabetes”. The naive MNQ method rejects one more comparison than the MNQ method, indicating that the naive method is also more liberal in many-to-one comparisons.
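The contrast matrices behind these two families of comparisons can be built mechanically: with p=7 factors, all pairwise comparisons give the 21 rows tested in Table 7, and many-to-one comparisons against a baseline give the 6 rows in Table 8. A sketch (the function names and index conventions are ours):

```python
import itertools
import numpy as np

def pairwise_contrasts(p):
    """All-pairwise (Tukey-type) contrast matrix: one row per pair (j, k),
    testing beta_j = beta_k; p factors give p*(p-1)/2 rows."""
    C = np.zeros((p * (p - 1) // 2, p))
    for r, (j, k) in enumerate(itertools.combinations(range(p), 2)):
        C[r, j], C[r, k] = 1.0, -1.0
    return C

def many_to_one_contrasts(p, baseline):
    """Many-to-one (Dunnett-type) contrasts of each factor against a
    designated baseline factor, e.g. diabetes in Table 8."""
    others = [j for j in range(p) if j != baseline]
    C = np.zeros((p - 1, p))
    for r, j in enumerate(others):
        C[r, baseline], C[r, j] = 1.0, -1.0
    return C
```

Each row C(l) defines one hypothesis C(l)^T β = 0, so the number of rows is the number of simultaneous tests the MNQ cut-off must cover.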

Table 7

Results of MNQ and naive MNQ for all pairwise comparisons on the depression study data set. (A: fail to reject, R: reject H0).

H0                       MNQ  naive    H0                        MNQ  naive
β_sleep = β_smoke         R    R       β_hbp = β_diabet           A    A
β_sleep = β_hbp           R    R       β_hbp = β_diffwalk         A    A
β_sleep = β_diabet        R    R       β_hbp = β_age              R    R
β_sleep = β_diffwalk      R    R       β_hbp = β_activity         R    R
β_sleep = β_age           R    R       β_diabet = β_diffwalk      A    A
β_sleep = β_activity      R    R       β_diabet = β_age           A    R
β_smoke = β_hbp           R    R       β_diabet = β_activity      A    R
β_smoke = β_diabet        R    R       β_diffwalk = β_age         R    R
β_smoke = β_diffwalk      R    R       β_diffwalk = β_activity    R    R
β_smoke = β_age           R    R       β_age = β_activity         A    R
β_smoke = β_activity      R    R
Table 8

Results of MNQ and naive MNQ for many-to-one comparisons on the depression study data set. (A: fail to reject, R: reject H0).

H0                       MNQ  naive
β_diabet = β_sleep        R    R
β_diabet = β_smoke        R    R
β_diabet = β_hbp          A    A
β_diabet = β_diffwalk     A    A
β_diabet = β_age          A    R
β_diabet = β_activity     R    R

6 Discussion

For many correlated multivariate datasets, it is computationally convenient to perform multiple comparisons based on the composite likelihood. Theory is developed based on the asymptotic properties of the composite likelihood test statistics. Sample sizes greater than 50 are sufficient to apply the proposed multiple testing procedures in the various models examined in the simulations. We establish strong control of the FWER of our proposed composite likelihood test statistics via closed testing procedures. The method is illustrated for three different models: multivariate normal, multivariate probit and quadratic exponential. The equicoordinate quantile of a multivariate normal distribution is used as the threshold for the test statistics, and the resulting procedure is compared with several well-known traditional methods. This MNQ method, which is based on composite likelihood test statistics and uses multivariate normal quantiles to derive cut-off values, shows better control of the FWER in most simulation settings compared to the other test procedures.
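For readers who want to reproduce an equicoordinate cut-off, a simple Monte Carlo estimate works for any correlation matrix, and under independence it can be checked against the exact Šidák quantile. A sketch under our own naming, not the CLMC implementation:

```python
import numpy as np

def equicoordinate_quantile(corr, alpha=0.05, n_sim=200_000, seed=1):
    """Monte Carlo estimate of the two-sided equicoordinate quantile:
    the value c with P(max_j |Z_j| <= c) = 1 - alpha for Z ~ N(0, corr)."""
    rng = np.random.default_rng(seed)
    Z = rng.multivariate_normal(np.zeros(corr.shape[0]), corr, size=n_sim)
    return np.quantile(np.abs(Z).max(axis=1), 1 - alpha)
```

Positive correlation among the test statistics shrinks this quantile below the Bonferroni cut-off, which is where the MNQ power advantage comes from.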

7 Supplementary Files

We provide a supplementary file containing additional simulation results and technical derivations. An R package named CLMC has been developed and can be downloaded from the GitHub account “m-azad”.

Acknowledgment

This work is supported by NSERC grants held by Gao and Jankowski.

References

1. Barua A, Ghosh MK, Kar N, Basiliod MA. Prevalence of depressive disorders in the elderly, 2011. doi:10.4103/0256-4947.87100

2. Cox DR, Reid N. A note on pseudolikelihood constructed from marginal densities. Biometrika 2004;91:729–737. doi:10.1093/biomet/91.3.729

3. Lindsay BG. Composite likelihood methods. Contemporary Mathematics 1988;80:221–239. doi:10.1090/conm/080/999014

4. Varin C. On composite marginal likelihoods. AStA Adv Stat Anal 2008;92:1–28. doi:10.1007/s10182-008-0060-7

5. Varin C, Reid N, Firth D. An overview of composite likelihood methods. Stat Sin 2011;21:5–42.

6. Geys H, Molenberghs G, Ryan LM. Pseudo-likelihood inference for clustered binary data. Commun Stat Theory Methods 1997;26:2743–2767. doi:10.1080/03610929708832075

7. Renard D, Molenberghs G, Geys H. A pairwise likelihood approach to estimation in multilevel probit models. Comput Stat Data Anal 2004;44:649–667. doi:10.1016/S0167-9473(02)00263-3

8. Zhao Y, Joe H. Composite likelihood estimation in multivariate data analysis. Can J Stat 2005;33:335–356. doi:10.1002/cjs.5540330303

9. Bretz F, Hothorn T, Westfall P. Multiple comparisons using R. Boca Raton, FL: Chapman and Hall/CRC Press, 2010.

10. Hochberg Y, Tamhane A. Multiple comparison procedures. New York: Wiley, 1987. doi:10.1002/9780470316672

11. Sidak Z. On multivariate normal probabilities of rectangles: their dependence on correlations. Ann Math Stat 1968;39:1425–1434. doi:10.1214/aoms/1177698122

12. Scheffé H. The analysis of variance. New York: Wiley, 1959.

13. Simes RJ. An improved Bonferroni procedure for multiple tests of significance. Biometrika 1986;73:751–754. doi:10.1093/biomet/73.3.751

14. Holm S. A simple sequentially rejective multiple test procedure. Scand J Stat 1979;6:65–70.

15. Hommel G. A stagewise rejective multiple test procedure based on a modified Bonferroni test. Biometrika 1988;75:383–386. doi:10.1093/biomet/75.2.383

16. Hothorn T, Bretz F, Westfall P. Simultaneous inference in general parametric models. Biom J 2008;50:346–363. doi:10.1002/bimj.200810425

17. Konietschke F, Bosiger S, Brunner E, Hothorn LA. Are multiple contrast tests superior to the ANOVA? Int J Biostat 2013;9:11. doi:10.1515/ijb-2012-0020

18. Konietschke F, Hothorn LA, Brunner E. Rank-based multiple test procedures and simultaneous confidence intervals. Electron J Stat 2012;6:738–759. doi:10.1214/12-EJS691

19. Besag J. Spatial interaction and the statistical analysis of lattice systems (with discussion). J Roy Stat Soc Ser B 1974;36:192–236. doi:10.1111/j.2517-6161.1974.tb00999.x

20. Xu X, Reid N. On the robustness of maximum composite likelihood estimate. J Stat Plann Inference 2011;141:3047–3054. doi:10.1016/j.jspi.2011.03.026

21. Varin C, Vidoni P. A note on composite likelihood inference and model selection. Biometrika 2005;92:519–528. doi:10.1093/biomet/92.3.519

22. Gabriel KR. Simultaneous test procedures, some theory of multiple comparisons. Ann Math Stat 1969;40:224–250. doi:10.1214/aoms/1177697819

23. Marcus R, Peritz E, Gabriel KR. On closed testing procedures with specific reference to ordered analysis of variance. Biometrika 1976;63:655–660. doi:10.1093/biomet/63.3.655

24. Molenberghs G, Ryan LM. An exponential family model for clustered multivariate binary data. Environmetrics 1999;10:279–300. doi:10.1002/(SICI)1099-095X(199905/06)10:3<279::AID-ENV352>3.0.CO;2-X

25. Hothorn T, Bretz F, Westfall P, Heiberger RM, Schutzenmeister A. multcomp: simultaneous inference for general linear hypotheses. R package version 1.1-7, 2010. Available at http://CRAN.R-project.org/package=multcomp.


Supplemental Material

The online version of this article (DOI:10.1515/ijb-2016-0004) offers supplementary material, available to authorized users.


Published Online: 2016-12-6
Published in Print: 2016-11-1

© 2016 Walter de Gruyter GmbH, Berlin/Boston
