Combining dependent F-tests for robust association of quantitative traits under genetic model uncertainty

Long Qu

doi:10.1515/sagmb-2013-0001


Published/Copyright: March 4, 2014


From the journal Statistical Applications in Genetics and Molecular Biology Volume 13 Issue 2

Abstract

In association mapping of quantitative traits, the F-test based on an assumed genetic model is a basic statistical tool for testing association of each candidate locus with the trait of interest. However, the true underlying genetic model is often unknown, and using an incorrect model may cause serious loss of power. For case-control studies, it is known that the combination of several tests that are optimal for different models is robust to model misspecification. In this paper, we extend the test combination approach to quantitative trait association. We first derive the exact correlations among transformed test statistics and discuss interesting special cases. We then propose and evaluate a multivariate normality based approximation to the joint distribution of test statistics, such that the marginal distributions and pairwise correlations among test statistics are accounted for. Through simulations, we show that the sizes of the resulting approximate combined tests are accurate for practical purposes under a variety of situations. We find that the combination of the tests from the additive model and the genotypic model performs well, because it demonstrates both robustness to incorrect models and satisfactory power. A mouse lipoprotein data set is used to demonstrate the method.

Keywords: combination of p-values; dominance; genetic association; model misspecification; ratio of quadratic forms

Corresponding author: Long Qu, Department of Mathematics & Statistics, Wright State University, Dayton, OH 45435, USA, e-mail: long.qu@wright.edu

Appendices

An example of QR-decomposition

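As a simple example, we consider the QR-decomposition of

$$X_0=\begin{bmatrix}1 & x_1\\ \vdots & \vdots\\ 1 & x_n\end{bmatrix}=[\mathbf{1}_n,\ x]$$

with rank 2 and $n>2$. Let $r$ denote the residual from regressing $x$ on $\mathbf{1}_n$, i.e., $r=x-\bar{x}\mathbf{1}_n$, where $\bar{x}=x_{\cdot}/n$ with $x_{\cdot}=\sum_{i=1}^{n}x_i$. Then the Q₀₀ matrix can be given by

$$Q_{00}=\left[\frac{1}{\sqrt{n}}\mathbf{1}_n,\ \frac{1}{\|r\|}r\right].$$

The corresponding R₀ matrix is then the upper Cholesky root of

$$X_0'X_0=\begin{bmatrix}n & x_{\cdot}\\ x_{\cdot} & \|x\|^2\end{bmatrix},$$

i.e.,

$$R_0=\begin{bmatrix}\sqrt{n} & \frac{1}{\sqrt{n}}x_{\cdot}\\ 0 & \|r\|\end{bmatrix}.$$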

It is trivial to verify that Q₀₀R₀=X₀ and that Q₀₀′Q₀₀ is identity.

The columns of Q₀₁ can be any orthonormal basis of the orthogonal complement of the space spanned by the columns of Q₀₀. The choice is not unique, since any valid Q₀₁ can be right multiplied by an arbitrary (n–2)×(n–2) orthogonal matrix P, and the resulting Q₀₁P can also be used to compute residual contrasts.
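The closed-form factors above can be checked numerically. The following is a minimal sketch in Python with NumPy, not code from the article; the covariate values in `x` are hypothetical and chosen only for illustration, and `Q01` is just one of the many valid choices, taken here from the trailing columns of a complete QR factorization.

```python
import numpy as np

# Hypothetical covariate values, chosen only for illustration.
x = np.array([0.0, 1.0, 1.0, 2.0, 0.0, 2.0])
n = x.size
ones = np.ones(n)

# Residual from regressing x on the intercept: r = x - xbar * 1_n.
r = x - x.mean() * ones
r_norm = np.linalg.norm(r)

# Closed-form factors from the appendix example.
X0 = np.column_stack([ones, x])
Q00 = np.column_stack([ones / np.sqrt(n), r / r_norm])
R0 = np.array([[np.sqrt(n), x.sum() / np.sqrt(n)],
               [0.0, r_norm]])

# One (non-unique) choice of Q01: the last n-2 columns of a complete QR of X0.
Q_full, _ = np.linalg.qr(X0, mode="complete")
Q01 = Q_full[:, 2:]

qr_ok = np.allclose(Q00 @ R0, X0)                    # Q00 R0 = X0
orthonormal_ok = np.allclose(Q00.T @ Q00, np.eye(2))  # Q00' Q00 = I
chol_ok = np.allclose(R0.T @ R0, X0.T @ X0)           # R0 is the Cholesky root
complement_ok = (np.allclose(Q01.T @ Q00, 0.0)
                 and np.allclose(Q01.T @ Q01, np.eye(n - 2)))
```

All four flags should evaluate to `True`. Note that `np.linalg.qr` may return factors whose signs differ from the closed form, which is another instance of the non-uniqueness discussed above.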

Proof of Theorem 1

First, we derive the formula for the second moment of the ratio of quadratic forms in normal random variables, which is a consequence of Theorem 6 of Magnus (1986).

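Lemma 1. Let $x$ be an $\tilde{n}\times 1$ vector following a multivariate normal distribution with zero mean and identity covariance matrix. Let $A_1$ and $A_2$ be $\tilde{n}\times\tilde{n}$ symmetric real matrices. Then,

1.1) For $i=1$ or $2$,

$$E\left\{\left(\frac{x'A_ix}{x'x}\right)^2\right\}=\frac{\{\mathrm{tr}(A_i)\}^2+2\,\mathrm{tr}(A_i^2)}{\tilde{n}^2+2\tilde{n}}.$$

1.2)

$$E\left\{\left(\frac{x'A_1x}{x'x}\right)\left(\frac{x'A_2x}{x'x}\right)\right\}=\frac{\mathrm{tr}(A_1)\,\mathrm{tr}(A_2)+2\,\mathrm{tr}(A_1A_2)}{\tilde{n}^2+2\tilde{n}}.$$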
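Both parts of the lemma are easy to sanity-check by simulation. The sketch below, in Python with NumPy, is not from the article: the dimension, the random symmetric matrices, the seed, and the number of draws are all arbitrary illustrative choices. It compares Monte Carlo averages of the ratios with the closed-form moments.

```python
import numpy as np

rng = np.random.default_rng(0)
n_tilde = 6  # arbitrary illustrative dimension


def random_symmetric(k):
    """Return an arbitrary k x k symmetric real matrix."""
    m = rng.standard_normal((k, k))
    return (m + m.T) / 2.0


A1 = random_symmetric(n_tilde)
A2 = random_symmetric(n_tilde)
denom = n_tilde**2 + 2 * n_tilde

# Closed-form moments from parts 1.1) and 1.2) of the lemma.
exact_sq = (np.trace(A1) ** 2 + 2 * np.trace(A1 @ A1)) / denom
exact_cross = (np.trace(A1) * np.trace(A2) + 2 * np.trace(A1 @ A2)) / denom

# Monte Carlo: each row of x is one draw from N(0, I).
x = rng.standard_normal((200_000, n_tilde))
xx = np.einsum("ij,ij->i", x, x)                  # x'x for each draw
q1 = np.einsum("ij,jk,ik->i", x, A1, x) / xx      # x'A1x / x'x
q2 = np.einsum("ij,jk,ik->i", x, A2, x) / xx      # x'A2x / x'x

mc_sq = np.mean(q1**2)
mc_cross = np.mean(q1 * q2)
```

With this many draws, `mc_sq` and `mc_cross` should agree with `exact_sq` and `exact_cross` up to Monte Carlo error.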

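Proof. Define $\Delta(z)=(1+2z)^{-1/2}I$ and $R(z)=\Delta(z)A\Delta(z)=(1+2z)^{-1}A$, where $A$ can be either $A_1$ or $A_2$. Then it follows directly from Theorem 6 of Magnus (1986) that

$$E\left\{\left(\frac{x'Ax}{x'x}\right)^2\right\}=\sum_{n\in U}\gamma_2(n)\int_0^\infty z\,|\Delta(z)|\prod_{j=1}^{2}\left[\mathrm{tr}\{R(z)^j\}\right]^{n_j}dz,$$

where $U=\{(n_1,n_2)':\sum_{j=1}^{2}jn_j=2\text{ and each }n_j\text{ is a nonnegative integer}\}=\{(0,1)',\ (2,0)'\}$ and

$$\gamma_2(n)=2!\cdot 2^2\prod_{j=1}^{2}\left\{n_j!\,(2j)^{n_j}\right\}^{-1}.$$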

As γ₂{(0, 1)′}=2 and γ₂{(2, 0)′}=1, we have

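$$\begin{aligned}E\left\{\left(\frac{x'Ax}{x'x}\right)^2\right\}&=2\int_0^\infty z\,|\Delta(z)|\,[\mathrm{tr}\{R(z)\}]^0\,[\mathrm{tr}\{R(z)^2\}]\,dz+\int_0^\infty z\,|\Delta(z)|\,[\mathrm{tr}\{R(z)\}]^2\,[\mathrm{tr}\{R(z)^2\}]^0\,dz\\&=\int_0^\infty z\,|\Delta(z)|\left(2\,\mathrm{tr}\{R(z)^2\}+[\mathrm{tr}\{R(z)\}]^2\right)dz.\end{aligned}$$

Because $|\Delta(z)|=(1+2z)^{-\tilde{n}/2}$, $[\mathrm{tr}\{R(z)\}]^2=(1+2z)^{-2}\{\mathrm{tr}(A)\}^2$, and $\mathrm{tr}\{R(z)^2\}=(1+2z)^{-2}\,\mathrm{tr}(A^2)$, we have

$$\begin{aligned}E\left\{\left(\frac{x'Ax}{x'x}\right)^2\right\}&=\left[\{\mathrm{tr}(A)\}^2+2\,\mathrm{tr}(A^2)\right]\int_0^\infty\frac{z}{(1+2z)^{\tilde{n}/2+2}}\,dz\\&=\left[\{\mathrm{tr}(A)\}^2+2\,\mathrm{tr}(A^2)\right]\left.\frac{2(1-\tilde{n}/2-2)z-1}{2^2(\tilde{n}/2+2-1)(\tilde{n}/2+2-2)(2z+1)^{\tilde{n}/2+2-1}}\right|_{z=0}^{\infty}\\&=\frac{\{\mathrm{tr}(A)\}^2+2\,\mathrm{tr}(A^2)}{\tilde{n}^2+2\tilde{n}}.\end{aligned}$$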

This completes the proof of part 1.1) of the lemma.
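As a cross-check of the integral used in the last step, the substitution $u=1+2z$ gives the same value more directly. The shorthand $m=\tilde{n}/2+2$ is introduced here only for brevity and is not notation from the article; the computation requires $m>2$, which holds for any $\tilde{n}>0$.

$$\int_0^\infty\frac{z}{(1+2z)^{m}}\,dz=\frac{1}{4}\int_1^\infty\left(u^{1-m}-u^{-m}\right)du=\frac{1}{4}\left(\frac{1}{m-2}-\frac{1}{m-1}\right)=\frac{1}{4(m-1)(m-2)}.$$

Since $(m-1)(m-2)=(\tilde{n}/2+1)(\tilde{n}/2)=(\tilde{n}^2+2\tilde{n})/4$, the integral equals $1/(\tilde{n}^2+2\tilde{n})$, in agreement with the denominator in part 1.1).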

Next, notice that, by applying 1.1),

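$$E\left\{\left(\frac{x'A_1x}{x'x}\pm\frac{x'A_2x}{x'x}\right)^2\right\}=E\left[\left\{\frac{x'(A_1\pm A_2)x}{x'x}\right\}^2\right]=\frac{\{\mathrm{tr}(A_1\pm A_2)\}^2+2\,\mathrm{tr}\{(A_1\pm A_2)^2\}}{\tilde{n}^2+2\tilde{n}}.$$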

Also, for any a, b∈ℜ, 4ab=(a+b)²–(a–b)². Thus,

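$$\begin{aligned}E\left\{\left(\frac{x'A_1x}{x'x}\right)\left(\frac{x'A_2x}{x'x}\right)\right\}&=\frac{1}{4}\left(E\left[\left\{\frac{x'(A_1+A_2)x}{x'x}\right\}^2\right]-E\left[\left\{\frac{x'(A_1-A_2)x}{x'x}\right\}^2\right]\right)\\&=\frac{1}{4(\tilde{n}^2+2\tilde{n})}\left[\{\mathrm{tr}(A_1+A_2)\}^2+2\,\mathrm{tr}\{(A_1+A_2)^2\}-\{\mathrm{tr}(A_1-A_2)\}^2-2\,\mathrm{tr}\{(A_1-A_2)^2\}\right]\\&=\frac{1}{4(\tilde{n}^2+2\tilde{n})}\left[\{\mathrm{tr}(A_1+A_2)\}^2-\{\mathrm{tr}(A_1-A_2)\}^2+2\,\mathrm{tr}\{(A_1+A_2)^2-(A_1-A_2)^2\}\right]\\&=\frac{1}{4(\tilde{n}^2+2\tilde{n})}\left\{\mathrm{tr}(2A_1)\,\mathrm{tr}(2A_2)+2\,\mathrm{tr}(2A_1A_2+2A_2A_1)\right\}\\&=\frac{1}{\tilde{n}^2+2\tilde{n}}\left\{\mathrm{tr}(A_1)\,\mathrm{tr}(A_2)+2\,\mathrm{tr}(A_1A_2)\right\}.\qquad\square\end{aligned}$$

Proof of Theorem 1. For i, j=1, 2, …, M,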

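$$B_i+B_j=\frac{\tilde{y}'(P_{\tilde{X}_i}+P_{\tilde{X}_j})\tilde{y}}{\tilde{y}'\tilde{y}}=\frac{(\tilde{y}/\sigma_e)'(P_{\tilde{X}_i}+P_{\tilde{X}_j})(\tilde{y}/\sigma_e)}{(\tilde{y}/\sigma_e)'(\tilde{y}/\sigma_e)},$$

which is a ratio of quadratic forms in normal random variables. In addition, when the null hypothesis holds, $B_i+B_j$ has the same distribution as

$$\frac{z'(P_{\tilde{X}_i}+P_{\tilde{X}_j})z}{z'z},$$

where $z$ is a vector of independent standard normal random variables. Similarly, $B_i-B_j$ is identically distributed as the random variable

$$\frac{z'(P_{\tilde{X}_i}-P_{\tilde{X}_j})z}{z'z}.$$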

By applying Lemma 1, we have

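$$E(B_iB_j)=\frac{\mathrm{tr}(P_{\tilde{X}_i})\,\mathrm{tr}(P_{\tilde{X}_j})+2\,\mathrm{tr}(P_{\tilde{X}_i}P_{\tilde{X}_j})}{\tilde{n}^2+2\tilde{n}}=\frac{\nu_i\nu_j+2\,\mathrm{tr}(P_{\tilde{X}_i}P_{\tilde{X}_j})}{\tilde{n}^2+2\tilde{n}}.$$

Therefore,

$$\rho_{ij}=\frac{E(B_iB_j)-\mu_i\mu_j}{\tau_i\tau_j}=\frac{\tilde{n}}{\sqrt{\nu_i\nu_jr_ir_j}}\left\{\mathrm{tr}(P_{\tilde{X}_i}P_{\tilde{X}_j})-\frac{\nu_i\nu_j}{\tilde{n}}\right\}.$$

Further, if we perform a QR-decomposition $\tilde{X}_m=\tilde{Q}_m\tilde{R}_m$, where $\tilde{Q}_m$ is $\tilde{n}\times\nu_m$ such that $\tilde{Q}_m'\tilde{Q}_m=I$ and $\tilde{R}_m$ is a $\nu_m\times\nu_m$ upper triangular matrix, then $\mathrm{tr}(P_{\tilde{X}_i}P_{\tilde{X}_j})=\|\tilde{Q}_i'\tilde{Q}_j\|_F^2$, where $\|\cdot\|_F$ denotes the Frobenius norm. Thus, the lower bound of $\rho_{ij}$ is attained when the norm is zero. It follows that $\rho_{ij}\geq-\sqrt{\nu_i\nu_j/(r_ir_j)}$. □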
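The correlation formula and its Frobenius-norm form can be illustrated numerically. The sketch below, in Python with NumPy, is not the article's implementation. It assumes that $r_m=\tilde{n}-\nu_m$ (the residual degrees of freedom; this section does not restate the definition), and it uses hypothetical random design matrices in which a one-column model is nested in a two-column model, loosely mimicking an additive model inside a genotypic model. The function names are illustrative.

```python
import numpy as np


def orthonormal_basis(X):
    """Reduced QR: columns of Q form an orthonormal basis of col(X)."""
    Q, _ = np.linalg.qr(X)
    return Q


def null_correlation(Qi, Qj):
    """Null correlation of B_i and B_j and its lower bound."""
    n_tilde, nu_i = Qi.shape
    nu_j = Qj.shape[1]
    # ASSUMPTION: r_m = n_tilde - nu_m (not restated in this section).
    r_i, r_j = n_tilde - nu_i, n_tilde - nu_j
    # tr(P_i P_j) computed as the squared Frobenius norm of Qi' Qj.
    tr_PiPj = np.linalg.norm(Qi.T @ Qj, "fro") ** 2
    scale = np.sqrt(nu_i * nu_j * r_i * r_j)
    rho = n_tilde / scale * (tr_PiPj - nu_i * nu_j / n_tilde)
    lower = -np.sqrt(nu_i * nu_j / (r_i * r_j))
    return rho, lower


rng = np.random.default_rng(1)
n_tilde = 12
# Hypothetical designs: Xi has one column, Xj adds a second column to it.
Xi = rng.standard_normal((n_tilde, 1))
Xj = np.column_stack([Xi, rng.standard_normal(n_tilde)])
Qi, Qj = orthonormal_basis(Xi), orthonormal_basis(Xj)
rho, lower = null_correlation(Qi, Qj)

# Monte Carlo check under the null: B_m = y' P_m y / y'y with y ~ N(0, I).
y = rng.standard_normal((100_000, n_tilde))
yy = np.einsum("ij,ij->i", y, y)
B_i = np.sum((y @ Qi) ** 2, axis=1) / yy
B_j = np.sum((y @ Qj) ** 2, axis=1) / yy
mc_rho = np.corrcoef(B_i, B_j)[0, 1]
```

Because the first model is nested in the second, tr(P_i P_j) equals ν_i = 1 here, so under the stated assumption the formula reduces to ρ = sqrt((ñ − 2) / (2(ñ − 1))), and the simulated correlation `mc_rho` should be close to it.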

Computational details of $\hat{\mu}_{ij}(\varrho_{ij})$

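To facilitate the numerical integration involved in $\hat{\mu}_{ij}(\varrho_{ij})$, we perform the following change of variables. First let $\tilde{z}_i=z_i$ and $\tilde{z}_j=(z_j-\varrho_{ij}z_i)/\sqrt{1-\varrho_{ij}^2}$. It can be shown that $\tilde{z}_i$ and $\tilde{z}_j$ are independent standard normal random variables if $z_i$ and $z_j$ follow the bivariate standard normal distribution with correlation $\varrho_{ij}$. Therefore,

$$\hat{\mu}_{ij}(\varrho_{ij})=\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}\hbar(\tilde{z}_i,\tilde{z}_j;\varrho_{ij})\,d\tilde{z}_i\,d\tilde{z}_j,$$

where

$$\hbar(\tilde{z}_i,\tilde{z}_j;\varrho_{ij})=\Xi_i^{-1}\{\Phi(\tilde{z}_i)\}\,\Xi_j^{-1}\left\{\Phi\left(\varrho_{ij}\tilde{z}_i+\sqrt{1-\varrho_{ij}^2}\,\tilde{z}_j\right)\right\}\phi(\tilde{z}_i)\,\phi(\tilde{z}_j).$$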

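Further, to avoid integrating over an infinite interval, we let $\bar{z}_i=\arctan(\tilde{z}_i)$ and $\bar{z}_j=\arctan(\tilde{z}_j)$, such that $\hat{\mu}_{ij}(\varrho_{ij})$ is equal to

$$\int_{-\pi/2}^{\pi/2}\int_{-\pi/2}^{\pi/2}\frac{\hbar(\tan(\bar{z}_i),\tan(\bar{z}_j);\varrho_{ij})}{\cos^2(\bar{z}_i)\cos^2(\bar{z}_j)}\,d\bar{z}_i\,d\bar{z}_j.$$

Alternatively, the integration can be performed by letting $\bar{b}_i=\Phi(\tilde{z}_i)$ and $\bar{b}_j=\Phi(\tilde{z}_j)$. Then $\bar{b}_i$ and $\bar{b}_j$ are independent and uniformly distributed on (0, 1), and $\hat{\mu}_{ij}(\varrho_{ij})$ is given by

$$\int_0^1\int_0^1\Xi_i^{-1}(\bar{b}_i)\,\Xi_j^{-1}\left[\Phi\left\{\varrho_{ij}\Phi^{-1}(\bar{b}_i)+\sqrt{1-\varrho_{ij}^2}\,\Phi^{-1}(\bar{b}_j)\right\}\right]d\bar{b}_i\,d\bar{b}_j.$$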

Either of the above two formulas is in a form that can be passed directly to common numerical integration routines, e.g., the adaptIntegrate function in the cubature package of R.
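The unit-square form can also be evaluated with a very simple rule. The sketch below uses only the Python standard library and a plain midpoint rule, which is a swapped-in technique rather than the adaptive cubature mentioned above. It assumes that $\Xi_m^{-1}$ is the quantile function of the null distribution of $B_m$, taken here to be Beta($\nu_m/2$, $r_m/2$); that reading is an assumption, since this section does not restate the definition. To stay dependency-free, the demonstration uses $\nu_m=2$, for which the Beta(1, b) quantile has the closed form $1-(1-p)^{1/b}$. The names `mu_hat` and `beta1_quantile` are illustrative.

```python
from statistics import NormalDist

_STD = NormalDist()  # standard normal: cdf is Phi, inv_cdf is Phi^{-1}


def beta1_quantile(b):
    """Quantile function of Beta(1, b), i.e. nu = 2 and r = 2b (ASSUMED null law)."""
    return lambda p: 1.0 - (1.0 - p) ** (1.0 / b)


def mu_hat(quantile_i, quantile_j, rho, grid=200):
    """Midpoint-rule approximation of the unit-square integral for mu_hat_ij(rho)."""
    h = 1.0 / grid
    mids = [(k + 0.5) * h for k in range(grid)]
    z = [_STD.inv_cdf(b) for b in mids]       # Phi^{-1} at the cell midpoints
    qi = [quantile_i(b) for b in mids]        # Xi_i^{-1} at the cell midpoints
    s = (1.0 - rho * rho) ** 0.5
    total = 0.0
    for a in range(grid):
        for c in range(grid):
            # Argument of Xi_j^{-1}: Phi(rho * z_a + sqrt(1 - rho^2) * z_c).
            total += qi[a] * quantile_j(_STD.cdf(rho * z[a] + s * z[c]))
    return total * h * h


# Illustrative case: both statistics have nu = 2 and r = 10, i.e. Beta(1, 5).
q = beta1_quantile(5.0)
mu_independent = mu_hat(q, q, 0.0)  # should be close to (1/6)^2
mu_half = mu_hat(q, q, 0.5)
mu_comonotone = mu_hat(q, q, 1.0)   # should be close to E(B^2) = 2/42
```

Any other quantile function, for example one from a statistics library, can be passed in place of `beta1_quantile`. The two limiting values give a quick check: at zero correlation the integral factorizes into the product of the two means, and at correlation one it equals the second moment of the common marginal.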

References

Freidlin, B., G. Zheng, Z. Li and J. L. Gastwirth (2002): “Trend tests for case-control studies of genetic markers: power, sample size and robustness,” Hum. Hered., 53, 146–152.

Fritsch, F. N. and R. E. Carlson (1980): “Monotone piecewise cubic interpolation,” SIAM J. Numer. Anal., 17, 238–246.

Genz, A. (1992): “Numerical computation of multivariate normal probabilities,” J. Comput. Graph. Stat., 1, 141–149.

Genz, A. (1993): “Comparison of methods for the computation of multivariate normal probabilities,” Comput. Sci. Stat., 55, 400–405.

Genz, A. and K.-S. Kwong (2000): “Numerical evaluation of singular multivariate normal distributions,” J. Stat. Comput. Sim., 68, 1–21.

González, J. R., J. L. Carrasco, F. Dudbridge, L. Armengol, X. Estivill and V. Moreno (2008): “Maximizing association statistics over genetic models,” Genet. Epidemiol., 32, 246–254.

Han, B., H. M. Kang and E. Eskin (2009): “Rapid and Accurate Multiple Testing Correction and Power Estimation for Millions of Correlated Markers,” PLoS Genet., 5, e1000456.

Higham, N. J. (2002): “Computing the nearest correlation matrix – a problem from finance,” IMA J. Numer. Anal., 22, 329–343.

Joo, J., M. Kwak, K. Ahn and G. Zheng (2009): “A robust genome-wide scan statistic of the Wellcome Trust Case-Control Consortium,” Biometrics, 65, 1115–1122. doi:10.1111/j.1541-0420.2009.01185.x

Joo, J., M. Kwak, Z. Chen and G. Zheng (2010a): “Efficiency robust statistics for genetic linkage and association studies under genetic model uncertainty,” Stat. Med., 29, 158–180. doi:10.1002/sim.3759

Joo, J., M. Kwak and G. Zheng (2010b): “Improving power for testing genetic association in case-control studies by reducing the alternative space,” Biometrics, 66, 266–276. doi:10.1111/j.1541-0420.2009.01241.x

Kwak, M., J. Joo and G. Zheng (2009): “A robust test for two-stage design in genome-wide association studies,” Biometrics, 65, 1288–1295. doi:10.1111/j.1541-0420.2008.01187.x

Lehmann, E. L. and J. P. Romano (2005): Testing statistical hypotheses, Springer texts in statistics, 3rd edition. New York: Springer.

Lettre, G., C. Lange and J. N. Hirschhorn (2007): “Genetic model testing and statistical power in population-based association studies of quantitative traits,” Genet. Epidemiol., 31, 358–362.

Li, Q., K. Yu, Z. Li and G. Zheng (2008): “MAX-rank: a simple and robust genome-wide scan for case-control association studies,” Hum. Genet., 123, 617–623.

Li, Q., G. Zheng, X. Liang and K. Yu (2009): “Robust tests for single-marker analysis in case-control genetic association studies,” Ann. Hum. Genet., 73, 245–252.

Magnus, J. R. (1986): “The exact moments of a ratio of quadratic forms in normal variables,” Annals of Economics and Statistics/Annales d’Économie et de Statistique, 4, 95–109. doi:10.2307/20075629

Patterson, H. D. and R. Thompson (1971): “Recovery of inter-block information when block sizes are unequal,” Biometrika, 58, 545–554. doi:10.1093/biomet/58.3.545

Price, A. L., N. J. Patterson, R. M. Plenge, M. E. Weinblatt, N. A. Shadick and D. Reich (2006): “Principal components analysis corrects for stratification in genome-wide association studies,” Nat. Genet., 38, 904–909.

Qu, L., D. Nettleton, J. C. Dekkers and N. Bacciu (2010): “Variance model selection with application to joint analysis of microarray datasets from multiple studies under false discovery rate control,” Stat. Interface, 3, 477–491.

Qu, L., T. Guennel and S. L. Marshal (2013): “Linear score tests for variance components in linear mixed models and applications to genetic association studies,” Biometrics, 69, 883–892. doi:10.1111/biom.12095

Sladek, R., G. Rocheleau, J. Rung, C. Dina, L. Shen, D. Serre, P. Boutin, D. Vincent, A. Belisle, S. Hadjadj, B. Balkau, B. Heude, G. Charpentier, T. J. Hudson, A. Montpetit, A. V. Pshezhetsky, M. Prentki, B. I. Posner, D. J. Balding, D. Meyre, C. Polychronakos and P. Froguel (2007): “A genome-wide association study identifies novel risk loci for type 2 diabetes,” Nature, 445, 881–885. doi:10.1038/nature05616

So, H.-C. and P. Sham (2011): “Robust association tests under different genetic models, allowing for binary or quantitative traits and covariates,” Behav. Genet., 41, 768–775.

The Wellcome Trust Case Control Consortium (2007): “Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls,” Nature, 447, 661–678. doi:10.1038/nature05911

Valdar, W., L. C. Solberg, D. Gauguier, S. Burnett, P. Klenerman, W. O. Cookson, M. S. Taylor, J. N. P. Rawlins, R. Mott and J. Flint (2006): “Genome-wide genetic association of complex traits in heterogeneous stock mice,” Nat. Genet., 38, 879–887.

Wang, K. and V. C. Sheffield (2005): “A constrained-likelihood approach to marker-trait association studies,” Am. J. Hum. Genet., 77, 768–780.

Zang, Y., W. K. Fung and G. Zheng (2010): “Simple algorithms to calculate asymptotic null distributions of robust tests in case-control genetic association studies in R,” J. Stat. Software, 33, 1–24.

Zaykin, D. V., L. A. Zhivotovsky, P. H. Westfall and B. S. Weir (2002): “Truncated product method for combining p-values,” Genet. Epidemiol., 22, 170–185.

Zheng, G. and H. K. T. Ng (2008): “Genetic model selection in two-phase analysis for case-control association studies,” Biostatistics, 9, 391–399. doi:10.1093/biostatistics/kxm039

Zheng, G., J. Joo, X. Tian, C. O. Wu, J.-P. Lin, M. Stylianou, M. A. Waclawiw and N. L. Geller (2009a): “Robust genome-wide scans with genetic model selection using case-control design,” Stat. Interface, 2, 145–151. doi:10.4310/SII.2009.v2.n2.a4

Zheng, G., J. Joo, D. Zaykin, C. Wu and N. Geller (2009b): “Robust tests in genome-wide scans under incomplete linkage disequilibrium,” Stat. Sci., 24, 503–516. doi:10.1214/09-STS314

Published Online: 2014-3-4

Published in Print: 2014-4-1

