Combining dependent F-tests for robust association of quantitative traits under genetic model uncertainty

Long Qu
Published/Copyright: March 4, 2014

Abstract

In association mapping of quantitative traits, the F-test based on an assumed genetic model is a basic statistical tool for testing association of each candidate locus with the trait of interest. However, the true underlying genetic model is often unknown, and using an incorrect model may cause serious loss of power. For case-control studies, it is known that the combination of several tests that are optimal for different models is robust to model misspecification. In this paper, we extend the test combination approach to quantitative trait association. We first derive the exact correlations among transformed test statistics and discuss interesting special cases. We then propose and evaluate a multivariate normality based approximation to the joint distribution of test statistics, such that the marginal distributions and pairwise correlations among test statistics are accounted for. Through simulations, we show that the sizes of the resulting approximate combined tests are accurate for practical purposes under a variety of situations. We find that the combination of the tests from the additive model and the genotypic model performs well, because it demonstrates both robustness to incorrect models and satisfactory power. A mouse lipoprotein data set is used to demonstrate the method.


Corresponding author: Long Qu, Department of Mathematics & Statistics, Wright State University, Dayton, OH 45435, USA, e-mail:

Appendices

An example of QR-decomposition

As a simple example, we consider the QR-decomposition of

$$X_0 = \begin{bmatrix} 1 & x_1 \\ \vdots & \vdots \\ 1 & x_n \end{bmatrix} = [1_n, x]$$

with rank 2 and $n>2$. Let $r$ denote the residual from regressing $x$ on $1_n$, i.e., $r = x - \bar x\,1_n$, where $\bar x = x_\cdot/n$ with $x_\cdot = \sum_{i=1}^n x_i$. Then the $Q_{00}$ matrix can be given by $[1_n/\sqrt{n},\ r/\|r\|]$.

The corresponding $R_0$ matrix is then the upper-triangular Cholesky factor of

$$X_0'X_0 = \begin{bmatrix} n & x_\cdot \\ x_\cdot & \|x\|^2 \end{bmatrix},$$

i.e.,

$$R_0 = \begin{bmatrix} \sqrt{n} & x_\cdot/\sqrt{n} \\ 0 & \|r\| \end{bmatrix}.$$

It is straightforward to verify that $Q_{00}R_0 = X_0$ and that $Q_{00}'Q_{00}$ is the identity matrix.

The columns of $Q_{01}$ can be any set of orthonormal basis vectors for the orthogonal complement of the space spanned by the columns of $Q_{00}$. The choice is not unique: any valid $Q_{01}$ can be right-multiplied by an arbitrary $(n-2)\times(n-2)$ orthogonal matrix $P$, and the resulting $Q_{01}P$ can also be used to compute residual contrasts.
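The closed-form factors above can be checked numerically. The sketch below (Python with NumPy, an illustrative choice; the helper name `explicit_qr` is hypothetical) builds $Q_{00}$ and $R_0$ from the formulas, takes one possible $Q_{01}$ from a complete QR factorization via `numpy.linalg.qr(..., mode="complete")`, and confirms the stated identities, including the non-uniqueness of $Q_{01}$, on random data.

```python
import numpy as np

def explicit_qr(x):
    """Closed-form Q00 and R0 for X0 = [1_n, x] (hypothetical helper name)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    ones = np.ones(n)
    x_dot = x.sum()                  # x_. = sum_i x_i
    r = x - (x_dot / n) * ones       # residual from regressing x on 1_n
    r_norm = np.linalg.norm(r)
    Q00 = np.column_stack([ones / np.sqrt(n), r / r_norm])
    R0 = np.array([[np.sqrt(n), x_dot / np.sqrt(n)],
                   [0.0, r_norm]])
    return Q00, R0

rng = np.random.default_rng(1)
n = 6
x = rng.normal(size=n)
X0 = np.column_stack([np.ones(n), x])
Q00, R0 = explicit_qr(x)

# One valid Q01: the last n-2 columns of a complete QR factorization of X0.
Q_full, _ = np.linalg.qr(X0, mode="complete")
Q01 = Q_full[:, 2:]

# Any (n-2)x(n-2) orthogonal P gives another valid choice, Q01 @ P.
P, _ = np.linalg.qr(rng.normal(size=(n - 2, n - 2)))
Q01_alt = Q01 @ P

print(np.allclose(Q00 @ R0, X0), np.allclose(Q00.T @ Q00, np.eye(2)))
print(np.allclose(Q00.T @ Q01_alt, 0.0), np.allclose(Q01_alt.T @ Q01_alt, np.eye(n - 2)))
```

Both lines should print `True True`. Also, `R0.T @ R0` reproduces `X0.T @ X0`, which is the Cholesky statement above.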

Proof of Theorem 1

First, we derive the formula for the second moment of the ratio of quadratic forms in normal random variables, which is a consequence of Theorem 6 of Magnus (1986).

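Lemma 1. Let $x$ be an $\tilde n\times 1$ vector following a multivariate normal distribution with zero mean and identity covariance matrix. Let $A_1$ and $A_2$ be $\tilde n\times\tilde n$ symmetric real matrices. Then,

1.1) For $i = 1$ or $2$,

$$E\left\{\left(\frac{x'A_ix}{x'x}\right)^2\right\} = \frac{\{\operatorname{tr}(A_i)\}^2 + 2\operatorname{tr}(A_i^2)}{\tilde n^2 + 2\tilde n}.$$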

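1.2)

$$E\left\{\left(\frac{x'A_1x}{x'x}\right)\left(\frac{x'A_2x}{x'x}\right)\right\} = \frac{\operatorname{tr}(A_1)\operatorname{tr}(A_2) + 2\operatorname{tr}(A_1A_2)}{\tilde n^2 + 2\tilde n}.$$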
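Lemma 1 is easy to sanity-check by simulation. The following sketch assumes NumPy and arbitrary random symmetric matrices; `lemma1_moment` and `monte_carlo_moment` are hypothetical helper names. It compares the closed form in part 1.2) with a Monte Carlo average, and setting $A_2 = A_1$ recovers part 1.1).

```python
import numpy as np

def lemma1_moment(A1, A2):
    """Closed form of Lemma 1: E{(x'A1x/x'x)(x'A2x/x'x)}, x ~ N(0, I)."""
    n = A1.shape[0]
    return (np.trace(A1) * np.trace(A2) + 2 * np.trace(A1 @ A2)) / (n**2 + 2 * n)

def monte_carlo_moment(A1, A2, reps=200_000, seed=0):
    """Simulation estimate of the same expectation."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((reps, A1.shape[0]))
    xx = np.einsum("ri,ri->r", x, x)               # x'x for each draw
    q1 = np.einsum("ri,ij,rj->r", x, A1, x) / xx   # x'A1x / x'x
    q2 = np.einsum("ri,ij,rj->r", x, A2, x) / xx   # x'A2x / x'x
    return np.mean(q1 * q2)

rng = np.random.default_rng(42)
n = 5
M1, M2 = rng.normal(size=(n, n)), rng.normal(size=(n, n))
A1, A2 = (M1 + M1.T) / 2, (M2 + M2.T) / 2          # arbitrary symmetric matrices

print(lemma1_moment(A1, A2), monte_carlo_moment(A1, A2))   # part 1.2
print(lemma1_moment(A1, A1), monte_carlo_moment(A1, A1))   # part 1.1 (A2 = A1)
```

A quick exact check: with $A_1 = A_2 = I$ the ratio is identically 1, and the closed form gives $(\tilde n^2 + 2\tilde n)/(\tilde n^2 + 2\tilde n) = 1$.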

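Proof. Define $\Delta(z) = (1+2z)^{-1/2}I$ and $R(z) = \Delta(z)A\Delta(z) = (1+2z)^{-1}A$, where $A$ can be either $A_1$ or $A_2$. Then it follows directly from Theorem 6 of Magnus (1986) that

$$E\left\{\left(\frac{x'Ax}{x'x}\right)^2\right\} = \sum_{n\in U}\gamma_2(n)\int_0^\infty z\,|\Delta(z)|\prod_{j=1}^2\left[\operatorname{tr}\{R(z)^j\}\right]^{n_j}dz,$$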

where $U = \{(n_1,n_2)': \sum_{j=1}^2 j\,n_j = 2,\ n_j \text{ a nonnegative integer}\} = \{(0,1)', (2,0)'\}$ and

$$\gamma_2(n) = 2!\,2^2\prod_{j=1}^2\left\{n_j!\,(2j)^{n_j}\right\}^{-1}.$$
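The index set $U$ and the coefficients $\gamma_2(n)$ can be enumerated mechanically. This small plain-Python sketch uses hypothetical function names and simply evaluates the displayed formula for a general order $s$, here with $s = 2$.

```python
from itertools import product
from math import factorial, prod

def partitions_U(s):
    """All (n_1, ..., n_s) of nonnegative integers with sum_j j * n_j = s."""
    ranges = [range(s // j + 1) for j in range(1, s + 1)]
    return [n for n in product(*ranges)
            if sum(j * nj for j, nj in enumerate(n, start=1)) == s]

def gamma_coef(s, n):
    """gamma_s(n) = s! 2^s prod_j {n_j! (2j)^{n_j}}^{-1}."""
    denom = prod(factorial(nj) * (2 * j) ** nj for j, nj in enumerate(n, start=1))
    return factorial(s) * 2**s / denom

for n in partitions_U(2):
    print(n, gamma_coef(2, n))
# (0, 1) 2.0
# (2, 0) 1.0
```

The output reproduces the two values used in the next step of the proof.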

As γ2{(0, 1)′}=2 and γ2{(2, 0)′}=1, we have

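$$\begin{aligned}
E\left\{\left(\frac{x'Ax}{x'x}\right)^2\right\}
&= 2\int_0^\infty z\,|\Delta(z)|\left[\operatorname{tr}\{R(z)\}\right]^0\left[\operatorname{tr}\{R(z)^2\}\right]dz + \int_0^\infty z\,|\Delta(z)|\left[\operatorname{tr}\{R(z)\}\right]^2\left[\operatorname{tr}\{R(z)^2\}\right]^0dz\\
&= \int_0^\infty z\,|\Delta(z)|\left(2\operatorname{tr}\{R(z)^2\} + \left[\operatorname{tr}\{R(z)\}\right]^2\right)dz.
\end{aligned}$$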

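Because $|\Delta(z)| = (1+2z)^{-\tilde n/2}$, $[\operatorname{tr}\{R(z)\}]^2 = (1+2z)^{-2}\{\operatorname{tr}(A)\}^2$, and $\operatorname{tr}\{R(z)^2\} = (1+2z)^{-2}\operatorname{tr}(A^2)$, we have

$$\begin{aligned}
E\left\{\left(\frac{x'Ax}{x'x}\right)^2\right\}
&= \left[\{\operatorname{tr}(A)\}^2 + 2\operatorname{tr}(A^2)\right]\int_0^\infty z\,(1+2z)^{-\tilde n/2-2}\,dz\\
&= \left[\{\operatorname{tr}(A)\}^2 + 2\operatorname{tr}(A^2)\right]\left[-\frac{z(1+2z)^{-\tilde n/2-1}}{2(\tilde n/2+1)} - \frac{(1+2z)^{-\tilde n/2}}{2^2(\tilde n/2+1)(\tilde n/2)}\right]_{z=0}^{\infty}\\
&= \frac{\{\operatorname{tr}(A)\}^2 + 2\operatorname{tr}(A^2)}{\tilde n^2 + 2\tilde n}.
\end{aligned}$$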
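The only nontrivial scalar in this step is $\int_0^\infty z(1+2z)^{-\tilde n/2-2}\,dz = 1/(\tilde n^2 + 2\tilde n)$. A quick numerical check, assuming SciPy's `scipy.integrate.quad` (which accepts an infinite upper limit); `weight_integral` is a hypothetical helper name:

```python
import numpy as np
from scipy.integrate import quad

def weight_integral(n_tilde):
    """Numerically evaluate int_0^inf z (1 + 2z)^(-n_tilde/2 - 2) dz."""
    value, _abserr = quad(lambda z: z * (1.0 + 2.0 * z) ** (-n_tilde / 2 - 2), 0, np.inf)
    return value

for n_tilde in (3, 10, 50):
    # numerical value versus the closed form 1 / (n^2 + 2n)
    print(n_tilde, weight_integral(n_tilde), 1 / (n_tilde**2 + 2 * n_tilde))
```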

This completes the proof of part 1.1) of the lemma.

Next, notice that, by applying 1.1),

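$$E\left\{\left(\frac{x'A_1x}{x'x} \pm \frac{x'A_2x}{x'x}\right)^2\right\} = E\left[\left\{\frac{x'(A_1\pm A_2)x}{x'x}\right\}^2\right] = \frac{\{\operatorname{tr}(A_1\pm A_2)\}^2 + 2\operatorname{tr}\{(A_1\pm A_2)^2\}}{\tilde n^2 + 2\tilde n}.$$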

Also, for any $a, b\in\mathbb{R}$, $4ab = (a+b)^2 - (a-b)^2$. Thus,

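$$\begin{aligned}
E\left\{\left(\frac{x'A_1x}{x'x}\right)\left(\frac{x'A_2x}{x'x}\right)\right\}
&= \frac14\left(E\left[\left\{\frac{x'(A_1+A_2)x}{x'x}\right\}^2\right] - E\left[\left\{\frac{x'(A_1-A_2)x}{x'x}\right\}^2\right]\right)\\
&= \frac{1}{4(\tilde n^2+2\tilde n)}\left[\{\operatorname{tr}(A_1+A_2)\}^2 + 2\operatorname{tr}\{(A_1+A_2)^2\} - \{\operatorname{tr}(A_1-A_2)\}^2 - 2\operatorname{tr}\{(A_1-A_2)^2\}\right]\\
&= \frac{1}{4(\tilde n^2+2\tilde n)}\left[\{\operatorname{tr}(A_1+A_2)\}^2 - \{\operatorname{tr}(A_1-A_2)\}^2 + 2\operatorname{tr}\{(A_1+A_2)^2 - (A_1-A_2)^2\}\right]\\
&= \frac{1}{4(\tilde n^2+2\tilde n)}\left\{\operatorname{tr}(2A_1)\operatorname{tr}(2A_2) + 2\operatorname{tr}(2A_1A_2 + 2A_2A_1)\right\}\\
&= \frac{1}{\tilde n^2+2\tilde n}\left\{\operatorname{tr}(A_1)\operatorname{tr}(A_2) + 2\operatorname{tr}(A_1A_2)\right\}.
\end{aligned}$$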

□

Proof of Theorem 1. For $i, j = 1, 2, \ldots, M$,

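$$B_i + B_j = \frac{\tilde y'(P_{\tilde X_i} + P_{\tilde X_j})\tilde y}{\tilde y'\tilde y} = \frac{(\tilde y/\sigma_e)'(P_{\tilde X_i} + P_{\tilde X_j})(\tilde y/\sigma_e)}{(\tilde y/\sigma_e)'(\tilde y/\sigma_e)},$$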

which is a ratio of quadratic forms in normal random variables. In addition, when the null hypothesis holds, Bi+Bj has the same distribution as

$$\frac{z'(P_{\tilde X_i} + P_{\tilde X_j})z}{z'z},$$

where $z$ is a vector of independent standard normal random variables. Similarly, $B_i - B_j$ has the same distribution as the random variable

$$\frac{z'(P_{\tilde X_i} - P_{\tilde X_j})z}{z'z}.$$

By applying Lemma 1, we have

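$$E(B_iB_j) = \frac{\operatorname{tr}(P_{\tilde X_i})\operatorname{tr}(P_{\tilde X_j}) + 2\operatorname{tr}(P_{\tilde X_i}P_{\tilde X_j})}{\tilde n^2 + 2\tilde n} = \frac{\nu_i\nu_j + 2\operatorname{tr}(P_{\tilde X_i}P_{\tilde X_j})}{\tilde n^2 + 2\tilde n}.$$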

Therefore,

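$$\rho_{ij} = \frac{E(B_iB_j) - \mu_i\mu_j}{\tau_i\tau_j} = \frac{\tilde n}{\sqrt{\nu_i\nu_jr_ir_j}}\left\{\operatorname{tr}(P_{\tilde X_i}P_{\tilde X_j}) - \frac{\nu_i\nu_j}{\tilde n}\right\}.$$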

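Further, if we perform a QR-decomposition $\tilde X_m = \tilde Q_m\tilde R_m$, where $\tilde Q_m$ is $\tilde n\times\nu_m$ with $\tilde Q_m'\tilde Q_m = I$ and $\tilde R_m$ is a $\nu_m\times\nu_m$ upper triangular matrix, then $P_{\tilde X_m} = \tilde Q_m\tilde Q_m'$ and $\operatorname{tr}(P_{\tilde X_i}P_{\tilde X_j}) = \|\tilde Q_i'\tilde Q_j\|_F^2$, where $\|\cdot\|_F$ denotes the Frobenius norm. Because this norm is nonnegative, the lower bound of $\rho_{ij}$ is attained when the norm is zero, i.e., when the column spaces of $\tilde X_i$ and $\tilde X_j$ are orthogonal. It follows that $\rho_{ij}\ge -\sqrt{\nu_i\nu_j/(r_ir_j)}$.     □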
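The correlation formula and its lower bound can be illustrated numerically. This sketch assumes $r_m = \tilde n - \nu_m$, which is consistent with a Beta$(\nu_m/2, r_m/2)$ null distribution for $B_m$ but is flagged as an assumption in the code because $r_m$ is defined in the main text, not in this appendix. The helper names are hypothetical. It computes $\rho_{ij}$ through $\|\tilde Q_i'\tilde Q_j\|_F^2$ and compares it with the sample correlation of simulated null statistics $B_m = z'P_{\tilde X_m}z/z'z$.

```python
import numpy as np

def orth_basis(X):
    """Q-tilde from a thin QR decomposition X = Q R (X assumed full column rank)."""
    Q, _ = np.linalg.qr(X)
    return Q

def rho_exact(Xi, Xj):
    """rho_ij from Theorem 1.

    ASSUMPTION (hypothetical here): r_m = n_tilde - nu_m.
    """
    n_tilde = Xi.shape[0]
    nu_i, nu_j = Xi.shape[1], Xj.shape[1]
    r_i, r_j = n_tilde - nu_i, n_tilde - nu_j
    # tr(P_i P_j) computed as the squared Frobenius norm of Q_i' Q_j
    tr_PiPj = np.linalg.norm(orth_basis(Xi).T @ orth_basis(Xj), "fro") ** 2
    return n_tilde / np.sqrt(nu_i * nu_j * r_i * r_j) * (tr_PiPj - nu_i * nu_j / n_tilde)

def rho_monte_carlo(Xi, Xj, reps=200_000, seed=0):
    """Sample correlation of B_m = z'P_m z / z'z with z ~ N(0, I) (null case)."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal((reps, Xi.shape[0]))
    zz = np.sum(z**2, axis=1)
    B_i = np.sum((z @ orth_basis(Xi)) ** 2, axis=1) / zz   # ||Q_i'z||^2 / z'z
    B_j = np.sum((z @ orth_basis(Xj)) ** 2, axis=1) / zz
    return np.corrcoef(B_i, B_j)[0, 1]

rng = np.random.default_rng(7)
n_tilde = 8
X_nested_i = rng.normal(size=(n_tilde, 1))                             # nu_i = 1
X_nested_j = np.column_stack([X_nested_i, rng.normal(size=n_tilde)])   # nu_j = 2, contains col(X_i)
E = np.eye(n_tilde)
X_orth_i, X_orth_j = E[:, :1], E[:, 1:2]                               # orthogonal columns

print(rho_exact(X_nested_i, X_nested_j), rho_monte_carlo(X_nested_i, X_nested_j))
print(rho_exact(X_orth_i, X_orth_j), -np.sqrt(1 / (7 * 7)))           # lower bound -1/7
```

In the nested case $\operatorname{tr}(P_{\tilde X_i}P_{\tilde X_j}) = \nu_i = 1$, so under the stated assumption $\rho_{ij} = 6/\sqrt{84} \approx 0.655$. With orthogonal columns the trace is zero and the bound $-\sqrt{\nu_i\nu_j/(r_ir_j)} = -1/7$ is reached.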

Computational details of $\hat\mu_{ij}(\varrho_{ij})$

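To facilitate the numerical integration involved in $\hat\mu_{ij}(\varrho_{ij})$, we perform the following change of variables. First, let $\tilde z_i = z_i$ and $\tilde z_j = (z_j - \varrho_{ij}z_i)/\sqrt{1-\varrho_{ij}^2}$. If $z_i$ and $z_j$ follow the bivariate standard normal distribution with correlation $\varrho_{ij}$, then $\tilde z_i$ and $\tilde z_j$ are independent standard normal random variables: they are jointly normal, each has unit variance, and $\operatorname{cov}(\tilde z_i, \tilde z_j) = (\varrho_{ij} - \varrho_{ij})/\sqrt{1-\varrho_{ij}^2} = 0$. Therefore,

$$\hat\mu_{ij}(\varrho_{ij}) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} h(\tilde z_i, \tilde z_j; \varrho_{ij})\,d\tilde z_i\,d\tilde z_j,$$

where

$$h(\tilde z_i, \tilde z_j; \varrho_{ij}) = \Xi_i^{-1}\{\Phi(\tilde z_i)\}\,\Xi_j^{-1}\left\{\Phi\left(\varrho_{ij}\tilde z_i + \sqrt{1-\varrho_{ij}^2}\,\tilde z_j\right)\right\}\phi(\tilde z_i)\,\phi(\tilde z_j).$$

Further, to avoid integrating over an infinite region, we let $\bar z_i = \arctan(\tilde z_i)$ and $\bar z_j = \arctan(\tilde z_j)$, so that $\hat\mu_{ij}(\varrho_{ij})$ is equal to

$$\int_{-\pi/2}^{\pi/2}\int_{-\pi/2}^{\pi/2}\frac{h\{\tan(\bar z_i), \tan(\bar z_j); \varrho_{ij}\}}{\cos^2(\bar z_i)\cos^2(\bar z_j)}\,d\bar z_i\,d\bar z_j.$$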
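The first change of variables in this appendix can be checked by simulation. The NumPy sketch below is only an illustration (the value $\varrho_{ij} = 0.6$ and all variable names are arbitrary). It draws correlated pairs, applies the transformation, and confirms that the transformed pair has approximately identity covariance and that the inverse map $z_j = \varrho_{ij}\tilde z_i + \sqrt{1-\varrho_{ij}^2}\,\tilde z_j$, which appears inside $\Phi$ in the integrand, recovers the original draws.

```python
import numpy as np

rng = np.random.default_rng(3)
varrho = 0.6
cov = np.array([[1.0, varrho], [varrho, 1.0]])
z = rng.multivariate_normal(np.zeros(2), cov, size=100_000)
z_i, z_j = z[:, 0], z[:, 1]

# Change of variables: z~_i = z_i, z~_j = (z_j - varrho z_i) / sqrt(1 - varrho^2)
zt_i = z_i
zt_j = (z_j - varrho * z_i) / np.sqrt(1.0 - varrho**2)

print(np.round(np.cov(zt_i, zt_j), 2))   # close to the 2x2 identity

# Inverse map: z_j = varrho z~_i + sqrt(1 - varrho^2) z~_j
z_j_back = varrho * zt_i + np.sqrt(1.0 - varrho**2) * zt_j
```

Zero correlation implies independence here only because the transformed pair is jointly normal, which is the point made in the text.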

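Alternatively, the integration can be performed by letting $\bar b_i = \Phi(\tilde z_i)$ and $\bar b_j = \Phi(\tilde z_j)$. Then $\bar b_i$ and $\bar b_j$ are independent and uniformly distributed on $(0,1)$, and $\hat\mu_{ij}(\varrho_{ij})$ is given by

$$\int_0^1\int_0^1 \Xi_i^{-1}(\bar b_i)\,\Xi_j^{-1}\left[\Phi\left\{\varrho_{ij}\Phi^{-1}(\bar b_i) + \sqrt{1-\varrho_{ij}^2}\,\Phi^{-1}(\bar b_j)\right\}\right]d\bar b_i\,d\bar b_j.$$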

Either of the two formulas above is in a form that can be passed directly to common numerical integration routines, e.g., the adaptIntegrate function in the cubature package of R.
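As a Python sketch of the arctan form, SciPy's `scipy.integrate.dblquad` stands in for the R routine mentioned above. The marginal quantile functions are an assumption for illustration only: $\Xi_m$ is taken to be the Beta$(\nu_m/2, (\tilde n-\nu_m)/2)$ distribution function, evaluated with `scipy.special.betaincinv`, and $\Phi$ is evaluated with `scipy.special.ndtr`. The helper `mu_hat` is a hypothetical name. At $\varrho_{ij} = 0$ the integral should factor into the product of the two marginal means.

```python
import numpy as np
from scipy.integrate import dblquad
from scipy.special import betaincinv, ndtr

def std_normal_pdf(t):
    return np.exp(-0.5 * t * t) / np.sqrt(2.0 * np.pi)

def mu_hat(varrho, q_i, q_j):
    """Evaluate mu_hat_ij(varrho) with the arctan change of variables.

    q_i, q_j: quantile functions standing in for Xi_i^{-1} and Xi_j^{-1}.
    """
    s = np.sqrt(1.0 - varrho**2)

    def integrand(zbar_j, zbar_i):          # dblquad calls func(inner, outer)
        ti, tj = np.tan(zbar_i), np.tan(zbar_j)
        h = (q_i(ndtr(ti)) * q_j(ndtr(varrho * ti + s * tj))
             * std_normal_pdf(ti) * std_normal_pdf(tj))
        # Jacobian of z = tan(zbar) is 1 / cos^2(zbar)
        return h / (np.cos(zbar_i) ** 2 * np.cos(zbar_j) ** 2)

    half = np.pi / 2
    value, _abserr = dblquad(integrand, -half, half, -half, half)
    return value

# Hypothetical marginals: Beta(nu/2, (n_tilde - nu)/2) quantiles with n_tilde = 20.
n_tilde, nu_i, nu_j = 20, 1, 2
q_i = lambda u: betaincinv(nu_i / 2, (n_tilde - nu_i) / 2, u)
q_j = lambda u: betaincinv(nu_j / 2, (n_tilde - nu_j) / 2, u)

mu0 = mu_hat(0.0, q_i, q_j)   # should be close to (nu_i/n_tilde) * (nu_j/n_tilde) = 0.005
mu5 = mu_hat(0.5, q_i, q_j)
print(mu0, mu5)
```

Mapping to the bounded square keeps the quadrature on a finite domain. The uniform-variable form could be coded the same way, with `scipy.special.ndtri` providing $\Phi^{-1}$.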

References

Freidlin, B., G. Zheng, Z. Li and J. L. Gastwirth (2002): “Trend tests for case-control studies of genetic markers: power, sample size and robustness,” Hum. Hered., 53, 146–152.

Fritsch, F. N. and R. E. Carlson (1980): “Monotone piecewise cubic interpolation,” SIAM J. Numer. Anal., 17, 238–246.

Genz, A. (1992): “Numerical computation of multivariate normal probabilities,” J. Comput. Graph. Stat., 1, 141–149.

Genz, A. (1993): “Comparison of methods for the computation of multivariate normal probabilities,” Comput. Sci. Stat., 55, 400–405.

Genz, A. and K.-S. Kwong (2000): “Numerical evaluation of singular multivariate normal distributions,” J. Stat. Comput. Sim., 68, 1–21.

González, J. R., J. L. Carrasco, F. Dudbridge, L. Armengol, X. Estivill and V. Moreno (2008): “Maximizing association statistics over genetic models,” Genet. Epidemiol., 32, 246–254.

Han, B., H. M. Kang and E. Eskin (2009): “Rapid and Accurate Multiple Testing Correction and Power Estimation for Millions of Correlated Markers,” PLoS Genet., 5, e1000456.

Higham, N. J. (2002): “Computing the nearest correlation matrix – a problem from finance,” IMA J. Numer. Anal., 22, 329–343.

Joo, J., M. Kwak, K. Ahn and G. Zheng (2009): “A robust genome-wide scan statistic of the Wellcome Trust Case-Control Consortium,” Biometrics, 65, 1115–1122. doi:10.1111/j.1541-0420.2009.01185.x

Joo, J., M. Kwak, Z. Chen and G. Zheng (2010a): “Efficiency robust statistics for genetic linkage and association studies under genetic model uncertainty,” Stat. Med., 29, 158–180. doi:10.1002/sim.3759

Joo, J., M. Kwak and G. Zheng (2010b): “Improving power for testing genetic association in case-control studies by reducing the alternative space,” Biometrics, 66, 266–276. doi:10.1111/j.1541-0420.2009.01241.x

Kwak, M., J. Joo and G. Zheng (2009): “A robust test for two-stage design in genome-wide association studies,” Biometrics, 65, 1288–1295. doi:10.1111/j.1541-0420.2008.01187.x

Lehmann, E. L. and J. P. Romano (2005): Testing statistical hypotheses, Springer texts in statistics, 3rd edition. New York: Springer.

Lettre, G., C. Lange and J. N. Hirschhorn (2007): “Genetic model testing and statistical power in population-based association studies of quantitative traits,” Genet. Epidemiol., 31, 358–362.

Li, Q., K. Yu, Z. Li and G. Zheng (2008): “MAX-rank: a simple and robust genome-wide scan for case-control association studies,” Hum. Genet., 123, 617–623.

Li, Q., G. Zheng, X. Liang and K. Yu (2009): “Robust tests for single-marker analysis in case-control genetic association studies,” Ann. Hum. Genet., 73, 245–252.

Magnus, J. R. (1986): “The exact moments of a ratio of quadratic forms in normal variables,” Annals of Economics and Statistics/Annales d’Économie et de Statistique, 4, 95–109. doi:10.2307/20075629

Patterson, H. D. and R. Thompson (1971): “Recovery of inter-block information when block sizes are unequal,” Biometrika, 58, 545–554. doi:10.1093/biomet/58.3.545

Price, A. L., N. J. Patterson, R. M. Plenge, M. E. Weinblatt, N. A. Shadick and D. Reich (2006): “Principal components analysis corrects for stratification in genome-wide association studies,” Nat. Genet., 38, 904–909.

Qu, L., D. Nettleton, J. C. Dekkers and N. Bacciu (2010): “Variance model selection with application to joint analysis of microarray datasets from multiple studies under false discovery rate control,” Stat. Interface, 3, 477–491.

Qu, L., T. Guennel and S. L. Marshal (2013): “Linear score tests for variance components in linear mixed models and applications to genetic association studies,” Biometrics, 69, 883–892. doi:10.1111/biom.12095

Sladek, R., G. Rocheleau, J. Rung, C. Dina, L. Shen, D. Serre, P. Boutin, D. Vincent, A. Belisle, S. Hadjadj, B. Balkau, B. Heude, G. Charpentier, T. J. Hudson, A. Montpetit, A. V. Pshezhetsky, M. Prentki, B. I. Posner, D. J. Balding, D. Meyre, C. Polychronakos and P. Froguel (2007): “A genome-wide association study identifies novel risk loci for type 2 diabetes,” Nature, 445, 881–885. doi:10.1038/nature05616

So, H.-C. and P. Sham (2011): “Robust association tests under different genetic models, allowing for binary or quantitative traits and covariates,” Behav. Genet., 41, 768–775.

The Wellcome Trust Case Control Consortium (2007): “Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls,” Nature, 447, 661–678. doi:10.1038/nature05911

Valdar, W., L. C. Solberg, D. Gauguier, S. Burnett, P. Klenerman, W. O. Cookson, M. S. Taylor, J. N. P. Rawlins, R. Mott and J. Flint (2006): “Genome-wide genetic association of complex traits in heterogeneous stock mice,” Nat. Genet., 38, 879–887.

Wang, K. and V. C. Sheffield (2005): “A constrained-likelihood approach to marker-trait association studies,” Am. J. Hum. Genet., 77, 768–780.

Zang, Y., W. K. Fung and G. Zheng (2010): “Simple algorithms to calculate asymptotic null distributions of robust tests in case-control genetic association studies in R,” J. Stat. Software, 33, 1–24.

Zaykin, D. V., L. A. Zhivotovsky, P. H. Westfall and B. S. Weir (2002): “Truncated product method for combining p-values,” Genet. Epidemiol., 22, 170–185.

Zheng, G. and H. K. T. Ng (2008): “Genetic model selection in two-phase analysis for case-control association studies,” Biostatistics, 9, 391–399. doi:10.1093/biostatistics/kxm039

Zheng, G., J. Joo, X. Tian, C. O. Wu, J.-P. Lin, M. Stylianou, M. A. Waclawiw and N. L. Geller (2009a): “Robust genome-wide scans with genetic model selection using case-control design,” Stat. Interface, 2, 145–151. doi:10.4310/SII.2009.v2.n2.a4

Zheng, G., J. Joo, D. Zaykin, C. Wu and N. Geller (2009b): “Robust tests in genome-wide scans under incomplete linkage disequilibrium,” Stat. Sci., 24, 503–516. doi:10.1214/09-STS314

Published Online: 2014-3-4
Published in Print: 2014-4-1

©2014 by Walter de Gruyter Berlin/Boston

Downloaded on 26.9.2025 from https://www.degruyterbrill.com/document/doi/10.1515/sagmb-2013-0001/html