Abstract
In association mapping of quantitative traits, the F-test based on an assumed genetic model is a basic statistical tool for testing association of each candidate locus with the trait of interest. However, the true underlying genetic model is often unknown, and using an incorrect model may cause serious loss of power. For case-control studies, it is known that the combination of several tests that are optimal for different models is robust to model misspecification. In this paper, we extend the test combination approach to quantitative trait association. We first derive the exact correlations among transformed test statistics and discuss interesting special cases. We then propose and evaluate a multivariate normality based approximation to the joint distribution of test statistics, such that the marginal distributions and pairwise correlations among test statistics are accounted for. Through simulations, we show that the sizes of the resulting approximate combined tests are accurate for practical purposes under a variety of situations. We find that the combination of the tests from the additive model and the genotypic model performs well, because it demonstrates both robustness to incorrect models and satisfactory power. A mouse lipoprotein data set is used to demonstrate the method.
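The two single-model tests that the abstract favours can be sketched in a few lines. The Python sketch below illustrates them under stated assumptions and is not the paper's procedure. Genotypes are assumed to be coded as 0/1/2 copies of one allele, with all three genotypes observed. Each F statistic's p-value is mapped to a normal score $\Phi^{-1}(1-p)$. The maximum score is then referred to a bivariate normal whose correlation `rho` is a user-supplied placeholder; the paper instead derives this correlation. All function names are hypothetical.

```python
import numpy as np
from scipy import stats


def nested_f_test(y, X_alt):
    """OLS F-test of y ~ 1 + X_alt against the intercept-only model."""
    n = len(y)
    X = np.column_stack([np.ones(n), X_alt])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss_alt = np.sum((y - X @ beta) ** 2)
    rss_null = np.sum((y - y.mean()) ** 2)
    df1, df2 = X.shape[1] - 1, n - X.shape[1]
    f = ((rss_null - rss_alt) / df1) / (rss_alt / df2)
    return f, stats.f.sf(f, df1, df2)


def additive_and_genotypic_tests(y, g):
    """Additive (1-df) and genotypic (2-df) F-tests; g holds codes 0, 1, 2."""
    additive = nested_f_test(y, g.astype(float))
    # Genotypic model: separate indicators for genotypes 1 and 2 (0 is the baseline).
    dummies = np.column_stack([(g == 1).astype(float), (g == 2).astype(float)])
    genotypic = nested_f_test(y, dummies)
    return additive, genotypic


def max_combined_pvalue(p_values, rho):
    """P(max(Z1, Z2) >= observed max score) for (Z1, Z2) bivariate normal with corr rho.

    rho is a placeholder supplied by the caller, not a derived correlation.
    """
    scores = stats.norm.isf(np.asarray(p_values, dtype=float))  # Phi^{-1}(1 - p)
    m = scores.max()
    cov = np.array([[1.0, rho], [rho, 1.0]])
    return 1.0 - stats.multivariate_normal(mean=[0.0, 0.0], cov=cov).cdf([m, m])


# Simulated example data (hypothetical): one locus, additive effect 0.2.
rng = np.random.default_rng(1)
g = rng.binomial(2, 0.3, size=500)
y = 0.2 * g + rng.standard_normal(500)
(f_add, p_add), (f_gen, p_gen) = additive_and_genotypic_tests(y, g)
p_comb = max_combined_pvalue([p_add, p_gen], rho=0.8)
```

The combined p-value is one minus a numerically integrated bivariate normal CDF. With SciPy's default tolerances this is accurate only to roughly 1e-5, so it should not be trusted for very small p-values. A quick sanity check is that it lies between the smaller single-test p-value and twice that value.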
Appendices
An example of QR-decomposition
As a simple example, we consider the QR-decomposition of
with rank 2 and $n>2$. Let $r$ denote the residual vector from regressing $x$ on $1_n$, i.e.,
The corresponding matrix $R_0$ is then the upper Cholesky root of
i.e.,
It is straightforward to verify that $Q_{00}R_0=X_0$ and that $Q_{00}'Q_{00}$ is the identity matrix.
The columns of $Q_{01}$ can be any orthonormal basis of the orthogonal complement of the column space of $Q_{00}$. This choice is not unique: any valid $Q_{01}$ can be right-multiplied by an arbitrary $(n-2)\times(n-2)$ orthogonal matrix $P$, and the resulting $Q_{01}P$ can also be used to compute residual contrasts.
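The displayed form of $X_0$ is missing from this copy. The surrounding text (regressing $x$ on $1_n$, a Cholesky root of $X_0'X_0$, and an $(n-2)$-column $Q_{01}$) suggests $X_0=[1_n, x]$, and the NumPy sketch below assumes exactly that. Under this assumption it builds $R_0$ as the upper Cholesky root, forms $Q_{00}=X_0R_0^{-1}$, and takes one valid $Q_{01}$ from a complete QR factorization. It also shows that $Q_{01}P$ remains valid for a random orthogonal $P$. The function name is hypothetical.

```python
import numpy as np


def qr_example(x):
    """QR pieces for X0 = [1_n, x] (assumed design); x must not be constant."""
    x = np.asarray(x, dtype=float)
    n = x.size
    X0 = np.column_stack([np.ones(n), x])
    r = x - x.mean()                         # residual from regressing x on 1_n
    # numpy returns the lower Cholesky factor L of X0'X0, so the upper root is L'.
    R0 = np.linalg.cholesky(X0.T @ X0).T     # [[sqrt(n), sqrt(n)*mean(x)], [0, ||r||]]
    Q00 = X0 @ np.linalg.inv(R0)             # columns 1_n/sqrt(n) and r/||r||
    # One valid Q01: the last n-2 columns of a complete QR factorization of X0.
    Q_full, _ = np.linalg.qr(X0, mode="complete")
    Q01 = Q_full[:, 2:]
    return X0, r, R0, Q00, Q01


x = np.array([0.0, 1.0, 2.0, 1.0, 2.0, 0.0, 1.0])
X0, r, R0, Q00, Q01 = qr_example(x)
# Any (n-2) x (n-2) orthogonal P gives another valid choice Q01 @ P.
P, _ = np.linalg.qr(np.random.default_rng(0).standard_normal((x.size - 2, x.size - 2)))
Q01_alt = Q01 @ P
```

`Q01` here comes from LAPACK's Householder QR, so another library may return a different matrix. By the non-uniqueness argument above, any such choice serves equally well.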
Proof of Theorem 1
First, we derive the formula for the second moment of the ratio of quadratic forms in normal random variables, which is a consequence of Theorem 6 of Magnus (1986).
Lemma 1. Let $x$ be an
1.1) For $i=1$ or $2$,
1.2)
Proof. Define $\Delta(z)=(1+2z)^{-1/2}I$ and $R(z)=\Delta(z)A\Delta(z)=(1+2z)^{-1}A$, where $A$ is either $A_1$ or $A_2$. It then follows directly from Theorem 6 of Magnus (1986) that
where
As $\gamma_2\{(0,1)'\}=2$ and $\gamma_2\{(2,0)'\}=1$, we have
Because
This completes the proof of part 1.1) of the lemma.
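The choice $\Delta(z)=(1+2z)^{-1/2}I$ corresponds to a denominator $x'x$. The lemma's full hypotheses are cut off in this copy, so the sketch assumes $x\sim N(0,I_n)$ and $B_i=x'A_ix/x'x$ with symmetric $A_i$. Under that assumption, the second moment is $\int_0^\infty z\,|\Delta(z)|\,\{(\operatorname{tr}R)^2+2\operatorname{tr}(R^2)\}\,dz$, using the weights $\gamma_2\{(2,0)'\}=1$ and $\gamma_2\{(0,1)'\}=2$. The Python sketch checks this integral three ways:

- it evaluates the integral numerically;
- it compares the result with the closed form $\{(\operatorname{tr}A)^2+2\operatorname{tr}(A^2)\}/\{n(n+2)\}$ to which the integral reduces;
- it compares both with Monte Carlo.

It also obtains the cross moment from the identity $4ab=(a+b)^2-(a-b)^2$ that part 1.2) relies on. Function names are hypothetical.

```python
import numpy as np
from scipy import integrate


def second_moment_integral(A):
    """E[(x'Ax / x'x)^2] for x ~ N(0, I_n), by numerical integration over z in [0, inf)."""
    n = A.shape[0]
    weight = np.trace(A) ** 2 + 2.0 * np.trace(A @ A)   # (tr A)^2 + 2 tr(A^2)

    def integrand(z):
        det_delta = (1.0 + 2.0 * z) ** (-n / 2.0)        # |Delta(z)| with Delta(z) = (1+2z)^{-1/2} I
        # (tr R)^2 + 2 tr(R^2) with R(z) = (1+2z)^{-1} A equals (1+2z)^{-2} * weight
        return z * det_delta * (1.0 + 2.0 * z) ** (-2.0) * weight

    value, _ = integrate.quad(integrand, 0.0, np.inf)
    return value   # the 1/Gamma(2) prefactor equals 1


def closed_form(A):
    """Value the integral reduces to: [(tr A)^2 + 2 tr(A^2)] / (n (n + 2))."""
    n = A.shape[0]
    return (np.trace(A) ** 2 + 2.0 * np.trace(A @ A)) / (n * (n + 2))


def cross_moment(A1, A2):
    """E[B1 B2] via 4ab = (a+b)^2 - (a-b)^2, since B1 +/- B2 = x'(A1 +/- A2)x / x'x."""
    return (second_moment_integral(A1 + A2) - second_moment_integral(A1 - A2)) / 4.0


def monte_carlo(A1, A2, draws=200_000, seed=0):
    """Simulated E[B1^2] and E[B1 B2] for comparison."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((draws, A1.shape[0]))
    denom = np.einsum("ij,ij->i", x, x)
    b1 = np.einsum("ij,jk,ik->i", x, A1, x) / denom
    b2 = np.einsum("ij,jk,ik->i", x, A2, x) / denom
    return np.mean(b1 ** 2), np.mean(b1 * b2)


A1 = np.diag([1.0, 1.0, 0.0, 0.0, 0.0])
A2 = np.diag([0.0, 1.0, 1.0, 1.0, 0.0])
```

For these two diagonal matrices with $n=5$, both $E(B_1^2)$ and $E(B_1B_2)$ equal $8/35$. This value also follows independently from the Dirichlet$(1/2,\ldots,1/2)$ distribution of $(x_1^2,\ldots,x_n^2)/x'x$.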
Next, notice that, by applying 1.1),
Also, for any $a, b\in\mathbb{R}$, $4ab=(a+b)^2-(a-b)^2$. Thus,
□
Proof of Theorem 1. For $i, j = 1, 2, \ldots, M$,
which is a ratio of quadratic forms in normal random variables. In addition, under the null hypothesis, $B_i+B_j$ has the same distribution as
where $z$ is a vector of independent standard normal random variables. Similarly, $B_i-B_j$ has the same distribution as the random variable
By applying Lemma 1, we have
Therefore,
Further, if we perform a QR-decomposition for
Computational details of $\hat{\mu}_{ij}(\varrho_{ij})$
To facilitate the numerical integration involved in
where
Further, to avoid integrating over an unbounded interval, we let
Alternatively, the integration can be performed by letting
Either formula can be passed directly to standard numerical integration routines, e.g., the adaptIntegrate function in the R package cubature.
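The exact substitutions in the two formulas above are not recoverable from this copy. As a generic illustration of the same idea, the sketch below maps $[0,\infty)$ onto $[0,1)$ with $z=t/(1-t)$, $dz=dt/(1-t)^2$. It then integrates with SciPy's `quad`, used here as a Python stand-in for R's adaptIntegrate, which works over bounded boxes. The test integrand $z(1+2z)^{-(n/2+2)}$ is a hypothetical example, chosen because its integral $1/\{n(n+2)\}$ is known in closed form.

```python
import numpy as np
from scipy import integrate


def integrate_half_line(f):
    """Integral of f over [0, inf) via z = t / (1 - t), dz = dt / (1 - t)^2, t in [0, 1)."""
    def transformed(t):
        if t >= 1.0:
            return 0.0  # assumes f(z) * (1 + z)^2 -> 0 as z -> inf
        return f(t / (1.0 - t)) / (1.0 - t) ** 2
    value, _ = integrate.quad(transformed, 0.0, 1.0)
    return value


n = 5


def example_integrand(z):
    """Hypothetical test integrand with a known integral of 1 / (n (n + 2))."""
    return z * (1.0 + 2.0 * z) ** (-(n / 2.0 + 2.0))


finite_interval = integrate_half_line(example_integrand)
infinite_interval, _ = integrate.quad(example_integrand, 0.0, np.inf)
```

SciPy's `quad` also accepts `np.inf` directly and applies its own internal mapping, so the two results should agree. The explicit substitution matters more for routines that, like adaptIntegrate, only integrate over bounded regions.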
References
Freidlin, B., G. Zheng, Z. Li and J. L. Gastwirth (2002): “Trend tests for case-control studies of genetic markers: power, sample size and robustness,” Hum. Hered., 53, 146–152.
Fritsch, F. N. and R. E. Carlson (1980): “Monotone piecewise cubic interpolation,” SIAM J. Numer. Anal., 17, 238–246.
Genz, A. (1992): “Numerical computation of multivariate normal probabilities,” J. Comput. Graph. Stat., 1, 141–149.
Genz, A. (1993): “Comparison of methods for the computation of multivariate normal probabilities,” Comput. Sci. Stat., 55, 400–405.
Genz, A. and K.-S. Kwong (2000): “Numerical evaluation of singular multivariate normal distributions,” J. Stat. Comput. Sim., 68, 1–21.
González, J. R., J. L. Carrasco, F. Dudbridge, L. Armengol, X. Estivill and V. Moreno (2008): “Maximizing association statistics over genetic models,” Genet. Epidemiol., 32, 246–254.
Han, B., H. M. Kang and E. Eskin (2009): “Rapid and accurate multiple testing correction and power estimation for millions of correlated markers,” PLoS Genet., 5, e1000456.
Higham, N. J. (2002): “Computing the nearest correlation matrix – a problem from finance,” IMA J. Numer. Anal., 22, 329–343.
Joo, J., M. Kwak, K. Ahn and G. Zheng (2009): “A robust genome-wide scan statistic of the Wellcome Trust Case-Control Consortium,” Biometrics, 65, 1115–1122. doi:10.1111/j.1541-0420.2009.01185.x
Joo, J., M. Kwak, Z. Chen and G. Zheng (2010a): “Efficiency robust statistics for genetic linkage and association studies under genetic model uncertainty,” Stat. Med., 29, 158–180. doi:10.1002/sim.3759
Joo, J., M. Kwak and G. Zheng (2010b): “Improving power for testing genetic association in case-control studies by reducing the alternative space,” Biometrics, 66, 266–276. doi:10.1111/j.1541-0420.2009.01241.x
Kwak, M., J. Joo and G. Zheng (2009): “A robust test for two-stage design in genome-wide association studies,” Biometrics, 65, 1288–1295. doi:10.1111/j.1541-0420.2008.01187.x
Lehmann, E. L. and J. P. Romano (2005): Testing statistical hypotheses, 3rd edition, Springer Texts in Statistics. New York: Springer.
Lettre, G., C. Lange and J. N. Hirschhorn (2007): “Genetic model testing and statistical power in population-based association studies of quantitative traits,” Genet. Epidemiol., 31, 358–362.
Li, Q., K. Yu, Z. Li and G. Zheng (2008): “MAX-rank: a simple and robust genome-wide scan for case-control association studies,” Hum. Genet., 123, 617–623.
Li, Q., G. Zheng, X. Liang and K. Yu (2009): “Robust tests for single-marker analysis in case-control genetic association studies,” Ann. Hum. Genet., 73, 245–252.
Magnus, J. R. (1986): “The exact moments of a ratio of quadratic forms in normal variables,” Annals of Economics and Statistics/Annales d’Économie et de Statistique, 4, 95–109. doi:10.2307/20075629
Patterson, H. D. and R. Thompson (1971): “Recovery of inter-block information when block sizes are unequal,” Biometrika, 58, 545–554. doi:10.1093/biomet/58.3.545
Price, A. L., N. J. Patterson, R. M. Plenge, M. E. Weinblatt, N. A. Shadick and D. Reich (2006): “Principal components analysis corrects for stratification in genome-wide association studies,” Nat. Genet., 38, 904–909.
Qu, L., D. Nettleton, J. C. Dekkers and N. Bacciu (2010): “Variance model selection with application to joint analysis of microarray datasets from multiple studies under false discovery rate control,” Stat. Interface, 3, 477–491.
Qu, L., T. Guennel and S. L. Marshal (2013): “Linear score tests for variance components in linear mixed models and applications to genetic association studies,” Biometrics, 69, 883–892. doi:10.1111/biom.12095
Sladek, R., G. Rocheleau, J. Rung, C. Dina, L. Shen, D. Serre, P. Boutin, D. Vincent, A. Belisle, S. Hadjadj, B. Balkau, B. Heude, G. Charpentier, T. J. Hudson, A. Montpetit, A. V. Pshezhetsky, M. Prentki, B. I. Posner, D. J. Balding, D. Meyre, C. Polychronakos and P. Froguel (2007): “A genome-wide association study identifies novel risk loci for type 2 diabetes,” Nature, 445, 881–885. doi:10.1038/nature05616
So, H.-C. and P. Sham (2011): “Robust association tests under different genetic models, allowing for binary or quantitative traits and covariates,” Behav. Genet., 41, 768–775.
The Wellcome Trust Case Control Consortium (2007): “Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls,” Nature, 447, 661–678. doi:10.1038/nature05911
Valdar, W., L. C. Solberg, D. Gauguier, S. Burnett, P. Klenerman, W. O. Cookson, M. S. Taylor, J. N. P. Rawlins, R. Mott and J. Flint (2006): “Genome-wide genetic association of complex traits in heterogeneous stock mice,” Nat. Genet., 38, 879–887.
Wang, K. and V. C. Sheffield (2005): “A constrained-likelihood approach to marker-trait association studies,” Am. J. Hum. Genet., 77, 768–780.
Zang, Y., W. K. Fung and G. Zheng (2010): “Simple algorithms to calculate asymptotic null distributions of robust tests in case-control genetic association studies in R,” J. Stat. Software, 33, 1–24.
Zaykin, D. V., L. A. Zhivotovsky, P. H. Westfall and B. S. Weir (2002): “Truncated product method for combining p-values,” Genet. Epidemiol., 22, 170–185.
Zheng, G. and H. K. T. Ng (2008): “Genetic model selection in two-phase analysis for case-control association studies,” Biostatistics, 9, 391–399. doi:10.1093/biostatistics/kxm039
Zheng, G., J. Joo, X. Tian, C. O. Wu, J.-P. Lin, M. Stylianou, M. A. Waclawiw and N. L. Geller (2009a): “Robust genome-wide scans with genetic model selection using case-control design,” Stat. Interface, 2, 145–151. doi:10.4310/SII.2009.v2.n2.a4
Zheng, G., J. Joo, D. Zaykin, C. Wu and N. Geller (2009b): “Robust tests in genome-wide scans under incomplete linkage disequilibrium,” Stat. Sci., 24, 503–516. doi:10.1214/09-STS314
©2014 by Walter de Gruyter Berlin/Boston