Abstract
In genome-wide association studies (GWAS), logistic regression is one of the most popular analytic methods for binary traits. Multinomial regression extends binary logistic regression to traits with more than two categories. However, many GWAS methods are limited to binary traits. These methods have often been improperly applied to ordinal traits, which leads to inappropriate type I error rates and poor statistical power. Owing to the lack of suitable analysis methods, GWAS of ordinal traits is known to be problematic and is gaining attention. In this paper, we develop a general framework for identifying ordinal traits associated with genetic variants in pedigree-structured samples by collapsing and kernel methods. We use the local odds ratios GEE approach to account for the complicated correlation structures between family members and ordered categorical traits. We adopt a retrospective view that treats the genetic markers as random variables in order to calculate genetic correlations among markers. The proposed genetic association method can accommodate ordinal traits and allows for covariate adjustment. We conduct simulation studies to compare the proposed tests with existing models for analyzing ordered categorical data under various configurations. We illustrate the application of the proposed tests by simultaneously analyzing a family study and a cross-sectional study from the Genetic Analysis Workshop 19 (GAW19) data.
Funding source: Ministry of Science and Technology, Taiwan
Award Identifier / Grant number: MOST 110-2118-M-037-001-MY2
Acknowledgments
We thank the GAW19 data provider for their generosity in sharing their data with us. The Genetic Analysis Workshops are supported by NIH grant R01 GM031575. The GAW19 whole genome sequence data were provided by the T2D-GENES Consortium, which is supported by NIH grants U01 DK085524, U01 DK085584, U01 DK085501, U01 DK085526, and U01 DK085545. The other genetic and phenotypic data for GAW19 were provided by the San Antonio Family Heart Study and San Antonio Family Diabetes/Gallbladder study, which are supported by NIH grants P01 HL045222, R01 DK047482, and R01 DK053889. Andrew R. Wood is supported by European Research Council grant SZ-245 50371-GLUCOSEGENES-FP7-IDEAS-ERC. We are grateful to the editor and the referees for their helpful comments and suggestions in improving the paper.
- Research ethics: Not applicable.
- Author contributions: The author has accepted responsibility for the entire content of this manuscript and approved its submission.
- Competing interests: The author states no conflict of interest.
- Research funding: This work is supported by grant MOST 110-2118-M-037-001-MY2 of the Ministry of Science and Technology, Taiwan, R.O.C.
- Data availability: Not applicable.
Supplementary Material
This article contains supplementary material (https://doi.org/10.1515/ijb-2022-0123).
© 2023 Walter de Gruyter GmbH, Berlin/Boston
Articles in the same Issue
- Frontmatter
- Research Articles
- Random forests for survival data: which methods work best and under what conditions?
- Flexible variable selection in the presence of missing data
- An interpretable cluster-based logistic regression model, with application to the characterization of response to therapy in severe eosinophilic asthma
- MBPCA-OS: an exploratory multiblock method for variables of different measurement levels. Application to study the immune response to SARS-CoV-2 infection and vaccination
- Detecting differentially expressed genes from RNA-seq data using fuzzy clustering
- Hypothesis testing for detecting outlier evaluators
- Response to comments on ‘sensitivity of estimands in clinical trials with imperfect compliance’
- Commentary
- Comments on “sensitivity of estimands in clinical trials with imperfect compliance” by Chen and Heitjan
- Research Articles
- Optimizing personalized treatments for targeted patient populations across multiple domains
- Statistical models for assessing agreement for quantitative data with heterogeneous random raters and replicate measurements
- History-restricted marginal structural model and latent class growth analysis of treatment trajectories for a time-dependent outcome
- Revisiting incidence rates comparison under right censorship
- Ensemble learning methods of inference for spatially stratified infectious disease systems
- The survival function NPMLE for combined right-censored and length-biased right-censored failure time data: properties and applications
- Hybrid classical-Bayesian approach to sample size determination for two-arm superiority clinical trials
- Estimation of a decreasing mean residual life based on ranked set sampling with an application to survival analysis
- Improving the mixed model for repeated measures to robustly increase precision in randomized trials
- Bayesian second-order sensitivity of longitudinal inferences to non-ignorability: an application to antidepressant clinical trial data
- A modified rule of three for the one-sided binomial confidence interval
- Kalman filter with impulse noised outliers: a robust sequential algorithm to filter data with a large number of outliers
- Bayesian estimation and prediction for network meta-analysis with contrast-based approach
- Testing for association between ordinal traits and genetic variants in pedigree-structured samples by collapsing and kernel methods