Abstract
Fernández-Durán (2004) developed a family of univariate circular distributions based on nonnegative trigonometric sums. In this work, we extend this family to the multivariate case by using multiple nonnegative trigonometric sums to model the joint distribution of a vector of angular random variables. Practical examples of such vectors include the wind direction at different monitoring stations, the directions taken by an animal on different occasions, the times at which a person performs different daily activities, and the dihedral angles of a protein molecule. We apply the proposed family of multivariate distributions to three real data sets: two from the study of protein structure and one from genomics. The first consists of a bivariate vector of dihedral angles in proteins. In the second, we compare the fit of the proposed multivariate model with that of the bivariate generalized von Mises model of Shieh et al. (2011) in a problem related to orthologous genes in pairs of circular genomes. The third consists of observed values of three dihedral angles in γ-turns in a protein and serves as an example of trivariate angular data. In addition, we present a simulation algorithm to generate realizations from the proposed multivariate angular distribution.
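As a rough illustration of the construction the abstract describes, the sketch below evaluates a density defined as the squared modulus of a complex trigonometric sum, first for one angle and then for two. It is only a minimal sketch under stated assumptions: the function names, the example coefficients, and the choice to rescale the coefficients inside the function are all hypothetical and are not taken from the paper or from the CircNNTSR package, whose parameterization may differ.

```python
import cmath
import math


def nnts_density(theta, c):
    """Squared modulus of a trigonometric sum, nonnegative by construction.

    c[k] is the complex coefficient of exp(i*k*theta), k = 0..M.
    Assumption: coefficients are rescaled here so that the density
    integrates to 1 on [0, 2*pi).
    """
    norm = sum(abs(ck) ** 2 for ck in c)
    s = sum(ck * cmath.exp(1j * k * theta) for k, ck in enumerate(c))
    return abs(s) ** 2 / (2 * math.pi * norm)


def mnnts_density(thetas, c):
    """Bivariate analogue: c[k1][k2] multiplies exp(i*(k1*theta1 + k2*theta2))."""
    t1, t2 = thetas
    norm = sum(abs(v) ** 2 for row in c for v in row)
    s = sum(
        v * cmath.exp(1j * (k1 * t1 + k2 * t2))
        for k1, row in enumerate(c)
        for k2, v in enumerate(row)
    )
    return abs(s) ** 2 / ((2 * math.pi) ** 2 * norm)


def integrate_circle(f, n=512):
    """Equally spaced Riemann sum of f over [0, 2*pi)."""
    h = 2 * math.pi / n
    return h * sum(f(j * h) for j in range(n))


if __name__ == "__main__":
    c = [1.0, 0.5 + 0.2j, -0.3j]          # illustrative coefficients only
    print(integrate_circle(lambda t: nnts_density(t, c)))
```

Because the cross terms of the sum integrate to zero over a full circle, the integral of the squared modulus equals 2π times the sum of the squared coefficient moduli, which is why dividing by that quantity gives a proper density in this sketch.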
The authors wish to thank the Asociación Mexicana de Cultura A.C. for its support.
References
Absil, P.-A., R. Mahony and R. Sepulchre (2008): Optimization algorithms on matrix manifolds, Princeton University Press, Princeton, New Jersey. doi:10.1515/9781400830244
Boomsma, W., K. Mardia, C. Taylor, J. Ferkinghoff-Borg, A. Krogh and T. Hamelryck (2008): “A generative, probabilistic model of local protein structure,” Proc. Natl. Acad. Sci. USA, 105, 8932–8937. doi:10.1073/pnas.0801715105
Chakrabarti, P. and D. Pal (2001): “The interrelationships of side-chain and main-chain conformations in proteins,” Prog. Biophys. Mol. Bio., 76(1–2), 1–102.
Dayalan, S., N. D. Gooneratne, S. Bevinakoppa and H. Schroder (2006): “Dihedral angle and secondary structure database of short amino acid fragments,” Bioinformation, 1(3), 78–80. doi:10.6026/97320630001078
Devroye, L. (1986): Non-uniform random variate generation, Springer Verlag, New York. doi:10.1007/978-1-4613-8643-8
Fejér, L. (1915): “Über Trigonometrische Polynome,” Journal für die Reine und Angewandte Mathematik, 146, 53–82. doi:10.1515/crll.1916.146.53
Fernández-Durán, J. J. (2004): “Circular distributions based on nonnegative trigonometric sums,” Biometrics, 60, 499–503. doi:10.1111/j.0006-341X.2004.00195.x
Fernández-Durán, J. J. (2007): “Models for circular-linear and circular-circular data constructed from circular distributions based on nonnegative trigonometric sums,” Biometrics, 63, 579–585. doi:10.1111/j.1541-0420.2006.00716.x
Fernández-Durán, J. J. and M. M. Gregorio-Domínguez (2010): “Maximum likelihood estimation of nonnegative trigonometric sums models using a Newton-like algorithm on manifolds,” Electron. J. Stat., 4, 1402–1410. doi:10.1214/10-EJS587
Fernández-Durán, J. J. and M. M. Gregorio-Domínguez (2013): CircNNTSR: An R Package for the Statistical Analysis of Circular Data Using Nonnegative Trigonometric Sums (NNTS) Models. R package version 2.1. http://CRAN.R-project.org/package=CircNNTSR
Fisher, N. I. (1993): Statistical analysis of circular data, Cambridge University Press, Cambridge, UK. doi:10.1017/CBO9780511564345
Guruprasad, K., M. S. Prasad and G. R. Kumar (2000): “Database of structural motifs in proteins,” Bioinformatics, 16(4), 372–375. doi:10.1093/bioinformatics/16.4.372
Hamelryck, T., J. T. Kent and A. Krogh (2006): “Sampling realistic protein conformations using local structural bias,” PLoS Comput. Biol., 2(9), e131. doi:10.1371/journal.pcbi.0020131
Ho, B. K., A. Thomas and R. Brasseur (2003): “Revisiting the Ramachandran plot: hard-sphere repulsion, electrostatics, and H-bonding in the α-helix,” Protein Sci., 12, 2508–2522.
Hommola, K., W. R. Gilks and K. V. Mardia (2011): “Log-linear modelling of protein dipeptide structure reveals interesting patterns of side-chain-backbone interactions,” Stat. Appl. Genet. Mol. Biol., 10(1), Article 8, 1–29.
Hovmöller, S., T. Zhou and T. Ohlson (2002): “Conformations of amino acids in proteins,” Acta Crystallogr. D, D58, 768–776.
Jammalamadaka, S. and Y. Sarma (1988): “A correlation coefficient for angular variables,” in: Matusita, K. (Ed.), Statistical Theory and Data Analysis II, 349–364, North Holland.
Jammalamadaka, S. R. and A. SenGupta (2001): Topics in circular statistics, World Scientific Publishing Co., Singapore. doi:10.1142/4031
Johnson, R. A. and T. Wehrly (1977): “Measures and models for angular correlation and angular-linear correlation,” J. Roy. Stat. Soc. B, 39, 222–229.
Joo, H., A. G. Chavan, R. Day, K. P. Lennox, P. Sukhanov, D. B. Dahl, M. Vannucci and J. Tsai (2011): “Near-native protein loop sampling using nonparametric density estimation accommodating sparcity,” PLoS Comput. Biol., 7(10), e1002234. doi:10.1371/journal.pcbi.1002234
Kessel, A. and N. Ben-Tal (2010): Introduction to proteins: structure, function and motion, CRC Press. doi:10.1201/b10456
Laskowski, R. A., M. W. MacArthur, D. S. Moss and J. M. Thornton (1993): “PROCHECK: a program to check the stereochemical quality of protein structures,” J. Appl. Crystallogr., 26, 283–291.
Lennox, K. P., D. B. Dahl, M. Vannucci and J. W. Tsai (2009): “Density estimation for protein conformation angles using a bivariate von Mises distribution and Bayesian nonparametrics,” J. Am. Stat. Assoc., 104(486), 586–596. doi:10.1198/jasa.2009.0024
Lennox, K. P., D. B. Dahl, M. Vannucci, R. Day and J. W. Tsai (2010): “A Dirichlet process mixture of hidden Markov models for protein structure prediction,” Ann. Appl. Stat., 4, 916–962.
Lovell, S. C., I. W. Davis, W. B. Arendall III, P. I. W. de Bakker, J. M. Word, M. G. Prisant, J. S. Richardson and D. C. Richardson (2003): “Structure validation by Calpha Geometry: Phi, Psi and Cbeta deviation,” Proteins, 50, 437–450. doi:10.1002/prot.10286
Mardia, K. V. (1975): “Statistics of directional data (with Discussion),” J. Roy. Stat. Soc. B, 37(3), 349–393.
Mardia, K. V. (2013): “Statistical approaches to three key challenges in protein structural bioinformatics,” J. Roy. Stat. Soc. C-App, 62(3), 487–514. doi:10.1111/rssc.12003
Mardia, K. V. and P. E. Jupp (2000): Directional statistics, John Wiley and Sons, Chichester, West Sussex, England. doi:10.1002/9780470316979
Mardia, K. V., C. C. Taylor and G. K. Subramaniam (2007): “Protein bioinformatics and mixtures of bivariate von Mises distributions for angular data,” Biometrics, 63, 505–512. doi:10.1111/j.1541-0420.2006.00682.x
Mardia, K. V., G. Hughes, C. C. Taylor and H. Singh (2008): “A multivariate von Mises distribution with applications to bioinformatics,” Can. J. Stat., 36(1), 99–109.
Mathews, C. K., K. E. van Holde and K. G. Ahern (2000): Biochemistry, 3rd ed. Addison-Wesley Longman, Inc., San Francisco, CA, USA.
Morris, A. L., M. W. MacArthur, E. G. Hutchinson and J. M. Thornton (1992): “Stereochemical quality of protein structure coordinates,” Proteins, 12, 345–364. doi:10.1002/prot.340120407
R Development Core Team (2013): R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. URL http://www.R-project.org/
Ramachandran, G. N., C. Ramakrishnan and V. Sasisekharan (1963): “Stereochemistry of polypeptide chain configuration,” J. Mol. Biol., 7, 95–99.
Shieh, G. S. and R. A. Johnson (2005): “Inferences based on a bivariate distribution with von Mises marginals,” Ann. I. Stat. Math., 57, 789–802.
Shieh, G. S., S. Zheng, R. A. Johnson, Y.-F. Chang, K. Shimizu, C.-C. Wang and S.-L. Tang (2011): “Modeling and comparing the organization of circular genomes,” Bioinformatics, 27(7), 912–918. doi:10.1093/bioinformatics/btr049
Singh, H., V. Hnizdo and E. Demchuk (2002): “Probabilistic model for two dependent circular variables,” Biometrika, 89(3), 719–723. doi:10.1093/biomet/89.3.719
Ting, D., G. Wang, M. Shapovalov, R. Mitra, M. I. Jordan and R. L. Dunbrack, Jr. (2010): “Neighbor-dependent Ramachandran probability distributions of amino acids developed from a hierarchical Dirichlet process model,” PLoS Comput. Biol., 6(4), 1–21.
Upton, G. J. G. and B. Fingleton (1989): Spatial data analysis by example Vol. 2 (Categorical and Directional Data), John Wiley and Sons, Chichester, West Sussex, England.
Wehrly, T. and R. A. Johnson (1980): “Bivariate models for dependence of angular observations and a related Markov process,” Biometrika, 67, 255–256. doi:10.1093/biomet/67.1.255
Zhao, F., S. Li, B. W. Sterner and J. Xu (2008): “Discriminative learning for protein conformation sampling,” Proteins, 73(1), 228–240. doi:10.1002/prot.22057
©2014 by Walter de Gruyter Berlin Boston
Articles in the same Issue
- Masthead
- Research Articles
- Modeling angles in proteins and circular genomes using multivariate angular distributions based on multiple nonnegative trigonometric sums
- Second order optimization for the inference of gene regulatory pathways
- Multiple comparisons in genetic association studies: a hierarchical modeling approach
- A Bayesian decision procedure for testing multiple hypotheses in DNA microarray experiments
- Semi-automatic selection of summary statistics for ABC model choice
- Detection of epistatic effects with logic regression and a classical linear regression model
- Bayesian clustering of DNA sequences using Markov chains and a stochastic partition model