Modeling angles in proteins and circular genomes using multivariate angular distributions based on multiple nonnegative trigonometric sums

Juan José Fernández-Durán and María Mercedes Gregorio-Domínguez
Published/Copyright: January 4, 2014

Abstract

Fernández-Durán, J. J. (2004): “Circular distributions based on nonnegative trigonometric sums,” Biometrics, 60, 499–503, developed a family of univariate circular distributions based on nonnegative trigonometric sums. In this work, we extend this family to the multivariate case, using multiple nonnegative trigonometric sums to model the joint distribution of a vector of angular random variables. Practical examples of such vectors include the wind directions at different monitoring stations, the directions taken by an animal on different occasions, the times at which a person performs different daily activities, and the dihedral angles of a protein molecule. We apply the proposed family of multivariate distributions to three real data-sets: two from the study of protein structure and one from genomics. The first concerns a bivariate vector of dihedral angles in proteins. In the second, we compare the fit of the proposed multivariate model with that of the bivariate generalized von Mises model of Shieh, G. S., S. Zheng, R. A. Johnson, Y.-F. Chang, K. Shimizu, C.-C. Wang, and S.-L. Tang (2011): “Modeling and comparing the organization of circular genomes,” Bioinformatics, 27(7), 912–918, in a problem involving orthologous genes in pairs of circular genomes. The third consists of observed values of three dihedral angles in γ-turns in a protein and serves as an example of trivariate angular data. In addition, we present a simulation algorithm for generating realizations from the proposed multivariate angular distribution.
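The abstract describes the construction only in words. The sketch below is a minimal illustration under stated assumptions, not the authors' code. It assumes the squared-modulus form of the univariate NNTS density from Fernández-Durán (2004), f(θ) = |Σ_{k=0}^{M} c_k e^{ikθ}|² with Σ|c_k|² = 1/(2π). It further assumes that the multivariate version replaces the single sum by a sum over multi-indices k = (k_1, …, k_R), with Σ|c_k|² = 1/(2π)^R. The sampler is a generic acceptance-rejection scheme with a uniform proposal (in the spirit of Devroye, 1986) and is not necessarily the simulation algorithm presented in the paper. The names `normalize`, `nnts_density` and `sample_nnts` are hypothetical and are not part of the CircNNTSR API. Any identifiability constraints the paper may impose on the coefficients are not enforced here.

```python
import numpy as np


def normalize(coef):
    """Rescale complex coefficients so that sum(|c|^2) = 1 / (2*pi)**R.

    R = coef.ndim is the number of angles. By orthogonality of the complex
    exponentials, this makes the squared-modulus density integrate to one
    over [0, 2*pi)^R (assumed multivariate NNTS normalization).
    """
    coef = np.asarray(coef, dtype=complex)
    R = coef.ndim
    return coef / np.sqrt(np.sum(np.abs(coef) ** 2) * (2 * np.pi) ** R)


def nnts_density(theta, coef):
    """Assumed multivariate NNTS density |sum_k c_k exp(i k . theta)|^2.

    theta: length-R sequence of angles.
    coef: complex array of shape (M1+1, ..., MR+1) indexed by k = (k1, ..., kR),
          assumed already passed through normalize().
    """
    coef = np.asarray(coef, dtype=complex)
    theta = np.asarray(theta, dtype=float)
    k = np.indices(coef.shape)              # shape (R, M1+1, ..., MR+1)
    phase = np.tensordot(theta, k, axes=1)  # k . theta for every multi-index
    return float(np.abs(np.sum(coef * np.exp(1j * phase))) ** 2)


def sample_nnts(coef, n, rng=None):
    """Generic acceptance-rejection sampler with a uniform proposal.

    For normalized coefficients, Cauchy-Schwarz gives
    f(theta) <= N * sum|c|^2 = N / (2*pi)**R with N = coef.size. The uniform
    proposal density is 1 / (2*pi)**R, so N is a valid envelope constant and
    the expected acceptance rate is 1/N.
    """
    rng = np.random.default_rng(rng)
    coef = np.asarray(coef, dtype=complex)
    R = coef.ndim
    envelope = coef.size
    draws = []
    while len(draws) < n:
        theta = rng.uniform(0.0, 2 * np.pi, size=R)
        accept_prob = nnts_density(theta, coef) * (2 * np.pi) ** R / envelope
        if rng.uniform() < accept_prob:
            draws.append(theta)
    return np.array(draws)


# Usage with hypothetical random coefficients for a bivariate model, M1 = M2 = 2.
gen = np.random.default_rng(0)
coef = normalize(gen.normal(size=(3, 3)) + 1j * gen.normal(size=(3, 3)))
sample = sample_nnts(coef, n=500, rng=1)
print(sample.shape, sample.mean(axis=0))
```

The uniform envelope keeps the sketch short and exact, but its acceptance rate of 1/N degrades as the number of terms grows. For larger M_1, …, M_R a tighter envelope or a different simulation scheme would be preferable.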


Corresponding author: Juan José Fernández-Durán, School of Business and Department of Statistics, Instituto Tecnológico Autónomo de México, Río Hondo 1, Col. Progreso Tizapán, C.P. 01080, México D.F., México, e-mail:

The authors wish to thank the Asociación Mexicana de Cultura A.C. for its support.

References

Absil, P.-A., R. Mahony and R. Sepulchre (2008): Optimization algorithms on matrix manifolds, Princeton University Press, Princeton, New Jersey. doi:10.1515/9781400830244.

Boomsma, W., K. Mardia, C. Taylor, J. Ferkinghoff-Borg, A. Krogh and T. Hamelryck (2008): “A generative, probabilistic model of local protein structure,” Proc. Natl. Acad. Sci. USA, 105, 8932–8937. doi:10.1073/pnas.0801715105.

Chakrabarti, P. and D. Pal (2001): “The interrelationships of side-chain and main-chain conformations in proteins,” Prog. Biophys. Mol. Bio., 76(1–2), 1–102.

Dayalan, S., N. D. Gooneratne, S. Bevinakoppa and H. Schroder (2006): “Dihedral angle and secondary structure database of short amino acid fragments,” Bioinformation, 1(3), 78–80. doi:10.6026/97320630001078.

Devroye, L. (1986): Non-uniform random variate generation, Springer Verlag, New York. doi:10.1007/978-1-4613-8643-8.

Fejér, L. (1915): “Über Trigonometrische Polynome,” Journal für die Reine und Angewandte Mathematik, 146, 53–82. doi:10.1515/crll.1916.146.53.

Fernández-Durán, J. J. (2004): “Circular distributions based on nonnegative trigonometric sums,” Biometrics, 60, 499–503. doi:10.1111/j.0006-341X.2004.00195.x.

Fernández-Durán, J. J. (2007): “Models for circular-linear and circular-circular data constructed from circular distributions based on nonnegative trigonometric sums,” Biometrics, 63, 579–585. doi:10.1111/j.1541-0420.2006.00716.x.

Fernández-Durán, J. J. and M. M. Gregorio-Domínguez (2010): “Maximum likelihood estimation of nonnegative trigonometric sums models using a Newton-like algorithm on manifolds,” Electron. J. Stat., 4, 1402–1410. doi:10.1214/10-EJS587.

Fernández-Durán, J. J. and M. M. Gregorio-Domínguez (2013): CircNNTSR: An R Package for the Statistical Analysis of Circular Data Using Nonnegative Trigonometric Sums (NNTS) Models. R package version 2.1. http://CRAN.R-project.org/package=CircNNTSR.

Fisher, N. I. (1993): Statistical analysis of circular data, Cambridge University Press, Cambridge, UK. doi:10.1017/CBO9780511564345.

Guruprasad, K., M. S. Prasad and G. R. Kumar (2000): “Database of structural motifs in proteins,” Bioinformatics, 16(4), 372–375. doi:10.1093/bioinformatics/16.4.372.

Hamelryck, T., J. T. Kent and A. Krogh (2006): “Sampling realistic protein conformations using local structural bias,” PLoS Comput. Biol., 2(9), e131. doi:10.1371/journal.pcbi.0020131.

Ho, B. K., A. Thomas and R. Brasseur (2003): “Revisiting the Ramachandran plot: hard-sphere repulsion, electrostatics, and H-bonding in the α-helix,” Protein Sci., 12, 2508–2522.

Hommola, K., W. R. Gilks and K. V. Mardia (2011): “Log-linear modelling of protein dipeptide structure reveals interesting patterns of side-chain-backbone interactions,” Stat. Appl. Genet. Mol. Biol., 10(1), Article 8, 1–29.

Hovmöller, S., T. Zhou and T. Ohlson (2002): “Conformations of amino acids in proteins,” Acta Crystallogr. D, 58, 768–776.

Jammalamadaka, S. and Y. Sarma (1988): A correlation coefficient for angular variables. In: Matusita, K. (Ed.), Statistical Theory and Data Analysis II, 349–364, North Holland.

Jammalamadaka, S. R. and A. SenGupta (2001): Topics in circular statistics, World Scientific Publishing Co., Singapore. doi:10.1142/4031.

Johnson, R. A. and T. Wehrly (1977): “Measures and models for angular correlation and angular-linear correlation,” J. Roy. Stat. Soc. B, 39, 222–229.

Joo, H., A. G. Chavan, R. Day, K. P. Lennox, P. Sukhanov, D. B. Dahl, M. Vannucci and J. Tsai (2011): “Near-native protein loop sampling using nonparametric density estimation accommodating sparcity,” PLoS Comput. Biol., 7(10), e1002234. doi:10.1371/journal.pcbi.1002234.

Kessel, A. and N. Ben-Tal (2010): Introduction to proteins: structure, function and motion, CRC Press. doi:10.1201/b10456.

Laskowski, R. A., M. W. MacArthur, D. S. Moss and J. M. Thornton (1993): “PROCHECK: a program to check the stereochemical quality of protein structures,” J. Appl. Crystallogr., 26, 283–291.

Lennox, K. P., D. B. Dahl, M. Vannucci and J. W. Tsai (2009): “Density estimation for protein conformation angles using a bivariate von Mises distribution and Bayesian nonparametrics,” J. Am. Stat. Assoc., 104(486), 586–596. doi:10.1198/jasa.2009.0024.

Lennox, K. P., D. B. Dahl, M. Vannucci, R. Day and J. W. Tsai (2010): “A Dirichlet process mixture of hidden Markov models for protein structure prediction,” Ann. Appl. Stat., 4, 916–962.

Lovell, S. C., I. W. Davis, W. B. Arendall III, P. I. W. de Bakker, J. M. Word, M. G. Prisant, J. S. Richardson and D. C. Richardson (2003): “Structure validation by Cα geometry: φ, ψ and Cβ deviation,” Proteins, 50, 437–450. doi:10.1002/prot.10286.

Mardia, K. V. (1975): “Statistics of directional data (with Discussion),” J. Roy. Stat. Soc. B, 37(3), 349–393.

Mardia, K. V. (2013): “Statistical approaches to three key challenges in protein structural bioinformatics,” J. Roy. Stat. Soc. C-App, 62(3), 487–514. doi:10.1111/rssc.12003.

Mardia, K. V. and P. E. Jupp (2000): Directional statistics, John Wiley and Sons, Chichester, West Sussex, England. doi:10.1002/9780470316979.

Mardia, K. V., C. C. Taylor and G. K. Subramaniam (2007): “Protein bioinformatics and mixtures of bivariate von Mises distributions for angular data,” Biometrics, 63, 505–512. doi:10.1111/j.1541-0420.2006.00682.x.

Mardia, K. V., G. Hughes, C. C. Taylor and H. Singh (2008): “A multivariate von Mises distribution with applications to bioinformatics,” Can. J. Stat., 36(1), 99–109.

Mathews, C. K., K. E. van Holde and K. G. Ahern (2000): Biochemistry, 3rd ed. Addison-Wesley Longman, Inc., San Francisco, CA, USA.

Morris, A. L., M. W. MacArthur, E. G. Hutchinson and J. M. Thornton (1992): “Stereochemical quality of protein structure coordinates,” Proteins, 12, 345–364. doi:10.1002/prot.340120407.

R Development Core Team (2013): R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. http://www.R-project.org/.

Ramachandran, G. N., C. Ramakrishnan and V. Sasisekharan (1963): “Stereochemistry of polypeptide chain configuration,” J. Mol. Biol., 7, 95–99.

Shieh, G. S. and R. A. Johnson (2005): “Inferences based on a bivariate distribution with von Mises marginals,” Ann. I. Stat. Math., 57, 789–802.

Shieh, G. S., S. Zheng, R. A. Johnson, Y.-F. Chang, K. Shimizu, C.-C. Wang and S.-L. Tang (2011): “Modeling and comparing the organization of circular genomes,” Bioinformatics, 27(7), 912–918. doi:10.1093/bioinformatics/btr049.

Singh, H., V. Hnizdo and E. Demchuk (2002): “Probabilistic model for two dependent circular variables,” Biometrika, 89(3), 719–723. doi:10.1093/biomet/89.3.719.

Ting, D., G. Wang, M. Shapovalov, R. Mitra, M. I. Jordan and R. L. Dunbrack, Jr. (2010): “Neighbor-dependent Ramachandran probability distributions of amino acids developed from a hierarchical Dirichlet process model,” PLoS Comput. Biol., 6(4), 1–21.

Upton, G. J. G. and B. Fingleton (1989): Spatial data analysis by example Vol. 2 (Categorical and Directional Data), John Wiley and Sons, Chichester, West Sussex, England.

Wehrly, T. and R. A. Johnson (1980): “Bivariate models for dependence of angular observations and a related Markov process,” Biometrika, 67, 255–256. doi:10.1093/biomet/67.1.255.

Zhao, F., S. Li, B. W. Sterner and J. Xu (2008): “Discriminative learning for protein conformation sampling,” Proteins, 73(1), 228–240. doi:10.1002/prot.22057.

Published Online: 2014-01-04
Published in Print: 2014-02-01

©2014 by Walter de Gruyter Berlin Boston

DOI: 10.1515/sagmb-2012-0012