Study of triplet periodicity differences inside and between genomes

  • Yulia M. Suvorova and Eugene V. Korotkov
Published/Copyright: February 24, 2015

Abstract

Triplet periodicity (TP) is a distinctive feature of protein-coding sequences in both prokaryotic and eukaryotic genomes. In this work, we explored how TP differs within and between 45 prokaryotic genomes. We formulated two hypotheses about the distribution of TP over a set of coding sequences and generated artificial datasets corresponding to each hypothesis. We found that TP is more similar within a genome than between genomes, and that the TP distribution in real genome datasets matches the hypothesis under which a common TP pattern exists for the majority of sequences in a genome. Additionally, we classified genes on the basis of their TP matrices. This classification showed that TP identifies the genome to which a given gene belongs with more than 85% accuracy.
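
The idea of classifying a gene by its TP matrix can be illustrated with a minimal sketch. The abstract does not define the TP matrix or the classifier, so everything below is an assumption made for illustration: the matrix is taken to be the 4×3 table of nucleotide frequencies by codon position, the distance is plain Euclidean, and a gene is assigned to the genome whose reference matrix is nearest. The names `tp_matrix`, `tp_distance`, and `classify_gene` are hypothetical, and the sequences are toy strings, not real data.

```python
import math

NUCLEOTIDES = "ACGT"


def tp_matrix(seq):
    """Return a 4x3 matrix of nucleotide frequencies by codon position.

    Rows follow NUCLEOTIDES (A, C, G, T); columns are positions 0, 1, 2
    of the reading frame that starts at the first base. Characters other
    than A, C, G, T are skipped. This layout is an assumption.
    """
    counts = [[0, 0, 0] for _ in NUCLEOTIDES]
    for i, base in enumerate(seq.upper()):
        row = NUCLEOTIDES.find(base)
        if row >= 0:
            counts[row][i % 3] += 1
    total = sum(sum(row) for row in counts)
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return [[c / total for c in row] for row in counts]


def tp_distance(m1, m2):
    """Euclidean distance between two TP matrices (an assumed measure)."""
    return math.sqrt(
        sum((a - b) ** 2 for r1, r2 in zip(m1, m2) for a, b in zip(r1, r2))
    )


def classify_gene(gene, reference_matrices):
    """Return the name of the genome whose reference matrix is nearest."""
    m = tp_matrix(gene)
    return min(
        reference_matrices,
        key=lambda name: tp_distance(m, reference_matrices[name]),
    )


# Toy reference matrices built from made-up sequences.
references = {
    "genome_A": tp_matrix("ATGGCTGCAGCG" * 20),
    "genome_B": tp_matrix("TTAAATATTAAA" * 20),
}
predicted = classify_gene("ATGGCAGCTGCA" * 10, references)
print(predicted)
```

In practice each reference matrix would presumably be estimated from many coding sequences of one genome, and accuracy would be measured on genes held out from that estimate.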


Corresponding author: Yulia M. Suvorova, Bioinformatics Laboratory, Centre of Bioengineering of the Russian Academy of Sciences, 117312, Prospect 60-tya Oktyabrya, Moscow, Russian Federation, e-mail:

Acknowledgments

The work was supported by the Russian Foundation for Basic Research (RFBR) grant 2014-04-00164.


Published Online: 2015-2-24
Published in Print: 2015-4-1

©2015 by De Gruyter

Downloaded on 30.9.2025 from https://www.degruyterbrill.com/document/doi/10.1515/sagmb-2013-0063/html