Abstract
Triplet periodicity (TP) is a distinctive feature of the protein coding sequences of both prokaryotic and eukaryotic genomes. In this work, we explored TP differences within and between 45 prokaryotic genomes. We formulated two hypotheses about the distribution of TP over a set of coding sequences and generated artificial datasets corresponding to each hypothesis. We found that TP is more similar within a genome than between genomes, and that the TP distribution in a real genome dataset matches the hypothesis under which a common TP pattern exists for the majority of sequences in a genome. Additionally, we performed gene classification based on TP matrices. This classification showed that TP identifies the genome to which a given gene belongs with more than 85% accuracy.
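To make the idea of classification by TP matrices concrete, the following is a minimal sketch rather than the procedure used in this study. It assumes that a TP matrix is a 3×4 table of base frequencies by codon position, and that a gene is assigned to the genome whose reference matrix is closest in squared Euclidean distance. The function names `tp_matrix`, `distance`, and `classify`, the distance measure, and the toy sequences are all illustrative assumptions.

```python
BASES = "ACGT"


def tp_matrix(seq):
    """Return a 3x4 matrix of base frequencies by codon position.

    Row index = position within the triplet (0, 1, 2);
    column index = base in the order A, C, G, T.
    This matrix definition is an assumption made for illustration.
    """
    counts = [[0] * 4 for _ in range(3)]
    for i, base in enumerate(seq.upper()):
        if base in BASES:  # ambiguous symbols such as N are skipped
            counts[i % 3][BASES.index(base)] += 1
    total = sum(sum(row) for row in counts)
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return [[c / total for c in row] for row in counts]


def distance(m1, m2):
    """Squared Euclidean distance between two matrices (assumed measure)."""
    return sum(
        (a - b) ** 2
        for row1, row2 in zip(m1, m2)
        for a, b in zip(row1, row2)
    )


def classify(gene, references):
    """Return the name of the reference matrix closest to the gene's matrix.

    `references` maps a genome name to its TP matrix.
    """
    m = tp_matrix(gene)
    return min(references, key=lambda name: distance(m, references[name]))


# Toy reference matrices built from made-up sequences, not real genomes.
references = {
    "genome_a": tp_matrix("ATGATGATG"),
    "genome_b": tp_matrix("CCGCCGCCG"),
}
print(classify("ATGATG", references))
```

In practice a reference matrix would be built from many coding sequences of one genome, and the choice of distance measure would need to be justified statistically; both are left open here.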
Acknowledgments
The work was supported by the Russian Foundation for Basic Research (RFBR) grant 2014-04-00164.
©2015 by De Gruyter