Is the growth rate of Protein Data Bank sufficient to solve the protein structure prediction problem using template-based modeling?

Michal Brylinski

doi:10.1515/bams-2014-0024

Published/Copyright: February 7, 2015

From the journal Bio-Algorithms and Med-Systems, Volume 11, Issue 1

Abstract

The Protein Data Bank (PDB) is expanding exponentially in the number of macromolecular structures deposited every year. A pivotal question is how this rapid growth of structural information improves the quality of three-dimensional models constructed by contemporary bioinformatics approaches. To address this question, we performed a retrospective analysis of the structural coverage of a representative set of proteins using remote homology detected by COMPASS and HHpred. We show that the number of proteins whose structures can be confidently predicted increased over the 9-year period between 2005 and 2014 on account of PDB growth alone. Nevertheless, this encouraging trend slowed noticeably around 2008 and has yielded insignificant improvements ever since. At the current pace, it is unlikely that the protein structure prediction problem will be solved in the near future using existing template-based modeling techniques. Therefore, further advances in experimental structure determination, qualitatively better approaches to fold recognition, and more accurate template-free structure prediction methods are urgently needed.

Keywords: comparative modeling; COMPASS; HHpred; Protein Data Bank; protein fold recognition; protein structure prediction; protein threading; template-based modeling

Corresponding author: Michal Brylinski, Department of Biological Sciences, 202 Life Sciences Bldg., Louisiana State University, Baton Rouge, LA 70803, USA; and Center for Computation and Technology, 2054 Digital Media Center, Louisiana State University, Baton Rouge, LA 70803, USA, E-mail: michal@brylinski.org.

Acknowledgments

Portions of this research were conducted with high-performance computational resources provided by the Louisiana State University (HPC@LSU; http://www.hpc.lsu.edu) and the Louisiana Optical Network Institute (LONI; http://www.loni.org). We thank Dr. Wei Feinstein who read the manuscript and provided critical comments.

Author contributions: All the authors have accepted responsibility for the entire content of this submitted manuscript and approved submission.
Research funding: This work was supported by the Louisiana Board of Regents through the Board of Regents Support Fund [contract LEQSF(2012-15)-RD-A-05].
Employment or leadership: None declared.
Honorarium: None declared.
Competing interests: The funding organization(s) played no role in the study design; in the collection, analysis, and interpretation of data; in the writing of the report; or in the decision to submit the report for publication.

Received: 2014-12-22

Accepted: 2015-01-08

Published Online: 2015-02-07

Published in Print: 2015-03-31
