Bioinformatics and the Internet

Published/Copyright: September 1, 2009

News from IUPAC

Dr. Jürgen Pleiss and Professor Rolf D. Schmid, Chairman and Titular Members of the IUPAC Commission on Biotechnology (Institute for Technical Biochemistry, University of Stuttgart, Allmandring 31, D-70569 Stuttgart, Germany; e-mail: jpleiss@tebio1.biologie.uni-stuttgart.de; rolf.d.schmid@rus.uni-stuttgart.de), contributed the following article on the combination of two new technologies that are having a major impact on the pharmaceutical, agrochemical, and food industries.

Introduction

Explosive Growth of the World Wide Web

Life Sciences and the World Wide Web

Protein Sequencing Databanks

Bioinformatics Databanks and Web Sites

Challenges to Bioinformatics

Future of Bioinformatics

References

Introduction

At the turn of the millennium, two young technologies can be singled out which have a major impact on science, industry, and society: recombinant DNA and information technology. As they combine in the field of bioinformatics, they are transforming the pharmaceutical, agrochemical, and food industries and, as a consequence, university education. Much of today's information in the life sciences is generated by collaborative efforts at different locations worldwide, and effective communication is essential for success. Thus, the huge amount of data generated by large-scale genome sequencing activities, e.g., the human genome project, depends heavily on computing and telecommunications and stimulates further efforts in this area.

Explosive Growth of the World Wide Web

In information technology, the World Wide Web (WWW) has become the dominant global communication network. It is based on the Internet, which has already served for more than 20 years as a communication resource among scientists. But only when the hypertext transfer protocol (HTTP) was introduced in 1990 did communication via the Internet become sufficiently easy and inexpensive to allow its general use. Moreover, HTTP is hardware-independent and thus accessible even through inexpensive personal computers which are connected directly to the Internet or via a modem to an Internet provider.

Fig. 1 Number of Internet hosts advertised in the DNS (Internet Domain Survey, July 1998, http://www.nw.com/zone/WWW/report.html)

This development has stimulated all kinds of commercial activities, and the numbers of Internet hosts and Internet web sites have reached nearly 40 million and 4 million, respectively (Fig. 1). At present, the number of web sites doubles every year, 100 million people worldwide are estimated to be active Internet users, and business on the order of USD 8 billion is done via the Internet. It is expected that within two more years the number of active users might increase tenfold to reach 1 billion, a dramatic increase driven mainly by the populous Asian nations, and that Internet-based sales will account for USD 300 billion or 1% of all global sales within only four years.

Life Sciences and the World Wide Web

Though by now a majority of the 4 million web sites have a commercial background, the scientific use of the WWW will increase as well. Among the initiatives to enhance its quality and speed up transfer of large volumes of data, the Internet2 project is the most ambitious. It will start by mid-1999 with 141 participating universities and 14 companies across the United States. The Internet2 will serve exclusively scientific purposes and "facilitate and coordinate the development, deployment, operation, and technology transfer of advanced, network-based applications and network services to further U.S. leadership in research and higher education and accelerate the availability of new services and applications on the Internet".

Even now in the era of Internet commerce, many thousands of WWW sites are devoted to the global science network. In fact, many recent discoveries and developments, particularly in the life sciences, would be unthinkable without the Internet. The modern era of life sciences started in the 1950s and accelerated in the early 1970s, when the modern tools of genetic engineering were developed, i.e., how to isolate, sequence, and clone DNA and express it in a host organism of one's choice. In those early days, DNA sequencing was cumbersome and restricted to single genes, minor gene clusters, or small virus genomes. In order to store the resulting DNA sequences, the National Biomedical Research Foundation, Washington, DC, USA, created the first sequence databank in 1965.

Protein Sequencing Databanks

When DNA sequence information started to grow exponentially during the 1980s, three DNA sequence databanks were established: GenBank at the National Center for Biotechnology Information in Bethesda, MD, USA; the European Molecular Biology Laboratory (EMBL) Nucleotide Sequence Database, now at the European Bioinformatics Institute (EBI) in Hinxton, UK; and the DNA Data Bank of Japan (DDBJ) in Mishima, Japan. The three databanks serve as mirror sites to each other.

As shown in Fig. 2, the DNA databases contained 40,000 DNA sequences with a total of 50 million base pairs in 1990, but within only a decade this number has increased 40-fold, now reaching 2 billion base pairs. This increase is due largely to advances in DNA technology and robot-assisted sequencing, allowing a shift from genetics to genomics; by now, the complete genomes of 14 bacteria, baker's yeast, 12 viruses and organelles, and the nematode Caenorhabditis elegans have been published on the Internet, and many others are approaching completion, among them the human genome with a total of about 3 billion base pairs alone. This enormous increase in numbers made new types of databases possible and necessary, e.g., web sites devoted to particular organisms such as the chromosome maps of the mouse. As the number of sequenced genomes increases and can be compared to individual geno- and phenotypes ("polymorphisms"), more and more important conclusions about the structure and regulation of single genes and proteins and their interrelation in health and disease can be drawn.
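The growth figures quoted above can be turned into an implied doubling time. The following is a minimal sketch in Python, assuming steady exponential growth and assuming that "now" means 1999, the print year of this article; the function name and the nine-year interval are illustrative assumptions, not values given in the text.

```python
import math


def doubling_time(start, end, years):
    """Years needed to double, assuming steady exponential growth."""
    growth_factor = end / start
    return years * math.log(2) / math.log(growth_factor)


# Figures quoted in the text: 50 million base pairs in 1990, 2 billion now.
start_bp = 50e6
end_bp = 2e9

# Assumption: the interval is 1990 to 1999 (the article only says "a decade").
assumed_years = 9

fold_increase = end_bp / start_bp
implied = doubling_time(start_bp, end_bp, assumed_years)

print(f"fold increase: {fold_increase:.0f}")
print(f"implied doubling time: {implied:.2f} years")
```

Under these assumptions the databanks would have doubled in size somewhat faster than every two years; taking a full ten-year interval instead lengthens the estimate only slightly.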

On the level of individual proteins, the first sequence databanks were set up in the mid-1980s, including SwissProt at the Swiss Institute of Bioinformatics, Geneva, Switzerland, and the Protein Information Resource established by the National Biomedical Research Foundation, Washington, DC, USA. When protein structure analysis by X-ray crystallography and later by NMR spectroscopy began to grow rapidly in the 1970s, the Protein Data Bank (PDB) was established at the Brookhaven National Laboratory, Upton, Long Island, NY, USA. It contains at present over 9000 entries on protein structures. Protein science, for a long time focused on protein structure and architecture, is now developing vigorously in its own right; comparison of protein sequences based on DNA analysis and prediction of their tertiary structure ("from sequence to structure") is an active area of research, fueled by the quest for the so-called proteome, the sum of proteins expressed by a genome under different conditions of regulation and metabolism.

Bioinformatics Databanks and Web Sites

Table 1 lists a few important examples of the many extremely useful web sites related to the life sciences. Much of the experimental work required to arrive at such findings includes the use of complex algorithms which can, in turn, often be found on appropriate Internet pages. Finally, owing to its widespread accessibility, the Internet has also become a huge blackboard for scientific information, including online versions of scientific journals, free science information (such as the public database PubMed offered over the Internet by the National Library of Medicine at Bethesda, MD, USA, which allows free access to over 9 million scientific publications), tutorials, conference announcements, and information on grants and job offers. As a particular consequence of the Internet, the access to information of scientists working in less developed countries has dramatically increased. Thus, as just four among dozens of examples, there now exist the following web sites:

Challenges to Bioinformatics

The present shift from sequencing single genes to sequencing whole genomes is expected to expand widely our understanding of the regulation of expression, the interaction of proteins, and, finally, of the function of cells and multicellular organisms. Such progress implies new challenges to bioinformatics. There are at present two major problems:

  • Databases that deal with protein sequences and structures, on one hand, or with the function of whole cells, on the other, contain quite different, though interrelated, types of data. Research groups active in either area tend to choose data formats optimized for their particular purpose. As a result, consistency and coherence of databases can become a major problem.

  • The higher the complexity of the data, the more difficult their analysis and graphical presentation become. Most future projects will be highly interdisciplinary, requiring the collaboration of experts from several or even many fields. In this situation, it will be essential to support the interaction with databases by expert systems, which integrate the knowledge of specialists and are user-friendly.
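The first problem, differing data formats, can be illustrated with a small sketch in Python. It reads the same sequence from a FASTA-style record and from a simplified, hypothetical line-tagged record, maps both onto one common structure, and checks that they agree. The record layouts, the `SequenceRecord` type, the function names, and the identifier `EXAMPLE1` are assumptions made for illustration; real databank formats carry many more fields.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SequenceRecord:
    """Common structure shared by both (illustrative) input formats."""
    identifier: str
    description: str
    sequence: str


def parse_fasta(text: str) -> SequenceRecord:
    """Parse one FASTA-style record: a '>' header line, then sequence lines."""
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("not a FASTA-style record")
    identifier, _, description = lines[0][1:].partition(" ")
    return SequenceRecord(identifier, description, "".join(lines[1:]).upper())


def parse_tagged(text: str) -> SequenceRecord:
    """Parse a simplified, hypothetical record with ID, DE, and SQ line tags."""
    fields = {"ID": "", "DE": "", "SQ": ""}
    for line in text.strip().splitlines():
        tag, _, value = line.strip().partition(" ")
        if tag == "SQ":
            # Sequence lines may be split into blocks; drop the inner spaces.
            fields["SQ"] += value.replace(" ", "")
        elif tag in fields:
            fields[tag] = value.strip()
    if not fields["ID"]:
        raise ValueError("record has no ID line")
    return SequenceRecord(fields["ID"], fields["DE"], fields["SQ"].upper())


# Made-up example data: the same entry written in both layouts.
fasta_text = ">EXAMPLE1 example protein\nMKTAYIAK\nQRQISFVK\n"
tagged_text = "ID EXAMPLE1\nDE example protein\nSQ MKTAYIAK QRQISFVK\n"

same = parse_fasta(fasta_text) == parse_tagged(tagged_text)
print("records agree:", same)
```

Converting every source into one shared structure before comparison is only one possible design; it keeps format-specific details inside the parsers, so that a consistency check does not need to know where a record came from.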

Future of Bioinformatics

As a probable consequence of all these developments, the biological and biochemical experiments of the future will, to some extent, be carried out not only in vivo and in vitro, but also in silico. Biology-related information, available from databases through the WWW, will be the pertinent raw material, and it can be put to profitable use. As seen already in the case of the "gene hunt in silico", it becomes more and more feasible to transform this computer-based information into valuable research results or even products. Thus, it is becoming a reality that novel targets for drugs or new powerful biocatalysts can be identified in the huge and growing mass of computer-based genomic sequence information and that metabolic fluxes in living beings can be clustered, via a bioinformatics approach, to allow the genetic reengineering of metabolic pathways in microorganisms, plants, animals, or man.
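As a toy illustration of what a "gene hunt in silico" can look like at its very simplest, the Python sketch below scans a DNA string for open reading frames, i.e., stretches that begin with the start codon ATG and run to the next in-frame stop codon. It is a deliberately naive sketch, not the method used by any databank or tool named in this article: it looks at the forward strand only, and the function name, the `min_codons` threshold, and the example sequence are assumptions chosen for illustration.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna: str, min_codons: int = 3):
    """Return (start, end, frame) tuples for ATG..stop stretches.

    Only the forward strand is scanned. `end` is exclusive and includes the
    stop codon; `min_codons` counts the codons before the stop codon.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        # Step through the sequence one codon at a time in this frame.
        for pos in range(frame, len(dna) - 2, 3):
            codon = dna[pos:pos + 3]
            if start is None and codon == "ATG":
                start = pos
            elif start is not None and codon in STOP_CODONS:
                if (pos - start) // 3 >= min_codons:
                    orfs.append((start, pos + 3, frame))
                start = None
    return orfs


# Made-up example sequence containing one short open reading frame.
example_dna = "CCATGAAATTTGGGTAACC"
for start, end, frame in find_orfs(example_dna):
    print(frame, start, end, example_dna[start:end])
```

Real gene finding must additionally handle the reverse strand, introns, sequencing errors, and statistical models of coding regions, which is why this area relies on the specialized algorithms and databases discussed above.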

Table 1. Examples of Useful Web Sites in Bioinformatics

Database | Description | URL

DNA and Protein Sequence Databases
SRS | SRS Browser for 38 databanks in molecular biology | http://www.embl-heidelberg.de/srs5/
SWISS-PROT and TrEMBL | Annotated protein sequence database (78,082 and 178,957 sequences, respectively) | http://expasy.hcuge.ch/sprot/sprot-top.html
PIR | Protein Information Resource (116,372 sequences) | http://www-nbrf.georgetown.edu/pir/
EMBL Nucleotide Sequence Database | DNA sequence database (3,046,471 sequences) | http://www.ebi.ac.uk/ebi_docs/embl_db/ebi/topembl.html
GenBank | DNA sequence database (3,044,000 sequences) | http://www.ncbi.nlm.nih.gov/Entrez/nucleotide.html
DDBJ | DNA Data Bank of Japan (3,073,166 sequences) | http://www.ddbj.nig.ac.jp/

Genomics
Pedant at MIPS | Software system for completely automatic and exhaustive analysis of protein sequence sets (21 complete, 21 unfinished genomes) | http://pedant.mips.biochem.mpg.de/
TIGR Database | Microbial database (20 published genomes, 60 genomes in progress) | http://www.tigr.org/tdb/tdb.html
Sanger Center | Human genome and 24 more genomes | http://www.sanger.ac.uk/

Protein Structure
PDB | Archive of experimentally determined three-dimensional structures (9,179 entries) | http://www.pdb.bnl.gov/

Literature Searches
Medline | Search for citations | http://www4.ncbi.nlm.nih.gov/PubMed/
SWISS-PROT journals list | List of online journals | http://www.expasy.ch/cgi-bin/jourlist?jourlist.txt

Homology Searches
BLAST | Sequence similarity search in 22 sequence databases and 42 genomes | http://www.ncbi.nlm.nih.gov/BLAST/
FASTA | Sequence similarity search in 25 sequence databases | http://www2.ebi.ac.uk/fasta3/

Structure Prediction
Swiss-Model | Homology modeling | http://expasy.hcuge.ch/swissmod/SWISS-MODEL.html
Biotech Validation Suite for Protein Structures | Quality checks of protein structures | http://biotech.embl-heidelberg.de:8400/
PredictProtein | Prediction of aspects of protein structure | http://www.embl-heidelberg.de/predictprotein/predictprotein.html

Protein Architectures
SCOP | Protein structure classification | http://scop.mrc-lmb.cam.ac.uk/scop/
CATH | Protein structure classification | http://www.biochem.ucl.ac.uk/bsm/cath/

International Organizations
FAO | Partnership programs of FAO | http://www.fao.org/GENINFO/partner/default.htm
UNESCO | Biotechnology fellowship programs of UNESCO | http://www.unesco.org/general/eng/programmes/science/life/index.htm

Published Online: 2009-09-01
Published in Print: 1999-03

© 2014 by Walter de Gruyter GmbH & Co.

Downloaded on 4.3.2026 from https://www.degruyterbrill.com/document/doi/10.1515/ci.1999.21.2.33/html