Bioinformatics and the Internet
News from IUPAC
Dr. Jürgen Pleiss and Professor Rolf D. Schmid, Chairman and Titular Members of the IUPAC Commission on Biotechnology (Institute for Technical Biochemistry, University of Stuttgart, Allmandring 31, D-70569 Stuttgart, Germany; e-mail: jpleiss@tebio1.biologie.uni-stuttgart.de; rolf.d.schmid@rus.uni-stuttgart.de), contributed the following article on the combination of two new technologies that are having a major impact on the pharmaceutical, agrochemical, and food industries.
Introduction
Explosive Growth of the World Wide Web
Life Sciences and the World Wide Web
Protein Sequencing Databanks
Bioinformatics Databanks and Web Sites
Challenges to Bioinformatics
Future of Bioinformatics
References
At the turn of the millennium, two young technologies can be singled out which have a major impact on science, industry, and society: recombinant DNA and information technology. As they combine in the field of bioinformatics, they are transforming the pharmaceutical, agrochemical, and food industries and, as a consequence, university education. Much of today's information in the life sciences is generated by collaborative efforts at different locations worldwide, and effective communication is essential for success. Thus, the huge amount of data generated by large-scale genome sequencing activities, e.g., the human genome project, depends heavily on computing and telecommunications and stimulates further efforts in this area.
Explosive Growth of the World Wide Web
In information technology, the World Wide Web (WWW) has become the dominant global communication network. It is based on the Internet, which has already served for more than 20 years as a communication resource among scientists. But only when the hypertext transfer protocol (HTTP) was introduced in 1990 did communication via the Internet become sufficiently easy and inexpensive to allow its general use. Moreover, HTTP is hardware-independent and thus accessible even through inexpensive personal computers that are connected to the Internet either directly or via a modem and an Internet provider.

Fig. 1. Number of Internet hosts advertised in the DNS (Internet Domain Survey, July 1998, http://www.nw.com/zone/WWW/report.html)
This development has stimulated all kinds of commercial activities, and the numbers of Internet hosts and web sites have reached nearly 40 million and 4 million, respectively (Fig. 1). At present, the number of web sites doubles every year, 100 million people worldwide are estimated to be active Internet users, and business on the order of USD 8 billion is done via the Internet. It is expected that within two more years the number of active users might increase tenfold to reach 1 billion, a dramatic increase driven mainly by the populous Asian nations, and that Internet-based sales will account for USD 300 billion or 1% of all global sales within only four years.
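The growth figures quoted above can be turned into implied annual growth factors. The following is a minimal sketch, assuming smooth compound growth over the stated periods (an assumption made here for illustration, not a claim of the cited surveys); the function and variable names are hypothetical.

```python
def annual_growth_factor(start, end, years):
    """Constant yearly multiplier that turns `start` into `end` after `years`."""
    return (end / start) ** (1.0 / years)

# Figures taken from the text; compound growth is an illustrative assumption.
users_factor = annual_growth_factor(100e6, 1e9, 2)   # tenfold in two years
sales_factor = annual_growth_factor(8e9, 300e9, 4)   # USD 8 billion to 300 billion in four years

# If USD 300 billion is 1% of all global sales, the implied total is:
implied_global_sales = 300e9 / 0.01

print(f"users grow about {users_factor:.2f}x per year")
print(f"sales grow about {sales_factor:.2f}x per year")
print(f"implied global sales: USD {implied_global_sales:.1e}")
```

Under this assumption, both projections imply growth well above the yearly doubling reported for the number of web sites.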
Life Sciences and the World Wide Web
Though by now a majority of the 4 million web sites have a commercial background, the scientific use of the WWW will increase as well. Among the initiatives to enhance its quality and speed up transfer of large volumes of data, the Internet2 project is the most ambitious. It will start by mid-1999 with 141 participating universities and 14 companies across the United States. The Internet2 will serve exclusively scientific purposes and "facilitate and coordinate the development, deployment, operation, and technology transfer of advanced, network-based applications and network services to further U.S. leadership in research and higher education and accelerate the availability of new services and applications on the Internet".
Even now in the era of Internet commerce, many thousands of WWW sites are devoted to the global science network. In fact, many recent discoveries and developments, particularly in the life sciences, would be unthinkable without the Internet. The modern era of life sciences started in the 1950s and accelerated in the early 1970s, when the modern tools of genetic engineering were developed, i.e., how to isolate, sequence, and clone DNA and express it in a host organism of one's choice. In those early days, DNA sequencing was cumbersome and restricted to single genes, minor gene clusters, or small virus genomes. In order to store the resulting DNA sequences, the National Biomedical Research Foundation, Washington, DC, USA, created the first sequence databank in 1965.
Protein Sequencing Databanks
When DNA sequence information started to grow exponentially during the 1980s, three DNA sequence databanks were established: GenBank at the National Center for Biotechnology Information in Bethesda, MD, USA; the European Molecular Biology Laboratory (EMBL) Nucleotide Sequence Database, now at the European Bioinformatics Institute (EBI) in Hinxton, UK; and the DNA Data Bank of Japan (DDBJ) in Mishima, Japan. The three serve as mirror sites to each other.

As shown in Fig. 2, the DNA databases contained 40,000 DNA sequences with a total of 50 million base pairs in 1990, but within only a decade this number has increased 40-fold, now reaching 2 billion base pairs. This increase is due largely to advances in DNA technology and robot-assisted sequencing, allowing a shift from genetics to genomics; by now, the complete genomes of 14 bacteria, baker's yeast, 12 viruses and organelles, and the nematode Caenorhabditis elegans have been published on the Internet, and many others are approaching completion, among them the human genome with a total of about 3 billion base pairs alone. This enormous increase in numbers made new types of databases possible and necessary, e.g., web sites devoted to particular organisms such as the chromosome maps of the mouse. As the number of sequenced genomes increases and can be compared to individual geno- and phenotypes ("polymorphisms"), more and more important conclusions about the structure and regulation of single genes and proteins and their interrelation in health and disease can be drawn.
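The 40-fold increase quoted above corresponds to a doubling time that is easy to estimate. This is a rough sketch, assuming exponential growth and taking "a decade" as exactly 10 years (both are assumptions made here for illustration); the function name is hypothetical.

```python
import math

def doubling_time(fold_increase, years):
    """Years needed to double, given a total fold increase over `years`."""
    return years * math.log(2) / math.log(fold_increase)

base_pairs_1990 = 50e6          # from the text
fold_increase = 40              # from the text
base_pairs_now = base_pairs_1990 * fold_increase

years_to_double = doubling_time(fold_increase, 10)   # assumes a 10-year span

print(f"base pairs now: {base_pairs_now:.1e}")
print(f"doubling time: about {years_to_double:.1f} years")
```

The product reproduces the 2 billion base pairs stated in the text, and the estimated doubling time is a little under two years.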
On the level of individual proteins, the first sequence databanks were set up in the mid-1980s, including SwissProt at the Swiss Institute of Bioinformatics, Geneva, Switzerland, and the Protein Information Resource established by the National Biomedical Research Foundation, Washington, DC, USA. When protein structure analysis by X-ray crystallography and later by NMR spectroscopy began to grow rapidly in the 1970s, the Protein Data Bank (PDB) was established at the Brookhaven National Laboratory, Upton, Long Island, NY, USA. It contains at present over 9000 entries on protein structures. Protein science, for a long time focused on protein structure and architecture, is now undergoing vigorous development in its own right; comparison of protein sequences derived from DNA analysis and prediction of their tertiary structure ("from sequence to structure") is an active area of research, fueled by the quest for the so-called proteome, the sum of proteins expressed by a genome under different conditions of regulation and metabolism.
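Deriving a protein sequence from a DNA sequence, the first step behind the comparisons mentioned above, can be sketched in a few lines. The example below uses the standard genetic code and a single reading frame starting at the first base; the function name and the test sequence are hypothetical, and real tools additionally handle ambiguous bases, alternative genetic codes, and all six reading frames.

```python
# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))

def translate(dna):
    """Translate a DNA coding strand from its first base until a stop codon.

    Raises KeyError for characters other than A, C, G, T.
    """
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":      # stop codon ends translation
            break
        protein.append(amino_acid)
    return "".join(protein)

print(translate("ATGGCCAAATAA"))   # hypothetical sequence: Met-Ala-Lys, then stop
```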
Bioinformatics Databanks and Web Sites
Table 1 lists a few important examples of the many extremely useful web sites related to the life sciences. Much of the experimental work behind such findings relies on complex algorithms, which can in turn often be found on appropriate Internet pages. Finally, owing to its widespread accessibility, the Internet has also become a huge blackboard for scientific information, including online versions of scientific journals, free science information (such as the public database PubMed offered over the Internet by the National Library of Medicine at Bethesda, MD, USA, which allows free access to over 9 million scientific publications), tutorials, conference announcements, and information on grants and job offers. A particular consequence of the Internet is that access to information for scientists working in less developed countries has increased dramatically. Thus, as just four among dozens of examples, there now exist the following web sites:
an Asia-Pacific Network of Science and Technology Centers:
an African Network for Essential National Health Research:
a West Africa Research Network (WARN):
Uninet - The South African Academic and Research Network:
Challenges to Bioinformatics
The present shift from sequencing single genes to sequencing whole genomes is expected to greatly expand our understanding of the regulation of expression, the interaction of proteins, and, finally, the function of cells and multicellular organisms. Such progress implies new challenges to bioinformatics. There are at present two major problems:
Databases that deal with protein sequences and structures, on one hand, or with the function of whole cells, on the other, contain quite different, though interrelated, types of data. Research groups active in either area tend to choose data formats optimized for their particular purpose. As a result, consistency and coherence of databases can become a major problem.
The higher the complexity of the data, the more difficult their analysis and graphical presentation become. Most future projects will be highly interdisciplinary, requiring the collaboration of experts from several or even many fields. In this situation, interaction with databases will inevitably have to be supported by expert systems that integrate the knowledge of specialists and are user-friendly.
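The first of these problems, divergent data formats, can be illustrated with a small sketch that maps two differently formatted entries onto one common record type. The FASTA-style layout (a ">" header line followed by sequence lines) is a widely used convention; the key-value flat-file layout, the field names ID, DE, and SQ as used here, the record class, and the sample entry are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SequenceRecord:
    """Common target structure for entries from different sources."""
    identifier: str
    description: str
    sequence: str

def parse_fasta(text):
    """Parse a single FASTA-style entry: '>id description' plus sequence lines."""
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("not a FASTA-style entry")
    identifier, _, description = lines[0][1:].partition(" ")
    return SequenceRecord(identifier, description, "".join(lines[1:]).upper())

def parse_keyvalue(text):
    """Parse a hypothetical 'KEY: value' flat-file entry with ID, DE, and SQ fields."""
    fields = {}
    for ln in text.strip().splitlines():
        key, _, value = ln.partition(":")
        fields[key.strip().upper()] = value.strip()
    sequence = fields["SQ"].replace(" ", "").upper()
    return SequenceRecord(fields["ID"], fields.get("DE", ""), sequence)

# The same made-up entry in both layouts.
fasta_text = ">P1 hypothetical lipase\nMKLV\nAAGT\n"
keyvalue_text = "ID: P1\nDE: hypothetical lipase\nSQ: mklv aagt\n"

record_a = parse_fasta(fasta_text)
record_b = parse_keyvalue(keyvalue_text)
print(record_a == record_b)
```

Once both sources are reduced to the same record type, consistency checks across databases become simple equality comparisons, which is the kind of coherence the paragraph above calls for.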
Future of Bioinformatics
As a probable consequence of all these developments, the biological and biochemical experiments of the future will, to some extent, be carried out not only in vivo and in vitro, but also in silico. Biology-related information, available from databases through the WWW, will be the pertinent raw material, and its use can be profitable. As seen already in the case of the "gene hunt in silico", it is becoming more and more feasible to transform this computer-based information into valuable research results or even products. Thus, it is becoming a reality that novel drug targets or powerful new biocatalysts can be identified in the huge and growing mass of computer-based genomic sequence information and that metabolic fluxes in living beings can be clustered, via a bioinformatics approach, to allow the genetic reengineering of metabolic pathways in microorganisms, plants, animals, or man.
References
Internet Domain Survey, July 1998, http://www.nw.com/zone/WWW/top.html;
The Netcraft Web Server Survey, http://www.netcraft.com/Survey/;
Internet Statistics: Growth and Usage of the Web and the Internet, http://www.mit.edu/people/mkgray/net/;
eMarketer,
Hermes project,
The Internet2 project, http://www.internet2.edu/
Table 1. Examples of Useful Web Sites in Bioinformatics
| Resource | Description | URL |
| --- | --- | --- |
| DNA and Protein Sequence Databases | | |
| SRS | SRS Browser for 38 databanks in molecular biology | http://www.embl-heidelberg.de/srs5/ |
| SWISS-PROT and TrEMBL | Annotated protein sequence database (78,082 and 178,957 sequences, respectively) | http://expasy.hcuge.ch/sprot/sprot-top.html |
| PIR | Protein Information Resource (116,372 sequences) | http://www-nbrf.georgetown.edu/pir/ |
| EMBL Nucleotide Sequence Database | DNA sequence database (3,046,471 sequences) | http://www.ebi.ac.uk/ebi_docs/embl_db/ebi/topembl.html |
| GenBank | DNA sequence database (3,044,000 sequences) | http://www.ncbi.nlm.nih.gov/Entrez/nucleotide.html |
| DDBJ | DNA Data Bank of Japan (3,073,166 sequences) | http://www.ddbj.nig.ac.jp/ |
| Genomics | | |
| Pedant at MIPS | Software system for completely automatic and exhaustive analysis of protein sequence sets (21 complete, 21 unfinished genomes) | http://pedant.mips.biochem.mpg.de/ |
| TIGR Database | Microbial database (20 published genomes, 60 genomes in progress) | http://www.tigr.org/tdb/tdb.html |
| Sanger Center | Human genome and 24 more genomes | http://www.sanger.ac.uk/ |
| Protein Structure | | |
| PDB | Archive of experimentally determined three-dimensional structures (9,179 entries) | http://www.pdb.bnl.gov/ |
| Literature Searches | | |
| Medline | Search for citations | http://www4.ncbi.nlm.nih.gov/PubMed/ |
| SWISS-PROT journals list | List of online journals | http://www.expasy.ch/cgi-bin/jourlist?jourlist.txt |
| Homology Searches | | |
| BLAST | Sequence similarity search in 22 sequence databases and 42 genomes | http://www.ncbi.nlm.nih.gov/BLAST/ |
| FASTA | Sequence similarity search in 25 sequence databases | http://www2.ebi.ac.uk/fasta3/ |
| Structure Prediction | | |
| Swiss-Model | Homology modeling | http://expasy.hcuge.ch/swissmod/SWISS-MODEL.html |
| Biotech Validation Suite for Protein Structures | Quality checks of protein structures | http://biotech.embl-heidelberg.de:8400/ |
| PredictProtein | Prediction of aspects of protein structure | http://www.embl-heidelberg.de/predictprotein/predictprotein.html |
| Protein Architectures | | |
| SCOP | Protein structure classification | http://scop.mrc-lmb.cam.ac.uk/scop/ |
| CATH | Protein structure classification | http://www.biochem.ucl.ac.uk/bsm/cath/ |
| International Organizations | | |
| FAO | Partnership programs of FAO | http://www.fao.org/GENINFO/partner/default.htm |
| UNESCO | Biotechnology fellowship programs of UNESCO | http://www.unesco.org/general/eng/programmes/science/life/index.htm |
© 2014 by Walter de Gruyter GmbH & Co.