XML in Chemistry and Chemical Identifiers
by Antony (Tony) N. Davies
Steve Stein of the National Institute of Standards and Technology (NIST) in Gaithersburg, Maryland, USA, and Alan McNaught of the Royal Society of Chemistry, Cambridge, UK, jointly hosted a three-day meeting to discuss the IUPAC projects on XML in Chemistry and the Chemical Identifier Project. The meeting was held at NIST on 12–14 November 2003.
The meeting was exceptionally well attended with over 50 attendees from governmental and regulatory bodies, research and academic institutes, and industry. A wide range of experts in the field were brought together for a lively exchange of views on many of the topics covered.
XML in Chemistry
Numerous speakers related tales of XML initiatives involving chemistry in their respective organizations, including the European Patent Office, the International Union of Crystallography, and the U.S. Food and Drug Administration’s Center for Drug Evaluation and Research. Various projects within NIST itself were also discussed, such as UnitsML for scientific units and ThermoML for thermodynamic properties. ToxML was described for toxicology data. Despite the range of speakers’ views on XML in chemistry, one thing became clear: IUPAC’s decision to take a leading role in order to avoid duplication of effort was correct.
Some very detailed technical discussions were held on the mechanisms surrounding the generation of controlled ontologies or data dictionaries that highlighted the speed at which the field is moving. The number of XML initiatives that have been born, flourished briefly, and then vanished into obscurity was also discussed.
These arguments underlined the essential problem: research effort would be better spent on novel ways of handling information that enhance productivity, and on better, more advanced tools for data mining, than on repeatedly discussing how best to move the data from A to B. With luck, the IUPAC initiative will bring a degree of stability to the information technology base in chemistry and allow teams working in this area to concentrate on their core business without having to worry whether their underlying technology is about to be made obsolete!
IUPAC/NIST Chemical Identifiers (INChI)
Alan McNaught introduced the project, the aim of which is to produce a public Chemical Identifier to uniquely identify compounds. The current version is available for testing and has been expanded to cover organic, inorganic, and organometallic chemistry. It should be noted that the project acronym IChI (for IUPAC Chemical Identifiers) has been changed to INChI, where N stands for NIST. This change was made to recognize the immense contribution of NIST to the project.
But how does INChI work? Well, INChI starts off by looking at the chemistry of the structure to be assigned an “Identifier.” The structure is normalized and a number of chemical rules are applied. Next, some mathematics “canonicalises” the structure (labels atoms), with equivalent atoms receiving the same numbers. Finally, the labelled structure is “serialized” and the output is a character string. Sound simple? Well, as they say in Germany, the devil hides in the details!
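The canonicalise-then-serialize idea can be illustrated with a toy sketch. This is not the INChI algorithm: it is a brute-force stand-in, usable only for very small structures, that tries every atom numbering and keeps the lexicographically smallest one. The function name canonical_string, the input format (a list of element symbols plus index pairs for bonds), and the output string layout are all assumptions made for illustration, and the sketch does not attempt the normalization rules or the detection of equivalent atoms described above.

```python
from itertools import permutations


def canonical_string(atoms, bonds):
    """Return a string that is the same for every numbering of one graph.

    atoms: list of element symbols, e.g. ["C", "C", "O"]
    bonds: iterable of (i, j) pairs of 0-based indices into atoms

    Hypothetical helper for illustration only; cost grows factorially
    with the number of atoms.
    """
    n = len(atoms)
    bonds = list(bonds)
    best = None
    for perm in permutations(range(n)):
        # perm[old_index] is the new label given to that atom.
        relabelled_atoms = [None] * n
        for old, new in enumerate(perm):
            relabelled_atoms[new] = atoms[old]
        relabelled_bonds = sorted(
            tuple(sorted((perm[i], perm[j]))) for i, j in bonds
        )
        candidate = (relabelled_atoms, relabelled_bonds)
        # Keep the smallest labelling: this choice is the "canonical" one.
        if best is None or candidate < best:
            best = candidate
    # Serialize the chosen labelling as a plain character string.
    atom_part = "".join(best[0])
    bond_part = ",".join(f"{i + 1}-{j + 1}" for i, j in best[1])
    return f"{atom_part}/{bond_part}"


# Two different numberings of the ethanol heavy-atom skeleton (C-C-O).
ethanol_a = canonical_string(["C", "C", "O"], [(0, 1), (1, 2)])
ethanol_b = canonical_string(["O", "C", "C"], [(0, 1), (1, 2)])
# Dimethyl ether (C-O-C) has the same atoms but different connectivity.
ether = canonical_string(["C", "O", "C"], [(0, 1), (1, 2)])
print(ethanol_a == ethanol_b, ethanol_a == ether)
```

The point of the sketch is only that the output depends on the structure and not on how the atoms happened to be numbered on input; a practical identifier has to reach the same result without enumerating every numbering.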
The normalization of the structure involves a series of layers: the raw chemical substance, the molecular formula, and a connectivity layer, followed where necessary by stereochemistry and isotopic layers. The connectivity layer consists of four “sub layers,” with increasing amounts of detail, generated as follows:
disconnect all H and metal atoms to create a “skeleton”
reconnect fixed hydrogen atoms to reveal tautomers
optionally reconnect all mobile hydrogen atoms
optionally reconnect all metal atoms
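The first of these steps, stripping hydrogen and metal atoms to leave a skeleton, might be sketched as follows. This is a minimal illustration under stated assumptions, not the project's implementation: the function name skeleton, the input format, and the METALS set (a small illustrative subset, not a complete list) are all hypothetical.

```python
# Illustrative subset of metal symbols; an assumption, not a complete list.
METALS = {"Li", "Na", "K", "Mg", "Ca", "Fe", "Cu", "Zn", "Pt"}


def skeleton(atoms, bonds):
    """Drop H and metal atoms, and every bond that touches them.

    atoms: list of element symbols
    bonds: iterable of (i, j) pairs of 0-based indices into atoms
    Returns (skeleton_atoms, skeleton_bonds) with atoms renumbered from 0.
    """
    keep = [i for i, el in enumerate(atoms) if el != "H" and el not in METALS]
    # Map each surviving atom's old index to its new, compact index.
    new_index = {old: new for new, old in enumerate(keep)}
    skel_atoms = [atoms[i] for i in keep]
    skel_bonds = [
        (new_index[i], new_index[j])
        for i, j in bonds
        if i in new_index and j in new_index
    ]
    return skel_atoms, skel_bonds


# Sodium acetate, drawn with an explicit O-Na bond and explicit hydrogens.
atoms = ["C", "C", "O", "O", "Na", "H", "H", "H"]
bonds = [(0, 1), (1, 2), (1, 3), (3, 4), (0, 5), (0, 6), (0, 7)]
print(skeleton(atoms, bonds))
```

The later sub-layers would then add the removed hydrogen and metal atoms back in stages, which this sketch does not attempt.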
As you would expect, this very simple approach came in for some heavy discussion, but “the proof of the pudding is in the eating,” as they say. So far, with some very large structural databases being analyzed in this way, no insurmountable problems have arisen. The developers are looking for beta testers, so please get in touch through the IUPAC Web site if you are interested!
Antony N. Davies <tony.davies@creonlabcontrol.com> works at Creon Lab Control AG, in Frechen, Germany. He is secretary of the IUPAC Committee on Printed and Electronic Publications and chairman of the Subcommittee on Spectroscopic Data Standards; he is JCAMP-DX external professor at the University of Glamorgan, Wales, United Kingdom.
www.iupac.org/projects/2002/2002-022-1-024.html
www.iupac.org/projects/2000/2000-025-1-800.html
Page last modified 2 July 2004.
Copyright © 2003-2004 International Union of Pure and Applied Chemistry.
© 2014 by Walter de Gruyter GmbH & Co.