Cheminformatics: Data and Standards
A Pure and Applied Chemistry Special Issue
Vol 94, Issue 6, June 2022
by Vincent F. Scalfani (Guest editor)

Many readers of Pure and Applied Chemistry and Chemistry International are likely familiar with the IUPAC Color Books. The current collection of IUPAC Color Books comprises eight volumes and includes authoritative descriptions of chemical units, nomenclature, terminology, and symbols [1]. IUPAC standards and recommendations also exist outside the Color Books, for example in individual publications and in reports from IUPAC Project task groups. Two longstanding IUPAC standards relevant to cheminformatics are the InChI for chemical identification [2] and JCAMP-DX for spectral data exchange [3, 4]. In addition to the continued development of InChI and JCAMP-DX within IUPAC, the IUPAC Committee on Publications and Cheminformatics Data Standards (CPCDS), along with numerous IUPAC task groups, has been engaged in advancing cheminformatics, or more broadly, digital chemistry standards over the last two decades. For example, a basic search of the IUPAC Project database [5] for the words ‘cheminformatics’ or ‘digital’ returned 26 projects, 15 of which are still active. Examples of current projects include enhancements to the Gold Book, a metadata schema for solubility data, FAIR data projects, and efforts to enhance molecular representations (e.g., SMILES and InChI). Clearly, digital standards and cheminformatics are active areas of interest to IUPAC, its contributing volunteers, and the broader chemistry community. Readers are encouraged to visit the IUPAC Digital Standards webpage and submit feedback on current projects or ideas for new standards projects [6].
While the current IUPAC Color Books, publications, and task group reports contain some digital and cheminformatics standards, the need for a dedicated IUPAC Cheminformatics Color Book was recognized by a task group in 2017 [7]. The task group completed a review of current cheminformatics community needs and began preliminary planning for a new Cheminformatics Color Book. In late 2020, the aims of the IUPAC Subcommittee on Cheminformatics Data Standards (SCDS) were updated to focus on the digital dissemination of IUPAC cheminformatics data standards:
“The Subcommittee on Cheminformatics Data Standards (SCDS) coordinates and leads the digital dissemination of IUPAC cheminformatics data standards. SCDS works collaboratively with relevant IUPAC Divisions, IUPAC Committees, publishers, external scientific organizations, and the chemistry community to manage source reference material for IUPAC outputs related to cheminformatics and data standards. Such outputs may include cheminformatics related theory, policy around digital data/metadata exchange, standards, recommendations, validation data, and community engagement. SCDS maintains a vision for the dissemination of IUPAC cheminformatics standards and works closely with contributors and the community to share and maintain this information.”
In 2021, SCDS was then tasked with continuing the planning and development of a new IUPAC Cheminformatics Color Book. These new goals for SCDS and discussions within the Committee led to this Pure and Applied Chemistry special issue on Cheminformatics: Data and Standards as a first step towards a Cheminformatics Color Book. A call for papers was published in May 2021, with the aim of discussing cheminformatics standards and future needs. The proposed topics for the issue were as follows [8]:
Cheminformatics standards use-cases and workflows across disciplines.
Discussions around how cheminformatics standards advance research and teaching.
Perspectives related to current cheminformatics standards and future needs, for example interoperability and metadata considerations.
Cheminformatics datasets useful for teaching and/or validation.
Standardization needs related to infrastructure (e.g., repositories), cheminformatics toolkits, or data sharing.
Conference, symposia, or workshop based outcomes related to cheminformatics standardization.
How would a Pure and Applied Chemistry special issue on Cheminformatics: Data and Standards lead to an official IUPAC Color Book? Discussions are ongoing within IUPAC. One potential strategy is for the Cheminformatics: Data and Standards special issue to become a virtual “rolling” issue, to which articles are continually added. From the collection of cheminformatics articles in Pure and Applied Chemistry, selected articles could then be adapted into chapters of the Cheminformatics Color Book in collaboration with the author(s) and relevant IUPAC committees. This model for the Cheminformatics Color Book would be similar to an “overlay” journal, which curates content that has already been published elsewhere [9]. Decisions on which articles are appropriate for inclusion in the Color Book could be made through a combination of input from the community, IUPAC committees (e.g., SCDS), and related IUPAC task groups. Selected articles would then go through an approval process similar to that of the other IUPAC Color Books. Completing the Cheminformatics Color Book piecewise in this way would spread the work across many contributors, while also disseminating authoritative cheminformatics standards and recommendations to the global community more quickly.
Having now described the history and thought behind this special issue and future ideas for an IUPAC Cheminformatics Color Book, let me summarize the content within this issue.
There are four main themes among the contributed articles. The first theme is education, with an article on experiences with the Royal Society of Chemistry Chemical Information and Computer Applications Interest Group’s open source chemical data and cheminformatics virtual workshops (Swain; see also the box below), and another article giving a detailed account of experiences launching a materials informatics program (Lipscomb). The second theme is cheminformatics file formats and use cases. This includes an article on a proposed data model for compounds and assays (Kappler), an article describing a big data use case with RInChI (reaction InChI) (Blanke), and an article on the Reaction Structured Product Labeling format and associated use cases (Nicklaus). The third theme is spectroscopic data, with an article giving an overview of the JCAMP-DX file format (Davies) and another describing a proposed IUPAC specification for the FAIR management and sharing of spectroscopic data (Hanson). The fourth theme is surveying the landscape, including an article reviewing chemical ontologies (Koepler) and a review of analytical data standards (Rauh). I expect that many more contributions and themes will emerge with new submissions. As more machine-readable chemical data continues to be shared, standardization efforts will become even more important. I hope the articles collected within this rolling special issue can serve as a starting point for identifying key chapters of a Cheminformatics Color Book, as well as new IUPAC projects needed to advance cheminformatics standardization efforts.
Thank you for reading this special issue. Thank you also to the contributing authors and reviewers, and to Hugh Burrows, Editor of Pure and Applied Chemistry, for his enthusiasm and help in making this issue a reality. Finally, thank you to my colleagues within IUPAC for their encouragement and feedback during the planning of this special issue on Cheminformatics: Data and Standards.
Vincent F. Scalfani <vfscalfani@ua.edu> is Assistant Professor/Science and Engineering Librarian at The University of Alabama, and current Chair of the IUPAC Subcommittee on Cheminformatics Data Standards. ORCID.org/0000-0002-7363-531X
Preface reprinted from PAC
https://www.degruyter.com/PAC ; keyword: Cheminformatics
https://www.degruyter.com/journal/key/pac/94/6/html
Promoting Open Chemical Science Online
by Christopher J. Swain, Jeremy G. Frey and Jonathan M. Goodman
What is the state of open science in chemistry? What are the scope and limitations of the current situation in chemistry with respect to data, publishing, and scientific software? How is it possible to run meetings without meeting in person?
These questions were addressed during a meeting of the Royal Society of Chemistry’s Chemical Information and Computer Applications Interest Group (RSC CICAG), organised in November 2020 by Christopher Swain, Jeremy Frey, and Jonathan Goodman. The five-day online meeting, entitled Open Chemical Science (https://www.rsc.org/events/detail/42090/open-chemical-science), had three interwoven themes: Open Data, Open Access publishing, and Open-Source tools. One advantage of an online event was that it could be attended by people from 45 different countries. The meeting also posed the challenge of converting what was planned as a three-day physical event into a five-day virtual event with three intertwined strands.
Open science is flourishing in chemistry, but the current infrastructure of both academic and commercial research was developed mainly with closed science in mind. A transition to an open infrastructure represents a significant shift. How open science is best funded and assessed is not clear, and researchers are reasonably concerned that their work may not receive all of the recognition that it deserves. However, the move in this direction may be unstoppable.
Publishing data is increasingly coming into focus as a major challenge. Open data is central to current chemical science, but how can it be funded and sustained? The benefit of everyone else’s data being openly accessible in the long term is obvious; the driver for making your own data available can be less clear.
Open access publishing is becoming more accepted as a normal way of publishing research. However, a large number of important publishing resources are not open access, and it is not clear when or if this will change. The Declaration on Research Assessment (DORA, https://sfdora.org) is important, but it is not clear that members of institutions that have committed to DORA are always aware of the obligations that this entails.
Open software for chemistry is thriving. People learned about examples and best practice in open data from the meeting. This section of the meeting generated more immediate data and feedback than the other two sections. Open software is a key component of open chemistry. Feedback from software users may provide a more immediate link to the developers than a publication.
The organizers’ initial apprehension about the success of the meeting, as the date approached and the pandemic did not subside, turned out to be unfounded. The meeting almost certainly had a higher and more diverse attendance than it would have had as a traditional in-person meeting. Informal interactions over a video conference, however, were more stilted than actually meeting people in person. In planning future meetings, the CICAG committee will balance the benefits of a more international, economical, and environmentally friendly online meeting with a more intense but smaller in-person meeting.
The online meeting was recognised by the RSC with the award of the “2021 Inspirational Committee Award” (https://www.rsc.org/prizes-funding/prizes/2021-winners/rsc-chemical-information-and-computer-applications-group/). The conference has led to a continuing series of workshops.
All the workshops are available on the RSC CICAG YouTube channel https://www.youtube.com/c/RSCCICAG
See the full account in PAC: Swain, Christopher J., Frey, Jeremy G. and Goodman, Jonathan M. "RSC CICAG Open Chemical Science meeting: integrating chemical data from two symposia and a series of workshops" Pure and Applied Chemistry, AOP 21 Apr 2022; https://doi.org/10.1515/pac-2021-1003.
References
1. IUPAC Color Books; https://iupac.org/what-we-do/books/color-books/
2. Heller, S.; McNaught, A.; Stein, S.; Tchekhovskoi, D.; Pletnev, I. InChI - the Worldwide Chemical Structure Identifier Standard. Journal of Cheminformatics 2013, 5; https://doi.org/10.1186/1758-2946-5-7
3. Davies, A. N.; Lampen, P. JCAMP-DX for NMR. Applied Spectroscopy 1993, 47 (8), 1093–1099; https://doi.org/10.1366/0003702934067874
4. Lampen, P.; Hillig, H.; Davies, A. N.; Linscheid, M. JCAMP-DX for mass-spectrometry. Applied Spectroscopy 1994, 48 (12), 1545–1552; https://doi.org/10.1366/0003702944027840
5. IUPAC Projects; https://iupac.org/projects/
6. IUPAC Digital Standards; https://iupac.org/what-we-do/digital-standards/
7. IUPAC Project 2017-011-3-024: Digital Dissemination of Data Standards: Planning for a new Cheminformatics Color Book; https://iupac.org/project/2017-011-3-024
8. PAC Cheminformatics Special Issue; https://iupac.org/pac-cheminformatics-special-issue/
9. Brown, Josh. An Introduction to Overlay Journals; https://discovery.ucl.ac.uk/id/eprint/19081/
©2022 IUPAC & De Gruyter. This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. For more information, please visit: http://creativecommons.org/licenses/by-nc-nd/4.0/