Combining molecular fragmentation and machine learning for accurate prediction of adiabatic ionization potentials

Debashis Swain; Sarah Maier; Surya Sekhar Manna; Krishnan Raghavachari

doi:10.1515/pac-2025-0574

Published/Copyright: September 23, 2025

From the journal Pure and Applied Chemistry Volume 97 Issue 11

Abstract

Accurate wavefunction methods like CCSD(T) can predict chemical and thermodynamic properties of small molecules with near-experimental precision. However, their steep computational cost limits their use for large systems or extensive datasets. In contrast, DFT is faster and more practical for large molecules but often fails to accurately capture electronic changes, especially upon geometry relaxation. To address this, we have developed a graph neural network (GNN)-based Delta-Machine Learning (ΔML) model using Connectivity-Based Hierarchy (CBH-2) molecular fragments as descriptors to predict adiabatic ionization potentials (IPs). Our approach shows systematic error cancellation, guided by electron population difference maps to identify ionization sites within the molecule to construct CBH-2 fragments. We have considered both structural and electronic features to improve our ΔML model. We have observed that upon incorporation of electronic features, the graph-based ΔML model achieved a coupled cluster level of accuracy with an MAE of 0.02 eV for IP prediction. Furthermore, we compared adiabatic and vertical IP predictions across different methods. While raw DFT errors are strongly dependent on the functional used, our ΔML model is robust and much less functional-sensitive.

Keywords: Computational chemistry; Density Functional Theory (DFT); electronic structures; quantum science and technology; machine learning

Introduction

Computational quantum chemistry has advanced substantially in recent decades, developing methods capable of analyzing relatively large molecules with good accuracy at acceptable computational cost. ¹ Currently, hybrid and range-separated DFT approaches are the methods of choice that are widely employed in large-scale computational studies of molecules and materials. ² Despite these developments, achieving results comparable to those from highly accurate correlation-driven methods like coupled cluster (e.g., CCSD(T)) remains challenging, particularly for exploring extensive molecular databases or systems comprising numerous heavy atoms. ³, ⁴, ⁵

Prior to the availability of modern-day computers, the accurate calculation of thermodynamic properties of molecular systems was a challenging endeavor. The earliest attempts employed simplified models such as Hartree–Fock theory with moderate-sized basis sets. ⁶ A notable advancement to enhance computational accuracy was John Pople’s introduction of the isodesmic bond separation (IBS) method in 1970. The IBS method selects molecular fragments based on bonded heavy atoms to preserve formal bond types in chemical reactions, significantly facilitating error cancellation (Fig. 1). By utilizing balanced reaction energies, it enabled significant improvements in thermochemical accuracy for lower-level computational methods. This fundamental approach of leveraging error cancellation opened the way for the development of a variety of new computational methods. More recently, many groups have extended this foundational work to develop methods based on systematic error cancellation through bond connectivity. The most popular among these are the hybridization-based homodesmotic schemes and a range of related methods. ⁷, ⁸, ⁹ To organize these diverse approaches, our group introduced the generalized Connectivity-Based Hierarchy (CBH) framework, which provides a chemically intuitive method for applying corrections to simpler theoretical models using reaction schemes inspired by IBS principles, guided exclusively by connectivity and chemical bonding. ¹⁰ The CBH works through a hierarchical series of increasingly sophisticated reaction schemes, each incorporating progressively larger segments of the molecular environment. This approach significantly enhances error cancellation and improves overall accuracy.

Fig. 1:

Isodesmic bond separation (IBS) reaction scheme for 2,3-Dimethylpent-2-ene.

The CBH protocol is well defined, and a detailed explanation of the protocol and its applications is available in a series of papers by Raghavachari and co-workers. ¹¹, ¹², ¹³, ¹⁴ The hierarchy of CBH rungs grows systematically with the size of the fragments: CBH-0 units consist of single heavy atoms, CBH-1 units consist of one heavy-atom bond, CBH-2 units consist of one heavy atom along with all heavy atoms in its immediate bonding environment, and so on. The rungs alternate between ‘atom-centric’ and ‘bond-centric’: the even-numbered rungs (CBH-0, CBH-2 and CBH-4) are atom-centric and the odd-numbered rungs (CBH-1 and CBH-3) are bond-centric. Fragments from higher rungs of CBH capture larger portions of the molecule and provide better error cancellation, though the fragments themselves become larger. Thus, choosing an optimum CBH rung is important to maintain the cost-accuracy balance for a given molecule. CBH-2 is the fragmentation scheme most commonly used in our studies and appears to be adequate to provide near chemical accuracy at modest cost (Fig. 2). ¹¹, ¹², ¹³, ¹⁴, ¹⁵, ¹⁶
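The way the lowest rungs are defined can be illustrated with a short Python sketch. It only enumerates the primary units described above from a heavy-atom adjacency mapping. The input format and the function names are assumptions made for illustration, and the full CBH scheme also handles overlaps between units, terminal atoms, and hydrogen capping, none of which is attempted here.

```python
from typing import Dict, List, Tuple

# Hypothetical input format: atom index -> indices of bonded heavy atoms.
# Hydrogens are left implicit.
Adjacency = Dict[int, List[int]]


def cbh0_units(adj: Adjacency) -> List[Tuple[int]]:
    """CBH-0: one unit per heavy atom."""
    return [(a,) for a in sorted(adj)]


def cbh1_units(adj: Adjacency) -> List[Tuple[int, int]]:
    """CBH-1: one unit per heavy-atom bond, each bond listed once."""
    return sorted({(min(a, b), max(a, b)) for a in adj for b in adj[a]})


def cbh2_units(adj: Adjacency) -> List[Tuple[int, Tuple[int, ...]]]:
    """CBH-2: each heavy atom together with its bonded heavy-atom neighbors."""
    return [(a, tuple(sorted(adj[a]))) for a in sorted(adj)]


# Heavy-atom skeleton of n-butane: C0-C1-C2-C3
butane: Adjacency = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
bond_units = cbh1_units(butane)
atom_units = cbh2_units(butane)
```

For the butane skeleton, `bond_units` lists the three C–C bonds, and the entry of `atom_units` for atom 1 pairs that atom with its two bonded neighbors.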

Fig. 2:

Atom-centered CBH-2 fragmentation scheme for 2,5-Dimethylhexa-1,3,5-triene.

In addition to the traditional computational chemistry methodologies, complementary techniques have been added to the computational toolbox to address the ongoing trade-off between accuracy and efficiency. Importantly, several schemes based on machine learning (ML) have emerged as effective complementary tools in this quest for computational efficiency and accuracy. ¹⁷, ¹⁸, ¹⁹, ²⁰ ML methods are attractive because they combine good accuracy with low computing cost. ²¹ Many ML models are trained on medium-accuracy DFT data, and thus the predicted properties cannot reach the accuracy of correlated quantum mechanical (QM) methods (chemical accuracy is defined as ±1 kcal/mol). Thus, to attain chemical accuracy, ML models must be trained on higher accuracy data (e.g., coupled-cluster quality). Recent studies highlight the potential of ML when coupled with high-level QM data, specifically through delta machine learning (ΔML). ²², ²³, ²⁴, ²⁵, ²⁶, ²⁷, ²⁸, ²⁹, ³⁰ This hybrid approach leverages the strengths of both QM and ML by predicting correction terms (Δ) that reconcile lower-level theory results with more precise, correlated methods as follows:

(1) $\Delta_{\mathrm{ML}} \approx E_{\text{high-level}} - E_{\text{low-level}}$
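As a minimal Python sketch of Eq. (1): the model is trained on the difference between the high-level and low-level values, and its prediction is added back to the low-level value at inference time. The function names and the numbers are illustrative assumptions, not values from this work.

```python
def delta_target(e_high: float, e_low: float) -> float:
    """Training label for a delta-ML model: high-level minus low-level value."""
    return e_high - e_low


def delta_corrected(e_low: float, predicted_delta: float) -> float:
    """Estimate of the high-level value from a low-level value plus a learned correction."""
    return e_low + predicted_delta


# Illustrative numbers only (eV), not taken from the paper.
ip_low, ip_high = 9.10, 9.55
label = delta_target(ip_high, ip_low)
# A model that reproduced the label exactly would recover the high-level value.
estimate = delta_corrected(ip_low, label)
```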

Overall, CBH and ΔML share a common strategy of correcting the systematic errors inherent in lower-level methods, such as DFT, to achieve high accuracy in chemical predictions. Although CBH can effectively correct many deficiencies of low-level methods, its accuracy may be limited when the chosen fragments fail to fully represent the broader molecular environment. CBH focuses on correcting errors within localized molecular units, whereas ML is not constrained by such locality and can capture more global molecular features. Recently, our group has shown that the accuracy can be improved systematically from low-level DFT to high-level G4 accuracy using CBH-2 and ΔML models for the prediction of vertical ionization potentials. ³¹ Earlier, a fragment-based graph neural network (GNN) model, labeled FragGraph, was used successfully for the prediction of atomization energies when coupled with ΔML corrections, using molecular graphs derived from CBH fragmentation as input descriptors. ³² These studies demonstrate the effectiveness of CBH and the corresponding ML techniques for the prediction of highly accurate chemical properties.

In the present study, we have employed analogous CBH error correction schemes and GNN models in conjunction with ΔML to calculate adiabatic ionization potentials (IPs) with different functionals. We have used the automated CBH method to identify oxidation sites using an electron population difference map constructed from atomic electron populations of neutral and cationic species. For the graph-based ΔML method, we imposed CBH-like features to predict the adiabatic IPs. The fully developed ΔML model, embedded with QM-based electronic features, reaches the accuracy of the high-level reference method.

Methods

CBH framework

The CBH framework systematically breaks a molecule into smaller units defined by a particular rung in the hierarchy based on its connectivity patterns, significantly simplifying complex quantum chemical calculations. The CBH correction, defined as ΔCBH, was calculated by taking the difference between the energies of the fragments computed at a high level of theory (G4(MP2)) and a low level of theory (DFT). The CBH energy is then obtained by adding this correction to the low-level DFT energy of the full molecule, using the following equations.

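(2) $E_{\mathrm{high}}(\mathrm{full}) - E_{\mathrm{low}}(\mathrm{full}) \approx \sum_i E_{\mathrm{high}}(i) - \sum_i E_{\mathrm{low}}(i) = \Delta\mathrm{CBH}_{\mathrm{correction}}$

(3) $E_{\mathrm{high}}(\mathrm{full}) \approx E_{\mathrm{low}}(\mathrm{full}) + \Delta\mathrm{CBH}_{\mathrm{correction}} = E^{\mathrm{CBH}}$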

where E_high(full) and E_low(full) are the energies of the full molecule calculated at the high and low levels of theory, respectively. E_high(i) and E_low(i) are the energies of the ith fragment calculated at the high and low levels of theory, respectively. ΔCBH_correction is the total CBH energy correction from the low to the high level. Figure 3 shows how ΔCBH_correction and E^CBH are calculated for the molecule 2,5-Dimethylhexa-1,3,5-triene. Overall, CBH is a chemically intuitive approach to obtain highly accurate thermochemical energies, using structure-based information to derive local corrections to the electronic environment of a molecule.
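The arithmetic of Eqs. (2) and (3) can be sketched in a few lines of Python. The function names and the optional `coeffs` argument are assumptions introduced here. The coefficients stand in for the signed stoichiometric weights with which fragments enter a CBH reaction scheme and default to 1 so that the function reduces to the plain sums written above.

```python
from typing import Optional, Sequence


def cbh_correction(
    e_high_frags: Sequence[float],
    e_low_frags: Sequence[float],
    coeffs: Optional[Sequence[float]] = None,
) -> float:
    """Sum of (high - low) fragment energies, Eq. (2), with optional signed weights."""
    if len(e_high_frags) != len(e_low_frags):
        raise ValueError("need one high-level and one low-level energy per fragment")
    if coeffs is None:
        coeffs = [1.0] * len(e_high_frags)
    if len(coeffs) != len(e_high_frags):
        raise ValueError("need one coefficient per fragment")
    return sum(c * (h - l) for c, h, l in zip(coeffs, e_high_frags, e_low_frags))


def cbh_energy(
    e_low_full: float,
    e_high_frags: Sequence[float],
    e_low_frags: Sequence[float],
    coeffs: Optional[Sequence[float]] = None,
) -> float:
    """CBH-corrected energy of the full molecule, Eq. (3)."""
    return e_low_full + cbh_correction(e_high_frags, e_low_frags, coeffs)


# Illustrative energies in arbitrary units, not taken from the paper.
correction = cbh_correction([-1.0, -2.0], [-0.9, -1.8])
corrected = cbh_energy(-5.0, [-1.0, -2.0], [-0.9, -1.8])
```

Only the fragments need a high-level calculation; the full molecule is evaluated at the low level alone.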

Fig. 3:

Pictorial representation for the calculation of CBH corrected energy (E ^CBH) of 2,5-Dimethylhexa-1,3,5-triene.

The CBH-2 protocol used here focuses on one heavy atom and its immediate bonding environment and thus provides accurate corrections at modest computational cost. In the CBH calculations, each fragment is capped appropriately with hydrogen atoms to preserve the original hybridization. The CBH-2 protocol for an ionization process is shown in Fig. 4.

Fig. 4:

Illustrative example of CBH fragmentation method: (a) full CBH-2 fragmentation scheme of ionization process for 2,7-Dimethylocta-4,7-dien-1-ol and (b) CBH-2 based CBH_correction equation after elimination of common fragments.

The neutral and cation structures differ only by a single electron in the molecular process. Both systems have the same fragments except for the heavy-atom fragment at the ionization site. Because of this similarity, all the common fragments cancel between reactants and products, and only a few fragments need to be calculated at the high level, providing a computational advantage. The CBH correction for ionization, after elimination of common fragments, is illustrated in Fig. 4b. To identify the sites of oxidation, an automated method has been used in this work. ¹² The site of electron loss is identified by taking the NPA (natural population analysis) differences between the neutral and ionized species. The electron loss localized on individual atoms is shown through a population difference map in Fig. 5. The atom with the highest electron loss is considered the site of ionization and becomes the CBH-2 fragment center. If several atoms have identical population changes, as in Fig. 5, the atom listed earliest in the xyz file is taken as the site of ionization. ³¹
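The site-selection rule described above can be sketched as follows. The sketch assumes that per-atom natural electron populations for the neutral and the cation are available as two lists in xyz-file order. The function name, the optional hydrogen filter, and the tie tolerance `tol` are illustrative assumptions and not part of the published protocol.

```python
from typing import Optional, Sequence


def ionization_site(
    pop_neutral: Sequence[float],
    pop_cation: Sequence[float],
    symbols: Optional[Sequence[str]] = None,
    tol: float = 1e-6,
) -> int:
    """Return the 0-based index of the atom that loses the most electron population.

    Ties (within tol) are broken in favor of the atom listed earliest.
    If element symbols are given, hydrogens are skipped so that the site
    is always a heavy atom.
    """
    if len(pop_neutral) != len(pop_cation):
        raise ValueError("population lists must have the same length")
    candidates = [
        i for i in range(len(pop_neutral))
        if symbols is None or symbols[i] != "H"
    ]
    if not candidates:
        raise ValueError("no candidate atoms")
    # Electron loss on ionization: neutral population minus cation population.
    loss = {i: pop_neutral[i] - pop_cation[i] for i in candidates}
    largest = max(loss.values())
    # candidates is in file order, so the first hit is the earliest-listed atom.
    return next(i for i in candidates if loss[i] >= largest - tol)


# Illustrative populations, not taken from the paper.
site = ionization_site([6.2, 6.5, 8.6], [6.1, 6.0, 8.5])
tied = ionization_site([6.5, 6.5], [6.2, 6.2])
```

The returned index would then serve as the center of the CBH-2 fragment that differs between the neutral and cationic reaction schemes.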

Fig. 5:

Carbons with similar NPA (natural population analysis) values are shown in matching colors, and the numbers in the circles denote their NPA changes upon ionization at the B3LYP-D3BJ level for 1,4-dimethylcyclohexane.

Data set and electronic structure calculations

We used the QM7b dataset, which has been used extensively by computational chemists. ³³ The QM7b dataset consists of 7211 molecules containing C, N, O, S and Cl atoms, with a maximum of 7 heavy atoms. This is sufficient to demonstrate the computational advantage of CBH-2 corrections. We evaluated four levels of prediction: DFT, CBH-corrected DFT, ΔML, and ΔML+, the last two using the GNN model. All electronic structure calculations were performed using the Gaussian 16 suite of programs. ³⁴ The G4(MP2) method was chosen as the reference theory for the calculation of IPs, since it is typically accurate to ∼1 kcal/mol compared to experimental values. ³⁵ The B3LYP-D3BJ functional with the 6-31G(2df,p) basis set was used as the low-level method to test the performance of CBH and ΔML. ³¹, ³⁶, ³⁷, ³⁸ In addition to B3LYP, a hybrid meta-GGA functional (M06-2X) with the 6-31G(2df,p) basis set was used to check the dependence on the density functional. ³⁹

Both neutral and ionized full molecules and their fragments were optimized at the B3LYP/6-31G(2df,p) level of theory, with the frequencies of the full molecules scaled by 0.9854. Notably, the same level of theory is used for geometry optimization in the successful G4 method. ⁴⁰, ⁴¹ All structures were verified to be local minima. All optimizations of full molecules and fragments were performed at the low and high levels of theory for both ionized and neutral species.
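Because both species are optimized separately, the adiabatic IP is the energy difference between the relaxed cation and the relaxed neutral, whereas the vertical IP evaluates the cation at the neutral geometry. A minimal Python sketch of the two definitions follows. The function names are hypothetical, energies are assumed to be in hartree, and the inclusion of a scaled zero-point energy (ZPE) difference in the adiabatic value is an assumption of this sketch that reuses the 0.9854 scale factor quoted above.

```python
HARTREE_TO_EV = 27.211386245988  # CODATA 2018 conversion factor


def vertical_ip(e_neutral: float, e_cation_at_neutral_geom: float) -> float:
    """Vertical IP in eV: cation evaluated at the optimized neutral geometry."""
    return (e_cation_at_neutral_geom - e_neutral) * HARTREE_TO_EV


def adiabatic_ip(
    e_neutral: float,
    e_cation_relaxed: float,
    zpe_neutral: float = 0.0,
    zpe_cation: float = 0.0,
    zpe_scale: float = 0.9854,
) -> float:
    """Adiabatic IP in eV: each species at its own optimized geometry.

    The scaled ZPE difference is optional and defaults to zero.
    """
    electronic = e_cation_relaxed - e_neutral
    zpe = zpe_scale * (zpe_cation - zpe_neutral)
    return (electronic + zpe) * HARTREE_TO_EV


# Illustrative energies in hartree, not taken from the paper.
ip_vert = vertical_ip(-1.00, -0.65)
ip_adia = adiabatic_ip(-1.00, -0.70)
```

Since relaxing the cation can only lower its electronic energy, the adiabatic value without ZPE cannot exceed the vertical one for the same molecule.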

CBH based descriptors and ΔML model architecture

Previously, our group demonstrated a ΔML-based FragGraph model for predicting atomization energies with a remarkable accuracy of 1 kJ/mol compared to target G4(MP2) calculated energies for molecules in the relatively large QM9 data set of ∼130,000 systems. ³² In the FragGraph model, heavy atoms are represented as nodes in a computational graph, and covalent bonds are represented as edges, although fully connected (FC) graphs are also possible. In that study, each atom’s chemical environment was numerically encoded as node-wise descriptors. In addition, CBH-2 fragments were converted to vector embeddings using a pretrained mol2vec model and assigned to the molecular graph nodes. ³¹ Since we have fewer training data points here, we adopted the simplified GNN model from our earlier work on vertical IPs, which is built from three edge-conditioned convolutional (ECC) layers (a convolution whose weights adapt to the attributes of the links between atoms). ³¹ To enable a direct and consistent comparison between the vertical and adiabatic IP approaches, we kept the ΔML model identical for both cases. In this study, we used traditional CBH-2 fragment-based atomic features to build an ECFP-like fingerprint. These features encode a range of structural properties, including the atomic number, the total number of attached atoms, the number of heavy atoms, and the count of hydrogen atoms. The bonding connectivity within CBH fragments was captured via the convolutional steps of the GNN, utilizing both node and edge information. Atoms embedded within rings or aromatic environments were further represented by one-hot encodings (each category represented by a separate on/off flag). Finally, to improve the accuracy of the model, we incorporated electronic descriptors derived from NPA charge differences, allowing the model to better capture how the charge on individual atoms changes when the cationic species is generated from the neutral by removal of an electron.
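A per-atom feature vector of the kind described above might be assembled as in the following sketch. The record layout, the field names, the ordering of the features, and the two-flag one-hot encoding are all assumptions made for illustration, and "number of heavy atoms" is read here as the number of attached heavy atoms. The actual descriptor layout used in the model is not specified in this section.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AtomRecord:
    """Hypothetical per-atom record for one heavy atom (one graph node)."""
    atomic_number: int
    n_heavy_neighbors: int
    n_hydrogens: int
    in_ring: bool
    is_aromatic: bool
    npa_charge_diff: float = 0.0  # q(cation) - q(neutral) on this atom


def one_hot_flag(flag: bool) -> List[float]:
    """Two on/off slots: [is true, is false]."""
    return [1.0, 0.0] if flag else [0.0, 1.0]


def node_features(atom: AtomRecord, include_electronic: bool = False) -> List[float]:
    """Structural features, optionally extended by the NPA charge difference."""
    feats = [
        float(atom.atomic_number),
        float(atom.n_heavy_neighbors + atom.n_hydrogens),  # all attached atoms
        float(atom.n_heavy_neighbors),
        float(atom.n_hydrogens),
    ]
    feats += one_hot_flag(atom.in_ring)
    feats += one_hot_flag(atom.is_aromatic)
    if include_electronic:
        feats.append(atom.npa_charge_diff)
    return feats


# An aromatic CH carbon with an illustrative charge difference.
carbon = AtomRecord(6, 2, 1, True, True, 0.12)
structural = node_features(carbon)
with_electronic = node_features(carbon, include_electronic=True)
```

Switching `include_electronic` on or off mirrors the distinction between a structure-only model and one that also sees electron population changes.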

The Python Spektral library was used to build the graph network employed for the present study. Spektral is a Python library for graph deep learning, based on the Keras API and TensorFlow 2. ²⁰, ⁴², ⁴³, ⁴⁴, ⁴⁵, ⁴⁶, ⁴⁷ The ML architecture consisted of three edge-conditioned convolutional (ECC) layers, each incorporating batch normalization, followed by a global attention pooling layer and a dense neural network layer. Each ECC layer updated the feature vector of a given node i using information from its neighborhood according to the following equation,

(4) $x_i' = x_i W_{\mathrm{root}} + \sum_{j \in \mathcal{N}(i)} x_j \, \mathrm{MLP}(e_{j \to i}) + b$

where x_i contains the attributes of node i, W_root is a weight matrix, x_j contains the attributes of node j, which belongs to the neighborhood of node i, MLP is a multi-layer perceptron that outputs an edge-specific weight as a function of the attributes of the edge connecting nodes i and j, and b is the bias term. Covalent bonds are considered as edges in the model. Each ECC layer used a ReLU activation function (more details in Text S1, Supporting Information). The dataset was partitioned into training, validation, and test sets with a 70:10:20 ratio. Predictive performance was quantitatively assessed using the standard metrics MAE and RMSE. These metrics provide a rigorous evaluation of the accuracy and reliability of the ΔML-corrected predictions compared to the reference G4(MP2) results.
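The node update of Eq. (4), followed by a ReLU activation, can be written out in plain Python to make the data flow explicit. This is a didactic sketch and not the Spektral implementation: batch normalization is omitted, the edge network is passed in as an arbitrary callable (`edge_net`, a hypothetical name) that maps edge attributes to a weight matrix, and edges are given as directed (source, target, attributes) triples.

```python
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]
Edge = Tuple[int, int, Sequence[float]]  # (source j, target i, edge attributes)


def vec_mat(v: Sequence[float], m: Matrix) -> Vector:
    """Row vector (length F_in) times matrix (F_in x F_out)."""
    return [sum(v[k] * m[k][o] for k in range(len(v))) for o in range(len(m[0]))]


def ecc_update(
    x: Sequence[Sequence[float]],
    edges: Sequence[Edge],
    w_root: Matrix,
    edge_net: Callable[[Sequence[float]], Matrix],
    bias: Sequence[float],
) -> List[Vector]:
    """One edge-conditioned convolution step with ReLU, following Eq. (4)."""
    # Root term plus bias for every node: x_i W_root + b
    out = [[a + b for a, b in zip(vec_mat(xi, w_root), bias)] for xi in x]
    # Neighbor messages: x_j times the edge-specific matrix MLP(e_{j->i})
    for j, i, e_attr in edges:
        msg = vec_mat(x[j], edge_net(e_attr))
        out[i] = [a + m for a, m in zip(out[i], msg)]
    # ReLU activation
    return [[max(0.0, v) for v in row] for row in out]


# Two bonded nodes with one feature each; the edge attribute is used directly
# as a 1x1 weight matrix (a stand-in for a trained MLP).
x_nodes = [[1.0], [2.0]]
bond = [(0, 1, [0.5]), (1, 0, [0.5])]
updated = ecc_update(x_nodes, bond, [[1.0]], lambda e: [[e[0]]], [0.0])
```

Each undirected bond appears twice in `bond`, once per direction, so that both atoms receive a message from the other.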

Results and discussion

CBH-2 corrections for adiabatic IPs

The fragments were formed from the atom-centric second rung of CBH, which significantly corrects the low-level DFT errors at low computational cost. The accuracy of the CBH-2 methodology was evaluated for adiabatic IPs, analogous to the previously studied vertical IPs. Before the assessment, however, we pruned the dataset to eliminate systems where there is an obvious mismatch in the CBH fragments between the molecule and the cation. While optimizing both the neutral molecules and the cations, we found that 322 species have significantly different structures in the parent molecule and cation, and 37 species are too small for CBH-2 analysis. Additionally, some molecules with complex Kekulé structures and problematic functional groups such as nitro groups, sulfoxides, etc. cannot easily be used with the CBH protocol, as they contain delocalized bonds that result in fragments unrepresentative of the local bonding environment in the parent molecule. We therefore started with the remaining 5552 structurally stable pairs of neutral and cationic molecules for the adiabatic IP calculations. Analysis of this comprehensive set of molecules revealed a trend in the error distribution for neutral and cationic energies similar to that found in the vertical IP computations. Using CBH-2 corrections, the MAE of low-level DFT calculations performed at the B3LYP-D3BJ/6-31G(2df,p) level was reduced from 0.59 eV to 0.22 eV, as shown in Fig. 6, achieving an accuracy comparable to that of vertical IPs (from DFT to CBH-2, the MAE was reduced from 0.61 to 0.16 eV). To assess functional dependence, we also performed calculations at the M06-2X/6-31G(2df,p) level of theory. While uncorrected M06-2X is more accurate than B3LYP, its MAE was still reduced from 0.30 eV to 0.14 eV by the CBH-2 correction. Notably, the corrected MAEs remain within a similar range when switching from B3LYP to M06-2X, indicating that significant accuracy improvements can be achieved using low-level DFT.

Fig. 6:

Functional dependence of the adiabatic IP calculations, from uncorrected DFT to CBH-2 correction, for B3LYP-D3BJ/6-31G(2df,p) and M06-2X/6-31G(2df,p).

ΔML corrections for adiabatic IP predictions

By correcting systematic errors in local chemical units, CBH-2 significantly cuts down the low-level DFT error, though the result remains outside the target chemical accuracy of 1 kcal/mol (0.043 eV). To achieve chemical accuracy, we embedded structural and electronic features obtained from DFT calculations in the ML model. By encoding node and edge features, graph networks naturally leverage connectivity information, much like CBH. However, our ML protocol includes an electronic descriptor from population analysis on the full molecule and uses bond distances rather than bond orders. Thus, our ML protocol does not have the limitations resulting from the multiple resonance structures of CBH fragments. ΔML corrections significantly improve the prediction accuracy of adiabatic IPs beyond the CBH-2 results. Two variants of the ML model were assessed: ΔML, using structural features alone, and ΔML+, incorporating electronic descriptors from Natural Population Analysis (NPA). Figure 7 shows the scatter plot of the reference G4(MP2) IPs against the ΔML+ predicted IPs for the test set of ∼1400 molecules. The MAE, RMSE, and R² values quantify the accuracy.

Fig. 7:

ΔML+ predicted B3LYP-D3BJ/6-31G(2df,p) IPs (in blue color) plotted against G4(MP2) IPs (in red color).

Overall, the ΔML model exhibited dramatic improvements, reducing the MAE from 0.59 to 0.06 eV as shown in Fig. 8. The inclusion of electronic descriptors (ΔML+) further reduced the MAE substantially to approximately 0.02 eV, underscoring the importance of electron population changes in accurately modeling ionization. As the sophistication of the method increases from uncorrected DFT through CBH-2 and ΔML to ΔML+, the MAEs decrease sequentially, with values of 0.59, 0.22, 0.06, and 0.02 eV, respectively. The RMSE follows the same trend as the MAE. Moreover, we performed an error distribution analysis for the ΔML+ model, which shows that the errors (differences between G4(MP2) and ML-predicted IPs) fall within one standard deviation for most of the test molecules (∼75 %), with a standard deviation of 0.02 eV (Figure S2, Supporting Information). This clearly demonstrates that the ΔML+ model achieves high accuracy and can be treated as a robust model for the prediction of IPs for unknown molecules.
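The statistics quoted above can be computed with a few lines of standard-library Python. The sketch below assumes two equal-length lists of reference and predicted IPs in eV. The function names are illustrative, the data are made up for the example, and the "within one standard deviation" fraction is taken relative to the mean error using the population standard deviation, which is one of several reasonable conventions.

```python
import math
import statistics
from typing import Sequence

KCAL_PER_MOL_IN_EV = 0.0433641  # 1 kcal/mol expressed in eV


def mae(ref: Sequence[float], pred: Sequence[float]) -> float:
    """Mean absolute error."""
    return sum(abs(r - p) for r, p in zip(ref, pred)) / len(ref)


def rmse(ref: Sequence[float], pred: Sequence[float]) -> float:
    """Root-mean-square error."""
    return math.sqrt(sum((r - p) ** 2 for r, p in zip(ref, pred)) / len(ref))


def fraction_within_one_sigma(ref: Sequence[float], pred: Sequence[float]) -> float:
    """Fraction of errors lying within one standard deviation of the mean error."""
    errs = [r - p for r, p in zip(ref, pred)]
    mu = statistics.fmean(errs)
    sigma = statistics.pstdev(errs)
    return sum(abs(e - mu) <= sigma for e in errs) / len(errs)


def is_chemically_accurate(error_ev: float) -> bool:
    """True if an error in eV is at or below 1 kcal/mol."""
    return error_ev <= KCAL_PER_MOL_IN_EV


# Made-up reference and predicted IPs (eV) for illustration only.
reference = [1.0, 2.0, 3.0]
predicted = [1.1, 1.9, 3.0]
example_mae = mae(reference, predicted)
example_rmse = rmse(reference, predicted)
```

With the conversion factor above, an MAE of 0.02 eV is below the 1 kcal/mol threshold, while 0.06 eV is not.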

Fig. 8:

MAEs for uncorrected DFT, CBH-2, ΔML and ΔML+ at both levels of theory, B3LYP (orange) and M06-2X (blue).

The trends at the M06-2X/6-31G(2df,p) level of theory remain similar to those from B3LYP: the MAEs for uncorrected DFT, CBH-2, ΔML, and ΔML+ decrease sequentially, with values of 0.30, 0.17, 0.06, and 0.025 eV, respectively (Fig. 8). Evaluation of model robustness across functional and basis set variations affirmed minimal sensitivity of ΔML+ performance to these parameters. The B3LYP-D3BJ and M06-2X corrected adiabatic IPs displayed comparable MAEs (∼0.02–0.03 eV) in the ΔML+ model, even though their raw DFT errors differ significantly (by ∼0.3 eV). Despite the presence of many more empirically optimized parameters in M06-2X, IPs calculated using the less empirical B3LYP-D3BJ functional serve as efficient input data for the ΔML+ model, ultimately yielding a slightly better-fitted model for predicting IPs of simple organic molecules.

Adiabatic IP vs. vertical IP

Here, we compare our adiabatic IPs with the previously calculated vertical IP values, as shown in Fig. 9. ³¹ Figure 9 shows that the MAEs do not change significantly on going from the vertical to the adiabatic approach. The MAE values remain comparable for every method from uncorrected DFT to the ΔML+ model. In the case of CBH-2 corrections, a slightly higher MAE of 0.22 eV was observed for the adiabatic approach compared to 0.17 eV for the vertical approach. This increase is likely due to the lack of coarse-grained CBH modeling for aromatic systems in the adiabatic framework. Overall, these findings suggest that DFT is a sufficient starting point for training high-accuracy ΔML models (with MAEs well below 1 kcal/mol) for predicting the vertical and adiabatic IPs of small organic molecules.

Fig. 9:

MAEs of all the models for both the vertical (magenta) and adiabatic (cyan) approaches.

Conclusions

By combining established quantum chemical strategies, such as molecular fragmentation and systematic error cancellation, with modern machine learning methods, we have demonstrated that chemical accuracy is achievable for many molecular properties. Although DFT frequently struggles with quantitative precision, its errors typically exhibit systematic behavior. Such predictable inaccuracies provide a valuable foundation for developing effective correction methodologies aimed at achieving high-level computational accuracy. In this work, we demonstrated an automated approach utilizing electron population difference maps to accurately identify ionization sites within relaxed molecular structures, greatly simplifying and streamlining the CBH correction protocol for calculating IPs. Furthermore, integrating electronic descriptors derived from DFT, specifically electron population difference features calculated from optimized neutral and cationic species, significantly enhances our ΔML model’s predictive accuracy, giving an MAE of 0.02 eV, which is well within the conventional chemical accuracy threshold (1 kcal/mol) and closely approaches benchmark-level methods. While raw DFT results for IPs remain notably sensitive to the choice of functional, our optimized ΔML approach consistently delivered robust predictions with substantially reduced dependence on the functional. We also compared vertical and adiabatic IPs across this series of techniques and found that chemical accuracy is reached in both cases, which completes our assessment of the model dependence. Overall, the employment of ΔML as a computational tool provides an exciting pathway to understand the correlations between molecular fragments, electronic descriptors, and DFT errors, offering promising advancements in computational chemistry.

Corresponding author: Krishnan Raghavachari, Department of Chemistry, Indiana University, Bloomington, IN 47405, USA, e-mail: kraghava@iu.edu

Article note: A collection of invited papers to celebrate the UN’s proclamation of 2025 as the International Year of Quantum Science and Technology.

Funding source: U. S. National Science Foundation

Award Identifier / Grant number: CHE-2102583

Acknowledgments

The Big Red 3 supercomputing facility at Indiana University was used for the calculations in this study.

Research ethics: Not applicable.
Informed consent: Not applicable.
Author contributions: All authors have accepted responsibility for the entire content of this manuscript and approved its submission.
Use of Large Language Models, AI and Machine Learning Tools: None declared.
Conflict of interest: The authors state no conflict of interest.
Research funding: This project was supported by the National Science Foundation (NSF) grant CHE-2102583 at Indiana University.
Data availability: Not applicable.

References

1. Grimme, S.; Schreiner, P. R. Computational Chemistry: The Fate of Current Methods and Future Challenges. Angew. Chem., Int. Ed. 2018, 57, 4170–4176; https://doi.org/10.1002/anie.201709943.Search in Google Scholar PubMed

2. Mardirossian, N.; Head-Gordon, M. Thirty Years of Density Functional Theory in Computational Chemistry: An Overview and Extensive Assessment of 200 Density Functionals. Mol. Phys. 2017, 115, 2315–2372; https://doi.org/10.1080/00268976.2017.1333644.Search in Google Scholar

3. Boese, A. D.; Oren, M.; Atasoylu, O.; Martin, J. M. L.; Kállay, M.; Gauss, J. W3 Theory: Robust Computational Thermochemistry in the kJ mol-1 Accuracy Range. J. Chem. Phys. 2004, 120, 4129–4141; https://doi.org/10.1063/1.1638736.Search in Google Scholar PubMed

4. Christiansen, O. Coupled Cluster Theory with Emphasis on Selected New Developments. Theor. Chem. Acc. 2006, 116, 106–123; https://doi.org/10.1007/s00214-005-0037-5.Search in Google Scholar

5. Raghavachari, K.; Trucks, G. W.; Pople, J. A.; Head-Gordon, M. A Fifth-Order Perturbation Comparison of Electron Correlation Theories. Chem. Phys. Lett. 1989, 157, 479–483; https://doi.org/10.1016/S0009-2614(89)87395-6.Search in Google Scholar

6. Hehre, W. J.; Ditchfield, R.; Radom, L.; Pople, J. A. Molecular Orbital Theory of Electronic Structure of Organic Compounds. 5. Molecular Theory of Bond Separation. J. Am. Chem. Soc. 1970, 92, 4796–4801; https://doi.org/10.1021/ja00719a006.Search in Google Scholar

7. George, P.; Trachtman, M.; Bock, C. W.; Brett, A. M. An Alternative Approach to the Problem of Assessing Stabilization Energies in Cyclic Conjugated Hydrocarbons. Theor. Chim. Acta 1975, 38, 121–129; https://doi.org/10.1007/BF00581469.Search in Google Scholar

8. Pieniazek, S. N.; Clemente, F. R.; Houk, K. N. Sources of Error in DFT Computations of C–C Bond Formation Thermochemistries: Pi → Sigma Transformations and Error Cancellation by DFT Methods. Angew. Chem., Int. Ed. 2008, 47, 7746–7749; https://doi.org/10.1002/anie.200801843.Search in Google Scholar PubMed

9. Wheeler, S. E.; Houk, K. N.; Schleyer, P. V. R.; Allen, W. D. A Hierarchy of Homodesmotic Reactions for Thermochemistry. J. Am. Chem. Soc. 2009, 131, 2547–2560; https://doi.org/10.1021/ja805843n.Search in Google Scholar PubMed PubMed Central

10. Ramabhadran, R. O.; Raghavachari, K. Theoretical Thermochemistry for Organic Molecules: Development of the Generalized Connectivity-Based Hierarchy. J. Chem. Theory Comput. 2011, 7, 2094–103; https://doi.org/10.1021/ct200279q.Search in Google Scholar PubMed

11. Debnath, S.; Sengupta, A.; Raghavachari, K. Eliminating Systematic Errors in DFT via Connectivity-Based Hierarchy: Accurate Bond Dissociation Energies of Biodiesel Methyl Esters. J. Phys. Chem. A 2019, 123, 3543–3550; https://doi.org/10.1021/acs.jpca.9b01478.Search in Google Scholar PubMed

12. Maier, S.; Thapa, B.; Raghavachari, K. G4 Accuracy at DFT Cost: Unlocking Accurate Redox Potentials for Organic Molecules using Systematic Error Cancellation. Phys. Chem. Chem. Phys. 2020, 22, 4439–4452; https://doi.org/10.1039/C9CP06622E.Search in Google Scholar PubMed

13. Sengupta, A.; Raghavachari, K. Prediction of Accurate Thermochemistry of Medium and Large Sized Radicals using Connectivity-Based Hierarchy (CBH). J. Chem. Theory Comput. 2014, 10, 4342–4350; https://doi.org/10.1021/ct500484f.Search in Google Scholar PubMed

14. Thapa, B.; Raghavachari, K. Accurate pKa Evaluations for Complex Bio-Organic Molecules in Aqueous Media. J. Chem. Theory Comput. 2019, 15, 6025–6035; https://doi.org/10.1021/acs.jctc.9b00606.Search in Google Scholar PubMed

15. Collins, E. M.; Sengupta, A.; AbuSalim, D. I.; Raghavachari, K. Accurate Thermochemistry for Organic Cations via Error Cancellation using Connectivity-based Hierarchy. J. Phys. Chem. A 2018, 122, 1807–1812; https://doi.org/10.1021/acs.jpca.7b12202.Search in Google Scholar PubMed

16. Sanchez, A. J.; Maier, S.; Raghavachari, K. Leveraging DFT and Molecular Fragmentation for Chemically Accurate pKa Prediction using Machine Learning. J. Chem. Inf. Model. 2024, 64, 712–723; https://doi.org/10.1021/acs.jcim.3c01923.Search in Google Scholar PubMed

17. Pilania, G.; Mannodi-Kanakkithodi, A.; Uberuaga, B. P.; Ramprasad, R.; Gubernatis, J. E.; Lookman, T. Machine Learning Bandgaps of Double Perovskites. Sci. Rep. 2016, 6, 19375; https://doi.org/10.1038/srep19375.Search in Google Scholar PubMed PubMed Central

18. Butler, K. T.; Davies, D. W.; Cartwright, H.; Isayev, O.; Walsh, A. Machine Learning for Molecular and Materials Science. Nature 2018, 559, 547–555; https://doi.org/10.1038/s41586-018-0337-2.Search in Google Scholar PubMed

19. Schutt, K. T.; Sauceda, H. E.; Kindermans, P. J.; Tkatchenko, A.; Muller, K. R. Schnet – a Deep Learning Architecture for Molecules and Materials. J. Chem. Phys. 2018, 148, 241722; https://doi.org/10.1063/1.5019779.Search in Google Scholar PubMed

20. Zubatyuk, R.; Smith, J. S.; Leszczynski, J.; Isayev, O. Accurate and Transferable Multitask Prediction of Chemical Properties with an Atoms-in-Molecules Neural Network. Sci. Adv. 2019, 5, eaav6490; https://doi.org/10.1126/sciadv.aav6490.Search in Google Scholar PubMed PubMed Central

21. von Lilienfeld, O. A.; Burke, K. Retrospective on a Decade of Machine Learning for Chemical Discovery. Nat. Commun. 2020, 11, 4895; https://doi.org/10.1038/s41467-020-18556-9.Search in Google Scholar PubMed PubMed Central

22. Gupta, A. K.; Raghavachari, K. Three-Dimensional Convolutional Neural Networks Utilizing Molecular Topological Features for Accurate Atomization Energy Predictions. J. Chem. Theory Comput. 2022, 18, 2132–2143; https://doi.org/10.1021/acs.jctc.1c00504.Search in Google Scholar PubMed

23. King, D. S.; Truhlar, D. G.; Gagliardi, L. Machine-Learned Energy Functionals for Multiconfigurational Wave Functions. J. Phys. Chem. Lett. 2021, 12, 7761–7767; https://doi.org/10.1021/acs.jpclett.1c02042.Search in Google Scholar PubMed

24. Qiao, Z. R.; Welborn, M.; Anandkumar, A.; Manby, F. R.; Miller, T. F. Orbnet: Deep Learning for Quantum Chemistry using Symmetry-Adapted Atomic-Orbital Features. J. Chem. Phys. 2020, 153, 124111; https://doi.org/10.1063/5.0021955.

25. Ramakrishnan, R.; Dral, P. O.; Rupp, M.; von Lilienfeld, O. A. Big Data Meets Quantum Chemistry Approximations: The Delta-Machine Learning Approach. J. Chem. Theory Comput. 2015, 11, 2087–2096; https://doi.org/10.1021/acs.jctc.5b00099.

26. Ruth, M.; Gerbig, D.; Schreiner, P. R. Machine Learning of Coupled Cluster (T) Energy Corrections via Δ-Learning. J. Chem. Theory Comput. 2022, 18, 4846–4855; https://doi.org/10.1021/acs.jctc.2c00501.

27. Sun, G.; Sautet, P. Toward Fast and Reliable Potential Energy Surfaces for Metallic Pt Clusters by Hierarchical Δ Neural Networks. J. Chem. Theory Comput. 2019, 15, 5614–5627; https://doi.org/10.1021/acs.jctc.9b00465.

28. Zaspel, P.; Huang, B.; Harbrecht, H.; von Lilienfeld, O. A. Boosting Quantum Machine Learning Models with a Multilevel Combination Technique: Pople Diagrams Revisited. J. Chem. Theory Comput. 2019, 15, 1546–1559; https://doi.org/10.1021/acs.jctc.8b00832.

29. Friesner, R. A. Ab Initio Quantum Chemistry: Methodology and Applications. Proc. Natl. Acad. Sci. U. S. A. 2005, 102, 6648–6653; https://doi.org/10.1073/pnas.0408036102.

30. Mata, R. A.; Suhm, M. A. Benchmarking Quantum Chemical Methods: Are we Heading in the Right Direction? Angew. Chem., Int. Ed. 2017, 56, 11011–11018; https://doi.org/10.1002/anie.201611308.

31. Maier, S.; Collins, E. M.; Raghavachari, K. Quantitative Prediction of Vertical Ionization Potentials from DFT via a Graph-Network-Based Delta Machine Learning Model Incorporating Electronic Descriptors. J. Phys. Chem. A 2023, 127, 3472–3483; https://doi.org/10.1021/acs.jpca.2c08821.

32. Collins, E. M.; Raghavachari, K. A Fragmentation-Based Graph Embedding Framework for QM/ML. J. Phys. Chem. A 2021, 125, 6872–6880; https://doi.org/10.1021/acs.jpca.1c06152.

33. Blum, L. C.; Reymond, J. L. 970 Million Druglike Small Molecules for Virtual Screening in the Chemical Universe Database GDB-13. J. Am. Chem. Soc. 2009, 131, 8732–8733; https://doi.org/10.1021/ja902302h.

34. Frisch, M. J.; Trucks, G. W.; Schlegel, H. B.; Scuseria, G. E.; Robb, M. A.; Cheeseman, J. R.; Scalmani, G.; Barone, V.; Petersson, G. A.; Nakatsuji, H.; Li, X.; Caricato, M.; Marenich, A. V.; Bloino, J.; Janesko, B. G.; Gomperts, R.; Mennucci, B.; Hratchian, H. P.; Ortiz, J. V.; Izmaylov, A. F.; Sonnenberg, J. L.; Williams-Young, D.; Ding, F.; Lipparini, F.; Egidi, F.; Goings, J.; Peng, B.; Petrone, A.; Henderson, T.; Ranasinghe, D.; Zakrzewski, V. G.; Gao, J.; Rega, N.; Zheng, G.; Liang, W.; Hada, M.; Ehara, M.; Toyota, K.; Fukuda, R.; Hasegawa, J.; Ishida, M.; Nakajima, T.; Honda, Y.; Kitao, O.; Nakai, H.; Vreven, T.; Throssell, K.; Montgomery Jr, J. A.; Peralta, J. E.; Ogliaro, F.; Bearpark, M. J.; Heyd, J. J.; Brothers, E. N.; Kudin, K. N.; Staroverov, V. N.; Keith, T. A.; Kobayashi, R.; Normand, J.; Raghavachari, K.; Rendell, A. P.; Burant, J. C.; Iyengar, S. S.; Tomasi, J.; Cossi, M.; Millam, J. M.; Klene, M.; Adamo, C.; Cammi, R.; Ochterski, J. W.; Martin, R. L.; Morokuma, K.; Farkas, O.; Foresman, J. B.; Fox, D. J. Gaussian-16; Gaussian, Inc.: Wallingford, CT, 2016.

35. Curtiss, L. A.; Redfern, P. C.; Raghavachari, K. Gaussian-4 Theory using Reduced-Order Perturbation Theory. J. Chem. Phys. 2007, 127, 124105; https://doi.org/10.1063/1.2770701.

36. Grimme, S.; Antony, J.; Ehrlich, S.; Krieg, H. A Consistent and Accurate ab Initio Parametrization of Density Functional Dispersion Correction (DFT-D) for the 94 Elements H–Pu. J. Chem. Phys. 2010, 132, 154104; https://doi.org/10.1063/1.3382344.

37. Grimme, S.; Ehrlich, S.; Goerigk, L. Effect of the Damping Function in Dispersion-Corrected Density Functional Theory. J. Comput. Chem. 2011, 32, 1456–1465; https://doi.org/10.1002/jcc.21759.

38. Lee, C. T.; Yang, W. T.; Parr, R. G. Development of the Colle-Salvetti Correlation-Energy Formula into a Functional of the Electron Density. Phys. Rev. B 1988, 37, 785–789; https://doi.org/10.1103/PhysRevB.37.785.

39. Wang, Y.; Verma, P.; Jin, X.; Truhlar, D. G.; He, X. Revised M06 Density Functional for Main-Group and Transition-Metal Chemistry. Proc. Natl. Acad. Sci. U. S. A. 2018, 115, 10257–10262; https://doi.org/10.1073/pnas.1810421115.

40. Curtiss, L. A.; Redfern, P. C.; Raghavachari, K. Interdisciplinary Reviews: Computational Molecular Science. Wiley Interdiscip. Rev.: Comput. Mol. Sci. 2011, 1, 810–836; https://doi.org/10.1002/wcms.59.

41. Curtiss, L. A.; Redfern, P. C.; Raghavachari, K. Gaussian-4 Theory. J. Chem. Phys. 2007, 126, 084108; https://doi.org/10.1063/1.2436888.

42. Grattarola, D.; Alippi, C. Graph Neural Networks in TensorFlow and Keras with Spektral. IEEE Comput. Intell. Mag. 2021, 16, 99–106; https://doi.org/10.1109/MCI.2020.3039072.

43. Abadi, M.; Barham, P.; Chen, J.; Chen, Z.; Davis, A.; Dean, J.; Devin, M.; Ghemawat, S.; Irving, G.; Isard, M.; Kudlur, M.; Levenberg, J.; Monga, R.; Moore, S.; Murray, D. G.; Steiner, B.; Tucker, P.; Vasudevan, V.; Warden, P.; Wicke, M.; Yu, Y.; Zheng, X. TensorFlow: A System for Large-Scale Machine Learning. In Proc. 12th USENIX Symp. Oper. Syst. Des. Implement., 2016; pp. 265–283.

44. Chollet, F. K. Tensorflow Software Package, 2020. https://github.com/fchollet/keras (accessed 2020-04-01).

45. Li, G. Q.; Rudshteyn, B.; Shee, J.; Weber, J. L.; Coskun, D.; Bochevarov, A. D.; Friesner, R. A. Accurate Quantum Chemical Calculation of Ionization Potentials: Validation of the DFT-loc Approach via a Large Dataset Obtained from Experiments and Benchmark Quantum Chemical Calculations. J. Chem. Theory Comput. 2020, 16, 5956–596; https://doi.org/10.1021/acs.jctc.9b00875.

46. Montavon, G.; Rupp, M.; Gobre, V.; Vazquez-Mayagoitia, A.; Hansen, K.; Tkatchenko, A.; Müller, K. R.; von Lilienfeld, O. A. Machine Learning of Molecular Electronic Properties in Chemical Compound Space. New J. Phys. 2013, 15, 095003; https://doi.org/10.1088/1367-2630/15/9/095003.

47. Hassan, M.; Brown, R. D.; Varma-O’Brien, S.; Rogers, D. Cheminformatics Analysis and Learning in a Data-Pipelining Environment. Mol. Diversity 2006, 10, 283–299; https://doi.org/10.1007/s11030-006-9041-5.

Supplementary Material

This article contains supplementary material (https://doi.org/10.1515/pac-2025-0574).

Received: 2025-07-24

Accepted: 2025-08-25

Published Online: 2025-09-23

Published in Print: 2025-11-25
