Advancing chemical safety prediction: an integrated GNN framework with DFT-augmented cyclic compound solution
- Authors
- Lee, Seul; Lee, Jooyeon; Yoon, Unghwi; Koo, Jahyun; Yoon, Young Wook; Cho, Yoonjae; Hwang, Seung-Ryul; Jeong, Keunhong
- Issue Date
- 2026-02
- Publisher
- BMC
- Citation
- Journal of Cheminformatics, v.18, no.1
- Abstract
- The rapid proliferation of chemical substances presents significant challenges in assessing their safety-critical physicochemical properties. This study presents an integrated approach using Graph Neural Networks (GNNs) to predict three crucial properties for chemical safety assessment: Heat of Combustion (HoC), Vapor Pressure (VP), and Flashpoint. Leveraging comprehensive datasets of 4780, 3573, and 14,696 compounds respectively, we developed a unified prediction model that outperforms existing approaches. Our model achieves mean absolute errors of 126 J/mol (R² = 0.993) for HoC, 0.617 log units (R² = 0.898) for VP, and 14.42 °C (R² = 0.839) for Flashpoint, representing notable improvements over conventional methods. Through detailed analysis, we identified and addressed a specific challenge in predicting HoC for cyclic compounds by implementing a hybrid approach combining DFT calculations and Random Forest modeling. This specialized treatment expanded our cyclic compound dataset from 12 to 55 compounds and achieved an R² of 0.918 for these traditionally challenging structures. The model was integrated into a real-time prediction system using Flask, allowing users to input chemical structures through SMILES notation or direct drawing. The system includes features for comparing predictions with experimental data and benchmarking against common industrial chemicals (acetone, n-hexane, and n-decane), enhancing its practical utility in emergency response scenarios. Our approach provides a robust, unified solution for predicting multiple safety-critical properties simultaneously, addressing a crucial need in chemical safety assessment and emergency response planning. Scientific contribution: Overall, this study provides an integrated framework that deploys three GNN-based prediction models within a common architecture and a real-time prediction system. For cyclic compounds, which exhibit systematic prediction challenges under the GNN framework, we incorporate a targeted alternative modeling strategy to improve predictive reliability, thereby enhancing the practical applicability of machine-learning approaches to chemical safety assessment.
- Keywords
- LIQUID; CORRELATED MOLECULAR CALCULATIONS; GAUSSIAN-BASIS SETS; SOLID-PHASE HEATS; VAPOR-PRESSURE; APPROXIMATION; BENCHMARK; Density functional theory (DFT); Real-time prediction system; Data augmentation; Graph neural networks (GNN); Chemical safety prediction
- ISSN
- 1758-2946
- URI
- https://pubs.kist.re.kr/handle/201004/154374
- DOI
- 10.1186/s13321-026-01151-3
- Appears in Collections:
- KIST Article > 2026
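The abstract reports model quality as mean absolute error (MAE) and the coefficient of determination (R²). As a reading aid, the sketch below shows how these two metrics are conventionally computed. It is not the authors' evaluation code. The function names and the sample values are hypothetical and chosen only for illustration.

```python
from typing import Sequence


def mean_absolute_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Average of |experimental - predicted| over all compounds."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all reference values are equal")
    return 1.0 - ss_res / ss_tot


# Hypothetical reference and predicted values (not data from the paper).
experimental = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.0, 2.5, 4.0]

mae = mean_absolute_error(experimental, predicted)
r2 = r_squared(experimental, predicted)
print(f"MAE = {mae:.3f}, R^2 = {r2:.3f}")
```

MAE is expressed in the units of the predicted property (for example, log units for vapor pressure), whereas R² is dimensionless, which is why the abstract quotes both for each property.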