DFT-machine learning approach for accurate prediction of pKa

Title
DFT-machine learning approach for accurate prediction of pKa
Authors
김진영; Robin Lawler; Yao-Hao Liu; Nessa Majaya; Omar Allam; 주현철; 장승순
Issue Date
2021-10
Publisher
The journal of physical chemistry. A, Molecules, spectroscopy, kinetics, environment & general theory
Citation
VOL 125-8722
Abstract
In this study, we propose a novel method for predicting pKa across a diverse set of acids that combines density functional theory (DFT) with machine learning (ML). First, DFT at the B3LYP/6-31++G**/SM8 level is used to predict pKa, yielding a mean absolute error (MAE) of 1.85 pKa units. The DFT-predicted pKa values are then employed as one of 10 molecular descriptors for developing ML models trained on experimental data. Kernel Ridge Regression (KRR), Gaussian Process Regression, and Artificial Neural Network models are optimized using three pipelines: Pipeline 1, involving only hyperparameter optimization (HPO); Pipeline 2, involving HPO followed by relative contribution analysis (RCA) and recursive feature elimination (RFE); and Pipeline 3, involving HPO followed by RCA and RFE on an expanded set of composite features. KRR with Pipeline 3 yields the best pKa prediction, with an MAE of 0.60 pKa units. This model was then used to predict the pKa of 37 novel acids, and the predicted values are documented for future reference. The two most important features were determined to be the number of hydrogen atoms in the molecule and the degree of oxidation of the acid.
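The core idea of the abstract (a DFT-predicted pKa used as one descriptor in kernel ridge regression, tuned by hyperparameter search and scored by MAE) can be illustrated with a minimal sketch. This is not the authors' implementation. The descriptor and pKa values are synthetic toy numbers, the two descriptors stand in for the ten used in the study, the leave-one-out grid search is an assumed stand-in for the HPO step of Pipeline 1, and every function name is hypothetical.

```python
import math
from itertools import product

def rbf_kernel(a, b, gamma):
    """Gaussian (RBF) kernel between two descriptor vectors."""
    sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-gamma * sq_dist)

def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def krr_fit(X, y, alpha, gamma):
    """Dual coefficients c solving (K + alpha * I) c = y."""
    n = len(X)
    K = [[rbf_kernel(X[i], X[j], gamma) + (alpha if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    return solve_linear(K, y)

def krr_predict(X_train, coef, x_new, gamma):
    """Prediction is a kernel-weighted sum over the training points."""
    return sum(c * rbf_kernel(xt, x_new, gamma) for c, xt in zip(coef, X_train))

def mean_absolute_error(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def loo_mae(X, y, alpha, gamma):
    """Leave-one-out MAE for one (alpha, gamma) pair."""
    preds = []
    for i in range(len(X)):
        X_tr, y_tr = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        coef = krr_fit(X_tr, y_tr, alpha, gamma)
        preds.append(krr_predict(X_tr, coef, X[i], gamma))
    return mean_absolute_error(y, preds)

def grid_search(X, y, alphas, gammas):
    """Pick the (alpha, gamma) pair with the lowest leave-one-out MAE."""
    return min(product(alphas, gammas), key=lambda p: loo_mae(X, y, *p))

# SYNTHETIC toy data (assumption, not from the paper).
# Each row: [DFT-predicted pKa, number of hydrogen atoms].
X = [[4.0, 4.0], [6.5, 6.0], [1.2, 2.0], [9.8, 8.0], [3.1, 3.0], [7.4, 5.0]]
# Made-up "experimental" pKa targets.
y = [4.8, 5.0, 2.9, 9.2, 3.8, 6.1]

ALPHAS = [1e-3, 1e-2, 1e-1, 1.0]  # ridge regularization strengths
GAMMAS = [0.01, 0.1, 1.0]         # RBF kernel widths

best_alpha, best_gamma = grid_search(X, y, ALPHAS, GAMMAS)
print("best alpha:", best_alpha, "best gamma:", best_gamma)
print("leave-one-out MAE:", loo_mae(X, y, best_alpha, best_gamma))
```

In practice a library implementation such as scikit-learn's `KernelRidge` would replace the hand-written solver; the pure-Python version here only keeps the sketch free of dependencies.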
URI
http://pubs.kist.re.kr/handle/201004/74247
ISSN
1089-5639
Appears in Collections:
KIST Publication > Article