Mol Inform. 2016 Feb;35(2):62-9. doi: 10.1002/minf.201500113. Epub 2015 Nov 24.

Machine Learning Estimation of Atom Condensed Fukui Functions.

Molecular informatics

Qingyou Zhang, Fangfang Zheng, Tanfeng Zhao, Xiaohui Qu, João Aires-de-Sousa

Affiliations

  1. Institute of Environmental and Analytical Sciences, College of Chemistry and Chemical Engineering, Henan University, Kaifeng, 475004, PR China.
  2. Environmental Energy Technology Division, Lawrence Berkeley National Laboratory, Berkeley, USA.
  3. LAQV-REQUIMTE, Departamento de Química, Faculdade de Ciências e Tecnologia, Universidade NOVA de Lisboa, 2829-516 Caparica, Portugal, phone/fax: +351 21 2948300.
  4. LAQV-REQUIMTE, Departamento de Química, Faculdade de Ciências e Tecnologia, Universidade NOVA de Lisboa, 2829-516 Caparica, Portugal, phone/fax: +351 21 2948300.

PMID: 27491791 DOI: 10.1002/minf.201500113

Abstract

To enable the fast estimation of atom condensed Fukui functions, machine learning algorithms were trained with databases of DFT pre-calculated values for ca. 23,000 atoms in organic molecules. The problem was approached both as the ranking of atom types with the Bradley-Terry (BT) model and as the regression of the Fukui function. Random Forests (RF) were trained to predict the condensed Fukui function, to rank atoms in a molecule, and to classify atoms as having a high or low Fukui function. Atomic descriptors were based on counts of atom types in spheres around the kernel atom. The BT coefficients assigned to atom types enabled the identification (93-94% accuracy) of the atom with the higher Fukui function in pairs of atoms in the same molecule with differences ≥0.1. In whole molecules, the atom with the top Fukui function could be recognized in ca. 50% of the cases and, on average, about 3 of the top 4 atoms could be recognized in a shortlist of 4. Regression RF yielded predictions for test sets with R² = 0.68-0.69, improving on the ability of BT coefficients to rank atoms in a molecule. Atom classification (as high/low Fukui function) was obtained with RF with a sensitivity of 55-61% and a specificity of 94-95%.
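The Bradley-Terry ranking idea mentioned above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the BT coefficients were fitted, so the standard minorization-maximization (MM) update is assumed here, and the atom-type labels, the `prior` smoothing term, and the function names are all hypothetical. Each "comparison" stands for a pair of atoms in the same molecule, with the "winner" being the atom type that has the larger Fukui function.

```python
from collections import defaultdict


def fit_bradley_terry(comparisons, n_iter=500, tol=1e-10, prior=0.01):
    """Fit Bradley-Terry strengths from (winner, loser) pairs.

    Uses the standard MM update. `prior` is a small pseudo-win added to every
    item (an assumption of this sketch) so that strengths stay positive.
    """
    wins = defaultdict(float)
    pair_counts = defaultdict(float)
    items = set()
    for winner, loser in comparisons:
        if winner == loser:
            continue  # a type compared with itself carries no ranking information
        wins[winner] += 1.0
        pair_counts[frozenset((winner, loser))] += 1.0
        items.update((winner, loser))
    if not items:
        return {}

    strength = {item: 1.0 for item in items}
    for _ in range(n_iter):
        updated = {}
        for i in items:
            denom = 0.0
            for j in items:
                if i == j:
                    continue
                n_ij = pair_counts.get(frozenset((i, j)), 0.0)
                if n_ij > 0:
                    denom += n_ij / (strength[i] + strength[j])
            updated[i] = (wins[i] + prior) / denom
        # Strengths are only defined up to a scale factor; fix the mean at 1.
        scale = len(items) / sum(updated.values())
        updated = {i: s * scale for i, s in updated.items()}
        change = max(abs(updated[i] - strength[i]) for i in items)
        strength = updated
        if change < tol:
            break
    return strength


def win_probability(strength, i, j):
    """Model probability that type i has the higher Fukui function than type j."""
    return strength[i] / (strength[i] + strength[j])


# Hypothetical pairwise outcomes between three made-up atom types.
example_pairs = (
    [("O.carbonyl", "C.aromatic")] * 8 + [("C.aromatic", "O.carbonyl")] * 2
    + [("C.aromatic", "H.alkyl")] * 7 + [("H.alkyl", "C.aromatic")] * 3
    + [("O.carbonyl", "H.alkyl")] * 9 + [("H.alkyl", "O.carbonyl")] * 1
)
strengths = fit_bradley_terry(example_pairs)
ranking = sorted(strengths, key=strengths.get, reverse=True)
print(ranking)
```

Sorting the atoms of a molecule by the fitted strength of their atom type would give a shortlist of the kind described in the abstract; the accuracy figures reported there apply to the authors' models and data, not to this toy example.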

© 2016 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

Keywords: Bradley-Terry Models; Chemoinformatics; QSPR; Quantum Chemistry; Random Forest
