J Cheminform. 2016 Feb 25;8:10. doi: 10.1186/s13321-016-0122-x. eCollection 2016.

Examining the predictive accuracy of the novel 3D N-linear algebraic molecular codifications on benchmark datasets.

Journal of cheminformatics

César R García-Jacas, Ernesto Contreras-Torres, Yovani Marrero-Ponce, Mario Pupo-Meriño, Stephen J Barigye, Lisset Cabrera-Leyva

Affiliations

  1. Escuela de Sistemas y Computación, Pontificia Universidad Católica del Ecuador Sede Esmeraldas (PUCESE), Esmeraldas, Ecuador ; Grupo de Investigación de Bioinformática, Centro de Estudio de Matemática Computacional (CEMC), Universidad de las Ciencias Informáticas, La Habana, Cuba ; Computer-Aided Molecular "Biosilico" Discovery and Bioinformatics Research International Network (CAMD-BIR IN), Cumbayá, Quito, Ecuador ; Instituto de Simulación Computacional (ISC-USFQ), Universidad San Francisco de Quito (USFQ), Diego de Robles y vía Interoceánica, 17-1200-841 Quito, Ecuador.
  2. Departamento de Técnicas de Programación, Facultad 6, Universidad de las Ciencias Informáticas, La Habana, Cuba.
  3. Computer-Aided Molecular "Biosilico" Discovery and Bioinformatics Research International Network (CAMD-BIR IN), Cumbayá, Quito, Ecuador ; Instituto de Simulación Computacional (ISC-USFQ), Universidad San Francisco de Quito (USFQ), Diego de Robles y vía Interoceánica, 17-1200-841 Quito, Ecuador ; Escuela de Medicina, Colegio de Ciencias de la Salud, Edificio de Especialidades Médicas, Hospital de los Valles, Universidad San Francisco de Quito (USFQ), Av. Interoceánica Km 12 ½ - Cumbayá, Quito, Ecuador.
  4. Grupo de Investigación de Bioinformática, Centro de Estudio de Matemática Computacional (CEMC), Universidad de las Ciencias Informáticas, La Habana, Cuba.
  5. Computer-Aided Molecular "Biosilico" Discovery and Bioinformatics Research International Network (CAMD-BIR IN), Cumbayá, Quito, Ecuador ; Departamento de Química, Universidade Federal de Lavras, UFLA, Caixa Postal 3037, Lavras, MG 37200-000 Brazil.
  6. Escuela de Sistemas y Computación, Pontificia Universidad Católica del Ecuador Sede Esmeraldas (PUCESE), Esmeraldas, Ecuador ; Grupo de Investigación de Inteligencia Artificial (AIRES), Facultad de Informática, Universidad de Camagüey, Camagüey, Cuba.

PMID: 26925168 PMCID: PMC4768433 DOI: 10.1186/s13321-016-0122-x

Abstract

BACKGROUND: Recently, novel 3D alignment-free molecular descriptors (also known as QuBiLS-MIDAS) based on two-linear, three-linear and four-linear algebraic forms have been introduced. These descriptors codify chemical information for relations between two, three and four atoms by using several (dis-)similarity metrics and multi-metrics. Several studies aimed at assessing the quality of these novel descriptors have been performed. However, a deeper analysis of their performance is necessary. Therefore, in the present manuscript an assessment and statistical validation of the performance of these novel descriptors in QSAR studies is performed.

RESULTS: To this end, eight molecular datasets (angiotensin converting enzyme, acetylcholinesterase inhibitors, benzodiazepine receptor, cyclooxygenase-2 inhibitors, dihydrofolate reductase inhibitors, glycogen phosphorylase b, thermolysin inhibitors, thrombin inhibitors) widely used as benchmarks in the evaluation of several procedures are utilized. QSAR models with three to nine variables, based on Multiple Linear Regression, are built for each chemical dataset according to the original division into training/test sets. Comparisons with respect to leave-one-out cross-validation correlation coefficients ($Q^2_{loo}$) reveal that the models based on QuBiLS-MIDAS indices possess superior predictive ability in 7 of the 8 datasets analyzed, outperforming methodologies based on similar or more complex techniques such as Partial Least Squares, Neural Networks, Support Vector Machines and others. In addition, superior external correlation coefficients ($Q^2_{ext}$) are attained in 6 of the 8 test sets considered, confirming the good predictive power of the obtained models. Non-parametric statistical tests were performed on the $Q^2_{ext}$ values; these demonstrated that the models based on QuBiLS-MIDAS indices have the best global performance and yield significantly better predictions than 11 of the 12 QSAR procedures used in the comparison. Lastly, the performance of the indices under several conformer generation methods was studied. This showed that the quality of the predictions of the QSAR models based on QuBiLS-MIDAS indices depends on the 3D structure generation method considered, although in this preliminary study the differences among the results are not statistically significant.
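The two validation statistics named above can be illustrated with a minimal sketch. It assumes the commonly used definitions, Q²_loo = 1 − PRESS/SS_tot on the training set and Q²_ext = 1 − PRESS_test/SS_test with the training-set mean as reference; the abstract does not state which variants the authors used, so these formulas are an assumption. The function names and the toy data are hypothetical and are not taken from the paper or from the QuBiLS-MIDAS software.

```python
# Sketch only: MLR via normal equations, then Q2_loo and Q2_ext.
# Definitions of Q2 are assumed (common QSAR conventions), not from the paper.

def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_mlr(X, y):
    """Ordinary least squares with an intercept; returns [b0, b1, ..., bk]."""
    A = [[1.0] + list(row) for row in X]
    p = len(A[0])
    ata = [[sum(r[i] * r[j] for r in A) for j in range(p)] for i in range(p)]
    aty = [sum(r[i] * yi for r, yi in zip(A, y)) for i in range(p)]
    return solve(ata, aty)


def predict(coef, row):
    """Predicted response for one descriptor vector."""
    return coef[0] + sum(c * v for c, v in zip(coef[1:], row))


def q2_loo(X, y):
    """Leave-one-out Q2: refit without compound i, then predict compound i."""
    mean_y = sum(y) / len(y)
    press = 0.0
    for i in range(len(y)):
        coef = fit_mlr(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        press += (y[i] - predict(coef, X[i])) ** 2
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - press / ss_tot


def q2_ext(X_train, y_train, X_test, y_test):
    """External Q2: fit on the training set, evaluate on the held-out test set."""
    coef = fit_mlr(X_train, y_train)
    mean_train = sum(y_train) / len(y_train)
    press = sum((yt - predict(coef, xt)) ** 2 for xt, yt in zip(X_test, y_test))
    ss = sum((yt - mean_train) ** 2 for yt in y_test)
    return 1.0 - press / ss


# Synthetic toy data (two hypothetical descriptors), exactly y = 2 + 3*x1 - x2.
X_train = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0], [1.0, 3.0]]
y_train = [2.0 + 3.0 * a - b for a, b in X_train]
X_test = [[2.0, 2.0], [3.0, 1.0], [0.0, 2.0]]
y_test = [2.0 + 3.0 * a - b for a, b in X_test]

print(q2_loo(X_train, y_train))
print(q2_ext(X_train, y_train, X_test, y_test))
```

On noise-free linear data both statistics are 1 up to rounding; on real activity data they fall below 1, and values closer to 1 indicate better internal (leave-one-out) and external predictive ability, respectively.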

CONCLUSIONS: The QuBiLS-MIDAS indices are suitable for extracting structural information from molecules and thus constitute a promising alternative for building models that contribute to the prediction of pharmacokinetic, pharmacodynamic and toxicological properties of novel compounds.

Graphical abstract: Comparative graphical representation of the performance of the novel QuBiLS-MIDAS 3D-MDs with respect to other methodologies in QSAR modeling of eight chemical datasets.

Keywords: 3D-QSAR; Multiple Linear Regression; QuBiLS-MIDAS; TOMOCOMD-CARDD

