Chem Sci. 2018 Jun 22;9(28):6091-6098. doi: 10.1039/c8sc02339e. eCollection 2018 Jul 28.

"Found in Translation": predicting outcomes of complex organic chemistry reactions using neural sequence-to-sequence models.

Chemical science

Philippe Schwaller, Théophile Gaudin, Dávid Lányi, Costas Bekas, Teodoro Laino

Affiliations

  1. IBM Research, Zurich, Switzerland. Email: {phs,tga,dla,bek,teo}@zurich.ibm.com.

PMID: 30090297 PMCID: PMC6053976 DOI: 10.1039/c8sc02339e

Abstract

There is an intuitive analogy between an organic chemist's understanding of a compound and a language speaker's understanding of a word. Based on this analogy, it is possible to introduce the basic concepts of linguistic analysis to organic chemistry and to analyze their potential impact. In this work, we cast the reaction prediction task as a translation problem by introducing a template-free sequence-to-sequence model that is trained end-to-end and is fully data-driven. We propose a tokenization that is arbitrarily extensible with reaction information. Using an attention-based model borrowed from human language translation, we improve on state-of-the-art reaction prediction, achieving a top-1 accuracy of 80.3% without relying on auxiliary knowledge such as reaction templates or explicit atomic features. A top-1 accuracy of 65.4% is also reached on a larger and noisier dataset.
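
To make the translation framing concrete, the following is a minimal sketch of how a reaction SMILES string might be split into tokens before being fed to a sequence-to-sequence model. The abstract does not specify the tokenization rules, so the regular expression, the names `SMILES_TOKEN_PATTERN` and `tokenize_smiles`, and the example reaction are all illustrative assumptions, not the authors' implementation.

```python
import re
from typing import List

# Hypothetical token pattern (an assumption, not taken from the paper):
# bracket atoms are kept whole, two-letter halogens are matched before
# single-letter atoms, and bonds, branches, ring closures and the reaction
# arrow character '>' become individual tokens.
SMILES_TOKEN_PATTERN = re.compile(
    r"(\[[^\]]+\]|Br|Cl|[BCNOSPFI]|[bcnosp]|%\d{2}|\d|[()=#\-+\\/.:~@?*$>])"
)


def tokenize_smiles(smiles: str) -> List[str]:
    """Split a SMILES or reaction SMILES string into a list of tokens."""
    tokens = SMILES_TOKEN_PATTERN.findall(smiles)
    # Refuse to drop characters silently: the tokens must rebuild the input.
    if "".join(tokens) != smiles:
        raise ValueError(f"Unrecognized characters in SMILES: {smiles!r}")
    return tokens


if __name__ == "__main__":
    # Illustrative esterification: ethanol + acetic acid -> ethyl acetate.
    reactants, product = "CCO.CC(=O)O", "CC(=O)OCC"
    # Source and target "sentences" as space-separated tokens.
    print(" ".join(tokenize_smiles(reactants)))
    print(" ".join(tokenize_smiles(product)))
```

In this framing the tokenized reactants play the role of the source sentence and the tokenized product the target sentence; extra reaction information could in principle be added as further tokens, which is one way to read the abstract's "arbitrarily extensible".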
