Brief Bioinform. 2021 May 20;22(3). doi: 10.1093/bib/bbaa095.

The impact of compound library size on the performance of scoring functions for structure-based virtual screening.

Briefings in bioinformatics

Louison Fresnais, Pedro J Ballester

PMID: 32568385 DOI: 10.1093/bib/bbaa095

Abstract

Larger training datasets have been shown to improve the accuracy of machine learning (ML)-based scoring functions (SFs) for structure-based virtual screening (SBVS). In addition, massive test sets for SBVS, known as ultra-large compound libraries, have been demonstrated to enable the fast discovery of selective drug leads with low-nanomolar potency. This proof-of-concept was carried out on two targets using a single docking tool along with its SF. It is thus unclear whether this high level of performance would generalise to other targets, docking tools and SFs. We found that screening a larger compound library results in more potent actives being identified in all six additional targets using a different docking tool along with its classical SF. Furthermore, we established that a way to improve the potency of the retrieved molecules further is to rank them with more accurate ML-based SFs (we found this to be true in four of the six targets; the difference was not significant in the remaining two targets). A 3-fold increase in average hit rate across targets was also achieved by the ML-based SFs. Lastly, we observed that classical and ML-based SFs often find different actives, which supports using both types of SFs on those targets.
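The two quantities the abstract compares, the hit rate among top-ranked molecules and the overlap between the actives found by different SFs, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the molecule IDs, the scores, the cutoff `k` and the helper names `top_k` and `hit_rate` are hypothetical and not taken from the paper, and it assumes that a higher score means a better predicted binder (some docking tools use the opposite convention).

```python
from typing import Dict, List, Set


def top_k(scores: Dict[str, float], k: int) -> List[str]:
    """Return the k molecule IDs with the highest predicted score.

    Assumption: higher score = predicted to bind more strongly.
    """
    return sorted(scores, key=lambda mol: scores[mol], reverse=True)[:k]


def hit_rate(selected: List[str], actives: Set[str]) -> float:
    """Fraction of the selected molecules that are true actives."""
    if not selected:
        return 0.0
    return sum(mol in actives for mol in selected) / len(selected)


# Made-up toy library: six molecules, three of which are true actives.
actives = {"m1", "m4", "m6"}

# Hypothetical scores from a classical SF and an ML-based SF.
classical_sf = {"m1": 7.2, "m2": 6.9, "m3": 6.5, "m4": 5.1, "m5": 4.8, "m6": 4.0}
ml_sf = {"m1": 3.0, "m2": 6.0, "m3": 3.9, "m4": 7.5, "m5": 4.5, "m6": 6.8}

k = 3  # number of top-ranked molecules sent for (hypothetical) testing
classical_top = top_k(classical_sf, k)
ml_top = top_k(ml_sf, k)

# Actives retrieved by each SF, and by using both together.
classical_hits = set(classical_top) & actives
ml_hits = set(ml_top) & actives
combined_hits = classical_hits | ml_hits

print("classical hit rate:", hit_rate(classical_top, actives))
print("ML hit rate:", hit_rate(ml_top, actives))
print("actives found by only one SF:", sorted(classical_hits ^ ml_hits))
```

In this toy case the two rankings retrieve disjoint sets of actives, so their union covers more actives than either ranking alone, which is the situation in which using both types of SFs is attractive.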

© The Author(s) 2020. Published by Oxford University Press. All rights reserved. For Permissions, please email: [email protected].

Keywords: big data; docking; drug design; machine learning; virtual screening
