Development and Application of Compound Class-Specific Benchmark Data Sets for Differentiated Assessment of Docking and Scoring Algorithm Performance
This thesis focuses on the development and application of benchmark data sets for diverse compound classes and on the differentiated assessment of docking and scoring algorithm performance using these curated sets. Several popular software packages, including AutoDock, AutoDock Vina, GOLD, MOE, FlexX and FITTED, were assessed for two important types of compounds, fragments and covalently bound ligands. The results are summarized as follows.
In publication I, we investigated the fragment placement performance of the molecular docking programs AutoDock, AutoDock Vina, GOLD and FlexX. For this assessment, we constructed LEADS-FRAG, a benchmark data set containing 93 high-quality protein-fragment complexes. GOLD with ChemPLP and AutoDock Vina performed best and generated near-native conformations (root mean square deviation, RMSD, <1.5 Å) for more than 50% of the data set when only the top-ranked docking pose was considered. Taking into account all docking poses, the tested programs generated near-native conformations for up to 86% of the fragments. By rescoring with the GOLD scoring functions and PLIff, the number of near-native conformations increased up to 40% with respect to the top-rescored poses, showing that conventional small-molecule docking programs achieve a satisfactory fragment docking performance.
In manuscript II, we examined covalently bound ligands and tested the efficiency of the covalent docking options in the software programs AutoDock, GOLD, MOE and FITTED. We generated the LEADS-COV data set, containing 89 high-quality covalently bound protein-ligand complexes: 47 with a cysteine-bound ligand and 42 with a serine-bound ligand. For cysteine, GOLD with ChemPLP or ChemScore performed best and generated near-native conformations (root mean square deviation <1.5 Å) for more than 40% of the data set when only the top-ranked docking pose was considered. For serine, the results were better, with over 65% near-native top-ranked poses generated by GOLD with ChemPLP. Taking into account all generated poses, the success rates rose to over 65% for cysteine and over 80% for serine.
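The two success rates reported above (near-native top-ranked pose versus near-native pose anywhere among the generated poses) can be illustrated with a minimal sketch. It assumes that the RMSD of every pose to the crystallographic ligand has already been computed and that poses are stored in score-rank order. The function name `success_rates` and the example values are hypothetical and are not taken from the thesis.

```python
from typing import Sequence, Tuple

NEAR_NATIVE_RMSD = 1.5  # Å, the near-native threshold quoted in the text


def success_rates(
    rmsds_per_complex: Sequence[Sequence[float]],
    threshold: float = NEAR_NATIVE_RMSD,
) -> Tuple[float, float]:
    """Return (top-ranked rate, any-pose rate) as fractions of the data set.

    Each inner sequence holds the pose RMSDs (in Å) of one complex,
    ordered by docking score rank, best-scored pose first (assumption).
    """
    n = len(rmsds_per_complex)
    if n == 0:
        raise ValueError("data set is empty")
    # A complex counts as a top-ranked success if its best-scored pose is near-native.
    top_hits = sum(1 for poses in rmsds_per_complex if poses and poses[0] < threshold)
    # A complex counts as an any-pose success if at least one pose is near-native.
    any_hits = sum(
        1 for poses in rmsds_per_complex if any(r < threshold for r in poses)
    )
    return top_hits / n, any_hits / n


# Hypothetical RMSD values for four complexes, for illustration only.
example = [[0.8, 2.1], [3.0, 1.2], [4.0, 5.0], [1.4]]
top_rate, any_rate = success_rates(example)
print(f"top-ranked: {top_rate:.0%}, all poses: {any_rate:.0%}")
```

In this toy example the second complex has a near-native pose that is not ranked first, which is the kind of case where rescoring the generated poses with another scoring function could raise the top-ranked success rate.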