Docking Factor-Xa ligands with Lead Finder

Abstract

Lead Finder1 is a protein-ligand docking tool for the virtual screening of molecules and the quantitative evaluation of protein-ligand interactions. In this case study, two Lead Finder docking modes (standard and extra precision) were applied to a small number of Factor-Xa (FXa) protein-ligand complexes originally used in the CSAR 2014 benchmark exercise2. The results show that Lead Finder robustly finds the bioactive conformation of the ligands when starting from a random conformation. Both the standard and the extra-precision modes dock the ligands well; the latter gives tighter dockings and may highlight ligands that have lower activity and do not fit into the active site.

Introduction

Lead Finder is a docking tool from BioMolTech3 that generates docked ligand poses starting from the 3D structure of a protein (either determined experimentally by X-ray crystallography or built by homology modeling) and one or more 3D ligand structures. Lead Finder treats the protein as rigid and explores the possible conformations of the ligand by rotating functional groups around each freely rotatable bond.

FXa has been the target of drug discovery efforts at many pharmaceutical companies, where structure-based design has been used extensively. For this reason, FXa has been frequently used to benchmark new methodologies in structure-based design.

In this case study, we sought to replicate the typical experiments performed with docking engines during the lead optimization phase of small molecule discovery. We used and compared two Lead Finder docking modes (standard and extra precision) in two separate experiments. First, we carried out self-docking studies on a small number of FXa protein-ligand complexes originally used in the CSAR 2014 benchmark exercise. Second, we applied the two docking modes to a set of 45 related compounds, also taken from the CSAR 2014 dataset.

Lead Finder docking workflow

The ideal docking process with Lead Finder (Stage 1 in Figure 1) starts with an accurate protein preparation with BioMolTech’s Build Model.4 This includes:

  • addition of hydrogens to the heavy atoms of the protein, and assignment of optimal ionisation states of protein residues;
  • optimization of the spatial positions of polar hydrogen atoms to maximize hydrogen bond interactions and minimize steric strain;
  • optimization of side chain orientations of His, Asn and Gln residues for which X-ray analysis can return flipped orientations due to apparent symmetry.


Figure 1. A typical Lead Finder workflow.

Build Model assigns optimal ionization states of protein residues at arbitrary pH using an original graph-theoretical approach4 based on the Screened Coulomb Potential (SCP) model.5,6

After completing protein preparation, an energy grid map (Stage 2) is calculated and saved for the protein binding site. This energy map is then used to dock the ligand structures.
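The idea behind Stage 2 can be illustrated with a minimal sketch, assuming a single probe atom type and a toy 12-6 pair term: the protein's interaction energy is evaluated once at every node of a regular grid over the binding site, and during docking the energy of a ligand atom at any position is read off by trilinear interpolation from the eight surrounding nodes. The functions `toy_probe_energy`, `build_grid` and `interpolate` are hypothetical and do not reproduce Lead Finder's energy terms or grid format.

```python
import math
from itertools import product
from typing import Dict, List, Tuple

Point = Tuple[float, float, float]
Grid = Dict[Tuple[int, int, int], float]


def toy_probe_energy(point: Point, protein_atoms: List[Point]) -> float:
    """Hypothetical 12-6 pair term (minimum of -1 at r = 2 Å), summed over protein atoms."""
    energy = 0.0
    for ax, ay, az in protein_atoms:
        r2 = (point[0] - ax) ** 2 + (point[1] - ay) ** 2 + (point[2] - az) ** 2
        r2 = max(r2, 1.0)  # clamp very short distances to keep the toy term finite
        s6 = (4.0 / r2) ** 3  # (sigma / r)^6 with sigma = 2 Å
        energy += s6 * s6 - 2.0 * s6
    return energy


def build_grid(origin: Point, spacing: float, shape: Tuple[int, int, int],
               protein_atoms: List[Point]) -> Grid:
    """Precompute the probe energy once at every grid node (the expensive step)."""
    grid: Grid = {}
    for i, j, k in product(*(range(n) for n in shape)):
        node = (origin[0] + i * spacing, origin[1] + j * spacing, origin[2] + k * spacing)
        grid[(i, j, k)] = toy_probe_energy(node, protein_atoms)
    return grid


def interpolate(grid: Grid, origin: Point, spacing: float, point: Point) -> float:
    """Cheap lookup during docking: trilinear interpolation from the 8 surrounding nodes.

    The point must lie inside the grid; this sketch does no bounds checking.
    """
    fx, fy, fz = ((point[d] - origin[d]) / spacing for d in range(3))
    i, j, k = math.floor(fx), math.floor(fy), math.floor(fz)
    tx, ty, tz = fx - i, fy - j, fz - k
    value = 0.0
    for di, dj, dk in product((0, 1), repeat=3):
        weight = (tx if di else 1 - tx) * (ty if dj else 1 - ty) * (tz if dk else 1 - tz)
        value += weight * grid[(i + di, j + dj, k + dk)]
    return value
```

The grid costs memory and a one-off setup time, but each ligand pose is then scored by table lookups instead of summing over every protein atom.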

The docking engine in Lead Finder (Stage 3) combines a genetic algorithm search with local optimization procedures, which makes Lead Finder efficient both at coarse sampling of ligand poses and at subsequent refinement of promising solutions.
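The combination of global search and local refinement described above can be sketched as a generic Lamarckian-style genetic algorithm on a toy problem. Everything here is an illustrative assumption rather than Lead Finder's implementation: a "pose" is just a vector of three numbers, `toy_score` is a rugged Rastrigin-like stand-in for a docking energy (lower is better), and the selection, crossover, mutation and refinement settings are arbitrary.

```python
import math
import random
from typing import Callable, List, Tuple

Pose = List[float]
ScoreFn = Callable[[Pose], float]


def toy_score(x: Pose) -> float:
    """Rugged stand-in for a docking score (lower is better); global minimum 0 at the origin."""
    return sum(xi * xi + 3.0 * (1.0 - math.cos(2.0 * math.pi * xi)) for xi in x)


def local_refine(x: Pose, score: ScoreFn, step: float = 0.1, iters: int = 50) -> Tuple[Pose, float]:
    """Greedy coordinate search, a simple stand-in for a proper local optimizer."""
    best, best_s = list(x), score(x)
    for _ in range(iters):
        improved = False
        for d in range(len(best)):
            for delta in (step, -step):
                trial = list(best)
                trial[d] += delta
                trial_s = score(trial)
                if trial_s < best_s:
                    best, best_s, improved = trial, trial_s, True
        if not improved:
            step /= 2.0  # no move helped: search more finely
    return best, best_s


def genetic_dock(score: ScoreFn, dim: int = 3, pop_size: int = 30,
                 generations: int = 40, seed: int = 0) -> Tuple[Pose, float]:
    """Coarse global sampling with a GA, plus local refinement of the best solutions."""
    rng = random.Random(seed)
    pop = [[rng.uniform(-4.0, 4.0) for _ in range(dim)] for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(pop, key=score)
        elite = ranked[: pop_size // 5]  # best fifth is carried into the next generation
        children: List[Pose] = []
        while len(elite) + len(children) < pop_size:
            # tournament selection: each parent is the best of a random triple
            a, b = (min(rng.sample(ranked, 3), key=score) for _ in range(2))
            child = [ai if rng.random() < 0.5 else bi for ai, bi in zip(a, b)]  # uniform crossover
            child = [c + rng.gauss(0.0, 0.5) if rng.random() < 0.3 else c for c in child]  # mutation
            children.append(child)
        # Lamarckian step: refine promising solutions and keep the refined versions
        elite = [local_refine(e, score, iters=10)[0] for e in elite]
        pop = elite + children
    best = min(pop, key=score)
    return local_refine(best, score)
```

The genetic algorithm spreads samples over the whole search space, while the refinement step polishes the few solutions worth keeping, which mirrors the division of labour described in the paragraph above.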

The standard docking mode provides an accurate and exhaustive search. In extra-precision mode, Lead Finder uses its most rigorous sampling and scoring algorithms to increase the accuracy and reliability of predictions, at the cost of somewhat slower processing.

Scoring functions1 in Lead Finder (Stage 4) are based on a semi-empirical molecular mechanics functional that explicitly accounts for various types of molecular interactions. Individual energy contributions are scaled with empirical coefficients to produce three scoring functions tailored for:

  • correct energy-ranking of docked ligand poses (Rank-score);
  • correct rank-ordering of active and inactive compounds in virtual screening experiments (VS-score);
  • binding energy predictions (dG-score).
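To illustrate how one set of physics-based terms can yield several scoring functions, here is a minimal sketch in which each function is a weighted sum of the same energy contributions. The term names, the weights in `WEIGHTS` and the `score` helper are hypothetical; the actual Lead Finder terms and coefficients are not reproduced here.

```python
from typing import Dict

Terms = Dict[str, float]

# Hypothetical weight sets: the same physical terms, scaled differently for each purpose.
WEIGHTS: Dict[str, Terms] = {
    "rank": {"vdw": 1.0, "hbond": 1.2, "elec": 0.8, "desolv": 0.6, "entropy": 0.3},  # pose ranking
    "vs":   {"vdw": 0.9, "hbond": 1.0, "elec": 0.5, "desolv": 0.9, "entropy": 0.5},  # actives vs inactives
    "dg":   {"vdw": 0.7, "hbond": 0.9, "elec": 0.4, "desolv": 1.0, "entropy": 1.0},  # binding energy
}


def score(terms: Terms, kind: str) -> float:
    """Weighted sum of one pose's energy contributions for the chosen scoring function."""
    weights = WEIGHTS[kind]
    return sum(weights[name] * value for name, value in terms.items())


# Hypothetical decomposition of one docked pose (kcal/mol-like units)
pose = {"vdw": -10.0, "hbond": -2.0, "elec": -1.0, "desolv": 3.0, "entropy": 1.0}
```

Because the weights differ, the same energy decomposition gives a different value under each function, so two poses can swap order depending on which score is used.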

Because this study concentrated on the generated poses, we focused on the Rank-score function.

Method

Initially we used a crystal structure of FXa in complex with compound GCT000006 (GCT, PDB ID: 4ZH8). As Figure 2 shows, the 6-chloronaphth-2-yl group of GCT binds in the S1 primary specificity pocket, while the morpholino group occupies the aromatic box (Tyr99, Trp215, Phe174) of the S4 pocket.

 

Figure 2. Structure of the FXa protein in complex with the GCT ligand.

The protein was prepared with the default options of Build Model, in which the ligand is removed from the active site and the water molecules are retained. The coordinates of the ligand were then used to define the bounding box for the calculation of the energy grid maps.
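Defining the search region from the crystallographic ligand can be sketched as an axis-aligned box around the ligand's atom coordinates, enlarged by a margin so that larger analogues still fit. The 4 Å padding, the 0.375 Å grid spacing and both helper functions are assumptions for illustration, not Build Model or Lead Finder defaults.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def ligand_bounding_box(coords: Sequence[Point], padding: float = 4.0) -> Tuple[Point, Point]:
    """Axis-aligned box around the crystal ligand, enlarged by `padding` Å on every side."""
    xs, ys, zs = zip(*coords)
    lower = (min(xs) - padding, min(ys) - padding, min(zs) - padding)
    upper = (max(xs) + padding, max(ys) + padding, max(zs) + padding)
    return lower, upper


def grid_shape(lower: Point, upper: Point, spacing: float = 0.375) -> Tuple[int, ...]:
    """Number of grid nodes needed along each axis to cover the box at the given spacing."""
    return tuple(math.ceil((hi - lo) / spacing) + 1 for lo, hi in zip(lower, upper))
```

The padding trades search coverage against grid size: the node count grows with the cube of the box edge, so a generous margin makes the grid calculation noticeably slower.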

Self-docking experiment

We started by re-docking GCT into the 4ZH8 crystal structure to assess Lead Finder's ability to correctly identify its bioactive conformation. To avoid bias in the self-docking experiment, the 3D conformation of GCT was flattened to 2D and then converted back into 3D using Cresset's XedConvert7, followed by minimization with Cresset's XedMin7 to relax the ligand to a local minimum. The GCT ligand was then docked into the 4ZH8 structure using both the standard and the extra-precision docking modes.

Protein-ligand docking

A subset of 45 small molecules from the CSAR 2014 dataset with known activity against FXa (Table 1) was docked into the 4ZH8 crystal structure.

Most of these ligands share a chlorinated mono- or polyaromatic group and a morpholino group. All ligands were converted into 2D and back to 3D with XedConvert and subsequently minimized with XedMin. The crystallographic ligand was used to define the bounding box for the energy grid maps. The 45 ligands were then docked into the protein using both the standard and the extra-precision modes.


Table 1. Representative structures of the 45 ligands used in the docking study.
 


Figure 3. Lead Finder self-docking experiment on 4ZH8 using the standard (top row) and extra precision (bottom row) docking modes. The RMSD (in Å) between the docked pose (thick sticks) and the X-ray coordinates of GCT (thin sticks) is reported for each pose.

Results

Self-docking

In both the standard and the extra-precision modes, Lead Finder outputs up to 10 of the best poses (if available), ranked in order of increasing Rank-score so that the best-scoring pose comes first.

Figure 3 shows the five top ranking poses of GCT obtained using the Lead Finder standard (top row) and extra precision (bottom row) docking modes. The poses are ordered from the best ranking (left) to the worst ranking (right).

As the figure shows, the five top-ranking poses from both the standard and the extra-precision modes are closely aligned with the X-ray conformation of the ligand, with the naphthalene ring of GCT correctly oriented into the S1 binding pocket.

The RMSD values obtained with the two modes are similar, and each mode finds a solution within 2 Å RMSD of the X-ray pose among its top five results. However, the extra-precision mode finds this solution at rank 2 rather than rank 4, and the pose is very close to the X-ray ligand (RMSD 1.44 Å), with only a single R-group oriented differently. This is a small but potentially significant improvement.
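The RMSD comparisons above can be reproduced with a short sketch. Because the docked pose and the X-ray ligand share the protein's coordinate frame, the RMSD is computed in place, without superposition. This sketch assumes both coordinate lists contain the same atoms in the same order and, unlike many production tools, applies no symmetry correction; the helper names are hypothetical and the 2 Å cutoff matches the threshold used in the text.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float, float]


def rmsd(pose: Sequence[Point], reference: Sequence[Point]) -> float:
    """In-place RMSD (Å) between matched atoms; no fitting and no symmetry correction."""
    if len(pose) != len(reference):
        raise ValueError("pose and reference must have the same number of atoms")
    sq = sum((a - b) ** 2 for p, r in zip(pose, reference) for a, b in zip(p, r))
    return math.sqrt(sq / len(pose))


def first_success(ranked_poses: List[Sequence[Point]], reference: Sequence[Point],
                  cutoff: float = 2.0) -> Optional[int]:
    """1-based rank of the first pose within `cutoff` Å of the X-ray pose, or None."""
    for rank, pose in enumerate(ranked_poses, start=1):
        if rmsd(pose, reference) <= cutoff:
            return rank
    return None
```

Applied to the ranked poses of each mode, `first_success` gives the position of the first near-native pose, which is the quantity compared between the two modes above.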

Figure 4 shows the self-docking of other FXa proteins (4ZHA, 4Y7A and 4Y79) performed with the standard docking mode and with the extra-precision mode.

For these less flexible ligands, the extra-precision mode seems to have little effect on the RMSD of the results. Both modes again identify the correct orientation of the ligand in the FXa active site, with an RMSD within 2 Å for the top-scoring pose.

Figure 4. Self-docking experiment on 4ZHA, 4Y7A and 4Y79 using Lead Finder’s extra-precision mode. The RMSD (in Å) between the top-scoring docked pose (thick sticks) and the X-ray coordinates of the native ligand (thin sticks) is reported for each pose.

Protein-ligand docking

Figure 5 shows a side-by-side comparison of the superimposed top-ranking poses for the 45 FXa ligands docked into the 4ZH8 protein using standard (left) and extra precision (right) docking modes.

For the standard mode, the majority of ligands are docked with the naphthalene ring correctly pointing down into the S1 binding site. However, one ligand is not docked as expected, with the pyrrolidine group pointing to the outside of the protein (GCT98A). Interestingly, this compound is the one with the lowest activity (pIC50 6.2) in the dataset.

With the extra-precision mode, the docked poses generally look tidier, although two ligands docked with the naphthalene group pointing outside the S1 pocket: one is again GCT98A, and the other is GCT44A, the compound with the second-lowest activity in the dataset (pIC50 = 6.4). These findings suggest that Lead Finder may provide useful hints for discriminating between active and inactive compounds.

Figure 5. Docking FXa ligands into 4ZH8 using Lead Finder's standard (left) and extra precision (right) docking modes.

Conclusion

This case study shows a typical Lead Finder docking workflow and demonstrates the robustness of the program through several self-docking experiments. The results show that Lead Finder does a good job of finding the bioactive conformation of flexible ligands when started from a random conformation. In addition, we explored two docking modes (standard and extra-precision) to dock a subset of FXa ligands from CSAR 2014. While both modes seem to work well at generating sensibly aligned poses, the extra-precision mode provides tighter dockings and may be able to highlight ligands with lower activity that may not fit into the active site.

References

  1. Stroganov et al., Lead Finder: An Approach to Improve Accuracy of Protein-Ligand Docking, Binding Energy Estimation, and Virtual Screening, J. Chem. Inf. Model. 2008; 48, 2371-2385.
  2. Carlson et al., CSAR 2014: A Benchmark Exercise Using Unpublished Data from Pharma, J. Chem. Inf. Model. 2016; 56, 1063-1077.
  3. http://www.biomoltech.com/
  4. Stroganov et al., TSAR, a new graph-theoretical approach to computational modeling of protein side-chain flexibility: Modeling of ionization properties of proteins, Proteins 2011; 79, 2693-2710.
  5. E. L. Mehler, Self-Consistent, Free Energy Based Approximation To Calculate pH Dependent Electrostatic Effects in Proteins, J. Phys. Chem. 1996; 100, 16006-16018.
  6. E. L. Mehler and F. Guarnieri, A Self-Consistent, Microenvironment Modulated Screened Coulomb Potential Approximation to Calculate pH-Dependent Electrostatic Effects in Proteins, Biophys. J. 1999; 77, 3-22.
  7. http://www.cresset-group.com/products/xedtools/

What’s great about Lead Finder?

We recently announced our collaboration with BioMolTech, a small modeling software company best known for their docking software, Lead Finder. Cresset has been traditionally focused on ligand-based design, but as we expand our capabilities into more structure-based methods we realized that we would have to supply a robust and accurate docking method to our customers. So, why did we choose Lead Finder?


A graphical interface to Lead Finder will be included in our new structure-based design application.

The requirements for a great docking engine are simple to state: it needs to be fast and it needs to be accurate. The latter is by far the most important: nobody cares how quickly you got the answer if it is wrong! Our first question when evaluating docking methods was therefore how good each one was. This is actually a difficult question to answer, as there are several different definitions of 'good' depending on what you want: virtual screening enrichment? Good pose prediction? Accurate ranking of active molecules?

The first of these, virtual screening, is what most people think of when they think of docking success. Lead Finder has been validated on a wide variety of target classes and shows excellent enrichment rates (median ROC value across 34 protein targets was 0.94), even on targets traditionally seen as very hard such as PPAR-γ. The performance on kinases was uniformly excellent, with ROC values ranging from 0.86 for fibroblast growth factor receptor kinase (FGFR) to 0.96 for tyrosine kinase c-Src.
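Assuming the ROC values quoted here are areas under the ROC curve, the metric can be computed directly from docking scores with the Mann-Whitney formulation: the AUC is the probability that a randomly chosen active is scored better than a randomly chosen inactive. The sketch below assumes lower scores are better, as for energy-like docking scores, and uses a simple O(n·m) pair loop for clarity; `roc_auc` is a hypothetical helper, not part of Lead Finder.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], is_active: Sequence[bool]) -> float:
    """ROC AUC for a screen where lower scores are better (ties count one half)."""
    actives = [s for s, a in zip(scores, is_active) if a]
    inactives = [s for s, a in zip(scores, is_active) if not a]
    if not actives or not inactives:
        raise ValueError("need at least one active and one inactive")
    wins = 0.0
    for sa in actives:
        for si in inactives:
            if sa < si:  # active scored better than inactive
                wins += 1.0
            elif sa == si:
                wins += 0.5
    return wins / (len(actives) * len(inactives))
```

An AUC of 0.5 corresponds to random ordering and 1.0 to every active outranking every inactive, which puts the quoted values of 0.86 to 0.96 in context.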


A series of SYK ligands docked into PDB 4YJQ, with the crystal ligand shown in purple.

Pose prediction is of more interest to those working in the lead optimization phase, where assessing the likely bound conformation of a newly-proposed structure can be very helpful in guiding design. Here, too, Lead Finder performs well. On the widely-used Astex Diverse Set, used to test docking performance, Lead Finder produces the correct pose as the top-scoring result 82% of the time, which is comparable to other state-of-the-art methods (Gold, for example, gets 81% on the same measure). On a number of literature data sets testing self-docking performance Lead Finder finds the correct pose between 81 and 96% of the time, which is excellent.


Lead Finder includes dedicated modes for extra-precision and virtual screening experiments.

One of the most intriguing things about Lead Finder is the makeup of its scoring functions. In contrast to many other scoring functions, which use heuristic or knowledge-based potentials, the Lead Finder scoring functions comprise a set of physics-based potentials describing electrostatics, hydrogen bonding, desolvation energy, entropic losses on binding and so on. Different scoring functions can be obtained by weighting these contributions differently: BioMolTech have found, for example, that the optimal weights for pose prediction differ slightly from those for energy prediction. A separate scoring function has been developed which aims to compute a ΔG of binding given a correct pose. This is a difficult task, and the success of the Lead Finder function was demonstrated in the 2010 CSAR blind challenge, where the binding energy of 343 protein-ligand complexes had to be predicted ab initio. Lead Finder was the best-performing docking method in that challenge. BioMolTech are actively building on this excellent result with the aim of making robust and reliable activity predictions a standard outcome of a Lead Finder experiment.

Cresset are proud to be the worldwide distributors for Lead Finder. It is available today as a command-line application and will be built into Cresset’s upcoming structure-based drug design workbench.


What’s in the CDS virtual screening toolbox?

Cresset is very well known for providing fast and accurate ligand-based virtual screening through Blaze. We have now added the Lead Finder docking engine to our virtual screening toolbox, giving Cresset Discovery Services (CDS) the most comprehensive virtual screening capabilities available anywhere in the industry.

Based on an informal survey of our contacts and customers, I estimate that something like 50% of all current pharma SME projects are ‘structure enabled’. Lead discovery and lead optimization are driven through the use of in-house structures, public structures (typically from the PDB) and homology models. These structures inform lead optimization programs by explaining observed SAR and providing feedback and a detailed context for the design of further analogues.

CDS routinely uses the Cresset software Blaze for ligand-based virtual screening. Although we had access to structure-based methods, we are pleased to have brought Lead Finder in-house, giving us full capability in conducting ligand-protein docking.

Ligand-based virtual screening with Blaze

Virtual screening with Blaze remains one of the most consistently requested projects for CDS. What makes Blaze extremely useful for our customers is:

  • Virtual screening is probably the only way to really sample adequate chemical diversity
  • Virtual screens are far more cost effective than wet HTS
  • Excellent enrichments can be achieved
  • The chemotype diversity in the output is second to none.

Blaze also relies on two very simple premises:

  1. A bioactive conformation encodes, in its shape and electrostatic field, the properties, recognition features and solvation pattern optimised for interaction with its protein target site.
  2. A molecule conformation with increasing ‘shape and field’ similarity to that bioactive conformation has an increasing probability of also being active.

So, apart from whether the template truly represents the bioactive conformation, the key determinant of real activity among the hits is often how relevant the matching hit conformation is and how well populated it is in the molecule's conformational ensemble. This is fundamentally why our ligand-centric screening invariably works extremely well. Because molecules with completely different chemical architectures can adopt a similar shape and project the same electrostatic pattern, the output is very diverse.

Structure-based virtual screening with Lead Finder

The Lead Finder software has been developed to provide cutting-edge docking for an array of typical tasks, from high-throughput virtual screening to best-in-class prediction of bioactive conformations to accurate prediction of binding energies. In combination with the companion Build Model protein preparation tool, Lead Finder has been shown to match or outperform the historically leading docking solutions.

When preparing ligands for virtual screening in Blaze, CDS scientists use modeling to help define the best 'hand-crafted' estimate of a bioactive conformation, based on the widest available data for any given system. We apply the same care to exploring and preparing protein targets before structure-based virtual screens, taking advantage of three main approaches. First, Lead Finder includes the excellent Build Model protein preparation tool. Second, we are privileged to be able to model proteins and ligands with the same proprietary XED force field that underpins the accurate electrostatics in all Cresset software. Finally, at CDS we have access to the latest Cresset software that is still under development. This gives us the capability to provide protein electrostatic field maps and water analysis, giving a very reliable starting point for structure-based virtual screening.


Lead Finder uses a stochastic ligand sampling workflow, with conformations generated on-the-fly, and a genetic algorithm for processing these into pools of the best docking poses. Multiple interaction grids are generated from the protein target and combined to define a scoring system for poses. More importantly, the scoring method has been shown to outperform some of the more conventional docking engines currently available commercially.

Structure-based or ligand-based?

What are the advantages of having both structure-based and ligand-based virtual screening? And how do we choose the best approach for a project?

Ligand-based virtual screening is less computationally intensive, making it the preferred option when a known ligand is available. An average protein of 400 amino acids has roughly 3,000 heavy atoms (about twice that once hydrogens are included) and in excess of 50 charges, making it a far more challenging system to model than a ligand.

However, even when there is a known ligand there are some situations when a ligand-based virtual screening is not viable, such as when the known ligand does not exploit all the interactions available in an active site or when a protein has an unattractive orthosteric site and attractive allosteric sites with no known ligands. In these cases, we prefer to use a structure-based method.

In the case of protein-protein interaction sites and protein-DNA/RNA sites, Blaze can take DNA and protein fragments as a template in place of a ligand. However, it is useful to have a structure-based approach available for comparison.

In fact, we often find it useful to combine different virtual screening techniques. In lead discovery, one of the key requirements for virtual screening is to maximise the diversity of hits returned. All virtual screening techniques, be they ligand-based or structure-based, are probabilistic, in that they are used to increase the likelihood of getting hits from a wet screen. No technique is guaranteed to give accurate absolute binding energies (at least not in the context of virtual screening on a library of any realistic size), but they do give good rank ordering of compounds and can therefore be used for selection and prioritisation.
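One common way to quantify how much a virtual screen raises the likelihood of finding hits is the enrichment factor at a given fraction of the ranked library: the hit rate among the selected compounds divided by the hit rate of the whole library. The sketch below is a generic definition, not a description of how CDS reports results; `enrichment_factor` is a hypothetical helper and the list of activity flags is assumed to be ordered best-ranked first.

```python
from typing import Sequence


def enrichment_factor(ranked_is_active: Sequence[bool], fraction: float) -> float:
    """EF at the top `fraction` of a ranked list (best-ranked compound first)."""
    n = len(ranked_is_active)
    n_selected = max(1, round(n * fraction))
    total_actives = sum(ranked_is_active)
    if total_actives == 0:
        raise ValueError("the list contains no actives")
    hits_selected = sum(ranked_is_active[:n_selected])
    # hit rate in the selection divided by the hit rate expected from random picking
    return (hits_selected / n_selected) / (total_actives / n)
```

For example, if 3 of 5 actives in a 100-compound library land in the top 10, the enrichment at 10% is (3/10)/(5/100) = 6, i.e. six times the hit rate of random selection.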

Ligand-based techniques, whether 2D or 3D, are algorithmically distinct from structure-based techniques such as docking and, therefore, give different rankings to compounds. Different approaches return different hits and the results can be combined into an enriched final list.

Combining the results of structure-based and ligand-based techniques provides further diversity, leading to better hit rates and more interesting hits.
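Combining hit lists can be sketched with reciprocal rank fusion, a generic consensus technique and not necessarily the scheme CDS uses: each compound receives 1/(k + rank) from every list it appears in, and compounds are reordered by the summed score. The constant k = 60 is a commonly used default, assumed here, and the function name is hypothetical. Compounds found by only one method still enter the fused list, which preserves the diversity discussed above.

```python
from collections import defaultdict
from typing import Dict, List


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Merge several ranked hit lists (best first) into one consensus ranking."""
    fused: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, compound in enumerate(ranking, start=1):
            fused[compound] += 1.0 / (k + rank)
    # highest fused score first; ties broken by name for a deterministic order
    return sorted(fused, key=lambda c: (-fused[c], c))


ligand_based = ["A", "B", "C"]      # e.g. a shape/field similarity ranking
structure_based = ["B", "D", "A"]   # e.g. a docking score ranking
consensus = reciprocal_rank_fusion([ligand_based, structure_based])
```

Rank fusion uses only positions, not raw scores, so it sidesteps the problem that similarity values and docking energies live on incomparable scales.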

A one-stop shop for virtual screening

By combining the strengths of Blaze in the ligand-based world with Lead Finder for docking, CDS now has the most comprehensive virtual screening capabilities available anywhere in the industry. Both Blaze and Lead Finder are available to purchase as software or as a service through CDS. CDS is now truly a one-stop shop for virtual screening, and indeed very much more.


Cresset to distribute BioMolTech protein preparation and docking software worldwide

Cambridge, UK – 15th June 2016 – Cresset, innovative provider of software and contract research services for small molecule discovery and design, is pleased to announce a partnership with BioMolTech for the exclusive global distribution and support of BioMolTech's Lead Finder docking software.

Lead Finder is a software solution for virtual screening of candidate molecules and quantitative evaluation of interactions between protein and ligand molecules through docking. Lead Finder ranks ligands by their predicted biological activity, determines 3D structures of protein-ligand complexes and estimates the energy of ligand binding. Lead Finder’s docking algorithm enables fast processing of large collections of compounds to guide the development of focused libraries with high enrichment of active compounds.

Lead Finder satisfies the needs of computational chemists and medicinal chemists involved in the discovery process, pharmacologists and toxicologists involved in the modeling and evaluation of ADMET properties in silico, and biochemists and enzymologists working on enzyme specificity and rational enzyme design. Lead Finder can be used to predict free energy of protein-ligand binding with high accuracy and prepare protein structures for independent docking experiments.

“We are delighted to be representing BioMolTech worldwide,” says Dr Rob Scoffin, CEO at Cresset. “They have an excellent scientific reputation and their products support a broad range of applications in many life science areas. We will offer Lead Finder alongside our existing software range and be integrating it into our upcoming structure-based design application.”

“BioMolTech have chosen Cresset as their distributor because we believe that Cresset is a scientifically competent, ambitious and reliable partner for representing our software worldwide,” says Val Kulkov, CEO at BioMolTech. “Docking of small molecules to protein structures and evaluation of the parameters of such molecular interactions is an important technique that provides valuable insight to researchers in many life science areas.”

END

Contacts

Cresset: Sue Peffer, Marketing Manager +44 (0)1223 858890
BioMolTech: Prof. Val Kulkov, CEO +1 (416) 238-1263

About Cresset

Cresset’s software and consulting services are used by chemists from the world’s leading research organizations. Our patented methods deliver novel, realistic results for discovering, designing and optimizing the best small molecules in industry sectors including: agrochemicals, fine chemicals, flavor, fragrance, petrochemicals and pharmaceuticals.

About BioMolTech

BioMolTech aims at raising the bar in accuracy and efficiency in virtual screening and molecular docking technologies. Lead Finder, BioMolTech’s core product, is used by leading research institutions to guide the development of focused libraries with high enrichment of physiologically active compounds. We believe the continuous advancement of computer-assisted drug design technology will uncover new and innovative opportunities in life science areas.