Last chance for early access to Flare, new structure-based design application

We are delighted to announce the release of Flare beta 2. This version has many enhancements suggested by users as part of the ongoing beta test program and is available for evaluation from your account manager. This final round of beta testing will focus on fine-tuning the operation of Flare – perfecting keyboard shortcuts, adding more quick access items and polishing dialogue boxes in the run-up to launch. To make sure the application meets your needs, we are interested in hearing where you think it can be improved.

Significant improvements in beta 2

Group ligands together

Since the first beta test we have made a number of improvements both in response to your feedback and from our own experience. One of the most significant changes is an overhaul of the relationship between ligands in the ligand table and their parent protein. In Flare beta 2, each ligand has a parent protein that is set automatically and can be manually adjusted by simply double clicking the table cell. This enables ligands to be grouped together by chemistry, source, or parent protein making full use of the ‘Molecule roles’ feature.

Molecules in two roles within the ligand table with their Title, associated Protein, and Rank Score from docking.

Improved calculation dialogues

All the calculation dialogues have been significantly improved to enable parallel processing and give more visual feedback on the extent of the calculations. Now, whenever you set up a calculation, the 3D window displays the relevant calculation boxes, from the size of an active site in a docking experiment to the clipping boxes for surface generation.

A 3D RISM calculation in preparation showing the cube in which the RISM waters will be placed (magenta) and the hydration shell that surrounds the calculation (green).

Greater display control

The contact detection and display algorithms have been overhauled to give significantly greater performance and to show only the contacts that you are interested in. Flare now gives control over the display of individual interaction types, whether to include waters, and the inclusion of intramolecular interactions (such as H-bonds within a protein).

Interactions for the ligand from PDB 5MTO.

Cloud ready and enabled with Cresset Engine Broker

Finally, significant work has been put into job parallelization, particularly for WaterSwap. Here we have rewritten our unique Engine Broker that enables client machines (be they Windows®, macOS® or Linux®) to use remote or cloud-based compute resources to super-power their calculations. Using the Cresset Engine Broker (CEB), starting a cloud-based calculation could not be simpler:

  1. Set the location of the CEB in the preferences
  2. Set up the calculation
  3. Press ‘Start’.

The new CEB has a completely different architecture such that it now handles all communication. This is particularly useful when running on the cloud or in other situations where the client machine knows nothing of, or cannot communicate with, the individual calculation nodes of the cluster. For WaterSwap we have modified the algorithm to make full use of cloud resources, where the perfect situation is to have an infinitely wide calculation that completes in seconds. For a Monte Carlo based simulation there is a limit to how wide we can make the calculation, but we do not have to limit ourselves to a single process either. In Flare beta 2 we have enabled an option to split the WaterSwap job into parallel chunks that utilize the highly parallel nature of cloud resources to run the same simulation up to 4 times faster.
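The idea of splitting one Monte Carlo job into independent chunks can be illustrated with a toy example. The sketch below is not the WaterSwap algorithm: it estimates π by random sampling, and the function names, chunk count and seeds are all chosen purely for illustration. Each chunk has its own random seed, so the chunks do not depend on each other, and the partial counts are combined at the end.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def run_chunk(seed, n_samples):
    """Run one independent Monte Carlo chunk and return the number of hits."""
    rng = random.Random(seed)  # each chunk has its own random stream
    hits = 0
    for _ in range(n_samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:  # point falls inside the quarter circle
            hits += 1
    return hits


def estimate_pi(total_samples, n_chunks):
    """Split the sampling into n_chunks independent pieces and merge the results."""
    per_chunk = total_samples // n_chunks
    seeds = range(n_chunks)
    # Threads keep the sketch self-contained; a real deployment would send
    # each chunk to a separate process or compute node instead.
    with ThreadPoolExecutor(max_workers=n_chunks) as pool:
        hits = list(pool.map(run_chunk, seeds, [per_chunk] * n_chunks))
    return 4.0 * sum(hits) / (per_chunk * n_chunks)


pi_estimate = estimate_pi(200_000, 4)
print(pi_estimate)
```

Because every chunk is seeded explicitly, the combined estimate is reproducible regardless of the order in which the chunks finish.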

WaterSwap result for a ligand bound to TNNI3K (PDB 4YFI) showing both the ligand bound and water bound protein results from a WaterSwap experiment.

Try it for yourself

Interested in Flare? Contact your account manager to join the Flare beta 2 program and gain early access to this cutting-edge structure-based design method with an intuitive GUI.

Sneak peek at Flare


As our new structure-based design application, Flare, nears release, I share some of the innovative features that will give you new insights into protein-ligand binding, and a sneak peek at the interface which is a mixture of a traditional Cresset application and something distinctly different.


A PERK ligand in the active site of pdb 4G31 with RISM waters, green = stable, red = unstable.
 

Easy ligand and protein navigation

Flare has been created with ligand design at its heart so you can easily navigate ligands and their proteins, comparing, contrasting and improving them. To do this the ‘Molecules’ table has been borrowed from Forge and Torch. The table holds ligands and their data, and has been enhanced with a separate table for proteins. Why two places for molecules? We felt that separating the two types of molecule has distinct advantages. First, it enables you to store and display, next to each ligand, all the physico-chemical property data that chemistry designers need to assess designs for progression to synthesis. Second, it enables separate, rapid control of which elements are displayed in the 3D window – for example, you can quickly create a grid and compare one ligand in many different proteins or many different ligands in one protein. Lastly, separating the ligands into their own table enables separation and navigation of ligands in a way that would otherwise not be possible.

To counter any lack of functionality in separating proteins and ligands, drag and drop between the tables has been enabled. To move a ligand into a protein, or separate it away, you simply drag the molecule from one table to the other. Equally, each ligand has a concept of its parent protein and hence it will be associated with the correct protein when viewing multiple ligand protein complexes.


Flare can be used to easily compare ligand-protein complexes. In this case all available A2A crystal structures were loaded into the application and ligands automatically split out.

Each ligand in Flare can be displayed in its associated protein in grid mode making comparisons between ligands or proteins straightforward.

Protein interaction potentials reveal the electrostatics that underlie ligand binding. In this case pdb code 4G31 (red = positive, blue = negative). Widgets can be undocked at any time and placed on additional monitors.

Powerful picking

Picking atoms, whether to change the display style, add a surface or perform a minimization, is an amazingly frequent action in structure-based design. We wanted to make it as easy as possible, so common picking actions such as picking the active site or all ligand atoms are available directly from the ‘Home’ tab of the ribbon. However, this is just a small selection of the actions in Flare, as they are enhanced through an extension, accessible from the ribbon, which gives a depth of functionality to Flare’s picking algorithms. For example, using the extension you can pick atoms based on a SMARTS pattern, pick residues using a text query such as ‘ASN 83’, pick chains by name or residues by names or numbers, and add to or subtract from the existing pick or take the intersection. Using the enhanced picking widget you should be able to grab any atom within the application without needing to first find it in the 3D window.
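To make the add, subtract and intersect operations concrete, here is a minimal sketch in plain Python. It is not Flare's picking code: the `Atom` class, the `pick_residue` and `pick_chain` helpers and the four toy atoms are hypothetical, and SMARTS-based picking is left out because it needs a cheminformatics toolkit. A pick is modelled as a set of atom indices, so combining picks is ordinary set arithmetic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Atom:
    index: int
    chain: str
    residue_name: str
    residue_number: int


def pick_residue(atoms, query):
    """Pick atom indices matching a text query such as 'ASN 83'."""
    name, number = query.split()
    return {a.index for a in atoms
            if a.residue_name == name.upper() and a.residue_number == int(number)}


def pick_chain(atoms, chain):
    """Pick all atom indices belonging to the named chain."""
    return {a.index for a in atoms if a.chain == chain}


# Toy structure: two chains, each containing an ASN 83 residue.
atoms = [
    Atom(0, "A", "ASN", 83),
    Atom(1, "A", "ASN", 83),
    Atom(2, "A", "GLY", 84),
    Atom(3, "B", "ASN", 83),
]

asn83 = pick_residue(atoms, "ASN 83")
chain_a = pick_chain(atoms, "A")

added = asn83 | chain_a        # add to the existing pick
subtracted = chain_a - asn83   # subtract from the existing pick
common = asn83 & chain_a       # keep only the intersection
print(sorted(added), sorted(subtracted), sorted(common))
```

Representing picks as sets keeps each query independent, so any combination of queries can be composed without revisiting the 3D window.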


Picking atoms is central to working with proteins. Flare provides common picking actions on the ribbon and gives an extended picking widget that enables complex queries.

Detailed logging

A key piece of feedback from alpha and beta test phases was that you wanted detailed logging. To get the right balance between finding the relevant information and seeing the detail of the step there is a hierarchy of logging. All top level events are recorded to a log window that you can choose to keep visible, move to the side or close as you prefer. At any time if you want the detail behind an operation then you can go to the log window and double click the relevant entry to see all the detail that underlies the operation in question.


Flare contains two levels of logging, a brief summary and detailed log text. Manual entries can be added at any time.

Flare contains two levels of logging, a brief summary and detailed log text (for RISM in this case). Manual entries can be added at any time.

Ribbon menu

Our intention is for Flare’s capabilities to grow significantly over time, so we have built a GUI with room to expand the command structure without compromising usability. A key element is the choice of a ribbon interface instead of traditional menus; the ribbon provides a logical framework for commands with an easy search strategy to find the one that you need at that moment. We were always mindful of the need to enable customization in the fullness of time so that users can control their own work environment, and the ribbon interface is the perfect environment for this. Our intention here is to avoid the nightmare growth of multiple, unexplained and unobvious icons suffered by many applications and classically described in the story of the Microsoft ribbon.


Flare ribbon menus make actions always visible. Shown here with different application styles (Blue, White, Black).

Try it for yourself

Flare will be available for evaluation very soon. If you would like to test drive the novel interface, or apply one of the novel scientific methods to your project, please contact us to register your interest.

Blaze used in discovery of allosteric modulators of the high affinity choline transporter

A variety of neurological conditions can potentially be treated through the stimulation of cholinergic neurotransmission. The choline uptake into certain neurons is mediated by the choline transporter (CHT), which is well-characterized but otherwise unexplored as a potential drug target.

A team consisting of scientists from Pfizer, Neusentis, Nanion Technologies, and Kissei Pharmaceutical Company used two compound sets: (1) a specially created set of 887 molecules derived from the full Pfizer compound screening collection using Cresset’s virtual screening tool Blaze; (2) 2,753 molecules from the Pfizer Chemogenomic Library. From these sets they were able to identify nine active small molecules that modulate CHT.

This work will enable them to test the hypothesis that positive modulation of CHT will enhance activity-dependent cholinergic signaling. Read the full paper Discovery of Compounds that Positively Modulate the High Affinity Choline Transporter.

Using Blaze to develop a screening set from a corporate compound library

The team had identified two CHT modulators from the literature: one CHT positive allosteric modulator and one CHT negative allosteric modulator. Each of these was used within Blaze to search the full Pfizer compound screening collection for compounds with similar electrostatic and shape properties and therefore potentially similar biological activity.

The computational team kept the top 500 compounds from each virtual screen, based on the Blaze scoring function, to form a set of 1000 compounds. This set was filtered based on compound availability and the removal of chemically unattractive groups, resulting in a test set of 887 compounds. This library was screened in assays, as detailed in the paper.
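The selection logic described here (take the top N from each screen, pool them, then filter) can be sketched in a few lines. This is an illustration only and not the team's actual pipeline: the function names, the toy compound IDs and scores, and the assumption that a higher score means greater similarity are all ours, as is the de-duplication step, which the paper summary does not mention.

```python
def top_n(scores, n):
    """Return the n best-scoring compound IDs (assuming higher = more similar)."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:n]


def build_screening_set(screen_a, screen_b, n, is_available, is_attractive):
    """Pool the top n compounds of two screens, then apply the two filters."""
    selected = top_n(screen_a, n) + top_n(screen_b, n)
    unique = list(dict.fromkeys(selected))  # drop repeats, keep first-seen order
    return [c for c in unique if is_available(c) and is_attractive(c)]


# Hypothetical scores from two searches, one per seed molecule.
screen_pam = {"C1": 0.90, "C2": 0.80, "C3": 0.50}
screen_nam = {"C4": 0.95, "C5": 0.70, "C2": 0.60}
unavailable = {"C5"}
unattractive = set()

screening_set = build_screening_set(
    screen_pam,
    screen_nam,
    n=2,
    is_available=lambda c: c not in unavailable,
    is_attractive=lambda c: c not in unattractive,
)
print(screening_set)
```

Passing the filters in as functions keeps the selection step separate from whatever availability and substructure rules a given organisation applies.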

Identification of previously unknown active and structurally distinct molecules

Five compounds of interest were identified from the 887 test set created using Cresset’s Blaze. Three of these were confirmed as positive allosteric CHT modulators and two as negative allosteric modulators of CHT function. A further four compounds of interest were identified from the 2,753 molecules from the Pfizer Chemogenomic Library. The compounds of interest are shown in Table 2 ‘Tool compound data’ which forms part of the paper.

This paper demonstrates the high value of virtual screening in focusing a screening campaign. The team successfully identified previously unknown active and structurally distinct molecules that could be used as tools to further explore CHT biology or as a starting point for further medicinal chemistry.


Selected images from Blaze results with purported CHT modulator seed molecules (PAM MKC-351 and NAM ML-352) (green) shown on the left and output molecules 1-5 shown on the right (grey). Fields are shown with positive (red), negative (cyan), van der Waals (yellow), and hydrophobic (orange) regions.

Boosting RDKit molecular simulations through OpenMM

I am a big fan of the RDKit. In case you have never heard about it, it is an open-source, BSD-licensed C++ toolkit which allows one to accomplish a wide range of cheminformatics tasks. You can build a molecule from SMILES, create 2D depictions, generate 3D conformations, do substructure searches, run chemical reactions, and much more. It comes with C++, Python, Java and C# bindings, so you can access its functionality from your favourite programming language. It features tight integration with the Jupyter notebook, so you can display your molecules in 2D and 3D interactively while you develop your Python workflow. In case you are not much of a programmer, a lot of the RDKit functionality is exposed through RDKit KNIME nodes. And, last but not least, the RDKit comes with a PostgreSQL cartridge which enables dealing with molecules in PostgreSQL databases.

Now you know why I love the RDKit, and I hope I managed to convince you to give it a go, if you haven’t already. There are a number of tutorials to get yourself started, and an amazing community of users which can help you out when you are stuck.

Cresset software incorporates the RDKit, which is mostly used to parse SMARTS queries: in Forge, Torch and Spark you may apply SMARTS filters to your molecule tables. In Spark you may also request certain SMARTS patterns to be, or not to be, included in the results; furthermore, the Torsion Library which analyses the torsional angles of the fragments retrieved by a Spark search is based on a database of SMARTS strings.

We also use the RDKit in our internal research projects, in Cresset Discovery Services, and occasionally to integrate or customize the functionality already available in Cresset desktop applications, command-line tools, and KNIME nodes.

Besides being RDKit users, we are also RDKit contributors. In 2015 we contributed a resonance structure enumerator, while at the 2016 RDKit User Group Meeting, which was hosted at the Novartis Campus in Basel, we presented some preliminary work on boosting RDKit molecular simulations through OpenMM.

OpenMM is an open-source toolkit for high-performance molecular simulations running on CPUs and GPUs. Originally developed in the Pande Lab at Stanford, it is currently supported also by other groups and individuals. OpenMM natively implements AMBER, CHARMM and AMOEBA force fields, which are focused on biological macromolecules, and provides support for implementing custom force fields. The RDKit natively implements MMFF94 and UFF force-fields. MMFF94 is a general-purpose, accurate force-field, while UFF is geared towards small molecules, and trades accuracy for wide chemistry coverage and speed. We thought that it would be interesting to:

  • implement MMFF94 in OpenMM
  • build an OpenMM interface into the RDKit, and
  • compare the performance of the native RDKit implementation of MMFF94 (CPU-only, single-threaded) with the OpenMM implementation (CPU and GPU, multi-threaded).

Even though OpenMM features outstanding support for custom force fields (it has a lexical parser for energy equations and can even compute their analytical derivatives), MMFF94 has rather complicated combination and scaling rules for non-bonded parameters, which required some tweaking of the OpenMM library to be implemented efficiently. I managed to implement the seven energy terms of MMFF94 on both CPU and GPU platforms using a combination of AMOEBA and custom forces.

Below (and on GitHub) you will find a Jupyter notebook showing a geometry optimization benchmark on a small protein, villin.

As you may appreciate going through the notebook, the increase in performance provided by this preliminary proof-of-concept implementation is impressive: OpenMM single and double precision are respectively 150 and 11 times faster than the RDKit implementation on a GeForce GTX 960 graphics card.

Our goal is now to code a native implementation of the MMFF94 and UFF force fields within OpenMM, and then provide the RDKit with an interface to the latter, in order to benefit from the speed-up. Possible applications include the automated minimization of protein-ligand complexes after docking, or the molecular dynamics simulation of small molecules in explicit solvent under periodic boundary conditions. The availability of the latter will be announced on the Cresset website and on the RDKit mailing list.

Here follows the Jupyter Notebook (see it on GitHub):

In [1]:
import sys
import math
import timeit
from rdkit import Chem
from rdkit.Chem import AllChem
from rdkit.Chem import Draw
from rdkit.Chem.Draw import IPythonConsole
import py3Dmol
from simtk.openmm import openmm

This is the villin headpiece as downloaded from the PDB:

In [2]:
villin = open('/home/paolo/pdb/2F4K.pdb', 'r').read()
p = py3Dmol.view(width = 400,height = 400)
p.addModel(villin, 'pdb')
p.setStyle({'cartoon': {'color':'spectrum'}})
p.zoomTo()
p.show()
Out[2]:
In [3]:
molOrig = Chem.MolFromPDBBlock(villin)
In [4]:
mol = Chem.AddHs(molOrig, addCoords = True)
In [5]:
pyMP = AllChem.MMFFGetMoleculeProperties(mol)
In [6]:
pyMP.SetMMFFVariant('MMFF94s')

Let’s create 4 forcefields, the “traditional” one and those spiced up with OpenMM, respectively using single and double precision CUDA kernels, and the multi-threaded single-precision CPU implementation.

In [7]:
for i in range(3):
    mol.AddConformer(Chem.Conformer(mol.GetConformer(0)), assignId = True)
In [8]:
platformNames = ['RDKit', 'OpenMM_CUDA_s', 'OpenMM_CUDA_d', 'OpenMM_CPU']
pyFF = {}
pyFF[platformNames[0]] = AllChem.MMFFGetMoleculeForceField(mol, pyMP, confId = 0)
for i in range(1, 4):
    pyFF[platformNames[i]] = AllChem.MMFFGetMoleculeOpenMMForceField(mol, pyMP, confId = i)

Now we instruct our RDKit interface to OpenMM to use the appropriate hardware platform:

In [9]:
pyFF['OpenMM_CUDA_s'].InitializeContext(
    'CUDA', {'Precision': 'single', 'DeviceName': 'GeForce GTX 960'})
In [10]:
pyFF['OpenMM_CUDA_d'].InitializeContext(
    'CUDA', {'Precision': 'double', 'DeviceName': 'GeForce GTX 960'})
In [11]:
pyFF['OpenMM_CPU'].InitializeContext('CPU')

These are the energies of the protein before minimization computed with the 4 methods; differences are negligible, as they should ideally be:

In [12]:
for name in platformNames:
    sys.stdout.write('{0:20s}{1:8.4f} kcal/mol\n'.format(name, pyFF[name].CalcEnergy()))
RDKit               826.8740 kcal/mol
OpenMM_CUDA_s       826.8734 kcal/mol
OpenMM_CUDA_d       826.8727 kcal/mol
OpenMM_CPU          826.8728 kcal/mol

Now we will carry out a geometry optimization with all methods, and take some timings.

The OpenMM minimizations in single precision bail out of the OpenMM L-BFGS minimizer with an LBFGSERR_MAXIMUMLINESEARCH error (-998) before the RMS gradient criterion kicks in. This is probably due to insufficient precision for the minimizer to behave correctly during the line search. Nonetheless, the energy values are not dramatically different from those computed by OpenMM using the GPU in double precision mode.

In [13]:
t = []
for i, name in enumerate(platformNames):
    ff = pyFF[name]
    t.append(timeit.timeit('ff.Minimize(maxIts = 100000, forceTol = 0.01)',
                      'from __main__ import ff', number = 1))
    sys.stdout.write('{0:20s}{1:8.4f} s ({2:.1f}x)\n'.format(name, t[i], t[0] / t[i]))
RDKit                82.7275 s (1.0x)
OpenMM_CUDA_s         0.5488 s (150.7x)
OpenMM_CUDA_d         7.3300 s (11.3x)
OpenMM_CPU           25.0867 s (3.3x)

The timings are impressive: OpenMM single and double precision are respectively 150 and 11 times faster than the RDKit implementation on a hyperthreading quad-core 3.40GHz Intel Core i7-3770 CPU equipped with a GeForce GTX 960 graphics card.

Also the multi-threaded OpenMM CPU implementation (single precision) scales very well, as it runs >3 times faster than the single-threaded RDKit implementation on the 8 virtual cores (4 physical) of our Core i7.
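The speed-up factors quoted above are simply the RDKit wall-clock time divided by each platform's time. As a quick check, the arithmetic can be reproduced from the timings printed by cell 13; the dictionary and variable names below are ours, and only the four timing values come from the benchmark.

```python
# Wall-clock times in seconds, copied from the output of cell 13.
timings = {
    "RDKit": 82.7275,
    "OpenMM_CUDA_s": 0.5488,
    "OpenMM_CUDA_d": 7.3300,
    "OpenMM_CPU": 25.0867,
}

baseline = timings["RDKit"]
# Speed-up = baseline time / platform time, so RDKit itself is 1.0x.
speedups = {name: baseline / seconds for name, seconds in timings.items()}

for name, factor in speedups.items():
    print('{0:20s}{1:6.1f}x'.format(name, factor))
```

Rounded to one decimal place this gives 150.7x, 11.3x and 3.3x for the three OpenMM platforms, matching the figures reported by the notebook.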

Energy values at the end of the minimization are comparable; the slight differences are most likely attributable to the different implementations of the L-BFGS minimizer in RDKit and OpenMM:

In [14]:
for name in platformNames:
    sys.stdout.write('{0:20s}{1:8.4f} kcal/mol\n'.format(name, pyFF[name].CalcEnergy()))
RDKit               -53.4757 kcal/mol
OpenMM_CUDA_s       -52.6213 kcal/mol
OpenMM_CUDA_d       -57.5980 kcal/mol
OpenMM_CPU          -52.6949 kcal/mol

If we look at the heavy-atom RMSD matrix across the 4 minimizations, we see that the result closest to the RDKit one is, as might be expected, the OpenMM double precision minimization (0.27 Å). The single precision results deviate more from RDKit, but the RMSD remains < 0.7 Å.

In [15]:
molNoH = Chem.RemoveHs(mol)
In [16]:
confsNoH = [molNoH.GetConformer(i) for i in range(4)]
In [17]:
for y in range(len(confsNoH)):
    if (y == 0):
        for name in [''] + platformNames:
            sys.stdout.write('{0:>16s}'.format(name))
        sys.stdout.write('\n')
    for x in range(len(confsNoH)):
        if (x == 0):
            sys.stdout.write('{0:>16s}'.format(platformNames[y]))
        if (x < y):
            sys.stdout.write('{0:16s}'.format(''))
        else:
            sys.stdout.write('{0:16.4f}'.format(
                AllChem.AlignMol(molNoH, molNoH, prbCid = x, refCid = y)))
    sys.stdout.write('\n')
                           RDKit   OpenMM_CUDA_s   OpenMM_CUDA_d      OpenMM_CPU
           RDKit          0.0000          0.6815          0.2669          0.6701
   OpenMM_CUDA_s                          0.0000          0.5457          0.0463
   OpenMM_CUDA_d                                          0.0000          0.5315
      OpenMM_CPU                                                          0.0000

This is the visual difference between RDKit and OpenMM single precision (largest deviation):

In [18]:
p = py3Dmol.view(width = 400,height = 400)
p.addModel(Chem.MolToPDBBlock(molNoH, confId = 0), 'pdb')
p.addModel(Chem.MolToPDBBlock(molNoH, confId = 1), 'pdb')
p.setStyle({'cartoon': {'color':'spectrum'}})
p.zoomTo()
p.show()
Out[18]:

And this is how RDKit and OpenMM double precision compare (smallest deviation):

In [19]:
p = py3Dmol.view(width = 400,height = 400)
p.addModel(Chem.MolToPDBBlock(molNoH, confId = 0), 'pdb')
p.addModel(Chem.MolToPDBBlock(molNoH, confId = 2), 'pdb')
p.setStyle({'cartoon': {'color':'spectrum'}})
p.zoomTo()
p.show()
Out[19]:

Call for beta testers for Flare, our new structure-based design application

Flare provides new insights for structure-based design by integrating cutting edge approaches from Cresset with significant open source and commercial methods.

Using Flare you will:

  • Gain vital knowledge of the electrostatic environment of the active site of your protein
  • Compare protein and ligand electrostatics to improve new molecule design
  • Study how the electrostatic pattern of the active site varies across closely related proteins
  • Use electrostatic patterns across a protein family to design more selective ligands
  • Understand the locations and stability of water in your protein using 3D RISM based on XED and AMBER force fields
  • Use energetically favourable water to influence the electrostatic properties of the active site and improve ligand design
  • Design new molecules and dock them into the active site using Lead Finder
  • Find the energetic hotspots in your protein using the WaterSwap methodology.

Flare will be available for beta testing in early February. If you would like to get involved then please contact us.

 

Docking Factor-Xa ligands with Lead Finder

Abstract

Lead Finder¹ is a protein-ligand docking tool for the virtual screening of molecules and quantitative evaluation of interactions between protein and ligands. In this case study, two different Lead Finder docking modes (standard and extra precision) were used in docking studies on a small number of Factor-Xa (FXa) protein-ligand complexes originally used in the CSAR 2014 benchmark exercise². Results show the robustness of Lead Finder at finding the bioactive conformation of the ligands when starting from a random conformation. In addition, both the standard and extra-precision docking modes work well at docking ligands; the latter gives tighter dockings and may highlight ligands with lower activity that do not fit into the active site.

Introduction

Lead Finder is a docking tool from BioMolTech³ which generates docked ligand poses starting from the 3D structure of a protein (either experimentally derived by X-ray, or modeled by homology) and one or more 3D ligand structures. Lead Finder assumes that the protein is rigid, and analyses the possible conformations of the ligand by rotating functional groups along each freely rotatable bond.
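The geometric operation behind exploring ligand conformations, rotating a group of atoms about a bond, can be written down compactly. The sketch below is a generic illustration using Rodrigues' rotation formula and is not taken from Lead Finder; the function name and the example coordinates are hypothetical.

```python
import math


def rotate_about_bond(point, a, b, angle_deg):
    """Rotate `point` by angle_deg around the axis running from atom a to atom b."""
    axis = [b[i] - a[i] for i in range(3)]
    norm = math.sqrt(sum(c * c for c in axis))
    k = [c / norm for c in axis]              # unit vector along the bond
    v = [point[i] - a[i] for i in range(3)]   # position relative to atom a
    theta = math.radians(angle_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    k_cross_v = [
        k[1] * v[2] - k[2] * v[1],
        k[2] * v[0] - k[0] * v[2],
        k[0] * v[1] - k[1] * v[0],
    ]
    k_dot_v = sum(k[i] * v[i] for i in range(3))
    # Rodrigues' formula: v*cos(t) + (k x v)*sin(t) + k*(k.v)*(1 - cos(t))
    rotated = [
        v[i] * cos_t + k_cross_v[i] * sin_t + k[i] * k_dot_v * (1.0 - cos_t)
        for i in range(3)
    ]
    return tuple(rotated[i] + a[i] for i in range(3))


# Rotating an atom at (1, 0, 0) by 90 degrees about a bond along the z axis
# moves it to (0, 1, 0), up to floating-point rounding.
new_position = rotate_about_bond((1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0), 90.0)
print(new_position)
```

Applying the same rotation to every atom on one side of the bond changes the torsion angle while leaving bond lengths and angles untouched.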

FXa has been the target of drug discovery efforts at many pharmaceutical companies, where structure-based design has been used extensively. For this reason, FXa has been frequently used to benchmark new methodologies in structure-based design.

In this case study, we sought to replicate the typical experiments performed with docking engines during the lead optimization phase of small molecule discovery. We used, and compared, two different Lead Finder docking modes (standard and extra precision) in two separate experiments. First we carried out self-docking studies on a small number of FXa protein-ligand complexes originally used in the CSAR 2014 benchmark exercise. Secondly we applied the two docking modes to a set of 45 related compounds, again taken from the CSAR 2014 dataset.

Lead Finder docking workflow

The ideal docking process with Lead Finder (Stage 1 in Figure 1) starts with an accurate protein preparation with BioMolTech’s Build Model.⁴ This includes:

  • addition of hydrogens to the heavy atoms of the protein, and assignment of optimal ionisation states of protein residues;
  • optimization of the spatial positions of polar hydrogen atoms to maximize hydrogen bond interactions and minimize steric strain;
  • optimization of side chain orientations of His, Asn and Gln residues for which X-ray analysis can return flipped orientations due to apparent symmetry.


Figure 1. A typical Lead Finder workflow.

Build Model uses an original graph-theoretical approach⁵ to assign optimal ionization states of protein residues at arbitrary pH conditions, which is based on the Screened Coulomb Potential (SCP) model.⁵,⁶

After completing protein preparation, an energy grid map (Stage 2) is calculated and saved for the protein binding site. This energy map is then used to dock the ligand structures.

The docking engine in Lead Finder (Stage 3) combines a genetic algorithm search with local optimization procedures, which makes Lead Finder efficient in the coarse sampling of ligand poses and the subsequent refinement of promising solutions.
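The general pattern of a coarse genetic algorithm search followed by local refinement can be shown on a toy problem. The sketch below is not Lead Finder's engine: it optimizes a single torsion-like angle on a made-up energy function, and the population size, mutation width and all function names are assumptions chosen for illustration.

```python
import math
import random


def energy(theta):
    """Hypothetical torsion-like energy with three minima; the lowest is at -pi/3."""
    return math.cos(3 * theta) - 0.5 * math.cos(theta + math.pi / 3)


def local_optimize(theta, step=0.05, iters=50):
    """Downhill refinement: move to a neighbouring value only if the energy drops."""
    for _ in range(iters):
        best = min((theta - step, theta, theta + step), key=energy)
        if best == theta:
            step /= 2  # no improvement at this step size, so search more finely
        theta = best
    return theta


def genetic_search(pop_size=20, generations=15, seed=0):
    """Coarse sampling with selection, crossover and mutation, then refinement."""
    rng = random.Random(seed)
    population = [rng.uniform(-math.pi, math.pi) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=energy)
        parents = population[: pop_size // 2]  # keep the fitter half unchanged
        children = []
        while len(children) < pop_size - len(parents):
            p1, p2 = rng.sample(parents, 2)
            # Crossover (average of two parents) plus Gaussian mutation.
            children.append(0.5 * (p1 + p2) + rng.gauss(0.0, 0.3))
        population = parents + children
    return local_optimize(min(population, key=energy))


best_theta = genetic_search()
print(best_theta, energy(best_theta))
```

The genetic stage only needs to land in the right energy basin; the local stage then tightens the answer, which is the division of labour described above.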

The standard docking mode provides an accurate and exhaustive search algorithm. However, in extra-precision mode, Lead Finder uses the most rigorous sampling and scoring algorithms to increase accuracy and reliability of predictions at the cost of slightly slower speed of processing.

Scoring functions¹ in Lead Finder (Stage 4) are based on a semi-empirical molecular mechanics functional that explicitly accounts for various types of molecular interactions. Individual energy contributions are scaled with empirical coefficients to produce three scoring functions tailored for:

  • correct energy-ranking of docked ligand poses (Rank-score);
  • correct rank-ordering of active and inactive compounds in virtual screening experiments (VS-score);
  • binding energy predictions (dG-score).

In this study we concentrated on the poses that were generated and hence were focused on the Rank-score function.

Method

Initially we used a crystal structure of the FXa protein in complex with compound GCT000006 (GCT, PDB ID: 4ZH8). As can be seen in Figure 2, the 6-chloronaphth-2-yl group of GCT binds into the S1 primary specificity pocket, while the morpholino group occupies the aromatic box (Tyr99, Trp215, Phe174) of the S4 pocket.

 

Figure 2. Structure of the FXa protein in complex with the GCT ligand.

The protein was prepared with the default options of Build Model, in which the ligand is removed from the active site and the water molecules are retained. The coordinates of the ligand were then used to define the bounding box for the calculation of the energy grid maps.
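Defining a box from ligand coordinates amounts to taking the minimum and maximum along each axis. The sketch below shows that calculation in plain Python; it is not Lead Finder's own procedure, and the `padding` margin, the function name and the example coordinates are hypothetical, since the text does not say how much the box extends beyond the ligand.

```python
def bounding_box(coords, padding=0.0):
    """Return (lower, upper) corners of the axis-aligned box enclosing coords.

    `padding` (in the same units as the coordinates) is an assumed margin
    added on every side so the box extends beyond the ligand itself.
    """
    xs, ys, zs = zip(*coords)
    lower = (min(xs) - padding, min(ys) - padding, min(zs) - padding)
    upper = (max(xs) + padding, max(ys) + padding, max(zs) + padding)
    return lower, upper


# Three made-up ligand atom positions in angstroms.
ligand_coords = [(1.0, 2.0, 3.0), (4.0, 0.0, 5.0), (2.0, 6.0, 1.0)]

lower, upper = bounding_box(ligand_coords, padding=4.0)
print(lower, upper)
```

Using the crystallographic ligand in this way ties the search volume to a known binding site rather than to the whole protein.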

Self-docking experiment

We started by re-docking GCT to the 4ZH8 crystal structure to assess the ability of Lead Finder to correctly identify its bioactive conformation. In order to avoid bias in the self-docking experiment, the 3D conformation of GCT was flattened to 2D and then converted back into 3D using Cresset’s XedConvert⁷. A minimization with Cresset’s XedMin⁷ followed to relax the ligand to a local minimum. The GCT ligand was then docked to the protein PDB 4ZH8 using the standard docking mode and the extra-precision docking mode.

Protein-ligand docking

A sub-set of 45 small molecules from the CSAR 2014 dataset with known activity against FXa (Table 1) were docked to the crystal structure 4ZH8.

Most of these ligands have in common a chlorinated mono or polyaromatic group and a morpholino group. All ligands were converted into 2D and then back to 3D with XedConvert and subsequently minimized with XedMin. The crystallographic ligand was used to define the bounding box for the energy grid maps. The 45 ligands were then docked to the protein using the standard docking mode and the extra-precision mode.


Table 1. Representative structures for 45 ligands used in the docking study.
 


Figure 3. Lead Finder self-docking experiment on 4ZH8 using the standard (top row) and extra precision (bottom row) docking modes. The RMSD (in Å) between the docked pose (thick sticks) and the X-ray coordinates of GCT (thin sticks) is reported for each pose.

Results

Self-docking

When using the standard docking mode and the extra-precision mode, Lead Finder outputs up to 10 best poses (if available) ranked in order of increasing rank score.

Figure 3 shows the five top ranking poses of GCT obtained using the Lead Finder standard (top row) and extra precision (bottom row) docking modes. The poses are ordered from the best ranking (left) to the worst ranking (right).

As can be seen in this picture, the five top ranking poses for both standard docking mode and extra-precision mode are closely aligned to the X-ray conformation of the ligand, correctly orienting the naphthalene ring of GCT into the S1 binding pocket.

In terms of RMSD, the values obtained with the two modes are similar, with each method able to find a solution within 2 Å RMSD of the X-ray pose in the top 5 results. However, the extra-precision mode finds this result at position 2 rather than 4, and the pose is very close to the X-ray ligand (RMSD 1.44 Å) with a single R-group oriented differently: a small but potentially significant improvement.
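The success criterion used here, the rank of the first pose within 2 Å of the X-ray pose, is easy to express in code. The sketch below is a generic illustration: the function names and coordinates are ours, and in the example list only the 1.44 Å value at rank 2 comes from the text, while the other RMSD values are made up. The RMSD is computed directly in the protein's coordinate frame, with no superposition of the two poses.

```python
import math


def rmsd(pose, reference):
    """Root-mean-square deviation over matched atoms, without re-alignment."""
    if len(pose) != len(reference):
        raise ValueError("pose and reference must have the same number of atoms")
    squared = sum(
        (p[i] - r[i]) ** 2 for p, r in zip(pose, reference) for i in range(3)
    )
    return math.sqrt(squared / len(pose))


def first_rank_within(rmsds, cutoff=2.0):
    """Return the 1-based rank of the first pose at or below the cutoff, else None."""
    for rank, value in enumerate(rmsds, start=1):
        if value <= cutoff:
            return rank
    return None


# Two-atom toy example: every atom is displaced by 1 angstrom along z.
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
pose = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
toy_rmsd = rmsd(pose, reference)

# Poses listed best rank first; only the 1.44 value is taken from the text.
ranked_rmsds = [2.5, 1.44, 3.0]
success_rank = first_rank_within(ranked_rmsds)
print(toy_rmsd, success_rank)
```

Reporting the rank of the first successful pose, rather than only the best RMSD found, reflects how useful the scoring function is at putting a correct pose near the top of the list.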

Figure 4 shows the self-docking of other FXa proteins (4ZHA, 4Y7A and 4Y79) performed with the standard docking mode and with the extra-precision mode.

For these less flexible ligands, the extra-precision mode seems to have little effect on the RMSD of the results. Both modes are again able to identify the correct orientation of the ligand in the FXa active site, with an RMSD within 2 Å for the top-scoring pose.

Figure 4. Self-docking experiment on 4ZHA, 4Y7A and 4Y79 using Lead Finder’s extra-precision mode. The RMSD (in Å) between the top-scoring docked pose (thick sticks) and the X-ray coordinates of the native ligand (thin sticks) is reported for each pose.

Protein-ligand docking

Figure 5 shows a side-by-side comparison of the superimposed top-ranking poses for the 45 FXa ligands docked into the 4ZH8 protein using standard (left) and extra precision (right) docking modes.

For the standard mode, the majority of ligands are docked with the naphthalene ring correctly pointing down into the S1 binding site. However, one ligand is not docked as expected, with the pyrrolidine group pointing to the outside of the protein (GCT98A). Interestingly, this compound is the one with the lowest activity (pIC50 6.2) in the dataset.

When using the extra-precision mode, the docked poses in general look tidier, even though two ligands are docked with the naphthalene group pointing outside of the S1 pocket: one is again GCT98A, and the other is GCT44A, the compound in the dataset with the second lowest activity (pIC50 = 6.4). These findings seem to indicate that Lead Finder may be able to provide useful suggestions for discriminating between active and non-active compounds.

Figure 5. Docking FXa ligands to 4ZH8 using Lead Finder’s standard (left) and extra precision (right) docking modes.

Conclusion

This case study shows a typical Lead Finder docking workflow and demonstrates the robustness of the program by means of several self-docking experiments. Results show that Lead Finder does a good job of finding the bioactive conformation of flexible ligands when started from a random conformation. In addition, we explored two docking modes (standard and extra-precision) to dock a subset of FXa ligands from CSAR 2014. While both methods seem to work well at generating sensibly aligned poses, the extra-precision mode provides tighter dockings and may be able to highlight ligands with lower activity which may not fit into the active site.

References

  1. Stroganov et al., Lead Finder: An Approach to Improve Accuracy of Protein-Ligand Docking, Binding Energy Estimation, and Virtual Screening, J. Chem. Inf. Model. 2008; 48, 2371-2385.
  2. Carlson et al., CSAR 2014: A Benchmark Exercise Using Unpublished Data from Pharma, J. Chem. Inf. Model. 2016; 56, 1063-1077.
  3. http://www.biomoltech.com/
  4. Stroganov et al., TSAR, a new graph-theoretical approach to computational modeling of protein side-chain flexibility: Modeling of ionization properties of proteins, Proteins, 2011; 79, 2693-2710.
  5. L. Mehler, Self-Consistent, Free Energy Based Approximation To Calculate pH Dependent Electrostatic Effects in Proteins, J. Phys. Chem. 1996; 100, 16006-16018.
  6. L. Mehler and F. Guarnieri, A Self-Consistent, Microenvironment Modulated Screened Coulomb Potential Approximation to Calculate pH-Dependent Electrostatic Effects in Proteins, Biophysical Journal, 1999, 75, 3–22.
  7. http://www.cresset-group.com/products/xedtools/

November release of Spark reagent databases now available

The November release of the Spark reagent databases derived from eMolecules is now available.

As announced in the October newsletter, Spark users can now benefit from monthly releases of reagent databases derived from eMolecules’ building blocks collection. The rolling updates are intended to provide the very best availability information on the reagents that you wish to employ.

The updated databases can be downloaded now through the Spark Database update widget (see the instructions on the installing Spark databases page) or with a command-line utility such as wget (please contact us for details).

Dr Scoffin, Cresset CEO, chairs and presents at International Congress of Medichem, Nanjing, China, 16-19 November 2016

Cambridge, UK – 15th November 2016 – Cresset, innovative provider of software and contract services for small molecule discovery and design, is pleased to announce that this week, Dr Robert Scoffin, CEO, will chair the ‘Sym 203: Pioneering Screening Technologies for Lead Compounds’ session at the Annual Congress of Medichem in Nanjing, China. Dr Scoffin will also present an overview of virtual screening methods for drug discovery.

The virtual screening of molecules is a commonly used technique within pharmaceutical drug discovery, and has many applications including the selection of compounds for ‘wet screening’ and the design of novel libraries of compounds.

“Cresset is a market leader in virtual screening,” says Dr Scoffin. “Blaze is an effective ligand-based virtual screening platform used by pharmaceutical companies globally. Blaze is also used by Cresset Discovery Services who carry out virtual screening in many consultancy projects. I am looking forward to sharing the various methods employed within a virtual screening cascade, including 2D methods, 3D ligand-based methods and 3D structure-based methods, and describing how each of these contributes to the overall value of the virtual screening process.”

Dr Robert Scoffin

Review of Cresset seminar at the Chem-Bio Informatics Society Annual Meeting 2016, Japan

 Over 60 delegates attended this seminar, hosted by Cresset and Level Five (Cresset’s distributor in Japan) during the CBI Annual Meeting 2016, Tokyo, October 25th-27th.

Dr Giovanna Tedesco presented ‘A picture tells a thousand words: Using activity cliffs maps to understand SAR’. She focused on the combined use of activity cliffs analysis, as implemented in Activity Atlas and Activity Miner, as a useful method to analyze the Structure-Activity Relationships (SAR) of data sets of different size and complexity.

Summarizing SAR into simple, interpretable maps

Activity Atlas is a novel, qualitative method available in Forge, Cresset’s powerful workbench for ligand design and SAR analysis. It uses a probabilistic approach to take a global view of the data in a qualitative manner, analyzing the SAR of a set of aligned compounds as a function of their 3D electrostatic, hydrophobic and shape properties. Results are displayed as highly visual 3D activity cliffs summary maps that give an overview of the SAR landscape, focusing on the prevalent SAR signals.

Rapid navigation of complex SAR, highlighting key activity changes

Activity Miner, a module of Forge (and optional module in Torch), enables rapid navigation of activity cliffs. It can be used to drill down into the Activity Atlas maps and understand subtle molecule-to-molecule structure-activity changes, identifying potential outliers and giving concrete examples of changes that generate the summary.

The information derived from these analyses is an invaluable aid for drug discovery projects, helping to inform design decisions and prioritize new molecules for synthesis.

Case studies

The application of the methods presented was illustrated on example data sets of different size and complexity taken from the literature. The case studies shown include:


Cresset seminar at Chem-Bio Informatics Society Annual Meeting 2016, Japan
Cresset stand at Chem-Bio Informatics Society Annual Meeting 2016, Japan

What’s great about Lead Finder?

We recently announced our collaboration with BioMolTech, a small modeling software company best known for their docking software, Lead Finder. Cresset has been traditionally focused on ligand-based design, but as we expand our capabilities into more structure-based methods we realized that we would have to supply a robust and accurate docking method to our customers. So, why did we choose Lead Finder?


A graphical interface to Lead Finder will be included in our new structure-based design application.

The requirements for a great docking engine are simple to state: it needs to be fast and it needs to be accurate. The latter is by far the most important: nobody cares how quickly you got the answer if it is wrong! Our first question when evaluating docking methods was therefore how good each one was. This is actually a difficult question to answer, as there are several different definitions of ‘good’ depending on what you want: virtual screening enrichment? Good pose prediction? Accurate ranking of active molecules?

The first of these, virtual screening, is what most people think of when they think of docking success. Lead Finder has been validated on a wide variety of target classes and shows excellent enrichment rates (median ROC value across 34 protein targets was 0.94), even on targets traditionally seen as very hard such as PPAR-γ. The performance on kinases was uniformly excellent, with ROC values ranging from 0.86 for fibroblast growth factor receptor kinase (FGFR) to 0.96 for tyrosine kinase c-Src.
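To make the ROC values quoted above concrete, the sketch below computes the area under the ROC curve for a ranked list of docked compounds. It is a minimal illustration and not the protocol used in the Lead Finder validation: the function name is hypothetical, the scores are toy values, and it assumes that more negative scores indicate better predicted binders (set `lower_is_better=False` for the opposite convention).

```python
def roc_auc(scores, is_active, lower_is_better=True):
    """Area under the ROC curve for a virtual screening ranking.

    Computed as the fraction of (active, decoy) pairs in which the active
    is ranked ahead of the decoy; tied scores count as half a win.
    """
    actives = [s for s, flag in zip(scores, is_active) if flag]
    decoys = [s for s, flag in zip(scores, is_active) if not flag]
    if not actives or not decoys:
        raise ValueError("need at least one active and one decoy")

    wins = 0.0
    for a in actives:
        for d in decoys:
            if a == d:
                wins += 0.5
            elif (a < d) == lower_is_better:
                wins += 1.0
    return wins / (len(actives) * len(decoys))


# Toy data: two actives and two decoys with made-up docking scores.
toy_scores = [-9.0, -8.0, -7.0, -6.0]
toy_labels = [True, False, True, False]
toy_auc = roc_auc(toy_scores, toy_labels)
```

A value of 0.5 corresponds to random ranking and 1.0 to every active being ranked ahead of every decoy, which is why a median of 0.94 indicates strong enrichment. The pairwise loop is quadratic in the number of compounds; for large screens a rank-based implementation from a statistics library would be the practical choice.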


A series of SYK ligands docked to PDB 4yjq with crystal ligand shown in purple.

Pose prediction is of more interest to those working in the lead optimization phase, where assessing the likely bound conformation of a newly-proposed structure can be very helpful in guiding design. Here, too, Lead Finder performs well. On the widely-used Astex Diverse Set, used to test docking performance, Lead Finder produces the correct pose as the top-scoring result 82% of the time, which is comparable to other state-of-the-art methods (Gold, for example, gets 81% on the same measure). On a number of literature data sets testing self-docking performance Lead Finder finds the correct pose between 81 and 96% of the time, which is excellent.


Lead Finder includes dedicated modes for extra-precision and virtual screening experiments.

One of the most intriguing things about Lead Finder is the makeup of the scoring functions. In contrast to many other scoring functions which use heuristic or knowledge-based potentials, the Lead Finder scoring functions comprise a set of physics-based potentials describing electrostatics, hydrogen bonding, desolvation energy, entropic losses on binding and so on. Different scoring functions can be obtained by weighting these contributions differently: BioMolTech have found that the optimal weights for pose prediction differ slightly from those for energy prediction, for example. A separate scoring function has been developed which aims to compute a ΔG of binding given a correct pose. This is a difficult task, and the success of the Lead Finder function was demonstrated in the 2010 CSAR blind challenge, where the binding energy of 343 protein-ligand complexes had to be predicted ab initio. Lead Finder was the best-performing docking method in that challenge. BioMolTech are actively building on this excellent result with the aim of making robust and reliable activity predictions a standard outcome of a Lead Finder experiment.
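The idea of deriving several scoring functions from one set of energy terms by re-weighting them can be sketched as follows. This is purely illustrative: the term names follow the contributions listed above, but the weights, the per-term values and the function name are invented for the example and are not Lead Finder's actual parameters or functional form.

```python
# Hypothetical per-term contributions for one docked pose (arbitrary units).
example_terms = {
    "electrostatics": -2.0,
    "hydrogen_bonding": -3.0,
    "desolvation": 1.5,
    "entropy_loss": 1.0,
}

# Two invented weight sets: one tuned for ranking poses, one for estimating
# binding energy. Real weights would be fitted to experimental data.
POSE_WEIGHTS = {
    "electrostatics": 1.0,
    "hydrogen_bonding": 1.0,
    "desolvation": 1.0,
    "entropy_loss": 1.0,
}
ENERGY_WEIGHTS = {
    "electrostatics": 0.5,
    "hydrogen_bonding": 1.0,
    "desolvation": 1.0,
    "entropy_loss": 2.0,
}


def weighted_score(terms, weights):
    """Linear combination of energy terms; every term must have a weight."""
    missing = set(terms) - set(weights)
    if missing:
        raise KeyError(f"no weight defined for: {sorted(missing)}")
    return sum(weights[name] * value for name, value in terms.items())


pose_score = weighted_score(example_terms, POSE_WEIGHTS)
energy_score = weighted_score(example_terms, ENERGY_WEIGHTS)
```

The same pose receives two different scores depending on the weight set, which is the sense in which a pose-ranking function and a binding-energy function can share the same underlying physics-based terms.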

Cresset are proud to be the worldwide distributors for Lead Finder. It is available today as a command-line application and will be built into Cresset’s upcoming structure-based drug design workbench.

Request an evaluation of Lead Finder.