Blaze used in discovery of allosteric modulators of the high affinity choline transporter

A variety of neurological conditions can potentially be treated through the stimulation of cholinergic neurotransmission. The choline uptake into certain neurons is mediated by the choline transporter (CHT), which is well-characterized but otherwise unexplored as a potential drug target.

A team consisting of scientists from Pfizer, Neusentis, Nanion Technologies, and Kissei Pharmaceutical Company used two compound sets: (1) a specially created set of 887 molecules derived from the full Pfizer compound screening collection using Cresset’s virtual screening tool Blaze; (2) 2,753 molecules from the Pfizer Chemogenomic Library. From these sets they were able to identify nine active small molecules that modulate CHT.

This work will enable them to test the hypothesis that positive modulation of CHT will enhance activity-dependent cholinergic signaling. Read the full paper: ‘Discovery of Compounds that Positively Modulate the High Affinity Choline Transporter’.

Using Blaze to develop a screening set from a corporate compound library

The team had identified two CHT modulators from the literature: one CHT positive allosteric modulator and one CHT negative allosteric modulator. Each of these was used within Blaze to search the full Pfizer compound screening collection for compounds with similar electrostatic and shape properties and therefore potentially similar biological activity.

The computational team kept the top 500 compounds from each virtual screen, based on the Blaze scoring function, to form a set of 1,000 compounds. This set was filtered based on compound availability and the removal of chemically unattractive groups, resulting in a test set of 887 compounds. This library was screened in assays, as detailed in the paper.
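The selection described above can be sketched in a few lines of Python. This is an illustration only, not the workflow used in the paper: the function name, the (compound_id, score) tuples and the two filter predicates are assumptions made for the example, and it assumes that a higher score means a better match.

```python
def build_screening_set(screen_a, screen_b, is_available, is_attractive, top_n=500):
    """Merge the top-N hits of two virtual screens, then filter the result.

    screen_a, screen_b: lists of (compound_id, score); higher score = better (assumed).
    is_available, is_attractive: hypothetical predicates taking a compound_id.
    """
    selected = []
    seen = set()
    for screen in (screen_a, screen_b):
        ranked = sorted(screen, key=lambda hit: hit[1], reverse=True)
        for compound_id, _score in ranked[:top_n]:
            if compound_id not in seen:  # a compound may rank highly in both screens
                seen.add(compound_id)
                selected.append(compound_id)
    # Drop compounds that cannot be sourced or that carry unattractive groups.
    return [c for c in selected if is_available(c) and is_attractive(c)]


# Toy data: two screens of four compounds each, keeping the top two from each.
pam_screen = [("A", 0.91), ("B", 0.85), ("C", 0.40), ("D", 0.10)]
nam_screen = [("E", 0.88), ("B", 0.80), ("F", 0.35), ("G", 0.20)]
unavailable = {"E"}
picked = build_screening_set(
    pam_screen,
    nam_screen,
    is_available=lambda c: c not in unavailable,
    is_attractive=lambda c: True,
    top_n=2,
)
print(picked)
```

In the toy data, compound B appears in both top lists and is kept once, while E is dropped by the availability filter.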

Identification of previously unknown active and structurally distinct molecules

Five compounds of interest were identified from the 887 test set created using Cresset’s Blaze. Three of these were confirmed as positive allosteric CHT modulators and two as negative allosteric modulators of CHT function. A further four compounds of interest were identified from the 2,753 molecules from the Pfizer Chemogenomic Library. The compounds of interest are shown in Table 2 ‘Tool compound data’ which forms part of the paper.

This paper demonstrates the high value of virtual screening in focusing a screening campaign. The team successfully identified previously unknown active and structurally distinct molecules that could be used as tools to further explore CHT biology or as a starting point for further medicinal chemistry.


Selected images from Blaze results with purported CHT modulator seed molecules (PAM MKC-351 and NAM ML-352) (green) shown on the left and output molecules 1-5 shown on the right (grey). Fields are shown with positive (red), negative (cyan), van der Waals (yellow), and hydrophobic (orange) regions.

Conduct ligand-protein docking

A long-standing customer of Cresset Discovery Services asked us to identify new compounds that could be active at their protein target. We conducted ligand-protein docking to narrow down their 50k compound library to the best 1.5k compounds. The cost of the consulting project plus the chemistry for 1.5k compounds was about 20% of what it would have cost to buy and screen the entire 50k library.

Ligand-protein docking can be an excellent way to build up knowledge about the binding pocket. It can also form the basis for a virtual screen to identify new active compounds.

Cresset Discovery Services had been working with this customer on a particular ligand for some time, but there was very little information available about the protein target. There were homologues in the literature, but they were distantly related and nothing very similar had been crystallized.

Detailed preparatory work to model the protein active site

It was necessary to do a lot of modeling work to build up the relationship between the human target and the distantly related proteins available from the literature. We built sequence alignments and compared them, enabling us to build up 3D models of the target and its interaction with the ligand.

Some mutagenesis data was available on the known ligands, so we were able to use this to refine the 3D models and check that the correct residues were in the right places on the active site. This enabled us to define the active site for the ligands. We went on to calculate the energies for the protein-ligand interactions to make sure we had identified poses that made sense.

This was a complex system that required a great deal of protein preparation. This preparatory work was essential for successful docking and required expert knowledge, experience and skill.

Docking and virtual screening using different scenarios

At the end of this process we had a good model of the protein-ligand system. The next step was to remove the ligand and carry out docking.

Docking was first tested on the molecules that were known to bind to the target. This resulted in excellent retrieval rates, suggesting that the model would also be able to retrieve new active compounds.

There were a number of different binding sites on the protein so we decided to carry out the virtual screening using different scenarios for the protein. We:

  • Kept the ligand intact in the binding site
  • Removed the ligand completely
  • Looked at partly bound situations and un-bound situations for each of the binding sites.

The customer provided us with a set of 50k ligands and we docked each of these against the binding pockets. A docking scoring system was used to rank the top 2k compounds from each of the screens.
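As a rough sketch of the ranking step, the snippet below keeps the best-scoring ligands for each protein scenario. The scenario names, ligand identifiers and scores are invented for illustration, and the sketch assumes that lower (more negative) docking scores are better; the scoring system actually used in the project is not described here.

```python
import heapq


def top_per_scenario(scores_by_scenario, n):
    """Return the n best-scoring ligand ids for each protein scenario.

    scores_by_scenario: {scenario_name: {ligand_id: docking_score}}
    Assumes lower (more negative) scores are better.
    """
    return {
        scenario: heapq.nsmallest(n, scores, key=scores.get)
        for scenario, scores in scores_by_scenario.items()
    }


# Hypothetical scores for three ligands in two scenarios.
scores = {
    "ligand_bound": {"L1": -9.2, "L2": -7.1, "L3": -8.4},
    "ligand_removed": {"L1": -6.0, "L2": -9.9, "L3": -7.5},
}
shortlists = top_per_scenario(scores, n=2)
print(shortlists)
```

Keeping a separate shortlist per scenario preserves ligands that only score well in one binding situation.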

Analyzing the results and compiling a purchasing list

The top 2k compounds from the four screens were analysed in detail. We visualized every one of the top 2k compounds and looked at each of the docking poses. The docking gave us good geometries for the ligands and we used Cresset software to check that the electrostatics made sense. Any compounds that were unlikely to bind well were rejected.

A final, ranked list was provided to the customer with a very high degree of confidence that it included compounds that were active at the protein target. They were able to procure about 75% of the compounds from the hit list, giving them a final set of 1.5k compounds to test.

An incredible saving in time and money

Carrying out virtual screening to focus the library in this way represented an incredible saving in time and money for our customer. The alternative approach would have been to buy and test the whole 50k compound set. Not only would the customer have needed to purchase all of the compounds, they would also have had to ship, store, plate and screen them, and then still analyse the results.

The estimated cost of doing this for all 50k compounds would have been about five times the cost of the combined tasks of the Cresset Discovery Services project plus buying and testing 1.5k compounds.

Cost of {buying and testing 50k compounds} = 5 × Cost of {Cresset Discovery Services project + buying and testing 1.5k hit list}

Contact us to find out how we can add value to your project.

Dr Martin Slater, Director of Consulting Services

What’s great about Lead Finder?

We recently announced our collaboration with BioMolTech, a small modeling software company best known for their docking software, Lead Finder. Cresset has been traditionally focused on ligand-based design, but as we expand our capabilities into more structure-based methods we realized that we would have to supply a robust and accurate docking method to our customers. So, why did we choose Lead Finder?

A graphical interface to Lead Finder will be included on our new structure-based design application.

The requirements for a great docking engine are simple to state: it needs to be fast and it needs to be accurate. The latter is by far the most important: nobody cares how quickly you got the answer if it is wrong! Our first question when evaluating docking methods was therefore to ask how good each one was. This is actually a difficult question to answer, as there are several different definitions of ‘good’ depending on what you want: virtual screening enrichment? Good pose prediction? Accurate ranking of active molecules?

The first of these, virtual screening, is what most people think of when they think of docking success. Lead Finder has been validated on a wide variety of target classes and shows excellent enrichment rates (median ROC value across 34 protein targets was 0.94), even on targets traditionally seen as very hard such as PPAR-γ. The performance on kinases was uniformly excellent, with ROC values ranging from 0.86 for fibroblast growth factor receptor kinase (FGFR) to 0.96 for tyrosine kinase c-Src.
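For readers unfamiliar with the ROC values quoted above, here is a minimal sketch of how such a value (the area under the ROC curve) can be computed for a ranked screen. It is not Lead Finder code: it uses the standard pairwise definition, the toy scores and labels are invented, and it assumes that a higher score means a more confident prediction of activity.

```python
def roc_auc(scores, is_active):
    """Area under the ROC curve from scores and active/decoy labels.

    Equals the probability that a randomly chosen active outscores a
    randomly chosen decoy, with ties counted as half.
    """
    actives = [s for s, a in zip(scores, is_active) if a]
    decoys = [s for s, a in zip(scores, is_active) if not a]
    if not actives or not decoys:
        raise ValueError("need at least one active and one decoy")
    wins = 0.0
    for a in actives:
        for d in decoys:
            if a > d:
                wins += 1.0
            elif a == d:
                wins += 0.5
    return wins / (len(actives) * len(decoys))


# Toy screen: three actives and three decoys.
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
toy_labels = [True, True, False, True, False, False]
auc = roc_auc(toy_scores, toy_labels)
print(round(auc, 3))
```

A value of 0.5 corresponds to random ranking and 1.0 to every active being ranked above every decoy.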

A series of SYK ligands docked to PDB 4yjq with crystal ligand shown in purple.

Pose prediction is of more interest to those working in the lead optimization phase, where assessing the likely bound conformation of a newly-proposed structure can be very helpful in guiding design. Here, too, Lead Finder performs well. On the widely-used Astex Diverse Set, used to test docking performance, Lead Finder produces the correct pose as the top-scoring result 82% of the time, which is comparable to other state-of-the-art methods (Gold, for example, gets 81% on the same measure). On a number of literature data sets testing self-docking performance Lead Finder finds the correct pose between 81 and 96% of the time, which is excellent.
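The pose-prediction percentages above are success rates over a set of complexes. A minimal sketch of that bookkeeping follows. The 2.0 Å RMSD cut-off for a ‘correct’ pose is a common convention assumed here, not something stated in this post, and the sketch assumes matching atom order with no symmetry correction.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally ordered coordinate lists."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and the same length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


def top_pose_success_rate(cases, threshold=2.0):
    """Fraction of cases whose top-scoring pose lies within threshold of the crystal pose.

    cases: list of (top_pose_coords, crystal_coords) pairs in the same frame.
    threshold: assumed cut-off in angstroms.
    """
    hits = sum(1 for pose, crystal in cases if rmsd(pose, crystal) <= threshold)
    return hits / len(cases)


# Two toy two-atom ligands: one pose shifted by 0.5 A, the other by 3.0 A.
crystal = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
near = [(x + 0.5, y, z) for x, y, z in crystal]
far = [(x + 3.0, y, z) for x, y, z in crystal]
rate = top_pose_success_rate([(near, crystal), (far, crystal)])
print(rate)
```

No superposition is performed because docked poses and the crystal ligand share the protein's coordinate frame.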

Lead Finder includes dedicated modes for extra-precision and virtual screening experiments.

One of the most intriguing things about Lead Finder is the makeup of the scoring functions. In contrast to many other scoring functions which use heuristic or knowledge-based potentials, the Lead Finder scoring functions comprise a set of physics-based potentials describing electrostatics, hydrogen bonding, desolvation energy, entropic losses on binding and so on. Different scoring functions can be obtained by weighting these contributions differently: BioMolTech have found that the optimal weights for pose prediction differ slightly from those for energy prediction, for example. A separate scoring function has been developed which aims to compute a ΔG of binding given a correct pose. This is a difficult task, and the success of the Lead Finder function was demonstrated in the 2010 CSAR blind challenge, where the binding energy of 343 protein-ligand complexes had to be predicted ab initio. Lead Finder was the best-performing docking method in that challenge. BioMolTech are actively building on this excellent result with the aim of making robust and reliable activity predictions a standard outcome of a Lead Finder experiment.
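The idea of deriving several scoring functions from one set of energy terms can be illustrated with a simple weighted sum. Everything numeric below is hypothetical: the term values and both weight sets are invented for the example and are not Lead Finder's actual parameters or functional form.

```python
def weighted_score(terms, weights):
    """Combine per-term energies into one score using a set of weights.

    terms:   {term_name: energy contribution for one pose}
    weights: {term_name: weight}; every weighted term must be present in terms.
    """
    missing = set(weights) - set(terms)
    if missing:
        raise KeyError(f"missing energy terms: {sorted(missing)}")
    return sum(weights[name] * terms[name] for name in weights)


# Hypothetical energy terms for a single docked pose (arbitrary units).
pose_terms = {
    "electrostatics": -3.0,
    "hydrogen_bonding": -2.0,
    "desolvation": 1.5,
    "entropy_loss": 1.0,
}

# Two hypothetical weight sets sharing the same underlying terms.
POSE_WEIGHTS = {"electrostatics": 1.0, "hydrogen_bonding": 1.2,
                "desolvation": 0.8, "entropy_loss": 0.5}
ENERGY_WEIGHTS = {"electrostatics": 1.0, "hydrogen_bonding": 1.0,
                  "desolvation": 1.0, "entropy_loss": 1.0}

pose_score = weighted_score(pose_terms, POSE_WEIGHTS)
energy_score = weighted_score(pose_terms, ENERGY_WEIGHTS)
print(pose_score, energy_score)
```

The same pose receives different scores depending on which weight set is applied, which is the sense in which one set of potentials yields several task-specific scoring functions.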

Cresset are proud to be the worldwide distributors for Lead Finder. It is available today as a command-line application and will be built into Cresset’s upcoming structure-based drug design workbench.

Request an evaluation of Lead Finder.

What’s in the CDS virtual screening toolbox?

Cresset is very well known for providing fast and accurate ligand-based virtual screening through Blaze. We have now added the Lead Finder docking engine to our virtual screening toolbox, giving Cresset Discovery Services (CDS) the most comprehensive virtual screening capabilities available anywhere in the industry.

Based on an informal survey of our contacts and customers, I estimate that something like 50% of all current pharma SME projects are ‘structure enabled’. Lead discovery and lead optimization are driven through the use of in-house structures, public structures (typically from the PDB) and homology models. These structures inform lead optimization programs by explaining observed SAR and providing feedback and a detailed context for the design of further analogues.

CDS routinely uses the Cresset software Blaze for ligand-based virtual screening. Although we had access to structure-based methods, we are pleased to have brought Lead Finder in-house, giving us full capability in conducting ligand-protein docking.

Ligand-based virtual screening with Blaze

Virtual screening with Blaze remains one of the most consistently requested projects for CDS. What makes Blaze extremely useful for our customers is:

  • Virtual screening is probably the only way to really sample adequate chemical diversity
  • Virtual screens are far more cost effective than wet HTS
  • Excellent enrichments can be achieved
  • The chemotype diversity in the output is second to none.

Blaze also relies on two very simple premises:

  1. A bioactive conformation encodes, in its shape and electrostatic field, the properties, recognition features and solvation pattern optimised for interaction with its protein target site.
  2. A molecule conformation with increasing ‘shape and field’ similarity to that bioactive conformation has an increasing probability of also being active.

So, other than whether the query truly was the ‘bioactive conformation’, the key determinant of real activity in a hit list is often just how relevant the matching hit conformation is and how well it is represented in that molecule’s conformer population. This is fundamentally why our ligand-centric screening invariably works extremely well. Because a molecule can adopt a similar shape, and project the same electrostatic patterns, from a completely different chemical architecture, the output is very diverse.

Structure-based virtual screening with Lead Finder

The Lead Finder software has been developed to provide cutting-edge docking for an array of typical tasks, from high-throughput virtual screening to best-in-class prediction of bioactive conformations to accurate prediction of binding energies. In combination with the companion Build Model protein preparation tool, Lead Finder has been shown to match or outperform the historically leading docking solutions.

When preparing ligands for virtual screening in Blaze, CDS scientists use modeling to help define the best ‘hand-crafted’ estimate of a bioactive conformation, based on the widest data for any given system. We apply the same care to exploring and preparing protein targets prior to structure-based virtual screens. We take advantage of three main approaches. Firstly, Lead Finder includes the excellent Build Model protein preparation tool. Secondly, we are privileged to be able to model proteins and ligands using the same proprietary XED force field used to give the accurate electrostatics that all Cresset software is based on. Finally, at CDS we have access to the latest Cresset software that is still under development. This gives us capability to provide protein electrostatic field maps and water analysis, providing a very reliable starting position for structure-based virtual screening.

Lead Finder uses a stochastic ligand sampling workflow, with conformations generated on-the-fly, and a genetic algorithm for processing these into pools of the best docking poses. Multiple interaction grids are generated from the protein target and combined to define a scoring system for poses. More importantly, the scoring method has been shown to outperform some of the more conventional docking engines currently available commercially.

Structure-based or ligand-based?

What are the advantages of having structure-based and ligand-based virtual screening?  And how do we choose which is the best approach for a project?

Ligand-based virtual screening is less computationally intensive, making it a preferred option when there is a known ligand available. An average protein of 400 amino acids has thousands of heavy atoms and bonds and in excess of 50 charges, making it a more challenging system to model.

However, even when there is a known ligand there are some situations when a ligand-based virtual screening is not viable, such as when the known ligand does not exploit all the interactions available in an active site or when a protein has an unattractive orthosteric site and attractive allosteric sites with no known ligands. In these cases, we prefer to use a structure-based method.

In the case of protein-protein interaction sites and protein-DNA/RNA sites, Blaze can take DNA and protein fragments as a template in place of a ligand. However, it is useful to have a structure-based approach available for comparison.

In fact, we often find it useful to combine different virtual screening techniques. In lead discovery, one of the key requirements for virtual screening is to maximise the diversity of hits returned.  All virtual screening techniques, be they ligand-based or structure-based, are probabilistic techniques in that they may be used to increase the likelihood of getting hits from a wet screen. No technique guarantees to give absolute binding energies (at least not in the context of virtual screening on any realistic size of screening library), but they do give good rank ordering of compounds and can, therefore, be used as a means of selection and prioritisation.

Ligand-based techniques, whether 2D or 3D, are algorithmically distinct from structure-based techniques such as docking and, therefore, give different rankings to compounds. Different approaches return different hits and the results can be combined into an enriched final list.

Combining the results of structure-based and ligand-based techniques provides further diversity, leading to better hit rates and more interesting hits.
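One simple way to combine ranked lists from different methods is to order compounds by the best rank they achieve in any list. The sketch below shows that idea only; the fusion rule, the tie-breaking and the compound ids are assumptions for illustration and are not a description of how CDS merges results.

```python
def fuse_by_best_rank(rankings):
    """Merge several ranked lists (best first) into one list.

    Compounds are ordered by the best rank achieved in any list, with ties
    broken by the sum of ranks and then by id. A compound missing from a
    list is given a rank one past the end of that list.
    """
    all_ids = {cid for ranking in rankings for cid in ranking}
    best = {}
    total = {}
    for cid in all_ids:
        ranks = [
            ranking.index(cid) + 1 if cid in ranking else len(ranking) + 1
            for ranking in rankings
        ]
        best[cid] = min(ranks)
        total[cid] = sum(ranks)
    return sorted(all_ids, key=lambda cid: (best[cid], total[cid], cid))


# Hypothetical top-three lists from a ligand-based and a docking screen.
ligand_based = ["A", "B", "C"]
docking = ["D", "A", "E"]
fused = fuse_by_best_rank([ligand_based, docking])
print(fused)
```

Because each method contributes its own top compounds, the fused list retains hits that only one technique ranks highly, which is where the additional diversity comes from.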

A one-stop shop for virtual screening

Through combining the strengths of Blaze in the ligand-based world with Lead Finder for docking, CDS now has the most comprehensive virtual screening capabilities available anywhere in the industry. Both Blaze and Lead Finder are available to purchase as software or as a service through CDS. CDS is truly now a one stop shop for virtual screening and indeed very much more.

Download a free evaluation of Lead Finder or access the Blaze demo server.

Affordable virtual screening with Blaze: Benchmarks

Introduction

We released BlazeGPU a couple of years ago, allowing the full power of the Blaze virtual screening system to be used on a few consumer graphics cards rather than a full-scale Linux cluster. Since then, graphics cards and CPUs have only got faster, so we decided that it was time to update our benchmarks and see how well all of the new hardware performs.

For these benchmarks we took a random subset of 4,000 molecules from our in-house Blaze data set and searched with a medium-sized query molecule. The molecules in the data set average 80 conformers each. We’ve run with three different search conditions: the full slow-but-accurate simplex algorithm, the standard clique algorithm and the new fastclique algorithm. All of these were run with 50% fields and 50% shape.
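The ‘50% fields and 50% shape’ setting can be read as an equal-weight blend of two similarity values. The sketch below illustrates that reading only. The linear blend, the choice of the best-scoring conformer as the molecule's score, and the similarity numbers are all assumptions for the example, not a description of Blaze's internal scoring.

```python
def combined_similarity(field_sim, shape_sim, field_weight=0.5):
    """Blend field and shape similarity; 0.5 gives 50% fields and 50% shape."""
    if not 0.0 <= field_weight <= 1.0:
        raise ValueError("field_weight must be between 0 and 1")
    return field_weight * field_sim + (1.0 - field_weight) * shape_sim


def best_conformer_score(conformer_sims, field_weight=0.5):
    """Score a molecule by its best-matching conformer (assumed convention).

    conformer_sims: list of (field_sim, shape_sim), one pair per conformer.
    """
    return max(combined_similarity(f, s, field_weight) for f, s in conformer_sims)


# Hypothetical similarities for a molecule with two conformers.
molecule_score = best_conformer_score([(0.6, 0.8), (0.7, 0.5)])
print(round(molecule_score, 2))
```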

CPU performance

Firstly, the CPU benchmarks. All of these are single-core performance, but with all cores loaded so that we’re not benefitting from Intel Turbo Boost. In most cases Blaze will be saturating all cores, so this is representative of real-world performance. Note that the vertical axis is on a log scale.

CPU benchmarks

As can be seen, there’s a significant performance difference between the older CPUs at Cresset (such as the Q6600) and the newer Ivy Bridge i7-3770K chips, but not nearly as much as you would expect given that the Q6600s are around 7-8 years old at this point. The significant speed improvements of the fastclique algorithm are clearly visible with the throughput being more than 4x greater than the original clique algorithm. The last set of columns on the graph are from an Amazon c4.xlarge instance and show that the performance of each core on those systems is roughly the same as the Sandy Bridge i3-2120.

GPU Performance

Moving on to the GPUs, we’ve tested the throughput on a variety of different systems. Firstly, we’ve tested a variety of GTX580s on different motherboards and processors. As you would expect, for the most part the performance is governed by the GPU, but the exception is the fifth test system which is noticeably slower than the others. That card is sitting in a much older chassis with an older motherboard and hence is probably suffering from lack of backplane bandwidth to the GPU.

GPU benchmarks

The newer GTX960s perform extremely well on the Blaze calculations. We weren’t sure if they would, after the disappointment of the GTX680 which was noticeably slower than the 580 (data not shown). The difference is noticeable in the clique stages, but really stands out in the simplex calculations where a GTX960 is 50% faster than the GTX580s. By contrast, the high-end Tesla hardware is not a great performer on the Blaze OpenCL kernels. By all accounts the Tesla hardware is significantly faster than the consumer hardware on double precision workloads, but the Blaze code is all single precision and in that realm the cheap consumer hardware has an unbeatable price/performance advantage.

Finally, the GRID K520 is the hardware found on the Amazon g2.2xlarge and g2.8xlarge instances. As can be seen, it’s not a brilliant performer on the Blaze workload, being around the same speed as the Tesla on the fastclique algorithm but noticeably slower than all of the other cards tested on the simplex workload. However, it provides a nice test of GPU scaling: when running on a 4 times larger data set on all 4 GPUs of a g2.8xlarge instance, we observed substantially the same throughput as running the original data set on a single K520 GPU, showing that we can parallelise across multi-GPU systems with no loss of performance.

Cost efficiency on Amazon

Converting the throughput shown above, we can look at the cost of screening on the Amazon cluster with Blaze. The raw cost to screen a million molecules is shown in the table. Note that the actual costs will be somewhat higher, due to job overheads and data transfer costs.

Cost efficiency on Amazon

The Amazon GPU solutions are noticeably cheaper for fastclique jobs, roughly cost-competitive for the clique runs, but the poor performance of the K520 on the simplex task means that it is significantly more expensive there. As a result, at the moment there’s no real impetus to use the Amazon GPU resources unless you can get them significantly more discounted than the CPU instances on the spot market.
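The conversion from throughput to raw screening cost is simple arithmetic, sketched below. The throughput and hourly price in the example are placeholders chosen to make the sum easy to follow. They are not the benchmark figures or Amazon prices.

```python
def cost_per_million(molecules_per_hour, price_per_hour):
    """Raw compute cost of screening one million molecules on one instance.

    Ignores job overheads and data transfer costs.
    """
    if molecules_per_hour <= 0:
        raise ValueError("throughput must be positive")
    hours_needed = 1_000_000 / molecules_per_hour
    return hours_needed * price_per_hour


# Placeholder figures: 250,000 molecules per hour on an instance costing $0.50 per hour.
example_cost = cost_per_million(molecules_per_hour=250_000, price_per_hour=0.50)
print(f"${example_cost:.2f} per million molecules")
```

Because cost scales inversely with throughput, an instance type is only worthwhile if its speed advantage outweighs its price premium for the algorithm being run.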

Conclusion

New hardware is significantly faster at running Blaze than old stock as would be expected. However, the speed increases are much lower than they have been in the past, with CPUs that are well past their best still performing adequately. On the GPU side, Blaze performs particularly well on commodity graphics cards leaving few reasons for us to invest in dedicated GPU co-processing cards.

The cost of running a million molecule virtual screen on the Amazon cloud has never been cheaper. If tiered processing is used as is the default for Blaze then these screens can be performed for a very low cost indeed – less than $15 per million molecules for the processing costs.

Contact us for a free evaluation to try Blaze on your own cluster, or Blaze Cloud.

Spatial overlap of peptide hotspots and canonical drug pockets in a model enzyme

Spatial overlap of peptide hotspots and canonical drug pockets in a model enzyme was presented by Dr Walraj Gosal, Senior Scientist, Isogenica at the Cresset European User Group Meeting 2015.

Walraj’s talk described the process of moving from peptides from molecular display to small molecule inhibitors, with the help of Cresset technology. In collaboration with Cresset and Biolauncher, the team found that Cresset’s field patterns based on peptides can be used to find new inhibitors. The work was funded by the TSB (Technology Strategy Board).

Molecular (CIS) display [1] is an Isogenica technology that allows you to find novel peptides and protein scaffolds that bind a given target.

Walraj described the basic problem: we don’t know how to move from the primary sequence to the precise 3D fold of the protein. He described it as the ultimate needle in a haystack problem, whereby a 100 amino acid protein corresponds to around 10^130 possible sequences (20^100). The team are trying to figure out how, if someone comes to them with a target, they can get a sequence that will bind to that target.
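The size of that haystack follows directly from 20 natural amino acids being possible at each of 100 positions, which is quick to check:

```python
# Number of distinct sequences for a 100-residue protein built from
# the 20 natural amino acids.
n_sequences = 20 ** 100

# The order of magnitude is one less than the number of decimal digits.
exponent = len(str(n_sequences)) - 1
print(f"{n_sequences:.2e}")  # prints 1.27e+130
print(exponent)              # prints 130
```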

Ultimately, the ideal solution would be a complete algorithmic solution and he briefly highlighted recent computational approaches (e.g. Rosetta) that are showing ever more promise towards this goal [2,3]. However, at the moment the only viable approach is molecular display. For example, Humira, the biggest selling drug worldwide, was one of the first drugs to reach the market that was partially discovered using display technology.

He went on to describe the basic premise of molecular display, which is to have a library of peptides or proteins that maintain their link to RNA or DNA (a ‘genotype to phenotype’ link). The process is then to enrich the library by presenting it with the target over many rounds of selection.

Moving on to the problem the team were trying to solve: can CIS display peptides inform small molecule discovery? Their target choice was thrombin, which Walraj described as ideal for a number of reasons. Firstly, there is already a mountain of medicinal chemistry data available in the public domain due to the race for a direct thrombin inhibitor in industry. In general, the compounds that have made it on to the market are all based on the substrate, but are very basic, the reason being that they mimic a key arginine-aspartate salt bridge in the so-called S1 pocket. This led to lead molecules where the bioavailability was low, and clever pro-drug strategies were necessary that eventually led to drugs on the market (e.g. dabigatran).

Secondly, and more importantly, the team were inspired by the fact that Nature has found alternative solutions to the problem of inhibiting thrombin and Walraj highlighted three: from a tropical bont tick, the mosquito and a medicinal leech [4].

So here is the key question: are molecular display peptides going to open up more avenues for drug design, or are they consistent with previous efforts? The answer turns out to be a bit of both.

They found that many of the peptides bound to the active site but that some also bound to an allosteric site, the latter already suggesting that drug design efforts could be focused on other sites largely ignored by industry. Nevertheless, looking at the active site binders, whilst many of those peptides contained a motif that mimics the natural mosquito inhibitor, most of them appeared unrelated to each other, suggesting multiple solutions. A lot of biochemistry was carried out by the team, and eventually the two best peptides were crystallised with thrombin, which confirmed the binding at the active site.

These structures showed orthogonal solutions. One is very much based on the mosquito solution, which is to insert a key arginine in S1 in the opposite direction to substrate, making it incompatible with catalysis. The other solution appeared substrate-like in its path and direction to the S1 site but with a key difference. Here the peptide delivered an extremely novel ‘warhead’ in the S1 pocket that violated the paradigm that the arginine-aspartate salt bridge was required for high affinity. The latter was especially important as the peptide bound with single-digit nanomolar affinity.

They carried out a computational study (using Rosetta) and alanine-scanning mutagenesis experiments (seeing excellent correlation between the two) to determine the key interactions or ‘hotspots’. They then asked whether there was a spatial overlap between the hotspots of these peptides and the canonical drug pockets and interactions that have been exploited over the last 40 years (using data from the PDB). They saw that the hotspots overlap remarkably well with drugs from the PDB. However, some of the high-energy interactions seen in the peptides have never been exploited for drug design. For example, for one of the peptides, a loop movement creates a whole new pocket close to the active site.

Furthermore, the paradigm-violating lipophilic and neutral solution to S1 occupation appears to have been already discovered through HTS and fragment-based design. This highlights the power of molecular display – orthogonal solutions can be discovered remarkably quickly.

The next crucial, and by no means trivial, step is how to move from these peptide solutions to discovering small molecules. Here, Cresset’s virtual screening technology, Blaze, proved invaluable. The team at Cresset took the crystal structures and produced field patterns based on linear stretches of the peptides incorporating one or more hotspots. For one of the peptides, fewer than 160 of the top compounds suggested by Blaze were experimentally screened, and the team found two very small competitive inhibitors of thrombin that were previously unknown.

Conclusion

Walraj concluded that Cresset’s peptide field maps arising from molecular display are sufficient to discover small molecule inhibitors, and that combining the power of molecular display and virtual screening would open up a powerful new avenue to drug discovery.

  1. Odegrip, R. et al. PNAS 101, 2806-2810 (2004)
  2. Kuhlman, B. et al. Science 302, 1364-1368 (2003)
  3. Fleishman, S. J. et al. Science 332, 816-821 (2011)
  4. Huntington, J. A. Thromb. Haemost. 111, 583-589 (2014)

ChEMBL leadlike compounds freely searchable on Blaze demo server

The Blaze virtual screening demo server has proved popular since its launch last year. However, we wanted to extend the range of compounds that are available for users to search. We have now achieved this through the introduction of three new collections of ChEMBL compounds. These collections provide leadlike compounds for drug, agrochemical, flavor and fragrance discovery that are suitable for the evaluation of Blaze in these areas. The collections are open to all registered Blaze demo users whether accessed through a web page, KNIME or using Forge, Torch, or TorchLite.

Creating collections of molecules for searching in Blaze

Blaze is a full virtual screening system that is integrated with queuing systems like SGE for database population and searching, so creating a new collection to be searched is easy. Blaze takes care of the difficult part: splitting uploads into different sizes, identifying and linking duplicates, exploding unspecified chirality and populating the conformations of new molecules. This creates a new collection for searching. All that is required is to tell Blaze about the collection and then to upload an SDF file to the server. Choosing what to upload is more difficult. On our main Blaze server that we use for our consulting projects we have 10,000,000 molecules arranged in collections from compound suppliers. In the demo server it is not possible to use such large numbers of compounds. Until now we have had only a few thousand compounds. Here we expand that to over 400,000 compounds, derived largely from ChEMBL.
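To give a feel for what ‘exploding unspecified chirality’ involves, the sketch below enumerates every R/S assignment for the undefined stereocentres of a molecule. It is a simplified illustration, not Blaze's implementation: stereocentres are represented as plain labels, and symmetry effects such as meso forms are ignored.

```python
from itertools import product


def enumerate_stereoisomers(centres):
    """List every stereo assignment consistent with a partly specified molecule.

    centres: one entry per stereocentre: 'R', 'S', or None when unspecified.
    Specified centres are kept; unspecified ones are expanded to both options.
    """
    options = [(c,) if c is not None else ("R", "S") for c in centres]
    return list(product(*options))


# One defined centre and two undefined centres give 2 x 2 = 4 candidates.
isomers = enumerate_stereoisomers(["R", None, None])
for isomer in isomers:
    print(isomer)
```

The number of candidates doubles with each undefined centre, which is one reason the size of an uploaded collection can grow during processing.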

Creating the ChEMBL collections

To generate collections with appropriate properties ChEMBL was filtered in KNIME using physico-chemical properties as shown in the table below.

Property | Chembl20_filtered | Agrochem_chembl | Fragrance_like
Description | leadlike collection for drug discovery | leadlike collection for agrochemical discovery | ChEMBL filtered for fragrance like molecules
MW | 200 – 400 | 200 – 430 | 30 – 300
TPSA | 40 – 80 | N/A | < 60
RotBonds | 0 – 5 | < 5 | 0 – 4
Aryl rings | 0 – 3 | N/A | N/A
HBD | 0 – 3 | 2 – 3 | 0 – 1
HBA | 0 – 6 | 2 – 12 | 0 – 3
SlogP | -1 – 4 | 0.75 – 4.5 | > 1
Elements | C,N,S,O,F,Cl,Br,I | C,N,S,O,F,Cl | C,H,N,O
Total molecules available for searching | 202,895 | 136,457 | 45,383

Additionally for the drug discovery library we removed compounds that we considered to be toxic or undruglike (acyl halides, sulfate esters etc.) and compounds that contain specific functional groups that have regularly appeared as false positives with Blaze (thioethers, hydrazones and imines).
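The property ranges for the drug discovery collection translate naturally into a small filter function. The sketch below is a plain Python stand-in for the KNIME workflow, not a copy of it. It assumes the descriptors have already been calculated, that the range limits are inclusive, and that the element list refers to heavy atoms; the example molecules are invented.

```python
# Ranges taken from the Chembl20_filtered column; inclusive limits are assumed.
DRUG_LEADLIKE_RANGES = {
    "MW": (200, 400),
    "TPSA": (40, 80),
    "RotBonds": (0, 5),
    "ArylRings": (0, 3),
    "HBD": (0, 3),
    "HBA": (0, 6),
    "SlogP": (-1, 4),
}
DRUG_ALLOWED_ELEMENTS = {"C", "N", "S", "O", "F", "Cl", "Br", "I"}


def passes_leadlike_filter(descriptors, heavy_elements,
                           ranges=DRUG_LEADLIKE_RANGES,
                           allowed_elements=DRUG_ALLOWED_ELEMENTS):
    """Return True if a molecule satisfies every range and element rule.

    descriptors: {property_name: precomputed value}
    heavy_elements: set of heavy-atom element symbols in the molecule.
    """
    if not set(heavy_elements) <= allowed_elements:
        return False
    return all(low <= descriptors[name] <= high
               for name, (low, high) in ranges.items())


# Invented descriptor values for two example molecules.
inside = {"MW": 312.4, "TPSA": 62.0, "RotBonds": 4, "ArylRings": 2,
          "HBD": 1, "HBA": 4, "SlogP": 2.7}
too_heavy = dict(inside, MW=450.0)

print(passes_leadlike_filter(inside, {"C", "N", "O"}))
print(passes_leadlike_filter(too_heavy, {"C", "N", "O"}))
```

The substructure-based removals described above (toxic, undruglike and frequent false-positive groups) would need a cheminformatics toolkit and are not covered by this sketch.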

The filtering was performed in KNIME workflows (represented for the drug discovery collection below).

[Image: KNIME filtering workflow for the drug discovery collection]

The upload is traditionally done using Blaze’s web interface but on this occasion we chose to extend our KNIME protocol to upload the compounds to Blaze using the REST interface. This feature was introduced in Blaze 10.2 and has proved a popular and easy way to keep Blaze in sync with corporate databases. While we are using KNIME here, the protocol would work equally well with Pipeline Pilot. The upload workflow is shown below with the filtering steps reduced to metanodes.

[Image: KNIME upload workflow, with the filtering steps reduced to metanodes]

Using the new collections

The new collections can be searched through the standard Blaze web interface or through the REST interface, which enables searching from KNIME and Pipeline Pilot as well as from Cresset’s desktop applications Forge, Torch and TorchLite. The applications must first be configured (in the preferences) with the address of the Blaze server together with a username and password for access. Once this is done, the Run menu → Send to Blaze and right-click menu ‘Send to Blaze’ options open a dialog box for configuring the Blaze search.

[Images: the ‘Send to Blaze’ dialog in Forge and in Torch]

The advantage of submitting a Blaze search from within the desktop applications is that your current field constraints and the protein excluded volume will get transferred to Blaze and used without extensive interaction or file uploading.

Note that results can also be downloaded from within the desktop applications. Selecting the File menu → Download Blaze Search Results brings up a dialog containing a tree view of Blaze searches. One tip: make sure you select the best results, which are those from the simplex refinement rather than the initial search.

[Image: the Download Blaze Search Results dialog in Forge]

To try the new Blaze collections for yourself please register for a username and password. If you think that there are other sets that we could usefully include or that we could improve the filters that we have used here then please contact us to discuss your suggestion.

Rapid and simple Blaze database population and searching using KNIME and Forge

Abstract

Blaze [1] is Cresset’s ligand-based virtual screening platform. It uses the shape and electrostatic character of known ligands to rapidly search large chemical collections for molecules with similar properties. In this case study, a Blaze database of approximately 200,000 compounds from ChEMBL [2] was prepared in a seamless manner using a KNIME [3] workflow and standard Blaze database creation routines. The new collection, named ‘Chembl20_filtered’, is available from the Blaze Demo Server [4]. Blaze searches were launched within Forge [5] and by means of a KNIME workflow to test the ease of use of both workflows. The output of the searches was finally downloaded into Forge and visually inspected.

Background

Blaze, Cresset’s ligand-based virtual screening platform, uses the shape and electrostatic character of known ligands (as encoded by Cresset’s field technology [6]) to rapidly search large chemical collections for molecules with similar properties. It is excellent for finding novel leadlike hits from known actives, replacing peptides with non-peptides or steroids with non-steroids.

Using Blaze you can increase the diversity of your project’s lead compounds and jump into new areas of chemical space giving substantial improvements in the properties of your hits. Cresset have run hundreds of projects through Blaze with an excellent track record: hit rates as high as 30% are reported by our customers.

Blaze

Blaze is a full virtual screening system containing the infrastructure to manage compound collections and the associated conformation populations. It automatically records additions and removals from any collection and handles duplication across collections. New compounds are automatically submitted to a queuing system (typically SGE or Platform LSF) for conformer generation on a Linux cluster.

Database searching is configured through a single webpage, a REST call or the command line. Compounds are automatically triaged through a cascade of increasingly accurate search methods. Blaze automatically manages database searches with differing priorities, submitting them to the queuing system of either a GPU or a CPU cluster.

Lastly, Blaze contains a full user and project based permissions system to control the visibility of individual and groups of search results.

Blaze V10.2

This most recent version of Blaze includes:

  • A new search algorithm that enables full 3D assessment of molecules at four times the previous speed, enabling the processing of databases of over 10 million compounds.
  • A new RESTful web service providing easy integration with Forge, KNIME, Pipeline Pilot [7] and custom software solutions.
  • Simplified security features that are easier to unify with corporate authentication servers, in response to customer requests. This makes user management significantly simpler for large installations.
  • A free demo server, enabling you to test the performance and functionality of Blaze on a small collection of commercially available compounds.

In this case study, a Blaze database of approximately 200,000 compounds from ChEMBL (a database of bioactivity data for drug discovery) was rapidly prepared and uploaded to the Blaze demo server using the new REST interface.

Method

Filtering

The full ChEMBL 20 data set (containing approximately 1.5 million compounds) was downloaded as an SDF file.
The set was filtered using a KNIME workflow (Figure 1) applying the following physico-chemical cut-offs to select potential leadlike structures to be used as starting points for medicinal chemistry optimization:

  • MW 200 – 400
  • TPSA 40 – 80
  • RotBonds 0 – 5
  • Aryl rings 0 – 3
  • HBD 0 – 3
  • HBA 0 – 6
  • SlogP -1 – 4

Figure 1. KNIME workflow used to filter the original ChEMBL data set (1.5M compounds).

The data set was further cleaned with the removal of compounds carrying reactive functional groups (e.g. alkyl halides), potentially toxic groups (e.g. azides) or other unwanted chemical moieties (e.g. heavy metals). After filtering, approximately 202,000 compounds remained for uploading to Blaze.

Upload to Blaze

The upload of the new collection could be achieved using the command line or the web interface. However, as all the compounds exist within KNIME we chose to directly upload to the Blaze free demo server using the Blaze REST API (Figure 2).

The creation of the Blaze Chembl20_filtered collection took a few hours on 150 cores using Cresset’s internal Linux cluster.

Figure 2. Blaze compound upload protocol.
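For readers scripting outside KNIME, the general shape of such an upload can be sketched as an authenticated HTTP POST of an SDF file. Everything specific in this sketch is an assumption made for illustration: the URL path, the use of HTTP Basic authentication, the content type and the function name are hypothetical and are not taken from the Blaze REST documentation, which should be consulted for the real endpoints. The request is only built here, not sent.

```python
import base64
import urllib.request


def build_upload_request(base_url: str, collection: str, sdf_bytes: bytes,
                         username: str, password: str) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying an SDF payload.

    HYPOTHETICAL endpoint layout and authentication scheme: these are
    placeholders for illustration, not the documented Blaze REST API.
    """
    url = f"{base_url.rstrip('/')}/collections/{collection}/molecules"
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url,
        data=sdf_bytes,
        method="POST",
        headers={
            "Authorization": f"Basic {token}",
            "Content-Type": "chemical/x-mdl-sdfile",
        },
    )


# Placeholder server address, credentials and payload.
example_request = build_upload_request(
    "https://blaze.example.org/api/",
    "Chembl20_filtered",
    b"...SDF records would go here...",
    "demo_user",
    "demo_password",
)
print(example_request.get_method(), example_request.full_url)
```

Sending the request would be a matter of passing it to `urllib.request.urlopen`, once the real endpoint and authentication details have been substituted.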

Using Blaze from Forge/Torch

The introduction of the REST interface has enabled Blaze searching directly from many platforms and scripts, including Cresset’s desktop applications Forge and Torch. To work with Blaze, the applications require the address of the Blaze server and your username and password in the relevant preference setting (Edit menu → Preferences → Blaze panel, Figure 3).

Figure 3. Set up of Forge/Torch connection to Blaze.

The interface lets you send the current molecule to Blaze, including any field constraints and the current protein excluded volume, configure the search options, and download the results directly into the application.
To test the new ChEMBL collection and further demonstrate the usefulness of the Blaze REST interface, a search was performed using Nevirapine [8], one of the first HIV non-nucleoside reverse transcriptase inhibitors (NNRTIs). The search was submitted using Cresset’s Forge and also using a KNIME protocol.

Searching Blaze from Forge

The crystal structure of the Y181C mutant HIV-1 reverse transcriptase in complex with the inhibitor Nevirapine (PDB code 1jlb) was downloaded into Forge (the procedure is identical when working with Torch). The workflow is summarized in Figure 4.

Nevirapine was selected as the reference structure and imported into Forge together with the HIV-1 reverse transcriptase protein. Cresset’s rules were used to define the protonation state of Nevirapine and the protein. After visual inspection the reference structure was minimized to improve the bond angles.

To initiate the Blaze search, the reference molecule was selected in the main ‘Molecules’ table and sent to Blaze using the ‘Send to Blaze’ option in the right-click menu. The resulting Blaze search configuration dialog was used to name the search ‘1jlb’, select the ‘Chembl20_filtered’ collection and accept the default search parameters (Figure 4).

Once complete, the search results were imported into Forge (Torch would work identically) for visual inspection and further analysis.


Figure 4. PDB download, selection of reference structure and start of Blaze search in Forge.

Figure 5. Blaze search protocol.

Blaze Searching from KNIME

A KNIME Blaze search workflow (see Figure 5) was also tested for user friendliness.
The protocol requires the manual setting of a small number of workflow variables (Blaze URL, username and password) and the configuration of three input nodes to:

  • define the name and conditions of the search (Table creator node),
  • load the reference structure as an SDF (using SDF reader node),
  • define the name of the Blaze collection to search (Chembl20_filtered, Table creator node).

Download of results to Forge/Torch

The results of the Blaze search on Chembl20_filtered using Nevirapine as the query were downloaded into Forge (Figure 6).


Figure 6. Download of Blaze results into Forge.

While a thorough evaluation of the results of the Blaze search is beyond the scope of this case study, a qualitative analysis of the 200 top scoring results shows that Blaze was able to identify some chemically diverse potential hit compounds. As expected, a large fraction of the top scoring compounds belong to the same (widely explored) chemical class as Nevirapine. However, a few top scoring molecules (see examples in Table 1, Figure 7) are structurally different and are reported in ChEMBL to have been tested for HIV-1 reverse transcriptase activity.

Table 1. Interesting hits retrieved by the Blaze search on Nevirapine.

Figure 7. CHEMBL314103 overlaid (grey) with Nevirapine (green).
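The kind of triage described above, taking the top scoring results and then looking for compounds outside the query’s own chemical class, could be sketched as follows in a scripted post-processing step. The field names (`id`, `score`, `chem_class`), the example records and the assumption that a higher score means a closer match are all hypothetical; in practice the class annotation would have to come from a separate scaffold or substructure analysis.

```python
import heapq
from typing import Dict, List


def top_hits(results: List[Dict], n: int = 200) -> List[Dict]:
    """Return the n best-scoring results, best first.

    Assumes a higher 'score' means greater similarity to the query.
    """
    return heapq.nlargest(n, results, key=lambda r: r["score"])


def outside_query_class(hits: List[Dict], query_class: str) -> List[Dict]:
    """Keep only hits whose (hypothetical) class label differs from the query's."""
    return [h for h in hits if h["chem_class"] != query_class]


# Made-up records for illustration; these are not results from the case study.
results = [
    {"id": "cmpd_a", "score": 0.81, "chem_class": "query-like"},
    {"id": "cmpd_b", "score": 0.64, "chem_class": "other"},
    {"id": "cmpd_c", "score": 0.77, "chem_class": "other"},
    {"id": "cmpd_d", "score": 0.79, "chem_class": "query-like"},
]

best = top_hits(results, n=3)
diverse = outside_query_class(best, "query-like")
print([h["id"] for h in best], [h["id"] for h in diverse])
```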

Conclusion

A Blaze database of approximately 200,000 compounds from ChEMBL was prepared in a seamless manner using a KNIME workflow. Using the Blaze REST interface the dataset could be uploaded to Blaze from within KNIME and was available for searching within a few hours.

To test the ease of use of the search workflows available in Forge (Torch) and KNIME, the same search was run on each platform. While both protocols are relatively straightforward, the guided interface in Forge is simpler for the end user to set up. The KNIME workflow offers greater flexibility, however, and allows Blaze searches to be integrated into more customized protocols with complex post-processing of results. Using the Torch or Forge viewers within KNIME enables viewing of the 3D alignment of the returned compounds within that platform.

The new Chembl20_filtered collection is available for searching by all users of the Blaze demo server – register for free access by visiting http://blaze.cresset-group.com/blaze/

References and Links

1. Blaze: http://www.cresset-group.com/products/blaze/
2. ChEMBL: https://www.ebi.ac.uk/chembl/
3. KNIME: https://www.knime.org/
4. Blaze free demo server: register for your username and password at the Blaze demo signup page, http://blaze.cresset-group.com/blaze/
5. Forge: http://www.cresset-group.com/products/forge/
6. Cresset field technology: http://www.cresset-group.com/science/field-technology/
7. Pipeline Pilot: http://accelrys.com/products/pipeline-pilot/
8. US5366972 (A) – 5,11-dihydro-6H-dipyrido(3,2-B:2′,3′-E)(1,4)diazepines and their use in the prevention or treatment of HIV infection

Keeping the right chemistry in compound procurement

What happens when virtual screening libraries meet the real world?

Virtual screening libraries can be brilliantly designed to deliver the best range of chemistry for your wet screen, but when the virtual meets the real, things get more complicated. Including computational chemistry in the procurement process means that you can move from virtual screening to plated compounds and still keep the best of both worlds.

Read the full article in Outsourced Pharma.

Dr Martin Slater
Director of Consulting Services

The course of true research never did run smooth

A three-way collaboration between Cresset, the University of Newcastle and Sygnature Discovery

Research and development is not an easy business. Real research invariably involves starts and stops, mistaken assumptions, poor data and ill-defined targets. The resulting path is often convoluted, peppered with roadblocks and pitfalls, and at best is a fairly untidy process.

This is an account of a real project, warts and all, and it is not untypical of projects conducted in either academia or the pharma industry. Ultimately, progress was made and knowledge was gained as new data from the biology, chemistry and modeling were carefully applied along the way.

In this case, despite the difficulties described below, the project enjoyed a relatively rapid progression from a patent bust to a novel and selective series with many of the problems associated with the target ultimately solved. The project is on track and Cresset’s modeling has provided unique and valuable insights and will continue to add value as it is applied moving forwards.

The project

From 2012 to 2014 Cresset consulting services contributed computational modeling effort to a three-way collaboration with the University of Newcastle and Sygnature Discovery on an MRC-funded osteoarthritis project. Strathclyde [1] and Newcastle [2] Universities conducted vital experiments that identified the protein matriptase as a key mediator in disease-related pathways leading to collagen degradation in joints.

This protein target [3], a member of the serine protease family, had been crystallized with peptidomimetic inhibitors adapted from a urokinase inhibitor series by Steinmetzer [4]. These contained the well-known benzamidine P1 pocket-interacting groups, which are critical for potency in many proteases of this class.

The aim of the modeling exercise was to provide Newcastle University with a new lead series of matriptase inhibitors, with both an IP position and, importantly, a more favourable profile for delivering in-vivo activity.

Method

Read the details in the full write-up.

In brief, the XED force field was used to model the binding hypothesis for a potent ligand. This large reference compound was split into three appropriately sized, more lead-like molecules (Figure 1), which were used as the input for a Blaze virtual screen to find alternative scaffolds.

Figure 1. Fragmentation strategy for Blaze search molecules.

Unfortunately, only a small proportion of the virtual screening output was ultimately purchased and screened. Of 142 molecules, only one (Compound 0154) showed appreciable activity at matriptase, and that activity was relatively weak (IC50 = 30 μM).

A parallel patent busting strategy

At this time the chemistry partner had put in place parallel efforts on a patent busting strategy from a recently reported synthetic inhibitor. The disadvantage of this approach was that the patent bust was from a symmetric tribasic molecule and so optimisation, particularly of the ADMET profile, would be problematic. Furthermore, the binding hypothesis modeling was made more complex by the inherent ambiguity provided by the symmetrical starting ligands.

Read more in the full write-up.

This parallel strategy provided some actives comparable to Compound 0154, but via far more expedient chemistry and with the potential to design in an IP position.

Final state of play

The project enjoyed a relatively rapid progression from a patent bust to a novel and selective series with many of the problems associated with the target ultimately solved. One exception was the main liability issue of ‘membrane penetration’ which was present from the outset and remains to be resolved. But, with one potent series well established, and a second series from the virtual screen with potential to be explored, the project is on track to provide what was promised.

Cresset’s modeling provided unique and valuable insights to the project and will continue to add value as it is applied moving forwards towards the development of series 1 and 2.

Acknowledgements

Sygnature (chemistry): L. Duffy, P. Meghani.
University of Newcastle (biology): Prof. Drew Rowan, W. Hui, D. J. Wilkinson, A. Destrument, S. Watson.
Cresset (modeling): A. Vinter.

References

  1. Ferrell, W. R. et al, Protease-activated receptor-2: a novel pathogenic pathway in a murine model of osteoarthritis, J. Arthritis & Rheumatism, 2010 (http://strathprints.strath.ac.uk/20137/).
  2. Rowan, A. D. et al, Matriptase Is a Novel Initiator of Cartilage Matrix Degradation in Osteoarthritis, Arthritis & Rheumatism, 2010, Vol. 62, No. 7, pp 1955–1966.
  3. http://merops.sanger.ac.uk/; search: ID = S01.302: Name = matriptase.
  4. Steinmetzer, T. et al, Secondary Amides of Sulfonylated 3-Amidinophenylalanine. New Potent and Selective Inhibitors of Matriptase, J. Med. Chem. 2006, 49, 4116-4126.