Rapid interpretation of patent SAR using Forge

Biological data is now a regular feature of new patent applications and is readily available for download from Bindingdb, which holds data on over 2,500 patents encompassing more than 300,000 binding measurements. Generating meaningful insights from this data is perceived as less straightforward. In this post I will use Forge™ V10.6 to demonstrate that it is possible to get an overview of the SAR from a single patent entry with minimal human intervention and time.

Application to PIM-1

Selection and processing of 288 compounds from US9321756, ‘Azole compounds as PIM inhibitors’ (detailed in Appendix I) gave the Activity Atlas™ model shown in Figure 1. The total time to generate and interpret this model was around 30 minutes. It would be relatively straightforward to automate the process.

Figure 1: Activity Atlas model generated in this case study. From data download to model took 30 minutes. The ‘Activity Cliff Summary of Electrostatics’ and ‘Activity Cliff Summary of Shape’ views are shown. These detail regions of acute SAR – Red / Blue = positive / negative electrostatics preferred for greater activity; Green / Pink = activity favors / disfavors atoms in this region.

SAR interpretation

Firstly, the oxadiazole is clearly required, as demonstrated in Figure 2 by the region of negative electrostatics (blue) next to both nitrogen atoms, which represents the interaction of this group with the side chain of Lys67. Perhaps this is not surprising given the title of the patent application. The model also shows that the amino group next to the oxadiazole is constrained (area of pink surface).

Figure 2: Activity Atlas model close to the oxadiazole group. Red = positive electrostatics preferred; Blue = negative electrostatics preferred; Green = Atoms in this region favored; Pink = Atoms in this region disfavored.

On initial inspection there appears to be space in the protein to accommodate a substituent on the nitrogen. However, viewing the aligned ligands in the context of the protein and showing contacts in Forge (Figure 3) makes it clear that all N-substituted ligands clash with Asp186 and that the adjacent space is not accessible from this position in the ligand.

Figure 3: Clash of a ligand bearing a morpholino substituent with Asp186 (orange lines).

The model (Figure 4) shows that there is a clear preference for molecules that extend into the gap between the two arms of the ligand (green surface at the bottom of the model above). Whilst we would want to check the underlying data, the suggestion is that substitution on either R-group is tolerated. Indeed, the most active compound crosses this gap completely which raises the possibility of using a cyclized ligand.

Figure 4: A highly active compound from the patent displayed in CPK. The N-trifluoroethyl group touches the cyclopropyl substituent on the opposite side of the molecule.

Surrounding the green (favorable volume) region between the two arms is a large area of red surface. This suggests that positive electrostatics – edges of aromatics or H-bond donors etc. – are preferred in this region.
This summary is reinforced by looking at the individual compounds that make up the data. Thankfully, this is easy to do with the Activity Miner module of Forge. Using Activity Miner’s top pairs table (Figure 5), there are many pairs of molecules where introduction of a positive charge in the region below the ligand (as shown in the pictures) generates a more active molecule. Generally the difference is around 1 log unit better activity for the charged species.
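The idea of ranking pairs of molecules can be illustrated with a rough sketch. It assumes that a pair is scored by its disparity, taken here as the absolute activity difference divided by the distance between the two molecules (1 minus their similarity). The molecule names, pIC50 values and similarities are hypothetical, and this is not the Activity Miner implementation.

```python
from itertools import combinations


def top_pairs(activities, similarity, n=3):
    """Rank pairs by disparity = |delta activity| / (1 - similarity).

    The disparity definition used here is an assumption for illustration.
    """
    pairs = []
    for a, b in combinations(sorted(activities), 2):
        sim = similarity[frozenset((a, b))]
        distance = max(1.0 - sim, 1e-6)  # guard against identical molecules
        delta = activities[a] - activities[b]
        pairs.append((abs(delta) / distance, a, b, delta))
    pairs.sort(reverse=True)  # largest disparity (sharpest cliff) first
    return pairs[:n]


# Hypothetical pIC50 values and pairwise similarities (0 to 1).
activities = {"mol_A": 8.2, "mol_B": 7.1, "mol_C": 7.0}
similarity = {
    frozenset(("mol_A", "mol_B")): 0.90,
    frozenset(("mol_A", "mol_C")): 0.60,
    frozenset(("mol_B", "mol_C")): 0.95,
}

ranked = top_pairs(activities, similarity)
for disparity, a, b, delta in ranked:
    print(f"{a} vs {b}: delta activity {delta:+.1f}, disparity {disparity:.1f}")
```

Pairs that are very similar yet differ strongly in activity rise to the top, which is where a single structural change (such as an added positive charge) is most informative.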

Figure 5: The top pairs table in the Activity Miner module of Forge showing a specific pair of molecules and the electrostatic difference map between them. Red regions indicate where that ligand is more positive than the comparator; Blue where that ligand is more negative. In this case the ligand on the left is over a log unit more active and contains a positive charge in the region at the bottom of the picture.

Looking at the protein structure does not reveal a specific interaction or reason for this gain in potency. However, by using the protein field surface in Flare, we can see that the protein is generating a negative potential in this region which would account for the gain in activity when introducing a positive charge.

Figure 6: The protein interaction potential contoured at 2 kcal/mol, Red = positive; Blue = negative. The potential indicates the nature of atoms to use in a region: positive atoms fit well in negative regions, etc.

Lastly, in the region of the pyrimidine group the model has a large area of blue. This indicates that there is a clear preference for molecules with nitrogen atoms in the ring at these points (e.g., pyrazine). This area points towards solvent and hence this is quite surprising. From the crystal structure alone it would be expected that introduction of heteroatoms would have little effect on activity. Examination of the data using Activity Miner confirms that, for example, pyrazine is more active than pyridine. In this case the protein fields do not reveal anything significant in the underlying potential of the protein and we are left to speculate on the reason for the SAR.

Figure 7: PDB 4TY1 showing the region around the pyrimidine group of the ligand. There are few interactions between the protein and the edge of the ligand in this region.

Speculating that protein movement was at the root of the observed SAR, I downloaded all the PIM-1 structures from the PDB into Flare, sequence aligned them and superposed them based on the sequence alignment. Looking at this region across the 150+ structures shows no clear case for protein flexibility, although a number of structures do have a water molecule in this region that would bridge the ligand to the side chain of Arg122.

Figure 8: Over 150 PIM-1 crystal structures superposed in Flare. The backbone is shown in tube, residues close to the depicted ligand of structure 4TY1 are shown in thin sticks. Only two structures have any variation in loop conformation in this region.

The reason for the observed SAR remains elusive and could be a function of protein-protein interaction, water mediated interaction or something else.


Rapid interpretation of Bindingdb patent data can be achieved using Forge. In this case the SAR of 288 ligands was condensed to a single Activity Atlas model in less than 30 minutes. Interpretation of the model over the next 30 minutes generated clear SAR insights that could be employed on competing projects. Inspecting the protein electrostatics using Flare provided further insights into the observed SAR.

Try Forge on your project

Request a free evaluation of Forge to try this on your data or condense a patent into a simple summary of the published SAR.

See all licensing options for Forge.

Appendix I

Background computational details

The raw data was downloaded in tab separated format from Bindingdb and pre-processed in Excel. The raw data contains data for two biological targets – ‘PIM’ and ‘PIM-1’. Compounds with ‘PIM-1’ data were selected and checked for duplicate values. One compound was excluded because of a large variation in the reported IC50 value and four molecules were excluded due to missing activity values. All other duplicate IC50 values were averaged and converted to a pIC50 value resulting in a dataset of 288 molecules in a csv file.
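The same pre-processing could be scripted rather than done in Excel. The following is a minimal sketch, assuming a simplified tab-separated file with hypothetical column names (`smiles`, `target`, `ic50_nM`) and an assumed 10-fold threshold for what counts as a large variation between duplicates; the real Bindingdb export uses different headers and the threshold applied in this study is not stated.

```python
import csv
import io
import math
from collections import defaultdict


def build_pic50_table(tsv_text, target="PIM-1", max_fold_spread=10.0):
    """Select one target, average duplicate IC50 values and convert to pIC50."""
    ic50_by_smiles = defaultdict(list)
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        if row["target"] != target:
            continue  # keep only the chosen biological target
        value = row["ic50_nM"].strip()
        if not value:
            continue  # exclude molecules with a missing activity value
        ic50_by_smiles[row["smiles"]].append(float(value))

    table = {}
    for smiles, values in ic50_by_smiles.items():
        if max(values) / min(values) > max_fold_spread:
            continue  # exclude compounds whose duplicates disagree too much
        mean_ic50_nm = sum(values) / len(values)
        # pIC50 = -log10(IC50 in mol/L) = 9 - log10(IC50 in nM)
        table[smiles] = 9.0 - math.log10(mean_ic50_nm)
    return table


# Hypothetical rows, for illustration only.
raw = (
    "smiles\ttarget\tic50_nM\n"
    "CCO\tPIM-1\t10\n"
    "CCO\tPIM-1\t30\n"
    "CCN\tPIM-1\t\n"
    "CCC\tPIM\t5\n"
    "CCCl\tPIM-1\t1\n"
    "CCCl\tPIM-1\t500\n"
)
table = build_pic50_table(raw)
print(table)
```

In this toy input only the first compound survives: one row has no activity, one belongs to the other target, and one has duplicates that differ by 500-fold.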

The original dataset included the ligands of PDB codes 4TY1 and 4WT6. The protein-ligand complexes were downloaded into Flare, sequence aligned and superposed. Looking at the binding site, either ligand would work well as a reference for initial alignment of the dataset. The ligand from 4WT6 was chosen for further experiments and both ligand and corresponding protein transferred to Forge (Copy-Paste). The csv file was loaded into Forge (Training Set) and the molecules processed using Accurate but Slow conformation hunting, Substructure alignment and an Activity Atlas model built.

The Forge processing window showing the options used in this case study.

Using the Cresset Engine Broker, the calculation took 15 minutes to complete. Examining the results shows excellent alignment through the common substructure but some variation beyond that.

288 aligned ligands from US9321756 that were used to prepare the Activity Atlas model.


About Activity Atlas

Activity Atlas models are created by comparing all pairs of molecules in terms of positive and negative electrostatics plus the hydrophobics and shape properties and then combining these together, weighted by the change in activity for the pair. The result is a simple, qualitative picture of the critical points in the SAR landscape.
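To make the idea of an activity-weighted pairwise comparison concrete, here is a toy sketch. It assumes each molecule's electrostatic field has already been sampled at the same few grid points (the values below are invented), and it simply sums field differences weighted by activity differences over all pairs. It illustrates the principle only and is not the Activity Atlas algorithm.

```python
from itertools import combinations


def activity_weighted_difference(fields, activities):
    """Sum (delta activity) x (delta field) over all pairs, per grid point.

    Positive totals: a more positive field goes with higher activity.
    Negative totals: a more negative field goes with higher activity.
    """
    n_points = len(next(iter(fields.values())))
    summary = [0.0] * n_points
    for a, b in combinations(sorted(fields), 2):
        delta_activity = activities[a] - activities[b]
        for k in range(n_points):
            summary[k] += delta_activity * (fields[a][k] - fields[b][k])
    return summary


# Hypothetical field values at three shared grid points, and pIC50 values.
fields = {
    "mol_A": [1.0, 0.0, -0.5],
    "mol_B": [0.0, 0.0, -0.5],
    "mol_C": [-1.0, 0.0, 0.5],
}
activities = {"mol_A": 8.0, "mol_B": 7.0, "mol_C": 6.0}

summary = activity_weighted_difference(fields, activities)
print(summary)
```

In this example the first grid point would be colored as "positive preferred", the second carries no signal, and the third would be colored as "negative preferred".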

The resulting Activity Atlas model was automatically displayed. I always start with the ‘Activity Cliff Summary of Electrostatics’ and ‘Activity Cliff Summary of Shape’ views to understand the data. As this was a quick experiment and the alignments were noisier than in a fully curated experiment, the Activity Atlas model is also noisier than ideal. However, increasing the Confidence Level to 3.0 concentrates the display on the clear signals in the data.

The display options used for the Activity Atlas models shown in this study.

Model validation

Activity Atlas is a qualitative technique and hence difficult to validate except through manual inspection. However, Forge is capable of building quantitative models that can be used to validate the alignment of the molecules (we believe that consistent alignment is the single biggest factor in generating reliable 3D QSAR models). Using the Automatic regression model building methods of Forge with a 20% activity stratified test set generated an SVM model with q2 0.64 (LOO) and an r2 on the independent Test set of 0.62. Given the noisy nature of the input data I believe this represents a good model and that the alignments are valid.
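The two ingredients of this validation, an activity-stratified test set and the r2/q2 statistic, can be sketched in a few lines. This is a generic illustration, not Forge's implementation: the stratification scheme (sort by activity, take every fifth molecule) is an assumption, and the molecule names and activities are invented. The q2 value uses the same formula as r2, with each prediction coming from a model built without that molecule (leave-one-out).

```python
def stratified_split(activities, test_fraction=0.2):
    """Sort by activity and send every k-th molecule to the test set."""
    ranked = sorted(activities, key=activities.get)
    step = round(1 / test_fraction)
    test = ranked[step // 2::step]
    test_set = set(test)
    train = [m for m in ranked if m not in test_set]
    return train, test


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical pIC50 values for ten molecules.
activities = {f"m{i}": 5.0 + 0.3 * i for i in range(10)}
train, test = stratified_split(activities)
print(train, test)

# Hypothetical observed and predicted activities for a test set.
print(r_squared([5.0, 6.0, 7.0], [5.5, 6.0, 6.5]))
```

Stratifying on activity keeps the test set spread across the full potency range, so the test-set r2 is not dominated by one end of the data.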

Designing new molecules in a web browser

Last year we discussed our research aimed at re-imagining molecule design to bring the best of 2D and 3D technologies together in a collaborative environment. The project, code-named TorchWeb, has progressed significantly and is now on the countdown to a beta release, expected in the early summer of this year, with an initial release to follow in the autumn.

The web-based interface contains plugin windows with key information. Here, in addition to the ‘Editor’ and ‘3D viewer’ plugins I have: the ‘Designs’ plugin that shows all the molecules that I am currently working on; the ‘LogP’ plugin giving an atomistic breakdown of the calculated logP; the ‘Properties’ plugin showing multiple physico-chemical properties; and the ‘Similarity search’ plugin that shows similar molecules from a chosen database.

The two central concepts of this new product remain unchanged:

  • Firstly, we want to create an environment where medicinal chemists can draw molecules in a 2D editor, have these automatically converted into a 3D model of how the new molecule would interact with their target, or compare to the molecules that they have made before. To do this we created a new algorithm to grow molecules in 3D which is applied to every change in the 2D molecule.

The 3D pose of the molecule is updated interactively as the molecule is sketched in the 2D window enabling immediate assessment of the potential interactions that could be made by the new molecule. All other plugins also update, giving live similarity searches and logP predictions.

  • Secondly, we recognize that medicinal chemistry designers often work in teams across multiple locations and time zones. Consequently, collaboration had to be central to the application. This has been achieved through session sharing – enabling multiple users to share a design, interact with it simultaneously and work on it together.

Joining a shared session enables users to collaborate live on any design, updating the 3D pose and chemical properties as the molecule evolves.

We are working steadily to convert our initial prototype into a full product. Collaboration with selected customers has enabled us to capture detailed requirements and to transition the code base into a robust, secure environment suitable for on-premise installation or deployment in the cloud.

Features have not been ignored! The design application will be joined by a data analysis application that will combine eye catching plots and graphs with 3D protein active site analysis.

Once designed, a molecule still has to be made. Here we have embarked on a partnership with Elixir Software’s chemTraX. Together we will provide real-time understanding of the status of every molecule from design to analysis, so you only make the molecules that you need to reach your goal.

Want to be among the first to learn more?

Subscribe to our newsletter.

Forge Design: New name, familiar environment

Forge Design is a new licensing level of Forge™ for medicinal and synthetic chemists. It replaces Torch™ and benefits from the familiar GUI, but with V10.6 enhancements.

What can Torch users expect from Forge Design?

The new graphics engine generates enhanced 3D objects, thus delivering strong performance, improved pictures and new smooth transitions between storyboard scenes (Figure 1).

Figure 1: The new graphics engine in Forge Design generates great pictures in an enhanced GUI providing strong performance, faster calculations, improved 2D display of molecules and smooth transitions between storyboard scenes.

For larger projects the GUI will be more responsive, with improved performance on common operations such as applying filters, calculating and interacting with custom plots, and exporting data. Activity Miner users will experience faster calculations which are less memory intensive.

The 2D display of molecules has been improved to make it clearer and more appealing.

New functionality

Forge Design has a significant number of new features and improvements compared to Torch V10.5. For example:

  • If you are working on a large project, there is a new function to automatically assign the selected molecules to roles, based on their Murcko scaffold
  • The Filters window includes a green/red toggle to control whether each filter is enabled or disabled, and pre-defined structural filters (for example, for groups like Ring, Aromatic Ring, H-bond donor and H-bond acceptor)
  • There is a new function to show the chosen field surfaces as a difference between the two molecules in the 3D display.

Figure 2: New functions in Forge Design include an option to automatically assign molecules to roles based on their Murcko scaffold, improved filters and the new ‘Field Difference’ button to show the chosen field surfaces as a difference between the two molecules in the 3D display. In this case the mono-fluoro derivative on the right is more positive (red) where the F is changed to H but also has a more negative aromatic ring.

  • Improved Molecule Editor, enabling you to conformation hunt and align all the molecules created during the same editing session as you exit the editor
  • Improved Blaze™ results window, which now shows an enrichment plot and statistics for each Blaze refinement level
  • The ideal radial plot profile for your project and the custom settings used for the conformation hunt and alignment can be shared with your team using the new import/export functions
  • Improved display of bonds in the 3D window, greater readability of constraints labels, more intuitive display of interactions between the reference molecules and the protein
  • New function to clip only the display of the protein leaving the ligands untouched
  • New functions to choose column content as the molecule title, or as a label in the 3D window
  • Improved plots now showing a regression line for selected molecules
  • For Activity Miner users, the Disparity matrix can now be filtered by Similarity, Disparity and Δ Activity; there is also a new ‘Find Molecule’ function. Also, you can now tag all the molecules visible in the Activity View in the Forge Molecules table.

Figure 3: In the Forge 10.6 Activity Miner window, the disparity matrix can be filtered by Similarity, Disparity and Δ Activity; molecules that do not pass the filter(s) are shown in gray.

How does Forge Design compare to Torch and Forge?

Figure 4 below shows the modules which are common between Forge and Forge Design (red), and the optional modules in Forge only (blue).

Figure 4: Modules available in Forge only (blue), and modules available in Forge and Forge Design (red).

Forge Design uses wizards for common operations just as with the full Forge package. However, the wizards for building activity models and pharmacophores will be greyed out, as these are optional in Forge Design. The processing window will look slightly different (Figure 5 – right), with the optional Build Model section greyed out.

Figure 5: The wizard and functions available only in full Forge are greyed out in Forge Design.

You will find new, pre-defined roles in the Molecules table for training, test and prediction sets. In Forge Design, these roles have no special meaning and you can use them as any other user-created role or ignore them completely.

Simplicity and integration

Having a streamlined platform for ligand-based software makes it easier for existing Forge and Torch users to upgrade their installation to the newer release, with a single installer for both Forge and Forge Design.

This solution also makes it easier to upgrade Forge Design with additional functionality (for example, Activity Miner or the model building package), if desired.

This is a further step towards the integration of all Cresset ligand-based and structure-based functionality and simplifies the product installation and distribution for most customers.

Try Forge Design

If you are an existing Torch user, we will be in contact soon with more information on Forge Design. If you don’t have Forge or Forge Design, contact us to learn more.

Sneak peek: PickR to select electrostatically diverse monomers for libraries

In a presentation at the Cresset User Group Meeting in 2016, Nik Stiefl and Finton Sirockin from Novartis discussed the selection of building blocks for DNA encoded libraries using electrostatic and shape diversity as the key descriptor. This work was powered by a custom binary and scripts written by Cresset. Over the last couple of years this approach has been applied to a wider range of library designs and has gained a reputation as a method of choice for many library designs.

PickR will be a new tool that formalizes the approaches that we developed in collaboration with Novartis. It is a command line binary that provides a diverse pick of reagents to be incorporated into a library. Unlike other approaches, PickR uses the electrostatic and shape properties of molecules to generate the descriptor matrix that is used as the basis for the diversity pick.
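A diversity pick from a pairwise similarity matrix can be sketched with a generic greedy MaxMin selection. This is only an illustration of the general technique: the algorithm used inside PickR is not described here, and the similarity values below are invented rather than computed from electrostatics and shape.

```python
def maxmin_pick(similarity, n_picks, seed=0):
    """Greedy MaxMin: repeatedly add the item least similar to the picks so far."""
    n = len(similarity)
    picked = [seed]
    # nearest[i] = highest similarity between item i and any picked item
    nearest = list(similarity[seed])
    while len(picked) < min(n_picks, n):
        candidates = [i for i in range(n) if i not in picked]
        best = min(candidates, key=lambda i: nearest[i])
        picked.append(best)
        nearest = [max(nearest[i], similarity[best][i]) for i in range(n)]
    return picked


# Hypothetical symmetric similarity matrix for four reagents.
similarity = [
    [1.00, 0.90, 0.20, 0.30],
    [0.90, 1.00, 0.25, 0.35],
    [0.20, 0.25, 1.00, 0.80],
    [0.30, 0.35, 0.80, 1.00],
]

picks = maxmin_pick(similarity, 3)
print(picks)
```

Reagent 1 is skipped in this example because it is nearly identical to the seed; whatever similarity measure fills the matrix determines what "diverse" means for the pick.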

Generating diversity using a 3D property is not straightforward. It is necessary to explore conformations of R-groups and rotate them about the proposed connection point in order to fully understand the distribution of properties. Much of this was well described in Nik and Finton’s presentation (slides 12-15). I will leave a formal discussion to the final release announcement.

Application of PickR to amino acid side chains

I applied PickR to a dataset of approximately 1,000 amino acids that can be purchased from eMolecules. The raw reagents were processed to convert the side chains into R-groups, with the C-alpha atom being converted to iodine (all other iodine-containing reagents were excluded, as were those containing Br and those with side chains >150 Da). Using PickR, I generated a 3D similarity matrix for the side chains, aligning on the C-alpha to C-beta bond. 100 clusters were requested initially.


3D representations of all 100 clusters generated from amino acid side chains, aligned to each other using the I-C bond of the fragments.

Looking at the results, there are some very nice relationships. For example, in Cluster 2, together with tyrosine, are other phenolic side chains but also an indazole that contains the donor-acceptor motif.


2D representations of all the side chains in the same cluster as tyrosine (highlighted).

Along with the indole of tryptophan are other substituted indoles, pyrrolopyridines and benzofuran. In with the isobutyl side chain of leucine are a number of cyclic analogues which I expect would cause issues with many 2D similarity methods. Interestingly, indoline is placed together with the equivalent of homo-phenylalanine.


2D representations of the leucine related cluster.


2D representations of the homo-phenylalanine related side chains


The cluster containing the phenylalanine side chain highlights the major difference between PickR and other methods – R-groups are clustered on 3D electrostatic properties. Hence, together with the phenylalanine side chain you have thiophenes and pyrroles but few other aromatics – pyridines and pyrimidines go to their own clusters because, in electrostatic terms, they are quite different to a plain phenyl ring.


3D and 2D pictures of all side chains in the phenylalanine cluster


Request project file and find out more

Contact me to receive the full results in a Forge project file.

Contact your account manager if you are interested in learning more about PickR.

Using Python in Flare to find common contacts

In a recent blog post Pat Walters nicely used the structures of Viagra and Cialis when bound to PDE5 to argue that scaffold hopping between these two drugs was not a task that could be performed easily. He used Python to demonstrate that each drug interacted with significantly different parts of the protein and that they only shared interactions with 4 residues. Inspired by this, I sought (with the help of Paolo Tosco) to implement Pat’s code in Flare.
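The underlying idea, finding the residues that each ligand touches and then comparing the two sets, can be sketched in plain Python. This is not Pat’s code and does not use the Flare API: the coordinates and residue labels are toy values, and the 4.0 Å contact cutoff is an assumption.

```python
import math


def contact_residues(ligand_atoms, protein_atoms, cutoff=4.0):
    """Return residues with any atom within `cutoff` of any ligand atom."""
    contacts = set()
    for residue, xyz in protein_atoms:
        if any(math.dist(xyz, lig_xyz) <= cutoff for lig_xyz in ligand_atoms):
            contacts.add(residue)
    return contacts


# Toy protein atoms as (residue label, coordinates); labels are hypothetical.
protein = [
    ("RES1", (0.0, 0.0, 0.0)),
    ("RES2", (5.0, 0.0, 0.0)),
    ("RES3", (10.0, 0.0, 0.0)),
    ("RES3", (11.0, 0.0, 0.0)),
]
ligand_1 = [(1.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
ligand_2 = [(7.0, 0.0, 0.0), (8.0, 0.0, 0.0)]

contacts_1 = contact_residues(ligand_1, protein)
contacts_2 = contact_residues(ligand_2, protein)

common = contacts_1 & contacts_2      # residues contacted by both ligands
only_1 = contacts_1 - contacts_2      # specific to the first ligand
only_2 = contacts_2 - contacts_1      # specific to the second ligand
print(sorted(common), sorted(only_1), sorted(only_2))
```

With real structures the atom lists would come from the superposed protein-ligand complexes, and the three sets map directly onto the common and ligand-specific residues that can be rendered differently in a 3D view.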

Paolo has been working on the implementation of a Jupyter notebook within Flare (see his post here) and this provides the ideal environment to implement and discuss code to explore the common and specific interactions of Cialis and Viagra with PDE5. The notebook contents are shown in the iframe below.

As you can see, the output (last line) is the same as originally reported. However, with the addition of the Flare interface we are able to create a nice visual representation of the results, rendering the common and ligand-specific residues differently. The script takes around 30 seconds to run.

If you would like to learn more about Flare and using Python to customize, script or automate common actions, or you would like to try the code out for yourself, then please contact us. The current range of Python extensions for Flare is available from our GitLab repository.

A sneak peek into Flare V2: A major advancement for structure-based design with Flare

Flare V2 is in the final rounds of testing, which means the release announcement is imminent. Ahead of the user group meeting, where we will be presenting this major advancement, this post takes a sneak peek at some of the new features in this version.

New coloring options

Completely rewritten surface generation code results in faster and better surfaces with quality options built in to the surface creation dialog. This is combined with new coloring options for new surfaces to give you more insights into your proteins and ligands.

Figure 1: (a) New surface coloring options in Flare V2, and (b) PDB code 4MBS with a hydrophobic surface colored yellow (hydrophobic) to blue (hydrophilic).

Improved Z-clipping

Making pictures is key to communicating your insights on protein-ligand binding. Flare V2 has major improvements to the Z-clipping to enable you to get the view that you want. In addition to applying a specific clipping plane to an individual surface, you now have the option to exclude ligands from the clip altogether. This option makes a significant impact on pictures of binding sites that are completely buried.

Figure 2: PDB 1IKW showing the ability to selectively clip proteins. (a) Ligand clipping often makes it difficult to get the picture you want whereas (b) disabled ligand clipping in Flare V2 gives you more options to communicate key insights.

Figure 3: Flare V2 gives the option to clip individual surfaces independently of other objects. Here a clipping plane is added only to the electrostatic surface enabling the visualization of protein residues that are above the ligand in combination with a surface.

Other features contribute to a major advancement for Flare

The new protein surfaces are complemented by new options for ligand surfaces, the new storyboard panel to capture and replay key 3D insights and many new features for ligands. Taken together with the Python API this release of Flare is a major advancement in this innovative new application for structure-based design.

Find out more and get hands-on

Register for the up-coming user group meeting to find out more about Flare V2, network with existing users and receive free training at one of the hands-on workshops.

Request an evaluation.

In the Cresset lab: Molecular design re-imagined

Molecule design is a central task in drug discovery. It is both personal and collaborative, easy to do in 2D (on the fume hood or using a drawing application) but more productive when combined with the 3D environment of the chosen target. We have been thinking about how best to provide you with a molecule design application that satisfies all these requirements. Whilst Torch has many favorable attributes that make it a popular choice, it cannot satisfy the two key features that you request of it – to be able to sketch in 2D in a popular drawing application, and to work in a collaborative environment. Project TorchWeb is now underway to satisfy these requests. This project will deliver our next generation of molecule design application and, as the name suggests, will be entirely web based.

We have recently completed a proof of concept application, part funded by a UK government grant, and are delighted with the two key technologies it has at its heart – the ability to work exclusively in 2D and yet have the 3D context of your design immediately available, and the option to share your work ‘live’ with one or many collaborators.

TorchWeb reimagines the molecular design process within a web browser. Using 2D and 3D representations of the molecule together with Cresset’s electrostatic descriptors gives a detailed view of your new design.

Thinking in 2D and 3D simultaneously

As chemists we are taught to synthesize molecules using 2D representations. This makes our life simpler, condensing complex situations down to a 2D language that we use to think about and communicate our ideas. However, molecules are 3D and exert their effects in 3D. Why do we not link these two using the computer to extrapolate our ideas from 2D to 3D automatically? This is the philosophy behind design in TorchWeb.

Changes in the 2D window are automatically interpreted into 3D.

Collaboration as standard

Modern drug discovery teams are often geographically diverse, both within a single company and across multiple organisations. In TorchWeb we have developed a collaboration layer that enables you to invite other users to share your current design environment. We aim to remove artificial barriers to working with others in your team, catalyzing new ideas, stimulating new thoughts and focusing on current challenges.

Sharing a session gives a collaboration platform where ideas can flow between users without restrictions.

Improved decision making at every stage

Once released we hope that TorchWeb will enable medicinal chemists to make the best, most informed decision at every stage of the discovery process. In part this is due to the core Cresset technologies but also because of the web based nature of the application. Central to the design of TorchWeb is the ability to extend the application through plugins or custom windows that can provide real time feedback on new designs at the moment of conception. This will allow you to bring all of the insights available from your existing QSAR and QSPR methods into the design process.

Alpha and beta testing coming soon

We are actively developing TorchWeb and are aiming for launch in 2019. As with our desktop applications, we are keen to gain feedback from our customers to ensure you receive a product that works for you. Whilst it is not ready for testing just yet, we will announce alpha and beta test programs in the coming months. In the meantime, if you would like to know more about this exciting new application then please do not hesitate to get in touch.

Hear about TorchWeb in more detail

See a demonstration and hear more about the core algorithms behind TorchWeb – register for The Cresset User Group Meeting on June 21 – 22, 2018.

Flare: Accessible structure-based design

Modern structure-based design encompasses hundreds of methods, advanced algorithms and diverse biological targets. Cresset has a long-standing reputation for easy to use applications in the ligand-based design sphere. In deciding to bring Flare™, a new structure-based design application, to the market we created a challenge – can structure-based design be made simple?

A balance of usability and flexibility

Usability and flexibility are such fundamental features of well-designed software that they are often only noticed when they are absent. Usability reduces frustration, reduces training overhead, and makes it easier to access the full potential of the software. However, users also want the flexibility to tweak experiments, perform complex workflows and customise their applications to work the way they do. These key ‘unspoken’ features have been part of the design process in building Flare from the very start.

Focus on design

Critical to any analysis is the use of your conclusions to change the future. In structure-based design this means using information on how ligands bind to proteins to influence the design of the next ligand. In Flare we have put ligands at the heart of the application. They are stored in their own table and have a dedicated tab menu. The table layout enables physico-chemical property data to be stored alongside each ligand or calculated for every new design. It enables you to organize your compounds (sorting ligands based on their properties) or split ligands into different groups (roles) so that you can break down larger datasets into manageable chunks.

Designing new ligands is easy using the simple ‘Edit a copy’ feature. This brings up the molecular editor where the ligand can be improved in the protein active site and in reference to other ligands. The combination makes it easy to design in 3D, gaining all the productivity benefits that this brings. Sequential edits give an iterative process where each new design can be analysed and used as the basis for the next design.

Accessible methods

Analysis of existing or newly designed ligands requires complex methods. From docking to a detailed energetic prediction of binding, structure-based design methods are all complex. The challenge here is to present complex methods in an accessible way that enables expert users to modify key parameters but is not daunting to the regular user. In Flare we use a standardised layout for all calculation dialogues and provide default settings that work well in most cases. Experts have the choice to take the defaults or progress to the Advanced Options to change the parameters to meet their needs.

Great pictures

Creating great pictures is central to structure-based design. It seems like no J. Med. Chem. article is complete without at least one protein-ligand picture. Creating pictures in Flare is easy, with control over every aspect of the 3D display straight from the Home tab menu. Add to this the ability to control the clipping planes of surfaces independently of atoms and bonds, plus control over the picture resolution, and you have all the elements you need to make stunning pictures. Whether for internal presentations, print articles or large posters, Flare delivers the quality of picture that you need.

Free evaluation of Flare

The feedback that we have received on the usability of Flare has been very positive. In the next release, you will see even more usability features such as storyboards, improved ligand selection and enhancements to the drag and drop features. We want Flare to be the best structure-based design application you use, so share your experience with us.

Request a free evaluation of Flare to see for yourself just how accessible structure-based design is for computational, medicinal and synthetic chemists.

New Spark reagent databases: eMolecules’ Tiers 1, 2, and 3

Each month we release updated Spark databases derived from eMolecules’ building blocks. These have proved very popular with our customers. This month a small change is being made to the databases in that we now only include reagents that are in eMolecules’ Tiers 1, 2, and 3. These correspond to the most accessible reagents and should be a good source of inspiration for R-group design experiments in Spark.

Why the change?

The number of reagents that are now listed as available has grown significantly. In the last couple of months we have been processing around 650,000 reagents but this month that number is close to 1.1 million. Unfortunately the majority of this increase is in eMolecules’ Tier 4 category with availability in the multiple-weeks time frame. We felt that these additional reagents were largely noise in the majority of Spark experiments. As a result we have slimmed the downloads, search times and results by only including Tiers 1, 2 and 3. These still encompass 295,000 reagents and hence provide you with an excellent source of readily available R-group bioisosteres.

If you are interested in the Tier 4 reagents, please contact Cresset support to discuss the options.

Installing the Spark reagent databases is easy using the built-in Spark database update facility.

Tversky similarity in field-based virtual screening

In the releases of Blaze V10.3 and Forge V10.5 we introduced new similarity metrics alongside the new capabilities to manually weight the similarity function using pharmacophore constraints. With the introduction of Tanimoto and particularly Tversky measures of similarity, a new range of experiments is available to help you tailor the results you get. In this post I will use the Tversky similarity to perform substructure and superstructure type searches using Blaze. These new options are also available in Forge.

Figure 1: Blaze results can be tailored to generate the type of results that interest you, from substructure-like to pure chemotype switching or superstructure-like.

Similarity in Blaze

Blaze uses the field point patterns of molecules combined with their shape to align and score a ‘database’ of molecules against a ‘reference’ or ‘query’ that is usually a known active. In this context the default Dice similarity has worked well. It returns active molecules that are similar in size to the query, but is not too size-dependent, allowing Blaze to find hits that are smaller than the reference. In most cases this is exactly what you want – a ligand the same size or smaller than the reference that maintains most of the potential sites of interaction. Previously, the scoring algorithm could be altered to generate more substructure-like or more superstructure-like results. However, this was complex to set up and sub-optimal in performance. In Blaze V10.3 the new Tversky similarity makes these searches more accessible. A look at the average MW of the first 100 compounds returned using the standard Dice and the new Tversky options highlights the difference:

Table: average MW of the first 100 compounds returned using different similarity metrics. Database of 35,283 positively charged ChEMBL compounds with 5-30 heavy atoms on the Blaze demo server. Query MW: 319. Database average MW: 318.

Similarity metric    Average MW (first 100 compounds)
Dice                 314
Tanimoto             313
Tversky, α 0.05      192
Tversky, α 0.95      363


Substructure searches with Blaze

The Tversky metric has two parameters, α and β. Using the Tversky similarity option in Blaze, and setting α to 0.05 and β to 0.95, results in a substructure-like search. In fact, we don’t deal with structures so this actually equates to a ‘sub-field’ search. It returns molecules whose field pattern is contained within that of the query – i.e. field fragments of the query. This is useful where you have a large known active but want to screen or design a fragment library of smaller molecules that match parts of the query.
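To make the role of α and β concrete, here is a minimal sketch of the textbook Tversky index applied to plain sets of feature labels. It is not how Blaze computes its score: Blaze compares aligned fields and shape rather than discrete sets, and the feature names and the assignment of α to the query side are assumptions chosen to match the behaviour described above.

```python
def tversky(query, candidate, alpha, beta):
    """Textbook Tversky index between two feature sets.

    alpha weights features found only in the query,
    beta weights features found only in the candidate.
    """
    common = len(query & candidate)
    only_query = len(query - candidate)
    only_candidate = len(candidate - query)
    denominator = common + alpha * only_query + beta * only_candidate
    return common / denominator if denominator else 0.0


# Hypothetical feature labels standing in for field points.
query = {"donor_1", "acceptor_1", "acceptor_2",
         "hydrophobe_1", "hydrophobe_2", "cation_1"}
fragment = {"donor_1", "acceptor_1", "hydrophobe_1"}  # part of the query
elaborated = query | {"acceptor_3", "hydrophobe_3", "donor_2"}  # query plus extras

# Sub-field style weighting: missing query features cost little,
# extra candidate features cost a lot.
sub_fragment = tversky(query, fragment, alpha=0.05, beta=0.95)
sub_elaborated = tversky(query, elaborated, alpha=0.05, beta=0.95)

print(f"fragment:   {sub_fragment:.3f}")
print(f"elaborated: {sub_elaborated:.3f}")
```

With these weights the fragment outscores the larger molecule even though the larger molecule contains every query feature. In the set formulation, α = β = 0.5 reduces to Dice and α = β = 1 reduces to Tanimoto.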

Figure 2: Search query and 3 selected results (ranks 3, 5, 11) from a sub-field search using the A2C active from the Fragment hopping with Blaze case study. Each result includes some features of the search query but also omits at least one functional group.

Superstructure searches with Blaze

Setting a Tversky similarity with α at 0.95 and β at 0.05 generates a ‘super-field’ search. That is, molecules that contain a field pattern similar to the query are scored highly whether or not they have additional field points. This is useful for growing hits from a fragment screen or in other situations where you do not want to penalize results for having additional functionality to the query. As hits could contain the query at any position and any orientation, this option works particularly well when combined with field, pharmacophore or excluded volume constraints. For example, using an excluded volume will direct the results towards the available space around the query. Equally, using field constraints or the new pharmacophore constraints will ensure that results contain the interactions that you know to be important.
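The combination of a super-field search with a constraint can be sketched on toy data. The example below assumes molecules are reduced to sets of hypothetical feature labels and models a constraint as a simple "must contain" filter. Both are illustrative simplifications: Blaze scores aligned fields and shape, and the function and feature names here are invented for the sketch.

```python
def tversky(query, candidate, alpha, beta):
    """Textbook Tversky index; alpha weights query-only features."""
    common = len(query & candidate)
    denominator = (common
                   + alpha * len(query - candidate)
                   + beta * len(candidate - query))
    return common / denominator if denominator else 0.0


def super_field_search(query, database, required=frozenset(),
                       alpha=0.95, beta=0.05):
    """Rank candidates by Tversky score, keeping only those that
    contain every required feature (a stand-in for a constraint)."""
    hits = [
        (name, tversky(query, features, alpha, beta))
        for name, features in database.items()
        if required <= features
    ]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Hypothetical fragment hit used as the query.
fragment_hit = {"donor_1", "acceptor_1", "hydrophobe_1"}
database = {
    "grown_a": fragment_hit | {"acceptor_2", "hydrophobe_2"},
    "grown_b": {"acceptor_1", "hydrophobe_1", "hydrophobe_2", "cation_1"},
    "partial": {"donor_1", "hydrophobe_1"},
}

unconstrained = super_field_search(fragment_hit, database)
constrained = super_field_search(fragment_hit, database,
                                 required={"donor_1"})

for name, score in constrained:
    print(f"{name}: {score:.3f}")
```

In this sketch the candidate that contains the whole query plus extra features ranks first, and the filter removes the candidate that lacks the feature marked as essential.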

Figure 3: Search query and 3 selected results (ranks 2, 4, 6) from a super-field search using A2C active from the Fragment hopping with Blaze case study and an expanded database to include larger fragments. Each result contains a similar field pattern to the query plus additional features or functional groups.

Tanimoto similarity in place of Dice

In addition to Tversky, the new versions of Blaze and Forge offer the opportunity to change from the default Dice similarity to Tanimoto. This will make a difference to how the individual elements of the score are combined, resulting in a small change in the order that molecules are returned in a virtual screening experiment, but the two experiments are highly correlated. The effect is somewhat complicated to describe and hence will be explored in a future post.
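As a rough intuition only, the toy example below shows one way a change of metric can reorder results. For a single set-based comparison Tanimoto is a monotonic function of Dice, T = D / (2 - D), so it cannot change a ranking by itself. Once several component scores are combined, however, the order can shift. The simple average of two components used here is an assumption for illustration and is not a description of how Blaze combines its score elements.

```python
def dice_to_tanimoto(dice):
    """Convert a set-based Dice coefficient to the matching Tanimoto value."""
    return dice / (2.0 - dice)


def mean(values):
    return sum(values) / len(values)


# Hypothetical per-component Dice scores for two molecules
# (for example one field term and one shape term).
components = {
    "balanced": (0.70, 0.70),
    "uneven": (0.90, 0.48),
}

# Assumed combination rule: average the component scores.
dice_scores = {name: mean(parts) for name, parts in components.items()}
tanimoto_scores = {
    name: mean([dice_to_tanimoto(part) for part in parts])
    for name, parts in components.items()
}

dice_order = sorted(dice_scores, key=dice_scores.get, reverse=True)
tanimoto_order = sorted(tanimoto_scores, key=tanimoto_scores.get, reverse=True)

print("Dice order:    ", dice_order)
print("Tanimoto order:", tanimoto_order)
```

Because the Dice-to-Tanimoto conversion is convex, a molecule with one very good and one mediocre component gains relative to a molecule with two moderate components, which is enough to swap two closely ranked results.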

Figure 4: Plot of rank returned using Tanimoto similarity vs Dice similarity for ~10,600 compounds. The results are highly correlated, with an r² of 0.96.


The new similarity metrics increase the range of experiments that can be easily performed within Blaze. Using the new metrics in Forge enables refinement or enhancement of Blaze results using the same metrics. Sub-field and super-field searches in particular should prove useful for fragment-based discovery.

If you would like to try the Blaze interface, or study the effects of the new similarity metrics, then sign up for a Blaze demo server account.

To try Blaze on your datasets or your projects, request a full evaluation.