Rapid interpretation of patent SAR using Forge

Biological data is now a regular feature of new patent applications and is readily available for download from BindingDB, which holds data on over 2,500 patents encompassing more than 300,000 binding measurements. Generating meaningful insights from this data is perceived as less straightforward. In this post I will use Forge™ V10.6 to demonstrate that it is possible to get an overview of the SAR from a single patent entry with minimal human intervention and time.

Application to PIM-1

Selection and processing of 288 compounds from US9321756, ‘Azole compounds as PIM inhibitors’ (detailed in Appendix I) gave the Activity Atlas™ model shown in Figure 1. The total time to generate and interpret this model was around 30 minutes. It would be relatively straightforward to automate the process.

Figure 1: Activity Atlas model generated in this case study. From data download to model took 30 minutes. The ‘Activity Cliff Summary of Electrostatics’ and ‘Activity Cliff Summary of Shape’ views are shown. These detail regions of acute SAR – Red / Blue = positive / negative electrostatics preferred for greater activity; Green / Pink = activity favors / disfavors atoms in this region.

SAR interpretation

Firstly, the oxadiazole is clearly required, as demonstrated in Figure 2 by the regions of negative electrostatics (blue) next to both nitrogen atoms, which represent the interaction of this group with the side chain of Lys67. Perhaps this is not surprising given the title of the patent application. The model also shows that the amino group next to the oxadiazole is constrained (area of pink surface).

Figure 2: Activity Atlas model close to the oxadiazole group. Red = positive electrostatics preferred; Blue = negative electrostatics preferred; Green = Atoms in this region favored; Pink = Atoms in this region disfavored.

On initial inspection there appears to be space in the protein to accommodate a substituent on the nitrogen. However, viewing the aligned ligands in the context of the protein and showing contacts in Forge (Figure 3) makes it clear that all N-substituted ligands clash with Asp186 and that the adjacent space is not accessible from this position in the ligand.

Figure 3: Clash of a ligand with a morpholino substituent to Asp186 (orange lines).

The model (Figure 4) shows that there is a clear preference for molecules that extend into the gap between the two arms of the ligand (green surface at the bottom of the model above). Whilst we would want to check the underlying data, the suggestion is that substitution on either R-group is tolerated. Indeed, the most active compound crosses this gap completely, which raises the possibility of using a cyclized ligand.

Figure 4: A highly active compound from the patent displayed in CPK. The N-trifluoroethyl group touches the cyclopropyl substituent on the opposite side of the molecule.

Surrounding the green (favorable volume) region between the two arms is a large area of red surface. This suggests that positive electrostatics – edges of aromatics, H-bond donors etc. – are preferred in this region.

This summary is reinforced by looking at the individual compounds that make up the data. Thankfully, this is easy to do with the Activity Miner module of Forge. Activity Miner’s top pairs table (Figure 5) shows many pairs of molecules where introducing a positive charge in the region below the ligand (as shown in the pictures) generates a more active molecule. Generally the charged species is around 1 log unit more active.
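To make the idea of a "top pairs" ranking concrete, here is a minimal sketch in Python. It assumes that pairs are scored by a disparity value, taken here as the change in pIC50 divided by the distance between the two molecules (1 minus their similarity). The post does not describe how Activity Miner scores pairs internally, so the formula, the compound names and all the numbers below are illustrative assumptions only.

```python
from itertools import combinations

# Hypothetical compounds: name -> pIC50 (made-up values).
activity = {"A": 8.1, "B": 7.0, "C": 6.9, "D": 8.0}

# Hypothetical pairwise 3D similarities in [0, 1), keyed by sorted name pair.
similarity = {
    ("A", "B"): 0.90, ("A", "C"): 0.60, ("A", "D"): 0.95,
    ("B", "C"): 0.92, ("B", "D"): 0.70, ("C", "D"): 0.65,
}


def top_pairs(activity, similarity):
    """Rank pairs by assumed disparity = |delta pIC50| / (1 - similarity)."""
    ranked = []
    for a, b in combinations(sorted(activity), 2):
        delta = activity[a] - activity[b]
        distance = 1.0 - similarity[(a, b)]
        ranked.append((abs(delta) / distance, a, b, delta))
    ranked.sort(reverse=True)  # largest disparity (sharpest cliff) first
    return ranked


for disparity, a, b, delta in top_pairs(activity, similarity):
    print(f"{a} vs {b}: delta pIC50 = {delta:+.1f}, disparity = {disparity:.1f}")
```

In this toy data the A/B pair comes out on top: the two molecules are very similar yet differ by about one log unit, which is the pattern described above for the charged and uncharged ligands.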

Figure 5: The top pairs table in the Activity Miner module of Forge showing a specific pair of molecules and the electrostatic difference map between them. Red regions indicate where that ligand is more positive than the comparator; Blue where that ligand is more negative. In this case the ligand on the left is over a log unit more active and contains a positive charge in the region at the bottom of the picture.

Looking at the protein structure does not reveal a specific interaction or reason for this gain in potency. However, by using the protein field surface in Flare, we can see that the protein is generating a negative potential in this region which would account for the gain in activity when introducing a positive charge.

Figure 6: The protein interaction potential contoured at 2 kcal/mol. Red = positive; Blue = negative. The potential indicates the nature of the atoms to use in a region: positive atoms fit well in negative regions, and vice versa.

Lastly, in the region of the pyrimidine group the model has a large area of blue. This indicates that there is a clear preference for molecules with nitrogen atoms in the ring at these points (e.g., pyrazine). This area points towards solvent, so this is quite surprising. From the crystal structure alone it would be expected that introduction of heteroatoms would have little effect on activity. Examination of the data using Activity Miner confirms that, for example, pyrazine is more active than pyridine. In this case the protein fields do not reveal anything significant in the underlying potential of the protein and we are left to speculate on the reason for the SAR.

Figure 7: PDB 4TY1 showing the region around the pyrimidine group of the ligand. There are few interactions between the protein and the edge of the ligand in this region.

Speculating that protein movement was at the root of the observed SAR, I downloaded all the PIM-1 structures from the PDB into Flare, sequence aligned them and superposed them based on the sequence alignment. Looking at this region across the 150+ structures shows no clear case for protein flexibility, although a number of structures do have a water molecule in this region that would bridge the ligand to the side chain of Arg122.

Figure 8: Over 150 PIM-1 crystal structures superposed in Flare. The backbone is shown in tube, residues close to the depicted ligand of structure 4TY1 are shown in thin sticks. Only two structures have any variation in loop conformation in this region.

The reason for the observed SAR remains elusive and could be a function of protein-protein interaction, water mediated interaction or something else.


Rapid interpretation of BindingDB patent data can be achieved using Forge. In this case the SAR of 288 ligands was condensed to a single Activity Atlas model in less than 30 minutes. Interpretation of the model over the next 30 minutes generated clear SAR insights that could be employed on competing projects. Inspecting the protein electrostatics using Flare provided further insights into the observed SAR.

Try Forge on your project

Request a free evaluation of Forge to try this on your data or condense a patent into a simple summary of the published SAR.

See all licensing options for Forge.

Appendix I

Background computational details

The raw data was downloaded in tab-separated format from BindingDB and pre-processed in Excel. The raw data contains data for two biological targets – ‘PIM’ and ‘PIM-1’. Compounds with ‘PIM-1’ data were selected and checked for duplicate values. One compound was excluded because of a large variation in the reported IC50 value and four molecules were excluded due to missing activity values. All other duplicate IC50 values were averaged and converted to a pIC50 value, resulting in a dataset of 288 molecules in a CSV file.
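The pre-processing was done in Excel, but the same steps are easy to script. The following is a minimal sketch in Python, not the workflow actually used: the rows are made-up data, IC50 values are assumed to be in nM, and the 10-fold cut-off used to flag a "large variation" between repeats is an assumption (the post does not state the threshold that was applied).

```python
import math
from collections import defaultdict

# Hypothetical rows: (compound_id, target, IC50 in nM, or None when missing).
rows = [
    ("cpd-1", "PIM-1", 10.0),
    ("cpd-1", "PIM-1", 1000.0),  # repeats disagree by 100-fold
    ("cpd-2", "PIM-1", 50.0),
    ("cpd-2", "PIM-1", 150.0),
    ("cpd-3", "PIM-1", None),    # missing activity value
    ("cpd-4", "PIM", 5.0),       # other target, ignored
    ("cpd-5", "PIM-1", 1.0),
]

MAX_FOLD_SPREAD = 10.0  # assumed cut-off for "large variation"


def curate(rows, target="PIM-1", max_fold=MAX_FOLD_SPREAD):
    """Return {compound_id: pIC50} after filtering and averaging duplicates."""
    grouped = defaultdict(list)
    for cid, tgt, ic50_nm in rows:
        if tgt == target and ic50_nm is not None and ic50_nm > 0:
            grouped[cid].append(ic50_nm)

    result = {}
    for cid, values in grouped.items():
        if max(values) / min(values) > max_fold:
            continue  # drop compounds whose repeats disagree too much
        mean_nm = sum(values) / len(values)
        # pIC50 = -log10(IC50 in mol/L); 1 nM = 1e-9 mol/L
        result[cid] = 9.0 - math.log10(mean_nm)
    return result


pic50 = curate(rows)
for cid in sorted(pic50):
    print(cid, round(pic50[cid], 2))
```

The sketch averages the IC50 values first and converts the mean to pIC50, matching the order described above. Averaging on the log scale instead would give slightly different values for compounds with widely spread repeats.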

The original dataset included the ligands of PDB codes 4TY1 and 4WT6. The protein-ligand complexes were downloaded into Flare, sequence aligned and superposed. Looking at the binding site, either ligand would work well as a reference for initial alignment of the dataset. The ligand from 4WT6 was chosen for further experiments and both the ligand and the corresponding protein were transferred to Forge (Copy-Paste). The CSV file was loaded into Forge (Training Set), the molecules were processed using Accurate but Slow conformation hunting and Substructure alignment, and an Activity Atlas model was built.

The Forge processing window showing the options used in this case study.

Using the Cresset Engine Broker, the calculation took 15 minutes to complete. Examining the results shows excellent alignment through the common substructure but some variation beyond that.

288 aligned ligands from US9321756 that were used to prepare the Activity Atlas model.


About Activity Atlas

Activity Atlas models are created by comparing all pairs of molecules in terms of positive and negative electrostatics plus the hydrophobics and shape properties and then combining these together, weighted by the change in activity for the pair. The result is a simple, qualitative picture of the critical points in the SAR landscape.
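The pairwise, activity-weighted idea can be illustrated with a toy calculation. The sketch below is only a cartoon of the description above, not the Activity Atlas algorithm itself: each molecule is reduced to a made-up electrostatic value at three shared grid points, and the difference between every pair is summed with the pair's activity difference as the weight. All names and numbers are hypothetical.

```python
from itertools import combinations

# Hypothetical aligned molecules: (pIC50, field value at three shared grid
# points), where a positive value stands for positive electrostatic potential.
molecules = {
    "m1": (8.0, [0.5, -0.2, 0.0]),
    "m2": (7.0, [-0.5, -0.2, 0.0]),
    "m3": (6.5, [-0.5, 0.3, 0.0]),
}


def cliff_summary(molecules):
    """Sum field differences over all pairs, weighted by the activity change."""
    n_points = len(next(iter(molecules.values()))[1])
    summary = [0.0] * n_points
    for a, b in combinations(molecules, 2):
        act_a, field_a = molecules[a]
        act_b, field_b = molecules[b]
        weight = act_a - act_b
        for k in range(n_points):
            summary[k] += weight * (field_a[k] - field_b[k])
    return summary


for k, value in enumerate(cliff_summary(molecules)):
    print(f"grid point {k}: {value:+.2f}")
```

A positive total at a grid point means that a more positive field there goes with higher activity (the red regions in the figures), a negative total means the opposite (blue), and a value near zero means the data carry no signal at that point.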

The resulting Activity Atlas model was automatically displayed. I always start with the ‘Activity Cliff Summary of Electrostatics’ and ‘Activity Cliff Summary of Shape’ views to understand the data. As this was a quick experiment and the alignments were noisier than in a fully curated experiment, the Activity Atlas model is also noisier than ideal. However, increasing the Confidence Level to 3.0 concentrates the display on the clear signals in the data.

The display options used for the Activity Atlas models shown in this study.

Model validation

Activity Atlas is a qualitative technique and hence difficult to validate except through manual inspection. However, Forge is capable of building quantitative models that can be used to validate the alignment of the molecules (we believe that consistent alignment is the single biggest factor in generating reliable 3D QSAR models). Using the Automatic regression model building methods of Forge with a 20% activity-stratified test set generated an SVM model with a q² of 0.64 (LOO) and an r² of 0.62 on the independent test set. Given the noisy nature of the input data I believe this represents a good model and that the alignments are valid.
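For readers less familiar with these statistics, the sketch below shows how a leave-one-out q² and a test set r² can be computed. It is not the Forge procedure: a one-descriptor least-squares line stands in for the SVM model, the split simply holds out every fifth compound after sorting by activity, and the descriptor and activity values are made up.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit y = a*x + b for a single descriptor."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx


def r_squared(observed, predicted):
    """1 - SS_res / SS_tot, with SS_tot taken about the mean of `observed`."""
    mean = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def q2_loo(xs, ys):
    """Leave-one-out q2: refit without each point, then predict that point."""
    preds = []
    for i in range(len(xs)):
        a, b = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        preds.append(a * xs[i] + b)
    return r_squared(ys, preds)


def stratified_split(ys, every=5):
    """Sort by activity and hold out every fifth compound (about 20%)."""
    order = sorted(range(len(ys)), key=lambda i: ys[i])
    test = set(order[every // 2::every])
    train = [i for i in range(len(ys)) if i not in test]
    return train, sorted(test)


# Hypothetical descriptor values and pIC50s for ten compounds.
xs = [0.1, 0.4, 0.5, 0.9, 1.2, 1.4, 1.9, 2.1, 2.6, 3.0]
ys = [5.2, 5.4, 5.9, 6.0, 6.6, 6.5, 7.2, 7.1, 7.9, 8.0]

train, test = stratified_split(ys)
x_train = [xs[i] for i in train]
y_train = [ys[i] for i in train]
a, b = fit_line(x_train, y_train)

print("q2 (LOO, training set):", round(q2_loo(x_train, y_train), 3))
print("r2 (test set):", round(
    r_squared([ys[i] for i in test], [a * xs[i] + b for i in test]), 3))
```

Because each left-out compound is predicted by a model that never saw it, q² cannot exceed the fitted r² of the same linear model on the same data, which is why it is the more conservative of the two numbers.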

How do I turn my biology insight into a novel therapeutic?

The question that many of the people talking to Cresset Discovery Services ask is “I have discovered some interesting biology – how do I turn it into a drug?” If this is you, then read on.

Let’s look at some of the ways that we can use in silico technology to translate your ideas into chemical tools in the first instance. Optimizing the drug-like properties of these early hit compounds will put you on the road to defining a lead series and onwards to nominating a drug candidate for clinical studies.

Finding a chemical starting point

The first step in the drug discovery journey is to find a chemical starting point, a molecule that binds to your target and either blocks or enhances its activity. Your overall aim may be to block a key point in a signalling pathway, or divert it to an alternative route.

Alternatively, you may want to prevent levels of a particular cytokine from building up or block the activity of an enzyme, a receptor or an ion channel. Don’t overlook the fact that you may already have found chemical tools that modulate your biological target: what are its physiological activators, and are there any literature compounds or known drugs we can use to get you started? What is known about the protein target? Do you have structures of your system, or are there examples of close relatives in the Protein Data Bank? Can we learn from off-target effects of other compounds or scaffold-hop from one chemical series to another to create new intellectual property?

Your next milestone faster and more cost effectively

Using computational chemistry effectively can both considerably speed up hit finding and reduce the costs of running large physical screens. The best approach to take will depend on the type of bioassay that you intend to use. For example, you can test many more compounds in a plate-based enzyme assay than you can screen in a complex phenotypic system, so you need to decide whether you want to test thousands of compounds or would prefer to look at a smaller bespoke set of compounds.

In both cases, initial steps will focus on defining the bioactive conformation of one or more compounds that bind to your protein. We use these as pharmacophores to search for similar compounds that will have the same biological effect. By applying the XED force field we take a unique view of molecules which means that we can compare different chemical classes directly. This is a better reflection of how proteins ‘see’ other molecules and explains how drugs bind to sites that evolved to interact with peptides, DNA or other natural products. In summary, we are looking at the wider electrostatic and shape properties of molecules that extend beyond their atomic skeletons and are responsible for their biological properties.

Efficient virtual screening for diverse new structures

One popular option for kick-starting the search for a hit compound is to use Cresset Discovery Services to run virtual screens. By using Blaze (an effective ligand-based virtual screening platform optimized to return diverse new structures) or Flare (which provides fresh insights into structure-based design) we can search a library of 20 million commercially available compounds. The output will be a list of compounds ranked by the similarity of their field point patterns to your pharmacophore. We can filter the list to prioritize compounds that have the appropriate physicochemical properties based on the nature of your target. Most importantly, this will give you a shopping list of compounds that you can purchase and test before you build significant synthetic chemistry resources in your team.
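The rank-then-filter step can be pictured with a few lines of Python. This is a sketch only and does not use the Blaze or Flare interfaces: the hit records, the similarity scores and the property window (molecular weight up to 450 Da, logP up to 4) are all hypothetical and would in practice depend on the target.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    name: str
    similarity: float  # field similarity to the query, 0 to 1 (made up)
    mol_weight: float  # Da
    logp: float


def shortlist(hits, max_mw=450.0, max_logp=4.0, top_n=3):
    """Drop hits outside the property window, then rank by similarity."""
    kept = [h for h in hits if h.mol_weight <= max_mw and h.logp <= max_logp]
    return sorted(kept, key=lambda h: h.similarity, reverse=True)[:top_n]


hits = [
    Hit("vendor-001", 0.71, 320.4, 2.1),
    Hit("vendor-002", 0.83, 512.6, 3.0),  # too heavy for this window
    Hit("vendor-003", 0.78, 401.5, 4.8),  # too lipophilic for this window
    Hit("vendor-004", 0.75, 388.2, 3.3),
    Hit("vendor-005", 0.64, 295.3, 1.7),
]

for hit in shortlist(hits):
    print(hit.name, hit.similarity)
```

Filtering before ranking means the shopping list is not dominated by high-scoring compounds that would be unsuitable for the target in any case.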


The same approach allows you to move from one chemical series to another – particularly useful when you are looking for a backup compound to fill a hole in your portfolio or overcome a deficit in your existing series.

Building bespoke libraries

You can purchase millions of compounds from chemical vendors. However, these still only represent a fraction of all possible compounds, even if we only consider molecules that are small enough to be used as drugs. Substances produced by organisms have evolved to form interactions which are not always available to off-the-shelf chemicals. Sometimes it is better to design your own library so that you can build in features that explore different regions of chemical space or mimic the properties of natural products. We can help you design libraries that move you into new areas of chemical space or focus on specific features of a molecule.

3D similarity-based clustering workflow.

Working with fragments

One approach to getting better coverage of chemical space is to work with low molecular weight compounds (<300 Da), termed fragments. Linking these together can generate larger drug-like molecules, but accomplishing this is recognized as a difficult task. Spark, a scaffold hopping and R-group exploration application, enables us to offer you a tailored solution to this problem, using fragment libraries constructed from those that occur in biologically active molecules. New suggestions for compounds to make can be built to fit the binding cavity in a protein structure. See how Spark was used to grow and link fragments.

Free confidential discussion

We have worked on hundreds of projects on different biological targets and would be happy to discuss the best approach for accelerating your assets through the pipeline. Contact us for a free confidential discussion.


Modeling the intricacies of molecular recognition: Make ‘smart antibodies’ into biologics

Antibodies are fantastically versatile molecular recognition engines, capable of creating artificial enzymes in response to recognized interactions. Similarly, enzymes depend on molecular recognition for catalysis. Cresset Discovery Services has a depth of experience in modeling molecular recognition scenarios, stretching back to the early nineties.

Nature’s molecular recognition engines

In 1992 I was about to start a PhD on ‘catalytic antibodies’, investigating the amazingly versatile ability of the immune system to prepare artificial enzymes. It all sounded fascinating, and still does. Alas, rather than building antibodies, the path of my PhD altered, resulting in me developing my organo-phosphorus chemistry skills and producing ‘transition-state inhibitors’ of beta lactamase enzymes instead.

However, I retained a keen interest in molecular recognition and a familiarity with antibodies as a protein class, and this has proved useful for our work at Cresset Discovery Services today. Indeed, it wasn’t until much, much later that immunoglobulins would reveal their impact as important future medicines, and of course this was long before the multi-billion-dollar blockbuster TNFα-targeting biologic Adalimumab (Humira) (Figure 1, right).

Immunoglobulins are biologically evolved to recognize an enormously variable patch of molecular surface using a single protein architecture: an arrangement of two polypeptide chains (heavy and light), each bearing three hypervariable loops.

Figure 1 illustrates the diversity of recognition possible in antibodies, as modeled in Flare™, ranging from small molecules (sulfathiazole) to big molecules (fullerene C60) [1] and peptides (TNFα) [2].

Figure 1: Left: an example of a fullerene-recognizing mouse antibody. Middle: another mouse antibody recognizing a sulphonamide drug, sulfathiazole. Right: the human engineered antibody Adalimumab (green) with TNFα. Heavy chain (crimson) variable loops (magenta), Light chain (grey) variable loops (black). All modeled in Flare using PDB codes: 5CP3, 1EMT and 3WD5.

Similarly, enzymes also use molecular recognition to great advantage via a diverse array of protein architectures. In contrast to antibodies, they specifically recognize molecular surfaces that represent a transition state of a chemical transformation. Lowering the energy of the transition state through binding allows catalysis to happen in enzymes.

The main premise of catalytic antibodies is to elicit their production in response to a transition state that can be mimicked by a ligand structure, which in theory then reproduces enzymic behaviour in the resultant protein. This pursuit simultaneously probes our fundamental knowledge of molecular recognition and of enzyme catalysis.

A catalytic antibody that cleaves cocaine

In 2006, workers at Scripps published work on a catalytic antibody [3] that cleaves cocaine at the benzoyl ester. The antibody was elicited using an aryl phosphonate ester – a mimic of the carboxyl ester hydrolysis transition state.

The insights gained through the X-ray structures that were solved for this system were remarkable. Multiple components of the reaction coordinate are shown to exploit common recognition patterns, whilst the different geometries are accommodated by both residue and backbone movements in a dynamic process (Arg and Tyr movements in Figure 2).

This is an excellent model system. It shows how some interactions are very favourable (e.g., the cation-pi interaction of the tropane is very solid across the structures), whilst others (e.g., the ester) are variable. It also demonstrates that proteins, and in particular enzymes, ‘breathe’ – they are not statues (Figure 2).

Figure 2: The Scripps catalytic antibody in the different protein conformations that recognize the substrate and the transition state for cocaine ester hydrolysis. This shows highly mobile Arg (magenta) and Tyr (cyan) residues and the ligand (cocaine hydrolysis reaction coordinate – light to dark green). All modeled in Flare using PDB codes: 2AJU, 2AJV, 2AJZ, 2AJX, 2AJY and 2AJS.

All molecular recognition is not equal

One very interesting facet of molecular recognition demonstrated by antibody-antigen interactions is that these interactions are not all equal. In fact, there are lesser and greater interactions, and the greater ones can predominate in driving antibody production. Preferred interactions (or ‘hot spots’) can be visualised using various techniques (e.g., as demonstrated by modeling the previous cocaine system in Flare, shown in Figure 3).

Figure 3: The Scripps catalytic antibody conformer ligand interaction surfaces. All modeled in Flare using PDB codes: 2AJV and 2AJX. This shows highly mobile surface changes required to squash the substrate into the transition state geometry (this time with the phosphono mimetic) and the hydrophobic hot spot surface (yellow) calculated by Flare.

Antibodies are produced via natural selection processes as a cellular response to a presented antigen. Some parts of the antigen are better than others at eliciting antibody responses, since they may involve preferred interactions, and so the resultant antibody may not bind in a therapeutically useful way.

Modeling and design of these interactions (i.e., via antigen engineering) becomes a useful task, since leaving it to nature can divert us from our intended goals.

Cresset Discovery Services work on biologics

While we are actively working and delivering on projects involving biologics, client confidentiality means that it remains a challenge to describe details of work done for commercial clients. We can, however, speak more generally and we have recently delivered on projects to:

  • Predict / design mutants that prevent binding
  • Model strategic glycan positioning at unhelpful yet potently antigenic surfaces
  • Model / characterize the observed binding order of an antigen – receptor series.

Read more about Cresset’s capabilities in biologics: Modeling ‘big’: Applying the XED force field to biologics.

Biologics remain a very important class of protein targets for disease therapeutics which we intend to continue to support through our innovative modeling services.

Free confidential discussion

Find out how we can accelerate your project by requesting a free confidential discussion.


  1. X-ray crystal structure of an anti-Buckminsterfullerene antibody Fab fragment: Biomolecular recognition of C60, Braden B. C., Goldbaum F. A., Chen B., Kirschner A. N., Wilson S. R. and Erlanger B. F. PNAS 97, no. 22, 12193–12197, 2000.
  2. Comparison of the Inhibition Mechanisms of Adalimumab and Infliximab in Treating Tumor Necrosis Factor α-Associated Diseases from a Molecular View, Shi Hu, Shuaiyi Liang, Huaizu Guo, Dapeng Zhang, Hui Li, Xiaoze Wang, Weili Yang, Weizhu Qian, Sheng Hou, Hao Wang, Yajun Guo and Zhiyong Lou, Journal of Biological Chemistry 288, no. 38, 27059–27067, 2013.
  3. Complete Reaction Cycle of a Cocaine Catalytic Antibody at Atomic Resolution, Zhu X., Dickerson T. J., Rogers C. J., Kaufmann G. F., Mee J. M., McKenzie K. M., Janda K. D. and Wilson I. A. Structure 14, 205–216, 2006.

Modeling ‘big’: Applying the XED force field to biologics

Cresset is well known for powerful and accurate ligand-centric modeling, and Flare has established our methods for protein-ligand interactions. Work on GPCR modeling and viruses demonstrates the effectiveness and potential of Cresset technology for protein-protein interactions. Here I discuss the successes and challenges of modeling ‘big’ – applying Cresset’s XED force field to biologics.

Adventures in protein modeling: GPCRs

In 2014 Dr Andy Vinter, Cresset founder, reported on GPCR modeling exercises using the XED force field [1], where ligand poses were exhaustively explored together with full complex minimizations to provide qualitative or quantitative analyses with binding estimates for agonists versus antagonists. Although this was a huge modeling challenge, the approach provided fascinating new insights into GPCR behaviour that are in keeping with more recent literature. In particular, Brian Kobilka (joint winner of the 2012 Nobel Chemistry Prize) published a paper in 2016 showcasing the use of specific nanobody binding to the intracellular side of the GPCR to probe the long-range influence of ligands at the extracellular side [2]. He provided evidence supporting the hypothesis that GPCRs are likely partitioned between different states by differential stabilization of the full complexes in response to ligands. Our modeling findings concur in that the subtleties of these interactions extend beyond direct local binding interaction events and are propagated at distance across the full protein complex.

A matter of scale

Long-distance effects are not unusual in the realm of protein-protein interactions, yet from an accuracy point of view they are beyond the scope of traditional molecular mechanics. From a sheer scaling point of view, the number of atoms involved means they are also beyond the scope of quantum mechanics. QM/MM methods are also sometimes a poor compromise as these are discontinuous and focus on the local binding event.

Interestingly, this is where the XED force field has a sweet spot: accuracy approaching that of QM, combined with the speed and ability to handle large numbers of atoms (>30,000), which is highly appropriate for the analysis of protein-ligand and protein-protein systems. We can do this accurately and consistently through deployment of careful protein preparation and minimization workflows on protein systems.

Example: Influenza virus

The Centre for Pathogen Evolution at The University of Cambridge [3] is involved in mapping data with the ultimate aim of predicting vaccine escape mutations of the influenza virus.

The viral protein hemagglutinin is the receptor that recognises mammalian cell surface glycans as an essential route to host cell entry. The ability of the virus to recognise sialic acid-containing glycans is essential to this process, and the residues that contribute to this recognition are consequently difficult to mutate without compromising the virus. Antibodies directed to this site (Figure 1 left and middle) are less likely to suffer from viable virus mutations than others [4,5] (Figure 1 right).

Figure 1: Left: Influenza H3N2 hemagglutinin with the electrostatics of the core recognition element sialic acid from PDB 5VTQ. Middle: overlapping monoclonal antibody (blue tube) recognition site with electrostatics from key residues from PDB 2VIR. Right: a non-overlapping monoclonal antibody from PDB 5W42.

Optimizing biologics using 3D electrostatic shape and complementarity

Currently in vogue, directed degradation mechanisms (PROTACs), antibodies and vaccines, i.e., biologics, are examples of therapeutic paradigms that involve subclasses of these protein–protein interactions rather than the classical small-molecule drug to protein target interactions. Modeling them is a significant challenge for many organizations charged with producing an array of diversely targeted therapeutics, because this is where much of the remaining opportunity (the ‘higher hanging fruit’) lies.

The biologics industry may have slightly different criteria for cycling through an optimization, but ultimately similar schemes to those operating in the pharmaceutical industry still apply. There is an equivalent of traditional medicinal chemistry drug discovery workflows, involving SAR analysis, design, synthesis and test cycles. For antibodies, as for small molecules, target affinity, solubility and aggregation are key initial concerns. Mouse to human transformation is a uniquely biologics issue (unless we are talking about in vivo models), as is the means of controlling SAR. For proteins it is all in the manipulation of the amino acid sequence and protein loop conformational preference, by single or multiple residue mutation. Incidentally, conventional sequence similarity metrics are not a useful measure of a residue’s relative potential for interaction with ligands or proteins in active sites (despite often being the tool of choice for analyzing protein data), as they were derived from natural mutation propensity and its consequences for the maintenance of protein architecture.

Ultimately, the mechanism of target engagement, the molecular recognition event, is through electrostatic and shape complementarity and is fundamentally the same 3D phenomenon that applies to small molecules. Cresset scientists have an outstanding track record of working on electrostatic and shape complementarity and have successfully applied these skills to protein-protein interactions.

In the last 12 months, Cresset Discovery Services has completed work on viral vaccine modeling and biologics modeling which has proved highly useful for clients. We matched observed binding events with calculated binding enthalpy trends and predicted a priori the observed pattern of protein binding or unwanted peptide binding suppression. This has been done using WT or mutant proteins that we have successfully taken through analysis, modeling/design and client testing cycles. As you would expect, client confidentiality prevents us disclosing further details, but contact us for a free confidential discussion.


  1. Applying the XED molecular mechanics force field to the binding mechanism of GPCRs
  2. Allosteric nanobodies reveal the dynamic range and diverse mechanisms of G-protein-coupled receptor activation, Kobilka et al, Nature 535, 448–452 (21 July 2016)
  3. https://www.pathogenevolution.zoo.cam.ac.uk/
  4. Substitutions Near the Receptor Binding Site Determine Major Antigenic Change During Influenza Virus Evolution, David F. Burke, Derek J. Smith et al, Science 342, Issue 6161, pp. 976–979, 22 Nov 2013
  5. Diversity of Functionally Permissive Sequences in the Receptor-Binding Site of Influenza Hemagglutinin, Nicholas C. Wu, Jia Xie, Tianqing Zheng, Corwin M. Nycholat, Geramie Grande, James C. Paulson, Richard A. Lerner and Ian A. Wilson, Cell Host & Microbe 21, 742–753, June 14, 2017