At the heart of medicinal chemistry design: The synergistic relationship between Forge and Spark


We present an exercise in in silico medicinal chemistry: using Activity Atlas1 to assess and understand the features that drive activity for a collection of CDK2 inhibitors; using Spark2 to suggest new ideas to explore; then visually assessing those new ideas in Forge3 in the context of the qualitative 3D-SAR models generated by Activity Atlas.


The efforts of medicinal chemists in their quest to design new, active ligands can be summarized by answering three deceptively simple questions:

  1. What to make next?
  2. Why should I make it?
  3. Has this chemical space been explored previously?

These questions are frequently addressed separately. However, there is much value in considering the questions simultaneously with the help of Forge and Spark.

In this case study, Cresset ligand-based design tools are used to address the questions above with respect to the well-studied target for oncology Cyclin-dependent kinase 2 (CDK2).


Compound collection and alignment

A collection of 38 imidazopyridine-containing CDK2 inhibitors was gathered from ChEMBL4 and curated to create a molecular dataset spanning a pIC50 range of 4 to 9.
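For readers less used to the pIC50 scale, the conversion can be sketched as below. It relies only on the standard definition (pIC50 is the negative base-10 logarithm of the IC50 expressed in mol/L); the function names are illustrative and are not part of any Cresset tool.

```python
import math


def pic50_from_ic50_nm(ic50_nm: float) -> float:
    """Convert an IC50 given in nanomolar to pIC50 (-log10 of IC50 in mol/L)."""
    if ic50_nm <= 0:
        raise ValueError("IC50 must be positive")
    return -math.log10(ic50_nm * 1e-9)


def ic50_nm_from_pic50(pic50: float) -> float:
    """Convert a pIC50 back to an IC50 in nanomolar."""
    return 10 ** (-pic50) * 1e9


if __name__ == "__main__":
    # The two ends of the data set's activity range, expressed in nM.
    for value in (4.0, 9.0):
        print(f"pIC50 {value} corresponds to {ic50_nm_from_pic50(value):g} nM")
```

On this scale a pIC50 range of 4 to 9 spans five orders of magnitude in IC50.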

Prior to assessing any relationship between activity and 3D structure, it is necessary to align the collection of molecules. In this case we chose to align the molecules to the ligand derived from the crystal structure 1OIT (CHEMBL73303: pIC50 = 9.0). As many of the compounds in the dataset contained a solvent-exposed group that was not present in this reference, an additional active ligand was identified (CHEMBL70808: pIC50 = 8.52) that maintained the core but extended beyond the terminal sulfonamide of the 1OIT ligand. The CHEMBL70808 ligand was aligned to the 1OIT ligand using manual and automated methods to ensure good correspondence of the imidazo[1,2-a]pyridine groups that are deeply embedded in the binding site. Further manual alignment was employed to ensure comparable orientation of the sulfonamide, despite it being solvent-exposed. The pair of ligands (Figure 1) was then used as a combined, equally weighted reference for the subsequent alignment of the 38 compounds gathered.

Figure 1. Alignment of PDB:1OIT ligand (pink) and CHEMBL70808 (grey).

Within Forge, the collection of CDK2 inhibitors was aligned to the two equally weighted references, using 50% fields and 50% shape and a soft protein excluded volume to generate an alignment (Figure 2) and similarity score (range 0.59 through 0.85). As alignment is at the heart of every 3D-QSAR approach, the alignments were visually inspected and manually tweaked to ensure consistent orientation of all side chains.
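The weighting described above can be illustrated with a small sketch. It assumes that a "50% fields and 50% shape" score is a simple linear blend of two similarity values, and that scores against several references are combined as a weighted mean. Both are assumptions made for illustration: the text does not state how Forge combines these terms internally, and the function names are hypothetical.

```python
def combined_similarity(field_sim: float, shape_sim: float,
                        field_weight: float = 0.5) -> float:
    """Blend field and shape similarity linearly (assumed form, not Forge's code)."""
    if not 0.0 <= field_weight <= 1.0:
        raise ValueError("field_weight must lie between 0 and 1")
    return field_weight * field_sim + (1.0 - field_weight) * shape_sim


def score_against_references(per_reference_scores, weights=None):
    """Weighted mean of scores against each reference (equal weights by default)."""
    if not per_reference_scores:
        raise ValueError("at least one reference score is required")
    if weights is None:
        weights = [1.0] * len(per_reference_scores)
    if len(weights) != len(per_reference_scores):
        raise ValueError("one weight per reference is required")
    total = sum(weights)
    return sum(w * s for w, s in zip(weights, per_reference_scores)) / total


if __name__ == "__main__":
    # Hypothetical similarity values for one molecule against two references.
    ref_a = combined_similarity(field_sim=0.80, shape_sim=0.70)
    ref_b = combined_similarity(field_sim=0.65, shape_sim=0.75)
    print(round(score_against_references([ref_a, ref_b]), 3))
```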

Figure 2. Alignment of 38 CDK2 inhibitors to PDB:1OIT and CHEMBL70808.

Modeling 3D SAR with Activity Atlas

To generate a qualitative assessment of the features driving activity, an Activity Atlas model calculation was performed.

Activity Atlas is available in Forge, and generates visually striking maps based on the aligned molecules. The models depict: the average electrostatics and shape for active compounds; the summary of activity cliffs in electrostatics, shape and hydrophobicity; and the regions explored. The models generated by Activity Atlas are intended to help answer the questions asked earlier and are shown in Figures 3-5.

In Figure 3, the average active molecule map is depicted. It indicates the common electrostatics and hydrophobic features of active molecules in the dataset – essentially molecules that do not have these features are unlikely to have any significant activity. The central H-bond donor (amine)/H-bond acceptor (pyrimidine) motif mapping the interaction with the hinge region of CDK2 is present in all actives. Negative regions near the H-bond acceptors of sulfonamide and imidazopyridine are also present on all actives.

Figure 3. Activity Atlas map for average electrostatics and hydrophobics for active molecules. Blue = negative electrostatics; Red = positive electrostatics; Gold = hydrophobics.

Figure 4 shows the summary of activity cliffs in electrostatics and steric space – the fine detail on the average of actives map. Areas that are blue suggest that making this area more negative (or less positive) could enhance activity; areas that are red suggest that making the area more positive (or less negative) can drive activity; and finally, steric bulk is favorable in the green regions, whereas it is not tolerated in the pink regions.

Figure 4 (right) displays the activity cliff analysis in an orientation better suited to examining the imidazopyridine. Tilted on its side, the Activity Atlas model suggests that a more positive/less negative π-cloud should enhance activity, as well as noting that a stronger negative near the H-bond accepting ring nitrogen should also favor activity.

Figure 4. Two views of the Activity cliff summary for electrostatics and sterics. Left: Electrostatics and sterics. Right: Electrostatics only in a view rotated around the x axis. Areas in blue suggest that negative electrostatics are favored; areas in red favor positive electrostatics. Areas in green are favorable sterically, while areas in pink are unfavorable for sterically bulky groups.

The assessment of the electrostatic regions explored (Figure 5) shows the regions to which at least 10 of the 38 molecules contribute. Regions that have not been explored may present opportunities for modification and further optimization.

The regions explored analysis also calculates a novelty score for all the data set compounds as well as for novel designs. The calculated novelty can be used as an indication of how much information would be gained from the molecule should it be made.

Figure 5. Aligned ligands with a display of the regions of electrostatic space explored.

Generating ideas in Spark

With a SAR analysis in hand, Spark was used to find bioisosteric replacements for the highlighted portion of the 1OIT ligand as shown in Figure 6. The fragments were sourced from the Spark reagent database of boronic acids derived from eMolecules.

Figure 6. The Spark eMolecules boronic acids reagent database was used to identify bioisosteric replacements for the moiety shown in pink in the PDB:1OIT ligand.

Results and analysis

Four results from the Spark experiment were selected for further analysis and are shown in Figure 7. Each of the selected Spark hits was chosen to enhance activity and/or to challenge the model, since it was built on a relatively small number of compounds.

Each of the proposed bioisosteric replacements is derived from a commercially available reagent that ships from the supplier within 2 days to 4 weeks.

In (a), the pyrazolopyridine replacement would test the importance of the H-bond acceptor strength on the original imidazopyridine. Moving the heteroatom across the ring results in a weaker H-bond acceptor while maintaining a reasonable electron density above the ring. Note that this replacement results in a more electron-rich central pyrimidine ring.

The dihydropyrroloimidazole (b) has similar electrostatics to the starting ligand but tests the requirement for an electron-deficient aromatic by saturating this ring, removing all π-electrons from this region.

In (c), the benzofuran might be suitable for testing the model as it removes the strong H-bond acceptor altogether, replacing it with a weak feature. Whilst this might not be productive in terms of activity, the molecule would add knowledge to the model and the reagent is available immediately.

Finally, (d) suggests replacing the imidazopyridine with a quinoline. The change from a 5,6 to a 6,6 ring system introduces a different geometry for the H-bond acceptor but maintains many of the electrostatic features of the starting ligand in a reagent that is available for immediate dispatch. If this substitution were tolerated, the Spark results contain other, more exotic suggestions that include an H-bond acceptor in a 6-membered ring in this position.

Figure 7. The starting ligand from 1OIT and selected Spark results (a-d). Top: in 2D. Center: in 3D showing positive and negative interaction potentials. Bottom: presented in the context of the Activity Atlas activity cliff summary for electrostatics.


This article presents our approach to answering the three questions at the heart of medicinal chemistry design:

1. What to make next?

Every chemist has a collection of tricks and substitutions that have been built up from their laboratory experience. Spark enhances the generation of ideas in an unbiased way, and in combination with a chemist’s intuition, can provide ideas about what compounds can be made that are in potentially open IP space, but still retain the characteristics of active molecules. The Spark experiment using eMolecules reagent databases also provides tier and ordering information for applicable reagents exploiting specific chemical reactions to improve laboratory efficiency.

2. Why should it be made?

Taking the Spark and chemist-designed ideas and examining them in the context of the Activity Atlas models is a way to ensure that new compounds meet the characteristics of active molecules, or conversely, are designed to test specific aspects of the models. The combination of average active molecule and activity cliff summary maps can provide the rationale needed to give confidence that taking a new design to the laboratory is worth the effort.

3. Has this chemical space been explored previously?

The regions explored analysis combined with the novelty calculation within Activity Atlas can be used to assess whether the Spark and chemist-designed ideas lie within the already explored chemical space, or whether they map regions of this space so far unexplored.

References and links

4 P. Bento, A. Gaulton, A. Hersey, L.J. Bellis, J. Chambers, M. Davies, F.A. Krüger, Y. Light, L. Mak, S. McGlinchey, M. Nowotka, G. Papadatos, R. Santos and J.P. Overington (2014) ‘The ChEMBL bioactivity database: an update.’ Nucleic Acids Res., 42 1083-1090.

A picture tells a thousand words: Summarizing SAR for medicinal chemists


In drug discovery programs, summarizing and understanding the SAR for large sets of compounds can be both difficult and time-consuming. It is often the responsibility of the computational chemist supporting the project team to carry out this analysis and to share the results with medicinal chemists in a clear and concise manner, to inform new molecule design and help define the future directions of the project. With Activity Atlas, complex SAR can be easily summarized into a visual 3D map, condensing a large table of data into a single picture. Activity Atlas is a probabilistic method of analyzing the SAR of a set of aligned compounds as a function of their electrostatic and shape properties. The method uses a Bayesian approach to take a global view of the data in a qualitative manner, taking into account the probability that a molecule is correctly aligned.


Despite utility in extracting useful SAR from sets of related compounds, 3D-QSAR techniques are known to perform poorly where there are activity cliffs. Equally, activity cliff analysis is a powerful technique for locating the most important changes that have been made within a series, but it looks at pairs of compounds in isolation rather than examining the entire data set.

Pairs of compounds with a high similarity and a large difference in activity carry important information relating to the factors driving activity. This analysis has traditionally been done using 2D similarity metrics, but we have extended this to 3D similarity using computationally aligned molecules. We present a technique to analyse multiple pairs of molecules simultaneously to derive a global view of the activity cliff data, a method we call Activity Atlas. Activity Atlas generates three distinct visually striking maps of the electrostatic, shape and hydrophobic properties around molecules:

  • Activity cliff summary
  • Average of actives model
  • Regions explored analysis
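The idea of an activity cliff (a pair with high similarity and a large activity difference) can be made concrete with a small sketch. It uses a "disparity" ratio, the activity difference divided by the distance (1 minus similarity), as one common way of ranking pairs. The exact measure used inside Activity Atlas is not given in this text, so the formula, the `min_distance` floor and all names and values below are assumptions for illustration only.

```python
from itertools import combinations


def disparity(activity_a: float, activity_b: float, similarity: float,
              min_distance: float = 0.05) -> float:
    """Activity difference per unit of dissimilarity (assumed cliff measure)."""
    # Floor the distance so near-identical pairs do not divide by zero.
    distance = max(1.0 - similarity, min_distance)
    return abs(activity_a - activity_b) / distance


def rank_cliffs(activities, similarity):
    """Rank all pairs by disparity, steepest cliff first.

    activities: dict of molecule name -> activity (e.g. pIC50)
    similarity: dict of frozenset({name_a, name_b}) -> 3D similarity in [0, 1]
    """
    ranked = []
    for a, b in combinations(sorted(activities), 2):
        sim = similarity[frozenset((a, b))]
        ranked.append((disparity(activities[a], activities[b], sim), a, b))
    return sorted(ranked, reverse=True)


if __name__ == "__main__":
    # Hypothetical activities and pairwise similarities.
    acts = {"m1": 9.0, "m2": 6.0, "m3": 8.8}
    sims = {
        frozenset(("m1", "m2")): 0.90,
        frozenset(("m1", "m3")): 0.90,
        frozenset(("m2", "m3")): 0.50,
    }
    for score, a, b in rank_cliffs(acts, sims):
        print(f"{a}-{b}: disparity {score:.1f}")
```

In this toy data the m1/m2 pair ranks first: the two molecules are very similar yet differ by three log units in activity.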


A Bayesian approach was taken where each pair of molecules provides evidence as to whether the difference in electrostatic and steric potentials within the pair in a particular region of space contributes to a change in activity. More than one alignment is considered for a molecule, and a weight is assigned to each alignment based on its score compared to the best-scoring alignment. This allows cases where a molecule has a flexible substituent which is not fully constrained by the initial alignment to be handled correctly.
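The alignment weighting described above can be sketched as follows. The text states only that each alignment is weighted by its score relative to the best-scoring alignment; the exponential form and the `scale` parameter used here are assumptions chosen for illustration, not the published Activity Atlas scheme.

```python
import math


def alignment_weights(scores, scale: float = 0.05):
    """Normalized weights that decay as an alignment falls below the best score.

    scores: similarity scores of the alternative alignments of one molecule
    scale:  assumed softness parameter; smaller values favor the best alignment
    """
    if not scores:
        raise ValueError("at least one alignment score is required")
    best = max(scores)
    raw = [math.exp(-(best - s) / scale) for s in scores]
    total = sum(raw)
    return [r / total for r in raw]


if __name__ == "__main__":
    # Three hypothetical alignments of a molecule with a flexible substituent.
    for score, weight in zip([0.80, 0.75, 0.60],
                             alignment_weights([0.80, 0.75, 0.60])):
        print(f"score {score:.2f} -> weight {weight:.3f}")
```

With weights of this kind, an alternative pose that scores nearly as well as the best one still contributes evidence, while clearly poorer poses contribute very little.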

Treating pairs of molecules rather than individuals allows the technique to be weighted towards describing the steep regions of the activity landscape correctly, where QSAR methods have generally struggled.

In contrast to matched molecular pair analysis, this analysis requires fewer molecules and extracts more information, as it can handle pairs of molecules with multiple changes, and it also includes non-identical but correlated molecular transformations. For example, a Me->F and an Et->Cl transformation both provide information about the effectiveness of replacing a small hydrophobic group with a small electronegative group, but the correlation is missed in a standard MMP analysis.

Application to selectivity

Examination of the steric and electrostatic maps for the three subtypes clearly shows which regions should be targeted in order to enhance subtype selectivity.

In the example below, the right-hand side of the molecules can be used to discriminate between A3 and the other two subtypes, while A1 and A2A can be separated by increasing steric bulk and positive charge around the top of the molecules.


Adenosine receptor agonists with activities against A1, A2A and A3, shown with their activity cliff summaries.


The Regions explored summary gives a comprehensive 3D picture of the regions explored in electrostatic and shape space. A novelty score is assigned to each compound, making it possible to predict whether newly designed candidates are likely to contribute additional SAR knowledge, and are thus worth making.

Traditional 3D-QSAR model

A traditional 3D-QSAR model was built on the same data set (q² = 0.7). While 3D-QSAR seems better at extracting information where SAR is continuous, Activity Atlas gives more definition in regions where SAR requirements are critical.
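For context on the q² value quoted above, the statistic can be sketched as one minus the ratio of the predictive residual sum of squares (PRESS) to the total sum of squares about the mean observed activity. This is the usual textbook form; the cross-validation scheme behind the reported value is not stated here, and some definitions use the training-set mean of each fold instead, so treat this as an illustration only.

```python
def q_squared(observed, cv_predicted) -> float:
    """Cross-validated q2 = 1 - PRESS / SS_total.

    observed:     measured activities (e.g. pIC50)
    cv_predicted: predictions for the same molecules, each made by a model
                  that did not see that molecule during fitting
    """
    if len(observed) != len(cv_predicted) or len(observed) < 2:
        raise ValueError("need two or more paired observed/predicted values")
    mean_obs = sum(observed) / len(observed)
    press = sum((o - p) ** 2 for o, p in zip(observed, cv_predicted))
    ss_total = sum((o - mean_obs) ** 2 for o in observed)
    if ss_total == 0:
        raise ValueError("observed activities must not all be identical")
    return 1.0 - press / ss_total


if __name__ == "__main__":
    # Hypothetical observed and cross-validated predicted pIC50 values.
    obs = [5.0, 6.0, 7.0, 8.0]
    pred = [5.4, 5.8, 7.5, 7.6]
    print(round(q_squared(obs, pred), 3))
```

A q² of 1 means perfect cross-validated prediction, while a value of 0 means the model predicts no better than the mean activity.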


The Activity Atlas technique is a powerful way of summarizing SAR data in 3D. By combining information across multiple activity cliffs, it enables a global view of the critical points in the activity landscape.

The average of actives summary captures in one picture the 3D requirements for potency, while the Regions explored summary enables prioritizing compounds which add crucial SAR information over trivial analogues.

These visually appealing maps provide an insightful and highly intuitive way of conveying valuable SAR information from computational to medicinal chemistry groups.



Generating accessible, novel R-groups in hit-to-lead optimization


Balancing novelty, activity, physicochemical properties and IP position is at the heart of the hit-to-lead and lead optimization processes. Generating many small changes to a structure rarely moves the project forward significantly while large changes can cause a significant loss in activity. The challenge is often to find a non-trivial change that progresses the project towards the multi-parameter optimization goal without jeopardizing activity.

Bioisosterism has proved a popular way to generate new scaffolds in drug discovery. In this poster we explore the application to R-groups and demonstrate that groups which are bioisosteric in shape and electrostatic space provide an excellent range of lead optimization opportunities. However, in hit-to-lead and lead optimization projects, there is rarely time to scope out new synthetic routes for the introduction of each R-group.

Linking in silico generated ideas for R-group replacements with available reagents and accessible chemistry is key to using the novel results available through shape and electrostatic bioisosterism. R-group libraries that are united by specific chemistry provide a systematic way to rapidly exploit a particular chemical reaction to generate novel chemical matter during the lead optimization process.

Electrostatics/shape bioisosterism

Cresset’s field technology1 represents molecules using electrostatic and shape properties enabling the comparison of molecules across chemical series.

Scaffold hopping in Spark

Spark searches databases of fragments for replacements for part of the starting molecule. All fragments that have the required geometry are formed into a product molecule, which is energy minimized. Only as a product molecule is the replacement assessed for electrostatic and shape similarity to the starting molecule. This allows the electrostatic and shape properties of the fragment to influence those of the retained portions of the molecule and vice versa.

Fragment sources in Spark

Spark generates bioisosteres from databases of fragments derived from:

  • Commercially available, real compounds (ZINC)
  • Theoretical aromatic rings (VEHICLe)
  • Literature reports of bioactive compounds (ChEMBL)
  • Fragments from the Cambridge Structural Database (CSD) of small molecule crystal structures
  • Available reagents

We present a simple technique to rapidly generate and use databases of available reagents. It is applied to a collection of compounds to generate a searchable database of bioisosteric replacement groups to boost novelty in design, while simultaneously balancing physicochemical property and synthesis considerations. A simple set of rules classifies the R-group collections by specific chemistry, making selection of the appropriate database straightforward. Other secondary data is included in the substituent record, reflecting its source compound in an inventory system or vendor catalogue for ease of access.

Fragment sources in Spark. Sphere size represents the number of fragments in each database.

Example reagent processing rules for acids (top) and amines (bottom) that capture the source of an R-group. The full list of rules encompasses 22 separate transformations.

Btk inhibitors: Displacing water

In this example Spark was used to look for reagents that were bioisosteric with a pyridyl-water complex. Smith et al.2 showed that the replacement of the 4-methylpyridin-3-yl in PDB:4ZLZ with small bicyclic heterocycles improved potency. The new heterocycles displace the water molecule and make direct H-bond interactions with the P-loop.

We sought to test if Spark could suggest known and reasonable alternative replacements for the pyridyl water complex. We weighted the scoring towards electrostatics and specified the H-bond interactions of the water molecule as required. Fragments were selected from a database of 41K aromatic halides to replicate the boronic acid chemistry used in the original publication.

Water-mediated hydrogen bond from pyridyl N to Phe413 and Gly414 of the P-loop. PDB: 4ZLZ.

Constraints on water field points

Spark Results

Selected Spark results. Pleasingly, the known active replacement heterocycle was retrieved at position 15 in the hit list, albeit with a modified substituent.

Application to D3 antagonists

In this experiment we wished to demonstrate the use of Spark for providing novel amines for published D3 antagonists. However, searching for new amines from the secondary amine reagent database failed to provide novel R-groups. Reasoning that the known active R-groups were highly functionalized, and therefore that commercially available amines represented a limited source of inspiration, we expanded the search to encompass all supplied fragment databases. Compounds with piperazine scaffolds were filtered out as these are well known in the literature and diluted the results.3

Known D3 scaffolds were found in the ChEMBL or ZINC (commercially available compounds) databases. Novel solutions were found in the ChEMBL database. An analysis of the chemical diversity of the known D3 scaffolds retrieved from Spark databases shows that the less common fragments derived from the literature database are a precious source of potentially useful chemical diversity.


Spark provides both known and novel active scaffolds that suggest opportunities for scaffold hopping and R-group replacement. Combining this power with the use of ‘chemistry-aware’ reagent fragment databases allows for exploitation of specific chemistries in the laboratory. Accessing fragments for potential substitutions from both literature and commercial sources represents a way to identify potentially novel chemistry and diversity. Furthermore, the creation of fragment databases from proprietary collections of compounds can be a powerful way of increasing the chemical diversity available.


1. J. Chem. Inf. Model. 2006, 46, 665–676

2. J. Med. Chem. 2015, 58, 5437–5444; Nature 2003, 423 (6937), 356–361; J. Am. Chem. Soc. 1989, 111 (1), 314–321

3. J. Med. Chem. 2007, 50, 5076–5089; Bioorg. Med. Chem. Lett. 2008, 18, 901–907; Bioorg. Med. Chem. Lett. 2008, 18, 908–912; J. Med. Chem. 2010, 53, 7129–7139

Web clip: Using the Cresset Engine Broker to accelerate pharmacophore generation in Forge

The Cresset Engine Broker (CEB) is used to accelerate computationally intensive calculations by connecting resources from a Linux cluster to desktop applications. It is available from both Forge and Spark and can dramatically alter the time needed to complete a large experiment. In this web clip we demonstrate how to use the CEB to accelerate a computationally intensive pharmacophore experiment using FieldTemplater.

The experiment focuses on calculating a binding mode hypothesis for CCR5 inhibitors which inhibit the viral entry pathway for HIV. Five known inhibitors are brought into the FieldTemplater interface with their conformations pre-populated and subjected to a systematic alignment of all molecules’ conformations with each other. A moment after the calculation begins on local processors the CEB launches the resources on an external cluster to speed up the process resulting in over 100 threads running the Windows® based FieldTemplater experiment. Once the calculation completes the results are compared to a known binding mode from a protein-ligand crystal structure. The full details of the FieldTemplater experiment are available as a case study.

Dr Rae Lawrence, Technical Sales North America

Web clip: Using radial plots to visualize multiple parameters simultaneously

Optimizing the physical properties of molecules is a key goal in most drug discovery or agrochemical research projects. Molecules must be active but also need to reach the site of action at the correct concentration and for long enough (but not too long) to be useful. In a spreadsheet these properties are listed as numerical values in columns and can be difficult to compare quickly across molecules. Plotting the numerical information in a radial plot makes it possible to visualize the data quickly and, in particular, to see how well the data fits the project requirements. Moreover, comparing the radial plots of multiple molecules makes it possible to quickly identify trends, sort on overall fit to the project profile or find outliers, enabling the team to focus on the results that meet the project goals rather than those which solve only the immediate problem.
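The geometry behind a radial plot can be sketched in a few lines. Each property gets its own axis, the value is rescaled to a fraction of that axis's range, and the fractions are placed at evenly spaced angles to give the polygon vertices. This is a generic illustration and not the implementation used in Torch, Forge or Spark; the property names and ranges are hypothetical.

```python
import math


def radial_vertices(values, ranges):
    """Return (property, x, y) polygon vertices for a radial plot.

    values: dict of property name -> value for one molecule
    ranges: dict of property name -> (low, high) axis limits; the dict order
            fixes the order of the axes around the plot
    """
    props = list(ranges)
    if len(props) < 3:
        raise ValueError("a radial plot needs three or more axes")
    vertices = []
    for i, prop in enumerate(props):
        low, high = ranges[prop]
        # Fraction along the axis, clamped so outliers stay on the plot.
        frac = (values[prop] - low) / (high - low)
        frac = min(max(frac, 0.0), 1.0)
        angle = 2.0 * math.pi * i / len(props)
        vertices.append((prop, frac * math.cos(angle), frac * math.sin(angle)))
    return vertices


if __name__ == "__main__":
    # Hypothetical project axes and one molecule's property values.
    axes = {"pIC50": (4, 9), "logP": (0, 5), "MW": (200, 600), "TPSA": (0, 140)}
    mol = {"pIC50": 8.5, "logP": 2.5, "MW": 400, "TPSA": 70}
    for prop, x, y in radial_vertices(mol, axes):
        print(f"{prop}: ({x:.2f}, {y:.2f})")
```

Drawing the polygon through these vertices, alongside a second polygon for the project's target profile, gives the at-a-glance comparison described above.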

Radial plots can be used in all our desktop applications (Torch, Forge, Spark). See this in action in the web clip below and contact us to find out more.

Web clip: Spark V10.3 – Using Spark’s tile view and tags to rapidly assess scaffold hopping results

Version V10.3 of Spark, Cresset’s computational chemistry software for idea and bioisostere generation, includes the ability to ‘tag’ results with a custom user-defined note that can be used for sorting, filtering, and decision-making. This expands on the ‘favorites’ designation in that tags can be used to explain why a result was flagged as a favorite. For example, you can tag suggested results as being known already, interesting, synthetically unfeasible, or any other designation that you might need.

The tile view of results allows for rapid assessment of the bioisosteric substitution, along with selected properties, in a tiled layout. This streamlines the visualization of many results from the Spark experiment, and the tiles can be sorted and filtered in the same way as the regular spreadsheet view in the Molecule Table.

See this in action in the web clip below and contact us to find out more.

Ligand design using Torch and Spark in tandem – molecule growing

Lead optimization calls on the experience and intuition of team members to drive towards the project goals. However, using Spark (for bioisostere replacement) in addition to Torch provides additional inspiration that may not have been considered.

This web clip shows how iterative design can be done and assessed within Cresset’s software by using Spark to identify bioisosteric substitutions that are in keeping with the field space of the reference ligand in its bioactive conformation.

Torch for iterative design

Finding the best compound to make requires careful consideration. Getting the best design is critical to progression towards the project goals. We believe that designing in 3D helps you to make the best compound. To help you in this process we have built a comprehensive molecular editor into both Forge and Torch that enables the design of molecules in 3D with instant feedback on the fit to a known inhibitor.

This web clip uses this feature to show how you can iterate Cresset’s design process to identify the best possible molecules for synthesis or for passing to colleagues for further study.

New KNIME nodes and Pipeline Pilot (V2.2) components released

We are delighted to announce our continued integration with KNIME® and Pipeline Pilot workflow environments. This release introduces nodes for FieldTemplater, distributed computing and Spark Database Generator functionalities.


The FieldTemplater node is part of Cresset’s Forge nodes which wrap the corresponding functionality from the command line. FieldTemplater is a tool for finding pharmacophores: common patterns in electrostatic fields and hydrophobicity amongst molecules. Application to several structurally dissimilar molecules with common activity yields a set of hypotheses for the bioactive conformation without need for protein or crystallographic information.


Cresset Engine Broker

Released earlier this year for Forge and Spark executables, Cresset Engine Broker is now available for integration in the KNIME and Pipeline Pilot automation environments. The Cresset Engine Broker allows a number of components to distribute calculations across a computing cluster so that your local KNIME nodes can be controlling hundreds of CPUs. This can vastly reduce the amount of time it takes for a node to run.

Spark Database Generator

The Generate Spark Database node gives Spark users the ability to create their own fragment databases for bioisostere replacement experiments using their own compound collections, inventory systems, etc.
Spark is a bioisostere replacement tool in which the user selects a section of a starter molecule to be replaced using fragments with similar electrostatic and steric properties. Spark is supplied with fragment databases derived from commercially available compounds and theoretical studies. The Spark Database Generator is an optional module for creating custom fragment databases.

These new and improved components simplify the integration of Cresset’s advanced technology into the automation environments of KNIME and Pipeline Pilot, allowing you to gain knowledge and direction for your drug discovery projects within a familiar interface. All of Cresset’s Pipeline Pilot and KNIME components are free for existing customers. If you don’t already have Cresset’s software try a free demo.

What are field points?

Cresset’s technology compares molecules using shape and electrostatic similarity simultaneously. In this short web clip we introduce the field point technology that enables these comparisons.

Find out more about our technology or try it for yourself in one of our innovative, easy to use products.