Activity Miner – more fire power coming to Torch and Forge

We have been looking at ways to improve the navigation of structure-activity relationship (SAR) data. In Forge we have the capability to create a quantitative relationship between the molecular fields and the activity, but this is a time-consuming task that is not always successful. We wanted to add both automated and manual methods to extract qualitative SAR information from your data. With this in mind we have created ‘Activity Miner’ to rapidly interrogate and decipher SAR in both Torch and Forge. This will be part of a larger set of SAR interpretation tools that we hope to release over the coming months.

What is Activity Miner?

Activity Miner starts from a set of aligned molecules and compares them to each other. Each pair is given a ‘disparity’ value which reflects how much the activity changes relative to the structure. Pairs with high disparity (activity cliffs) contain more information about your SAR. By looking at all the high disparity pairs you can rapidly navigate through your dataset and understand where key changes have been made to your molecules.

Importantly, changes to the structure can be judged either using classical 2D fingerprints or, more intriguingly, using Cresset’s molecular fields. Using fields gives similarities that are sensitive to the electrostatic and shape changes that are being made. Additionally, using fields provides the capability to compare pairs of structurally diverse compounds, although we expect this feature to be most useful within one chemical series.
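As an illustration of the 2D option, here is a minimal Python sketch of a fingerprint similarity. It assumes the Tanimoto coefficient, which is a common choice for 2D fingerprints; this post does not say which 2D measure Activity Miner uses, and the fingerprints below are made-up sets of bit positions rather than real molecules.

```python
def tanimoto(bits_a: set[int], bits_b: set[int]) -> float:
    """Tanimoto coefficient: shared 'on' bits divided by all distinct 'on' bits."""
    union = len(bits_a | bits_b)
    if union == 0:
        # Convention chosen for this sketch: two empty fingerprints count as identical.
        return 1.0
    return len(bits_a & bits_b) / union


# Hypothetical fingerprints, each given as the set of bit positions that are set.
fp_a = {1, 4, 7, 9, 12}
fp_b = {1, 4, 7, 9, 15}

similarity = tanimoto(fp_a, fp_b)  # 4 shared bits out of 6 distinct bits
print(round(similarity, 3))
```

A measure like this only sees which substructure bits differ; it carries no information about how a change alters the electrostatics or shape, which is the motivation for the field-based alternative.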

What is disparity?

Within Activity Miner, the notion of ‘disparity’ is the key element used to investigate the activity landscape.

Disparity is calculated by dividing the difference in activity between two molecules by the distance between them. In Activity Miner the distance is calculated as ‘1 – similarity’ where similarity is either Cresset’s field similarity or the 2D similarity. Pairs of molecules that have large differences in activity while having high similarity give high disparities and highlight important aspects of the SAR.

For those who want to think through the mathematics, disparity is given by:

Disparity ∝ ∆Activity / (1 – Similarity)

where ∆Activity = Activity_ref – Activity_x

This measure highlights where changes to the molecules (or more importantly their fields) make the most difference to the SAR.
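The calculation above can be sketched in a few lines of Python. This is an illustration only, not Activity Miner's code: the proportionality constant is assumed to be 1, the function name is hypothetical, and the handling of a pair with similarity exactly 1 (zero distance) is a choice made for this sketch.

```python
import math


def disparity(activity_ref: float, activity_x: float, similarity: float) -> float:
    """Activity difference divided by distance, where distance = 1 - similarity."""
    if not 0.0 <= similarity <= 1.0:
        raise ValueError("similarity is expected to lie in [0, 1]")
    delta_activity = activity_ref - activity_x
    distance = 1.0 - similarity
    if distance == 0.0:
        # The ratio is undefined at zero distance. Assumption for this sketch:
        # return 0 for equal activities and a signed infinity otherwise.
        if delta_activity == 0.0:
            return 0.0
        return math.copysign(math.inf, delta_activity)
    return delta_activity / distance


# The same 1.0 log-unit activity difference at two different similarities.
close_pair = disparity(9.0, 8.0, 0.95)    # distance 0.05
distant_pair = disparity(9.0, 8.0, 0.50)  # distance 0.50
print(close_pair, distant_pair)
```

The closely related pair scores roughly ten times higher than the distant pair for the same activity difference, which is exactly the behavior that makes activity cliffs stand out. Swapping which molecule is the reference flips the sign of the result.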

Disparity in Activity Miner

Activity Miner presents the disparity data between pairs of molecules together with the 3D view of the molecules to enable you to easily visualise what structural and field changes are contributing to an activity change (below).

Activity Miner Disparity Table
The relationships between molecules in the dataset are presented in four separate ways:

    1. The full matrix of pairwise disparity values is presented in the ‘Disparity table’ window (shown above).
    2. The pairs with the highest disparity (but not necessarily the highest activity) are presented in the ‘Disparity View’ (below).

Activity Miner Disparity View

    3. The ‘Activity View’ presents a single row of the disparity table. This is a view focused on a specific compound with the most similar neighbors arranged around it and colored by disparity. The distance between the compound in focus and the neighbor is shown by the size of the segment. Changes to the structure are highlighted with ‘halos’ on neighboring structures. Browsing through the Activity View is a quick way to find the key features of the SAR in the dataset.

Activity View

    4. Lastly, Activity Miner presents a ‘Cluster View’ where the compounds have been clustered using the chosen similarity method (below). The currently selected pair of molecules is highlighted in the compound names. Hovering on a name or line displays the structure or cluster information. Structures can be ‘locked’ into the view.

Activity Miner Cluster View
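To make the first two views concrete, the following Python sketch builds a full pairwise disparity table and then ranks the pairs. All compound names, activities and similarities are invented for illustration. Ranking by the magnitude of the disparity and skipping zero-distance pairs are assumptions of this sketch, not documented behavior of Activity Miner.

```python
from itertools import combinations

# Hypothetical compounds: name -> activity (pIC50-style values).
activities = {"A": 8.0, "B": 9.0, "C": 8.1, "D": 6.5}

# Hypothetical pairwise similarities, one entry per unordered pair.
similarities = {
    ("A", "B"): 0.95,
    ("A", "C"): 0.90,
    ("A", "D"): 0.40,
    ("B", "C"): 0.85,
    ("B", "D"): 0.35,
    ("C", "D"): 0.45,
}


def disparity_table(activities, similarities):
    """Return {(ref, x): disparity} for every ordered pair of distinct compounds."""
    table = {}
    for a, b in combinations(sorted(activities), 2):
        distance = 1.0 - similarities[(a, b)]
        if distance <= 0.0:
            continue  # assumption: skip pairs the similarity measure cannot separate
        table[(a, b)] = (activities[a] - activities[b]) / distance
        table[(b, a)] = -table[(a, b)]  # swapping the reference flips the sign
    return table


def top_pairs(table, n=3):
    """Unordered pairs ranked by the magnitude of their disparity, largest first."""
    unique = {pair: value for pair, value in table.items() if pair[0] < pair[1]}
    return sorted(unique.items(), key=lambda item: abs(item[1]), reverse=True)[:n]


table = disparity_table(activities, similarities)
for (ref, x), value in top_pairs(table):
    print(f"{ref} vs {x}: disparity {value:+.1f}")
```

In this toy data the top pair is A and B: they differ by one log unit at a distance of only 0.05, whereas the larger activity gap between B and D is spread over a much larger distance and ranks lower.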
Applying Activity Miner to the GSK PERK Dataset

We will examine the GSK PERK dataset (J. Med. Chem. 2012, 55, 7193) that we discussed in the April blog and employ Activity Miner to look for interesting structure-activity correlations. To do this we took our aligned dataset of compounds (they were aligned using the ‘Substructure’ method in Torch) and sent the alignments into the new Activity Miner module.

High disparity resulting in increased activity – substituent effect

The Activity View shows a selected ‘focus’ compound in the center, with the neighboring molecules around the circumference. The color-coding and height of the ‘boxes’ provide quick visualization of the distance between molecules and their disparity, with smaller boxes reflecting smaller distances (more similar) and darker colors indicating higher disparity. Green corresponds to positive disparity; red corresponds to negative disparity. For this example comparison, we chose molecule ‘10, #31’ as the focus (pIC50 = 8.8). The Activity View for this molecule is shown below (left).
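The data behind such a view can be sketched in Python as one row of the disparity table, sorted by distance from the focus compound. The compound names and values are hypothetical, and the function name is invented for this example. The sketch treats the focus compound as the reference in the ∆Activity definition, which is an assumption; the tool may assign the sign the other way round.

```python
def activity_view(focus, activities, similarity_to_focus, n_neighbors=3):
    """Nearest neighbors of `focus`, each with its distance, disparity and color."""
    rows = []
    for name, similarity in similarity_to_focus.items():
        distance = 1.0 - similarity
        if name == focus or distance <= 0.0:
            continue
        # Assumed sign convention: delta = activity of focus - activity of neighbor.
        disparity = (activities[focus] - activities[name]) / distance
        color = "green" if disparity > 0 else "red" if disparity < 0 else "none"
        rows.append({"name": name, "distance": distance,
                     "disparity": disparity, "color": color})
    rows.sort(key=lambda row: row["distance"])  # most similar neighbors first
    return rows[:n_neighbors]


# Hypothetical focus compound and four neighbors.
activities = {"focus": 8.0, "n1": 9.0, "n2": 7.5, "n3": 8.0, "n4": 6.0}
similarity_to_focus = {"n1": 0.95, "n2": 0.80, "n3": 0.90, "n4": 0.50}

for row in activity_view("focus", activities, similarity_to_focus):
    print(f"{row['name']}: distance {row['distance']:.2f}, "
          f"disparity {row['disparity']:+.1f}, {row['color']}")
```

Only the three closest neighbors are kept here, so the distant compound n4 never appears even though its activity differs most from the focus.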

Now we compare it to molecule ‘25’ (pIC50 = 9.8) from the PERK dataset. This compound has the highest field similarity to the focus compound (0.98), and the highest disparity. If we click on the segment next to this compound it grows to show the associated data (see animation below).

Activity Miner Animation of Activity View
Activity View for molecule ‘10, #31’ (pIC50 = 8.8) shown in the center and with molecule ‘25’ being compared. The ‘halos’ on the circumference molecules indicate where there are structural differences.
Big Disparity

The difference between the structures above is in the position of the halogen substitution on the terminal phenyl. In this case, a small structural and field change has led to a large change in activity (1 log unit). Viewing the rest of the Activity View around molecule ‘10, #31’ shows at a glance that improvements to this molecule almost invariably involve removing the para-fluoro and replacing it with a meta and/or ortho substituent, matching the observations reported in the original GSK paper.

This first example illustrates how the relevant SAR can easily be extracted from even a large dataset, showing at a glance the most important changes you have made.

Activity cliffs with fields give you more information

In mining the SAR landscape for an understanding of what factors drive activity, we need to be able to think about molecular changes in terms of their effects on the electrostatics and shape, not just the structure. For example, we can look at aromatic substitutions both in terms of the properties of the substituent and in terms of the effect of the substituent on the pi system.

When using molecule ‘5, #25’ as a focus, we find a number of other molecules with substitutions to the furanopyrimidine core leading to increased activity. The molecule with the highest disparity is ‘18, #28’, where the furanopyrimidine is replaced by an N-methyl pyrrolopyrimidine, giving more than a log unit of activity improvement.

Activity Miner Full Screen showing Activity View
Molecules ‘5, #25’ (pIC50 = 8.1, top in 3D) and ‘18, #28’ (pIC50 = 9.4, bottom in 3D).

Activity Miner not only helps us find this interesting change, but also lets us view the fields to try to understand the reason for the activity increase. GSK postulated in their SAR analysis that the pyrrolopyrimidine (‘18’) produced more favorable H-bond interactions and improved solubility. Viewing the field patterns (above), it is apparent that the overall fields are very similar between the two molecules. Focusing on the detail, the N-methyl substitution changes the dipole across the ring, making the NH2 less positive (and hence a weaker H-bond donor) but the adjacent N more negative (and hence potentially a slightly stronger H-bond acceptor). A bigger change, however, is apparent when looking at the negative electrostatics above and below the ring: the N-methyl compound has a much larger aromatic pi cloud. The activity increase for this compound is thus potentially due more to a stronger aromatic-aromatic (or cation-pi) interaction than directly to the substituent itself. Viewing the size of the pi cloud on the other molecules with substitutions to this ring then allows us to confirm or refute this hypothesis.

Based on these and other internal validation experiments, we believe this enhancement to both Torch and Forge to be a powerful tool for guiding lead optimization and mining the SAR to rapidly generate new and more active structures for experimental evaluation.

Try Activity Miner

The Activity Miner module is scheduled for release with the next versions of Torch and Forge in September 2013, but we are looking for active beta testers to try this new feature through the summer. If you would like to give this functionality a try and promise to provide feedback (good and/or bad) during your sneak preview, we definitely want to hear from you. Please contact us to request a trial.

Dr Tim Cheeseright, Director of Products

Dr Rae Lawrence, Technical Sales North America