Practical experiences of using the 3D-QSAR tools in Forge, Application to RET Kinase Inhibitors

Practical experiences of using the 3D-QSAR tools in Forge, Application to RET Kinase Inhibitors was presented by Dr Bohdan Waszkowycz, Cancer Research UK, at the Cresset European User Group Meeting 2015.

The Cancer Research UK group at the Manchester Institute have been looking at ways to improve the selectivity of RET kinase inhibitors. They used the 3D-QSAR models in Forge and compared them with other 3D-QSAR methods. They concluded that building a successful and robust QSAR model takes time and patience, but Forge offers a user-friendly approach to building 3D-QSAR models.

In recent years RET kinase has been implicated in lung cancer and thyroid cancer, so there is a large potential market for a compound that can inhibit it effectively. A few known inhibitors are used in the clinic, but they were not designed specifically for RET. They hit a range of kinases and therefore cause considerable off-target toxicity. The group’s goal for this project was to identify novel inhibitors with improved potency and selectivity towards RET.

The 3D-QSAR dataset

Bohdan described the quinazoline dataset that they have been working on at Manchester. They synthesized over 450 examples with widely varying properties, and then selected a subset of 128 compounds for this 3D-QSAR study to focus on the gatekeeper pocket of the kinase.

They initially based their modeling on X-ray structures from the PDB, but later in the project they obtained their own in-house X-ray structure of a phenolic quinazoline analogue that was of interest because it showed a large boost in RET affinity and selectivity.

Using Forge, they calculated the fields and difference maps for the simple phenol and a number of chemically diverse analogues to explore how the fields varied with substitution. They found that the phenol is important because it forms a couple of critical hydrogen bonds in the gatekeeper pocket.

Why should we want to build a QSAR model?

The goal of building a QSAR model is to find a correlation between chemical structure and biological activity. One reason to do this is the insight it gives you into your dataset. For example, you may be able to see a trend in the data, but is it statistically robust and can you use it predictively? Can you transfer the SAR you’ve learned onto a new scaffold to get to new chemical space?

How does 3D-QSAR work in Forge?

Forge is helpful for analyzing which 3D features of the dataset are strongly correlated to biological activity. Forge assembles the field points around each compound of the aligned dataset, and then calculates the electrostatic and steric energies at these field points to use as input to the regression analysis. Once the regression has been run, the most significant regression coefficients can be visualized in 3D space, from which it can be deduced whether they contribute positively or negatively to activity.
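The regression step can be pictured with a small sketch. Everything specific in it is an assumption made for illustration: the descriptor values, the five compounds, and the choice of a single-component partial least squares (PLS) fit. The talk summary only says that field energies feed a regression analysis, so this is not a description of what Forge does internally.

```python
import math

def pls1_one_component(X, y):
    """Fit a single-component PLS1 model.

    Returns (descriptor means, activity mean, regression coefficients).
    """
    n, p = len(X), len(X[0])
    x_means = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    # Centre descriptors and activities.
    Xc = [[row[j] - x_means[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]
    # Weight vector: covariance of each descriptor with activity, normalised.
    w = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
    norm = math.sqrt(sum(v * v for v in w))
    w = [v / norm for v in w]
    # Scores of each compound on the single latent component.
    t = [sum(Xc[i][j] * w[j] for j in range(p)) for i in range(n)]
    # Regress activity on the scores.
    q = sum(ti * yi for ti, yi in zip(t, yc)) / sum(ti * ti for ti in t)
    coef = [wj * q for wj in w]
    return x_means, y_mean, coef

def predict(model, row):
    x_means, y_mean, coef = model
    return y_mean + sum(c * (x - m) for c, x, m in zip(coef, row, x_means))

# Hypothetical data: one row per aligned compound.
# Column 0: electrostatic energy at a field point, column 1: steric energy at another.
X = [[0.5, 3.0], [1.0, 2.5], [1.5, 2.8], [2.0, 1.0], [2.5, 0.7]]
y = [5.9, 6.4, 6.6, 7.8, 8.3]  # hypothetical activities (pIC50-like scale)

model = pls1_one_component(X, y)
print("coefficients:", model[2])
```

In this toy dataset the first coefficient comes out positive and the second negative, which is the kind of sign information that is then displayed in 3D around the aligned molecules.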

The robustness of the model is normally evaluated as a cross-validated correlation coefficient, Q2, calculated on the training set. Predictive performance is best assessed on a separate test set of compounds that are not in the original training set.
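As a rough illustration of a cross-validated Q2, the sketch below uses leave-one-out cross-validation with Q2 = 1 - PRESS / SS, where PRESS is the sum of squared prediction errors for the left-out compounds and SS is the total sum of squares of the observed activities. The single-descriptor linear model and the data values are assumptions chosen to keep the example short. The cross-validation scheme used in the study is not stated in this summary.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def loo_q2(xs, ys):
    """Leave-one-out cross-validated Q2 = 1 - PRESS / SS."""
    press = 0.0
    for i in range(len(xs)):
        # Refit without compound i, then predict it.
        slope, intercept = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        press += (ys[i] - (slope * xs[i] + intercept)) ** 2
    mean_y = sum(ys) / len(ys)
    ss_total = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - press / ss_total

# Hypothetical descriptor values and activities for six compounds.
descriptor = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
activity = [5.8, 6.5, 6.6, 7.4, 7.7, 8.4]

q2 = loo_q2(descriptor, activity)
print("leave-one-out Q2:", round(q2, 3))
```

Because each compound is predicted by a model that never saw it, Q2 is always lower than the fitted R2 for the same data, and a large gap between the two is a warning sign of overfitting.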

A comparison of structure-based and ligand-based alignment methods

Bohdan presented an evaluation of how different alignment methods influenced the 3D-QSAR model. He noted how much work it can be to take a large dataset and align all the compounds consistently. Naturally, he looked for methods to automate the alignment stage and minimize the amount of manual refinement needed to obtain the final alignments.

One approach was to explore docking into the protein binding site. He kept a number of poses for each compound and chose the most plausible, doing some manual adjustment in order to refine inconsistent poses. There was quite a bit of misalignment of the core, with pivoting about the key hydrogen bonding interaction at the hinge, which is typical of docking a dataset into a rigidly constrained protein.

He also performed ligand-based alignments from scratch using Forge, selecting several diverse compounds as references and using the substructure alignment method. He again had to do some manual adjustment and refinement, but on the whole the Forge alignments yielded much tighter alignments of the conserved core than did docking.
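One way to put a number on how tightly a conserved core is aligned is the mean pairwise RMSD of matched core atoms across the dataset, measured in the shared alignment frame without re-superposition. The sketch below is only an illustration: the three-atom core, the coordinates, and the simple translations standing in for pivoting about the hinge are all hypothetical, and the talk summary does not say that this metric was used.

```python
import math
from itertools import combinations

def rmsd(pose_a, pose_b):
    """RMSD between two poses given as equal-length lists of (x, y, z) atoms."""
    squared = sum(
        (a - b) ** 2
        for atom_a, atom_b in zip(pose_a, pose_b)
        for a, b in zip(atom_a, atom_b)
    )
    return math.sqrt(squared / len(pose_a))

def mean_pairwise_rmsd(poses):
    """Average RMSD over every pair of poses in the set."""
    pairs = list(combinations(poses, 2))
    return sum(rmsd(a, b) for a, b in pairs) / len(pairs)

def shifted(pose, dx, dy):
    """Return a copy of the pose translated in x and y."""
    return [(x + dx, y + dy, z) for x, y, z in pose]

# Hypothetical core atom coordinates (angstroms) for one reference compound.
core = [(0.0, 0.0, 0.0), (1.4, 0.0, 0.0), (2.1, 1.2, 0.0)]

# Two hypothetical alignment sets: small deviations versus large ones.
tight_set = [core, shifted(core, 0.1, 0.0), shifted(core, 0.0, 0.1)]
loose_set = [core, shifted(core, 0.8, 0.0), shifted(core, 0.0, 0.9)]

print("tight:", round(mean_pairwise_rmsd(tight_set), 3))
print("loose:", round(mean_pairwise_rmsd(loose_set), 3))
```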

These two datasets enabled him to run a series of QSAR models to compare the impact of the ligand-based and structure-based alignment methods.

Outliers and ambiguous alignments

One interesting question Bohdan raised was what to do with ambiguous alignments. For example, how would it be best to align a 3-OMe phenyl substituent (IC50 1200 nM) with the parent 3-OH (IC50 5 nM)?

Forge returned a like-for-like alignment, as the methoxy is a simple analogue of the hydroxy. However, the docking method flipped the methoxyphenyl group because of a steric clash with the tight binding site. The question then is which happens in reality, and which yields the most meaningful model? There are always lots of unknowns in a real-life dataset. Bohdan emphasised that his goal was to get a consistent set of alignments which made sense given the particular alignment method. He recommended Mark Mackey’s blog, “The three secrets to great 3D-QSAR: Alignment, alignment and alignment”.
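To put the activity gap between the 3-OH and 3-OMe analogues on the logarithmic scale that QSAR models are usually fitted to, the IC50 values can be converted to pIC50 = -log10(IC50 in mol/L). Whether this study regressed on pIC50 is an assumption here; the conversion itself is standard.

```python
import math

def pic50_from_nm(ic50_nm):
    """Convert an IC50 in nanomolar to pIC50 (-log10 of the molar IC50)."""
    return 9.0 - math.log10(ic50_nm)

parent_oh = pic50_from_nm(5)       # parent 3-OH, IC50 5 nM
methoxy = pic50_from_nm(1200)      # 3-OMe analogue, IC50 1200 nM

print("3-OH pIC50:", round(parent_oh, 2))
print("3-OMe pIC50:", round(methoxy, 2))
print("difference (log units):", round(parent_oh - methoxy, 2))
```

The two compounds differ by about 2.4 log units, a 240-fold drop in potency, so the choice of pose for the methoxy analogue has a large influence on what the model attributes that drop to.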

Another question was what to do with outliers. Can they be safely removed from the dataset, or does the model suffer as a result? If an outlier is the only example of a particular chemical feature within the dataset, then there is no way the QSAR model can learn about that feature from the remaining compounds, and it may therefore be valid to remove it.

In his comparison of ligand-based and structure-based 3D-QSAR, Bohdan concluded that overall the Forge ligand-based alignments gave better models than docking-based alignments, in terms of more consistent alignments and more easily interpretable QSAR regression models.


Building a successful and robust 3D-QSAR model takes time and patience. The critical first step is to generate plausible and consistent alignments.

A more reliable estimation of model robustness comes from using separate training and test sets, but choosing a suitable test set is not as easy as it may seem! Selecting the test set by activity worked well, but selecting it randomly gave much more variable results. Therefore it is important to make a careful selection of the initial dataset and to consider how best to separate it into representative training and test sets.
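The difference between the two selection strategies can be sketched as follows. The exact procedure used in the study is not described in this summary, so the ranked "every n-th compound" rule, the function names, and the activity values are all assumptions for illustration.

```python
import random

def split_by_activity(activities, every=4):
    """Rank compounds by activity and send every n-th one to the test set."""
    order = sorted(range(len(activities)), key=lambda i: activities[i])
    test = sorted(idx for rank, idx in enumerate(order) if rank % every == every // 2)
    test_lookup = set(test)
    train = [i for i in range(len(activities)) if i not in test_lookup]
    return train, test

def split_randomly(n_compounds, n_test, seed):
    """Pick the test set uniformly at random (seeded for repeatability)."""
    rng = random.Random(seed)
    test = sorted(rng.sample(range(n_compounds), n_test))
    test_lookup = set(test)
    train = [i for i in range(n_compounds) if i not in test_lookup]
    return train, test

# Hypothetical activities (pIC50-like scale) for twelve compounds.
activities = [5.1, 8.3, 6.0, 7.2, 5.9, 6.8, 7.9, 6.4, 5.5, 7.5, 8.0, 6.6]

train_idx, test_idx = split_by_activity(activities)
print("activity-ranked test set:", [activities[i] for i in test_idx])

rand_train, rand_test = split_randomly(len(activities), 3, seed=1)
print("random test set:", [activities[i] for i in rand_test])
```

The ranked split guarantees that the test set samples the low, middle and high ends of the activity range. A random draw can by chance cluster in one part of the range, which is one plausible reason for the more variable results reported for random selection.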

In summary, Forge offers a user-friendly interface for building 3D-QSAR models. It is easy to set up training and test sets, run cross-validation, and visualize graphs and coefficients.

For this project on RET kinase, the final models were very consistent with the observed SAR, offering helpful insight into the features required for improving activity and selectivity.
