Intelligent library design for protein families and beyond

Finding interesting hits against the plethora of potentially ‘useful’ new protein targets is still a significant challenge despite the growing list of techniques for detection and hit generation paradigms.

Examples include high throughput screening of diverse compounds, fragment based drug discovery, crystallography and NMR driven structure based drug discovery, plasmon resonance detection of protein-ligand binding, and phenotypic screening, to mention a few.

There are also important and efficient computational chemistry techniques, such as structure or ligand based high throughput virtual screening, as well as low throughput scaffold hopping and iterative de-novo design for fast followers. Amongst these, rational design of compounds remains an important technique for providing useful starting chemistry, particularly where either ligand or protein information exists. I presented an overview of library design at CDDD in Verona. My presentation covered an extensive chemogenomic approach for GPCR library design and the use of Cresset’s cutting edge field based software for ligand gated ion channel library design, both of which were published recently in peer reviewed articles. You can view my presentation below.

Delivering high quality library design

Libraries of chemical compounds are the lifeblood of modern drug discovery programs. The quality of library design can determine a project’s success or failure.

Both molecular modeling and cheminformatics techniques are important for the production of chemical libraries. The Cresset Consulting Services team has the analysis and design experience that is vital for the delivery of successful chemical libraries.

Different types of library design

Library design as a concept is not new, but it only became a popular paradigm in drug discovery a decade or so ago. Over time the field of library design has split to encompass two main types of library, both of which are commonly used by medicinal chemists for their drug discovery campaigns:

Diverse

  1. Diverse compound libraries for the discovery phase
  2. Diverse lead-like libraries for the discovery phase
  3. Diverse fragment libraries for fragment based drug discovery

Knowledge-based

  1. Focused libraries for the discovery phase
  2. Libraries for the lead optimization phase

Modern drug discovery now rarely proceeds simply via the classical route of making serial changes and acting on the output of testing. Rather, activity is explored using SAR explosions at discrete points in the process.

Designing a diverse library

Diverse sets of compounds – be they drug-sized, lead-like or fragments – are usually created by selecting compounds from a greater pool using some measure of diversity on the pool. The pool could be commercially available compounds (singles or libraries), internal collections or synthetically accessible library space. Often a combination of these sources is used to get the widest possible range of compounds into the final library. In most cases the selection of compounds to include in the diverse library proceeds by using a combination of 2D similarity matrices and property calculations. This is essentially the process used by big pharma to get the most out of their compound screening file.

Although there are established methods for this, which work OK for generic screening molecules from vendors, there is no standard protocol and each company may have a different preferred set derived from the same commercially available pool.
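As noted above, there is no standard protocol, so the following is only a minimal sketch of one common pattern: a property pre-filter followed by greedy MaxMin picking on 2D fingerprint similarity. The compound names, fingerprints (stored as sets of on-bit indices), molecular weights and the 500 Da cut-off are all hypothetical placeholders; in practice the fingerprints and properties would come from a cheminformatics toolkit.

```python
from typing import Dict, FrozenSet, List


def tanimoto(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Tanimoto similarity between two fingerprints stored as sets of on-bits."""
    if not a and not b:
        return 1.0  # convention chosen here: two empty fingerprints count as identical
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def maxmin_pick(pool: Dict[str, FrozenSet[int]], n_pick: int, seed: str) -> List[str]:
    """Greedy MaxMin: repeatedly add the compound least similar to those already picked."""
    picked = [seed]
    # For each candidate, track its highest similarity to any picked compound.
    nearest = {name: tanimoto(fp, pool[seed]) for name, fp in pool.items() if name != seed}
    while nearest and len(picked) < n_pick:
        # sorted() makes tie-breaking deterministic (alphabetical by name).
        best = min(sorted(nearest), key=nearest.get)
        picked.append(best)
        del nearest[best]
        for name in nearest:
            nearest[name] = max(nearest[name], tanimoto(pool[name], pool[best]))
    return picked


def select_diverse(pool, properties, n_pick, seed, max_mw=500.0):
    """Apply a simple property filter, then pick a diverse subset from what remains."""
    eligible = {name: fp for name, fp in pool.items() if properties[name]["mw"] <= max_mw}
    if seed not in eligible:
        raise ValueError("seed compound fails the property filter")
    return maxmin_pick(eligible, n_pick, seed)


# Hypothetical toy pool: names, bit sets and molecular weights are illustrative only.
pool = {
    "cpd_a": frozenset({1, 2, 3, 4}),
    "cpd_b": frozenset({1, 2, 3, 5}),
    "cpd_c": frozenset({10, 11, 12}),
    "cpd_d": frozenset({1, 2, 10, 20}),
    "cpd_e": frozenset({30, 31, 32, 33}),
}
properties = {
    "cpd_a": {"mw": 320.0},
    "cpd_b": {"mw": 334.0},
    "cpd_c": {"mw": 250.0},
    "cpd_d": {"mw": 410.0},
    "cpd_e": {"mw": 650.0},
}

picked = select_diverse(pool, properties, n_pick=3, seed="cpd_a")
```

In this toy pool, cpd_e is removed by the property filter and cpd_b is passed over because it is a close analogue of the seed. The choice of seed, fingerprint and cut-offs all change the outcome, which is one reason different organisations arrive at different sets from the same pool.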

Diverse fragment libraries

With the rise of fragment based drug discovery over the last 5-10 years a thirst has emerged for libraries containing smaller lead-like and fragment-like diversity. The type of analysis required to gauge redundancy in this case becomes tricky as the smaller the molecules become the more difficult it is to create meaningful robust measures of chemical similarity – many of the 2D similarity methods lose their discriminatory ability. Thus fragment libraries or lead-like libraries may require special treatment.

We have become interested in using our own description of molecules – their shape and electrostatic character – to describe compound collections. We presented some initial work in this space at the spring 2012 ACS meeting. In this blog post Tim describes how we are looking again at the diversity of compound and specifically fragment collections using the computational efficiency available from BlazeGPU.

Knowledge based library design – Focused libraries

To design a focused library, computational input becomes a critical factor. Focused libraries are inherently the result of applying existing knowledge to the design. However, this knowledge can be applied in different ways. Two approaches are common in this space, each with different factors that dictate the course of the library design workflow.

The technique typically used by compound vendors is to filter their compound collection based on the fit of molecules to activity models that have been developed (e.g. using physical property, pharmacophore or 2D similarity models). The usefulness of the classification is entirely dependent on the details of how the model has been constructed and applied.

The alternative technique, often employed by specialist vendors and bigger drug discovery organisations, is to design novel scaffolds and substitutions to address specific biological target areas of interest. This includes applying structure or ligand based designs that target protein families or sets of related targets using medicinal chemistry principles. Unlike the filtering approach above, in this case all molecules would have to be synthesized, with the inherent advantages (notably IP) and disadvantages (cost) that come with this.

The latter undoubtedly requires the greatest engagement of time and resource to provide a suitable level of insight into the problem from which to develop innovative chemical solutions.

Case study

S-adenosyl methionine (SAM) is a co-factor used as a biological methylation synthon. It is employed in a host of enzymatic methyl transferase processes which are important in a number of disease areas. In the area of epigenetics, the lysine methyl transferases (KMTs) are responsible for methylating lysine residues on histones – a process which mediates gene expression by changing the stability of the nucleosome.

A quick analysis of the binding conformation of SAM across the PDB (Figure 1) reveals a small number of clusters of SAM bioactive conformations. The conformations of SAM found in KMTs form a tight cluster which is distinct from those in the more diverse generic SAM utilising enzymes. Interestingly, the analysis shows that DOT1L, which is also thought to be a KMT, is an outlier and more closely related to the generic enzyme set than to the other KMTs.

Figure 1. SAM conformations from SAM utilising enzymes observed from the PDB


Assuming we wished to pursue a SAM mimetic design as a paradigm for KMT or DOT1L inhibitor generation, then from a molecular design point of view there are a number of issues which would need to be addressed. One major issue is that SAM is ubiquitously used as a cofactor, so a close mimetic may have unwanted side interactions. Clearly a DOT1L SAM mimetic design will have more issues with generic SAM enzyme crossover. A design aimed at other KMTs (e.g. SMYD2) would have selectivity issues just within the specific KMT family.

Designing away from potential crossover activity could be achieved by a full SAM mimetic design, since both the adenine and Met chains adopt different vectors and shapes in the different sub-classes. Alternatively, concentrating on the adenine mimetic alone, the H-bonding patterns and solvent exposure are distinct in the two enzyme sub-classes, as shown in Figure 2.

Figure 2. Differences in recognition of adenine in the two ‘DOT1L-like’ v ‘KMT-like’ systems


This simple example shows how some background knowledge on the system can impact on the scope and potential success of any given design.

We described in our previous blog how our fragment replacement tools can be used to search for novel bioisosteric replacements – in this case using the Spark software with adenine as the molecular input you can find suitable replacements as seeds for a library. As the template is extracted from a protein context all the ideas would be generated in the same coordinate frame and thus could be visualized and assessed for fit into the protein.

Alternatively the whole SAM 3D conformation from whichever sub-class could be submitted to Blaze to search for commercial vendor molecules that fit specific field patterns from the specific SAM conformation.

Figure 3. Library design idea for a SMYD-like KMT inhibitor (Left: SAM from SMYD2 and Right: virtual molecule)


The output of these virtual exercises, rather than being molecules to test (the usual scenario), would be molecular scaffold ideas that are potential starting seeds for a design. Ideally we would be looking for a good molecular fit to the interaction patterns (Figure 3), and especially for ideas which also provide appropriate synthetic vectors from which to explore the allowed variation defined from the starting binding pose.

In this case Spark has provided us with a design idea which matches well to the field patterns and interaction patterns required by the KMT SAM conformation in SMYD2 (PDB: 3S7F) and provides three potential vectors for a library: R1 for the substrate pocket, R2 for the open solvated pocket, R3 for the ribose pocket (Figs 3 and 4).

Figure 4. Interaction patterns and putative library design substitution vectors.


A standard protocol for constructing the library might proceed as follows:

  1. Synthetically accessible variants (i.e., commercially available building blocks) of the above library would be gathered.
  2. A synthetic method would be outlined.
  3. This might involve intermediate route scouting for incorporating R2 and R3 variants first, followed by a final array fulfilled by elaborating R1.
  4. A virtual ‘all-combinations’ library would be constructed.
  5. The enumerated library would be analyzed in terms of predicted ‘drug-like’ properties [MWT, LogP, TPSA, (HBD, HBA, Rot.bnd)-counts etc]. Combinations which provide poor properties would be discarded.
  6. Chemistry validation of the synthetic route and scope for the decoration transformations would be established.
  7. Stability studies would be run on a sub-set.
  8. The final synthetic library would be constructed.
  9. The library would be purified and plated (i.e., 96 well plates for screening).
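The enumeration and property-filtering parts of the protocol above can be sketched as follows. This is only an illustration under stated assumptions: the core and building block names are hypothetical, their property contributions are made-up numbers, and summing per-fragment contributions is a crude stand-in for calculating properties on the fully enumerated structures with a cheminformatics toolkit. The cut-offs follow the familiar rule-of-five limits for molecular weight, donors and acceptors.

```python
from itertools import product
from typing import List, NamedTuple, Sequence


class Fragment(NamedTuple):
    name: str
    mw: float  # approximate molecular weight contribution
    hbd: int   # hydrogen bond donor count
    hba: int   # hydrogen bond acceptor count


def enumerate_library(core: Fragment,
                      r1: Sequence[Fragment],
                      r2: Sequence[Fragment],
                      r3: Sequence[Fragment]) -> List[Fragment]:
    """Build the virtual 'all-combinations' library with summed property estimates."""
    library = []
    for a, b, c in product(r1, r2, r3):
        parts = (core, a, b, c)
        library.append(Fragment(
            name="-".join(p.name for p in parts),
            mw=sum(p.mw for p in parts),
            hbd=sum(p.hbd for p in parts),
            hba=sum(p.hba for p in parts),
        ))
    return library


def passes_filters(cpd: Fragment, max_mw: float = 500.0,
                   max_hbd: int = 5, max_hba: int = 10) -> bool:
    """Discard combinations whose estimated properties fall outside the limits."""
    return cpd.mw <= max_mw and cpd.hbd <= max_hbd and cpd.hba <= max_hba


# Hypothetical core and building blocks; all values are illustrative only.
core = Fragment("core", 250.0, 1, 4)
r1_blocks = [Fragment("Me", 15.0, 0, 0), Fragment("Bn", 91.0, 0, 0)]
r2_blocks = [Fragment("OH", 17.0, 1, 1), Fragment("morpholine", 86.0, 0, 2)]
r3_blocks = [Fragment("H", 1.0, 0, 0), Fragment("polyol", 180.0, 4, 5)]

library = enumerate_library(core, r1_blocks, r2_blocks, r3_blocks)
kept = [cpd for cpd in library if passes_filters(cpd)]
```

Here the eight enumerated combinations reduce to four, because every product carrying the hypothetical polyol substituent exceeds either the donor or the molecular weight limit. Filtering at the virtual stage like this is what keeps building blocks with no viable products out of the synthesis plan.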

Our library design service offering

Cresset computational chemists have wide knowledge of and experience in delivering projects involving all of the library scenarios described above which we are now able to offer as a service. Contact us for more information.

Cresset Consulting – What Makes a Great Consulting Project?

In this series of blogs, Dr Martin Slater, Director of Consulting Services, talks about what ingredients contribute to the success of a consulting project.

There are three key ingredients vital to the success of a consulting project. First and foremost is the expertise of the consulting team. Second is the quality of the software and the reliability of the science used to deliver the results. Third, and the one variable that changes with each project, the success of a collaboration can hinge on the quality of communication between the customer and the team.

Great People Make a Great Consulting Project

Perhaps it sounds surprising, but I would always put the expertise of the consulting team before the software. Of course, they are both important, and you cannot perform computational chemistry without proven scientific computational methods. But we are talking about scientific research, and it takes excellent scientists to deliver excellent science.

We don’t carry out research by putting a postgraduate in front of a screen so they can input data then tell customers what the software tells them. What Cresset offer our customers is years of combined scientific expertise from a world class computational chemistry team.

Our customers are typically small to medium size companies who do not have the resources to build a dedicated in-house computational chemistry team. They could perhaps choose to hire one computational chemist to work with their medicinal chemists, but this could really limit the breadth of the computational chemistry they can deploy.

Our team of scientists have experience in structural biology, medicinal chemistry, computational chemistry and cheminformatics. Together they have over 100 years of combined industry experience on a very wide range of biological targets and therapeutic areas. In fact, because of our range of consulting projects and industry backgrounds, our team typically has much greater experience than most in-house pharmaceutical computational chemistry teams.

We work on projects ranging from scaffold hopping and ligand based virtual screening to fragment replacement, fragment growth and SAR analysis, including 3D QSAR using fields. The following diagram shows the wide range of fields in which Cresset consultants have worked.

And this experience matters. Computational chemistry is not magic, it is science. Scientific tools need scientists to provide intelligent input and interpret the results. It is Cresset’s expertise and depth of experience that really deliver results for our customers.

Cresset's Consultancy Fields
Figure 1: The wide range of fields in which Cresset consultants have worked


Dr Martin Slater,
Director of Consulting Services

Building a Focused Library

Better Design is One Way to Increase the Chances of a Successful Screening Hit

The better your library is designed, the better your chances of a successful screening hit. Given knowledge of the protein target, a focused approach to library design leads to far higher hit rates than high-throughput or random screening.

Discover how templates built by Forge can be used to screen compound collections, or fragment libraries, to help to build focused screening libraries with a high chance of success. Read our article in Genetic Engineering News (15th September 2012, Vol. 32, No. 16).

CASE STUDY: Using FieldTemplater in Library Design

There is a delicate balance to be struck in library design between identifying all the chemical scaffolds that are potentially active, whilst retaining a manageable library of compounds that is tractable and cost-effective for routine screening. Field templates can be used in library design to predict activity of compounds both at therapeutic targets as well as at known toxicity targets such as CYP 2D6 and hERG. A range of templates can be derived and used as ‘lenses’ to counterscreen an aggregated library of compounds derived from multiple congeneric series and other sources.

FieldTemplater was used in the example shown to select a diverse library of potential H3 antagonists. A series of seven highly active H3 antagonists were identified from the literature and aligned in their bioactive conformations to generate a consensus field template (shown bottom). As confirmation of the predictive capability of this template, the field match score was compared against the known activity (Ki) scores of 68 further H3 antagonists described in the scientific literature and outside the original training set. A good match of fields to activity was confirmed.

The H3 template was then used to counterscreen Cresset’s 4.5M compound collection to identify potential H3 antagonists. A large number of matches were identified, with 68 distinct chemical scaffolds. Since chemical scaffolds that can be expected to show liabilities for serious off-target or toxicity effects should be avoided, the compound matches were also screened against toxicity Field templates for CYP 2D6 and hERG activity. Approximately 4% of the compounds were rejected due to potential 2D6 toxicity and a further 8% due to potential hERG toxicity.
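The screen-then-counterscreen logic described above can be sketched as a simple triage over pre-computed template scores. This is not Cresset's implementation: the compound records, score values and both cut-offs are hypothetical placeholders, and the sketch simply assumes that a similarity score against each template is already available for every compound.

```python
from typing import Dict, List, NamedTuple, Tuple


class Scored(NamedTuple):
    compound: str
    scaffold: str
    h3: float      # similarity to the therapeutic (H3) template
    cyp2d6: float  # similarity to the CYP 2D6 toxicity template
    herg: float    # similarity to the hERG toxicity template


def triage(compounds: List[Scored],
           hit_cutoff: float = 0.65,
           tox_cutoff: float = 0.60) -> Tuple[List[Scored], Dict[str, int]]:
    """Keep H3 template matches, then reject those matching a toxicity template."""
    matches = [c for c in compounds if c.h3 >= hit_cutoff]
    kept: List[Scored] = []
    rejected = {"CYP2D6": 0, "hERG": 0}
    for c in matches:
        if c.cyp2d6 >= tox_cutoff:
            rejected["CYP2D6"] += 1
        elif c.herg >= tox_cutoff:  # counted only if not already rejected for 2D6
            rejected["hERG"] += 1
        else:
            kept.append(c)
    return kept, rejected


# Hypothetical scores for illustration only.
compounds = [
    Scored("cpd_1", "scaffold_1", 0.80, 0.20, 0.30),
    Scored("cpd_2", "scaffold_1", 0.72, 0.70, 0.10),
    Scored("cpd_3", "scaffold_2", 0.68, 0.10, 0.75),
    Scored("cpd_4", "scaffold_3", 0.50, 0.10, 0.10),
    Scored("cpd_5", "scaffold_2", 0.90, 0.30, 0.20),
]

kept, rejected = triage(compounds)
distinct_scaffolds = {c.scaffold for c in kept}
```

Counting the distinct scaffolds among the survivors, rather than the raw number of compounds, mirrors the way the results above are reported.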


Field based methods can also be used to predict novel bioisosteric compounds that will exhibit the same activity when key fragments of their structure are replaced. Such a tool was used to replace the central core as an alternative library method in order to generate a novel scaffold replacement library. The results of this analysis can be seen in the graph shown below.

The highlighted structures on the graph represent some of the most active known H3 antagonists from the literature and the blue structures represent novel compounds generated by the software. The graph shows a number of novel compounds with diverse central cores that have significantly higher predicted activities at H3 (as shown by higher field similarity score). Five of the more interesting compounds (all of which are novel and have high similarity scores) have been highlighted in yellow.

These highlighted compounds would be ideal candidates for inclusion in the final library as they combine innovation with chemical tractability and high predicted activity. Interestingly, the 2D similarity score of most of the dataset, including all of the highlighted molecules, is less than 0.7, which is a de facto cut-off for 2D based scoring methods. This means that most of these structures would be very unlikely to be considered in a traditional library design process as there would be no reliable way to predict their activity.
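For reference, 2D similarity scores of this kind are most often Tanimoto coefficients calculated on structural fingerprints. The text does not state which metric was used here, so treat this as an assumption. For two fingerprints A and B, with a and b bits set respectively and c bits set in both:

```latex
T(A, B) = \frac{|A \cap B|}{|A \cup B|} = \frac{c}{a + b - c},
\qquad 0 \le T(A, B) \le 1
```

Under that reading, the observation above is that most of the dataset satisfies T < 0.7 against the known actives, so a 2D similarity search with the usual cut-off would not have retrieved them.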

Field-Focused Library Design

Article published in Drug Discovery and Development:

When designing libraries against specific targets, a wide range of activities—therapeutic, off-target, and toxicity—must be predicted. Experience has shown that a compound’s biological activity cannot be predicted solely by its 2D structure and we need to use a more detailed description of the molecule. At its most basic level, activity is determined by the interactions of the molecular fields (surfaces) of the target protein with those of the ligand in their respective binding conformations.
Read the full article

FieldAlign v 3.0

Enabling Rapid Design and SAR Interpretation

The latest release of FieldAlign brings new functionality that will increase your SAR knowledge and help you design the best next synthesis. FieldAlign has always given you biologically relevant molecular comparisons that can be used to find the root causes of activity or inactivity.

FieldAlign v3.0 makes this easier by enabling sorting or filtering of your molecules using activity, physical properties or other experimental data that you have generated. Combining the filtered view with FieldAlign’s excellent molecular alignments gives you a deep understanding of your compounds and the protein that they are targeting.

Using FieldAlign to design the next best synthesis target is now easier than ever. The new molecular editor enables rapid generation of iterative designs, while support for SMILES representations of molecules makes it easy to assess small virtual libraries to determine the best subset to synthesize. Improved support for copy and paste of molecules between FieldAlign, FieldStere and FieldView further simplifies the design process and makes the communication of your recommendations easy.

FieldAlign v3.0 runs on multiple CPU cores, giving you the option to use all the power of your desktop computer, or for bigger problems you can also use Cresset’s unique FieldEngines running on remote resources such as those on an in-house cluster or in the cloud to massively increase the speed of the calculations. Finally, FieldAlign v3.0 is now available as a command line binary enabling deployment in a wide variety of situations and giving ultimate flexibility in molecular alignment and scoring.

The new FieldAlign interface showing CDK2 compounds:

The new molecule spreadsheet in FieldAlign V3.0 showing the clustered alignment view


Library Design – Novel H3 Antagonists

There is a delicate balance to be struck in library design between identifying all the chemical scaffolds that are potentially active, whilst retaining a manageable library of compounds that is tractable and cost-effective for routine screening. Field templates can be used in library design to predict activity of compounds both at therapeutic targets as well as at known toxicity targets such as CYP 2D6 and hERG. A range of templates can be derived and used as ‘lenses’ to counterscreen an aggregated library of compounds derived from multiple congeneric series and other sources.

Selected H3 Antagonists

FieldTemplater was used in the example shown to select a diverse library of potential H3 antagonists. A series of seven highly active H3 antagonists were identified from the literature and aligned in their bioactive conformations to generate a consensus field template (shown bottom). As confirmation of the predictive capability of this template, the field match score was compared against the known activity (Ki) scores of 68 further H3 antagonists described in the scientific literature and outside the original training set. A good match of fields to activity was confirmed.

The H3 template was then used to counterscreen Cresset’s 4.5M compound collection to identify potential H3 antagonists. A large number of matches were identified, with 68 distinct chemical scaffolds. Since chemical scaffolds that can be expected to show liabilities for serious off-target or toxicity effects should be avoided, the compound matches were also screened against toxicity Field templates for CYP 2D6 and hERG activity. Approximately 4% of the compounds were rejected due to potential 2D6 toxicity and a further 8% due to potential hERG toxicity.

Analysis of H3 Library

Whilst this set of structures is interesting and useful as an H3 screening library in itself, it has only considered existing compounds which may have low innovation and IPR potential. Field based methods can also be used to predict novel bioisosteric compounds that will exhibit the same activity when key fragments of their structure are replaced. Such a tool was used to replace the central core of some of the 68 compounds resulting from the above screening. The results of this analysis can be seen in the graph shown right.

The highlighted structures in red on the graph represent some of the most active known H3 antagonists from the literature and the blue structures represent novel compounds generated by the software. The graph shows a number of novel compounds with diverse central cores that have significantly higher predicted activities at H3 (as shown by higher field similarity score). Five of the more interesting compounds (all of which are novel and have high similarity scores) have been highlighted in yellow. These compounds would be ideal candidates for inclusion in the final library as they combine innovation with chemical tractability and high predicted activity. Interestingly, the 2D similarity score of most of the dataset, including all of the highlighted molecules, is less than 0.7, which is a de facto cut-off for 2D based scoring methods. This means that most of these structures would be very unlikely to be considered in a traditional library design process as there would be no reliable way to predict their activity.
