Designing new molecules in a web browser

Last year we discussed our research aimed at re-imagining molecule design to bring the best of 2D and 3D technologies together in a collaborative environment. The project, code-named TorchWeb, has progressed significantly and is now on the countdown to a beta release, expected in the early summer of this year, with an initial release to follow in the autumn.

The web-based interface contains plugin windows with key information. Here, in addition to the ‘Editor’ and ‘3D viewer’ plugins I have: the ‘Designs’ plugin that shows all the molecules that I am currently working on; the ‘LogP’ plugin giving an atomistic breakdown of the calculated logP; the ‘Properties’ plugin showing multiple physico-chemical properties; and the ‘Similarity search’ plugin that shows similar molecules from a chosen database.

The two central concepts of this new product remain unchanged:

  • Firstly, we want to create an environment where medicinal chemists can draw molecules in a 2D editor, have these automatically converted into a 3D model of how the new molecule would interact with their target, or compare to the molecules that they have made before. To do this we created a new algorithm to grow molecules in 3D which is applied to every change in the 2D molecule.

The 3D pose of the molecule is updated interactively as the molecule is sketched in the 2D window enabling immediate assessment of the potential interactions that could be made by the new molecule. All other plugins also update, giving live similarity searches and logP predictions.

  • Secondly, we recognize that medicinal chemistry designers often work in teams across multiple locations and time zones. Consequently, collaboration had to be central to the application. This has been achieved through session sharing – enabling multiple users to share a design, interact with it simultaneously and work on it together.

Joining a shared session enables users to collaborate live on any design, updating the 3D pose and chemical properties as the molecule evolves.

We are working steadily to convert our initial prototype into a full product. Collaboration with selected customers has enabled us to capture detailed requirements and to transition the code base into a robust, secure environment suitable for on-premise installation or deployment in the cloud.

Features have not been ignored! The design application will be joined by a data analysis application that will combine eye catching plots and graphs with 3D protein active site analysis.

Once designed, a molecule still has to be made. Here we have embarked on a partnership with Elixir Software’s chemTraX. Together we will provide real-time understanding of the status of every molecule from design to analysis, so you only make the molecules that you need to reach your goal.

Want to be among the first to learn more?

Subscribe to our newsletter.

Cresset to participate in the first SCI/RSC Computational Chemistry Workshop on April 10

Talking to a group of medicinal chemists at a conference over lunch raised the following question: “It’s really interesting to see all the clever things that can be done with these software tools, but could we have a meeting where we actually get to try them out for ourselves?” With this in mind, a combined team from SCI and RSC decided to organize a computational chemistry workshop where people could access software and benefit from top quality training from the creators and developers of a range of these tools, each of which addresses a different aspect of pre-clinical drug discovery. All scientists working in this area need tools and techniques for handling chemical information, but it is difficult to get an opportunity to try out more than one package at a time and we would all relish a helping hand to get up and running as quickly as possible.

Cresset is always keen to introduce new people to the concept of fields and to demonstrate the ways in which they can be used to design biologically active molecules. We are very happy to welcome Giovanna Tedesco, Senior Product Manager at Cresset, who will present on:


Next generation structure-based design with Flare

Learn how simple structure-based design can be within small molecule discovery projects. The workshop will cover ligand design in the protein active site, Electrostatic Complementarity™ maps and scores, ensemble docking of ligands with Lead Finder, calculations of water stability and locations using 3D-RISM, energetics of ligand binding using WaterSwap and use of Python extensions. Applications you will use: Flare™ , Lead Finder™.

 


Participants will be able to pick 4 out of a possible 6 workshops over the day, choosing from sessions covering data processing and visualization, ligand and structure-based design, or ADMET prediction. These are all areas that chemists working in the pharmaceutical, biotech, life sciences and agrochemicals sectors engage with every day. Full details of all workshops are available from SCI and slots will be assigned on a first-come-first-served basis. Most importantly, all software and training materials required for the workshop will be provided for attendees to install and run on their own laptops and use for a limited period afterwards. This will give everyone the chance to take what they have learnt back to their own organisations and try out their newly acquired skills on their own data.

When: April 10, 2019

Where: The Studio, conveniently located next to Birmingham New Street Station, Birmingham, UK

Registration is open and the early bird price of £30 (£40 for non-SCI/RSC members) is available till 27th February. Financial support to cover travel and registration is available for students on application.

I hope you are able to join us for a unique opportunity to get to grips with a wide range of tools and concepts which you can use in your own research.

Find out more and register now.

Caroline Low, PhD, FRSC CChem

SCI Scientific Organizing Committee member and Cresset Discovery Services consultant

Comparing Forge’s command line utility to Blaze – which one should you use?

Here at Cresset we’re very interested in ligand-based virtual screening – it’s been a focus of the company ever since we started more than seventeen years ago. In that time there have been many advances and refinements of the techniques for both ligand-based virtual screening and structure-based methods. We have stuck by our fundamental principle that ligand similarity based on both electrostatics and shape is an excellent way to sort the wheat from the chaff. The results obtained by our services division, who have run more than 200 virtual screening campaigns with a better than 80% success rate, are testament to that.

Difference between falign and Blaze

One of the things our customers ask from time to time is which application should they be using to do virtual screening. The simple answer is that there are two, Forge (and its command-line utility ‘falign’) and Blaze, and the differences are readily apparent.

In falign, you can generate conformations for a large set of molecules, align them to one or more references, and rank them by the similarity score. You also have the option to bias the alignments and scores by adding field constraints, pharmacophore constraints, and protein excluded volumes.

By way of contrast, in Blaze, you can generate conformations for a large set of molecules, align them to one or more references, rank them by the similarity score, and… ok, point taken. So, given that falign and Blaze apparently do the same thing…

Why falign and Blaze?

The answer is scale. As anyone who’s ever played with large data sets knows, doing calculations on a few hundred compounds is fundamentally different to doing them on tens of millions of compounds. Once you are working at large scale, seemingly trivial operations such as filtering data sets become much more difficult if you want to be efficient. Blaze was designed from the ground up to work with large data sets of 10^7 molecules and more, with an emphasis on maximizing throughput on a computational cluster. Forge/falign on the other hand are much more aimed at small-scale work, enabling simple screening or analysis of relatively small sets of compounds where the big iron of Blaze is overkill.

Data preparation

As an example, let’s look at the preparation of the data set in the two software suites. In falign, this is relatively simple: you provide the compounds to falign in 2D or 3D form, it assigns protonation states as necessary, and computes conformations on-the-fly if required before aligning to the query:


Falign has a secondary mode for use when aligning structurally-related compounds, which ensures that the common substructure within the dataset is perfectly matched:

Blaze, on the other hand, is much more sophisticated in its conformer handling. The average user of Blaze has multiple data sets that they want to screen (in-house compounds, vendor screening compounds, virtual libraries, custom collections), and these often have significant overlap. In addition, these data sets are usually reused multiple times for multiple virtual screens. As a result, Blaze has a sophisticated deduplication and precomputation pipeline that maximizes computational efficiency. The Blaze workflow looks more like this:


Any given chemical structure is only present once within Blaze: it may have multiple different names, and be present in multiple collections, but we’ll only precompute its conformations once and we’ll only align it once in any given screen. The conformer computation pipeline is heavily optimized for performance: we’ve done extensive studies on our conformer generation algorithm XedeX to find the optimal trade-off between conformation space coverage, rejection of higher-energy conformations, calculation time and number of conformations required. In addition, we’ve developed a special-purpose file format that is highly compressed (less than 13 bytes per atom on average, including coordinates, atom types, element, charge and formal charge) while being unbelievably fast to parse.
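To make the "one structure, many names" idea concrete, here is a minimal Python sketch of deduplication by structure. It is an illustration only, not Blaze's implementation: `canonical_key` is a hypothetical placeholder (a real system would use something like a canonical SMILES or InChIKey), and the record layout is an assumption.

```python
def canonical_key(structure: str) -> str:
    # Hypothetical placeholder: a real pipeline would canonicalize the
    # structure (e.g. canonical SMILES or InChIKey) before comparing.
    return structure.strip()


def deduplicate(records):
    """Collapse (name, collection, structure) records to one entry per structure."""
    unique = {}
    for name, collection, structure in records:
        key = canonical_key(structure)
        entry = unique.setdefault(key, {"names": set(), "collections": set()})
        entry["names"].add(name)
        entry["collections"].add(collection)
    return unique


records = [
    ("CPD-001", "in-house", "c1ccccc1O"),
    ("VEND-9", "vendor", "c1ccccc1O"),  # same structure, different name and source
    ("CPD-002", "in-house", "CCO"),
]
unique = deduplicate(records)
print(len(unique))  # 2 unique structures from 3 records
```

Each unique key would then have its conformations computed once, however many names or collections point at it.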

Blaze has a multiple-step pipeline to filter the data set, so that the full 3D electrostatic shape and alignment algorithm is only applied to molecules that are likely to have a high score. For extremely large data sets there’s an initial filter by FieldPrint, an alignment-free fingerprint method that gives a crude measure of electrostatic similarity. The molecules that pass the filter then go into an ultrafast version of our 3D alignment and similarity algorithm, and the full similarity algorithm is applied only to the best 10% or so of these. As a result, Blaze can chew through millions of molecules very quickly on even a modest cluster. The processing capability of Blaze is further enhanced by the fact that there’s a GPU version which is even faster.
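The staged filtering described above can be sketched as a simple cascade. This is a toy illustration under stated assumptions: the three scoring functions, the fingerprint cutoff and the demo data are all hypothetical stand-ins, and only the "best 10% or so" fraction comes from the text.

```python
def cascade_screen(molecules, fingerprint_score, fast_score, full_score,
                   fingerprint_cutoff, keep_fraction=0.10):
    """Apply progressively more expensive scoring to progressively fewer molecules."""
    # Stage 1: cheap, alignment-free filter (stand-in for a fingerprint method).
    survivors = [m for m in molecules if fingerprint_score(m) >= fingerprint_cutoff]

    # Stage 2: fast approximate 3D score; keep only the best fraction.
    ranked = sorted(survivors, key=fast_score, reverse=True)
    n_keep = max(1, round(len(ranked) * keep_fraction)) if ranked else 0
    shortlist = ranked[:n_keep]

    # Stage 3: full, expensive score on the shortlist only.
    scored = [(m, full_score(m)) for m in shortlist]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy demo: "molecules" are integers and the scores are arbitrary functions.
results = cascade_screen(
    molecules=range(100),
    fingerprint_score=lambda m: m / 100,
    fast_score=lambda m: m,
    full_score=lambda m: -abs(m - 97),
    fingerprint_cutoff=0.5,
)
print(len(results), results[0][0])
```

The point of the design is that the expensive `full_score` call runs on only a small fraction of the input, which is where the throughput gain comes from.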

Small versus large data sets

So, falign is designed for the simple use case on small sets of molecules, while Blaze is aimed at maximum computational and I/O efficiency on very large data sets. There is another important difference between the two. As anyone who’s been in charge of maintaining a virtual screening system knows, keeping it up to date is often a painful and thankless task. It’s bad enough keeping up with the weekly additions to the internal compound collection, but keeping track of updates to external vendors’ collections is difficult: not only are new compounds being added but old ones are being retired. Blaze makes handling this situation easy. You simply provide Blaze with the new set of compounds that you want to be in the collection, and Blaze will automatically handle the update.


Any new compounds will be added, no-longer-available compounds will be marked and removed from the screening process, and any unchanged compounds will be left alone. This is far more computationally efficient than fully rebuilding the conformations for everything. Blaze can even be directly connected to your internal compound database, so that the Blaze collection holding your in-house compounds is always right up to date.
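The update logic amounts to a set comparison between the current collection and the new list. The sketch below shows that idea only; the function name and the use of plain string identifiers are assumptions for illustration, not Blaze's interface.

```python
def plan_update(current_ids, new_ids):
    """Work out which compounds to add, retire, or leave alone."""
    current, new = set(current_ids), set(new_ids)
    return {
        "add": sorted(new - current),        # need conformations computed
        "retire": sorted(current - new),     # excluded from future screens
        "unchanged": sorted(current & new),  # no recomputation required
    }


plan = plan_update(
    current_ids=["A", "B", "C", "D"],
    new_ids=["B", "C", "D", "E", "F"],
)
print(plan)
```

Only the compounds under `"add"` cost any conformer generation time, which is why an incremental update is so much cheaper than a full rebuild.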

Given how great Blaze is at handling virtual screening, why would you ever want to use falign?

Blaze is optimized for throughput and computational efficiency, but the downside of this is latency. If you have a set of compounds you want to align and score in Blaze, you have to upload them, wait for Blaze to process them and build the conformations, wait for Blaze to build its indices, initiate a search, and wait for it to be submitted to your cluster queueing system. There’s five or ten minutes’ latency in all of this, which is fine for a million molecules but is overkill if you have only one hundred. Falign, by contrast, will start work straight away on your local machine with no waiting at all.

The answer to the falign vs Blaze question, then, is largely a question of scale. Got a dataset of a million molecules that you want to run repeated virtual screens against? Blaze is just the ticket. Got a small set of compounds that you want to align and score as a one-off? Forge and falign are just what you need. For our in-house work we tend to find that the tipping point occurs around a few thousand molecules. Falign can easily chew through this many in an hour or so (especially if plugged into your computing cluster using the Cresset Engine Broker). However, if there are more compounds than this or we’re going to want to run multiple queries, then Blaze it is. Since Blaze is accessible through the Forge front end, and both are accessible through KNIME and Pipeline Pilot, it’s as easy as pie to pick the right tool for the job.

Try on your project

Request a free evaluation to try out Forge or Blaze for your small or large-scale virtual screening needs. Don’t have a cluster? Blaze Cloud and Blaze AWS provide simple ways to access cloud resources to do the number crunching for you.

 

Forge Design: New name, familiar environment

Forge Design is a new licensing level of Forge™ for medicinal and synthetic chemists. It replaces Torch™ and benefits from the familiar GUI, but with V10.6 enhancements.

What can Torch users expect from Forge Design?

The new graphics engine generates enhanced 3D objects, thus delivering strong performance, improved pictures and new smooth transitions between storyboard scenes (Figure 1).


Figure 1: The new graphics engine in Forge Design generates great pictures in an enhanced GUI providing strong performance, faster calculations, improved 2D display of molecules and smooth transitions between storyboard scenes.
 

For larger projects the GUI will be more responsive, with improved performance on common operations such as application of filters, calculation and interaction with custom plots, and exporting data. Activity Miner users will experience faster calculations which are less memory intensive.

The 2D display of molecules has been improved to make it clearer and more appealing.

New functionality

Forge Design has a significant number of new features and improvements compared to Torch V10.5. For example:

  • If you are working on a large project, there is a new function to automatically assign the selected molecules to roles, based on their Murcko scaffold
  • The Filters window includes a green/red toggle to control whether each filter is enabled or disabled, and pre-defined structural filters (for example, for groups like Ring, Aromatic Ring, H-bond donor and H-bond acceptor)
  • There is a new function to show the chosen field surfaces as a difference between the two molecules in the 3D display.


Figure 2: New functions in Forge Design include an option to automatically assign molecules to roles based on their Murcko scaffold, improved filters and the new ‘Field Difference’ button to show the chosen field surfaces as a difference between the two molecules in the 3D display. In this case the mono-fluoro derivative on the right is more positive (red) where the F is changed to H but also has a more negative aromatic ring.

  • Improved Molecule Editor, enabling you to conformation hunt and align all the molecules created during the same editing session as you exit the editor
  • Improved Blaze™ results window, which now shows an enrichment plot and statistics for each Blaze refinement level
  • The ideal radial plot profile for your project and the custom settings used for the conformation hunt and alignment can be shared with your team using the new import/export functions
  • Improved display of bonds in the 3D window, greater readability of constraints labels, more intuitive display of interactions between the reference molecules and the protein
  • New function to clip only the display of the protein leaving the ligands untouched
  • New functions to choose column content as the molecule title, or as a label in the 3D window
  • Improved plots now showing a regression line for selected molecules
  • For Activity Miner users, the Disparity matrix can now be filtered by Similarity, Disparity and Δ Activity; there is also a new ‘Find Molecule’ function. Also, you can now tag all the molecules visible in the Activity View in the Forge Molecules table.


Figure 3: In the Forge 10.6 Activity Miner window, the disparity matrix can be filtered by Similarity, Disparity and Δ Activity; molecules that do not pass the filter(s) are shown in gray.

How does Forge Design compare to Torch and Forge?

Figure 4 below shows the modules which are common between Forge and Forge Design (red), and the optional modules in Forge only (blue).


Figure 4: Modules available in Forge only (blue), and modules available in Forge and Forge Design (red).
 

Forge Design uses wizards for common operations just as with the full Forge package.  However, the wizards for building activity models and pharmacophores will be greyed out, as these are optional in Forge Design. The processing window will look slightly different (Figure 5 – right), with the optional Build Model section greyed out.


Figure 5: The wizard and functions available only in full Forge are greyed out in Forge Design.
 

You will find new, pre-defined roles in the Molecules table for training, test and prediction sets. In Forge Design, these roles have no special meaning and you can use them as any other user-created role or ignore them completely.

Simplicity and integration

Having a streamlined platform for ligand-based software makes it easier for existing Forge and Torch users to upgrade their installation to the newer release, with a single installer for both Forge and Forge Design.

This solution also makes it easier to upgrade Forge Design with additional functionality (for example, Activity Miner or the model building package), if desired.

This is a further step towards the integration of all Cresset ligand-based and structure-based functionality and simplifies the product installation and distribution for most customers.

Try Forge Design

If you are an existing Torch user, we will be in contact soon with more information on Forge Design. If you don’t have Forge or Forge Design, contact us to learn more.

Forge V10.6: Choose the molecules to make, and understand why you should make them

I am delighted to announce the availability of Forge™ V10.6, our powerful computational chemistry suite for understanding structure-activity relationship (SAR) and new molecule design. The focus of this release is on new and improved methods to generate robust Quantitative Structure-Activity Relationship (QSAR) models with strong predictive ability.

Choose the molecules to make next

Project chemists generally know which molecules they can make with a reasonably good chance of them being active. They often have too many clever ideas and are looking for ways of filtering and prioritizing lists of tangible compounds, arrays and small libraries.

Having a predictive QSAR model is a terrific way of doing this – you send your molecules into the model and get immediate feedback on whether making a compound is a good or bad idea.

However, getting a robust, predictive QSAR model is not always straightforward, and this is still a pain point for many of our users. You need a training data set of reasonable size, good activity data (e.g., pKi, pIC50) spanning a sufficiently large range, good descriptors and good modeling algorithms.

While we can’t help with the need of having a training data set of reasonable size and spread of activity, we can help with the rest.

The new Machine Learning (ML) methods in Forge, namely Support Vector Machines (SVM), Relevance Vector Machines (RVM) and Random Forests (RF) significantly expand the range of available QSAR model building options beyond the previous Field QSAR and k-Nearest Neighbors (kNN) regression options (Figure 1). Having access to a panel of well known, robust statistical tools gives you more opportunities to build a predictive model useful in project work.


Figure 1. The new Machine Learning methods significantly expand the range of QSAR model building options in Forge V10.6.

What about the descriptors?

Forge 3D electrostatic (based on Cresset’s XED force field) and volume descriptors are relevant for molecular recognition, and accordingly work very well for modeling activity and selectivity. These are used by Field QSAR and the new ML methods, while kNN can use either 3D electrostatic/shape or 2D fingerprint similarity.

New methods in action in a practical example

For this experiment, I have re-used an aligned data set of Orexin 2 Receptor ligands from the US patent literature,1 which I previously presented in a case study on Activity Atlas™, a method in Forge for qualitatively summarizing the SAR for a series into a visual 3D model.

I split the 377 Orexin 2 ligands into two subsets: a training set of 302 compounds which I used to build the QSAR models, and a test set of 75 molecules which were used solely to assess their predictive ability.
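For readers who want to reproduce this kind of split outside Forge, here is a minimal sketch. It assumes a simple random split, which the text does not confirm; the ligand names and the seed are placeholders.

```python
import random


def split_dataset(items, n_test, seed=0):
    """Shuffle a copy of the items and hold out n_test of them as a test set."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    return shuffled[n_test:], shuffled[:n_test]


ligands = [f"ligand_{i}" for i in range(377)]  # placeholder identifiers
train, test = split_dataset(ligands, n_test=75)
print(len(train), len(test))  # 302 75
```

The test set must play no part in model building, otherwise the test statistics will overstate how well the model predicts new compounds.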

Figure 2 shows the results obtained with Field QSAR and the ML methods in generating predictive models for OX2R pKi. Field QSAR, kNN and RF models were built using default conditions; for SVM and RVM, Forge suggested a fine tuning of the model building conditions as the training set is large.


Figure 2. Performance of Field QSAR and ML methods on the Orexin 2 data set. Training set = 302 molecules used to build the models. Test set = 75 additional molecules used solely to assess the predictive activity of the models.

‘r2 Training Set’ is used to check the ability of each model to fit the data in the training set. It ranges from 1 (perfect fit) to 0 (no fit). From Figure 2, I can see that all models (except kNN in this case) give excellent results in fitting. However, this is hardly surprising as ML methods are well known for their ability to fit data of any type.

A more realistic check of the quality of the model comes from ‘r2 Training Set CV’. In cross-validation (CV), a part of the compounds in the training set is temporarily excluded from the model and the remaining compounds are used to build a model which is then used to predict the activity of the excluded compounds. Not surprisingly, ‘r2 Training Set CV’ is always lower than ‘r2 Training Set’, but the results for Field QSAR, RF, RVM and especially SVM are still good (kNN does not calculate this statistic).
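The difference between a fitted r2 and a cross-validated r2 can be illustrated with a small, self-contained sketch. This is not how Forge computes its statistics: the model here is a one-descriptor least-squares line, the fold assignment is a simple stride, and the data are synthetic, all chosen purely for illustration.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - ss_res / ss_tot


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    intercept = my - slope * mx
    return lambda x: slope * x + intercept


def cross_validated_r2(xs, ys, n_folds=5):
    """Predict each point from a model built without that point's fold."""
    predictions = [None] * len(xs)
    for fold in range(n_folds):
        held_out = set(range(fold, len(xs), n_folds))
        train_x = [x for i, x in enumerate(xs) if i not in held_out]
        train_y = [y for i, y in enumerate(ys) if i not in held_out]
        model = fit_line(train_x, train_y)
        for i in held_out:
            predictions[i] = model(xs[i])
    return r_squared(ys, predictions)


# Synthetic data: a linear trend plus a deterministic +/-1 perturbation.
xs = list(range(20))
ys = [0.5 * x + (1 if x % 2 else -1) for x in xs]

full_model = fit_line(xs, ys)
fit_r2 = r_squared(ys, [full_model(x) for x in xs])
cv_r2 = cross_validated_r2(xs, ys)
print(f"fit r2 = {fit_r2:.3f}, CV r2 = {cv_r2:.3f}")
```

Because every prediction in the CV loop comes from a model that never saw that compound, the CV value is the more honest of the two.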

Finally, ‘r2 Test Set’ gives an idea as realistic as possible of the performance of the model in real project work, as the model is asked to predict the activity of compounds it has never seen before. Most methods give reasonably good results, with SVM clearly outperforming the other methods with a more than respectable r2 test set = 0.59.

In a real project, I would not hesitate to choose SVM for filtering and prioritizing my list of ‘to-make’ compounds, with confidence that this is the best predictive power I can get for this specific data set.

What about kNN? It didn’t work very well on this data set; does it mean that it is not a good method? Not really. kNN is a robust, well known method particularly useful when working with multiple compound series, or with biological data which are derived from different sources. The fact that it didn’t work particularly well in this case does not exclude good performance in other projects.

This is the whole point of having several model building methods available: you can choose the one which gives best performance in your specific project.

If you think it must have been boring to calculate all these models separately, then I have good news: you don’t really have to. The default option in Forge is to automatically run all the ML models and pick the best one for you (Figure 3).


Figure 3. The Automatic model building option in Forge runs all the available ML methods and picks the best model for the output.

Understand why you should make the molecules you have chosen

A significant part of a project chemist’s work is to design the next generation of active molecules. To achieve this, you need to understand which features make some compounds active, and which undermine the activity in others. In other words, you need to interpret the model.

Unfortunately, ML algorithms won’t help you here: they are complicated equations which cannot be easily translated back to 3D in terms of ligand-protein interactions.

Luckily, Forge provides you with two additional tools: Field QSAR 3D views and the Activity Cliffs Summary in Activity Atlas.

Field QSAR, when successful, gives you the best of both worlds, i.e., predictions and interpretation.

Activity Atlas is qualitative only (no predictions) and is great for understanding the SAR for your data using activity cliffs analysis, especially when the SAR landscape is jagged.

Activity Atlas in V10.6 includes a new Activity Cliffs Summary algorithm which generates more detailed SAR maps reducing the reliance on individual compounds, especially useful for small and medium sized data sets.

In Figure 4, you can see the Field QSAR maps compared to the new Activity Cliffs Summary maps for the Orexin 2 data set.


Figure 4. Top: Field QSAR electrostatic (left) and steric (right) coefficients.  Bottom: Activity Cliffs Summary of Electrostatics (left) and Activity Cliffs Summary of Shape (right). Color coding: red = more positive electrostatic favors activity; blue = more negative electrostatic favors activity; green = favorable steric bulk; magenta = unfavorable steric clash.

Both types of maps clearly and consistently indicate where more positive (red) or negative (blue) electrostatics favors activity, and where steric bulk is favorable (green) or forbidden (magenta), providing invaluable indications for ligand design.

I don’t have ‘top quality’ data, but I still need a model

Sometimes the data you have are not as clean as you would like for the purposes of QSAR modeling. You may have % of inhibition data rather than pIC50s or pKis; data generated with different assays; or simply data which are qualitative in nature.

The new ML methods in Forge will work just as well to build classification models for sorting new molecules into existing categories (e.g., active/inactive). Forge will also provide appropriate visual tools (such as the confusion matrix, Figure 5) and classification performance metrics (Precision, Recall, Informedness) to assess the performance of the model and decide if it is good enough to be used in project work.


Figure 5. Confusion matrix and useful statistics for an Orexin 2 classification model.
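As a reminder of how these metrics relate to the confusion matrix, here is a short sketch using the standard two-class definitions. The counts are invented for illustration and are not taken from the Orexin 2 model; Forge's exact definitions, for example for multi-class models, are not given here and may differ.

```python
def classification_metrics(tp, fp, fn, tn):
    """Standard binary metrics from confusion-matrix counts."""
    precision = tp / (tp + fp)    # predicted actives that really are active
    recall = tp / (tp + fn)       # true actives that were found
    specificity = tn / (tn + fp)  # true inactives correctly rejected
    informedness = recall + specificity - 1  # 0 = chance, 1 = perfect
    return {
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "informedness": informedness,
    }


# Invented counts for an active/inactive model.
metrics = classification_metrics(tp=40, fp=10, fn=5, tn=45)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Informedness is useful when the classes are unbalanced, because a model that simply predicts the majority class scores close to zero.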

Improved graphics and GUI

In Forge V10.6 you will experience strong performance, great pictures and new smooth transitions between storyboard scenes thanks to the new graphics engine, which generates enhanced 3D objects (Figure 6).


Figure 6. The new graphics engine in Forge V10.6 generates great pictures.

This release also includes many other GUI and usability improvements, including:

  • An improved QSAR Model widget including relevant information and plots for the regression and classification models, PCA component plots, notes, and a ‘pop-up’ button to visually compare different models (Figure 7)
  • An improved interface for handling categorical data in support of classification models
  • Improved Blaze™ results window, showing an enrichment plot and statistics for each Blaze refinement level
  • New function to automatically assign selected molecules to roles, based on their Murcko scaffold
  • New function to run clustering from the main Forge GUI, specifying the desired similarity metric and threshold
  • New option to use all the available local CPUs, relaxing the 16-CPUs limitation of previous Forge releases
  • More responsive GUI for large projects, with improved performance on common operations such as application of filters, calculation and interaction with custom plots, and exporting data
  • Faster, more robust and less memory-consuming calculation of Activity Miner™ and Activity Atlas large similarity matrices
  • Improved 2D display of molecules
  • Improved Activity Miner GUI
  • Improved plots now showing a regression line for selected molecules
  • Improved structural filters now including pre-defined filters for Ring, Aromatic Ring, Non-ring atom, Chiral atom, H-bond donor and H-bond acceptor
  • Improved Filters window, now including a green/red toggle to control whether each filter is enabled or disabled.

Figure 7. Compare different QSAR models with the new ‘pop-up’ button in the QSAR Model widget.

Stay tuned for more

Sign up for our newsletter to receive product release announcements, request your free evaluation or contact us to learn more about how Forge can help advance your project.

  1. US patent number 8,653,263B2

Sneak peek at Forge V10.6: Model building focus and much more

While the development team is busy giving the finishing touches to Forge V10.6, let’s have a quick look at what is new in this release.

Improved predictions through new models

Forge users told us that the development of QSAR models with strong predictive ability was still a pain point for their projects. Not surprisingly, this is what made us focus on model building in this release.

Forge V10.6 comes with a full panel of well-known and robust Machine Learning (ML) methods (Support Vector Machines, Relevance Vector Machines, Random Forests, kNN classification) which complement those available in previous versions (Field QSAR and kNN regression).

These ML methods can be used to build both regression and classification models, and this is reflected in a QSAR Model widget completely re-designed to provide relevant visualizations and statistics for both model types (Figure 1). While each regression and classification model can be built individually, there is an option in Forge to automatically run all the ML models and pick the best one for you.


Figure 1. Left: Observed vs. Predicted Activity graph for a SVM regression model. Right: Confusion matrix and statistics for a SVM classification model.

Generating qualitative models on small datasets

Activity Atlas is a qualitative method for summarizing the SAR for a series into visual 3D maps that can be used to inform new molecule design. Forge V10.6 includes a new Activity Cliff Summary method which generates more detailed SAR maps by slightly downsizing the importance of the strongest activity cliffs.

You may want to use the new flavor of the method for understanding the SAR of small to medium size data sets, as this will provide a finer level of detail. For larger data sets (e.g., for quickly understanding patent SAR information), the original algorithm will help you focus on the prevalent SAR signals.

More responsive GUI for larger projects

Working with large projects (more than 1,000 molecules with multiple alignments and QSAR models) will be much more efficient in Forge V10.6. You will see improvements in the performance of common operations such as application of filters, calculation and interaction with custom plots, and exporting data. The calculation of the large similarity matrices in Activity Miner and Activity Atlas will also be faster, more robust and use less memory.

Furthermore, there is now an option to set up Forge to use all the available local CPUs, if appropriate, as we have relaxed the 16-CPU limit of the previous release of the software.


Figure 2. Forge running on multiple local CPUs.

Improved interface to Blaze for virtual screening

The improved Blaze results window now shows an enrichment plot and statistics for each Blaze refinement level.


Figure 3. Improved interface to Blaze in Forge.
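As a reminder of what an enrichment plot summarizes, here is a small sketch that computes an enrichment curve and an enrichment factor from a ranked list of screening results. The ranked list is hypothetical and the functions are generic textbook definitions, not the statistics code used by Blaze or Forge.

```python
def enrichment_curve(ranked_is_active):
    """Points (fraction screened, fraction of actives found) down the ranking.

    Assumes the list contains at least one active.
    """
    total_actives = sum(ranked_is_active)
    n = len(ranked_is_active)
    curve = []
    found = 0
    for i, is_active in enumerate(ranked_is_active, start=1):
        found += is_active
        curve.append((i / n, found / total_actives))
    return curve

def enrichment_factor(ranked_is_active, fraction):
    """Hit rate in the top fraction divided by the hit rate in the whole set."""
    n = len(ranked_is_active)
    top_n = max(1, round(n * fraction))
    hit_rate_top = sum(ranked_is_active[:top_n]) / top_n
    hit_rate_all = sum(ranked_is_active) / n
    return hit_rate_top / hit_rate_all

# Hypothetical ranking, best score first: 1 = known active, 0 = decoy.
ranked = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
curve = enrichment_curve(ranked)
ef_20 = enrichment_factor(ranked, 0.2)
print(curve[:3], ef_20)
```

Computing the same numbers for each refinement level makes it easy to see whether a more expensive refinement actually moves actives towards the top of the list.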

Stay tuned for more

Subscribe to our newsletter to receive the product release announcement, or contact us to learn more about Forge.

Flare Viewer: Free access to Flare for structure-based design

We are pleased to announce the introduction of Flare Viewer, a free licensing option of Flare, our structure-based design application. With Flare Viewer you can easily visualize and analyze your protein-ligand complexes, use our proprietary electrostatics to design new ligands, and communicate your ideas with high quality graphics and pictures.

Focus on ligands

Read in protein-ligand complexes by opening a file in a local or remote disk location, downloading multiple entries from the Protein Data Bank, or by drag-and-drop from your desktop if you are a Windows user. Ligands can be moved into the dedicated ligands table by drag-and-drop, with each ligand keeping the association with the protein it belongs to. Here they can be easily organized into custom groups, to keep your project tidy.

The dedicated ligand table and interactive menu give easy access to all ligand actions: for example, sorting on any column, controlling visibility, tagging, and filtering on structure, tags, and numerical and text columns. A physico-chemical profile is calculated for every ligand and summarized in a fully customizable radial plot and multi-parametric score to help you design and select the ligands with the best fit to your ideal project profile.


Figure 1: The ligand-centric organization of Flare gives easy access to all ligand actions.
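One simple way to think about a multi-parametric score is as an average of per-property desirabilities. The sketch below is only an illustration of that idea: the property names, ideal ranges, linear fall-off and arithmetic mean are all assumptions, and Flare's actual scoring function is not described in this post.

```python
def desirability(value, low, high):
    """1.0 inside the ideal range, falling linearly to 0.0 one range-width outside."""
    if low <= value <= high:
        return 1.0
    width = high - low
    gap = (low - value) if value < low else (value - high)
    return max(0.0, 1.0 - gap / width)

def profile_score(properties, ideal_ranges):
    """Arithmetic mean of the per-property desirabilities."""
    scores = [
        desirability(properties[name], low, high)
        for name, (low, high) in ideal_ranges.items()
    ]
    return sum(scores) / len(scores)

# Hypothetical project profile and ligand properties.
ideal = {"MW": (300.0, 450.0), "logP": (1.0, 3.0), "TPSA": (40.0, 90.0)}
ligand = {"MW": 420.0, "logP": 4.0, "TPSA": 75.0}

score = profile_score(ligand, ideal)
print(round(score, 3))
```

The per-property desirabilities are also the natural values to plot on the spokes of a radial plot, so the plot and the score tell a consistent story.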

Explore ligand-protein interactions

Flare calculates and displays a variety of ligand-protein interactions. These include H-bonds, steric clashes, aromatic-aromatic and cation-pi interactions, and more. Water-mediated and intra-molecular interactions can optionally be included.

Each ligand can be displayed with its associated protein in grid mode making comparisons between ligands or proteins straightforward.


Figure 2: Each ligand can be displayed with its associated protein, making it easy to compare the interactions of different ligands.

Iterative molecular design meets ligand electrostatics

Understanding ligand electrostatics is key in the design of improved ligands. In Flare, electrostatic interaction potentials calculated with the Cresset XED force field can be visualized as ligand fields or by mapping the electrostatic potential onto the ligand’s molecular surface.


Figure 3: Ligand electrostatics can be shown as ligand fields (left) and by mapping the electrostatic potential of the ligand on its surface (right). Color coding: cyan = negative electrostatics; red = positive electrostatics.

Designing new ligands in Flare gives you immediate feedback on electrostatic changes in the context of the protein active site. In the molecule editor, the ligand or a selected part of the ligand can be minimized ensuring bonds, angles and torsions have low energy values.


Figure 4: The Molecule Editor.

Compare multiple proteins

Multiple protein structures can be imported in the same project and displayed in the same frame of reference using the sequence alignment and superimposition functions in Flare. You can choose the protein to superimpose to, whether all proteins are to move and if all residues or selected residues are superimposed. The protein structure can be optimized by flipping flexible residues or changing tautomeric and charge states for relevant residues.

Once opened, the proteins will sit in a dedicated table where all their components (chains, ligands, crystallographic waters and cofactors) are clearly visible, enabling a rapid inspection of specific chains or residues.

Protein surfaces can be displayed and colored by solid color, atom, secondary structure and hydrophobicity, and saved in a dedicated protein surface window.


Figure 5: Comparing multiple protein-ligand complexes is made easy by working in grid mode, showing ribbons and applying different protein surface styles.

Important scenes can be captured and annotated in the Storyboard to be recalled when needed. Images can be easily copied and exported, with many options to configure the image or file size.

A dedicated extended atom picking widget enables complex queries and gives you full control on what is selected and displayed in the 3D window.

Protein viewer with an intuitive GUI

The ribbon menu structure of Flare makes it easy to identify the commands and controls you are looking for, as all actions are always visible and organized in a logical structure.


Figure 6: All actions are always visible in the Flare ribbon menu.

Upgrade to the Flare Python API

Upgrading Flare Viewer to include the Flare Python API will enable you to create your own workflows, automate common tasks, add custom controls and context menus, and access Python modules such as the RDKit cheminformatics toolkit, NumPy, SciPy, and Matplotlib. We also provide a collection of featured Python extensions that extend the existing Flare functionality.

Discover Flare Viewer

See the features of Flare Viewer, and download your free 1-year license.

Bespoke free licensing options for academic users are also available; see the announcement.

Flare for Academics

We believe that the lively academic environment is an amazing source of new scientific ideas, algorithms and computational methods. Flare for Academics is a free* licensing option of Flare, our structure-based design software, which has specifically been designed for academic users.

Flare for Academics is a user-friendly environment where academic users can easily develop and test their ideas and methods, or plug in the most interesting open-source algorithms. It extends the functionality of Flare Viewer to provide an excellent platform for drug discovery, with a focus on ligand design and electrostatics.

Discover the power of the Python API

The Flare Python API gives academic researchers the opportunity to make their science more accessible through integration into a user-friendly environment.

An environment to build upon and create great science

You will benefit from a robust, commercial-standard SBDD environment that lets you focus on science by using methods such as protein preparation, protein minimization and multi-core docking. Access is also given to the RDKit cheminformatics toolkit, NumPy, SciPy, and Matplotlib, which are all integral to Flare. Beyond these, virtually any other Python module can be pip-installed, making Flare infinitely extendable. An ever-growing collection of featured Python extensions that enhance the existing Flare functionality is also provided. These include plotting, protein mutation, and custom workflows (see also the new Jupyter Notebook integration).


Figure 1. The ‘Extensions’ tab in Flare 2.0.

Low-level access to the graphical user interface and internal processes

The Flare Python API not only provides an environment to develop your own algorithms but also a way to deploy them across a wider user base. The API provides access to all elements of the Flare interface through addition of user-defined controls and context menus.

For example, you may add custom controls to an existing Flare ribbon, or create a new Flare ribbon for Python scripts you frequently use. Custom controls in Flare can be small or large buttons, spin boxes, custom sliders, or complex dialogues with signals and callback functions (Figure 2).


Figure 2. Some types of custom controls which can be added to a Flare ribbon.

Automate and distribute Flare calculations

Whenever you need to carry out a completely automated task, for example the overnight preparation of a panel of proteins followed by docking of several ligand series, the most convenient option is to write a Python script that runs outside the Flare GUI. It can then be distributed on a cluster via a queueing system for maximum performance. The pyflare binary is a Python interpreter giving you access to Flare functions using either custom developed or Cresset released scripts.
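As a rough sketch of how such a batch could be organized, the snippet below builds one command line per protein, ready to be handed to a queueing system. Only the pyflare binary name comes from the text above. The script name prepare_and_dock.py, its command-line flags and the file names are hypothetical, and the sketch assumes that pyflare, like a standard Python interpreter, accepts a script path followed by script arguments. Nothing is executed here; the commands are only printed.

```python
import shlex
from pathlib import Path

def build_jobs(interpreter, script, protein_files, ligand_file, out_dir):
    """Return one command (as a list of arguments) per protein file."""
    jobs = []
    for protein in protein_files:
        # Hypothetical output naming scheme: <protein name>_docked.sdf
        output = Path(out_dir) / (Path(protein).stem + "_docked.sdf")
        jobs.append([
            interpreter, script,
            "--protein", str(protein),     # hypothetical flag
            "--ligands", str(ligand_file), # hypothetical flag
            "--output", str(output),       # hypothetical flag
        ])
    return jobs

# Hypothetical input files.
proteins = ["kinase_a.pdb", "kinase_b.pdb"]
jobs = build_jobs("pyflare", "prepare_and_dock.py", proteins, "series1.sdf", "results")

for job in jobs:
    # Each line could be written into a job script for the queueing system.
    print(shlex.join(job))
```

Keeping one job per protein means that a failure on one structure does not hold up the rest of the panel.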

Upgrade Flare with the Jupyter QtConsole

The native GUI of Flare embeds the Python Console and Python Interpreter widgets. The Python Console is the simplest option to run one-line commands. With the Python Interpreter you can handle slightly more complex scripts: for example, you can load a script, interactively edit it inside Flare and then save your modifications. Both the Python Console and the Python Interpreter have a multi-tab interface that makes it possible to work on multiple Python snippets at the same time.

Python enthusiasts can easily upgrade Flare with the Jupyter QtConsole for access to all the Jupyter features, e.g.: TAB completion, auto-indentation, syntax highlighting, context help, inline graphics, and more. Using this widget, you can type Python commands, examine molecules and draw plots, all in the same window.

Upgrade Flare with the Jupyter Notebook

The Flare Python Notebook is an instance of the Jupyter Notebook embedded into the Flare GUI. It has direct access to the Flare GUI objects and methods, offers an even richer interface and enables editing and running individual code cells.


Figure 3. The Python Qt-Console (left) and Python Notebook (right) in Flare.

Not just a viewer

Flare for Academics is not just a viewer, but a complete, user-friendly platform for iterative molecule design in drug discovery.

Multiple protein structures can be easily imported in the Flare project and displayed in the same frame of reference using the sequence alignment and superimposition functions.

Flare’s protein preparation will enable you to optimize your protein-ligand structures by adding hydrogen atoms, optimizing hydrogen bonds, removing atomic clashes and assigning optimal protonation states. Further optimization of the protein active site can be achieved by protein minimization based on the XED force field, and by manually flipping flexible residues or changing tautomeric and charge states for relevant residues.


Figure 4. Flare for Academics is a user-friendly platform for iterative molecule design in drug discovery.

Smart visualization of protein-ligand complexes in grid mode facilitates the comparison between ligands or proteins. The display of a variety of non-bonded ligand-protein interactions makes it easy to understand the different binding modes for your ligands.

The ligand-centric structure of Flare includes a dedicated ligand table and interactive menu giving easy access to all ligand actions, such as sorting on any column, controlling visibility, tagging, filtering on structure, tags, and numerical and text columns, and grouping ligands in custom-created roles. In the ligand table, each molecule is associated with calculated physico-chemical properties, a radial plot and a multi-parametric score to help you design and select the ligands that best match the ideal project profile. Ligand electrostatic interaction potentials calculated with the XED force field can be visualized in the 3D window and in the molecule editor, and used to inform ligand design.

Multi-core docking experiments can be run to predict the 3D structure of flexible ligands in the active site of your protein. Docking in Flare uses Lead Finder™ to provide excellent pose prediction and detailed feedback on new molecule designs.

Discover Flare for Academics

See the features of Flare for Academics, and apply for your 1-year license.

* In most countries; contact us to see if you are eligible for a free license.

Python extension enabling Jupyter Notebook integration in Flare released

In a recent post I wrote about Integrating Jupyter Notebook into Flare as a new Python extension dedicated to Python developers and enthusiasts. The Python extension that makes this possible is now released (Figure 1).


Figure 1. The button which enables the Python Notebook extension.

While using it to carry out my daily Python coding tasks, I identified a number of features that the prototype extension was missing and that were worth implementing. So, there are a few more highlights that I’d like to share with you.

As discussed in my previous post, the feature I personally enjoy most is that the Flare Python Notebook has direct access to the Flare main_window() object, and hence allows you to work on the project currently loaded in the main viewport, i.e., interact with ligands and proteins, visualize molecular and field surfaces, etc. As this involves running the Python code in the main GUI thread, only a single Python Notebook may have access to the GUI at any given time.

However, I thought it would be useful to be able to run other concurrent, separate pyflare processes within the same Python Notebook while the main GUI process is busy doing a computation, e.g., preparing a protein (Figure 2):


Figure 2. Download a PDB complex in the GUI, then run Protein Preparation.

The Python Notebook remains responsive while the Protein Preparation task is run by a FieldEngine process in the background. This means I can open a second Python Notebook tab and, for example, visualize the 2D ligand structure. Since the new notebook tab runs as a separate pyflare process, it does not have access to the Flare main_window() object, as shown by the absence of the Flare icon and by the tooltip (Figure 3):


Figure 3. Open another tab and carry out some other task in a separate process.

Once the calculation has finished, you can switch back to the main tab and keep on working there.
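The general pattern, a long calculation running in a separate process while the calling code stays free to do other work, can be illustrated with plain Python. This sketch uses only the standard library and a sleeping stand-in for the calculation; it does not call the Flare API, and in Flare the separate process would be a pyflare or FieldEngine process rather than the generic interpreter used here.

```python
import subprocess
import sys

# Stand-in for a long calculation such as a protein preparation.
slow_task = "import time; time.sleep(0.5); print('preparation finished')"

# Start the calculation in a separate interpreter process without blocking.
worker = subprocess.Popen(
    [sys.executable, "-c", slow_task],
    stdout=subprocess.PIPE,
    text=True,
)

# The calling process stays responsive and can do other work in the meantime.
side_result = sum(i * i for i in range(1000))

# Collect the background result once it is ready.
output, _ = worker.communicate()
result = output.strip()
print(side_result, result)
```

Because the two processes do not share memory, the background one cannot touch objects owned by the foreground one, which mirrors why a notebook tab running as a separate process has no access to main_window().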

To provide better integration with the Flare GUI, I have moved the familiar ‘Kernel’ notebook menu controls to the bottom of the window (Figure 4):


Figure 4. Restart/Stop commands can be accessed from bottom left buttons.

Also, the Load/Save commands were moved from the File menu to buttons, in order to provide more control on the location the notebooks can be saved to or retrieved from (Figure 5):


Figure 5. Load/Save notebooks through a standard file dialog.

The Python Notebook extension is now ready for download from the Developers extension on our GitLab page. I’d be really keen to hear thoughts and ideas from other Python enthusiasts out there, so please do not hesitate to get in touch if you would like more information, or have feedback or suggestions for new features in the next version of the Python Notebook.

Which macrocycle should I try first? Picking the best linkers with Flare™ and Spark™

At Cresset, we enjoy seeing our products work in synergy. By combining the most recent scientific methods and workflows we deliver solutions to address molecule design challenges. In this post, we use the new Electrostatic Complementarity™ (EC) maps and scores in Flare to help the post-processing of a Spark macrocyclization experiment.

Using Electrostatic Complementarity in Flare to post-process the Spark results

In the case study Using Spark to design macrocycle BRD4 inhibitors, we used Spark, our bioisostere replacement and scaffold hopping tool, to design macrocyclization strategies for non-macrocyclic pyridone BRD4 inhibitors and evaluated the results against experimental data reported by Wang et al. [1]. The results showed that Spark successfully reproduced the experimental data.

In a real drug discovery project where no retrospective data is available, it would be useful to have criteria, based on existing knowledge of the system under study, to guide further post-processing of the Spark results. Here we show how to use Spark in synergy with the EC maps and scores in Flare, our structure-based design tool, to pick the most promising candidates for synthesis.

Electrostatic interactions are essential for molecular recognition and are also key contributors to the binding free energy ΔG of protein-ligand complexes. Assessing the electrostatic match between ligands and binding pockets provides important insights into why ligands bind and what can be changed to improve binding.

The 100 top scoring results from the BRD4 Spark experiment were opened in Flare using the ‘Send to Flare’ functionality in Spark, which also transfers the related starter molecule (compound 1 in Figure 1) and excluded volume protein (5UEY). The protein was prepared in Flare, removing the water molecules that do not make clear interactions with both the ligand and protein. EC scores and maps were then calculated for compound 1 and the experimentally validated macrocycle 2 reported by Wang et al. towards the same 5UEY protein, as shown in Figure 1. As expected, the EC maps for both compounds show good complementarity to the protein and a very similar EC R score of 0.52/0.53 (Pearson’s r correlation coefficient). Spark linkers showing similar (or better) maps/score should provide interesting ideas for synthesis.


Figure 1: EC maps and scores for compound 1 and macrocycle 2, calculated towards protein 5UEY. Color coding: green = good complementarity; red = electrostatic clash.
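Since the EC R score is described above as a Pearson correlation coefficient, a toy calculation may help to build intuition. The sketch below assumes the score is the negated Pearson r between ligand and protein electrostatic potentials sampled at the same surface points, so that opposite-signed potentials give a positive score. The sampling scheme, the sign convention and the potential values are assumptions for illustration only, not Flare's implementation.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)

def complementarity_score(ligand_esp, protein_esp):
    """Assumed convention: +1 = perfectly complementary, -1 = perfect clash."""
    return -pearson_r(ligand_esp, protein_esp)

# Hypothetical potentials sampled at four points on the ligand surface.
ligand = [1.0, -1.0, 2.0, -2.0]
matching_protein = [-1.0, 1.0, -2.0, 2.0]  # opposite sign everywhere
clashing_protein = [1.0, -1.0, 2.0, -2.0]  # same sign everywhere

good = complementarity_score(ligand, matching_protein)
bad = complementarity_score(ligand, clashing_protein)
print(round(good, 3), round(bad, 3))
```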

Picking the winners

Figure 2 shows a couple of the most interesting linkers in terms of EC score.


Figure 2: EC maps and scores (top panel) for two ‘matching’ Spark linkers, calculated towards protein 5UEY. Color coding: green = good complementarity; red = electrostatic clash. The bottom panel shows electrostatic potential maps for the same Spark results. Color coding: cyan = negative electrostatic; red = positive electrostatic.

In the first example (Figure 2 – left), the π-system in the double bond linker complements the positive electrostatic field at the NH proton of His437 better than compound 1 or a fully saturated linker of similar length, as in macrocycle 2.

Another interesting example of good electrostatic match is the mercaptoethanol linker (Figure 2 – right). The negative electrostatic field of the thioether group is also in close proximity to the polarized NH of His437.

For both compounds, the increase in EC towards the protein is due to the introduction of a more negative ligand electrostatic potential in the region near His437, as shown by the electrostatic potential maps for both linkers (Figure 2 – bottom).

Discarding the losers

In contrast, an analysis of the EC maps for two of the linkers with the lowest EC scores (Figure 3) immediately highlights the reasons why these should be down-prioritized.


Figure 3. EC maps and scores (top panel) for two ‘clashing’ Spark linkers, calculated towards protein 5UEY. Color coding: green = good complementarity; red = electrostatic clash. The bottom panel shows electrostatic potential maps for the same Spark results. Color coding: cyan = negative electrostatic; red = positive electrostatic.

These linkers expose an area of negative interaction potential towards the carbonyl of Asn443, resulting in a strong electrostatic clash.

Conclusion

Are you surprised that a few linkers with low EC ended up among the top 100 scoring Spark results? Don’t forget that Spark works on ligand similarity. In macrocyclization (and fragment linking) experiments we are stretching the method to explore regions in space where ‘no ligand has gone before’.

In such cases, adding protein information is clearly highly beneficial to help post-processing. EC maps in Flare are an intuitive visual method for rationalizing the choice of the best ideas to progress, while EC scores provide a rapid way of scoring and filtering the top 100 Spark results in just a few minutes.
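Once the EC scores are available, the filtering step itself is straightforward. The sketch below keeps the results whose score is similar to or better than that of the reference compound (0.52 for compound 1, as reported above) and ranks them. The result names, their scores and the tolerance are hypothetical, and the data structure is only an assumption about how exported scores might be held in Python.

```python
def shortlist(results, reference_score, tolerance=0.05):
    """Keep results scoring no worse than (reference - tolerance), best first."""
    threshold = reference_score - tolerance
    kept = [r for r in results if r["ec_score"] >= threshold]
    return sorted(kept, key=lambda r: r["ec_score"], reverse=True)

# Hypothetical EC scores for a handful of Spark results.
results = [
    {"name": "linker_01", "ec_score": 0.58},
    {"name": "linker_02", "ec_score": 0.31},
    {"name": "linker_03", "ec_score": 0.50},
    {"name": "linker_04", "ec_score": 0.61},
]

picked = shortlist(results, reference_score=0.52)
for r in picked:
    print(r["name"], r["ec_score"])
```

The shortlisted results are then the ones worth inspecting visually with the EC maps before deciding what to make.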

To try Spark or Flare on your projects, request your free evaluation.

  1. Wang, L.; McDaniel, K. F.; Kati, W. M. Fragment-Based, Structure-Enabled Discovery of Novel Pyridones and Pyridone Macrocycles as Potent Bromodomain and Extra-Terminal Domain (BET) Family Bromodomain Inhibitors. J. Med. Chem. 2017, 60 (9), 3828–3850.