KNIME nodes V2.5.0 released

A new release of workflow components for the KNIME™ environment is now available. This includes nodes for the Machine Learning methods in Forge™, nodes for accessing Flare™ functionality through the Flare Python API, and a number of enhancements to existing components.

To illustrate these enhancements, I created an integrated workflow that automatically performs qualitative and quantitative Structure-Activity Relationship (SAR) analysis on patent data from BindingDB.

An integrated workflow for SAR analysis

Reading the Rapid interpretation of patent SAR using Forge blog post, I thought it would be very nice to have a workflow to analyze BindingDB data in an automated manner requiring minimal human intervention. Cresset workflow solutions are ideal for this, and to test the feasibility of this idea, I put together the KNIME workflow shown in Figure 1.


Figure 1. An integrated KNIME workflow for SAR analysis of BindingDB patent data.
The workflow is divided into five blocks, which are briefly described below.

Data preparation

This block of nodes prepares the raw data downloaded from BindingDB (in this case, from the US9321756 patent: ‘Azole compounds as PIM inhibitors’) for the SAR analysis.

Nodes which require manual input are labelled in Figure 1. I need to specify the name and location of the CSV file I want to use; choose the biological target I am interested in (US9321756 reports activity for two biological targets – ‘PIM’ and ‘PIM-1’: I used PIM-1); and make sure I am working with the correct activity column (IC50, for PIM-1).

There are also nodes to filter out missing IC50 values and those with ‘higher than’ (>) or ‘lower than’ (<) modifiers, transform the activity values into pIC50, calculate mean pIC50 values for compounds which were tested multiple times on PIM-1, and remove those compounds where the mean pIC50 is associated with a high standard deviation (>0.7). Finally, the compounds are sorted in order of descending activity to enable activity-stratified partitioning into training and test sets.
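The filtering and aggregation steps above can be sketched in plain Python (a simplified stand-in for the KNIME nodes, assuming IC50 values in nM and the hypothetical record layout below; the 0.7 standard deviation cut-off follows the text):

```python
import math
from statistics import mean, stdev

def prepare_activities(records, sd_cutoff=0.7):
    """Mimic the data-preparation block: drop missing or '>'/'<' qualified
    IC50 values, convert nM IC50 to pIC50, average replicate measurements,
    discard compounds with noisy replicates, and sort by descending
    activity. `records` is a list of (compound_id, ic50_string) pairs."""
    by_compound = {}
    for cid, ic50 in records:
        ic50 = (ic50 or "").strip()
        # Skip missing values and measurements with modifiers
        if not ic50 or ic50[0] in "<>":
            continue
        pic50 = 9.0 - math.log10(float(ic50))  # IC50 in nM -> pIC50
        by_compound.setdefault(cid, []).append(pic50)

    result = []
    for cid, values in by_compound.items():
        # Remove compounds whose replicate measurements disagree too much
        if len(values) > 1 and stdev(values) > sd_cutoff:
            continue
        result.append((cid, mean(values)))
    # Descending activity enables activity-stratified partitioning
    return sorted(result, key=lambda item: -item[1])

rows = [("cpd1", "10"), ("cpd1", "12"), ("cpd2", ">10000"), ("cpd3", "100")]
print(prepare_activities(rows))
```

In the actual workflow each of these operations is an individual KNIME node, which makes them easier to inspect and reconfigure than a single script.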

The ‘Histogram’ node (at the bottom in Figure 1) can be used to check that the distribution and range of the activity values meet the conditions for building robust qualitative and/or quantitative SAR models in Forge. In this case (Figure 2), the activity range covers almost 4 log units and the distribution is reasonably even, so I can confidently go ahead with the model building.


Figure 2. Range and distribution of the PIM-1 pIC50 values from the US9321756 patent.
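A toy programmatic version of this visual check might look as follows (the 3 log unit minimum span is a common rule of thumb, not a Forge requirement):

```python
def activity_range_ok(pic50_values, min_span=3.0):
    """Rule-of-thumb sanity check ahead of QSAR model building: the
    activity values should span at least ~3 log units (the threshold
    here is a heuristic assumption, not a Forge setting)."""
    return max(pic50_values) - min(pic50_values) >= min_span

print(activity_range_ok([5.2, 6.1, 7.4, 8.9]))  # spans ~3.7 log units -> True
```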

Reference ligand

This bit of the workflow downloads the protein-ligand complex with PDB code: 4TY1 and sends it to the new ‘pyflare’ node to prepare the protein, extract the reference ligand, and remove the crystallographic water molecules. The pyflare node allows the Flare Python API to be used from within KNIME, enabling access to all the Flare functionality.

The ‘Flare Viewer’ node, also new in this release, can be used to launch Flare and visualize the results, as shown in Figure 3.


Figure 3. The prepared 4TY1 protein and the 38W reference ligand. Crystallographic water molecules were removed using the ‘pyflare’ KNIME node.

Alignment

Here I used the ‘Forge Align’ node to align the molecules in the training and test sets to the 38W reference ligand from PDB: 4TY1, using the protein as an excluded volume. I configured the node to use the ‘Exhaustive’ setting (which runs a more accurate conformation hunt), to assign formal charges to the molecules according to the Cresset rules, and to align the molecules by Maximum Common Substructure (MCS), as shown in Figure 4. To speed up the alignment process, I configured KNIME to use the Cresset Engine Broker.


Figure 4. Configuration of the ‘Forge Align’ node used in this case study.

SAR analysis and visualization

In this final part of the workflow, qualitative and quantitative SAR models are calculated using the ‘Forge Build Field QSAR’, ‘Forge Build Activity Atlas’ and the new ‘Forge Build Machine Learning’ nodes. The visualization is mainly done using the ‘Forge Project Viewer’ node, but as an alternative I could use the ‘Forge Project Writer’ node to save the results into separate project files to view at a later stage.

Field QSAR model

The Field QSAR method uses Forge 3D electrostatic descriptors (based on Cresset’s XED force field) and volume descriptors to create an equation that describes activity, using Partial Least Squares (PLS) analysis.

For this case study, I configured the ‘Forge Build Field QSAR’ node to use the ‘Weight molecules by similarity’ option, as shown in Figure 5 – left, which weights each molecule according to its similarity to the reference. Using this setting down-weights training set molecules which are not optimally aligned to the reference (and accordingly have a lower similarity), and may generate better models in cases where the alignment is not carefully curated.

The Field QSAR model shows a Q2 (training set CV, LOO) = 0.52 and an R2 (test set) = 0.65. Visual inspection of the training and test sets and of the PCA plot (Figure 5 – right) reveals a group of compounds with an incorrect protonation state on the pyridine ring. Recalculating the model after removing these compounds from the training and test sets gives a model with similar statistics (Q2 training set CV, LOO = 0.57 and R2 test set = 0.58).


Figure 5. Left: Configuration of the ‘Forge Build Field QSAR’ node used in this case study. Right: the PCA plot highlights a group of compounds in the training set with incorrect protonation state of the pyridine ring.
This is a reasonably good starting model which can possibly be improved by further curation of the protonation state and alignment of specific compounds.
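For reference, the Q2 statistic quoted for these models is the leave-one-out cross-validated 1 − PRESS/TSS; a minimal sketch (not Cresset’s implementation) is:

```python
def q2_loo(observed, loo_predicted):
    """Q2 = 1 - PRESS/TSS: PRESS sums squared errors of predictions made
    with each sample left out of training; TSS is the total sum of
    squares about the mean of the observed activities."""
    mean_obs = sum(observed) / len(observed)
    press = sum((o - p) ** 2 for o, p in zip(observed, loo_predicted))
    tss = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - press / tss

# Illustrative numbers only, not data from this study
print(round(q2_loo([7.0, 8.0, 9.0], [7.2, 7.9, 8.8]), 3))  # 0.955
```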

Machine Learning model

The new ‘Forge Build Machine Learning’ node can be used to generate Machine Learning (ML) regression or classification models in KNIME, using Forge 3D electrostatic and volume descriptors. You can decide which model type will be generated (choosing from k-Nearest Neighbors, Random Forest, Relevance Vector Machine or Support Vector Machine), but for this case study I kept the default ‘Auto’ option, which automatically runs all the ML methods and picks the best one for the output. To calculate the predicted pIC50 values for the molecules in the test set, I used the new ‘Forge Score Machine Learning’ node.
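The behaviour of the ‘Auto’ option – evaluate every candidate model type and keep the best cross-validated performer – can be sketched generically (the candidate names and scores below are hypothetical placeholders, not Forge output):

```python
def auto_select(candidates, cv_score):
    """Mimic an 'Auto' option: score every candidate model with a
    cross-validation metric and return the best performer."""
    scored = {name: cv_score(model) for name, model in candidates.items()}
    best = max(scored, key=scored.get)
    return best, scored[best]

# Hypothetical pre-computed CV Q2 values standing in for trained models
candidates = {"kNN": 0.48, "RF": 0.55, "RVM": 0.51, "SVM": 0.62}
best, score = auto_select(candidates, lambda q2: q2)
print(best, score)  # SVM 0.62
```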

The best performance is obtained with a Support Vector Machine model showing a Q2 (training set CV) = 0.62 and an R2 (test set) = 0.71. In this case too, recalculating the model after excluding the molecules with an incorrect protonation state on the pyridine ring from the training and test sets gives a model with similar statistics (Q2 training set CV = 0.57 and R2 test set = 0.69).

For this data set, the SVM model is marginally more predictive than the Field QSAR model.

Activity Atlas models

Activity Atlas™ models generate a simple, qualitative picture of the critical points in the SAR landscape. In particular, the ‘Activity Cliff Summary’ views highlight regions of acute SAR and are a useful starting point to understand the data.

The new default ‘Weighted Sum’ Activity Cliffs Summary algorithm in the ‘Forge Build Activity Atlas’ node generates more detailed SAR maps by reducing the reliance on individual compounds, and is especially useful for small and medium sized data sets.

As this data set is relatively large, though, I also built an alternative Activity Atlas model using the original ‘Sum’ algorithm, which instead focuses on the prevalent SAR signals, and compared the maps obtained with the two methods. Default settings were used for all the other options. The two models are shown side by side in Figure 6; for this case study they give very similar results, comparable to those of the original blog post.


Figure 6: Activity Atlas model showing the ‘Activity Cliff Summary of Electrostatics’ and ‘Activity Cliff Summary of Shape’ views. Left: Activity Atlas maps calculated with the new ‘Weighted Sum’ algorithm. Right: Activity Atlas maps calculated with the original ‘Sum’ algorithm. Red / Blue = positive / negative electrostatics preferred for greater activity; Green / Pink = steric bulk is favorable/disfavorable in this region.

Also new and improved in this release

V2.5 Cresset KNIME nodes also include additional new features and improvements:

  • New example workflows to illustrate the usage of Cresset KNIME nodes
  • New Surface Writer node to write molecule surfaces to a directory
  • Spark™ Database Search node: new options to set field and pharmacophore constraints, use multiple reference molecules to guide the search, specify a database to search in addition to automatically detected databases, additional similarity metrics (Tanimoto and Tversky)
  • Forge Align node: additional similarity metrics (Tanimoto and Tversky), new option to remove boat conformations before performing the alignment
  • Forge Build Activity Atlas node: new options to write the Activity Atlas surfaces to a directory and to specify a column in the input table to use for the similarity data
  • Forge Build Machine Learning node: new Weighting Scheme option in support of kNN models
  • Forge Project Roles Extractor node: new option to output additional data for each role
  • Improved import of ligand and protein files, with most nodes now accepting ligands in SMILES format and proteins in PDB format.

Conclusion

The KNIME workflow built for this post can quickly run a preliminary qualitative and quantitative SAR analysis of any interesting patent data in BindingDB in an automated manner requiring minimal human intervention. For the US9321756 patent, running this workflow took approximately 30 minutes and resulted in an SVM quantitative SAR model with reasonable predictive ability and clear Activity Atlas SAR maps.

Cresset customers can contact support to get these new components free of charge. Request a free evaluation to try the software yourself.

Electrostatic Complementarity™ scores: How can I use them?

Flare™ V2 introduced a new analysis method called Electrostatic Complementarity (EC). The basic idea is quite simple: the maximum electrostatic affinity between the ligand and the receptor is achieved when the electrostatic potentials of the ligand and the receptor match (that is, have the same magnitude and opposite sign). At first glance that seems obvious. At second glance it seems a bit more surprising – why wouldn’t it be a good idea to have an even larger potential on the ligand to make the interaction energy even better? The reason is that the improved electrostatic interaction energy between the ligand and the protein will be cancelled out by the increase in desolvation penalty for the ligand.
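A back-of-the-envelope way to see why ‘equal magnitude, opposite sign’ is optimal: model the net electrostatic contribution at a surface point as a screened interaction term linear in both potentials plus a desolvation penalty quadratic in the ligand potential (a toy one-parameter model, not the functional form used in Flare):

```latex
\Delta G_{\mathrm{elec}}(V_L) \;\approx\; 2\beta\, V_L V_P \;+\; \beta\, V_L^2,
\qquad \beta > 0
```

Setting $\mathrm{d}\Delta G/\mathrm{d}V_L = 2\beta V_P + 2\beta V_L = 0$ gives $V_L = -V_P$: any larger ligand potential buys extra interaction energy but pays more than that back in desolvation.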

So, all we need to do is to compute the electrostatic potential of the ligand and the protein over a suitable contact surface, and then compute some sort of correlation metric to measure how similar they are. In a vacuum, this calculation would be quite straightforward. Unfortunately, water (as usual) makes everything much more complicated. Short of running long dynamics simulations, we’re going to have to approximate the solvent effects somehow. I’m not a great believer in continuum solvent approximations for this purpose, as water in and around a protein active site is very far from being a continuous dielectric. However, we must do something to account for the water. Our answer is a mix of a complex dielectric function and special treatment of formal charges, which we have already shown works well for visualizing the electrostatic potentials inside a protein active site, as described in an earlier blog post on protein interaction potentials.
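As a toy illustration of that last step (not Flare’s actual metric), one could sample both potentials at the same surface points and score complementarity as the Pearson correlation of the ligand potential against the sign-flipped protein potential:

```python
import math

def ec_score(ligand_pot, protein_pot):
    """Toy complementarity score: Pearson correlation between the ligand
    potential and the negated protein potential sampled at the same
    surface points. +1 = perfectly complementary, -1 = perfect clash."""
    target = [-v for v in protein_pot]
    n = len(ligand_pot)
    mean_l = sum(ligand_pot) / n
    mean_t = sum(target) / n
    cov = sum((a - mean_l) * (b - mean_t) for a, b in zip(ligand_pot, target))
    sd_l = math.sqrt(sum((a - mean_l) ** 2 for a in ligand_pot))
    sd_t = math.sqrt(sum((b - mean_t) ** 2 for b in target))
    return cov / (sd_l * sd_t)

# Protein potentials are exactly -2x the ligand potentials here,
# so the score is (up to rounding) 1.0
print(ec_score([1.0, -0.5, 0.2], [-2.0, 1.0, -0.4]))
```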



Figure 1: Electrostatic potentials and surface complementarity for the biotin-streptavidin complex.

So, we can compute the potentials (J. Med. Chem., 2019, 62 (6), pp 3036–3050), we can visualize them by coloring the surface by the complementarity (Figure 1), and we can compute an overall EC score. The question now is ‘Does it actually do anything useful?’.

Well, we’re computing an overall EC score, so the obvious thing to check is whether the score correlates with activity. We have done this for lots of data sets (see Figure 2 and the J Med Chem paper referred to earlier), and you get anywhere from a modest (r2=0.33 for RPA70N) to a very good (r2=0.79 for PERK) correlation. Problem solved, then: just dock your ligand designs into your protein, compute EC scores, and pick the one with the highest EC score to make!



Figure 2: Correlation of EC scores with activity for a range of data sets.

Unfortunately, it’s not actually that simple. While we do show that the EC score correlates with activity for a wide variety of data sets on different targets, these data sets are very carefully curated. The reason is that the binding of a ligand to a protein depends on many different physical effects. Electrostatics is one of these, and a very important one, but it’s not the only one, so the EC score is only going to predict activity differences where the other effects do not change.

The data sets used to get the correlations in Figure 2 are very conservative: the ligands within each set are all very closely related, they are all very close to the same size, they have much the same number of rotatable bonds, they have consistent binding modes, and so on. In addition, we find that to get a strong correlation you need to minimize alignment noise (much as you do when generating a good 3D QSAR), so we align all the ligands on a common substructure rather than relying on a free dock.

All other things being equal, then, a higher EC score should give you a higher affinity. Unfortunately, in the real world, all other things are rarely equal, and so unless you are looking at quite conservative changes (for example, asking where on your ligand you could substitute a fluorine to improve affinity), the EC scores are likely to be a poor guide. Back to the original question, then: ‘Does it actually do anything useful?’.

Luckily, although the single numeric EC score is very sensitive to placement of the molecule in the active site, the distribution of EC values over the surface is much more robust. The primary use of the EC method isn’t the calculation of scores: it’s the visualization of where your molecule is matching the protein electrostatics well, and where it isn’t matching as well (Figure 3). This gives you hints as to where you might want to make changes to your molecule, and what changes you might want to make: add a halogen? Move a nitrogen in a heterocycle? Small electrostatic interactions to halogen atoms or to the edges of aromatic rings are hard to visualize any other way.



Figure 3: The mGLU5 inhibitor on the left has a minor electrostatic clash on the pyridine ring, as seen in the EC surface coloring on the left. Placing a fluorine in this position removes the clash and improves affinity.

The primary use of the EC method, then, is analyzing your ligands and pointing out where improvements can be made. You can be confident that these suggestions are sensible, as we have shown in multiple data sets that where the difference between ligands is primarily electrostatic the EC score correlates with affinity. However, the EC scores themselves aren’t a general predictor of affinity, as there are many factors not included in the score that can make a molecule a better or worse binder.

If you’d like to visualize the electrostatics of your molecules in their active site and get guidance on how to improve them, request a free evaluation of Flare.

Run Cresset calculations in the cloud with Cresset Engine Broker™ and ElastiCluster

In this post I will describe how to instantiate a cluster on Google Cloud (or any other cloud platform supported by ElastiCluster) and use it to run calculations with any Cresset desktop application, i.e., Forge, Spark or Flare, on any platform – Windows, Linux or macOS.

Configuring ElastiCluster is simple, and it is a one-off operation. Once this is done, you can spin up a Linux cluster in the cloud with a single command:

elasticluster start cebroker

Within a few minutes, the cluster will be up and running, and ready to run Cresset calculations via the Cresset Engine Broker.

Once your calculations are finished and you do not need the cluster anymore, you can switch it off with another single command:

elasticluster stop cebroker

How to

As I was new to ElastiCluster and to Google Cloud, I followed the excellent Creating a Grid Engine Cluster with Elasticluster tutorial to understand how to enable the Google Cloud SDK and generate credentials to grant ElastiCluster access to my Google Cloud account.

The rest of this blog post concentrates on the steps required to get this to work in a Linux bash shell. The same workflow runs smoothly in a macOS Terminal and in a Windows bash shell.

The process that I will follow is:

  1. Install ElastiCluster and its dependencies in a virtual environment
  2. Create an ElastiCluster configuration file
  3. Spin up the cluster and use it with Cresset desktop apps

Not all the steps are necessary if you already have a VPC set up and running on Google Cloud, but this section details how to start from scratch in the minimum time.

Install ElastiCluster and its dependencies in a virtual environment

Firstly, I suggest following the advice in the Creating a Grid Engine Cluster with Elasticluster tutorial and installing virtualenv, in order to create a separate Python environment for ElastiCluster:

paolo@cresset77 ~/blog$ pip install virtualenv
Collecting virtualenv
Downloading https://files.pythonhosted.org/packages/33/5d/314c760d4204f64e4a968275182b7751bd5c3249094757b39ba987dcfb5a/virtualenv-16.4.3-py2.py3-none-any.whl (2.0MB)
100% |################################| 2.0MB 7.7MB/s
Installing collected packages: virtualenv
Successfully installed virtualenv-16.4.3

Then, create and activate a new elasticluster virtual environment:

paolo@cresset77 ~/blog$ virtualenv elasticluster
No LICENSE.txt / LICENSE found in source
New python executable in /home/paolo/blog/elasticluster/bin/python2
Also creating executable in /home/paolo/blog/elasticluster/bin/python
Installing setuptools, pip, wheel...
done.
paolo@cresset77 ~/blog$ cd elasticluster/
paolo@cresset77 ~/blog/elasticluster$ . bin/activate
(elasticluster) paolo@cresset77 ~/blog/elasticluster$

Next, clone from GitLab the Cresset fork of the elasticluster repository which contains a few Cresset-specific Ansible playbooks required to automatically set up the Cresset Engine Broker and FieldEngine machinery on the cloud-hosted cluster:

(elasticluster) paolo@cresset77 ~/blog/elasticluster$ git clone git@gitlab.com:cresset-opensource/elasticluster.git src
Cloning into 'src'...
remote: Enumerating objects: 13997, done.
remote: Counting objects: 100% (13997/13997), done.
remote: Compressing objects: 100% (4820/4820), done.
remote: Total 13997 (delta 8383), reused 13960 (delta 8375)
Receiving objects: 100% (13997/13997), 5.23 MiB | 1.21 MiB/s, done.
Resolving deltas: 100% (8383/8383), done.

Finally, install elasticluster dependencies:

(elasticluster) paolo@cresset77 ~/blog/elasticluster$ cd src
(elasticluster) paolo@cresset77 ~/blog/elasticluster/src$ pip install -e .
Obtaining file:///home/paolo/blog/elasticluster/src
Collecting future (from elasticluster==1.3.dev9)
Downloading https://files.pythonhosted.org/packages/90/52/e20466b85000a181e1e144fd8305caf2cf475e2f9674e797b222f8105f5f/future-0.17.1.tar.gz (829kB)
100% |################################| 829kB 6.6MB/s
Requirement already satisfied: pip>=9.0.0 in /home/paolo/blog/elasticluster/lib/python2.7/site-packages (from elasticluster==1.3.dev9) (19.0.3)

[...]

Running setup.py develop for elasticluster
Successfully installed Babel-2.6.0 MarkupSafe-1.1.1 PrettyTable-0.7.2 PyCLI-2.0.3 PyJWT-1.7.1 PyYAML-5.1 adal-1.2.1 ansible-2.7.10 apache-libcloud-2.4.0 appdirs-1.4.3 asn1crypto-0.24.0 azure-common-1.1.18 azure-mgmt-compute-4.5.1 azure-mgmt-network-2.6.0 azure-mgmt-nspkg-3.0.2 azure-mgmt-resource-2.1.0 azure-nspkg-3.0.2 bcrypt-3.1.6 boto-2.49.0 cachetools-3.1.0 certifi-2019.3.9 cffi-1.12.2 chardet-3.0.4 click-7.0 cliff-2.14.1 cmd2-0.8.9 coloredlogs-10.0 contextlib2-0.5.5 cryptography-2.6.1 debtcollector-1.21.0 decorator-4.4.0 dogpile.cache-0.7.1 elasticluster enum34-1.1.6 funcsigs-1.0.2 functools32-3.2.3.post2 future-0.17.1 futures-3.2.0 google-api-python-client-1.7.8 google-auth-1.6.3 google-auth-httplib2-0.0.3 google-compute-engine-2.8.13 httplib2-0.12.1 humanfriendly-4.18 idna-2.8 ipaddress-1.0.22 iso8601-0.1.12 isodate-0.6.0 jinja2-2.10.1 jmespath-0.9.4 jsonpatch-1.23 jsonpointer-2.0 jsonschema-2.6.0 keystoneauth1-3.13.1 monotonic-1.5 msgpack-0.6.1 msrest-0.6.6 msrestazure-0.6.0 munch-2.3.2 netaddr-0.7.19 netifaces-0.10.9 oauth2client-4.1.3 oauthlib-3.0.1 openstacksdk-0.27.0 os-client-config-1.32.0 os-service-types-1.6.0 osc-lib-1.12.1 oslo.config-6.8.1 oslo.context-2.22.1 oslo.i18n-3.23.1 oslo.log-3.42.3 oslo.serialization-2.28.2 oslo.utils-3.40.3 paramiko-2.4.2 pathlib2-2.3.3 pbr-5.1.3 pyOpenSSL-19.0.0 pyasn1-0.4.5 pyasn1-modules-0.2.4 pycparser-2.19 pycrypto-2.6.1 pyinotify-0.9.6 pynacl-1.3.0 pyparsing-2.4.0 pyperclip-1.7.0 python-cinderclient-4.1.0 python-dateutil-2.8.0 python-gflags-3.1.2 python-glanceclient-2.16.0 python-keystoneclient-3.19.0 python-neutronclient-6.12.0 python-novaclient-9.1.2 pytz-2018.9 requests-2.21.0 requests-oauthlib-1.2.0 requestsexceptions-1.4.0 rfc3986-1.2.0 rsa-4.0 scandir-1.10.0 schema-0.7.0 secretstorage-2.3.1 simplejson-3.16.0 six-1.12.0 stevedore-1.30.1 subprocess32-3.5.3 typing-3.6.6 unicodecsv-0.14.1 uritemplate-3.0.0 urllib3-1.24.1 warlock-1.3.0 wcwidth-0.1.7 wrapt-1.11.1

Create an ElastiCluster configuration file

Open your favourite text editor and create a ~/.elasticluster/config file with the following content:

# slurm software to be configured by Ansible
[setup/ansible-slurm]
provider=ansible
# ***EDIT ME*** add cresset_flare only if you plan to run Flare calculations
frontend_groups=slurm_master,cresset_common,cresset_broker,cresset_flare
compute_groups=slurm_worker,cresset_common
# ***EDIT ME*** set the path to your Cresset license
global_var_cresset_license=~/path/to/your/cresset.lic

# Create a cloud provider (call it 'google-cloud')
[cloud/google-cloud]
provider=google
# ***EDIT ME*** enter your Google project ID here
gce_project_id= xxxx-yyyyyy-123456
# ***EDIT ME*** enter your Google client ID here
gce_client_id=12345678901-k3abcdefghi12jklmnop3ab4b12345ab.apps.googleusercontent.com
# ***EDIT ME*** enter your Google client secret here
gce_client_secret=ABCdeFg1HIj2_aBcDEfgH3ij

# Create a login (call it 'google-login')
[login/google-login]
image_user=cebroker
image_user_sudo=root
image_sudo=True
user_key_name=elasticluster
user_key_private=~/.ssh/google_compute_engine
user_key_public=~/.ssh/google_compute_engine.pub

# Bring all of the elements together to define a cluster called 'cebroker'
[cluster/cebroker]
cloud=google-cloud
login=google-login
setup=ansible-slurm
security_group=default
image_id=ubuntu-minimal-1804-bionic-v20190403
frontend_nodes=1
# ***EDIT ME*** set compute_nodes to the number of worker nodes that you
# wish to start
compute_nodes=2
image_userdata=
ssh_to=frontend

[cluster/cebroker/frontend]
flavor=n1-standard-2

[cluster/cebroker/compute]
# ***EDIT ME*** change 8 into 2, 4, 8, 16, 32, 64 depending on how many cores
# you wish on each worker node
flavor=n1-standard-8

The only bits that you will need to edit are those marked ***EDIT ME***, i.e.:

  • The path to the Cresset license on your computer
  • Add cresset_flare if you are planning to run Flare calculations on the cluster
  • Your Google Cloud credentials
  • The number/type of nodes that you wish to use in your cluster

The example above uses a dual-core node to act as the head node (the Cresset Engine Broker does not need huge resources) and starts two 8-core nodes as worker nodes; you may well want to use more and beefier nodes, but this is meant as a small, inexpensive example.

At this stage, you are done with the configuration part. Please note that you only need to do this once; the only future action that you may want to carry out is changing the configuration to modify the type/number of your worker nodes.

Spin up the cluster and use it with Cresset desktop apps

This is the single command that you need to run, as advertised in the blog post headline:

(elasticluster) paolo@cresset77 ~/blog/elasticluster$ elasticluster start cebroker
Starting cluster `cebroker` with:
* 1 frontend nodes.
* 2 compute nodes.
(This may take a while...)

If you feel like having a coffee, this is a good time to brew one – bringing up the cluster will take a few minutes.

If everything works well, at the end of the process you should see:

Your cluster `cebroker` is ready!

Cluster name:     cebroker
Cluster template: cebroker
Default ssh to node: frontend001
- compute nodes: 2
- frontend nodes: 1

To login on the frontend node, run the command:

elasticluster ssh cebroker

To upload or download files to the cluster, use the command:

elasticluster sftp cebroker

Now ssh into your frontend node, forwarding port 9500 to localhost:

(elasticluster) paolo@cresset77 ~/blog/elasticluster$ elasticluster ssh cebroker -- -L9500:localhost:9500
Welcome to Ubuntu 18.04.2 LTS (GNU/Linux 4.15.0-1029-gcp x86_64)
 
* Documentation:  https://help.ubuntu.com
* Management:     https://landscape.canonical.com
* Support:        https://ubuntu.com/advantage

Get cloud support with Ubuntu Advantage Cloud Guest:
http://www.ubuntu.com/business/services/cloud
This system has been minimized by removing packages and content that are
not required on a system that users do not log into.

To restore this content, you can run the 'unminimize' command.

0 packages can be updated.
0 updates are security updates.

Last login: Mon Apr  8 21:28:16 2019 from 81.144.247.249

cebroker@frontend001:~$

Tunnelling port 9500 through ssh creates a secure, encrypted connection to the Cresset Engine Broker running in the cloud. You probably won’t need this if you have already configured your VPC in Google Cloud.
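If you want to confirm from your workstation that the tunnelled port is reachable, a generic socket probe is enough (plain Python, nothing Cresset-specific):

```python
import socket

def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds, e.g. to
    check that the ssh tunnel to the Engine Broker on port 9500 is up."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

print(port_open("localhost", 9500))  # prints True once the tunnel is up
```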

You can easily verify that the Cresset Engine Broker is already running on the frontend node, waiting for connections from Cresset clients:

cebroker@frontend001:~$ systemctl status cresset_cebroker2
* cresset_cebroker2.service - server that allows Cresset products to start/connect to remote FieldEngines
Loaded: loaded (/etc/systemd/system/cresset_cebroker2.service; static; vendor preset: enabled)
Active: active (running) since Mon 2019-04-08 21:41:50 UTC; 8min ago
Process: 20917 ExecStop=/bin/bash /opt/cresset/CEBroker2/documentation/init.d/cresset_cebroker2.sh stop /var/run/CEBroker2.pid (code=exited, status=0/SUCCESS)
Process: 20930 ExecStart=/bin/bash /opt/cresset/CEBroker2/documentation/init.d/cresset_cebroker2.sh start /var/run/CEBroker2.pid (code=exited, status=0/SUCCESS)
Main PID: 20956 (CEBroker2.exe)
Tasks: 0 (limit: 4915)
CGroup: /system.slice/cresset_cebroker2.service
└─20956 /opt/cresset/CEBroker2/lib/CEBroker2.exe --pid /tmp/tmp.Y7Wvk135fM -p 9500 -P 9501 -s /opt/cresset/CEBroker2/documentation/examples/start_SLURM_engine.sh -m 16 -M 16 -v

Now, start a Forge client running on your local machine.

In the Processing Preferences, configure the Cresset Engine Broker to Host localhost, Port 9500, and press the ‘Test Connection’ button; you should see ‘Connection OK’.

Now, align a dataset against a reference. In addition to your local processes, you should see 16 remote processes running on your cloud-hosted cluster.

Using remote computing resources requires a Cresset Engine Broker license and a Remote FieldEngine license; please contact us if you do not currently have such licenses and wish to give this a try. And do not hesitate to contact our support team if you need help to get the above working for you.

Review of Symposium ‘Innovative Software for Molecule Discovery and Design’, New Delhi, India

Manoranjan Singh Sidhu, Neotel Systems & Services (Cresset distributor, India)

On 12th April 2019, the International Centre for Genetic Engineering and Biotechnology, New Delhi, India, hosted the symposium ‘Innovative Software for Molecule Discovery and Design’. Delegates learnt how experienced users at the Institute had used Cresset computational tools to efficiently discover better compounds.

Dr Robert Scoffin, Cresset CEO, opened the symposium with an explanation of Cresset’s patented XED force field. He spoke in detail about the ligand-based and structure-based applications, emphasizing how Cresset technology helps researchers with better visualization and ease of use. Dr Scoffin explained how Flare™, a new application enabling enhanced designs through new approaches to protein-ligand analysis, can streamline new molecule design using Electrostatic Complementarity™; this provides rapid activity prediction with visual feedback on new molecule designs, and proves invaluable for understanding ligand binding and structure-activity relationships and for ranking new molecule designs.

In a demonstration of Spark™, Cresset’s scaffold hopping and R-group exploration application, Dr Scoffin showed how it can be used to generate highly innovative ideas for your project to escape IP and toxicity traps. Dr Scoffin explained how Spark gives a single assessment of 20 different datasets, thus providing greater insight and adding greater value when compared with alternative applications that build libraries from larger datasets.

Dr Suneel Kumar, Cresset Application Scientist, demonstrated SAR analysis using the Activity Miner™ and Activity Atlas™ components of Forge™, showing how these modules are useful in understanding SAR of the current dataset and how they provide insights to design better molecules.

The symposium was very interactive with many delegates asking questions regarding visualization and scoring correlations, how Cresset software can help fill gaps or complement their existing infrastructure, how large datasets could be reduced to a lesser number of assessments so as to understand the results better, and how Cresset technology could help with synthetic feasibility and commercial availability of a lead molecule.

I encourage anyone who is interested in learning more about how Cresset applications can advance their molecule design projects to request a free evaluation of ligand-based applications or structure-based applications and subscribe to the Cresset newsletter.

“As per the presentation and demonstration, the software provides excellent visualization and gives reliable results, and we look forward to evaluating Forge, Spark and Flare.”

Bioinformatics Group, International Centre for Genetic Engineering and Biotechnology, New Delhi, India

Application of Spark to the discovery of novel LpxC inhibitors

Novartis Institutes for Biomedical Research recently published the paper Application of Virtual Screening to the Identification of New LpxC Inhibitor Chemotypes, Oxazolidinone and Isoxazoline. They report using Spark™, Cresset’s scaffold hopping application, to find core replacements for the indazole moiety of compound 6 in Figure 1. Visual selection of the most promising top-ranking Spark results and further optimization studies led to the identification of novel LpxC inhibitors with subnanomolar binding to LpxC and in vivo antibacterial activity against P. aeruginosa and other Gram-negative bacteria.

Scaffold hopping

The bioactive conformation of compound 26, a simplified version of compound 6 originally reported by Actelion, was used as the starter molecule for the Spark scaffold hopping experiment. The aim of the search was to identify appropriate replacements for the indazole core (Figure 1).



Figure 1. Top: LpxC inhibitor 6. Bottom: The bioactive conformation of compound 26 was used as the starter molecule for the scaffold hopping experiment with Spark.

The allowed atom types for the ‘linker 1 atom’ and ‘atom 2’ (see Figure 2) were set to ‘any carbon atom’ and ‘any atom’, respectively. It was further specified that all Spark designs must contain at least one ring and should not include any reactive functionalities. The Spark experiment was performed on fragment databases derived from ZINC,¹ ChEMBL,² and the VEHICLe³ collection of theoretical ring systems.

The similarity score of the Spark results against compound 26 was calculated using 50% field and 50% shape similarity. The 100 top-ranking clusters were manually reviewed with respect to synthetic feasibility, the introduction of hydrophilic groups in the area of the indazole moiety, and calculated physicochemical properties.

 


Figure 2. Allowed atom types for ‘linker 1 atom’ and ‘atom 2’  in the Spark experiment.

The oxazolidinone 13 and isoxazoline 25 scaffolds were shortlisted from several proposals which link ‘linker 1 atom’ and ‘atom 2’ with a hydrophilic linker (Figure 3).

Analogues of 13 and 25, in which the para methoxy group was replaced by a bromine atom (Figure 3), showed cellular activity with minimum inhibitory concentration (MIC) values below 4 μg/mL against P. aeruginosa. Accordingly, these series were selected for further investigation.

 


Figure 3. Oxazolidinone 13a and isoxazoline 25a showed  MIC <4 μg/mL against P. aeruginosa.

SAR optimization

Further investigation and expansion of both oxazolidinone and isoxazoline series led to the identification of compounds with potent in vitro activity against P. aeruginosa and other Gram-negative bacteria. Representative compound 13f (Figure 4) demonstrated excellent efficacy against P. aeruginosa in an in vivo mouse neutropenic thigh infection model.

The crystal structure of 13f complexed with the P. aeruginosa LpxC enzyme (PDB: 6MAE) shows that the hydroxamic acid moiety is bound to the zinc atom in the active site and is involved in interactions with H78, H237, T190, E77, and D241 (Figure 4). The hydrophobic tail (phenyl group) interacts with several hydrophobic side chains, and the cyclopropyl group further extends into the solvent-exposed region. The sulfone oxygen atoms interact with well-defined crystallographic water molecules (not shown in Figure 4), and the methyl group attached to the sulfone functionality is engaged in hydrophobic interactions with F191. The carbonyl oxygen of the oxazolidinone forms a favorable polar interaction with the C2 CH group of H19.

The crystal structure of 13f thus confirms that the oxazolidinone group can favourably orient the sulfone/hydroxamic acid portion of the inhibitor with respect to the hydrophobic phenyl group while maintaining a low-energy conformation, as predicted by Spark.


Figure 4. X-ray crystal structure of 13f complexed with P. aeruginosa LpxC enzyme (PDB: 6MAE)

 

Conclusion

This paper from NIBR shows that Spark can guide the rational design and discovery of new drug candidates. In this case, a traditional scaffold hopping experiment led to the identification of novel oxazolidinone and isoxazoline LpxC inhibitors with potent antibacterial activity against Gram-negative bacteria.

Try Spark on your project

Request a free evaluation of Spark to move to new series and non-obvious IP by swapping scaffolds.

References

  1. Irwin, J. J.; Sterling, T.; Mysinger, M. M.; Bolstad, E. S.; Coleman, R. G. ZINC: A Free Tool to Discover Chemistry for Biology. J. Chem. Inf. Model. 2012, 52 (7), 1757–1768.
  2. Gaulton, A.; Bellis, L. J.; Bento, A. P.; Chambers, J.; Davies, M.; Hersey, A.; Light, Y.; McGlinchey, S.; Michalovich, D.; Al-Lazikani, B.; Overington, J. P. ChEMBL: A Large- Scale Bioactivity Database for Drug Discovery. Nucleic Acids Res. 2012, 40 (D1), D1100–D1107.
  3. Pitt, W. R.; Parry, D. M.; Perry, B. G.; Groom, C. R. Heteroaromatic Rings of the Future. J. Med. Chem. 2009, 52 (9), 2952–2963.

Rapid interpretation of patent SAR using Forge

Biological data is now a regular feature of new patent applications, and it is readily available for download from Bindingdb, which has data on over 2,500 patents encompassing more than 300,000 binding measurements. Generating meaningful insights from this data is perceived as less straightforward. In this post I will use Forge™ V10.6 to demonstrate that it is possible to get an overview of the SAR from a single patent entry with minimal human intervention and time.

Application to PIM-1

Selection and processing of 288 compounds from US9321756, ‘Azole compounds as PIM inhibitors’ (detailed in Appendix I) gave the Activity Atlas™ model shown in Figure 1. The total time to generate and interpret this model was around 30 minutes. It would be relatively straightforward to automate the process.

Figure 1: Activity Atlas model generated in this case study. From data download to model took 30 minutes. The ‘Activity Cliff Summary of Electrostatics’ and ‘Activity Cliff Summary of Shape’ views are shown. These detail regions of acute SAR – Red / Blue = positive / negative electrostatics preferred for greater activity; Green / Pink = activity favors / disfavors atoms in this region.

SAR interpretation

Firstly, the oxadiazole is clearly required, as demonstrated in Figure 2 by the region of negative electrostatics (blue) next to both nitrogen atoms, representing the interaction of this group with the side chain of Lys67. Perhaps this is not surprising given the title of the patent application. The model also shows that the amino group next to the oxadiazole is constrained (area of pink surface).

Figure 2: Activity Atlas model close to the oxadiazole group. Red = positive electrostatics preferred; Blue = negative electrostatics preferred; Green = atoms in this region favored; Pink = atoms in this region disfavored.

On initial inspection there appears to be space in the protein to accommodate a substituent on the nitrogen. However, viewing the aligned ligands in the context of the protein with contacts displayed in Forge (Figure 3) makes it clear that all N-substituted ligands clash with Asp186 and that the adjacent space is not accessible from this position in the ligand.


Figure 3: Clash of a ligand with a morpholino substituent to Asp186 (orange lines).

The model (Figure 4) shows that there is a clear preference for molecules that extend into the gap between the two arms of the ligand (green surface at the bottom of the model above). Whilst we would want to check the underlying data, the suggestion is that substitution on either R-group is tolerated. Indeed, the most active compound crosses this gap completely which raises the possibility of using a cyclized ligand.

Figure 4: A highly active compound from the patent displayed in CPK. The N-trifluoroethyl group touches the cyclopropyl substituent on the opposite side of the molecule.

Surrounding the green (favorable volume) region between the two arms is a large area of red surface. This suggests that positive electrostatics – the edges of aromatic rings, H-bond donors, etc. – are preferred in this region.
This summary is reinforced by looking at the individual compounds that make up the data, which is easy to do with the Activity Miner module of Forge. Using Activity Miner’s top pairs table (Figure 5), there are many pairs of molecules where introducing a positive charge in the region below the ligand (as shown in the pictures) generates a more active molecule. Generally the difference is around one log unit of activity in favor of the charged species.

Figure 5: The top pairs table in the Activity Miner module of Forge showing a specific pair of molecules and the electrostatic difference map between them. Red regions indicate where that ligand is more positive than the comparator; Blue where that ligand is more negative. In this case the ligand on the left is over a log unit more active and contains a positive charge in the region at the bottom of the picture.

Looking at the protein structure does not reveal a specific interaction or reason for this gain in potency. However, by using the protein field surface in Flare, we can see that the protein is generating a negative potential in this region which would account for the gain in activity when introducing a positive charge.

Figure 6: The protein interaction potential contoured at 2 kcal/mol. Red = positive; Blue = negative. The potential indicates the nature of the atoms to use in a region: positive atoms fit well in negative regions, and vice versa.

Lastly, in the region of the pyrimidine group the model has a large area of blue. This indicates that there is a clear preference for molecules with nitrogen atoms in the ring at these points (e.g., pyrazine). This area points towards solvent and hence this is quite surprising. From the crystal structure alone it would be expected that introduction of heteroatoms would have little effect on activity. Examination of the data using Activity Miner confirms that, for example, pyrazine is more active than pyridine. In this case the protein fields do not reveal anything significant in the underlying potential of the protein and we are left to speculate at the reason for the SAR.

Figure 7: PDB 4TY1 showing the region around the pyrimidine group of the ligand. There are few interactions between the protein and the edge of the ligand in this region.

Speculating that protein movement was at the root of the observed SAR, I downloaded into Flare all the PIM-1 structures from the PDB, sequence aligned them and superposed them based on the sequence alignment. Looking at this region across the 150+ structures shows no clear case for protein flexibility, although a number of structures do have a water molecule in this region that would bridge the ligand to the side chain of Arg122.

Figure 8: Over 150 PIM-1 crystal structures superposed in Flare. The backbone is shown in tube, residues close to the depicted ligand of structure 4TY1 are shown in thin sticks. Only two structures have any variation in loop conformation in this region.

The reason for the observed SAR remains elusive; it could be a function of a protein-protein interaction, a water-mediated interaction, or something else.

Conclusion

Rapid interpretation of Bindingdb patent data can be achieved using Forge. In this case the SAR of 288 ligands was condensed to a single Activity Atlas model in less than 30 minutes. Interpretation of the model over the next 30 minutes generated clear SAR insights that could be employed on competing projects. Inspecting the protein electrostatics using Flare provided further insights into the observed SAR.

Try Forge on your project

Request a free evaluation of Forge to try this on your data or condense a patent into a simple summary of the published SAR.

See all licensing options for Forge.


Appendix I

Background computational details

The raw data was downloaded in tab-separated format from Bindingdb and pre-processed in Excel. The raw data contains data for two biological targets – ‘PIM’ and ‘PIM-1’. Compounds with ‘PIM-1’ data were selected and checked for duplicate values. One compound was excluded because of a large variation in the reported IC50 value, and four molecules were excluded due to missing activity values. All other duplicate IC50 values were averaged and converted to pIC50 values, resulting in a dataset of 288 molecules in a csv file.
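The same pre-processing (target selection, dropping missing or qualified values, averaging replicates, converting to pIC50 and discarding noisy compounds) could also be scripted. A minimal sketch in plain Python, assuming illustrative field names (`target`, `ic50_nM`, `smiles`) rather than the exact Bindingdb export headers:

```python
import math
from collections import defaultdict

def prepare_sar_table(rows, target="PIM-1", max_sd=0.7):
    """rows: list of dicts with 'target', 'ic50_nM' (string, in nM) and 'smiles'.
    Returns {smiles: mean pIC50}, sorted by descending activity."""
    by_compound = defaultdict(list)
    for r in rows:
        if r["target"] != target:
            continue
        v = r.get("ic50_nM")
        # Drop missing values and qualified ('>' or '<') measurements
        if v is None or ">" in str(v) or "<" in str(v):
            continue
        # pIC50 = -log10(IC50 in M); IC50 is given in nM
        by_compound[r["smiles"]].append(-math.log10(float(v) * 1e-9))
    result = {}
    for smiles, vals in by_compound.items():
        mean = sum(vals) / len(vals)
        sd = (sum((x - mean) ** 2 for x in vals) / (len(vals) - 1)) ** 0.5 \
            if len(vals) > 1 else 0.0
        if sd <= max_sd:  # discard compounds with noisy replicate measurements
            result[smiles] = mean
    # Sort by descending mean activity, ready for stratified partitioning
    return dict(sorted(result.items(), key=lambda kv: -kv[1]))
```

The sorted output makes the subsequent activity-stratified training/test split trivial.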

The original dataset included the ligands of PDB codes 4TY1 and 4WT6. The protein-ligand complexes were downloaded into Flare, sequence aligned and superposed. Looking at the binding site, either ligand would work well as a reference for initial alignment of the dataset. The ligand from 4WT6 was chosen for further experiments, and both the ligand and the corresponding protein were transferred to Forge (Copy-Paste). The csv file was loaded into Forge (Training Set), the molecules were processed using Accurate but Slow conformation hunting and Substructure alignment, and an Activity Atlas model was built.


The Forge processing window showing the options used in this case study.

Using the Cresset Engine Broker, the calculation took 15 minutes to complete. Examining the results shows excellent alignment through the common substructure but some variation beyond that.

288 aligned ligands from US9321756 that were used to prepare the Activity Atlas model.

 

About Activity Atlas

Activity Atlas models are created by comparing all pairs of molecules in terms of their positive and negative electrostatics plus their hydrophobic and shape properties, then combining these comparisons, weighted by the change in activity for each pair. The result is a simple, qualitative picture of the critical points in the SAR landscape.
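The pairwise idea can be illustrated with a simple activity-cliff ranking in Python. This is only a conceptual sketch of the general approach (scoring each pair by activity change relative to structural change, in the spirit of a SALI index), not Cresset's actual Activity Atlas algorithm; the `sim` function is a placeholder for any 0-1 molecular similarity:

```python
from itertools import combinations

def activity_cliff_pairs(mols, sim, min_sim=0.5):
    """mols: list of (name, activity) tuples; sim: f(name_a, name_b) -> 0..1.
    Returns pairs ranked by 'cliffness': activity change per unit dissimilarity."""
    pairs = []
    for (a, act_a), (b, act_b) in combinations(mols, 2):
        s = sim(a, b)
        if s < min_sim:
            continue  # only compare reasonably similar molecules
        # SALI-style score: large activity change over small structural change
        cliff = abs(act_a - act_b) / (1.0 - s + 1e-6)
        pairs.append((cliff, a, b))
    return sorted(pairs, reverse=True)
```

The highest-scoring pairs are the activity cliffs that dominate a summary such as the one in Figure 5.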

The resulting Activity Atlas model was automatically displayed. I always start with the ‘Activity Cliff Summary of Electrostatics’ and ‘Activity Cliff Summary of Shape’ views to understand the data. As this was a quick experiment and the alignments were noisier than in a fully curated experiment, the Activity Atlas model is also noisier than ideal. However, increasing the Confidence Level to 3.0 concentrates the model on the clear signals in the data.

The display options used for the Activity Atlas models shown in this study.

Model validation

Activity Atlas is a qualitative technique and hence difficult to validate except through manual inspection. However, Forge is capable of building quantitative models that can be used to validate the alignment of the molecules (we believe that consistent alignment is the single biggest factor in generating reliable 3D QSAR models). Using the Automatic regression model building methods of Forge with a 20% activity-stratified test set generated an SVM model with a q² of 0.64 (LOO) and an r² on the independent test set of 0.62. Given the noisy nature of the input data, I believe this represents a good model and that the alignments are valid.
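The activity-stratified split and the r²/q² statistics used here are easy to reproduce outside Forge. A small numpy sketch of the evaluation logic (the regression model itself is omitted; any regressor could be plugged in):

```python
import numpy as np

def stratified_split(activities, test_fraction=0.2):
    """Activity-stratified split: sort by activity, then send every k-th
    compound to the test set so both sets span the full activity range."""
    order = np.argsort(activities)[::-1]   # indices, descending activity
    k = int(round(1.0 / test_fraction))    # e.g. every 5th compound
    test = order[::k]
    train = np.setdiff1d(order, test)
    return train, test

def r_squared(y_true, y_pred):
    """Coefficient of determination, as used for the test-set r2
    (and, with leave-one-out predictions, the q2)."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot
```

Stratifying by activity rather than splitting at random ensures the test set is not concentrated at one end of the pIC50 range.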

Designing new molecules in a web browser

Last year we discussed our research aimed at re-imagining molecule design to bring the best of 2D and 3D technologies together in a collaborative environment. The project, code-named TorchWeb, has progressed significantly and is now on the countdown to a beta release, expected in the early summer of this year, with an initial release to follow in the autumn.

The web-based interface contains plugin windows with key information. Here, in addition to the ‘Editor’ and ‘3D viewer’ plugins I have: the ‘Designs’ plugin that shows all the molecules that I am currently working on; the ‘LogP’ plugin giving an atomistic breakdown of the calculated logP; the ‘Properties’ plugin showing multiple physico-chemical properties; and the ‘Similarity search’ plugin that shows similar molecules from a chosen database.

The two central concepts of this new product remain unchanged:

  • Firstly, we want to create an environment where medicinal chemists can draw molecules in a 2D editor, have these automatically converted into a 3D model of how the new molecule would interact with their target, or compare to the molecules that they have made before. To do this we created a new algorithm to grow molecules in 3D which is applied to every change in the 2D molecule.

The 3D pose of the molecule is updated interactively as the molecule is sketched in the 2D window enabling immediate assessment of the potential interactions that could be made by the new molecule. All other plugins also update, giving live similarity searches and logP predictions.

  • Secondly, we recognize that medicinal chemistry designers often work in teams across multiple locations and time zones. Consequently, collaboration had to be central to the application. This has been achieved through session sharing – enabling multiple users to share a design, interact with it simultaneously, and work on it together.

Joining a shared session enables users to collaborate live on any design, updating the 3D pose and chemical properties as the molecule evolves.

We are working steadily to convert our initial prototype into a full product. Collaboration with selected customers has enabled us to capture detailed requirements and to transition the code base into a robust, secure environment suitable for on-premise installation or deployment in the cloud.

Features have not been ignored! The design application will be joined by a data analysis application that will combine eye catching plots and graphs with 3D protein active site analysis.

Once designed, a molecule still has to be made. Here we have embarked on a partnership with Elixir Software’s chemTraX. Together we will provide real-time understanding of the status of every molecule from design to analysis, so you only make the molecules that you need to reach your goal.

Want to be among the first to learn more?

Subscribe to our newsletter.

Cresset to participate in the first SCI/RSC Computational Chemistry Workshop on April 10

Talking to a group of medicinal chemists at a conference over lunch raised the following question: “It’s really interesting to see all the clever things that can be done with these software tools, but could we have a meeting where we actually get to try them out for ourselves?” With this in mind, a combined team from SCI and RSC decided to organize a computational chemistry workshop where people could access software and benefit from top-quality training from the creators and developers of a range of these tools, each of which addresses a different aspect of pre-clinical drug discovery. All scientists working in this area need tools and techniques for handling chemical information, but it is difficult to get an opportunity to try out more than one package at a time, and we would all relish a helping hand to get up and running as quickly as possible.

Cresset is always keen to introduce new people to the concept of fields and to demonstrate the ways in which they can be used to design biologically active molecules. We are very happy to welcome Giovanna Tedesco, Senior Product Manager at Cresset, who will present on:


Next generation structure-based design with Flare

Learn how simple structure-based design can be within small molecule discovery projects. The workshop will cover ligand design in the protein active site, Electrostatic Complementarity™ maps and scores, ensemble docking of ligands with Lead Finder, calculations of water stability and locations using 3D-RISM, energetics of ligand binding using WaterSwap and use of Python extensions. Applications you will use: Flare™ , Lead Finder™.

 


Participants will be able to pick 4 out of a possible 6 workshops over the day, choosing from sessions covering data processing and visualization; ligand and structure-based design; or ADMET prediction. These are all areas that chemists working in the pharmaceutical, biotech, life sciences and agrochemicals sectors engage with every day. Full details of all workshops are available from SCI, and slots will be assigned on a first-come-first-served basis. Most importantly, all software and training materials required for the workshop will be provided for attendees to install and run on their own laptops and use for a limited period afterwards. This will give everyone the chance to take what they have learnt back to their own organisations and try out their newly acquired skills on their own data.

When: April 10, 2019

Where: The Studio, conveniently located next to Birmingham New Street Station, Birmingham, UK

Registration is open and the early bird price of £30 (£40 for non-SCI/RSC members) is available till 27th February. Financial support to cover travel and registration is available for students on application.

I hope you are able to join us for a unique opportunity to get to grips with a wide range of tools and concepts which you can use in your own research.

Find out more and register now.

Caroline Low, PhD, FRSC CChem

SCI Scientific Organizing Committee member and Cresset Discovery Services consultant

Comparing Forge’s command line utility to Blaze – which one should you use?

Here at Cresset we’re very interested in ligand-based virtual screening – it’s been a focus of the company ever since we started more than seventeen years ago. In that time there have been many advances and refinements of the techniques for both ligand-based and structure-based virtual screening. We have stuck by our fundamental principle that ligand similarity based on both electrostatics and shape is an excellent way to sort the wheat from the chaff. The results obtained by our services division, who have run more than 200 virtual screening campaigns with a better than 80% success rate, are testament to that.

Difference between falign and Blaze

One of the things our customers ask from time to time is which application should they be using to do virtual screening. The simple answer is that there are two, Forge (and its command-line utility ‘falign’) and Blaze, and the differences are readily apparent.

In falign, you can generate conformations for a large set of molecules, align them to one or more references, and rank them by the similarity score. You also have the option to bias the alignments and scores by adding field constraints, pharmacophore constraints, and protein excluded volumes.

By way of contrast, in Blaze, you can generate conformations for a large set of molecules, align them to one or more references, rank them by the similarity score, and… ok, point taken. So, given that falign and Blaze apparently do the same thing…

Why falign and Blaze?

The answer is scale. As anyone who’s ever played with large data sets knows, doing calculations on a few hundred compounds is fundamentally different to doing them on tens of millions of compounds. Once you are working at large scale, seemingly trivial operations such as filtering data sets become much more difficult if you want to be efficient. Blaze was designed from the ground up to work with large data sets of 10⁷ molecules and more, with an emphasis on maximizing throughput on a computational cluster. Forge/falign, on the other hand, are much more aimed at small-scale work, enabling simple screening or analysis of relatively small sets of compounds where the big iron of Blaze is overkill.

Data preparation

As an example, let’s look at the preparation of the data set in the two software suites. In falign, this is relatively simple: you provide the compounds to falign in 2D or 3D form, it assigns protonation states as necessary, and computes conformations on-the-fly if required before aligning to the query:


Falign has a secondary mode for use when aligning structurally-related compounds, which ensures that the common substructure within the dataset is perfectly matched:

Blaze, on the other hand, is much more sophisticated in its conformer handling. The average user of Blaze has multiple data sets that they want to screen (in-house compounds, vendor screening compounds, virtual libraries, custom collections), and these often have significant overlap. In addition, these data sets are usually reused multiple times for multiple virtual screens. As a result, Blaze has a sophisticated deduplication and precomputation pipeline that maximizes computational efficiency. The Blaze workflow looks more like this:


Any given chemical structure is only present once within Blaze: it may have multiple different names, and be present in multiple collections, but we’ll only precompute its conformations once and we’ll only align it once in any given screen. The conformer computation pipeline is heavily optimized for performance: we’ve done extensive studies on our conformer generation algorithm XedeX to find the optimal trade-off between conformation space coverage, rejection of higher-energy conformations, calculation time and number of conformations required. In addition, we’ve developed a special-purpose file format that is highly compressed (less than 13 bytes per atom on average, including coordinates, atom types, element, charge and formal charge) while being unbelievably fast to parse.
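To illustrate how fixed-point packing gets a conformer record down to this order of size, here is a toy per-atom layout in Python. This is an invented scheme for illustration only, not Blaze's actual file format:

```python
import struct

# Toy per-atom record: three coordinates as 0.001-Å fixed-point int16,
# plus element number, atom type and formal charge as single bytes = 9 bytes.
# (int16 at 0.001 Å covers roughly ±32.7 Å, plenty for a small molecule.)
ATOM = struct.Struct("<hhhBBb")

def pack_atom(x, y, z, element, atom_type, charge):
    return ATOM.pack(round(x * 1000), round(y * 1000), round(z * 1000),
                     element, atom_type, charge)

def unpack_atom(buf):
    xi, yi, zi, element, atom_type, charge = ATOM.unpack(buf)
    return xi / 1000, yi / 1000, zi / 1000, element, atom_type, charge
```

Parsing such a record is a single `struct` call per atom, which is why formats like this can be both compact and very fast to read.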

Blaze has a multiple-step pipeline to filter the data set, so that the full 3D electrostatic shape and alignment algorithm is only applied to molecules that are likely to have a high score. For extremely large data sets there’s an initial filter by FieldPrint, an alignment-free fingerprint method that gives a crude measure of electrostatic similarity. The molecules that pass the filter then go into an ultrafast version of our 3D alignment and similarity algorithm, and the full similarity algorithm is applied only to the best 10% or so of these. As a result, Blaze can chew through millions of molecules very quickly on even a modest cluster. The processing capability of Blaze is further enhanced by the fact that there’s a GPU version which is even faster.
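The multi-step pipeline described above can be sketched as a generic progressive-filtering loop. The scoring functions here are placeholders; in Blaze the real stages are the FieldPrint filter, the ultrafast 3D method, and the full similarity calculation:

```python
def cascade_screen(molecules, stages):
    """stages: list of (score_fn, keep_fraction) pairs, cheapest stage first.
    Each stage scores the survivors of the previous one and keeps only the
    top fraction, so the expensive stages only ever see a few molecules."""
    survivors = list(molecules)
    for score_fn, keep_fraction in stages:
        scored = sorted(survivors, key=score_fn, reverse=True)
        survivors = scored[:max(1, int(len(scored) * keep_fraction))]
    return survivors
```

The point of the cascade is economics: if the first filter rejects 90% of the input, the full-accuracy scorer does 10× less work for essentially the same final hit list.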

Small versus large data sets

So, falign is designed for the simple use case on small sets of molecules, while Blaze is aimed at maximum computational and I/O efficiency on very large data sets. There is another important difference between the two. As anyone who’s been in charge of maintaining a virtual screening system knows, keeping it up to date is often a painful and thankless task. It’s bad enough keeping up with the weekly additions to the internal compound collection, but keeping track of updates to external vendors’ collections is difficult: not only are new compounds being added, but old ones are being retired. Blaze makes handling this situation easy. You simply provide Blaze with the new set of compounds that you want to be in the collection, and Blaze will automatically handle the update.


Any new compounds will be added, no-longer-available compounds will be marked and removed from the screening process, and any unchanged compounds will be left alone. This is far more computationally efficient than fully rebuilding the conformations for everything. Blaze can even be directly connected to your internal compound database, so that the Blaze collection holding your in-house compounds is always right up to date.
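The incremental update amounts to a three-way diff between the incoming compound list and the current collection. A minimal sketch, assuming compounds are keyed by a canonical structure:

```python
def plan_collection_update(current, incoming):
    """current, incoming: dicts mapping canonical structure -> compound name.
    Returns the three actions of an incremental collection update."""
    current_keys, incoming_keys = set(current), set(incoming)
    return {
        "add": incoming_keys - current_keys,     # new: conformations needed
        "retire": current_keys - incoming_keys,  # no longer available
        "keep": current_keys & incoming_keys,    # unchanged: left alone
    }
```

Only the "add" set incurs the expensive conformation precomputation, which is why this is so much cheaper than rebuilding the whole collection.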

Given how great Blaze is at handling virtual screening, why would you ever want to use falign?

Blaze is optimized for throughput and computational efficiency, but the downside of this is latency. If you have a set of compounds you want to align and score in Blaze, you have to upload them, wait for Blaze to process them and build the conformations, wait for Blaze to build its indices, initiate a search, and wait for it to be submitted to your cluster queueing system. There are five or ten minutes of latency in all of this, which is fine for a million molecules but overkill if you have only one hundred. Falign, by contrast, will start work straight away on your local machine with no waiting at all.

The answer to the falign vs Blaze question, then, is largely a question of scale. Got a dataset of a million molecules that you want to run repeated virtual screens against? Blaze is just the ticket. Got a small set of compounds that you want to align and score as a one-off? Forge and falign are just what you need. For our in-house work we tend to find that the tipping point occurs around a few thousand molecules. Falign can easily chew through this many in an hour or so (especially if plugged into your computing cluster using the Cresset Engine Broker). However, if there are more compounds than this, or we’re going to want to run multiple queries, then Blaze it is. Since Blaze is accessible through the Forge front end, and both are accessible through KNIME and Pipeline Pilot, it’s as easy as pie to pick the right tool for the job.

Try on your project

Request a free evaluation to try out Forge or Blaze for your small or large-scale virtual screening needs. Don’t have a cluster? Blaze Cloud and Blaze AWS provide simple ways to access cloud resources to do the number crunching for you.

 

Forge Design: New name, familiar environment

Forge Design is a new licensing level of Forge™ for medicinal and synthetic chemists. It replaces Torch™ and benefits from the familiar GUI, but with V10.6 enhancements.

What can Torch users expect from Forge Design?

The new graphics engine generates enhanced 3D objects, thus delivering strong performance, improved pictures and new smooth transitions between storyboard scenes (Figure 1).


Figure 1: The new graphics engine in Forge Design generates great pictures in an enhanced GUI providing strong performance, faster calculations, improved 2D display of molecules and smooth transitions between storyboard scenes.
 

For larger projects the GUI will be more responsive, with improved performance on common operations such as applying filters, calculating and interacting with custom plots, and exporting data. Activity Miner users will experience faster, less memory-intensive calculations.

The 2D display of molecules has been improved to make it clearer and more appealing.

New functionality

Forge Design has a significant number of new features and improvements compared to Torch V10.5. For example:

  • If you are working on a large project, there is a new function to automatically assign the selected molecules to roles, based on their Murcko scaffold
  • The Filters window includes a green/red toggle to control whether each filter is enabled or disabled, and pre-defined structural filters (for example, for groups like Ring, Aromatic Ring, H-bond donor and H-bond acceptor)
  • There is a new function to show the chosen field surfaces as a difference between the two molecules in the 3D display.


Figure 2: New functions in Forge Design include an option to automatically assign molecules to roles based on their Murcko scaffold, improved filters and the new ‘Field Difference’ button to show the chosen field surfaces as a difference between the two molecules in the 3D display. In this case the mono-fluoro derivative on the right is more positive (red) where the F is changed to H, but also has a more negative aromatic ring.

  • Improved Molecule Editor, enabling you to conformation hunt and align all the molecules created during the same editing session as you exit the editor
  • Improved Blaze™ results window, which now shows an enrichment plot and statistics for each Blaze refinement level
  • The ideal radial plot profile for your project and the custom settings used for the conformation hunt and alignment can be shared with your team using the new import/export functions
  • Improved display of bonds in the 3D window, greater readability of constraints labels, more intuitive display of interactions between the reference molecules and the protein
  • New function to clip only the display of the protein leaving the ligands untouched
  • New functions to choose column content as the molecule title, or as a label in the 3D window
  • Improved plots now showing a regression line for selected molecules
  • For Activity Miner users, the Disparity matrix can now be filtered by Similarity, Disparity and Δ Activity; there is also a new ‘Find Molecule’ function. Also, you can now tag all the molecules visible in the Activity View in the Forge Molecules table.


Figure 3: In the Forge 10.6 Activity Miner window, the disparity matrix can be filtered by Similarity, Disparity and Δ Activity; molecules that do not pass the filter(s) are shown in gray.

How does Forge Design compare to Torch and Forge?

Figure 4 below shows the modules which are common between Forge and Forge Design (red), and the optional modules in Forge only (blue).


Figure 4: Modules available in Forge only (blue), and modules available in Forge and Forge Design (red).
 

Forge Design uses wizards for common operations just as with the full Forge package.  However, the wizards for building activity models and pharmacophores will be greyed out, as these are optional in Forge Design. The processing window will look slightly different (Figure 5 – right), with the optional Build Model section greyed out.


Figure 5: The wizard and functions available only in full Forge are greyed out in Forge Design.
 

You will find new, pre-defined roles in the Molecules table for training, test and prediction sets. In Forge Design, these roles have no special meaning and you can use them as any other user-created role or ignore them completely.

Simplicity and integration

Having a streamlined platform for ligand-based software makes it easier for existing Forge and Torch users to upgrade their installation to the newer release, with a single installer for both Forge and Forge Design.

This solution also makes it easier to upgrade Forge Design with additional functionality (for example, Activity Miner or the model building package), if desired.

This is a further step towards the integration of all Cresset ligand-based and structure-based functionality and simplifies the product installation and distribution for most customers.

Try Forge Design

If you are an existing Torch user, we will be in contact soon with more information on Forge Design. If you don’t have Forge or Forge Design, contact us to learn more.