A new release of the Spark™ fragment and reagent databases is now available for download, to accompany the release of Spark 10.5. These are designed to provide you with an excellent source of new bioisosteres, whilst also ensuring that the results of your Spark experiment are tethered to molecules which are readily synthetically accessible.
In this release we have made significant additions to the source of our fragment databases. The new Spark ‘Commercial’ databases (replacing the previous ZINC fragments) use the combination of ZINC15 and the eMolecules Screening Compounds and include significant new chemical diversity.
The Spark ‘ChEMBL’ databases have also been updated and are based on release 23 of ChEMBL.
In all cases, the compounds in the entire source collection were filtered to remove potentially toxic or reactive fragments. They were then fragmented, and the frequency with which each fragment appeared in the original source database was annotated. The fragments were then sorted by frequency and labelled according to the number of bonds that were broken to obtain the fragment, as shown in the table below.
Spark Category | Database | Total number of fragments (to nearest 1000) | Frequency |
Commercial | VeryCommon | 64,000 | Fragments which appear in more than 650 molecules |
Commercial | Common | 137,000 | Fragments which appear in 140-649 molecules |
Commercial | LessCommon | 256,000 | Fragments which appear in 35-139 molecules |
Commercial | Rare | 401,000 | Fragments which appear in 12-34 molecules |
Commercial | VeryRare | 675,000 | Fragments which appear in 5-11 molecules |
Commercial | ExtremelyRare | 749,000 | Fragments which appear in 3-4 molecules |
ChEMBL | Common | 306,000 | Fragments which appear in more than 6 molecules |
ChEMBL | Rare | 506,000 | Fragments which appear in 2-6 molecules |
ChEMBL | Very rare | 570,000 | Fragments which appear in a single molecule |
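To make the frequency binning concrete, here is a minimal Python sketch of how fragments could be assigned to the Commercial categories using the thresholds in the table above. It is an illustration only, not the procedure used to build the Spark databases: the function names are hypothetical, fragments are assumed to be represented as plain strings, and a count of exactly 650 is assumed to fall into VeryCommon because the table leaves that boundary open.

```python
from collections import Counter

# Lower bounds taken from the Commercial rows of the table above.
# Assumption: a count of exactly 650 is treated as VeryCommon.
COMMERCIAL_BINS = [
    (650, "VeryCommon"),
    (140, "Common"),
    (35, "LessCommon"),
    (12, "Rare"),
    (5, "VeryRare"),
    (3, "ExtremelyRare"),
]


def commercial_category(n_molecules):
    """Return the category for a fragment seen in n_molecules molecules."""
    for lower_bound, name in COMMERCIAL_BINS:
        if n_molecules >= lower_bound:
            return name
    return None  # fewer than 3 molecules: not covered by the table


def bin_fragments(fragments_per_molecule):
    """Group fragments by category, counting each fragment once per molecule."""
    counts = Counter()
    for fragments in fragments_per_molecule:
        counts.update(set(fragments))
    binned = {}
    for fragment, n_molecules in counts.items():
        category = commercial_category(n_molecules)
        if category is not None:
            binned.setdefault(category, []).append(fragment)
    return binned
```

For example, `commercial_category(200)` falls in the 140-649 band and so returns `"Common"`.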
Overall, the new Spark databases include over 3 million fragments which can be used to identify novel bioisosteres for your project. Figure 1 plots the number of fragments in each database per connection point count.
% overlap with ChEMBL | Very Common | Common | Less Common | Rare | Very Rare | Extremely Rare |
ChEMBL common | 16% | 18% | 15% | 10% | 8% | 4% |
ChEMBL rare | 1% | 5% | 9% | 10% | 10% | 7% |
ChEMBL very rare | 0% | 2% | 4% | 6% | 7% | 5% |
Not surprisingly, the most common fragments in each database show significant overlap. However, the majority of ‘rare’ fragments appear to be unique to each database, showing that the original ZINC plus eMolecules Screening Compounds collection and the ChEMBL collection occupy quite distinct parts of chemical space.
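As a rough illustration of what a percentage overlap between two fragment databases means, the sketch below computes the share of fragments in one database that also occur in another. How the figures in the table were actually calculated is not stated here, so the choice of denominator (the size of the first database), the function name, and the representation of fragments as comparable strings are all assumptions.

```python
def percent_overlap(database, reference):
    """Percentage of distinct fragments in `database` also present in `reference`.

    Assumption: fragments are given as comparable identifiers (for example
    canonical strings), and the percentage is relative to `database`.
    """
    database = set(database)
    reference = set(reference)
    if not database:
        return 0.0
    return 100.0 * len(database & reference) / len(database)
```

With four distinct fragments in the first database and one of them also present in the second, `percent_overlap` returns 25.0.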
Monthly updates of the Spark reagent databases, derived from the eMolecules building blocks using an enhanced set of rules for chemical transformation, will also continue with this release. The February edition includes over 500,000 reagents with up-to-date availability information, making it easy for you to move from the results of a Spark experiment to ordering the reagents you require to turn these results into reality.
The number of fragments in each reagent database is plotted in Figure 2.
Each fragment in the eMolecules database is linked back to both the eMolecules ID for the source reagent and its availability. The advanced filtering capabilities in Spark (Figure 3) make it very easy to choose the optimal set of reagents for your experiment based on the Spark similarity score, preferred chemistry (as encoded by the reagent database which generated the result), availability information and overall physico-chemical profile of the result molecules.
The eMolecules IDs for your favorite reagents can be easily exported from Spark and used to purchase the compounds from the eMolecules building blocks database, as shown in the web clip ‘How to use the eMolecules reagents databases in Spark’.
Spark fragment and reagent databases provide an excellent source of new bioisosteres. However, if you have access to significant proprietary chemistry or to specialized reagents, or simply want to consider only fragments from reagents that you have in stock, then creating custom databases will add value to your Spark experiments.
The Spark Database Generator is a user-friendly interface within Spark that lets you easily create custom databases.
This release of the fragment databases significantly increases the chemical diversity available to Spark users, while the monthly updates of the reagent databases ensure that the results of your Spark experiment are tethered to molecules which are readily synthetically accessible.
We are confident that these new and updated Spark fragment and reagent databases, combined with databases from your corporate collections generated with the Spark Database Generator, will provide an even better range of bioisosteres for your project.
Please contact us to update to the latest databases, if you wish to access the Spark Database Generator, or to find out how Spark can impact your project.