New release of Spark databases

A new release of the Spark™ fragment and reagent databases is now available for download, to accompany the release of Spark 10.5. These are designed to provide you with an excellent source of new bioisosteres, whilst also ensuring that the results of your Spark experiment are tethered to molecules which are readily synthetically accessible.

Fragment Databases

In this release we have made significant additions to the source of our fragment databases. The new Spark ‘Commercial’ databases (replacing the previous ZINC fragments) use the combination of ZINC15 and the eMolecules Screening Compounds and include significant new chemical diversity.

The Spark ‘ChEMBL’ databases have also been updated and are based on release 23 of ChEMBL.

In all cases, the compounds in the entire source collection were filtered to remove potentially toxic or reactive fragments. They were then fragmented, and the frequency with which each fragment appeared in the original source database was annotated. The fragments were then labelled according to the number of bonds that were broken to obtain them, and sorted by frequency into the databases shown in the table below.

| Spark category | Database | Total number of fragments (to nearest 1000) | Frequency |
|---|---|---|---|
| Commercial | VeryCommon | 64,000 | Fragments which appear in more than 650 molecules |
| Commercial | Common | 137,000 | Fragments which appear in 140-649 molecules |
| Commercial | LessCommon | 256,000 | Fragments which appear in 35-139 molecules |
| Commercial | Rare | 401,000 | Fragments which appear in 12-34 molecules |
| Commercial | VeryRare | 675,000 | Fragments which appear in 5-11 molecules |
| Commercial | ExtremelyRare | 749,000 | Fragments which appear in 3-4 molecules |
| ChEMBL | Common | 306,000 | Fragments which appear in more than 6 molecules |
| ChEMBL | Rare | 506,000 | Fragments which appear in 2-6 molecules |
| ChEMBL | Very rare | 570,000 | Fragments which appear in a single molecule |
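As an illustration of the frequency banding in the table, the following minimal Python sketch assigns a 'Commercial' fragment to a database from the number of source molecules it appears in. It is not the procedure used to build the Spark databases: the function and constant names are hypothetical, and the treatment of a count of exactly 650 (which the table leaves open) is an assumption.

```python
from bisect import bisect_right

# Inclusive lower bound of each 'Commercial' frequency band, taken from the table.
# Assumption: a count of exactly 650 is placed in VeryCommon, because the table
# lists "more than 650" and "140-649" and does not say where 650 itself belongs.
COMMERCIAL_BANDS = [
    (3, "ExtremelyRare"),
    (5, "VeryRare"),
    (12, "Rare"),
    (35, "LessCommon"),
    (140, "Common"),
    (650, "VeryCommon"),
]


def commercial_category(molecule_count):
    """Return the band for a fragment seen in `molecule_count` source molecules.

    Returns None for counts below the lowest band (fewer than 3 molecules),
    which the table does not list for the 'Commercial' databases.
    """
    lower_bounds = [bound for bound, _ in COMMERCIAL_BANDS]
    index = bisect_right(lower_bounds, molecule_count) - 1
    return COMMERCIAL_BANDS[index][1] if index >= 0 else None


# Hypothetical fragment frequencies, sorted from most to least frequent.
frequencies = {"fragment_a": 1200, "fragment_b": 36, "fragment_c": 4, "fragment_d": 1}
for name, count in sorted(frequencies.items(), key=lambda item: -item[1]):
    print(name, count, commercial_category(count))
```

The ChEMBL bands could be expressed the same way with lower bounds of 1, 2 and 7 molecules.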


Overall, the new Spark databases include over 3 million fragments which can be used to identify novel bioisosteres for your project. Figure 1 plots the number of fragments in each database by number of connection points.

Figure 1: Count of fragments in Spark ‘Commercial’ (from ZINC15 and eMolecules’ Screening Compounds) and ‘ChEMBL’ (from ChEMBL23) databases split by the number of connection points of each fragment.

An analysis of the numbers of fragments in common between the ‘Commercial’ and ‘ChEMBL’ Spark databases (expressed as percent overlap with respect to ChEMBL) reveals that the databases overall show an excellent level of complementarity.
| % overlap with ChEMBL | Very Common | Common | Less Common | Rare | Very Rare | Extremely Rare |
|---|---|---|---|---|---|---|
| ChEMBL common | 16% | 18% | 15% | 10% | 8% | 4% |
| ChEMBL rare | 1% | 5% | 9% | 10% | 10% | 7% |
| ChEMBL very rare | 0% | 2% | 4% | 6% | 7% | 5% |

Not surprisingly, the most common fragments in each database show the greatest overlap. However, the majority of ‘rare’ fragments appear to be unique to each database, showing that the original ZINC plus eMolecules’ Screening Compounds and the ChEMBL collections occupy quite distinct parts of chemical space.
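A percent overlap "with respect to ChEMBL" can be read as the share of fragments in a ChEMBL database that also occur in a given Commercial database. The short Python sketch below shows that calculation on two toy sets. It assumes each fragment is identified by a unique string (for example a canonical SMILES); the function name and the example fragments are hypothetical and are not taken from the Spark databases.

```python
def percent_overlap(reference, other):
    """Percentage of the fragments in `reference` that also occur in `other`."""
    reference, other = set(reference), set(other)
    if not reference:
        return 0.0
    return 100.0 * len(reference & other) / len(reference)


# Toy data only: fragment identifiers are assumed to be canonical strings.
chembl_common = {"c1ccccc1", "C1CCNCC1", "c1ccncc1", "C1CCOCC1"}
commercial_common = {"c1ccccc1", "C1CCOCC1", "c1ccsc1"}

# Two of the four ChEMBL fragments are shared, so the overlap is 50%.
print(percent_overlap(chembl_common, commercial_common))
```

The measure is asymmetric: swapping the arguments divides by the size of the Commercial set instead, which is why the reference collection needs to be stated alongside the percentages.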

Reagent databases

Monthly updates of the Spark reagent databases, derived from the eMolecules building blocks using an enhanced set of rules for chemical transformation, will continue with this release. The February edition includes over 500,000 reagents with up-to-date availability information, making it easy for you to move from the results of a Spark experiment to ordering the reagents you need to turn these results into reality.

The number of fragments in each reagent database is plotted in Figure 2.

Figure 2: Number of fragments in the Spark eMolecules reagent databases.

Each fragment in the eMolecules database is linked back to both the eMolecules ID for the source reagent and its availability. The advanced filtering capabilities in Spark (Figure 3) make it very easy to choose the optimal set of reagents for your experiment based on the Spark similarity score, preferred chemistry (as encoded by the reagent database which generated the result), availability information and the overall physico-chemical profile of the result molecules.

Figure 3: Spark reagent results include availability information from eMolecules.

The eMolecules IDs for your favorite reagents can be easily exported from Spark and used to purchase the compounds from the eMolecules building blocks database, as shown in the web clip ‘How to use the eMolecules reagents databases in Spark’.

Create your own database

Spark fragment and reagent databases provide an excellent source of new bioisosteres. However, if you have access to significant proprietary chemistry or specialized reagents, or simply want to consider only fragments from reagents that you have in stock, then creating custom databases will add value to your Spark experiments.

The Spark Database Generator is a user-friendly interface within Spark that lets you easily create custom databases.

Figure 4: The Spark Database Generator.


This release of the fragment databases significantly increases the chemical diversity available to Spark users, while the monthly updates of the reagent databases ensure that the results of your Spark experiment are tethered to molecules which are readily synthetically accessible.

We are confident that these new and updated Spark fragment and reagent databases, combined with databases from your corporate collections generated with the Spark Database Generator, will provide an even better range of bioisosteres for your project.

Please contact us to update to the latest databases, to access the Spark Database Generator, or to find out how Spark can impact your project.
