The Spark databases are derived from commercially available screening compounds (ZINC[1,2] drug-like and eMolecules screening compounds[3]), literature reports (ChEMBL[4]), theoretical ring systems (VEHICLe[5]) and commercial reagents (eMolecules building blocks[6]). The larger databases are split according to how frequently each fragment occurs. A Spark database derived from the Cambridge Structural Database[7] is also available. In addition, you can use the Spark Database Generator to create your own Spark databases.
Fragments from screening compounds
- VeryCommon (388 MB) – fragments which appear in 650 or more molecules
- Common (968 MB) – fragments which appear in 140-649 molecules
- LessCommon (1.9 GB) – fragments which appear in 35-139 molecules
- Rare (3.1 GB) – fragments which appear in 12-34 molecules
- VeryRare (5.1 GB) – fragments which appear in 5-11 molecules
- ExtremelyRare (5.9 GB) – fragments which appear in 3-4 molecules.
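The frequency bins above map an occurrence count (the number of molecules a fragment appears in) to a single database. The sketch below shows that mapping. It assumes that a fragment seen in exactly 650 molecules counts as VeryCommon, and the function name `screening_database_for` is hypothetical rather than part of any Spark API.

```python
from typing import Optional

# Inclusive lower bound of each screening-compound database, highest first.
# Assumption: a count of exactly 650 is treated as VeryCommon.
SCREENING_BINS = [
    (650, "VeryCommon"),
    (140, "Common"),
    (35, "LessCommon"),
    (12, "Rare"),
    (5, "VeryRare"),
    (3, "ExtremelyRare"),
]


def screening_database_for(count: int) -> Optional[str]:
    """Return the database a fragment seen in `count` molecules falls into,
    or None if it occurs in fewer than 3 molecules (below ExtremelyRare)."""
    for lower_bound, name in SCREENING_BINS:
        if count >= lower_bound:
            return name
    return None


if __name__ == "__main__":
    for n in (1000, 649, 139, 34, 11, 4, 2):
        print(n, screening_database_for(n))
```

Checking the bounds from the highest bin down means each count lands in exactly one database, with no gaps between adjacent ranges.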
In general, fragments from the VeryCommon or Common databases are more likely to be readily synthesizable, as they appear in many different commercially available molecules. Fragments from the Rare, VeryRare and ExtremelyRare databases are more likely to be non-drug-like or hard to make. These databases have been filtered to remove potentially toxic or reactive fragments (such as alkyl halides or nitroso groups). In addition, all phosphorus-containing fragments have been removed, as the calculation of fields on phosphorus-containing functional groups is still under development. See a detailed analysis of these databases. Two optional databases are also available:
- Doubleton (8.9 GB) – fragments which appear in 2 molecules
- Singleton (23.7 GB) – fragments which appear in a single molecule.
Contact support if you wish to download these databases.
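The removal of phosphorus-containing fragments described above can be illustrated with a minimal sketch. It assumes fragments are stored as SMILES strings; the function names are hypothetical, and this is not Cresset's implementation (a production pipeline would more likely use a cheminformatics toolkit). The only subtlety is that symbols such as `[Pt]` or `[Pb]` start with "P" but are not phosphorus.

```python
import re
from typing import Iterable, List

# A bracket atom such as [31P], [pH], [Pt+2]: optional isotope digits, then the
# element symbol (one letter, optionally followed by one lowercase letter).
_BRACKET_ATOM = re.compile(r"\[\d*([A-Za-z][a-z]?)[^\]]*\]")


def contains_phosphorus(smiles: str) -> bool:
    """Return True if the SMILES string contains a phosphorus atom."""
    # Check bracket atoms by element symbol so [Pt], [Pd], [Pb] are not matched.
    for symbol in _BRACKET_ATOM.findall(smiles):
        if symbol in ("P", "p"):
            return True
    # Outside brackets only the organic subset (B C N O P S F Cl Br I and
    # aromatic b c n o p s) is allowed, so any remaining P or p is phosphorus.
    outside = _BRACKET_ATOM.sub("", smiles)
    return "P" in outside or "p" in outside


def remove_phosphorus_fragments(fragments: Iterable[str]) -> List[str]:
    """Keep only fragments without phosphorus, preserving their order."""
    return [smi for smi in fragments if not contains_phosphorus(smi)]


if __name__ == "__main__":
    print(remove_phosphorus_fragments(["c1ccccc1", "OP(=O)(O)O", "[Pt]", "c1ccpcc1"]))
```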
Fragments from ChEMBL
The current ChEMBL Spark databases are based on Release 23 of ChEMBL and are split based on the frequency of occurrence of the fragments.
- ChEMBL_common (2.1 GB) – fragments which appear in more than 6 molecules
- ChEMBL_rare (3.6 GB) – fragments which appear in 2-6 molecules
- ChEMBL_veryrare (3.8 GB) – fragments which appear in a single molecule.
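The ChEMBL split can be sketched the same way: count the distinct molecules each fragment appears in, then look the count up against the inclusive lower bounds of each database. The input data and function name below are hypothetical, and `bisect_right` is used simply as one convenient way to perform the lookup.

```python
from bisect import bisect_right
from collections import Counter

# Inclusive lower bounds of each ChEMBL database in ascending order:
# veryrare = 1 molecule, rare = 2-6 molecules, common = more than 6 (7 or more).
_LOWER_BOUNDS = [1, 2, 7]
_NAMES = ["ChEMBL_veryrare", "ChEMBL_rare", "ChEMBL_common"]


def chembl_database_for(count: int) -> str:
    """Return the ChEMBL database for a fragment seen in `count` molecules."""
    if count < 1:
        raise ValueError("a fragment must occur in at least one molecule")
    # bisect_right returns how many lower bounds are <= count; minus 1 gives the index.
    return _NAMES[bisect_right(_LOWER_BOUNDS, count) - 1]


# Hypothetical input: the set of fragment identifiers found in each molecule.
# Using sets means each fragment is counted at most once per molecule.
molecules = [{"fragA", "fragB"}, {"fragA"}, {"fragA", "fragC"}]
counts = Counter(frag for frags in molecules for frag in frags)

if __name__ == "__main__":
    for frag, n in sorted(counts.items()):
        print(frag, n, chembl_database_for(n))
```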
Spark Reagent Databases
Spark Reagent Databases are derived from eMolecules building blocks[6] using the Cresset reagent importer, which converts a file of usable reagents into the corresponding R-groups. For example, to create the eMolecules_acid database, all eMolecules building blocks containing a C(=O)OH or C(=O)Cl group were processed and the resulting R-groups were added to the database.
Using databases derived from available reagents ensures that the results of your Spark experiment are tied to molecules that are readily synthetically accessible. These databases are updated monthly, so they provide up-to-date availability information for the reagents that you wish to use.
The current list of Spark Reagent Databases includes 22 common chemical transformations. See a detailed analysis of these databases.
Spark CSD Fragment Database
The Spark CSD Fragment Database is a collection of fragments derived from the small molecule crystal structures in the Cambridge Structural Database[7] (CSD). A valid CSD-System license is required to use this database. If you do not already have a license, please contact CCDC for assistance.
Create your own database
Spark fragment and reagent databases provide an excellent source of new bioisosteres. However, if you have access to significant proprietary chemistry or to specialized reagents, or simply want to consider only fragments from reagents that you have in stock, then creating your own custom databases will add value to your Spark experiments.
The Spark Database Generator is a user-friendly interface within Spark that lets you easily create custom databases. Contact us to obtain a license for the Spark Database Generator.
- Sterling and Irwin, J. Chem. Inf. Model. 2015. DOI: 10.1021/acs.jcim.5b00559
- Irwin, Sterling, Mysinger, Bolstad and Coleman, J. Chem. Inf. Model. 2012. DOI: 10.1021/ci3001277
- Pitt, Parry, Perry and Groom, J. Med. Chem. 2009, 52 (9), 2952-2963. DOI: 10.1021/jm801513z