Current Spark databases

The currently released databases for Spark are listed below. Cresset releases databases based on commercially available screening compounds (from ZINC [1,2] drug-like and eMolecules screening compounds [3]), on literature reports (ChEMBL [4]), on theoretical rings (VEHICLe [5]), and on commercial reagents (eMolecules building blocks [6]). The larger databases are split according to how frequently their fragments occur.

Fragments from screening compounds

Optional databases (contact Cresset support to download these databases):

In general, fragments from the Very Common or Common databases are more likely to be readily synthesizable, as they appear in many different commercially available molecules. Fragments from the Rare, Very Rare and Extremely Rare databases are more likely to be non-drug-like or hard to make.
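The idea of splitting a fragment collection by frequency of occurrence can be sketched as follows. This is a minimal illustration only: the tier names follow the database names used on this page, but the numeric thresholds and the `assign_tiers` helper are hypothetical, since the actual cut-offs used to build the Spark databases are not stated here.

```python
from collections import Counter

# Hypothetical tier boundaries: (minimum occurrence count, tier name).
# These numbers are illustrative assumptions, not the cut-offs used by Cresset.
TIERS = [
    (1000, "VeryCommon"),
    (100, "Common"),
    (30, "LessCommon"),
    (10, "Rare"),
    (3, "VeryRare"),
    (1, "ExtremelyRare"),
]


def assign_tiers(fragment_occurrences):
    """Map each distinct fragment to a tier based on how often it occurs.

    `fragment_occurrences` is an iterable with one entry (for example a
    canonical SMILES string) per occurrence of a fragment in the source
    compound collection.
    """
    counts = Counter(fragment_occurrences)
    tiers = {}
    for fragment, n in counts.items():
        # TIERS is ordered from most to least frequent, so the first
        # threshold that is met determines the tier.
        for minimum, name in TIERS:
            if n >= minimum:
                tiers[fragment] = name
                break
    return tiers


if __name__ == "__main__":
    # Toy input: three fragments seen 1200, 150 and 2 times respectively.
    occurrences = ["c1ccccc1"] * 1200 + ["C1CCNCC1"] * 150 + ["C1CC1"] * 2
    for fragment, tier in assign_tiers(occurrences).items():
        print(fragment, tier)
```

A fragment that recurs across many purchasable molecules lands in a higher tier, which is why the Very Common and Common databases tend to contain the more synthetically accessible fragments.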

Note that these databases have been filtered to remove potentially toxic or reactive fragments (such as alkyl halides or nitroso functionalities). In addition, all phosphorus-containing fragments have been removed as the calculation of fields on phosphorus-containing functional groups is still under development.

Fragments from ChEMBL

The current Spark databases from ChEMBL are based on release 23 of ChEMBL and again have been split based on the frequency of occurrence of the fragments.

Reagents

Spark reagent databases are derived from eMolecules building blocks using Cresset’s reagent importer, which converts a file of usable reagents into the corresponding R-groups. For example, to create the eMolecules_acid database, all the eMolecules building blocks containing a C(=O)OH or C(=O)Cl group were processed to add the R-group to the database. Using databases derived from available reagents ensures that the results of your Spark experiment are tethered to molecules which are readily synthetically accessible.

Analysis of fragment databases

Database overlaps (number of fragments present in both databases)

ChEMBL release | ChEMBL database | VeryCommon | Common | LessCommon | Rare | VeryRare | ExtremelyRare
ChEMBL23 | ChEMBL_common | 47,988 | 53,662 | 46,596 | 31,269 | 23,411 | 13,343
ChEMBL23 | ChEMBL_rare | 5,887 | 26,924 | 45,604 | 49,965 | 50,019 | 32,963
ChEMBL23 | ChEMBL_veryrare | 1,863 | 10,559 | 23,370 | 31,351 | 39,765 | 31,220
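To put the overlap counts in context, they can be expressed as a fraction of each screening-compound database. The sketch below does this for the ChEMBL_common row, using the overlap counts above and the total fragment counts reported in the "Fragment and connection point counts" table. The variable and function names are illustrative only.

```python
# Total fragments per screening-compound database
# (from the "Fragment and connection point counts" table).
totals = {
    "VeryCommon": 63767,
    "Common": 137279,
    "LessCommon": 255979,
    "Rare": 401110,
    "VeryRare": 674993,
    "ExtremelyRare": 749135,
}

# Fragments shared with ChEMBL_common (from the overlap table).
overlap_with_chembl_common = {
    "VeryCommon": 47988,
    "Common": 53662,
    "LessCommon": 46596,
    "Rare": 31269,
    "VeryRare": 23411,
    "ExtremelyRare": 13343,
}


def overlap_fraction(overlap, total):
    """Fraction of a database's fragments that also occur in the other database."""
    return overlap / total


fractions = {
    name: overlap_fraction(overlap_with_chembl_common[name], totals[name])
    for name in totals
}

for name, fraction in fractions.items():
    print(f"{name}: {100 * fraction:.1f}% also in ChEMBL_common")
```

Roughly three quarters of the VeryCommon fragments also appear in ChEMBL_common, and the shared fraction falls steadily towards the rarer databases.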

Fragment and connection point counts

Database | Total fragments | 1 connection | 2 connections | 3 connections | 4 connections | Rings only
VeryCommon | 63,767 | 20,561 | 26,978 | 13,072 | 3,156 | 1,955
Common | 137,279 | 42,778 | 56,701 | 29,560 | 8,240 | 1,923
LessCommon | 255,979 | 72,051 | 105,509 | 59,698 | 18,721 | 2,993
Rare | 401,110 | 108,290 | 163,388 | 95,357 | 34,075 | 3,589
VeryRare | 674,993 | 216,363 | 257,252 | 147,774 | 53,604 | 5,885
ExtremelyRare | 749,135 | 263,302 | 268,267 | 157,186 | 60,380 | 5,914
ChEMBL_common | 305,932 | 58,924 | 115,610 | 90,337 | 41,061 | 7,221
ChEMBL_rare | 506,157 | 104,102 | 180,027 | 147,093 | 74,935 | 7,683
ChEMBL_veryrare | 569,498 | 143,888 | 203,965 | 148,167 | 73,478 | 7,960
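When reading this table, note that the four connection-point columns add up to the total for each database, while the rings-only count is a separate figure rather than a fifth category. The short sketch below checks this for every row. The data is copied from the table above, and the helper names are illustrative.

```python
# Rows copied from the table above:
# name, total, 1 to 4 connection counts, rings only.
ROWS = """\
VeryCommon 63,767 20,561 26,978 13,072 3,156 1,955
Common 137,279 42,778 56,701 29,560 8,240 1,923
LessCommon 255,979 72,051 105,509 59,698 18,721 2,993
Rare 401,110 108,290 163,388 95,357 34,075 3,589
VeryRare 674,993 216,363 257,252 147,774 53,604 5,885
ExtremelyRare 749,135 263,302 268,267 157,186 60,380 5,914
ChEMBL_common 305,932 58,924 115,610 90,337 41,061 7,221
ChEMBL_rare 506,157 104,102 180,027 147,093 74,935 7,683
ChEMBL_veryrare 569,498 143,888 203,965 148,167 73,478 7,960
"""


def parse_row(line):
    """Split one row into (name, total, connection counts, rings only)."""
    name, *cells = line.split()
    # Strip the thousands separators before converting to integers.
    total, c1, c2, c3, c4, rings = (int(c.replace(",", "")) for c in cells)
    return name, total, (c1, c2, c3, c4), rings


def check_rows(text):
    """Return {database: True/False} for 'connection counts sum to total'."""
    results = {}
    for line in text.strip().splitlines():
        name, total, by_connections, _rings = parse_row(line)
        results[name] = sum(by_connections) == total
    return results


if __name__ == "__main__":
    for name, consistent in check_rows(ROWS).items():
        print(name, "consistent" if consistent else "MISMATCH")
```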

Number of fragments within specified molecular weight range

MW 1-50 51-100 101-150 151-200 201-250
VeryCommon 242 8,038 31,920 21,761 1,806
Common 83 7,199 54,112 65,463 10,422
LessCommon 71 9,574 88,971 132,195 25,168
Rare 65 10,471 118,174 223,556 48,844
VeryRare 80 13,832 170,413 392,218 98,450
ExtremelyRare 38 12,326 169,805 445,774 121,192
ChEMBL_common 406 20,711 116,539 140,216 28,060
ChEMBL_rare 78 16,376 146,306 266,449 76,948
ChEMBL_veryrare 52 12,377 139,159 311,235 106,675

Atom count distribution

Heavy atoms (NH) 1-2 3-4 5-6 7-8 9-10 11-12 13-14 15-16 17-18
VeryCommon 82 726 4,238 13,349 20,616 17,567 6,584 581 24
Common 10 386 3,893 16,543 38,771 47,318 26,398 3,812 148
LessCommon 11 443 4,699 24,840 65,901 90,673 58,722 10,252 438
Rare 5 459 4,960 28,482 89,754 150,597 108,017 18,108 728
VeryRare 9 545 6,555 37,290 130,851 257,636 206,767 34,218 1,122
ExtremelyRare 6 381 5,715 35,215 132,847 286,015 240,270 46,980 1,706
ChEMBL_common 99 1,563 10,847 37,648 76,306 93,110 65,940 18,656 1,763
ChEMBL_rare 8 598 7,796 39,192 101,138 155,314 144,324 51,308 6,479
ChEMBL_veryrare 9 402 5,758 33,704 98,244 171,253 179,651 71,461 9,016

Number of fragments with specified number of rotatable bonds

Num Rot Bonds 0 1 2 3 4 5
VeryCommon 7,034 19,489 21,637 12,265 2,770 572
Common 10,037 37,173 53,159 29,649 6,200 1,061
LessCommon 17,162 67,293 101,707 56,363 11,582 1,872
Rare 24,859 107,547 166,474 84,939 14,983 2,308
VeryRare 44,647 194,410 284,176 128,930 19,993 2,837
ExtremelyRare 50,394 223,839 312,879 139,748 19,489 2,786
ChEMBL_common 32,169 92,544 106,420 56,077 15,269 3,453
ChEMBL_rare 53,602 158,152 179,852 88,071 21,957 4,523
ChEMBL_veryrare 61,668 186,095 204,502 91,853 21,267 4,113

Number of fragments with specified number of conformations

Number of conformations 1-5 6-10 11-15 16-20 21-25 26-30
VeryCommon 39,276 10,458 4,964 2,845 1,746 4,478
Common 78,474 23,414 11,674 6,953 4,584 12,180
LessCommon 142,525 43,450 22,631 13,545 8,784 25,044
Rare 221,938 69,254 36,562 21,464 13,622 38,270
VeryRare 372,087 118,674 62,752 36,669 23,184 61,627
ExtremelyRare 399,744 135,833 72,734 43,448 27,083 70,293
ChEMBL_common 191,767 45,240 21,663 13,046 8,452 25,764
ChEMBL_rare 308,175 81,731 38,174 22,456 13,956 41,665
ChEMBL_veryrare 357,970 88,586 41,857 23,447 14,621 43,017

Analysis of reagent databases

Number of fragments within specified molecular weight range

The figures below are approximate. The exact number of fragments may change over time as the reagent databases are updated on a monthly basis.

Make sure you keep your reagent databases up to date by following the instructions at Installing Spark databases.

Database | Description | Total | MW 1-50 | MW 51-100 | MW 101-150 | MW 151-200 | MW 201-250
eMolecules_acidCO | Acids, keep the CO | 36,871 | 3 | 413 | 8,751 | 22,336 | 5,368
eMolecules_acid | Acids, delete the -COOH | 69,920 | 45 | 3,168 | 24,065 | 34,084 | 8,558
eMolecules_alcohol | Aliphatic alcohols, delete the O | 37,834 | 7 | 1,442 | 11,963 | 20,051 | 4,371
eMolecules_alcoholO | Alcohols, keep the O | 29,491 | 3 | 435 | 8,268 | 16,839 | 3,946
eMolecules_aliphatic_halide | Aliphatic halide | 19,013 | 13 | 1,081 | 7,063 | 8,991 | 1,865
eMolecules_alkyne | Alkynes, delete the -C#C | 6,114 | 17 | 712 | 3,197 | 1,859 | 329
eMolecules_aromatic_alcoholO | Aromatic alcohols, keep the O | 11,680 | 0 | 56 | 2,230 | 7,278 | 2,116
eMolecules_aromatic_aminesN | Aromatic amines, keep the N | 31,228 | 0 | 116 | 5,026 | 19,935 | 6,151
eMolecules_aromatic_halide | Aromatic halide | 63,212 | 7 | 450 | 16,161 | 40,454 | 6,140
eMolecules_boronic | Aromatic boronic acids, delete -B(OH)2 | 4,534 | 0 | 148 | 1,944 | 2,106 | 336
eMolecules_cyano | Cyano groups, delete -CN | 26,287 | 17 | 1,217 | 8,959 | 12,810 | 3,284
eMolecules_isocyanateCO | Isocyanates, keep -NCO | 1,043 | 0 | 14 | 294 | 594 | 141
eMolecules_olefin | Olefins, delete the -C=C | 6,031 | 13 | 666 | 2,950 | 1,982 | 420
eMolecules_primary_aliphatic_amine | Primary aliphatic amines, delete the N | 42,204 | 6 | 1,479 | 14,645 | 22,461 | 3,613
eMolecules_primary_aliphatic_amineN | Primary aliphatic amines, keep the N | 20,221 | 0 | 404 | 6,994 | 11,192 | 1,631
eMolecules_primary_aliphatic_halide | Primary aliphatic halide | 13,253 | 12 | 737 | 5,330 | 5,884 | 1,290
eMolecules_primary_aromatic_amines | Primary aromatic amines, delete N | 43,191 | 0 | 348 | 9,291 | 25,883 | 7,669
eMolecules_secondary_aliphatic_amineN | Secondary aliphatic amines, keep the N | 30,855 | 1 | 269 | 5,974 | 18,565 | 6,046
eMolecules_sulfonicacid | Sulfonic acids, delete the -SO2X | 9,820 | 32 | 830 | 4,307 | 3,810 | 841
eMolecules_sulfonicacidSO2 | Sulfonic acids, keep the -SO2 | 4,938 | 0 | 11 | 372 | 2,588 | 1,967
eMolecules_thiol | Aliphatic thiols, delete S | 2,939 | 7 | 431 | 1,504 | 853 | 144
eMolecules_thiolS | Thiols, keep S | 5,213 | 0 | 44 | 1,157 | 3,054 | 958

References

1 Sterling and Irwin, J. Chem. Inf. Model. 2015. DOI: 10.1021/acs.jcim.5b00559.

2 Irwin, Sterling, Mysinger, Bolstad and Coleman, J. Chem. Inf. Model. 2012. DOI: 10.1021/ci3001277.

3 https://www.emolecules.com/info/screening-compounds

4 https://www.ebi.ac.uk/chembl/

5 Pitt, Parry, Perry and Groom, J. Med. Chem. 2009, 52, 9, 2952-2963. DOI: 10.1021/jm801513z.

6 https://www.emolecules.com/info/building-blocks