Is it worth making? Assessing the information content of new structures


We have recently presented a method of summarizing the information obtained from 3D activity cliff analysis. Examining all pairs of molecules makes it possible to distinguish apparent cliffs that are outliers or the result of measurement error from those that consistently point to particular electrostatic and steric features having a large impact on activity. To do this it has proved essential to allow for alignment noise: no 3D alignment technique is perfect, so we apply a Bayesian analysis to correct both for potential misalignments and for the case where a molecule is aligned correctly except for a flexible substituent whose conformation is under-constrained. We use the recent AZ/CCDC alignment validation data set to determine valid estimates for the Bayesian priors.
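As a rough illustration of the idea of correcting for alignment noise, the sketch below applies Bayes' rule to estimate the probability that a molecule is correctly aligned, then down-weights a pairwise cliff by the probability that both molecules are aligned. This is not the model used in the work described here: the function names, the likelihood values, and the assumption that the two alignments are independent are all hypothetical, and the prior of 0.8 is a placeholder rather than a value derived from the AZ/CCDC data set.

```python
def posterior_aligned(prior: float, lik_aligned: float, lik_misaligned: float) -> float:
    """Bayes' rule: P(correctly aligned | evidence).

    prior          -- assumed prior probability that an alignment is correct
    lik_aligned    -- P(evidence | aligned), hypothetical
    lik_misaligned -- P(evidence | misaligned), hypothetical
    """
    numerator = prior * lik_aligned
    evidence = numerator + (1.0 - prior) * lik_misaligned
    if evidence == 0.0:
        raise ValueError("evidence has zero probability under both hypotheses")
    return numerator / evidence


def weighted_cliff(delta_activity: float, p_a: float, p_b: float) -> float:
    """Scale an activity difference by the probability that both molecules
    are correctly aligned (independence is an assumption of this sketch)."""
    return abs(delta_activity) * p_a * p_b


# Placeholder numbers, chosen only to show the arithmetic.
p_a = posterior_aligned(prior=0.8, lik_aligned=0.9, lik_misaligned=0.3)
p_b = posterior_aligned(prior=0.8, lik_aligned=0.5, lik_misaligned=0.5)
score = weighted_cliff(delta_activity=2.0, p_a=p_a, p_b=p_b)
```

When the evidence is equally likely under both hypotheses, as for the second molecule, the posterior equals the prior, so the quality of the prior estimates matters directly.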

As an extension of this technique, it is possible to mine the data for a simple picture of explored pharmacophoric space, corrected for the conformational and alignment flexibility of each molecule. This gives the chemist an invaluable picture of which parts of property space around a molecule have been adequately explored. When considering a new molecule for synthesis, it is possible to compute how much it would enlarge the explored pharmacophoric space and hence to present an ‘information content’ score for the new molecule: if we made and tested this new molecule, how much would it actually increase the structure-activity relationship (SAR) information content of the data set?
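One simple way to picture such a score is sketched below, under assumptions that are not taken from the work itself: pharmacophoric space is discretized into hypothetical grid cells labeled with a feature type, each molecule occupies a cell with some probability (standing in for conformational and alignment uncertainty), and occupancies are treated as independent. The score is then the expected number of newly explored cells.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical cell: (x, y, z) grid index plus a feature label.
Cell = Tuple[int, int, int, str]


def explored_coverage(molecules: Iterable[Dict[Cell, float]]) -> Dict[Cell, float]:
    """Probability that each cell is explored by at least one tested molecule."""
    unexplored = defaultdict(lambda: 1.0)
    for occupancy in molecules:
        for cell, p in occupancy.items():
            unexplored[cell] *= 1.0 - p
    return {cell: 1.0 - u for cell, u in unexplored.items()}


def information_content(candidate: Dict[Cell, float],
                        coverage: Dict[Cell, float]) -> float:
    """Expected increase in explored cells if the candidate were made and tested."""
    return sum(p * (1.0 - coverage.get(cell, 0.0)) for cell, p in candidate.items())


# Toy data: one tested molecule and one candidate.
tested = [{(0, 0, 0, "donor"): 1.0, (1, 0, 0, "hydrophobe"): 0.5}]
candidate = {(1, 0, 0, "hydrophobe"): 0.5, (2, 0, 0, "acceptor"): 1.0}

coverage = explored_coverage(tested)
gain = information_content(candidate, coverage)
```

In this toy case the candidate gains most of its score from the acceptor cell that no tested molecule reaches, and only a partial contribution from the hydrophobe cell that is already half explored. A candidate that only revisits fully explored cells would score zero.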

Combining this information content score with the activity cliff summary data allows a simple qualitative evaluation of the SAR of a data set in 3D, alongside guidance on which parts of pharmacophoric space have been mined out and which remain underexplored. We present the application of these techniques to several literature data sets.


See the presentation ‘Is it worth making? Assessing the information content of new structures’ given at the 250th ACS National Meeting.