Conformation search performance with XedeX

We did some work a year or two back looking at the various options in XedeX and what their values should be for optimal conformation hunt performance. Of course, to do this you need to define a metric! The general consensus in the literature seems to be that the best metric (at least in a virtual screening context) is to take a bunch of PDB ligands, generate conformers for them from a 1D or 2D input, and see how close you got to the bioactive conformation. There are a few problems with this:

  • The PDB is full of badly-refined and badly-specified ligands
  • The test is very sensitive to how many conformers you generate (an exhaustive systematic search with a one-degree increment is bound to find the right answer, but isn’t a very useful method).
  • The use of RMSD as a metric has problems, in that it’s size sensitive. An RMS error of 2.0A is quite different for a fragment (lousy) and for a tetrapeptide (not bad, actually).

Still, it’s the best suggestion out there, so that’s what we did. The original work was rather long and complicated, so we’ll probably describe it in a later post. The quick summary is that tweaking the options gave a new “default” set that performs significantly better than the original, so FieldScreen was upgraded to use that.

A recent paper from the OpenEye guys presented a new data set for this sort of analysis, in which the protein structures were carefully selected to remove the issue of comparing molecules to bad reference conformations. Accordingly, we felt it was worth revisiting our analysis to make sure that we hadn’t been misled by dodgy datasets previously! We had a few problems sorting out the data set: some of the ligands in the data set we felt were inappropriate as they were covalently bound, and there were a few others where the original authors’ structures differed from our interpretation and we had to go back to the original papers to work out what the true structure of the ligand actually was. Thrashing through these issues cut the data set size from 197 down to 192, of which 20 or so are modified from the original data set.

Overall our performance and Omega’s are fairly similar:

Method <0.25 <0.5 <0.75 <1.0 <1.25 <1.5 <2.0
Omega (200 confs) 9.6 50 71.3 83.0 87.2 92.6 96.8
Xedex (200 confs) 11.6 52 68.0 81.8 88.5 93.6 97.6

The numbers in the table are % correct to within the given RMSD threshold – we’ve expressed it as a percentage rather than as absolute numbers as the xedex results are over 5 repeat runs with different random number seeds – it’s a stochastic method. The relative values from the two programs are intriguing: Xedex gets more molecules completely correct (RMSD<0.25), Omega slightly outperforms in the mid-range, and the two are more-or-less the same at the “wrong” end. These results should be taken with a pinch of salt, however, as the data sets weren’t exactly equivalent (due to the problems mentioned above).

A few interesting things fell out from looking at XedeX’s performance on this data set. Firstly, your chance of finding the bioactive conformation goes down as the number of rotatable bonds goes up:

Graph of RMSD vs rotatable bond count

Or, if you prefer, the same thing using a field similarity measure instead of RMSD:

Well, no surprise you say, as the more rotatable bonds a molecule has, the larger its conformation space is and the less likely you are to find a conformer close to the bioactive one. However, apparently this is not true for Omega (personal communication from Openeye). This is rather surprising: the only explanation I can think of is that their rule-based algorithm has the potential both to get some small molecules very wrong (if their bioactive conformation has torsions which just plain aren’t in the torsion rules database), and to get some larger molecules more right (if said large molecules’ torsions are all very close to the values in the torsion database, then you have a chance of generating the bioactive conformation almost exactly, which is much harder to do with a stochastic search method.

The second thing we looked at was which molecules Xedex struggled on i.e the ones where the RMSD was high but the rotatable bond count was low. The one that stands out is 1s63, which has 5 rotatable bonds but we only get to ~1.45A from the bioactive conformation. Viewing the bioactive conformation made it clear what was going on:

1s63 bioactive conformation

There’s a beautiful cation-π interaction holding the molecule in a folded conformation. The XED force field does cation-π interactions really very well indeed, so you’d think that this would be easy to find. However, the standard XedeX conditions use a modified version of the XED force field in which long-range electrostatic and attractive vdW forces are turned off (as a very quick approximation to a solvation model), and this has the side effect of neglecting intramolecular H-bonding. Omega apparently does something similar, and it gets 1.7A RMSD on this molecule. If you run XedeX with the full force field, then you get the right answer (0.41A RMSD).  This is the tradeoff you make with the modified force field – you get better answers for most molecules, but you lose out on anything with a structural intramolecular H-bond. Running with a “proper” solvation model would help solve this, but is just too slow.