As many of you will know, we have a long-standing and fruitful collaboration with the Michel group at the University of Edinburgh, developing and improving free energy methods. Jenke Scheen, a PhD student whom we sponsor in that group, has published the paper 'Data-driven Generation of Perturbation Networks for Relative Binding Free Energy Calculations', Digital Discovery, 2022 (DOI 10.1039/D2DD00083K), on an interesting way to combine deep learning and free energy calculations.
To understand this advance, a quick primer on relative binding free energy (RBFE) calculations is in order. RBFE methods such as Cresset’s Flare™ FEP work by performing alchemical transformations, where one ligand is gradually mutated into another over the course of a series of dynamics simulations. Analysis of the results allows one to calculate the difference in binding free energy of the two ligands (ΔΔG). Although the technique can be applied to individual pairs of ligands, in practice a far more common use case is to process a set of related ligands all at the same time. The question then arises: for which pairs of molecules in the data set should I compute a ΔΔG? You could just do every pair, but the simulations are quite time-consuming and the number of pairs grows quadratically with the number of ligands, so that rapidly becomes too expensive. Complicating the issue is that the accuracy of the calculation depends on the size of the transformation: a 'big' transformation is likely to be much noisier than a 'small' one.
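To make the cost argument concrete, here is a minimal sketch comparing the number of transformations needed to run every pair with the bare minimum needed to connect all ligands. The function names are ours, chosen for illustration, and the count ignores the fact that each transformation is usually simulated in more than one environment.

```python
from math import comb


def all_pairs_edge_count(n_ligands: int) -> int:
    """Number of distinct ligand pairs, n * (n - 1) / 2."""
    return comb(n_ligands, 2)


def minimum_connected_edge_count(n_ligands: int) -> int:
    """Fewest edges that still connect every ligand (a spanning tree)."""
    return max(n_ligands - 1, 0)


if __name__ == "__main__":
    # Print: ligand count, all-pairs edges, spanning-tree edges.
    for n in (5, 10, 20, 40):
        print(n, all_pairs_edge_count(n), minimum_connected_edge_count(n))
```

For 20 ligands, that is 190 pairs against a minimum of 19 links. Useful networks sit somewhere in between.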
The art, therefore, is to create a graph in which all of the ligands are connected, the calculation time is minimized, and the overall error is also minimized. This is hard to do well! The paper by Huafeng Xu (DOI 10.1021/acs.jcim.9b00528) put the problem on a firm mathematical foundation, presenting an algorithm to create an optimal network. Unfortunately, the algorithm requires as input an estimate of the error likely to occur in each link, and that is not generally available. A different approach, LOMAP (DOI 10.1007/s10822-013-9678-y), plans a network by first using heuristics to decide how 'easy' each link will be (which is assumed to correlate roughly with the number of atoms that change between the two ligands), and then constructing a network that preferentially uses easier links.
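The idea of preferring easy links can be illustrated with a toy planner. This is a sketch under strong assumptions, not LOMAP's actual scoring or network algorithm: the ligands are made-up sets of substituent labels, the "difficulty" of a link is simply the number of labels that differ, and the network is a minimum spanning tree built with Kruskal's algorithm.

```python
from itertools import combinations

# Hypothetical toy ligands: each is described by a set of substituent labels.
LIGANDS = {
    "lig_a": {"core", "H"},
    "lig_b": {"core", "F"},
    "lig_c": {"core", "Cl"},
    "lig_d": {"core", "F", "OMe"},
}


def edge_cost(a: str, b: str) -> int:
    """Toy difficulty score: number of labels present in only one ligand."""
    return len(LIGANDS[a] ^ LIGANDS[b])


def plan_tree(names):
    """Connect all ligands using the cheapest links (Kruskal's algorithm)."""
    parent = {name: name for name in names}

    def find(x):
        # Follow parent pointers to the root, shortening the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Cheapest links first; ties are broken alphabetically for determinism.
    candidates = sorted(
        combinations(sorted(names), 2),
        key=lambda edge: (edge_cost(*edge), edge),
    )
    tree = []
    for a, b in candidates:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:  # the link joins two separate groups
            parent[root_a] = root_b
            tree.append((a, b))
    return tree


if __name__ == "__main__":
    for a, b in plan_tree(LIGANDS):
        print(a, b, edge_cost(a, b))
```

A spanning tree is the cheapest connected network, but it contains no cycles. Practical planners typically keep some redundant links so that cycle closure can be used as a consistency check.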
The work by Scheen and Michel takes the LOMAP approach and extends it. Rather than use a set of hand-generated heuristics to work out how 'easy' a transformation might be, why not train a machine learning algorithm to compute this for us? The difficulty here is that deep learning methods require a large amount of training data, which would be infeasible to generate from RBFE calculations because of the time required. The key insight that led to this paper was that, in general, the difficulty of a transformation lies in the ligand being transformed, not in the details of its environment, and hence you can use computed solvation free energies rather than protein binding free energies to train the network. These are much faster to compute, so Scheen and Michel were able to put together a training set of nearly 4,000 transformations, each of which was performed in quintuplicate to get error statistics.
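As an illustration of how five repeats of one transformation can be turned into an error statistic, the sketch below computes the mean and the standard error of the mean. The numbers are invented, the function name is ours, and we are assuming that a statistic of this kind serves as the measure of difficulty; see the paper for the quantity actually used as the training target.

```python
from math import sqrt
from statistics import mean, stdev


def summarise_replicates(values):
    """Return (mean, standard error of the mean) for repeated estimates."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two replicates to estimate an error")
    # stdev() is the sample standard deviation (n - 1 in the denominator).
    return mean(values), stdev(values) / sqrt(n)


if __name__ == "__main__":
    # Invented free energy estimates from five repeats, in kcal/mol.
    replicates = [1.10, 1.25, 0.95, 1.30, 1.15]
    average, sem = summarise_replicates(replicates)
    print(f"mean = {average:.2f} kcal/mol, SEM = {sem:.3f} kcal/mol")
```

A transformation whose repeats agree closely gets a small standard error and counts as 'easy'; widely scattered repeats flag a 'hard' one.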
Based on this data, they were able to create a novel AI model which, given a pair of molecules, can estimate the difficulty of transforming one into the other in an RBFE context. Validation shows that networks built with the model significantly outperform random networks and are comparable to those built using the scores produced by LOMAP (which was hand-tuned by experts). For full details on the model and the validation, please do read the paper. This is an exciting advance and demonstrates the benefits of combining AI models and simulation-based methods to take advantage of the strengths of each.