Artificial Intelligence and machine learning: Where’s the intelligence?

“Of course we have Artificial Intelligence (AI), who doesn’t?” This was a throw-away line during the closing remarks at a recent conference on future directions in medicinal chemistry. It currently seems that every company in drug discovery has an AI project or collaboration. But what is AI in drug discovery, what is it being used for, and what are the challenges and opportunities?

The distinction between AI and machine learning

Whilst the phrases ‘Artificial Intelligence’ (AI) and ‘Machine Learning’ (ML) are often used interchangeably, they are subtly different. A true AI method would have the ability to make decisions and propose new ideas from outside its knowledge base, whereas machine learning methods can only use the information from within their knowledge base. In fact, many AI systems are built on a foundation of one or more ML techniques – particularly neural networks. If you were to give the two methods human characteristics, you could say that AI is imaginative and might suggest a completely new series, whereas machine learning is intuitive – it could identify improvements within a compound series, but it would be limited to the transformations present within the training set.

At present, most drug discovery applications described as using ‘AI methods’ actually use modern machine learning methods. We will therefore follow the common shorthand and use “AI” to mean machine learning methods such as ‘deep learning’, a method of training very complex neural networks to find patterns in large volumes of complex data (so-called ‘big data’). Most of these drug discovery applications could equally use other machine learning methods, such as conventional neural networks, support vector machines or random forest regression models, because the data sets are generally neither very complex nor very large. However, there are areas where AI methods have real traction. In genomics, for example, the gap in performance between traditional machine learning and modern AI methods becomes tangible because of the large size and complex inter-relationships of the data sets involved.

Mining big data in drug discovery

The biggest promise of AI and machine learning for drug discovery is the capability of mining big data. AI methods such as deep learning have been deployed effectively in other industries, most notably in social media interactions.

Most established pharmaceutical companies hold large repositories of data in corporate databases built up over many years. This data tends to be noisy because of its volume, the number of sources and the long time frame over which it was collected. AI methods are tolerant of this type of data, provided the data sets are large enough, which makes them well suited to extracting and analysing data from corporate databases. Applying this corporate knowledge to current targets has the potential to provide tremendous benefit in both the direction and speed of development, which translates to lower development costs.
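To make the point about noise and data set size concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not a description of any real corporate data set: the linear relationship, the noise level and the function names (fit_slope, mean_slope_error) are all hypothetical. It fits a simple least-squares line to noisy synthetic measurements and compares how well the underlying trend is recovered from a small and a large data set.

```python
import random


def fit_slope(xs, ys):
    """Ordinary least-squares slope of ys against xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def mean_slope_error(n_points, noise_sd, true_slope=0.8, trials=50, seed=0):
    """Average absolute error in the fitted slope over repeated noisy data sets."""
    rng = random.Random(seed)
    errors = []
    for _ in range(trials):
        # Hypothetical descriptor values and noisy measured responses.
        xs = [rng.uniform(0.0, 5.0) for _ in range(n_points)]
        ys = [true_slope * x + rng.gauss(0.0, noise_sd) for x in xs]
        errors.append(abs(fit_slope(xs, ys) - true_slope))
    return sum(errors) / trials


# Same noise level in both cases; only the amount of data changes.
small = mean_slope_error(n_points=20, noise_sd=2.0)
large = mean_slope_error(n_points=2000, noise_sd=2.0)
print(f"mean slope error with   20 points: {small:.3f}")
print(f"mean slope error with 2000 points: {large:.3f}")
```

With the noise held constant, the larger data set recovers the underlying trend much more closely. Real corporate data is noisy in less well-behaved ways than this Gaussian toy example, so the sketch shows the general principle rather than a guarantee.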

Over recent years, a number of leading companies have formed strategic partnerships to investigate whether these methods can be transferred to drug discovery (Table 1). AI and machine learning methods are being deployed in a diverse range of areas, from identifying compounds in existing libraries, through target identification and target validation, to new target discovery. There are also genomics companies looking to provide custom drug therapies based on an individual’s biomarkers.

However, as with any emerging technology, there are very few case studies where these methods have been applied successfully, so at present they remain exciting but generally unproven.

Table 1: Non-exhaustive list of companies that have formed strategic partnerships involving the use of AI.

Company            Partner          Date started   Target area
Roche [1]          GNS Healthcare   2018           Drug candidates
Pfizer [2]         IBM              2016           Drug candidates
Sanofi [3]         Berg             2017           Vaccines efficiency
Merck [4]          Numerate         2012           Drug candidates
Amgen [5]          GNS Healthcare   2018           Drug candidates
AstraZeneca [6]    Berg             2017           Drug candidates
Evotec [6]         Exscientia       2016           Drug candidates
Genentech [8]      GNS Healthcare   2017           Target validation
Takeda [9]         Numerate         2017           Target validation



(Remaining rows of Table 1 are incomplete. Recoverable entries: Cloud Pharmaceuticals [12]; 10 disease targets; Biological targets; Multiple targets.)


Prediction and understanding for smaller data sets

AI and machine learning are not restricted to big data applications; smaller data sets can also be used to generate focused models that are suitable only for a single chemotype. These models can be used to progress a series toward the desired goal, usually the removal of some unwanted toxicity, and there are numerous examples where this has been applied, both successfully and unsuccessfully.

Most computational chemists have, at some point, applied machine learning methods to a particularly intractable problem. However, machine learning tends to be a second-tier resource, not because the methods are particularly difficult to employ, but because extracting understanding from them is often difficult. Whilst the predictions these models generate tend to be highly reliable, understanding the factors that govern those predictions can be difficult or impossible, depending on the properties used. If the properties are chemically meaningful and drug relevant (e.g., logP, PSA, pKa), then the underlying factors controlling the property under investigation may be illuminated, but the predictive performance of the model tends to be lower. If, as often happens, all possible calculated molecular properties are used, then predictive performance tends to increase, but the underlying factors controlling the model’s predictions can remain a mystery: the user gains no practical insight into the problem, just a method of prediction.
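As an illustration of the interpretable end of this trade-off, the following Python sketch fits a plain linear model to two named descriptors and reads the fitted coefficients directly. Everything here is an assumption made for the example: the data is synthetic, the relationship between logP, PSA and the modelled property is invented, and the helper names (solve_linear_system, fit_linear_model) are hypothetical.

```python
import random


def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix so that row operations update both sides together.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        known = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - known) / m[r][r]
    return x


def fit_linear_model(rows, targets):
    """Least-squares fit via the normal equations (X^T X) w = X^T y."""
    design = [[1.0] + list(row) for row in rows]  # leading 1.0 is the intercept
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * t for d, t in zip(design, targets)) for i in range(k)]
    return solve_linear_system(xtx, xty)


rng = random.Random(1)
descriptor_names = ["logP", "PSA"]
rows, targets = [], []
for _ in range(200):
    logp = rng.uniform(0.0, 5.0)
    psa = rng.uniform(20.0, 140.0)
    # Invented relationship, used only to generate example data.
    activity = 5.0 + 0.6 * logp - 0.02 * psa + rng.gauss(0.0, 0.3)
    rows.append((logp, psa))
    targets.append(activity)

intercept, *coefficients = fit_linear_model(rows, targets)
for name, coef in zip(descriptor_names, coefficients):
    print(f"{name}: {coef:+.3f}")
```

The sign and size of each coefficient can be read as a statement about chemistry (here, the property rises with logP and falls with PSA). That directness is what is lost when hundreds of calculated properties are fed into a more flexible model.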

There could therefore be considerable potential in analysing the AI systems currently in operation to confirm the validity of their predictions. It may be possible to use these methods in a way that reveals more about the interplay of factors governing the models’ predictions, making them more useful to computational chemists.
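One generic way to probe a ‘black box’ model is permutation importance: shuffle one descriptor at a time in a held-out set and see how much the prediction error grows. The Python sketch below is an illustration of that general technique, not a description of any particular system discussed in this post. The k-nearest-neighbour predictor stands in for an arbitrary opaque model, and the descriptor names and synthetic data are hypothetical.

```python
import random


def knn_predict(train_x, train_y, query, k=5):
    """Black-box stand-in: average the targets of the k nearest training rows."""
    ranked = sorted(
        range(len(train_x)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_x[i], query)),
    )
    return sum(train_y[i] for i in ranked[:k]) / k


def mean_squared_error(train_x, train_y, test_x, test_y):
    errors = [
        (knn_predict(train_x, train_y, q) - y) ** 2 for q, y in zip(test_x, test_y)
    ]
    return sum(errors) / len(errors)


def permutation_importance(train_x, train_y, test_x, test_y, rng):
    """Increase in held-out error when a single descriptor column is shuffled."""
    baseline = mean_squared_error(train_x, train_y, test_x, test_y)
    importances = []
    for col in range(len(test_x[0])):
        shuffled = [row[col] for row in test_x]
        rng.shuffle(shuffled)
        permuted = [
            row[:col] + (value,) + row[col + 1:]
            for row, value in zip(test_x, shuffled)
        ]
        permuted_error = mean_squared_error(train_x, train_y, permuted, test_y)
        importances.append(permuted_error - baseline)
    return importances


rng = random.Random(7)
names = ["descriptor_a", "descriptor_b", "descriptor_c"]  # hypothetical names


def make_rows(n):
    xs = [tuple(rng.random() for _ in names) for _ in range(n)]
    # Invented ground truth: descriptor_a dominates, descriptor_c is irrelevant.
    ys = [3.0 * a + 1.0 * b + rng.gauss(0.0, 0.1) for a, b, _ in xs]
    return xs, ys


train_x, train_y = make_rows(300)
test_x, test_y = make_rows(100)
importances = permutation_importance(train_x, train_y, test_x, test_y, rng)
for name, score in zip(names, importances):
    print(f"{name}: error increase {score:+.3f}")
```

A descriptor whose shuffling barely changes the error contributes little to the predictions, whereas a large increase flags a descriptor the model relies on. This does not explain the mechanism, but it gives a computational chemist somewhere to start looking.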

AI and machine learning at Cresset Discovery Services

Cresset Discovery Services have successfully used machine learning methods when the data is amenable. Of course, producing a ‘black box’ predictive model is not generally enough for us or our customers. Instead, we look for a deeper understanding so a customer can make informed decisions in the context of the rationale behind the predictions made.

AI and machine learning methods are exciting new tools in a growing toolbox of approaches that we can apply to bring value to customer projects. The real intelligence lies with the operator in being able to select the appropriate tool for the job. Another comment from the conference mentioned at the start of this blog post: “Will AI techniques replace chemists in drug discovery? Probably not for the foreseeable future, but it is clear that chemists who use AI will replace chemists who don’t in the very near future.” Contact us for a free confidential discussion and to discover how we can apply intelligence to your drug discovery challenges.