The Design-Make-Test-Analyze Cycle Needs a Tune-up for AI-driven Drug Discovery

Working with predictive, model-driven technology at my last two companies has put a fine point on an issue that I am certain causes ML and computational drug hunters to lose sleep. The Design-Make-Test-Analyze cycle needs a serious tune-up!

Here’s the scenario for small-molecule discovery at a small to medium biotech. Assume the team has overcome the roadblocks for protein production and is ready to assay compounds. The data science team builds its latest Kaggle-award-winning ML model and uses it to predict potent, drug-like, and generally awesome molecules, which can both validate the model and drive programs forward. The team, excited to try out this new capability, submits the order for synthesis to the CRO!

Now they wait…and wait…and wait…

Everybody is doing their job as fast as they can. Nevertheless, molecules trickle in over the next two to three months. After coordinating assay resources (because the arrival date was unpredictable), four to five months all told have passed since the team generated the list of awesome compounds. Despite the original excitement, someone now needs to hunt down their e-notebook entry to determine why these molecules were ordered in the first place.

In this scenario, with months of waiting between Design and Test, we may get only a few cycles of informed iteration in a year. On top of the long lead time, if these are custom-synthesized compounds, the cost to produce even a small number of molecules is substantial and hard to justify for an unvalidated model. Getting enough data to train a more powerful model for optimization is simply prohibitive.
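To put rough numbers on "only a few cycles," here is a back-of-the-envelope sketch. The four- and five-month cycle lengths come from the scenario above. The two-week loop is purely a hypothetical comparison point that I am assuming for illustration, not a measured turnaround.

```python
MONTHS_PER_YEAR = 12


def cycles_per_year(cycle_months: float) -> float:
    """Design-to-Test loops that fit in one year (may be fractional)."""
    if cycle_months <= 0:
        raise ValueError("cycle_months must be positive")
    return MONTHS_PER_YEAR / cycle_months


# 4 and 5 months come from the scenario in the text.
# The 0.5-month entry is a hypothetical fast-turnaround assumption.
scenarios = {
    "CRO synthesis, 5 months": 5.0,
    "CRO synthesis, 4 months": 4.0,
    "hypothetical 2-week loop": 0.5,
}

for label, months in scenarios.items():
    print(f"{label}: {cycles_per_year(months):.1f} cycles/year")
```

Even on the optimistic end, the team gets about three chances a year to learn from its own predictions.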

While big pharma companies and some biotech companies are working towards solutions to this problem using automated flow chemistry, startups and small/medium biotech companies struggle even to access enough building blocks to generate a modest virtual collection.

My conclusion: predictive drug discovery needs a new business model geared towards data science-driven compound identification.

One idea is based on my experience working with @Xia Tian in our multi-year effort to find a way to test more molecules for less money at X-Chem. These efforts ultimately culminated in our successful demonstration of ML/DEL (https://pubs.acs.org/doi/abs/10.1021/acs.jmedchem.0c00452). Our approach was to use the building blocks we had in house for DEL library synthesis. We ran our predictive DEL model against a virtual library derived from simple, single-step reactions. Then, we synthesized the selected molecules with minimal purification of the products. This approach can be readily enhanced with automation that is accessible to biotech companies, or it can be provided as a service by local chemistry CROs that can turn around reactions in days. Quickly acquiring thousands of data points, even if imperfect, could be the key to producing refined predictive models that drive optimization.
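For readers who think in code, the enumerate-score-select loop described above can be sketched roughly as follows. This is a toy illustration, not the method from the paper. The building-block identifiers are made up, the two-component pairing stands in for any simple single-step reaction, and `placeholder_score` is a hypothetical stand-in for a trained predictive model.

```python
from itertools import product

# Hypothetical in-house building blocks (identifiers only, not real inventory).
amines = ["amine_001", "amine_002", "amine_003"]
acids = ["acid_001", "acid_002"]


def enumerate_library(block_set_a, block_set_b):
    """Pair every block in A with every block in B: one single-step product each."""
    return list(product(block_set_a, block_set_b))


def placeholder_score(pair):
    """Stand-in for a trained model: deterministic dummy value in [0, 1)."""
    text = "|".join(pair)
    return (sum(ord(ch) for ch in text) % 100) / 100.0


def select_top(library, score_fn, n):
    """Keep the n highest-scoring virtual products for synthesis."""
    return sorted(library, key=score_fn, reverse=True)[:n]


library = enumerate_library(amines, acids)
picks = select_top(library, placeholder_score, n=3)

for amine, acid in picks:
    print(f"make: {amine} + {acid}")
```

The point of the sketch is the shape of the workflow. The virtual library is limited to what the blocks on the shelf can make in one step, so every pick can actually be made quickly.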

To enable rapid synthesis and testing of large numbers of molecules, building blocks need to be co-located with automation and testing capabilities to remove starting material delays. I have started talking to chemical suppliers to explore offerings where customers buy small-scale amounts of large numbers of building blocks, enabling a rapid initial read-out that dovetails perfectly with predictive approaches. While I believe this model would be self-sustaining, it is even more attractive when suppliers factor in that hits will need to be followed up with full-scale synthesis, and that they (the suppliers) can sell the starting material. The repeat business will be worth it, not to mention the higher value proposition for the customer.

Reach out with thoughts on how this could work and happen sooner rather than later. Biotech colleagues, comment or like the post. This will give me informal evidence to show the suppliers that the customer base is there for rapid follow-up, DEL library construction, and hit-expansion libraries.