Data in Biotech / Episode 42

How Bayesian Optimization Is Transforming Biotech R&D

with Wolfgang Halter, Head of Data Science & Bioinformatics at Merck Life Science

· 45:32

Key Takeaways

Bayesian optimization reduces experimental cycles by orders of magnitude

Traditional biotech R&D relies on grid search or one-factor-at-a-time experimentation, which scales poorly with the number of variables. Wolfgang explains how Bayesian optimization uses probabilistic models to intelligently select the next experiment, often finding optimal conditions in 10-20 experiments where traditional approaches would require hundreds.
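The loop Wolfgang describes (fit a probabilistic model to the results so far, pick the most promising next experiment, observe, repeat) can be sketched with a minimal Gaussian-process surrogate and an upper-confidence-bound acquisition rule. Everything here is illustrative, not Merck's actual setup: the toy "yield" curve, the kernel length scale, and the exploration parameter `kappa` are all assumptions chosen for a self-contained demo.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=0.15):
    # Squared-exponential kernel between two 1-D arrays of points.
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)

def gp_posterior(x_train, y_train, x_query, noise=1e-6):
    # Gaussian-process posterior mean and standard deviation at query points.
    K = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    Ks = rbf_kernel(x_train, x_query)
    Kss = rbf_kernel(x_query, x_query)
    alpha = np.linalg.solve(K, y_train)
    mean = Ks.T @ alpha
    v = np.linalg.solve(K, Ks)
    var = np.clip(np.diag(Kss) - np.sum(Ks * v, axis=0), 1e-12, None)
    return mean, np.sqrt(var)

def bayes_opt(objective, candidates, n_init=3, n_iter=10, kappa=2.0, seed=0):
    # Sequential loop: fit the surrogate, run the next experiment at the
    # candidate with the highest upper confidence bound, then refit.
    rng = np.random.default_rng(seed)
    x = rng.choice(candidates, size=n_init, replace=False)
    y = np.array([objective(v) for v in x])
    for _ in range(n_iter):
        mean, std = gp_posterior(x, y, candidates)
        nxt = candidates[np.argmax(mean + kappa * std)]
        x = np.append(x, nxt)
        y = np.append(y, objective(nxt))
    return x[np.argmax(y)], y.max()

# Hypothetical "yield" curve peaking at a normalized setting of 0.6.
f = lambda t: -(t - 0.6) ** 2
grid = np.linspace(0.0, 1.0, 101)
best_x, best_y = bayes_opt(f, grid)
```

With 13 evaluations total, the loop homes in on the peak; a grid search at the same resolution would need all 101. In practice, libraries such as BoTorch or scikit-optimize replace this hand-rolled surrogate.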

Data infrastructure is the bottleneck, not algorithms

The algorithms for Bayesian optimization are well established and available in open-source libraries. The real challenge is getting clean, consistent experimental data into a format the optimization engine can use. Wolfgang describes Merck’s investment in data engineering to connect lab instruments, LIMS platforms, and optimization tools into a coherent pipeline.
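A large part of that pipeline work is mundane but essential: mapping heterogeneous instrument exports onto one shared schema the optimization engine can consume. A minimal sketch of the idea, with entirely hypothetical instrument names, column labels, and field names:

```python
def normalize(row, instrument):
    # Different instruments label the same measurement differently;
    # map each vendor's column names onto a shared schema whose field
    # names make units explicit. The mappings below are illustrative.
    key_map = {
        "plate_reader": {"OD600": "od600", "Well": "well"},
        "hplc": {"Conc (g/L)": "titer_g_per_l", "Sample": "well"},
    }
    return {key_map[instrument].get(k, k): v for k, v in row.items()}

# A raw plate-reader row becomes a record any downstream model can use.
record = normalize({"OD600": 1.42, "Well": "B3"}, "plate_reader")
```

Real pipelines add unit conversion, metadata validation, and LIMS round-tripping on top of this, but the core task is the same: one consistent record format per experiment.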

Domain expertise remains essential

Despite the power of algorithmic optimization, Wolfgang emphasizes that domain scientists remain critical. The optimization engine proposes experiments, but scientists must validate that proposals are physically feasible, interpret unexpected results, and define the objective functions that capture what “better” actually means in a biological context.
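The two human contributions mentioned above, defining the objective and gating proposals for feasibility, can be made concrete with a small sketch. The field names, weights, and tolerance bounds here are hypothetical, standing in for judgments only a domain scientist can make:

```python
def objective(result):
    # A scientist's definition of "better": high titer, penalized by
    # reagent cost. The 0.1 weighting is an illustrative trade-off.
    return result["titer_g_per_l"] - 0.1 * result["reagent_cost_usd"]

def is_feasible(proposal):
    # Reject conditions the organism cannot tolerate, regardless of
    # what the optimizer's surrogate model predicts there.
    return 25.0 <= proposal["temp_c"] <= 40.0 and 6.0 <= proposal["ph"] <= 8.0

proposals = [
    {"temp_c": 37.0, "ph": 7.2},
    {"temp_c": 55.0, "ph": 7.0},  # model extrapolation, physically infeasible
]
feasible = [p for p in proposals if is_feasible(p)]
```

The optimizer only ever sees the scalar the objective returns, so how that scalar is composed is where biological meaning enters the loop.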

Full Transcript

Transcript not yet available for this episode.

Frequently Asked Questions

What is Bayesian optimization and how is it used in biotech?
Bayesian optimization is a sequential model-based approach to finding optimal experimental conditions with minimal experiments. In biotech, it guides decisions about which experiments to run next, dramatically reducing the number of trials needed to optimize processes like fermentation, formulation, and assay development.

How much can Bayesian optimization reduce R&D costs?
Depending on the application, Bayesian optimization can reduce the number of required experiments by 50-90% compared to traditional design of experiments (DoE) approaches, translating directly to lower material costs, faster timelines, and reduced labor.

What data infrastructure is needed to implement Bayesian optimization?
You need clean, structured experimental data with consistent metadata, a system for tracking experimental parameters and outcomes, and integration between your optimization engine and laboratory information management system (LIMS). The modeling itself is computationally lightweight — the data pipeline is the hard part.