Protein Sequence Design with Batch Bayesian Optimisation
Abstract
Protein sequence design is a challenging problem in protein engineering, which aims to discover novel proteins with useful biological functions. Directed evolution is a widely-used approach for protein sequence design, which mimics the evolution cycle in a laboratory environment and conducts an iterative protocol. However, the burden of laboratory experiments can be reduced by using machine learning approaches to build a surrogate model of the protein landscape and conducting in-silico population selection through model-based fitness prediction. In this paper, we propose a new method based on Batch Bayesian Optimization (Batch BO), a well-established optimization method, for protein sequence design. By incorporating Batch BO into the directed evolution process, our method is able to make more informed decisions about which sequences to select for artificial evolution, leading to improved performance and faster convergence. We evaluate our method on a suite of in-silico protein sequence design tasks and demonstrate substantial improvement over baseline algorithms.
Turn this paper into a lesson
ArcXiv compiles a structured reading guide from this paper's metadata: plain-English importance, contributions, prerequisite concepts, which sections to read first, flashcards, and a quiz. Grounded in the abstract, never invented.