Title: Engineering stable, expressible, functional industrial enzymes with protein sequence likelihood models
Protein sequence likelihood models are a rapidly emerging class of deep learning algorithms that learn how likely each amino acid is to occur in a given structural, evolutionary, or sequence context. Recently, these models have demonstrated impressive performance in predicting the relative fitness of variants assessed in deep mutational scanning (DMS) experiments without any task-specific training (i.e., in a zero-shot setting). Only a limited number of DMS experiments have assessed fitness in terms of stability-related properties, although previous sequence likelihood models have performed well in these cases. We conduct a comprehensive analysis to assess the capacity of newly published models to generalize to direct experimental measurements of thermostability across variants of hundreds of heterogeneous proteins. We explore performance relative to state-of-the-art models trained on stability data, assess situations in which likelihood is at odds with stability, and examine the complementarity of likelihood models conditioned on different contexts. Further, because zero-shot likelihood-based predictions of thermostability correlate with experimental measurements, we suggest that protein likelihood models may be used to generate new sequences that simultaneously possess properties that are desirable yet often in conflict in protein engineering pipelines: improved function, expressibility, and stability. We show how structural and evolutionary likelihood models can be combined to perform ultra-high-throughput variant screening in silico, and how traditional biophysical simulations can further enhance the stability of leading candidates. Carbon dioxide-sequestering carbonic anhydrase and plastic-degrading PETase serve as case studies in efficiently selecting stabilized variants with retained folding and function for in vitro analysis.
- The audience will be able to use the described methods to rescue designed enzyme catalysts that have lost function, folding, or stability.
- The techniques described allow rapid prioritization of candidates, accelerating popular in vitro enzyme design strategies such as directed evolution.
- Compared to traditional methods of in silico design, including molecular dynamics and biophysical simulations, the computational burden is reduced by 3 to 8 orders of magnitude.
- Sequence likelihood models give thermostability information complementary to biophysical models, with higher accuracy and generality than previous-generation statistical potentials.
- The speed of prediction for candidates with arbitrarily many mutations allows a search of the sequence fitness landscape at a scale that was previously impractical.
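
To make the zero-shot screening idea in the abstract concrete, the following is a minimal sketch and not the implementation used in this work. It assumes that each likelihood model has already produced a per-position table of amino acid log-probabilities for the wild-type sequence, that a multi-mutant is scored by summing per-site log-likelihood ratios (an independent-site approximation), and that a structural and an evolutionary model are combined by averaging ranks. The function names, the additive scoring, the rank averaging, and the toy probabilities are all illustrative assumptions.

```python
import math


def parse_mutation(mutation):
    """Split a label such as 'K2R' into (wild-type residue, 0-based position, mutant residue)."""
    return mutation[0], int(mutation[1:-1]) - 1, mutation[-1]


def variant_score(log_probs, wild_type, mutations):
    """Zero-shot score: sum over mutations of log p(mutant) - log p(wild type).

    Assumption: sites contribute independently, so a variant with
    arbitrarily many mutations costs only one table lookup per mutation.
    """
    score = 0.0
    for mutation in mutations:
        wt, pos, mt = parse_mutation(mutation)
        if wild_type[pos] != wt:
            raise ValueError(f"{mutation} does not match wild type at position {pos + 1}")
        score += log_probs[pos][mt] - log_probs[pos][wt]
    return score


def ranks(scores):
    """Return the rank of each score, 0 for the lowest and n-1 for the highest."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    result = [0] * len(scores)
    for rank, index in enumerate(order):
        result[index] = rank
    return result


def combined_ranking(variants, score_lists):
    """Average each variant's rank across models and sort best-first."""
    per_model_ranks = [ranks(scores) for scores in score_lists]
    mean_rank = [
        sum(model[i] for model in per_model_ranks) / len(per_model_ranks)
        for i in range(len(variants))
    ]
    order = sorted(range(len(variants)), key=lambda i: mean_rank[i], reverse=True)
    return [variants[i] for i in order]


# Toy, hand-written probabilities for a 3-residue "protein" (not real model output).
wild_type = "MKV"
structure_lp = [
    {"M": math.log(0.9), "L": math.log(0.1)},
    {"K": math.log(0.5), "R": math.log(0.4), "E": math.log(0.1)},
    {"V": math.log(0.3), "I": math.log(0.6), "A": math.log(0.1)},
]
evolution_lp = [
    {"M": math.log(0.8), "L": math.log(0.2)},
    {"K": math.log(0.6), "R": math.log(0.3), "E": math.log(0.1)},
    {"V": math.log(0.4), "I": math.log(0.5), "A": math.log(0.1)},
]

variants = [("K2R",), ("V3I",), ("K2E", "V3A"), ("K2R", "V3I")]
structure_scores = [variant_score(structure_lp, wild_type, v) for v in variants]
evolution_scores = [variant_score(evolution_lp, wild_type, v) for v in variants]
ranked = combined_ranking(variants, [structure_scores, evolution_scores])
print(ranked)
```

Rank averaging is used here only because it avoids having to put the two models' scores on a common scale; other combinations, such as a weighted sum of standardized scores, would fit the same interface. In practice the top of such a ranking would be passed on to slower biophysical simulations and then to in vitro analysis.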