Investment Thesis
October 11, 2023

Protein Engineering: a New Paradigm Emerging at the Intersection of AI and Biotechnology

Matias delves into Protein Engineering, discussing the critical importance of proteins, the advancements in design technologies, and the transformative influence they have on fields such as biology, materials science, and life sciences.

Matias Salonen
Investor

In this first part of a series of blog posts on deep and frontier technologies, I touch upon the field of protein engineering, with a focus on de novo protein design. Analyzing the topic from a venture lens, I have tried to avoid getting too technical but rather paint an overarching picture of this fascinating field of biotechnology—and what we are interested in as a venture fund.

Actionable Insights

If you only have a few minutes to spare, here is what you need to know about protein engineering through the venture lens:

  • Proteins & protein engineering. Proteins are marvels of biological machinery, involved in most life-critical processes, including nerve signal transmission, digestion, and respiration. Protein engineering is the process of modifying, designing, and optimizing proteins to enhance their properties or create new functions.
  • From AlphaFold to AI-designed proteins. Understanding how proteins fold is essential to understanding how they function. DeepMind’s AlphaFold largely solved the protein-folding problem by accurately predicting a protein’s 3D structure from just its amino acid sequence. Similar to how AI cracked structure prediction, AI is now able to generate entirely new proteins with desired functionality.
  • A paradigm shift in biology. Being able to design and validate novel proteins at scale computationally is opening up new opportunities across fields of biology, life sciences, and materials science, with applications such as protein-based therapeutics, antibody engineering, and new materials for a cleaner planet. 
  • Design is easy, validation hard. Generative and diffusion-based AI models for protein design are being democratized and made open-source. However, computationally designed proteins need to be tested in a wet-lab environment. This remains a challenge. 
  • Beyond the realm of evolution. When data from experimental validation gets fed back into the AI models, we can further fine-tune their capabilities, and a question emerges: what is the realm of potentially useful proteins existing outside the boundaries of what billions of years of evolution have produced?
  • It’s still early days. We have so far just seen a glimpse of how AI is transforming protein engineering and our capabilities to design novel, highly functional proteins at scale. Opportunities will emerge and we are excited for the startups building the enabling infrastructure. 

Protein Engineering Holds the Promise to Transform Biology & Beyond

We have seen incredible things emerge when the world of biology blends with the latest advances in technology, from gene editing with CRISPR, to accurately predicting how amino acid sequences fold into 3D structures using AlphaFold, and diagnosing cancer with AI more accurately than humans.

We are now on the brink of another paradigm shift in biotechnology. With the advent of artificial intelligence and computational methods, we can now controllably design novel proteins at scale with desired functionalities—paving the way for transformative impact on advancing science, curing disease, and cleaning our planet.

“Artificial intelligence can create new proteins that may be useful as vaccines, cancer treatments, or even tools for pulling carbon pollution out of the air” – Baker Lab

Proteins explained

Proteins are the fundamental building blocks of all life on earth, participating in most biochemical reactions and processes critical to life. They are essentially “the executive ends of information flow systems in living organisms, each performing one or a few specifically encoded functions, jointly defining the corresponding organism in turn”, as described in a recent article on protein design.

Functioning akin to minuscule biological machines, proteins are responsible for most of the work done in cells, being actively engaged in processes such as transmitting nerve signals, facilitating digestion, transporting oxygen in our bloodstream, repairing damaged tissues and safeguarding us against infections.

From an amino acid sequence to a functional protein

Proteins consist of chains of amino acids; the order of amino acids in a chain is known as the primary structure. Through interactions between nearby amino acids, the chain folds into secondary structures like alpha helices and beta sheets. These secondary structures, in turn, fold into a unique three-dimensional shape known as the tertiary structure.

After folding into their complex three-dimensional shapes, proteins can interact with other proteins or molecules to carry out a broad range of functions in living organisms. Therefore, understanding how proteins fold is essential to understanding how they function. Solving this problem, known as the protein folding problem, has been something of a holy grail in the field of biology, which is why AlphaFold by DeepMind was such a monumental breakthrough (but more on that later).

What exactly is protein engineering?

Protein engineering is the process of modifying, designing, and optimizing proteins to enhance their properties or create new functions. It combines principles from molecular biology, biochemistry, and biophysics to manipulate proteins at the molecular level. Typically, the goal is to create proteins with novel functions or properties not found in nature. There are many approaches to modifying existing proteins or designing new ones from the ground up, including rational design, directed evolution (irrational design), or a combination of both (semi-rational design).

From traditional approaches to de novo protein design

Traditional methods in protein engineering, such as directed evolution and rational design, mainly focused on imitating or accelerating the natural evolutionary process. Despite yielding proteins with improved performance or even new functions, traditional approaches offer constrained opportunities for creating desired functions, due to limitations in altering and enhancing natural protein sequences and structures.
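
To make "accelerating the natural evolutionary process" concrete, here is a minimal sketch of a directed-evolution loop: mutate, screen, select, repeat. It is a toy under stated assumptions, not a real protocol. In particular, `toy_fitness` is a hypothetical stand-in for a laboratory screen, and the parent and target sequences are made up for illustration.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def toy_fitness(seq: str, target: str) -> int:
    """Hypothetical stand-in for a lab screen: count positions matching a target."""
    return sum(a == b for a, b in zip(seq, target))


def mutate(seq: str, rng: random.Random) -> str:
    """Return a copy of seq with one randomly chosen position re-sampled."""
    pos = rng.randrange(len(seq))
    return seq[:pos] + rng.choice(AMINO_ACIDS) + seq[pos + 1:]


def directed_evolution(parent: str, target: str, rounds: int = 50,
                       library_size: int = 100, seed: int = 0) -> str:
    """Repeatedly build a mutant library, screen it, and keep the best variant."""
    rng = random.Random(seed)
    best = parent
    for _ in range(rounds):
        library = [mutate(best, rng) for _ in range(library_size)]
        candidate = max(library, key=lambda s: toy_fitness(s, target))
        # Only accept the candidate if it is at least as fit as the current best.
        if toy_fitness(candidate, target) >= toy_fitness(best, target):
            best = candidate
    return best


parent = "MKTAYIAKQR"   # made-up starting sequence
target = "MKTWYIAKHR"   # made-up sequence that the toy screen rewards
evolved = directed_evolution(parent, target)
```

Note that the loop only ever explores sequences a few mutations away from the parent, which is one way to picture the constraint described above.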

Extensive modifications of existing proteins may also result in functional impairment: many natural proteins are only marginally stable, and modifying their sequences can cause unfolding or aggregation. Consequently, the constraints associated with traditional protein engineering approaches, including limitations in modification, potential loss of function, and instability, have made researchers look elsewhere.

In recent decades, computational de novo protein design has paved the way for creating entirely novel protein sequences and structures from scratch, without any pre-existing template or natural protein as a starting point. De novo design leverages known fundamental biophysical and biochemical principles to come up with novel designs. Proteins designed in this way have demonstrated enhanced stability with distinctive functionality, addressing limitations in traditional evolutionary approaches.

Deconvoluting the Protein Engineering Process

1. Protein design

Protein design involves the use of different approaches (e.g. directed evolution, rational design, and de novo design) to come up with a potential functional protein. Large language models and diffusion-based models can now be used to design new proteins, either singly or in combination. There are a few approaches, or perhaps more accurately starting points, for using AI in de novo protein design.

  • Fixed-backbone design – starting with a desired protein structure and then using an algorithm to figure out what sequence of amino acids will fold into that shape
  • Structure generation – using an algorithm trained on protein structures to create a novel protein structure
  • Sequence generation – training a model on amino acid sequence data and protein function to generate a new sequence of amino acids that might have the desired function (without knowing in advance what shape that sequence will fold into, although this can be predicted accurately with models like AlphaFold)
  • In-painting technique / autocomplete – starting with a somewhat complete structure or sequence and then using AI to fill in the blanks, i.e. autocompleting either the structure or the sequence
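
As a rough illustration of the in-painting idea at the sequence level, the sketch below shows only the input/output contract: a partially masked sequence goes in, a completed sequence comes out, and the fixed positions are left untouched. The uniform random sampling is a placeholder for a trained generative model and would not produce functional proteins; the `_` mask symbol and the example sequence are assumptions made for illustration.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids
MASK = "_"  # hypothetical symbol marking positions to be filled in


def inpaint_sequence(partial: str, seed: int = 0) -> str:
    """Fill masked positions while keeping all fixed residues unchanged.

    A real system would sample from a trained model conditioned on the
    fixed residues; random.choice is only a placeholder for that step.
    """
    rng = random.Random(seed)
    return "".join(
        rng.choice(AMINO_ACIDS) if residue == MASK else residue
        for residue in partial
    )


partial = "MKT___AKQR"  # made-up sequence with three masked positions
completed = inpaint_sequence(partial)
```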

Companies like Profluent and Cradle provide software tools and platforms for this process and leverage generative or diffusion-based AI models. Some well-known open-source models include ProGen2, ProtGPT2, and RFdiffusion.

2. In-silico validation

In-silico validation of proteins includes the use of computational methods and simulations to assess and validate the properties and behaviors of the generated protein candidates. This approach allows researchers to predict and analyze various aspects of a protein's structure, stability, function, and interactions before conducting actual experimental work in the lab. For example, companies like A-Alpha Bio use machine learning to assess the binding strength of an amino acid sequence. One of our portfolio companies, PipeBio, provides a platform for scientists to annotate, define and characterize regions and liabilities of antibody sequences to de-risk antibody development, as well as identify optimal antibodies based on molecular properties and sequences.
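
One of the simplest forms of in-silico checking is scanning a sequence for short motifs that are commonly treated as developability liabilities. The sketch below is a generic illustration and not the method of any company named above. It assumes three widely cited motifs: the N-linked glycosylation sequon (N, then any residue except P, then S or T), the deamidation-prone pair NG, and the isomerization-prone pair DG. The example sequence is made up.

```python
import re

# Commonly cited sequence liability motifs (an illustrative, incomplete set).
LIABILITY_MOTIFS = {
    "n_glycosylation": r"N[^P][ST]",  # N-X-S/T sequon where X is not proline
    "deamidation": r"NG",
    "isomerization": r"DG",
}


def scan_liabilities(seq: str):
    """Return (motif_name, 1-based position, matched residues), sorted by position."""
    hits = []
    for name, pattern in LIABILITY_MOTIFS.items():
        # The lookahead lets overlapping occurrences be reported as well.
        for match in re.finditer(f"(?=({pattern}))", seq):
            hits.append((name, match.start() + 1, match.group(1)))
    return sorted(hits, key=lambda hit: hit[1])


example = "QVQLNGSWDGK"  # made-up fragment for illustration
report = scan_liabilities(example)
```

In practice, flagged positions would be one input among many; structure-aware models and binding predictions cover what simple motif matching cannot.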

3. Protein synthesis

Protein synthesis is the actual laboratory process of producing proteins based on their genetic sequences. This process involves creating the protein molecules in vitro (outside of a living cell) through various techniques. The synthesized proteins can then be subjected to experimental analysis to validate the effects of protein engineering and design. Companies like Tierra Biosciences synthesize testable proteins from diverse sources.

4. In-vitro & in-vivo validation

In-vitro validation plays a crucial role in protein engineering by providing direct experimental evidence to confirm or refute what the computational, in-silico models predict about the behavior, function, and interactions of the designed protein. In-vitro validation involves conducting experiments "in vitro," Latin for "in glass," indicating that the experiments are performed in test tubes, petri dishes, or other controlled environments outside of a living organism. Companies such as Adaptyv Bio provide protein synthesis and wet-lab testing services.

Market Map

Practical Applications of Protein Engineering

Protein engineering holds the promise of transforming multiple fields of biology, materials science, and life sciences. Applications include protein-based therapeutics, antibody engineering, and enzyme engineering, as well as some more experimental applications like protein logic circuits.

Drug discovery & development

The cost of bringing a drug from discovery to market has ballooned from an inflation-adjusted $100 million in 1990 to over $2.5 billion today, taking 10-15 years on average. If we could reduce the cost and accelerate the development process using AI-driven tools, we could ultimately bring the benefits of modern medicine to more of humanity and across a broader set of treatments.
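
To put those figures in perspective, the short calculation below derives the overall multiple and the implied average annual growth rate in real terms. It assumes "today" means 2023, the year this post was published, and takes the two cost figures above at face value.

```python
start_cost = 100e6   # inflation-adjusted cost in 1990, in USD (figure from the text)
end_cost = 2.5e9     # cost today, in USD (figure from the text)
start_year = 1990
end_year = 2023      # assumption: "today" is the publication year of this post

multiple = end_cost / start_cost
years = end_year - start_year
# Compound annual growth rate implied by the two endpoints.
annual_growth = multiple ** (1 / years) - 1

print(f"Cost multiple: {multiple:.0f}x over {years} years")
print(f"Implied real annual growth: {annual_growth:.1%}")
```

Under these assumptions the cost has grown 25-fold, or roughly 10% per year on top of inflation.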

Fundamentally, a new biological therapeutic needs to satisfy a set of biochemical and biophysical requirements to qualify as a potential drug candidate, with a protein’s amino acid sequence being the starting point. Predicting protein fitness from an underlying sequence would drastically simplify and rationalize novel biotherapeutic development, but accurately modeling the relationship between sequence and fitness remains an elusive goal.

Startups and larger incumbents alike are working on building accurate computational models capable of better predicting multiple biochemical and biophysical properties to close this gap. Ultimately the goal is to increase the flux of molecules with excellent biochemical and biophysical properties that enter in vitro validation and clinical trials.

De novo protein design has emerged as an especially attractive route for designing therapeutic proteins. An example is the IL-2 therapeutic, the world’s first protein therapeutic designed de novo, which has shown promise as an anti-cancer immunotherapeutic.

Notable companies working on protein-based therapeutics and drug discovery include, among others: Serotiny, ElevateBio, LabGenius, Nabla Bio, Exscientia, Generate Biomedicines, Absci, AbCellera, and BigHat Biosciences.

New materials & enzyme engineering

Billions of years of evolution have given living organisms incredible protein-based materials with unparalleled functionalities compared to what humans are able to create synthetically. Protein-based materials found in nature, such as spider silk, collagen, and mussel adhesives, have unique structural and functional properties. Although some of these have been reproduced through biomimicry, protein engineering opens up new realms of possibilities in creating entirely novel materials with altered or enhanced characteristics from those found in nature.

Enzymes (proteins that act as catalysts in living organisms to accelerate biochemical reactions in cells, including respiration and food digestion) have long been used in commercial applications, such as chemicals, biofuels, food & beverage, and consumer products. Protein engineering can be used for de novo enzyme design (enzyme engineering), opening up new avenues for creating enzymes with, for example, increased stability at elevated temperatures or in acidic environments.

For example, protein engineering allows for the creation of novel biopolymers that can fully decompose naturally, holding the promise of replacing petroleum-based plastics. Protein engineering could also play a role in reducing food scarcity, waste, and malnutrition around the globe through, for example, bio-inspired food coatings that help keep food fresh for longer.

Companies working in new materials and enzyme engineering include Biomatter (platform for enzyme design), Protein Evolution (circular plastics), and Arzeda (various enzyme products).

Tailwinds

Accurate protein folding at scale paved the way for AI-powered protein engineering

Until the releases of AlphaFold 1 in 2018 and AlphaFold 2 in 2020, which revolutionized the prediction of 3D protein structures using AI, figuring out the exact structure of a protein was an expensive and time-consuming process, and scientists had only covered a tiny sliver of the universe of proteins. Now, AlphaFold can predict the structure of a protein with high accuracy just from its 1D amino acid sequence “at scale and in minutes, down to atomic accuracy”. Read more about AlphaFold and its impact on DeepMind’s page.

Advances in generative AI are transforming how proteins are designed

Similar to how AI revolutionized protein folding, we can see the same happening for protein design: AI models can now be used to design proteins at a speed and accuracy not possible before. Until recently, protein engineering required a detailed understanding of the biochemical and biophysical aspects of naturally occurring proteins. In the span of just a couple of years, that has changed due to the broader availability of AI models that can spit out high-quality de novo protein designs with atomic accuracy.

Problem Space

Although tools for designing proteins are being democratized, proteins need to be validated in a real-world environment

Generative AI models for protein design, such as Protein Generator from the Institute for Protein Design and RFdiffusion by the Baker Lab, are democratizing access to de novo protein design. However, computationally designed proteins need to be validated in a real-world environment. A useful comparison can be drawn with generative AI-based large language models (LLMs): fluent speakers can quickly determine whether the model is spitting out nonsense, but the same kind of verification for proteins is not as straightforward. Translating computationally made designs into functional proteins in the real world remains a challenge. Companies such as Adaptyv Bio are addressing this by building a full-stack protein foundry.

“It’s not enough to trust that the computer is designing proteins well—you have to actually study these molecules in the real world.” – Basile Wicky

What ultimately decides whether a protein works is how it behaves in a biochemically realistic environment. Work on novel protein engineering therefore needs to be tied to the making and testing of proteins in a wet-lab environment. Experimental validation is required to confirm the desired protein functions and to assess properties such as the protein’s toxicity, immunogenicity, and binding to other proteins. Solving this expensive bottleneck by integrating high-throughput experimental capabilities into generative protein design will help realize the potential of protein engineering, ultimately transforming multiple fields of biology and beyond.

What does the future hold?

The fascinating thing is what can happen when all of this knowledge from experimental validation gets fed back into even more advanced artificial intelligence models. Additionally, as all of the computationally designed proteins are based on known amino acid sequences and folds, a question emerges: does a world of potentially useful proteins exist outside the boundaries of what evolution has discovered over billions of years?

Our Perspective

1. De novo protein design will be democratized.

The ability to create novel proteins using artificial intelligence will be democratized and open-sourced.

2. Protein synthesis and in-vitro validation will be centralized.

Synthesizing the designed proteins and testing their real-world efficacy will be done via centralized wet labs through something akin to an API request. While larger biotech companies might develop this in-house, this will not make sense for the vast majority of companies working on protein engineering-based applications.
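
To illustrate what "akin to an API request" could look like from the customer's side, here is a purely hypothetical sketch of a request body for a wet-lab validation service. No real provider's API is being described: the class name, field names, and assay label are all invented for illustration, and no network call is made.

```python
import json
from dataclasses import asdict, dataclass
from typing import List

STANDARD_AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


@dataclass
class ValidationRequest:
    """Hypothetical request body for a centralized wet-lab validation service."""

    sequences: List[str]   # designed protein sequences to synthesize and test
    assay: str             # hypothetical assay label, e.g. "binding_affinity"
    target: str            # hypothetical identifier of the binding target
    replicates: int = 3    # number of repeat measurements per sequence

    def __post_init__(self) -> None:
        # Reject empty sequences and anything outside the 20 standard residues.
        for seq in self.sequences:
            if not seq or set(seq) - STANDARD_AMINO_ACIDS:
                raise ValueError(f"invalid sequence: {seq!r}")

    def to_json(self) -> str:
        """Serialize the request as it might be sent to the service."""
        return json.dumps(asdict(self), indent=2)


request = ValidationRequest(
    sequences=["MKTAYIAKQR"],      # made-up sequence
    assay="binding_affinity",
    target="example-target",
)
payload = request.to_json()
```

The point of the sketch is the division of labor: the application company owns the designs and the question being asked, while synthesis and measurement sit behind a standard interface.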

3. Protein engineering for different applications will become highly specialized.

Protein engineering for different purposes (e.g. therapeutics, plastics, food) will become highly specialized and fragmented across application-specific players, and these players will access third-party design and validation platforms to help with protein engineering.

4. It’s still early days.

We have so far just seen a glimpse of how AI is transforming protein engineering and our capabilities to design highly functional proteins at scale. As more companies start working on protein engineering across domains, new opportunities for founders will emerge. The infrastructure layer—the picks and shovels—making this possible is rapidly evolving and we are excited to see what the future holds.

...

We are on the lookout for startups building the future of protein engineering, so if you are founding a company that will become a core component of the protein engineering process, we would love to have a chat.

More about the author(s)
Matias Salonen
Investor

As part of the investment team, Matias sources and evaluates investment opportunities, builds our internal investment theses, and supports portfolio companies and fund operations, including data-driven VC efforts.
