The problem of protein design
This is the first part of a three-part blog post. In this part, we’re going to review the concept of energy landscapes, which some of you may already be familiar with. In the second part, we’ll discuss how a concept from physics, called a partition function, can help us think about energy landscapes. In the last part, we’ll propose a way that we might use these concepts of energy landscapes and partition functions to improve protein design in Foldit.
The energy landscape
There’s a problem with the way we currently design proteins in Foldit—and not just in Foldit, but also in Rosetta. In fact, it’s a problem in any protein design strategy that optimizes the absolute energy of the design. This strategy is the premise of a Foldit design puzzle. The Foldit score measures the absolute energy of a solution (with a negative multiplier), so that when players compete to find solutions with the highest score, they are actually competing to find solutions with the lowest absolute energy.
However, the success of a protein design (i.e. whether or not the protein folds) does not depend only on the absolute energy of the design. Rather it depends on the protein’s energy landscape. The energy landscape is a concept we use to think about all the possible ways that a string of amino acids can fold. As any Foldit player knows, there are a lot of different ways to fold up a string of amino acids, and they all have different energies (or Foldit scores). We can imagine the energy landscape as a surface where every (x,y) coordinate represents a different fold, or state, and the height of the surface (the z-coordinate) represents the energy of that state. In some places there will be hills, which represent states with a high energy (low Foldit score), and in other places there will be valleys, where folds have a low energy (high Foldit score).
Conceptual illustration of a protein energy landscape, from Dill, K.A. and MacCallum, J.L. (2012)
One of the reasons we like the analogy of energy landscapes is that we intuitively understand how things tend to “prefer” low points in the landscape. If you place an object randomly on the energy landscape, it will tend to slide downhill, from a high-energy state to a low-energy state. If we consider the effect of thermal motion that is constantly jostling around the object (imagine a Mexican jumping bean that randomly jumps around the landscape), then the object will explore all the different valleys of the energy landscape. Nevertheless, the Mexican jumping bean will spend the most time in the deepest valleys of the landscape.
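The jumping-bean picture can be made concrete with a small simulation. The sketch below is an illustration only, not something Foldit or Rosetta runs. It assumes a made-up one-dimensional landscape of nine states (`LANDSCAPE`), arbitrary energy units with the thermal energy `kT` set to 1, and the standard Metropolis Monte Carlo rule for accepting or rejecting each random hop. The function name `simulate_jumping_bean` and all parameter values are hypothetical.

```python
import math
import random


def simulate_jumping_bean(energies, kT=1.0, steps=200_000, start=0, seed=0):
    """Random walk over a 1D energy landscape using the Metropolis rule.

    Returns the fraction of steps spent in each state.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    visits = [0] * len(energies)
    state = start
    for _ in range(steps):
        # Thermal jostling: try to hop one state to the left or right.
        proposal = state + rng.choice((-1, 1))
        if 0 <= proposal < len(energies):
            delta = energies[proposal] - energies[state]
            # Downhill hops are always accepted; uphill hops are accepted
            # with probability exp(-delta / kT).
            if delta <= 0 or rng.random() < math.exp(-delta / kT):
                state = proposal
        # Hops off the edge of the landscape are rejected (the bean stays put).
        visits[state] += 1
    return [count / steps for count in visits]


# Assumed toy landscape: a deep valley at index 2 (energy 0) and a
# shallower valley at index 6 (energy 2), separated by a hill at index 4.
LANDSCAPE = [4.0, 2.0, 0.0, 2.0, 4.0, 3.0, 2.0, 3.0, 4.0]

occupancy = simulate_jumping_bean(LANDSCAPE, start=8)
for index, fraction in enumerate(occupancy):
    print(f"state {index}: energy {LANDSCAPE[index]:.1f}, time spent {fraction:.1%}")
```

With these assumed numbers, the bean starts at the far edge of the landscape (index 8) but spends most of its steps in the deepest valley at index 2, and visits the shallower valley at index 6 far less often.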
A protein behaves the same way in its energy landscape. At room temperature, there is a considerable amount of thermal motion that allows the protein to explore its energy landscape, although the protein will spend the most time in the states with lowest energy. Every amino acid sequence has a different energy landscape, with different valleys in different places. When you mutate amino acids in a Foldit puzzle to find higher scores for your design, what you’re really doing is looking for an energy landscape where your design is in a deeper valley. However, the Foldit score only tells you about the energy of your designed folded state—or the “depth” of your desired valley. What we’re not considering in Foldit is the rest of the landscape, and whether there might be other low-energy “decoy states”—other deep valleys for your protein to explore.
This is a difficult problem to solve because the energy landscape for a protein is vast. It’s difficult to account for the decoy states because we don’t know what they might look like. We don’t know where to search in the energy landscape for other low-energy valleys, and the landscape is too big to search exhaustively.
The search for decoys
As many of you are probably aware, a lot of the recent De-novo Freestyle prediction puzzles have targeted Foldit player-designed proteins. The purpose of these puzzles is to look for low-energy decoy states, or alternative valleys in the energy landscape. We already run Foldit designs through Rosetta@home to look for decoy states—and for the most part, Rosetta@home seems to do a pretty good job. But occasionally Foldit players find solutions that Rosetta@home misses.
In the following example we're going to pick on fiendish_ghoul, because this energy landscape problem is clearly illustrated by two of their designs, shown below:
The protein on the left is a design originally from Puzzle 1331; the protein on the right is a design from Puzzle 1239. Beneath each cartoon protein structure is a scatter plot with the results from corresponding De-novo Freestyle puzzles that we posted using the sequence of each design. Each black point represents a solution, plotted with respect to its RMSD to the folded state (x-axis) and its energy (y-axis). Together these points give us a profile of the energy landscape for each protein. We see that the design on the left has a “funnelled” landscape, such that the lowest-energy solutions are those close to the folded state (RMSD close to zero) and solutions very different from the folded state (large RMSD) all have higher energies. In the design on the right, however, Foldit players identified a number of decoy states that are very different from the folded state (large RMSD), and have energy just as low as the folded state. These decoy states (marked with colored circles in the scatter plot) appear as “valleys” in the energy landscape of the protein.
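One way to read a scatter plot like this programmatically is to flag solutions that are far from the designed state but nearly as low in energy. The following is a minimal sketch under stated assumptions: the function `find_decoys`, the 4 Å RMSD cutoff, the 5-unit energy window, and both data sets are invented for illustration and are not the actual puzzle results or the analysis used by the Foldit team.

```python
def find_decoys(solutions, rmsd_cutoff=4.0, energy_window=5.0):
    """Flag low-energy solutions that are far from the designed state.

    solutions: list of (rmsd, energy) pairs, one per solution.
    rmsd_cutoff: RMSD (in angstroms) above which a solution counts as
        "very different" from the design (assumed value).
    energy_window: how close in energy a distant solution must be to the
        best near-native solution to count as a decoy (assumed value).
    """
    near_native = [energy for rmsd, energy in solutions if rmsd <= rmsd_cutoff]
    if not near_native:
        raise ValueError("no solutions near the designed state")
    reference = min(near_native)  # bottom of the designed valley
    return [
        (rmsd, energy)
        for rmsd, energy in solutions
        if rmsd > rmsd_cutoff and energy <= reference + energy_window
    ]


# Invented example data: (RMSD to design, energy).
funnelled = [(0.8, -310.0), (1.5, -305.0), (3.0, -290.0),
             (6.0, -270.0), (9.0, -255.0), (12.0, -240.0)]
with_decoys = [(0.9, -300.0), (2.0, -292.0), (7.5, -299.0),
               (10.0, -301.0), (11.0, -260.0)]

print(find_decoys(funnelled))    # no distant solution rivals the design
print(find_decoys(with_decoys))  # distant solutions with competitive energy
```

In the funnelled example nothing is flagged, because every distant solution is much higher in energy than the designed state. In the second example, two distant solutions fall within the energy window and are reported as decoys.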
The cartoon structures of these decoy states are shown below using the same rainbow coloring as above, with the N-terminus of the protein colored blue, and the C-terminus of the protein colored red:
In each of the decoy structures, all of the α-helices and β-strands are there, but it appears there is some ambiguity about where the helices should go. According to the solutions from the De-novo Freestyle puzzle, the three α-helices can fold in different arrangements around the central β-sheet, and all of these arrangements have similar energies. Since all of these states have similar energy, the protein will not have a strong preference for any single one of them.
Both of fiendish_ghoul's proteins were designed by optimizing their absolute energy, but the protein on the right has a problematic energy landscape. If we made these proteins in the lab, we would expect the protein on the left to be well-folded, and to spend most of its time in the designed state, since it appears to be the only deep valley in the landscape. However, we would expect the protein on the right to be poorly-folded, and to spend its time sampling all the different decoy states discovered by Foldit players.
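To put rough numbers on "spends most of its time", we can weight each state by its Boltzmann factor. This is only a sketch: it assumes each valley can be treated as a single state, that energies can be read as kcal/mol (Foldit and Rosetta energies are not exactly that), and a temperature of 298 K. The function name `state_populations` and the energy values are made up for illustration.

```python
import math

K_B = 0.0019872  # Boltzmann constant in kcal/(mol*K)


def state_populations(energies, temperature=298.0):
    """Fraction of time spent in each state, from Boltzmann weights."""
    kT = K_B * temperature
    lowest = min(energies)
    # Subtracting the lowest energy keeps the exponentials well-behaved
    # and does not change the resulting fractions.
    weights = [math.exp(-(energy - lowest) / kT) for energy in energies]
    total = sum(weights)
    return [weight / total for weight in weights]


# Assumed energies relative to the designed state (listed first).
# Funnelled landscape: every alternative is several kcal/mol higher.
funnelled_pop = state_populations([0.0, 3.0, 3.5, 4.0])
# Landscape with decoys: alternatives are almost as low as the design.
decoy_pop = state_populations([0.0, 0.2, 0.1, 0.3])

print(f"designed state, funnelled landscape: {funnelled_pop[0]:.0%}")
print(f"designed state, landscape with decoys: {decoy_pop[0]:.0%}")
```

Under these assumptions the designed state holds nearly all of the population in the funnelled case, but only about a third of it once three decoys sit at nearly the same energy.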
Check back on Monday for the next blog post, where we’ll discuss these energy landscapes in more detail!
(Posted by bkoep | Fri, 08/24/2018 - 21:17)
WeFold paper on CASP11 is now published!
The latest WeFold paper, titled "An analysis and evaluation of the WeFold collaborative for protein structure prediction and its pipelines in CASP11 and CASP12", was just published in Scientific Reports, Nature's online open-access journal.
This publication describes the results and analysis of the CASP11 and CASP12 WeFold coopetition (cooperation and competition), highlighting lessons learned and improvements over the first WeFold attempt from CASP10.
The Foldit Players consortium is listed as a co-author, and the Acknowledgements section of the paper begins with: "The authors would like to acknowledge the collaboration of hundreds of thousands of citizen scientists who contributed millions of decoys through the Foldit game."
Congratulations to all of you and keep up the great folding!
(Posted by beta_helix | Tue, 07/03/2018 - 22:12)
WeFold paper on CASP11 has just been accepted!
The paper is titled "An analysis and evaluation of the WeFold collaborative for protein structure prediction and its pipelines in CASP11 and CASP12" and, just as with the first WeFold paper about CASP10, Foldit played a very large part in CASP11!
In order to publish these results, however, we must abide by the new authorship policy that journals have implemented, which requires names and affiliations for all authors. Previously, we had used "Foldit Players" (or Players, F.) to represent all of you.
Similar to Foldit's previous publication in Nature Communications, for this paper you will all be under the group consortium: "Foldit Players", and anyone who played a CASP11 puzzle has the option to list their complete real name (we cannot use Foldit usernames).
If you played one of the CASP11 puzzles and would like your full real name to be included in the group consortium list for this paper, please follow the directions below in the comments by Saturday, June 2nd at 11:59pm GMT.
We would like to emphasize that this is completely voluntary, as we will of course also have a statement in the acknowledgements thanking all Foldit players, just not by name.
(Posted by beta_helix | Thu, 05/24/2018 - 02:35)
Foldit's 10 Year Anniversary!
Today marks the 10-year anniversary of Foldit’s launch on May 9, 2008!
In the past decade, Foldit players have advanced protein science by accurately predicting the structure of a viral protein [1], by developing an algorithm for protein modeling [2], and by redesigning a protein enzyme with improved activity [3]. Foldit players have shown that they can refine protein models better than sophisticated computer programs [4], and that they can interpret electron density maps as well as expert crystallographers [5]. We have high hopes for the next 10 years of Foldit, and can't wait to see what Foldit players will discover next!
Protein Design in Foldit
Most recently, Foldit players have been designing brand new proteins from scratch. The ability to design proteins is a big milestone for Foldit players, and we’re excited about the new types of problems that we can start to tackle with protein design in Foldit! This achievement has been a long time in the making—below you can review previous blog posts to follow this progress over the last four years. Play the latest design puzzle now!
Nov. 1, 2013 - First batch of Foldit player-designed proteins selected for testing
Mar. 25, 2014 - Improvements in Foldit player-designed proteins
Jun. 18, 2014 - First positive testing results for a Foldit player-designed protein
Feb. 10, 2015 - First alpha/beta Foldit designs selected for testing
Feb. 28, 2017 - Better backbones yield promising alpha/beta designs
Mar. 1, 2017 - Diverse player designs fold up in the wet lab
Apr. 15, 2017 - Protein crystallography of a Foldit player design
May 30, 2017 - X-ray diffraction of a protein crystal
A high-resolution crystal structure (cyan) aligned with the design model (green) shows that this protein folds up just as it was designed by Waya, Galaxie, and Susume. The protein backbone aligns to the design with a Cα RMSD of 1.1 Å, and the sidechains in the protein core pack just as intended.
Small Molecule Design in Foldit
We’re also excited to ramp up small-molecule design in Foldit, allowing Foldit players to create new ligands that could bind to protein targets! Play the latest small-molecule design puzzle now!
New tools allow Foldit players to build small molecules that can bind to protein targets
We'd like to thank all the Foldit players that have contributed to Foldit over the last 10 years! None of this would have been possible without you! Happy folding!
1. Khatib, F. et al. Crystal structure of a monomeric retroviral protease solved by protein folding game players. Nat Struct Mol Biol 18, 1175–1177 (2011).
2. Khatib, F. et al. Algorithm discovery by protein folding game players. Proc Natl Acad Sci U S A 108, 18949–18953 (2011).
3. Eiben, C. B. et al. Increased Diels-Alderase activity through backbone remodeling guided by Foldit players. Nat Biotechnol 30, 190–192 (2012).
4. Cooper, S. et al. Predicting protein structures with a multiplayer online game. Nature 466, 756–760 (2010).
5. Horowitz, S. et al. Determining crystal structures through crowdsourcing and coursework. Nat Commun 7, 12549 (2016).
Aflatoxin Challenge Update
Foldit Community, we are ready for an Aflatoxin update! Thank you for all your efforts in creating the first round of designs to combat aflatoxin. To date, hundreds of players have designed more than 400,000 structures! There is a lot to choose from, and it has been a fantastic way to start off this effort. Players have been incredibly creative, and have engineered exciting new molecular interactions that truly have the potential to stabilize the aflatoxin hydrolysis transition state we provided in the models. Two examples are below. The first is a pi-pi stacking interaction between a tryptophan and the conjugated ring system of aflatoxin AFB1 (left, Figure 1). If the designed protein is stable enough in this state to provide this interaction in the physical world, it would almost certainly position aflatoxin in a manner poised for enzymatic hydrolysis. The second interaction is a classical hydrogen bond with a ketone group (right, Figure 1). Again, these strong, energetically favorable interactions will provide the physical properties needed to stabilize aflatoxin AFB1 in a manner ready for hydrolysis by the naturally occurring catalytic core of the enzymes. We look forward to seeing more of these interactions in subsequent rounds!
Figure 1. AFB1 interactions in Foldit player designs. In both images the white structure is the native enzyme (starting structure), and the green is a Foldit player design. Left, a double mutation accommodated by some backbone movement enabled a pi-stacking interaction between tryptophan and AFB1. Right, a single mutation and backbone change introduce a hydrogen bond with the ketone group on AFB1. Keep making changes! And remember that it is not only important to interact with AFB1, but also to stabilize the new protein structure to reinforce the AFB1 interactions!
On the experimental front, we are excited to announce that we have transferred all methods into a microtiter plate format and have tested the first 100 player-designed GeneStrings from ThermoFisher (Figure 2). These were selected on a variety of factors: ~30 of the top-scoring designs based on overall score; ~30 based on top AFB1 energy with an above-average overall score; the best score of 20 players, sorted on best-scoring designs; and all of the player Scientist Shares. While the process from design to data was seamless, unfortunately all of the results were negative (i.e. none of the designs degraded aflatoxin). We are going back to this first round and re-evaluating how we picked designs, as well as refining some of the designs we thought were most interesting ourselves. In the meantime, however, we want to get another round of puzzles going. Don’t lose hope! We expect this will require several rounds of design as we optimize the puzzle and solution selection parameters, and as we start to prepare a few new scaffold proteins to try. But we are confident we will find something in the next few rounds, and we appreciate your diligence and efforts in helping solve this global issue!
There was great feedback from the community about the original puzzles and we plan to adjust the next puzzles based on this feedback. Most importantly, we plan to trim some of the frozen protein regions, so you can deal with a smaller puzzle and focus on the regions of interest. We may also upweight the ligand scoring, to encourage more interactions with the ligand. Although some players have requested the ability to move the ligand around the binding pocket, we will continue to keep the ligand fixed in place; the ligand's orientation with respect to the catalytic residues is critical if we want the reaction to occur!
We will continue to update you as we go, but we're off to a strong start! Check out Puzzle 1497: Aflatoxin Challenge: Round 5 now!
(Posted Wed, 03/14/2018 - 23:04)