The Foldit protein design paper
Today, the scientific journal Nature published a paper titled De novo protein design by citizen scientists, all about the work of Foldit players!
The paper is written for an audience of professional scientists, and gets somewhat technical. This blog post is meant to summarize the main points of the paper, so that everyone can appreciate the significance of this achievement. If you have trouble accessing the paper on the Nature website, try this view-only online version or check the Baker Lab website.
What is 'de novo' protein design?
The Latin phrase de novo translates literally to “from the new”—we usually use it to mean “from scratch.” Veteran Foldit players will recognize this phrase from De-novo Freestyle Foldit puzzles, where players fold up a protein from a completely unfolded starting position (i.e. from scratch), rather than from a partially-folded starting position.
In the field of protein design, this phrase has a special meaning. De novo protein designs are created without referencing the sequences of natural proteins.
To illustrate, you could imagine designing a 3-helix bundle protein just by looking at the sequences of natural 3-helix bundles and choosing the most common amino acid at each position. Since we have lots of data about natural protein sequences, and powerful ways to extract patterns from data, this method is relatively easy. But it will only ever let us design proteins that are similar to natural proteins.
On the other hand, de novo protein design is much more difficult. Rather than relying on patterns in massive datasets, de novo design requires an understanding of the physical principles behind protein folding. The advantage is that we can use de novo methods to design brand new proteins that are unlike any proteins found in nature.
Why is protein design hard?
A designed protein must fold entirely on its own, without direction or instruction from any outside source.
The number of possible folds for a protein is huge, and a protein dissolved in solution is generally free to sample any of those possible folds. But if the protein sequence is chosen carefully, then the protein chain will have lower energy in one fold than in any other, and the protein will naturally prefer that lowest-energy fold.
It is difficult to choose the sequence because there are also many possible protein sequences (more than there are atoms in the universe!). And, once we choose a sequence for our target fold, we cannot check all the possible folds to ensure that our target fold has the lowest energy.
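To put a rough number on the size of sequence space, here is a minimal sketch. It assumes the 20 standard amino acids and the commonly quoted order-of-magnitude estimate of 10^80 atoms in the observable universe; the function names are made up for this illustration.

```python
# Rough size of protein sequence space.
AMINO_ACIDS = 20              # the 20 standard amino acids
ATOMS_IN_UNIVERSE = 10 ** 80  # order-of-magnitude estimate (assumption)


def sequence_count(length: int) -> int:
    """Number of distinct sequences for a chain of `length` residues."""
    return AMINO_ACIDS ** length


def min_length_exceeding(limit: int) -> int:
    """Shortest chain length whose sequence count is greater than `limit`."""
    length = 1
    while sequence_count(length) <= limit:
        length += 1
    return length


# Chain length at which sequences outnumber the atom estimate.
print(min_length_exceeding(ATOMS_IN_UNIVERSE))
```

Under these assumptions, even a fairly short protein chain already has more possible sequences than the atom estimate, so checking every sequence is out of the question.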
For a deeper discussion about the difficulties of protein design, see this previous blog post.
How can computer gamers design proteins?
Figure 1 below shows the Foldit game interface. Foldit players have a number of tools that allow them to change both the fold and the sequence of a virtual protein. The player's score is calculated from the energy of the virtual protein, with a state-of-the-art energy function developed by academic protein scientists. By competing with one another to reach the highest score, Foldit players arrive at virtual proteins with extremely low energies (a high Foldit score corresponds to a low protein energy).
Since energy alone is not enough for protein design, the Foldit team has had to make some adjustments to the Foldit score function. Every step of the way, we’ve relied on the work of Foldit players to expose problems with our score function. Foldit players are excellent at exploring new kinds of protein folds that are unlike anything seen in nature. For this reason, Foldit players are incredibly helpful for identifying unanticipated weaknesses in our energy function, and ultimately can improve our understanding of protein folding.
How do Foldit players actually design proteins?
Figure 2 shows that Foldit players design proteins very differently from automatic protein design algorithms. From start to finish, players will routinely accept huge penalties (high-energy spikes; colored traces in panel 2a) that ultimately pay off with low-energy designs.
Automatic algorithms, on the other hand, can only accept very small penalties, and they do so less frequently (gray traces in panel 2a).
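This post does not say which acceptance rule the automatic algorithms use. A common choice in Monte Carlo optimization is the Metropolis criterion, and the sketch below assumes that rule purely for illustration. The function name, the energy units, and the temperature value are all hypothetical.

```python
import math


def acceptance_probability(delta_energy: float, temperature: float) -> float:
    """Metropolis rule (assumed here): always accept a move that lowers the
    energy; accept an energy penalty with probability exp(-dE / T)."""
    if delta_energy <= 0:
        return 1.0
    return math.exp(-delta_energy / temperature)


# Small penalties are accepted fairly often; large ones almost never.
for penalty in (0.5, 5.0, 50.0):
    print(penalty, acceptance_probability(penalty, temperature=1.0))
```

Under a rule like this, the chance of accepting a penalty falls off exponentially with its size, which would fit the small, infrequent penalties in the gray traces. A human player is not bound by any such rule and can deliberately take a large penalty as part of a longer plan.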
How do virtual Foldit designs behave in real life?
Figure 3 shows data from the lab tests that we perform on protein designs from Foldit players.
The first thing to note, in panel 3a, is that these proteins are extremely diverse and span many different protein folds. Due to the amount of planning and creativity required to conceive a protein fold, a protein engineer will usually focus on a small number of protein folds for a given task. This paper reports a greater number of protein folds than any other protein design paper to date—including a brand new fold that is not observed in any natural proteins!
Panels 3c-f show that these proteins are very well-behaved both on the computer and in the lab. The plots in panel 3c show that Rosetta@home computer simulations predict the designs will fold accurately (details here).
Panels 3d-e show that the proteins don’t aggregate together, and are rigidly structured in solution. And panels 3f-g show that the proteins do not unfold except in extremely harsh conditions (read more here). Most natural proteins unfold with only 3-5 kcal/mol of energy; many of the designed proteins are hyper-stable and require >10 kcal/mol!
How do we know that the proteins fold up as designed?
Since proteins are smaller than the wavelength of visible light, we can’t see them directly under a microscope. However, in some cases we can use very intensive techniques to determine the structure of a protein indirectly (read more here and here). We used these techniques to solve high-resolution structures of 4 proteins designed by Foldit players.
Figure 4 shows the exact placement of atoms in the real-life protein structures, which is nearly identical to the virtual protein design in every case.
So, what does this all mean?
This is a huge accomplishment for Foldit players! De novo protein design is a very new field, and already citizen scientists are making significant contributions—not just by designing new proteins, but also by helping us improve our understanding of protein design. We hope that scientists in other fields will be able to find similar ways to engage public creativity and enthusiasm, to increase our understanding of the world.
Now that Foldit players can accurately design high-quality proteins from scratch, we can start to challenge Foldit players with more applied protein design problems. We’d like Foldit players to help us design new proteins that can assemble into multi-component structures and materials, or that can bind to biological targets as potent medicines, or that can degrade toxic chemicals!
Because Foldit depends on the cooperation and competition of its player community, our scientific ability grows rapidly with the number of Foldit players. We look forward to expanding the Foldit community and recruiting more creative and curious Foldit players!
Help us design a protein for cancer treatment right now, by playing Puzzle 1683: Integrin Antagonist Design!
( Posted by bkoep 172 3336 | Wed, 06/05/2019 - 18:11 | 6 comments )
New Custom Contests feature
We are excited to announce a new feature in Foldit: Custom Contests! As you may know, contests have been a feature that allows anyone to host their own private Foldit puzzle, chosen from a limited, pre-selected list. Now, you can make your own custom Foldit puzzle of whatever you choose and host it as a contest. We designed the Custom Contest feature especially for educators, who can now tailor their Foldit puzzles to their exact curriculum. There are plenty of other uses as well, including private contests that research groups can use to brainstorm new ideas, or even Foldit parties!
We just published a paper (found here) that describes Custom Contests in depth. If you’re interested in making Custom Contests, please email mail.fold.it |at| gmail.com for access.
( Posted by beta_helix 172 15968 | Mon, 03/04/2019 - 16:53 | 8 comments )
New Foldit tool: Pick Sidechains
We're introducing a new Foldit tool for folding protein side chains, called Pick Sidechains. Pick Sidechains is currently available to devprev users for testing, and soon will be released for all users!
To use Pick Sidechains, select a segment of your protein and click on the Pick Sidechains button. This starts the tool, which runs continuously until you stop it with the Stop button in the upper left corner of the screen (just like Shake, or Remix).
Pick Sidechains displays a cloud of all possible side chain positions for the selected segment. Each possibility is called a rotamer. Use the mouse to hover over the cloud and highlight individual rotamers. Left-click on a rotamer to apply it and see how it affects the score of your solution.
When you start Pick Sidechains, a new panel will appear that shows the segment you selected, along with a list of rotamer options for that segment.
Every rotamer in the list is labeled with a shorthand name, using the letters ‘m’, ‘p’, and ‘t’ to describe the rotation of each bond in the side chain (for “minus”: -60°, “plus”: 60°, or “trans”: 180°).
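As a small illustration of the shorthand, the sketch below translates a rotamer name into one rotation per bond. It assumes a name is simply a string of these three letters; the function and table names are hypothetical and are not part of Foldit.

```python
# Angles in degrees for each shorthand letter, as described above.
LETTER_TO_ANGLE = {"m": -60, "p": 60, "t": 180}


def rotamer_angles(name: str) -> list[int]:
    """Translate a shorthand rotamer name such as 'mt' into a list of
    bond rotations in degrees, one per letter."""
    angles = []
    for letter in name:
        if letter not in LETTER_TO_ANGLE:
            raise ValueError(f"unknown rotamer letter: {letter!r}")
        angles.append(LETTER_TO_ANGLE[letter])
    return angles


print(rotamer_angles("mt"))  # first bond at -60 degrees, second at 180
```

So a rotamer labeled “mt” would have its first side chain bond rotated to about -60° and its second to about 180°.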
The panel also includes a gauge for each rotamer’s prevalence. Certain rotamers are more common than others, and these are typically preferred for protein folding. A full gauge indicates a common rotamer, while a rare rotamer will have an empty gauge. More common rotamers will usually score better, but sometimes a rare rotamer can make an excellent hydrogen bond or fill a void to gain more points!
After you have picked a rotamer, stop Pick Sidechains with the Stop button in the upper left corner of the screen.
A tool for H-bond Networks
In most cases, the Shake tool will still be the fastest and most effective way to fold the side chains of your protein. Pick Sidechains is meant for special situations where Shake performs poorly. One example is designing H-bond Networks.
Shake is not very good for designing H-bond Networks, because it doesn’t know about puzzle Objectives—Shake can only optimize the base Foldit score (without Objective bonuses). This means Shake will sometimes ignore a potential H-bond Network because other rotamers improve the base score, even if the network would yield a huge Objective bonus for more points overall.
In these situations, it’s up to Foldit players to design H-bond Networks by hand. Pick Sidechains should give players improved manual control over side chains, and hopefully will help players design better H-bond Networks!
An example of a well-satisfied H-bond Network designed by fiendish_ghoul in Puzzle 1561. This network spans the interface between the two symmetric units, and is located in the core of the protein, with nine polar atoms that need to form hydrogen bonds (numbered 1-9). Since these atoms are buried in the protein core and cannot make H-bonds with the water around the protein, they need to form H-bonds with each other. Note that there are un-bonded hydrogens on atoms 3 and 5, so this network is not completely satisfied—still, this network is over 80% satisfied, which is very impressive! The Shake tool is unlikely to find well-satisfied H-bond Networks on its own, and may require guidance from Foldit players. Pick Sidechains can help Foldit players build H-bond Networks.
Tips for using Pick Sidechains
You can select more than one segment for Pick Sidechains, to view multiple rotamer clouds at once.
Some side chains (ASN, GLN, HIS, THR) have different atoms with different bonding abilities, but it can be hard to tell these atoms apart in the default Foldit view. If you have enabled “Show advanced GUI” in General Options, then you can change your View Options to one of the “CPK” Colors (like “Score/Hydro+CPK”). This will color oxygen atoms red and nitrogen atoms blue.
Some side chains (CYS, SER, THR, TYR) have a hydrogen that can be rotated to form hydrogen bonds in different directions. However, this hydrogen is hidden by default. In the View Options menu, change the View Hydrogens setting to “Show bondable hydrogens” to see these hydrogens in your protein. If hydrogens are visible, then Pick Sidechains will display extra rotamers so you can control the position of the hydrogen.
( Posted by bkoep 172 3336 | Mon, 12/03/2018 - 19:49 | 6 comments )
Partition Tournament Final Results
Our Protein Design Partition Tournament concluded yesterday. In the final week, we ran four regular puzzles for the tournament front-runners, so that the greater Foldit community could help us explore the energy landscapes for those designs. Below, we present the tournament results and discuss the final outcomes.
We will host a Science Chat next Tuesday, October 23 at 21:00 GMT. We'll be happy to discuss any questions about the tournament, as well as any other recent Foldit activity! Leave your questions in the comment section below!
The final rankings for the tournament are as follows:
The champion of the tournament is Galaxie, whose design resisted challengers more effectively than any other design. No challenger was able to find a decoy with energy comparable to the design structure; the highest scoring decoy has a probability of only 10^-19 (one in ten billion billion)!
Galaxie's winning design
Congratulations, Galaxie! Galaxie will receive the brand new Partition Tournament achievement (coming soon!). Galaxie was also an exceptional challenger in Phase Two, and found the highest-scoring decoy state for 7 different targets!
Two other players will receive the Partition Tournament achievement, for their outstanding contributions as challengers in Phase Two of the tournament:
The most prolific challenger was robgee, who found 20 decoy states for 14 different targets—more than any other challenger!
The most venturous challenger was Mike Cassidy, who found a decoy state for Partition Puzzle (B): MicElephant with an RMSD of 13.5 Å (off the charts—literally!).
Recall that the initial motivation for this tournament concerned our protein design strategy. In a typical Foldit design puzzle, players optimize the absolute energy (the Foldit score) of their design. However, the success of a protein design depends not on its absolute energy, but on its energy landscape. In theory, a design with a mediocre Foldit score can still have an excellent energy landscape, and might be expected to fold up well in the lab. Conversely, a high-scoring Foldit design could have a problematic energy landscape (with other high-scoring decoys), and would be expected to fold poorly. The Partition Tournament was set up to evaluate Foldit players' designs based on their energy landscapes, instead of their absolute energy.
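To make the difference between absolute energy and energy landscape concrete, here is a toy sketch that weights the design state against a set of decoys with Boltzmann factors. This is an illustration only: the exact formula, temperature, and energy units used for the tournament's partition scores are not given in this post, so `kT`, the energy values, and the function name are all assumptions.

```python
import math


def design_probability(design_energy, decoy_energies, kT=1.0):
    """Boltzmann probability of the design state, given the energies of the
    design and of every known decoy (lower energy is more favorable)."""
    energies = [design_energy, *decoy_energies]
    lowest = min(energies)  # shift by the minimum for numerical stability
    weights = [math.exp(-(e - lowest) / kT) for e in energies]
    return weights[0] / sum(weights)


# A mediocre design whose decoys all have much higher energy...
print(design_probability(-80.0, [-40.0, -35.0]))
# ...versus a lower-energy design with a decoy at nearly the same energy.
print(design_probability(-100.0, [-99.5]))
```

In this toy model, the first design is overwhelmingly favored over its decoys despite its worse absolute energy, while the second must share its population with a competing decoy.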
Note that all five of the top-scoring tournament submissions selected for Phase Two of the tournament (from matosfran, actiasluna, phi16, Galaxie, and fiendish_ghoul) performed very well in the tournament. None of these designs was overtaken by a high-scoring decoy. This is encouraging, because it suggests that our normal strategy (i.e. optimizing for absolute energy) is an effective shortcut for finding favorable energy landscapes.
However, we also see that many of the lower-scoring submissions still performed well in the tournament, and maintained high partition scores. These designs were able to resist challenges from other Foldit players, and seem to have favorable energy landscapes. This indicates that Foldit players may have protein design skills that are not captured by our energy function. If we focus solely on optimizing absolute energy, then we're probably going to miss out on some perfectly good protein designs.
These are exciting results, and we're very keen to keep exploring protein design with Foldit players! Nevertheless, we should recognize a significant limitation of the tournament results.
Most challengers tended to stay very close to the designed structure. In many of the energy landscapes below, we see the majority of solutions (usually a dense cluster of black dots) right around the 2.5 Å RMSD cutoff. The average decoy had an RMSD of only 4.0 Å.
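For readers unfamiliar with RMSD (root-mean-square deviation), the sketch below shows the basic calculation for two structures with matching atoms. It assumes the two coordinate sets are already superimposed, which a real comparison would do first, and it does not reflect which atoms Foldit actually compares. The function name and coordinates are made up for this example.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equal-length lists of
    (x, y, z) positions, assumed to be already superimposed."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum(
        (a - b) ** 2
        for point_a, point_b in zip(coords_a, coords_b)
        for a, b in zip(point_a, point_b)
    )
    return math.sqrt(squared / len(coords_a))


design = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
decoy = [(0.0, 0.0, 3.0), (1.0, 0.0, 3.0)]  # every atom displaced by 3 Å
print(rmsd(design, decoy))
```

A low RMSD means the decoy is nearly the same shape as the design, so decoys clustered near the cutoff differ from the design only slightly.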
This was also clear in our weekly tournament updates (see here and here), where we looked more closely at some players' high-scoring decoys. In most cases, the decoys were largely identical to the design structure, and only small, localized regions of the design were refolded by challengers.
These low-RMSD decoys only inform us about how these designs could partially unfold. The challengers have revealed how certain regions of the protein might unfold locally. However, this doesn't tell us much about the stability of the protein core, or how the protein might be able to refold globally. For this we would want to see many more solutions that are very different from the design structure, and that explore the energy landscape distant from the designed structure.
If we were to repeat this tournament, we would need to find ways to encourage more exploratory challenges. We could imagine applying a bonus that scales with RMSD (as some of you have already suggested), concealing the designed structure from challengers, or using some other means to encourage broader exploration of the energy landscape.
Many of the tournament submissions are excellent designs, and these tournament results are promising, but unfortunately we can't be confident enough to test these in the lab until we have a better picture of the global energy landscape. For that, we'll submit these designs to Rosetta@home for ab initio structure prediction, which is very effective for finding decoys that are globally-refolded.
In summary, the results from the Protein Design Partition Tournament support the idea that our regular design strategy—though not perfect—is still effective for finding designs with favorable energy landscapes.
Foldit players were very effective in this tournament at finding locally-unfolded decoy states. In the weekly updates we saw exactly how this happened, and proposed ways to avoid designs that can unfold locally. Specifically, designers should focus on building a substantial, deeply-integrated core that involves all regions of the protein, and avoid long stretches of completely polar residues. Otherwise, parts of your protein may be able to unfold locally while maintaining a high score.
Partition Puzzles Summary
Below are the final energy landscapes and partition functions for all the Partition Puzzles from Phase Two of the tournament. For an explanation of these plots, see this previous blog post. Thanks to all those who participated in the tournament!
( Posted by bkoep 172 3336 | Wed, 10/17/2018 - 22:48 | 3 comments )
Electron Density results and challenge!
We wanted to give you an update on the latest Cryo-EM puzzles.
In Puzzle 1554 we gave you 5 starting models to work with, and these were the results:
For all the energy plots below:
Each green dot represents a Foldit solution plotted against GDT_TS (where a value closer to 1.0 indicates a model closer to what we believe is the native structure) and Rosetta energy (where a very negative value corresponds to a very high Foldit score).
So the further to the right you are, the closer you are to the correct fold... and the lower you are, the better your Foldit score.
We don't know why we were surprised, as you never cease to amaze us with your incredible results!
This time, however, we really do have a challenge for you... because we've never posted a 221-residue density puzzle before, but these were the results for Puzzle 1579 without any experimental data:
Clearly, the starting models we provided you with were nowhere near the native structure. (They were actually 5 different CASP13 server models; you can read more about this at the very bottom if you like.)
We realize how big 221 residues already is for a Foldit puzzle, which is why we are giving you over 2 weeks to work on it with electron density.
We know this is a big ask, but we also know that if anyone can do this, it's you!
Best of luck, and keep up the great folding!
For those interested in the background details for these puzzles:
As we mentioned in Puzzle 1554, the recent puzzles are part of a large protein complex with multiple subunits, which has recently been the target of some cryo-electron microscopy (cryo-EM) experiments.
These complexes were actually targets in CASP13 this past summer, but the experimentalists were kind enough to provide us with their cryo-EM data once CASP was over.
The first subunit (from Puzzle 1554) was part of this CASP13 target and the recent 221-residue subunit (Puzzle 1579) was part of this CASP13 target. You can see how large these subunits are, which is why we tackled the 149-residue protein first, and trimmed the 229-residue one for the most recent puzzle.
Most interestingly, Puzzle 1579 has no known homologs (or related proteins that have already been solved), which explains why the CASP servers had so much trouble with their predictions.
( Posted by beta_helix 172 15968 | Tue, 10/16/2018 - 17:27 | 6 comments )