Improvements in Foldit designs

Hi all, I wanted to share some exciting results we've gotten from folding predictions of Foldit designs!

As many of you know, after a design puzzle closes we submit a selection of Foldit player designs to the Rosetta@home distributed computing project. Rosetta@home distributes your design sequence to hundreds of thousands of home computers all over the world, so that each computer can calculate a prediction of how that amino acid sequence might fold up. This huge dataset of predicted structures tells us a lot about the weaknesses of a design, making this the most rigorous test available to validate designs before we construct the actual proteins in the lab.

The plots below show Rosetta@home datasets from two Foldit monomer design puzzles. Each red dot represents a different predicted structure, and is positioned according to its RMSD (root-mean-square deviation in Cα position; closer to 0 means closer to the designed structure) and its score (a calculated potential energy; more negative score means a more stable structure). What we like to see is a "funnel" running from the upper-right to the lower-left of each plot. This indicates that predictions very different from the design structure are unstable, and that more similar predictions are more stable.
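For those curious about the math: RMSD is just the square root of the mean squared distance between matching Cα atoms, measured after the two structures have been optimally superposed. Below is a minimal sketch of that calculation (it uses the Kabsch algorithm for the superposition; Rosetta's own implementation differs in its details, so treat this as an illustration rather than the exact code we run):

```python
import numpy as np

def ca_rmsd(P, Q):
    """Cα RMSD between two conformations of the same protein.

    P, Q: (N, 3) arrays of Cα coordinates for the predicted and the
    designed structure, in the same residue order. The structures are
    first optimally superposed (Kabsch algorithm), so the RMSD reflects
    differences in shape rather than overall placement.
    """
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    P = P - P.mean(axis=0)                      # center both coordinate sets
    Q = Q - Q.mean(axis=0)
    U, S, Vt = np.linalg.svd(P.T @ Q)           # SVD of the covariance matrix
    d = np.sign(np.linalg.det(Vt.T @ U.T))      # guard against an improper rotation
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T     # optimal rotation (Kabsch)
    P_aligned = P @ R.T                         # superpose P onto Q
    return np.sqrt(np.mean(np.sum((P_aligned - Q) ** 2, axis=1)))
```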

The top-most plot represents the top-scoring solution from Puzzle 798, which we ran in October of 2013. Note that the closest prediction has an RMSD of >2 Å, meaning that no prediction even got close to the designed structure. Furthermore, the closest predictions were not even the best-scoring; the lowest-energy prediction for this structure has an RMSD of 7 Å, representing an entirely different fold.

The lower three plots represent the three top-scoring* solutions to Puzzle 854, which closed a couple weeks ago. Each of these plots shows that the lowest energy prediction is <1 Å RMSD from the designed structure—an incredible result (the first funnel is stronger than many of the designs we come up with in the Baker Lab). Perhaps even more exciting than the quality of these folding funnels is the fact that they were derived from the best-ranked Foldit solutions, whereas in previous puzzles, scientists in the lab have been able to identify poor-ranking designs that fold better than the top-ranked solutions. These are all very exciting results, and a batch of designs from Puzzle 854 is being fast-tracked to lab production presently.

We appreciate all the effort that our Foldit players have invested in adapting to the recent changes in gameplay, and a big thank you especially goes to all those players who have been helping us troubleshoot and fine-tune the latest design tools. Note that we are still working to slim down the client and optimize these tools to be as efficient as possible. Likewise, there is still plenty of room for improvement on the side of the Foldit players (we'd love to see some more beta-sheet designs** :P). Stay tuned for results from the lab!

*These are the three top-scoring designs that did not come from the same group, since players from within a group often have very similar top-scoring solutions. These designs are all significantly different from one another.
**We're working on this as well. Foldit is inherently biased towards helices, so this will be a bit of an uphill battle.

( Posted by bkoep | Tue, 03/25/2014 - 06:07 | 9 comments )
spvincent
Joined: 12/07/2007
Groups: Contenders
Thanks for the update bkoep:

Thanks for the update bkoep: good to hear that the painful transition to NC appears to be paying off.

Can we get an explanation of what the green dots in the plots represent?

bkoep
Joined: 11/15/2012
Groups: None
Of course

The cluster of green dots in the lower-left corner of each plot represents random perturbations/relaxations of the designed structure. In this case, they mostly serve as a control to verify that there isn't a lower-energy conformation nearby.

Apologies for the images. The notable differences between these structures exist in the protein core, which is difficult to represent with a static image from any angle.

spmm
Joined: 08/05/2010
Groups: Void Crushers
Image orientation

It is difficult to see the differences between the three 854 solutions because the orientations are different.

spmm
Joined: 08/05/2010
Groups: Void Crushers
Ah tx

The core is not that easy to see when you are folding either :) Thanks, bkoep.

Joined: 09/24/2012
Groups: Go Science
A question

Sorry, I don't understand well.

In graph 1, does it mean that Foldit players got a solution that Rosetta@home could not find, but that this is not a very stable solution (because we do not see a beautiful funnel), so this design should be rejected?

For graph 2, you say that "the first funnel is stronger than many of the designs we come up with in the Baker Lab". If scientists can design anything in the lab, why are we useful?
How do you know that the funnel is "stronger"? I suppose not only visually? What is the metric? Is it a kind of algorithm to calculate this?

Now if the predictions got to <1 Å RMSD from the design, does it mean that Rosetta@home was as successful as Foldit players? In this case, are we only useful to prove that NC is working well (but Rosetta@home could find a solution alone)?

Are there situations where Rosetta@home could find a better solution than the design?

Sorry for these dumb questions showing how badly I understood the explanations (but I would be very excited to understand...).

bkoep
Joined: 11/15/2012
Groups: None
@Bruno

In this case, the Rosetta@home calculations are trying to solve a very different problem from the Foldit players. These data should not be interpreted as a comparison between Rosetta and Foldit.

The Foldit players are solving a design problem; they fold up a structure and then mutate residues so that the structure is stable. Rosetta@home is solving a structure prediction problem; it starts with a fixed sequence, and tries to find its most stable structure (imagine a de-novo freestyle Foldit puzzle that starts with your design unfolded into an extended chain). Rosetta@home serves as a kind of test for the design, and can yield a few different results:

Sometimes, Rosetta@home will try to refold a design only to discover a more stable fold with the same sequence (this would look similar to the first plot, but worse—maybe with red dots in the bottom-right corner). This would suggest that if we were to construct this protein in the lab, the polypeptide would fold up into a better alternative structure(s). This is a failed design.

More frequently, Rosetta@home returns a plot like the first one above. In this case, it was not able to find a better alternative structure, but it also was not able to find the intended structure. This is not necessarily a failed design, but it indicates that the design does not follow patterns that are known to promote stable folding. Maybe the protein will fold; probably it won't. If we could afford the time and money, we would test all of these designs.

Finally, when we see plots like the lower three, we know that Rosetta@home was able to find a stable structure for the sequence, and it happens to match the designed structure. At this point, all the data we have indicates that this protein will fold up into the predicted structure.

To address some of your other questions: We do have metrics for evaluating the "funnel-ness" of Rosetta@home data, but in this case the data is loud and clear. And there are many reasons why it is useful that Foldit players can come up with strong-funnel designs. The biggest one is precisely that Foldit players are NOT in the lab—successful Foldit designs would demonstrate that you don't have to have a PhD in structural biology to design a protein. On top of that, these initial designs are simple stepping stones to more complex (and applicable) design problems.
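As a rough illustration of what a "funnel-ness" metric can look like (this is a simplified sketch in the spirit of the Boltzmann-weighted metrics used in the Rosetta community, not the exact formula we use, and the lam/kT parameter values below are just placeholders):

```python
import numpy as np

def funnel_quality(rmsd, score, lam=2.0, kT=1.0):
    """Boltzmann-weighted fraction of predictions that land near the design.

    rmsd:  array of RMSDs to the designed structure (Å)
    score: array of Rosetta scores (lower = more stable)
    lam:   how close a prediction must be to count as "near-native" (Å)
    kT:    softness of the Boltzmann weighting (score units)

    Returns a value between 0 and 1. Close to 1 means the lowest-energy
    predictions cluster at low RMSD (a good funnel); close to 0 means the
    best-scoring predictions are far from the designed structure.
    """
    rmsd = np.asarray(rmsd, dtype=float)
    score = np.asarray(score, dtype=float)
    w = np.exp(-(score - score.min()) / kT)   # Boltzmann weight of each prediction
    near = np.exp(-(rmsd / lam) ** 2)         # closeness to the design, 0..1
    return float(np.sum(near * w) / np.sum(w))
```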

Joined: 09/24/2012
Groups: Go Science
Thanks for your answers!

One last question: if I understand well, you can build an AA primary sequence in the lab (but not its SS), and then it folds on its own in solution? In the first case, you would have at least a mixture of 2 types of proteins (and even more, because these are unstable)? In case 2, is there a chance that you get a stable solution of one pure protein with the designed SS?

If this is true, I suppose you are able to synthesize a native protein knowing its primary structure? But you need its SS to be able to understand what this protein can do?

=====

I'd like to see a small graphic like those funnels, when available, next to the score of the solutions you tested with Rosetta@home. Just for self-satisfaction, to see that sometimes our result "passed the test", or whatever other kind of reward. For example, it would be exciting for low-scoring designers to see that their intuition (shared with scientists or not) was quite good.

bkoep
Joined: 11/15/2012
Groups: None
@Bruno

All of that is correct, more or less. Realistically, proteins in solution are dynamic molecules that are always shifting conformation. If there is not a single favored, low-energy conformation for a polypeptide, that peptide will usually exist as an unstructured "random coil" and will either aggregate or be degraded in the cell. If there is one conformation that is much lower in energy than all other conformations, the polypeptide will naturally fold up into that conformation.
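As a toy illustration of why that energy gap matters (the numbers here are made up for the example, not real measurements), a simple two-state Boltzmann calculation shows how quickly a modest gap translates into an almost fully folded population:

```python
import math

def folded_fraction(dG, kT=0.6):
    """Fraction of molecules in the folded state for a toy two-state model.

    dG: energy of the folded state minus the unfolded state (kcal/mol);
        negative means folding is favorable.
    kT: thermal energy at room temperature, roughly 0.6 kcal/mol.
    """
    return 1.0 / (1.0 + math.exp(dG / kT))

for dG in (0.0, -1.0, -3.0, -5.0):
    print(f"dG = {dG:5.1f} kcal/mol -> {folded_fraction(dG):.1%} folded")
```

In this toy model, an energy gap of only a few kcal/mol in favor of the designed conformation is already enough to push the population almost entirely into the folded state.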

For native proteins, we generally know the amino acid sequence (primary structure) and can synthesize the protein in the lab. However, we don't know the protein's conformation (secondary/tertiary structure), which is important for understanding the protein's function.

We've been thinking a lot recently about ways to give feedback to players with good designs. We can't post all of the Rosetta@home results we collect, but maybe we could share some of the most encouraging funnels.

egran48
Joined: 03/31/2014
Groups: Go Science
Thank you

Thank you for taking the time to post and answer these questions. It is very informative and gives us an incentive to work towards better outcomes. Many of the folders are in this for the science, not the competitive scoring and rankings. Any 'science' information is a great incentive. Thanks to everyone involved.
