Foldit Blog
This is the place where we will describe some of the outcomes and results of your folding work, provide a glimpse of future challenges and developments, and in general give you a better sense of where we are and where foldit hopes to go in the future.

The energy landscape optimization paper

This blog post is a walk-through for an upcoming paper, showing how researchers at UW and Harvard developed a new method for protein design. This research relied heavily on the work of Foldit players, who will be listed as authors on the paper. (If you have played a Monomer Design puzzle in Foldit, you can opt in to the author list here).

The paper has not yet gone through peer review, but a pre-print draft of the paper is already available online. The paper is written for highly specialized academics with a scientific background, but we think its content can be appreciated by anyone with an interest in protein folding and design.

Below we discuss some of the background for this research, take a look at the figures, and review the main points of the paper.

What is an energy landscape?

The title of the paper is Protein sequence design by explicit energy landscape optimization. Before we jump in, we will need to make sure we understand the idea of an energy landscape. We’ve discussed energy landscapes previously, so let’s recap:

There are a lot of possible ways that an unfolded protein might fold up (think of all the different knots you can tie with a shoestring). Each of these possible folds has some amount of free energy, which depends on the amount of clashing, voids, H-bonding, etc. The lower the free energy, the more stable the fold.

In an energy landscape, we like to imagine all of these possibilities laid out on a grid, like a map of possible folds. Then we imagine that the depth at each point of the map corresponds to the energy of each fold. There will be deep valleys and wells where we have stable, low energy folds; and there will be hills and peaks where we have unstable, high energy folds. This map is our energy landscape.

An unfolded protein will naturally fold into its most stable structure. This is the structure with lowest free energy (the deepest point in the landscape).

Every protein sequence has a different energy landscape. Most random protein sequences have a featureless landscape with many, many shallow wells of similar depth. These sequences will not have a strong preference for any particular fold, and they will be poorly folded in real life.

On the other hand, a well-designed protein sequence will have an energy landscape with a single deep well. This sequence has an overwhelming preference for the low energy fold at this well, and the sequence will be well-folded in real life.
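The toy sketch below makes the landscape picture concrete. It treats a landscape as a handful of named conformations, each with a free energy. It then uses standard Boltzmann weighting to estimate what fraction of molecules sits in each well at equilibrium. Boltzmann weighting is textbook thermodynamics, not something specific to the paper. The conformation names, the energies, and the kT value are assumptions chosen for illustration; real landscapes contain vastly more conformations.

```python
import math

# Assumption: kT at roughly room temperature (298 K), in kcal/mol.
KT = 0.593


def boltzmann_populations(energies, kt=KT):
    """Equilibrium fraction of molecules in each conformation (Boltzmann weighting).

    energies: {conformation_name: free_energy_in_kcal_per_mol}
    """
    e_min = min(energies.values())  # shift by the minimum so exp() cannot overflow
    weights = {name: math.exp(-(e - e_min) / kt) for name, e in energies.items()}
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


# Toy landscapes with invented energies, four conformations each.
# Well-designed sequence: one deep well at the target fold.
funnel = {"target": -10.0, "alt1": -4.0, "alt2": -3.5, "alt3": -3.0}
# Random-like sequence: several shallow wells of similar depth.
rugged = {"target": -4.0, "alt1": -3.9, "alt2": -4.1, "alt3": -3.8}

for label, landscape in [("funnel", funnel), ("rugged", rugged)]:
    pops = boltzmann_populations(landscape)
    deepest = min(landscape, key=landscape.get)  # lowest free energy = most stable
    print(f"{label}: deepest well = {deepest}, target population = {pops['target']:.2f}")
```

In the funnel landscape, nearly every molecule ends up in the target well. In the rugged landscape, the population spreads across all four wells, and the target is not even the deepest one.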

Normally, we try to approximate the energy landscape of a sequence by folding the sequence into thousands and thousands of different structures, and calculating the energy of each one (details here). Even though this only gives us a partial view of the energy landscape, it is computationally intensive, taking tens of thousands of CPU hours. (A big thanks goes to Rosetta@home volunteers for providing this CPU power!)

Because energy landscapes are expensive to compute, most protein design methods focus on just the design structure, and ignore the rest of the landscape. We only try to reduce the free energy of our design, and we cross our fingers that the energy landscape has no other energy wells. This is sometimes effective, but it can lead to an energy landscape that has multiple low-energy wells (which means the protein could fold into an unintended structure).
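The sampling approach can be caricatured in a few lines. The sketch assumes that each sampled structure is summarized by two numbers: its RMSD to the design and its energy. It then checks for the failure mode described above, namely a structure far from the design that scores about as well as the design itself. The `Sample` class, the function name, and all cutoffs are hypothetical. The real calculations, run on Rosetta@home, are far larger and more involved.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """One sampled structure from a landscape calculation (illustrative fields)."""
    rmsd: float    # distance from the design model, in angstroms
    energy: float  # energy score of the structure (lower = more stable)


def has_competing_well(samples, near_cutoff=2.0, far_cutoff=5.0, margin=1.0):
    """Flag a landscape whose low-energy structures are not all near the design.

    Returns True if some structure at least far_cutoff angstroms from the design
    scores within `margin` of the best near-design structure (or better).
    All three cutoffs are hypothetical values chosen for illustration.
    """
    near = [s.energy for s in samples if s.rmsd <= near_cutoff]
    far = [s.energy for s in samples if s.rmsd >= far_cutoff]
    if not near:
        return True   # sampling never reached the design fold: treat as unfavorable
    if not far:
        return False  # nothing sampled away from the design to compete with it
    return min(far) <= min(near) + margin


single_well = [Sample(0.8, -50.0), Sample(1.5, -48.0), Sample(7.0, -30.0), Sample(9.0, -28.0)]
two_wells = [Sample(0.8, -50.0), Sample(1.5, -48.0), Sample(7.0, -49.5), Sample(9.0, -28.0)]
print(has_competing_well(single_well), has_competing_well(two_wells))
```

Optimizing only the design's own energy lowers `min(near)`. That step says nothing about whether some structure in `far` is also low, and this is exactly the blind spot described above.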

Ideally, we would like a design method that considers the entire energy landscape, but without requiring thousands and thousands of energy calculations.


Figure 1. Energy landscapes and trRosetta. (A) An energy landscape visualizes the energy (depth) across all different folds, or "conformations." Suppose that we want to design a protein with fold P. Most design methods optimize the free energy of fold P and arrive at sequence B (green). Since these methods are blind to the rest of the energy landscape, sequence B might have a landscape with alternative energy wells. A better design method would consider the entire energy landscape to produce sequence A (blue), which has a single low-energy well. (B) The trRosetta neural network takes an input sequence and makes predictions about how the residues will be oriented in the folded structure. This new work shows that trRosetta predictions serve as a good proxy for the energy landscape. The neural network can optimize the sequence to improve the match between the predictions and the desired structure, molding the landscape to favor our desired structure.

Neural networks and sequence likelihood

trRosetta (transform-restrained Rosetta) is a machine learning program developed after the breakthrough AlphaFold program (details in this blog post). The input for trRosetta is a 1D protein sequence, and the output is the predicted distance and orientation between every pair of residues in the 3D folded protein structure.

Previously, researchers at the Baker Lab showed that these distance and orientation predictions are good for protein structure prediction problems. The orientations help us generate a complete 3D model of the folded protein, which accurately shows how the protein will actually fold.

In the new paper, researchers turn trRosetta on its head to evaluate and design proteins. Rather than use the predictions to generate a structure for the input sequence, they compare the distance and orientation predictions to the intended structure, and calculate the sequence likelihood for that structure.

This sequence likelihood score tells us whether trRosetta thinks the design sequence is a good match for the design structure. If the intended distances and orientations of the design structure are a close match to the trRosetta predictions, then the sequence likelihood for that structure will be high. If the design structure is a poor match to the predictions, the sequence likelihood will be low.
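Here is a minimal sketch of the idea. It assumes that trRosetta-style output is available as a probability distribution over distance bins for each residue pair. The design is then scored by the average log-probability that the predictions assign to the distances actually present in the design structure. The bin edges, the helper names, and this exact scoring formula are assumptions for illustration. The paper's score also uses the orientation predictions and may be defined differently.

```python
import bisect
import math

# Hypothetical binning: 2-angstrom bins from 2 to 20 angstroms, plus a final ">= 20" bin.
# (trRosetta uses its own binning and also predicts orientations; both are omitted here.)
BIN_EDGES = [2.0 + 2.0 * k for k in range(10)]  # 2, 4, ..., 20
NUM_BINS = len(BIN_EDGES)  # 9 interior bins + 1 open-ended bin


def bin_index(distance):
    """Return the bin holding `distance`; below 2 A goes in bin 0, >= 20 A in the last bin."""
    return max(0, bisect.bisect_right(BIN_EDGES, distance) - 1)


def sequence_log_likelihood(pred, target, eps=1e-9):
    """Mean log-probability that the predictions assign to the design's pairwise distances.

    pred:   {(i, j): [p_0, ..., p_last]}  predicted distance distribution for each residue pair
    target: {(i, j): distance}            distances measured in the intended design structure
    """
    total = 0.0
    for pair, distance in target.items():
        total += math.log(pred[pair][bin_index(distance)] + eps)  # eps guards against log(0)
    return total / len(target)


def peaked(peaks):
    """Build a distribution with the given mass on some bins and the remainder spread evenly."""
    rest = (1.0 - sum(peaks.values())) / (NUM_BINS - len(peaks))
    return [peaks.get(k, rest) for k in range(NUM_BINS)]


design = {(0, 5): 5.0}   # the design puts residues 0 and 5 about 5 A apart
decoy = {(0, 5): 15.0}   # a hypothetical decoy fold puts them about 15 A apart

confident = {(0, 5): peaked({bin_index(5.0): 0.9})}
ambivalent = {(0, 5): peaked({bin_index(5.0): 0.45, bin_index(15.0): 0.45})}

print(sequence_log_likelihood(confident, design))   # higher: predictions agree with the design
print(sequence_log_likelihood(ambivalent, design))  # lower: mass split between two folds
print(sequence_log_likelihood(ambivalent, decoy))   # equal to the previous line: design and decoy tie
```

The "ambivalent" case mirrors the bimodal predictions discussed in the next section. When the probability mass is split between two folds, neither fold can earn a high likelihood.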

Predicting energy landscapes

The researchers used sequence likelihood to show that trRosetta can predict useful information about the entire energy landscape of a protein sequence -- not just information about the preferred structure.

To show this, they used a dataset of energy landscapes for >4000 Foldit designs, which have been accumulated from several years of Foldit design puzzles. This dataset represents about 100 million CPU hours of energy landscape calculations! They divided this dataset into favorable and unfavorable energy landscapes.

First, they calculated the sequence likelihood just for the design structure. They found that the sequence likelihood of a design is a good predictor of whether a design has a favorable or unfavorable energy landscape. Importantly, trRosetta sequence likelihood was a much better predictor than just the Rosetta energy (or Foldit score) of the design. Since trRosetta takes just a couple minutes to run, this could cut down the need to run expensive landscape calculations!
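The area under the ROC curve (AUC) is one standard way to put a number on how much two histograms overlap, as in Figure 2A. AUC is the probability that a randomly chosen favorable design outscores a randomly chosen unfavorable one. We chose this generic statistic for illustration; the paper may quantify predictive power differently. The example values below are invented and are not data from the paper.

```python
def separation_auc(favorable, unfavorable):
    """Probability that a random favorable design outscores a random unfavorable one.

    Ties count as half. 1.0 means the two histograms do not overlap at all;
    0.5 means the score cannot tell the groups apart. Higher scores are assumed better.
    """
    wins = 0.0
    for f in favorable:
        for u in unfavorable:
            if f > u:
                wins += 1.0
            elif f == u:
                wins += 0.5
    return wins / (len(favorable) * len(unfavorable))


# Invented example values, only to show usage (not data from the paper).
likelihood_fav, likelihood_unfav = [-0.4, -0.5, -0.6], [-0.9, -1.0, -0.55]
energy_fav, energy_unfav = [-120.0, -95.0, -110.0], [-115.0, -100.0, -90.0]

# Lower energy is better, so negate energies before comparing.
print(separation_auc(likelihood_fav, likelihood_unfav))
print(separation_auc([-e for e in energy_fav], [-e for e in energy_unfav]))
```

A predictor with heavily overlapping histograms, like Rosetta energy in Figure 2A, sits closer to 0.5.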

Next, the researchers calculated sequence likelihoods for many different structures across the energy landscape of each design. They found that these likelihoods accurately reflect the shape of the landscapes.

For example, when they looked at a favorable energy landscape with a single energy well, they saw that models within the well had a high sequence likelihood, and models outside the well had low likelihood.

They also looked closely at a few special cases, where an energy landscape shows two competing energy wells. One of these wells represents the intended design fold, and the other well represents a decoy fold that is equally stable. We expect that a protein sequence with this kind of energy landscape is equally probable to fold into the design fold or the decoy fold. This is correctly reflected in the sequence likelihood scores, which are reduced for the design fold, and are comparable between design and decoy folds.


Figure 2. trRosetta predicts information about energy landscapes. (A) Histogram of sequence likelihood (left) and Rosetta energy (right) for 4200 Foldit designs. The distribution of favorable landscapes is shown in blue, and unfavorable landscapes in gray. There is significant overlap in the distributions of Rosetta energy, showing that Rosetta energy is a poor predictor of the whole energy landscape. Sequence likelihood is a better predictor, with less overlap between blue and gray distributions. (B) Energy landscape plots for Foldit designs, with color gradient showing the trRosetta sequence likelihood of models across the landscape. At the top, a landscape with a single well has very high sequence likelihood within the well. Below, landscapes with multiple wells have weaker, more dispersed likelihood. Cartoon illustrations show the design and decoy folds X and Y. On the right, example bimodal distributions show the “ambivalency” of trRosetta distance predictions when a landscape has two energy wells.

This is all well and good. We’ve seen that trRosetta is really useful for predicting theoretical energy landscapes, and can help us cut down on computational work. But does it actually reflect physical reality? A more stringent challenge would compare trRosetta against real experimental data from lab testing.

Last year we published the experimental testing results for 145 Foldit player designs. When the researchers checked this data, they found that trRosetta sequence likelihood was a good predictor of success in the lab!


Figure 3A-B. trRosetta predicts experimental testing results. (A) When we look at the testing results for 30,000 IPD-designed proteins, we see that trRosetta sequence likelihood correlates well with folding stability (as approximated by protease resistance). By contrast, Rosetta energy of the design is poorly correlated with this stability measure. (B) Histogram of sequence likelihood (left) and Rosetta energy (right) for 145 experimentally-tested Foldit designs. Successful designs are in blue, and failures in gray. Sequence likelihood is a better predictor than energy alone, with less overlap between the success and failure distributions.
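Panel (A) is a claim about correlation. Stability scores and likelihoods need not be linearly related, so a rank correlation such as Spearman's is one reasonable way to measure "correlates well." The sketch below computes it from scratch; in practice, `scipy.stats.spearmanr` does the same job. The choice of statistic is our assumption, because the caption does not say which measure the paper reports.

```python
def average_ranks(values):
    """1-based ranks; tied values share the average of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: the Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

Because only ranks matter, any monotonic relationship between likelihood and stability scores a perfect 1.0, even if it is strongly curved.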

Optimizing the energy landscape

Finally, the researchers put trRosetta to the test, to see if it could actually redesign proteins to have favorable energy landscapes.

From the 4000 Foldit designs, they selected a representative set of 200 models and used trRosetta to redesign their sequences. Remember that, in Foldit, the original designs were made to optimize the energy (the Foldit score) of just the target fold. Now, trRosetta is trying to optimize the entire energy landscape, which encompasses the energies of all possible folds.

The results were surprising: although trRosetta was good for eliminating decoys and coarsely sculpting the energy landscape, the resulting landscapes lacked a sharp, deep energy well that we like to see for a stable, well-folded protein design. Instead, a combination of trRosetta (optimizing the landscape) and traditional design (optimizing the design energy) yielded the best energy landscapes, with a single deep energy well.
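One generic way to combine the two approaches is a simulated-annealing search over sequence space. Its objective adds a landscape term (higher is better) and subtracts a design-energy term (lower is better). This is an illustration, not the paper's method. The weighted-sum objective, the annealing schedule, and both toy scoring functions are hypothetical stand-ins for trRosetta sequence likelihood and Rosetta energy.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
HYDROPHOBIC = set("AILMFVW")


def anneal(seq, score, steps=2000, t_start=1.0, t_end=0.01, seed=0):
    """Maximize score(seq) with random point mutations and Metropolis acceptance."""
    rng = random.Random(seed)
    current, current_score = seq, score(seq)
    best, best_score = current, current_score
    for step in range(steps):
        # Geometric cooling from t_start down to t_end.
        t = t_start * (t_end / t_start) ** (step / max(1, steps - 1))
        pos = rng.randrange(len(current))
        candidate = current[:pos] + rng.choice(AMINO_ACIDS) + current[pos + 1:]
        cand_score = score(candidate)
        delta = cand_score - current_score
        # Always accept improvements; accept worse moves with probability exp(delta / t).
        if delta >= 0 or rng.random() < math.exp(delta / t):
            current, current_score = candidate, cand_score
            if current_score > best_score:
                best, best_score = current, current_score
    return best, best_score


def hybrid(landscape_score, design_energy, w_landscape=1.0, w_energy=1.0):
    """Combined objective: reward the landscape term, penalize the design energy."""
    return lambda seq: w_landscape * landscape_score(seq) - w_energy * design_energy(seq)


# Hypothetical stand-ins, NOT trRosetta or Rosetta:
PATTERN = "HPPHHPPHHPPH"  # toy target pattern of hydrophobic (H) / polar (P) positions


def toy_landscape(seq):
    """Fraction of positions whose hydrophobic/polar class matches PATTERN."""
    matches = sum((aa in HYDROPHOBIC) == (p == "H") for aa, p in zip(seq, PATTERN))
    return matches / len(PATTERN)


def toy_energy(seq):
    """Toy penalty: count glycines and prolines (both are known helix breakers)."""
    return sum(aa in "GP" for aa in seq)


objective = hybrid(toy_landscape, toy_energy)
start = "G" * len(PATTERN)
best, best_score = anneal(start, objective)
print(start, objective(start))
print(best, best_score)
```

Setting `w_energy=0` or `w_landscape=0` turns the same loop into a single-objective search. Comparing those runs is the spirit of the trRosetta-only, Rosetta-only, and hybrid comparison.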


Figure 3C-D. Redesigning proteins with trRosetta energy landscape optimization. (C) Example energy landscapes for two redesigned Foldit proteins. Redesign with trRosetta alone produces a landscape with a single shallow well, and Rosetta lowers the energy without favoring a single energy well. Combining both approaches gives a favorable energy landscape with a single deep energy well. (D) The quality of energy landscapes across all 200 redesigned proteins. The colored lines show how many redesigns (y-axis) meet a threshold for energy landscape quality (x-axis; increasingly stringent threshold). Traditional Rosetta redesign (green) is susceptible to low energy decoys, and less than 50% of redesigns pass the lowest threshold; however, Rosetta redesigns that do pass have very deep energy wells and also tend to pass higher thresholds. trRosetta (purple) improves landscapes that fail the low-quality threshold, but cannot achieve deep energy wells that meet a high-quality threshold. A hybrid approach, in magenta, achieves the best of both worlds.
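In panel (D), each curve counts how many redesigns clear a quality bar as the bar is raised. The blog does not define the quality metric, so the sketch treats it as an abstract score where higher means a better landscape. The scores are invented to reproduce the qualitative shapes described in the caption; they are not the paper's data.

```python
def pass_curve(qualities, thresholds):
    """For each threshold, count the designs whose landscape quality meets it."""
    return [sum(q >= t for q in qualities) for t in thresholds]


thresholds = [0.2, 0.4, 0.6, 0.8]
# Invented quality scores for five redesigns per method (higher = better landscape).
methods = {
    "rosetta": [0.05, 0.1, 0.15, 0.85, 0.9],   # most fail, but survivors are excellent
    "trrosetta": [0.45, 0.5, 0.5, 0.55, 0.6],  # all pass low bars, none pass the highest
    "hybrid": [0.3, 0.65, 0.8, 0.85, 0.9],     # passes both low and high bars
}
for name, qualities in methods.items():
    print(name, pass_curve(qualities, thresholds))
```

A curve that stays flat as the threshold rises, like "rosetta" here, means the designs that pass at all have very deep wells.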

What does this mean for Foldit?

In all Foldit design puzzles so far, we’ve seen that players are very good at optimizing the score of their designs. But the real challenge of protein design is how to account for the rest of the energy landscape, and we still haven’t found a good way to do this in Foldit.

Some players probably remember the 2018 Foldit Partition Tournament, which challenged players to explore the energy landscape of each other’s designs. That showed some promise, but was still time-consuming and low-throughput (we generated only 20 landscapes in 6 weeks).

trRosetta offers a fast alternative for predicting energy landscapes, and we may be able to combine it with normal Foldit scoring. trRosetta might be able to report the sequence likelihood of a Foldit solution, and even suggest mutations to improve its energy landscape.

One disadvantage with machine learning programs like trRosetta is that they are “opaque” and sometimes difficult to make sense of. We can’t really say why trRosetta makes certain suggestions, or ask which design features are causing problems. That could make it difficult to reconcile trRosetta suggestions with Foldit score components like clashing and H-bonding.

Another shortcoming of trRosetta is that it cannot suggest how to refold the protein backbone to improve an energy landscape. Some protein backbones are inherently more difficult to design than others (or even impossible). Finding designable backbones is an important aspect of protein design, and we think that’s a particular strength for Foldit players.

Still, trRosetta is clearly a useful tool for protein design, and we’ll be looking at ways to incorporate trRosetta into Foldit. Maybe players could find new and unexpected ways to use feedback from neural networks!

(Posted by bkoep | Fri, 07/31/2020 - 21:00 | 7 comments)

Newsletter July 24: A Good Week for Go Science

Hey folders!

Dev Josh here with your weekly Foldit update. Congratulations to Go Science for topping all three puzzles this week! Go Science has been an open and active group since 2010. One of the best ways to learn and improve in Foldit is to join a group.

If Go Science isn't your style, try the hopeful and determined Anthropic Dreams, the fun and light Gargleblasters, or the dedicated Contenders.

Solutions from This Week's Puzzles

(Disclaimer: This is not scientific feedback; these solutions are not officially endorsed by the Foldit scientists.)

Puzzle 1863: Refinement R1043

I've heard this puzzle was crashing pretty frequently. Thanks for your patience, everyone; the devs are hard at work trying to fix these issues!

Puzzle 1864: Symmetric Trimer Design: Limited Interface

To master this puzzle, you needed to limit how big your binding interface was. Notice how the top scores rotated their helical bundles to limit their attachments!

Puzzle 1865: Coronavirus Anti-inflammatory Design 8

Bkoep said there were 15 unsolvable BUNS, but some of the top solutions got them down to 11! Great job satisfying those BUNS, everyone. Keep it up!

Want to know more about why we're designing binders from scratch? Check out this forum thread for details on why we're not just using the ACE2 receptor design.

Recipe of the Week

This week's recipe is new but with great potential:
mwm64's UnBun is designed to help you reduce BUNS. This recipe only works on puzzles with the BUNS objective, and I haven't personally tried it out much, but I've heard a few folks are trying it. Plus, if you're looking to get involved with recipe evolving, this simple recipe could be a good way to get some practice with Lua. Given how important the BUNS objective is, we're going to need more recipes like this! So thanks mwm64 for making the first de-BUN-ifier!

Player of the Week

A quick shout-out this week to malphis, a friendly newcomer who joined a couple of months ago and has been really active in chat. Malphis has also been super helpful submitting bug reports to help the devs track down issues. Thanks!

Art of the Week

Looking for some more protein beauty? Check out this beautiful proteins blog! It's got a ton of real proteins that are naturally amazingly beautiful.

Today’s Master Folding Tips

Beginner: Before trying to wiggle your designed protein into the perfect shape, give it a mutate first! This will help the protein pack together better and give you a cleaner structure to work with. You can also mutate by hand: for example, although all of your amino acids start as isoleucine, it's actually better to set your loops to asparagine to start with.

Intermediate: Have you learned how to use the Rama map yet? We're working on a few new guides that should make it easier to learn, but in the meantime Susume has two guides on how to use the Rama map to fix un-ideal loops and even copy a loop.

Expert: Are you planning your design before you make it? Before you start drafting, spend a few minutes thinking about what your design will look like. How long will each helix and sheet be? Will you try to make pi stacks? What part of the protein will bind at the interface, and how will that give it shape complementarity? Once you're ready, use Loci's AA Edit and SS Edit to enter your design and give it a quick early/midgame rinse. Then hand it off to a novice member of your group to evolve and try another design!

Have a tip to share with the community? Reply with your wisdom, or post on our Forums!

Until next time, happy folding!

(Posted by agcohn821 | Wed, 07/29/2020 - 18:53 | 0 comments)

Foldit Newsletter July 17: Bonjour Encore Triple Hélice

Hey folders!

Dev Josh here with your weekly Foldit update.

Solutions from This Week's Puzzles

(Disclaimer: This is not scientific feedback; these solutions are not officially endorsed by the Foldit scientists.)

Puzzle 1861: Symmetric Trimer Design: Buried Unsats

Triple helices are here to stay: look how clean and neat these bundles are! Great job, silent gene and Spvincent!

This solution took a less common approach to the triple helix meta. I'll let you decide for yourself whether you think it scored well or not. What do you think of it? Let us know in the Discord!

Puzzle 1862: Coronavirus Round 13

An extra special congratulations goes to clark92 for being top rank for this puzzle! This up-and-coming folder only started folding at the end of February, and already they've taken the leaderboards by storm!

These solutions come from some of our beginner folders! Can you tell what they could do better?

As a reminder, here are some helpful tips from bkoep on designing a good binder!

Want to get your top solution featured in the weekly newsletter? Click the "Share with Scientists" button in the "Open/Share Solution" menu and your solution might get featured! Don't forget to fill out our username sharing form if you'd like your username to be shown with your solutions!

Recipe of the Week

Not sure what recipes are good? Check out this all-in-one recipe: Constructor by Grom!
This mini-cookbook contains 19 different recipes all packed in one. Check it out for some inspiration this week!

Player of the Week

Big thanks to nspc this week for putting out two new French tutorial videos on how to get started with design puzzles and prediction puzzles.

If you're still on the intro puzzles, nspc also has a video on beating Hydrophobic Disaster.

I think I speak for everyone when I say merci beaucoup! Nspc (pc on Discord) is a beginner folder who has been learning fast by being really active in the chat. Say hi next time you see them around!

Art of the Week

Here's some art from 1861: a cool-looking triangle and a crazy ball of... I don't even know what... Thanks for sharing!

Today’s Master Folding Tips

Beginner:
Despite how common they are, I really recommend trying a helix bundle like the ones you've seen from the top-scoring solutions! Helices are easier to make than loops or sheets, so practicing on helix bundles is a great way to get a higher rank and practice the basics before trying something tricky and advanced like long loops or a sheet structure.

Intermediate:
Are you paying attention to which structures your AAs prefer to be in? It's not a hard-and-fast rule, but check out the wiki for AA structure preferences. I find this especially helpful when getting started: I mutate my isoleucines into something more suitable for the structure I'm designing, like asparagines for loops, valines for sheets, and MALEK for helices.

Expert:
How many structural motifs can you name? Most of you know pi stacking, some of you even know about beta hairpins. But do you know about ST turns, Greek keys, and Omega loops? What about sequence motifs?

Having these concepts in your toolkit will give you more conceptual legos from natural proteins to think about when designing. There's plenty of research out there on common patterns, and if you're looking for expert tips, then you're ready to dig into real literature. Good luck, and let us know what you find on the Discord!

Want to give your group a shoutout in the next newsletter? Reply with a blurb about what your group is and why new players should join, and your group might get featured in the next newsletter!

Until next time! Happy folding!

(Posted by agcohn821 | Fri, 07/24/2020 - 03:47 | 0 comments)

Newsletter July 10: Triple-Quad Helices and Borromean Rings

(This post was originally sent out on July 10 to our mailing list. You can sign up for the mailing list here to receive weekly updates about Foldit, including tips and tricks and the top-scoring solutions to the week's puzzles. Don't forget to join our Discord as well to stay in the chat even when you're not folding!)

Hey folders!

Dev Josh here with your weekly Foldit update.

Solutions from This Week's Puzzles

(Disclaimer: This is not scientific feedback; these solutions are not officially endorsed by the Foldit scientists.)

Puzzle 1858: Symmetric Trimer Design

Personally, I went with a 4-helix design for this puzzle, and it seems like that's what a lot of the highest scoring solutions did. But there were also a couple of 3-helix designs, and even some sheets!

Puzzle 1860: Refinement R1040

The highest scoring solutions for this puzzle kept two medium-sized sheets lined up and folded the rest into short helices around a core.

Compare this to some of the intermediate solutions. Although these folds are okay, they had some minor problems: some loose helices and poor scoring ends.

What was the trickiest part for you about this puzzle? Let's talk about it in Discord!

Recipe of the Week

This week's recipe has been described by Phyx as "The Best Recipe of 2014": Wisky's Repeating Rebuild All!

Let this late-game recipe run for 3-4 hours and it will do some rebuilding magic on your pose.

Player of the Week

I want to honor LociOiling for consistently being the #1 contributor to our wiki! This week he created the pages for Reaction Design Puzzles and Camera Controls. If you've ever read a wiki page that was made in the last few years, chances are Loci wrote it. Give him your thanks in chat next time you use the wiki!

Art of the Week

This week's most beautiful fold comes from Formula350 for his Borromean rings! This would never fold up in real life, but wow, is it pretty!

Today’s Master Folding Tips

Beginner: Don't be afraid to reassign your secondary structures to different sheets and helices! While this might seem like you're "changing the puzzle," you're really just making a suggestion for what shape the protein should take, and this suggestion can help your other tools better serve you. Try a bunch of different secondary structure assignments and use Ideal SS on them afterward, then see how this new arrangement might be easier or harder to fold. Play around with it, Foldit is about experimenting!

Intermediate: If you haven't learned to use Backbone Pins yet, I highly recommend it. This tool, hidden away in the view options, gives you more control over wiggling than CI alone. A locked pin is similar to a ZLB: it keeps your wiggle locked to that spot while letting everything else move more.

Expert: Although it might seem like more hbonds means better binding, hbonds at the interface don't actually add to the strength of the bind, since they aren't much stronger than these atoms simply binding to water. What use are interface hbonds then? Their purpose is eliminating BUNS. The real strength of your binding comes from hydrophobic interactions, shown in the Hiding and Packing subscores, and your hbond network gives the bind its specificity.

Want to recommend a recipe of the week or have your solution featured in the next newsletter? Send us your cookbooks and screenshots, we'd love to see what you're up to!

Until next time, happy folding!

(Posted by joshmiller | Tue, 07/14/2020 - 20:19 | 0 comments)

Newsletter July 3: Initial Reactions

(This post was originally sent out on July 3 to our mailing list. You can sign up for the mailing list here to receive weekly updates about Foldit, including tips and tricks and the top-scoring solutions to the week's puzzles. Don't forget to join our Discord as well to stay in the chat even when you're not folding!)

Hey folders!

Dev Josh here with your weekly Foldit update.

This week we saw the introduction of the Reaction Design tool. The devs are working hard on polishing it up and making it more usable! As always, thanks for your feedback and bug reports. You can submit more feedback here.

Top Results from Puzzle 1856: Coronavirus Round 12

In this puzzle, I accidentally evo'ed on a broken developer build and got the top score. Whoops, sorry about that!
Here are some of the solutions at the top of the leaderboards. [A note from our scientists: the top of the leaderboards doesn't always mean the most scientifically useful. These highlights are not scientific feedback and are not officially endorsed as scientifically valid designs by the Foldit team.]

Join the mailing list to see what others are folding!

Recipe of the Week

This week's recipe is an oldie but a goodie from drjr. The recipe is called Reset, and it does what it says on the tin: reset to the best score, unfreeze the protein, remove all your bands, and set the CI to 1. A simple recipe, but a handy quality of life tool for when you just need to backtrack a little.

Player of the Week

Quick shoutout to argyrw for always being a friendly voice in chat! Say hi to her in global or veteran chat.

Today’s Master Folding Tips

Beginner: Are you still using Pull to draft your protein in the early game? Try making cutpoints and moving pieces around with the Move tool, it's so much easier! Don't forget to disable cutpoint bands in the Behavior tab, or they'll all come together again when you wiggle.

Intermediate: It can be really tempting mid-game to just switch to running recipes. But give some time to carefully inspect every acceptor and donor (the red and blue dots) to see what hydrogen bonds you can form, and manually mutate as needed. Not only will this lower your BUNS, but it'll help form a strong hbond network. The scientists love this, and your rank will too!

Expert: If you haven't already, read bkoep's blog on binder design metrics. DDG, SASA, and SC are going to become really important soon since we're looking to add objectives for them. So understanding and practicing these principles now can help you get a head start on the competition! Use the protein design sandbox to try out some ideas.

Have a tip to share or a recipe to recommend? Reply with your suggestions or make a wiki page for your ideas! Reaction Design doesn't have a page yet, so if you understand this tool, help out your community by writing about it! (Since writing this post, LociOiling has graciously created the page for Reaction Design puzzles.)

Until next time, happy folding!

(Posted by joshmiller | Mon, 07/06/2020 - 18:11 | 3 comments)
Developed by: UW Center for Game Science, UW Institute for Protein Design, Northeastern University, Vanderbilt University Meiler Lab, UC Davis
Supported by: DARPA, NSF, NIH, HHMI, Amazon, Microsoft, Adobe, RosettaCommons