Foldit design update - Part 1
It's been a long time since our last update on Foldit protein design! Here we lay out some recent progress and highlight the latest improvements in proteins designed by Foldit players.
Local Backbone Quality
Foldit players have mastered the design of α-helical bundles with relative ease, but the design of α/β folds has proven more problematic. For some time, we've suspected that the crux of the problem lies in unfavorable local backbone conformations. In particular, we noticed that α/β proteins designed by Foldit players seemed to have loops that are never observed in natural proteins.
The Ideal Loop Filter, which was introduced last June, has helped Foldit designs remarkably. And in subsequent updates spanning the last several months we've seen further improvement in the backbone quality of Foldit-designed proteins. The box plot below shows the average local deviation from natural protein backbones in top-scoring Foldit designs. (Imagine breaking up each designed protein into 9-residue fragments, for each fragment searching natural proteins for a fragment with a similar backbone, and then measuring the RMSD to the closest match. If every backbone fragment of a design has a close match in a natural protein, that design should have a low mean RMSD; if there are regions of the design that have an unusual backbone, the design will have a higher mean RMSD.)
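The fragment-matching procedure in the parenthetical above can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the analysis behind the box plot. Each structure is reduced to an (N, 3) NumPy array of Cα coordinates, and the "natural protein" library is a plain list of 9-residue fragments. Fragments are compared by RMSD after optimal superposition (the Kabsch algorithm). The function names are hypothetical, and a real analysis would search a large fragment database and may use full backbone atoms.

```python
import numpy as np


def kabsch_rmsd(p: np.ndarray, q: np.ndarray) -> float:
    """RMSD between two (n, 3) coordinate sets after optimal superposition."""
    p = p - p.mean(axis=0)  # center both fragments at the origin
    q = q - q.mean(axis=0)
    u, _, vt = np.linalg.svd(p.T @ q)  # SVD of the 3x3 covariance matrix
    # Flip the last axis if needed so the result is a rotation, not a reflection.
    d = 1.0 if np.linalg.det(vt.T @ u.T) >= 0 else -1.0
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    diff = p @ rot.T - q
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))


def mean_fragment_rmsd(design: np.ndarray, library: list, frag_len: int = 9) -> float:
    """For every frag_len-residue window of the (hypothetical) Ca trace,
    find the closest library fragment, then average those best RMSDs."""
    best = []
    for start in range(len(design) - frag_len + 1):
        window = design[start:start + frag_len]
        best.append(min(kabsch_rmsd(window, frag) for frag in library))
    return float(np.mean(best))
```

A design whose every window has a near-exact match in the library scores close to zero; a region with an unusual backbone raises the mean.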
You can see that backbone quality in Foldit designs improved significantly after each of these changes: imposing the Ideal Loop Filter, disabling Rebuild, adjusting the IdealizeSS torsions, and introducing the Blueprint Panel. The dotted line marks a reference value from successful Baker lab designs: all designed proteins from Koga et al. fall below that line. In the latest design puzzles with the Blueprint Panel, most high-ranking Foldit designs also fall below that line.
Rosetta@home Folding Funnels
The improvement in Foldit backbones is reflected in other types of analysis. With the improved backbones, Rosetta@home is better able to predict the structure of Foldit designs from their amino acid sequences (explained here).
Below is a set of 14 Foldit player designs that were successfully folded by Rosetta@home—all but one originate from puzzles using the Ideal Loop Filter. The strong "funnel" shape of each plot indicates not only that Rosetta is able to sample the intended fold (note the numerous red points with RMSD < 2 Å), but also that Rosetta predicts the intended structure to be the most stable. Compare these folding funnels to those of earlier α/β designs.
mimi, Mark- (Contenders) — Puzzle 1245
Bletchley Park, Mark- (Contenders) — Puzzle 1248
tokens, Galaxie (Anthropic Dreams) — Puzzle 1251
tokens, Galaxie (Anthropic Dreams) — Puzzle 1257
tokens (Anthropic Dreams) — Puzzle 1257
dcrwheeler — Puzzle 1263
fiendish_ghoul — Puzzle 1285
gitwut (Contenders) — Puzzle 1290
Bletchley Park, Cyberkashi, Mark- (Contenders) — Puzzle 1294
Hollinas, Bruno Kestemont, Scopper (Go Science) — Puzzle 1294
tokens (Anthropic Dreams) — Puzzle 1294
fiendish_ghoul — Puzzle 1297
fiendish_ghoul — Puzzle 1299
retiredmichael (Beta Folders) — Puzzle 1299
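The two properties read off each funnel plot, that the intended fold is sampled and that it is predicted to be the most stable, can be written as a simple check over (RMSD, energy) pairs from folding predictions. This sketch is hypothetical. The Decoy type, the function name, and the rule that the best near-design prediction must outscore every far-away prediction are illustrative assumptions, not the analysis used to produce the plots.

```python
from dataclasses import dataclass


@dataclass
class Decoy:
    rmsd: float    # Å to the designed model
    energy: float  # predicted energy; lower means more stable


def looks_like_funnel(decoys, rmsd_cutoff=2.0):
    """Hypothetical two-part check mirroring the text:
    1) some predictions land within rmsd_cutoff of the design, and
    2) the best-scoring near-design prediction beats every far-away one."""
    near = [d.energy for d in decoys if d.rmsd < rmsd_cutoff]
    far = [d.energy for d in decoys if d.rmsd >= rmsd_cutoff]
    if not near:
        return False  # the intended fold was never sampled
    return not far or min(near) < min(far)
```

Both conditions matter: a design can be sampled near its target yet still lose to a lower-energy alternative fold, which would not produce a funnel.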
Each of the designs above has been reverse-translated into synthetic DNA, which is inserted into E. coli so that the protein can be expressed in our lab for further testing (read more about lab testing here). However, I've omitted from the list above four particularly promising designs that are already showing encouraging results. Next week we'll post a follow-up with more information about those designs, alongside some brand new experimental data.
A big thank you is due to all the Foldit players who have been designing proteins every week! We're learning a lot about protein design from your contributions, and credit goes to all participants, not just to the players acknowledged above. We appreciate your patience and persistence as we experiment with new tools and filters. Keep up the great folding!

(Posted by bkoep | Tue, 02/28/2017 - 19:34)
Sci Chat Roundup (Developer Edition)
Our last open call for science chat questions brought in a number of development-style questions, and we weren’t able to prioritize them during the busy chat. We still wanted to answer them for everyone, so here they are.
Will a parallel programming language, such as CUDA or OpenCL, ever be used to optimize processing speed in Foldit?
Most of the heavyweight processing is done inside Rosetta, so if and when CUDA or OpenCL support is added to Rosetta, we can consider using it in Foldit. Aside from the technical problems, this also introduces potential social problems. If the benefits of these platforms are meaningful, a high-end graphics card could become a requirement for competing. While the same is true of CPUs, CPUs do not vary as widely in performance or price as GPUs do. We don't want a situation where your ability to compete is determined by your graphics card.
Will Foldit ever be open source? Currently, only Rosetta is open source, not Foldit.
Unfortunately, this is unlikely. Please note that Rosetta itself is not open source, either.
The descriptions of remix and rebuild in jflat06's blog post raise some questions about how things work. First, if rebuild always works with a fragment length of 3, what does a rebuild of length 2 do? (A certain recipe defaults to starting with length 2.)
In this case, rebuild is not actually inserting a fragment at all. It just places a cut, and then does a loop closure to close it again.
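To make "places a cut, and then does a loop closure" a little more concrete, here is a minimal 2D sketch of cyclic coordinate descent (CCD), one common loop-closure technique. Whether Foldit's Rebuild uses CCD is an assumption on our part, and the real problem is 3D with backbone torsions. Here, a chain of unit-length bonds stands in for the loop: its first point is held fixed, and its free end must be brought back to the point where the cut has to rejoin. The function names are hypothetical.

```python
import math


def rotate_about(point, pivot, angle):
    """Rotate a 2D point about a pivot by angle (radians)."""
    x, y = point[0] - pivot[0], point[1] - pivot[1]
    c, s = math.cos(angle), math.sin(angle)
    return (pivot[0] + c * x - s * y, pivot[1] + s * x + c * y)


def ccd_close(chain, target, tol=1e-3, max_sweeps=500):
    """Cyclic coordinate descent: for each joint in turn, from the free end
    back toward the fixed start, rotate everything downstream of that joint
    so the chain's end points straight at the target. Only rigid rotations
    are applied, so bond lengths are preserved."""
    chain = [tuple(p) for p in chain]
    for _ in range(max_sweeps):
        if math.dist(chain[-1], target) < tol:
            break  # the loop is closed
        for i in range(len(chain) - 2, -1, -1):
            pivot, end = chain[i], chain[-1]
            a_end = math.atan2(end[1] - pivot[1], end[0] - pivot[0])
            a_tgt = math.atan2(target[1] - pivot[1], target[0] - pivot[0])
            delta = a_tgt - a_end
            chain[i + 1:] = [rotate_about(p, pivot, delta) for p in chain[i + 1:]]
    return chain
```

Each step is a cheap, local rotation, which is why methods of this kind are popular for closing short loops; they do not, by themselves, check whether the resulting backbone geometry is reasonable.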
Could someone comment on the internal code:
LocalWiggleSequence in GUI versus structure.LocalWiggleSelected() in LUA:
Are they the same internally?
Or what would be the equivalent LUA function for this GUI one?
Extract from chat on 2017-01-13:
22:09 Wbertro TomTaylor5: converted "Wiggle by sheets" to LUA
22:09 TomTaylor5 Great
22:10 Wbertro but the code does not give the same results as the GUI one
22:10 Wbertro however it does not crash the client at the end
22:11 TomTaylor5 What would you like to hear first? The good news or the bad news?
22:11 Wbertro I think the GUI LocalWiggleSequence code is NOT the same as the LUA LocalWiggle code
22:11 TomTaylor5 That was probably the function I couldn't find.
22:12 Wbertro but the 1322 puzzle I tested them against is so sensitive that two runs don't give the same result twice
22:12 Wbertro I don't think it is a missing function
22:13 Wbertro it is simply DIFFERENT internal code, it seems
22:13 Wbertro I think I will ask on the next chat
Both use the same underlying procedure, but they differ in how they call it: the GUI script, oddly, wiggles the residues sequentially, one at a time, which is why the two can give different results.
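A toy model may help show why the same underlying minimizer can give different answers depending on how it is called. This sketch is entirely hypothetical. Two coupled variables stand in for residues, and plain gradient descent stands in for Wiggle; it is not Foldit's actual minimizer. With the same step budget, moving both variables together and moving them one at a time end at different points on the energy surface.

```python
def energy(x, y):
    # Toy coupled energy: the two "residues" interact through the x*y term.
    return (x - 1.0) ** 2 + (y + 2.0) ** 2 + 1.5 * x * y


def grad(x, y):
    # Partial derivatives of energy() with respect to x and y.
    return 2.0 * (x - 1.0) + 1.5 * y, 2.0 * (y + 2.0) + 1.5 * x


def wiggle_together(x, y, steps=20, lr=0.1):
    # Move both variables at once along the full gradient.
    for _ in range(steps):
        gx, gy = grad(x, y)
        x, y = x - lr * gx, y - lr * gy
    return x, y


def wiggle_one_at_a_time(x, y, steps=20, lr=0.1):
    # Same budget, but minimize x first (y frozen), then y (x frozen).
    for _ in range(steps // 2):
        x -= lr * grad(x, y)[0]
    for _ in range(steps // 2):
        y -= lr * grad(x, y)[1]
    return x, y
```

Because the variables are coupled, the best move for one depends on where the other currently sits, so the order of updates changes the outcome.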
Thanks for the questions, everyone! We hope these additional answers help.

(Posted by jflat06 | Thu, 02/23/2017 - 19:05)
Protein folding pathways
This blog post addresses another question that we didn't get to in our last Science chat:
Is there any pathway for natural folding? – Bruno Kestemont
This is an excellent question, but unfortunately it does not have a simple answer. The folding pathway (sometimes discussed in terms of "folding kinetics") describes how an unfolded protein transitions to its native fold over time. In general, folding pathways are poorly understood, but they are an area of active research (in fact, our very own David Baker started off studying the kinetics of protein folding in the '90s!).
Most of us working with Foldit or Rosetta do not think much about folding pathways (as one colleague put it, "Who cares?"). We lean heavily on the assumption that a chain of amino acids will naturally adopt its lowest-energy structure (see Anfinsen's dogma), and we don't worry too much about the path required to get there. In other words, we're more interested in how a protein system behaves at equilibrium; exactly how the system reaches equilibrium is another matter. Admittedly, I am not an expert in folding kinetics, but I can touch on the main points.
Most researchers agree that strong, local interactions form first (e.g. the short-range hydrogen bonds that stabilize α-helices and β-hairpins), while weak, nonlocal interactions form more slowly (e.g. β-strand pairings between distant residues, or interactions between pre-folded domains).
Many small proteins seem to fold via a concerted, two-state mechanism. You might imagine that such a protein is fully translated by the ribosome and exists briefly as a random coil in solution before collapsing all at once into a stable fold. We observe such proteins in only two states: either completely unfolded or completely folded. This is the most likely scenario for the types of small proteins (<150 residues) encountered in Foldit puzzles.
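For the two-state case, a simple kinetic model is often used: a single unfolded state U converts to a single folded state F with rate constant k_f, and back with rate constant k_u. The sketch below assumes that textbook model; the function names and any rate values are illustrative only. Starting from a fully unfolded population, the folded fraction relaxes exponentially, with rate k_f + k_u, toward an equilibrium set by the ratio of the two rates, which in turn gives the protein's stability.

```python
import math


def fraction_folded(t, k_f, k_u):
    """Two-state U <-> F kinetics, starting fully unfolded at t = 0.
    dF/dt = k_f * (1 - F) - k_u * F has this closed-form solution."""
    k_obs = k_f + k_u  # observed relaxation rate
    return (k_f / k_obs) * (1.0 - math.exp(-k_obs * t))


def unfolding_free_energy(k_f, k_u, temp_k=298.15):
    """Equilibrium stability: dG_unfold = RT ln(k_f / k_u), in kJ/mol."""
    r = 8.314e-3  # gas constant, kJ/(mol*K)
    return r * temp_k * math.log(k_f / k_u)
```

Only two populations ever appear in this model, which is exactly the "completely unfolded or completely folded" behavior described above; multi-state folders need additional intermediate states.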
Larger proteins seem to follow more complex, multi-state folding pathways. In some cases, we can actually observe multiple populations of a protein, each in a distinct stage of “foldedness.” Many of these proteins even fold co-translationally in the cell, so that the N-terminus of the protein might be completely folded before the ribosome finishes translating the C-terminus. In fact, there is evidence that certain genes have evolved “brake” regions in their mRNA, which slow down the ribosome at certain points during translation so that the N-terminus has a chance to fold before the C-terminus is translated.
If you want to know more (and we hope you do), I strongly recommend this review article by Dill et al. It is a clearly written overview intended for readers outside the field, and, like any good review, it includes many pages of references for curious readers.
More about the Blueprint Tool
Foldit players posed several great questions about the Blueprint tool for our last Science chat, but we didn’t have time to answer all of them. We're long overdue for an in-depth explanation of the Blueprint tool; players seem to be finding it useful, and we'd like to share more about its mechanics. In particular, we hope this blog post can shed some light on the following question:
It has been pointed out that removing Blueprint tool constraints towards the end allows for substantial score improvement. Why is this, as it seems counterintuitive? – gitwut
Before I answer this question, I’d like to offer a little more background on the Blueprint tool:
There are two motivations behind the Blueprint tool: The first is simply to make “ideal loops” more accessible to players. The Ideal Loop Filter has helped Foldit designs tremendously, and the recent top-scoring designs have all had excellent loops. However, it seemed that players were required to do a lot of work in order to satisfy that filter. Hopefully, the Blueprint tool has made it easier (especially for beginners) to satisfy the Ideal Loop Filter.
The second motivation for developing the Blueprint tool is to provide an alternative design process. Some of us suspect that bad Foldit backbones are the result of aggressive loop building in middle- or late-game strategies. For example, suppose you're designing a protein and decide to form the loops last: by the time you build loops, you may have already cemented your helices and sheets into place and optimized the core packing of your protein, and as a result the backbone does not have a lot of flexibility for rebuilding loops. The endpoints of two neighboring beta strands may be positioned such that there is no stable loop to bridge them. Aggressively using Rebuild/Remix to force a loop between incompatible endpoints is akin to hammering a square peg into a round hole: it will be impossible to close the loop without compromising the geometry of the backbone. We had hoped the Blueprint tool could be used early in the design process to quickly construct a "healthy" rough draft of a design, which could then be gradually optimized without compromising the backbone geometry.
BuildingBlock Torsion Constraints
To answer gitwut's question: BuildingBlocks include torsion constraints. Torsion constraints pull a residue's backbone torsions toward a certain region of the Ramachandran map, much like Rubber Bands (which represent distance constraints) pull two residues toward a certain distance from one another. When constraints are present, Wiggling a solution will not produce points as quickly, but the solution will try to follow the constraints. Broadly speaking, constraints allow us to redirect Wiggle toward a desired result, usually sacrificing short-term gains to find an ultimately better model.
Placing a BuildingBlock loop onto the Blueprint Panel introduces torsion constraints to the loop residues (likewise, removing the BuildingBlock removes the constraints). The torsion constraints are intended to preserve the BuildingBlock loop while a player develops the rest of his or her design. Constraints are needed in this case because the Foldit energy function does not necessarily favor the BuildingBlock loops. In fact, we don't fully understand why the BuildingBlock loops are so prevalent in natural proteins. These loops may be favored for reasons that are not explicitly modeled in Foldit, like folding kinetics or more complex entropic effects. (In contrast, helices and sheets are naturally stabilized by hydrogen bond forces, which are captured by the Foldit energy function.) Without the torsion constraints, Wiggle tends to obliterate the BuildingBlock loop in favor of short-sighted energy gains. This is why removing the constraints at the end yields a substantial score improvement: Wiggle is finally free to trade the BuildingBlock loop geometry for gains in the Foldit energy function. We intended that players keep the constraints in place to preserve the BuildingBlock loops until a design-in-progress has settled into a mature fold, and only then remove them for late-game refinement.
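The behavior gitwut describes can be reproduced with a one-dimensional toy. This sketch assumes a made-up torsion energy whose minimum (-60°) disagrees with a BuildingBlock-style target angle (-120°), and it models the torsion constraint as a harmonic penalty on the wrapped angle difference. The energy, the constraint form, the weights, and the function names are all hypothetical, not Foldit's. With the constraint on, minimization settles near the target; once the constraint is removed, further minimization abandons the target and lowers the energy.

```python
import math


def angle_diff(a, b):
    """Smallest signed difference a - b in degrees, wrapped to [-180, 180)."""
    return (a - b + 180.0) % 360.0 - 180.0


def toy_energy(phi):
    # Hypothetical torsion energy with its minimum at phi = -60 degrees.
    return 1.0 - math.cos(math.radians(phi + 60.0))


def constraint_penalty(phi, target, k=0.002):
    # Harmonic torsion constraint pulling phi toward the target angle.
    return k * angle_diff(phi, target) ** 2


def wiggle(phi, target=None, steps=2000, lr=50.0, h=1e-4):
    """Toy stand-in for Wiggle: gradient descent on the energy, plus the
    constraint penalty when a target is given, using a central
    finite-difference derivative."""
    def total(p):
        e = toy_energy(p)
        return e + constraint_penalty(p, target) if target is not None else e

    for _ in range(steps):
        slope = (total(phi + h) - total(phi - h)) / (2 * h)
        phi -= lr * slope
    return phi
```

For example, `wiggle(-120.0, target=-120.0)` stays a few degrees from -120°, and calling `wiggle` on that result with no target drifts to about -60° with a lower toy energy, mirroring the late-game score jump.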
To make things even more complicated, note that we've manually adjusted how BuildingBlocks are applied through the Blueprint Panel. That is, when you drag a BuildingBlock onto the Blueprint Panel and the protein backbone snaps into place, this initial "adjusted" form is only a rough approximation of the loop's optimal form. When you Wiggle the loop, the torsion constraints will drag the backbone to its optimal shape, which may be slightly different from the initial adjusted shape (this is particularly noticeable for β-hairpin BuildingBlocks). This is because the BuildingBlock loops are derived from native proteins, which never have perfectly ideal helices and sheets. If you were to apply the optimal BuildingBlock loops to Foldit's ideal beta strands, the strands would not align to form hydrogen bonds (Figure A, above). To make the tool more user-friendly, we adjusted the optimal BuildingBlocks so that the hairpin loops would be compatible with Foldit's ideal sheets. Thus, a BuildingBlock hairpin will initially snap two ideal strands into perfect alignment (Figure B), and subsequent Wiggling will allow the beta strands to flex slightly, so that the BuildingBlock loop can relax into its optimal form (Figure C).
As an aside, some astute Foldit players have noticed that the BuildingBlocks collection is missing a BAAB β-hairpin, which is a stable loop frequently found in nature. As it turns out, this loop induces significant deformation of the adjacent beta strands. Despite our efforts, we were unable to adjust the BAAB BuildingBlock to be reasonably compatible with Foldit's ideal beta strands, so that particular loop was omitted from the BuildingBlocks collection.

(Posted by bkoep | Mon, 01/30/2017 - 20:33)
Tuberculosis Challenge – Alternate Target
Tuberculosis (TB) is a disease that affects millions of people. We have previously posted a protein drug target puzzle on this topic. In our continued effort to make a dent in this disease, we have also partnered with the Sacchettini lab at Texas A&M University to post another drug target puzzle for TB.
The Sacchettini lab is working in collaboration with other groups to understand the biology and virulence factors of tuberculosis bacteria. The ability of Mycobacterium tuberculosis, which causes TB, to survive inside the host depends on sensing the environment and launching appropriate responses to stimuli. This means that the production levels of specific proteins are strictly controlled and tuned. The machinery and the players of this carefully orchestrated battle against our immune system are poorly characterized. In general, protein production can be regulated at multiple levels and by different means. One of these involves small non-coding RNA molecules, which aid in the efficient translation of some mRNAs into proteins and in the degradation of others. In pathogenic bacteria specifically, the production of proteins required for virulence and intracellular survival has been shown to depend on small RNAs (reviewed in Oliva G., Sahr T., Buchrieser C. (2015). Small RNAs, 5’ UTR elements and RNA-binding proteins in intracellular bacteria: impact on metabolism and virulence. FEMS Microbiol. Rev. 39, 331–349).
To carry out their missions, small RNAs require a protective chaperone protein, Hfq. Hfq adopts an Sm-like fold, with six subunits forming a homo-hexameric ring. Hfq homologs have been identified in numerous bacteria, yet none have been annotated in the Mycobacterium tuberculosis genome. Through careful examination of secondary structure predictions across the Mycobacterium tuberculosis proteome, Rv3208A has been proposed as a possible Hfq candidate.
Solving the structure would teach us about machinery that has been shown to be important for virulence in other pathogens but has not yet been characterized in Mycobacterium. Because this RNA chaperone is instrumental to small-RNA-mediated responses, targeting it could prevent Mycobacterium tuberculosis from surviving inside the human host.
Right now, the protein has been crystallized and diffraction data are available, but none of the models that scientists have created have helped to solve the phase problem and build the structure. By posting this protein, we are hoping that Foldit players can come up with a model that will help resolve the structure. As always, we are committed to publishing the work and sharing models created by Foldit players. Let's make a dent in TB!

(Posted by free_radical | Tue, 11/29/2016 - 19:29)