Two-sided protein interface design
This blog post explains some of the background science behind recent Two-sided Interface Design puzzles, like Puzzle 1963. IPD scientist Ryan Kibler elaborates on the goals of these puzzles, how they might be used, and the special challenges we face when designing proteins that bind each other to form organized protein assemblies.
Protein assemblies in nature
A major theme of recent Foldit puzzles is designing symmetric proteins. Through playing these puzzles, you have no doubt realized that symmetry allows you to build a large protein assembly by designing a single chain to bind with itself. Nature has apparently realized this, too. About 63% of the different kinds of proteins naturally produced by E. coli bacteria exist as symmetric homo-oligomers (“assemblies of the same chain”).1
Hetero-oligomers (“assemblies of different chains”) are much rarer but usually serve important biological functions. A great example of this is the LSm family of proteins. In humans and other eukaryotes, seven different chains assemble into a hetero-heptameric (“seven different parts”) LSm ring. These LSm rings have many functions, but they often involve binding to RNA and acting as hand-holds for other proteins to grab and make modifications to the RNA, or carry it from place to place.
Curiously, LSm proteins are found in all types of organisms (eukaryotes, archaea, and bacteria) but they don’t all form hetero-heptameric rings. In archaea they form homo-heptameric (“seven of the same part”) rings, where multiple copies of the same protein bind each other. Since eukaryotes are thought to have evolved from archaea, this suggests an interesting evolutionary tale where the gene encoding the homo-heptameric LSm protein in archaea was duplicated and diversified to the point where each one of the seven proteins prefers to assemble with different partners rather than with identical copies of itself.
Designing protein assemblies at the IPD
Designing hetero-oligomers with more than a few chains is a difficult task because there is a lot that can go wrong when you design so many interfaces simultaneously. Instead, scientists at the IPD have taken inspiration from this evolutionary story of LSm and are taking a different, incremental approach. Rather than designing all the interfaces at once to make a hetero-oligomer in one step, we have broken the problem down into smaller sub-problems. We start with a single homo-oligomer and redesign both sides of the interface many different times, then make sure the interface forms correctly through experimentation, and finally we recombine the interfaces into a single protein. Using this strategy, we have already successfully transformed a homo-trimer into a hetero-trimer.
Figure 1. A schematic diagram shows how we create hetero-oligomers from a homo-oligomer starting structure. We can break the problem down into smaller pieces, and start by creating lots of different homo-oligomers. After we test the different homo-oligomers in the lab, we can recombine their different interfaces to create hetero-oligomers.
The two keys to this strategy are: (1) keeping the protein backbone fixed, except for the parts that make up the interface, so that the different interfaces can be copied and pasted onto the same starting structure; and (2) making the interfaces as diverse as possible in order to prevent unintended off-target assembly between the wrong interfaces.
Key #1 is easily accomplished by freezing the non-interface region to hold the backbone in place. We have been using the same alpha helical backbone for all interfaces, allowing only small tweaks to the starting structure. But key #2 is harder. So far, we have relied on H-bond networks to prevent unintended off-target assembly. This works because if two wrong chains attempt to bind, their non-matching arrangements of polar residues leave buried unsatisfied polar atoms (BUNS), which are energetically unfavorable and prevent binding. Only when the two correct chains come together does the H-bond network form and produce a stable interface with zero BUNS.
Assemblies with increased complexity
On our path to increasingly complicated hetero-oligomers, we are now trying to make a hetero-tetramer (“four different parts”). The more chains there are in an assembly, the harder it is to make them bind in the correct arrangement, because there are exponentially more chances for off-target assemblies. We believe we will need more than H-bond networks to prevent off-target assemblies.
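One simple way to see how quickly the problem grows is to count pairwise contacts only. The count below is a rough sketch, not the IPD's own analysis. It assumes each of the n redesigned interfaces has two sides (labelled A_i and B_i here purely for illustration) and that any A side could in principle meet any B side:

```latex
\text{possible A--B pairings} = n^2, \qquad
\text{intended pairings} = n, \qquad
\text{off-target pairings} = n^2 - n = n(n-1)

n = 3:\ 3 \cdot 2 = 6 \text{ off-target pairings}, \qquad
n = 4:\ 4 \cdot 3 = 12 \text{ off-target pairings}
```

This only counts single contacts. The number of complete mis-assembled rings grows faster still, since each of the n positions in the ring could be filled by any of the n chains.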
One very good way of further diversifying the interfaces is to make them physically different by changing the protein backbone. As bkoep mentioned in Lab Report #17, scientists’ designs use mostly alpha helices at the interfaces, because we have a good handle on how to generate alpha helical backbones, and we have a good understanding of how to design sequences for alpha helices. Unfortunately, we’ve observed that interfaces that are made entirely out of alpha helices are prone to off-target assembly with similar-looking alpha helical interfaces. If we want to get a large number of specific interfaces, it will not be enough to simply make small tweaks to our starting alpha helices.
Two-sided interface design puzzles
This is where Foldit comes in. Foldit players are extremely good at making diverse protein backbones, so we’re challenging you to redesign the interfaces of our homo-tetramers, in a series of Two-sided Interface Design puzzles!
The most recent and most promising starting structure is a homo-tetramer called RC4_20, a circular tandem repeat protein (known internally as a “donut”). This protein was originally designed at the IPD by Phil Bradley, PhD,2 and later modified by Alexis Courbet, PhD, to have a larger interface, making its tetrameric form more stable and also easier to see under an electron microscope.
Figure 2. This homo-tetramer "donut" protein was designed by IPD scientists. We can use it as a starting point and create a hetero-tetramer by redesigning the interface between chains. Highlighted in red are the frozen stubs that were provided in the starting structure of Puzzle 1959.
To work within the constraints of key #1 (making sure that different interfaces can be copied and pasted onto the same starting structure), our Foldit puzzles start with a section that is frozen and a section that you can refold. Keep in mind that this is only a small part of the larger assembly (Figure 2). To convert the homo-oligomer RC4_20 into a hetero-oligomer, we simply swap out the original interface with the different interfaces designed in Foldit.
Initial design results
We’ve been happy to see many players attempting to install beta sheets at the interface. This is exactly the sort of diversity we were looking for! The design below, from an anonymous player, has a beta sheet with the outer edge pointing out to surrounding water and the inner edge placed against the interface. The polar atoms on the inner edge can participate in an H-bond network if they are satisfied by polar residues from chain B. Unfortunately, this design didn’t satisfy all of these backbone polar atoms. In future puzzles, we would love to see players use the edge strand of a beta sheet as part of an H-bond network!
Figure 3. A creative design with a beta sheet at the interface. A variety of backbone shapes will help us create well-behaved assemblies that can avoid off-target binding. The highlighted inner edge strand has polar atoms along the backbone that are buried at the interface. Those polar atoms would be more stable as part of an H-bond network.
Another feature we like to see is tight packing between both starter alpha helices on chain A. While it’s hard to tell from the truncated starting structure we gave you, making good contacts with both of these alpha helices is important to maintain structural rigidity. If your refolded interface only contacts one of these alpha helices, that alpha helix could act like a hinge and cause your interface to swing around and not be in the correct position to bind the other half. The solution below from LociOiling has good packing between both alpha helices of the starting stub and an alpha helix backing up the interface. Also, we like the placement of a key tryptophan buried at the interface. It would be really great to see an H-bond network sprout out of that sidechain nitrogen in future designs!
Figure 4. A design with great sidechain packing by LociOiling. On the left, the highlighted alpha helix packs closely against both alpha helices on the chain A starting stub, and also backs up the alpha helices that actually contact chain B. On the right, a TRP sidechain is buried at the interface for tight binding, but the polar N atom is unsatisfied and would like to make an H-bond.
In the most recent Two-sided Interface Design puzzles, we’ve asked Foldit players to install H-bond networks across the interface, in addition to refolding the backbone, to make the interfaces as different as possible and avoid off-target assembly. We’ve also asked for one disulfide bond across the interface, and that’s because we know that a disulfide is important for encouraging assembly of this particular RC4_20 donut.
In solutions from Puzzle 1959, the disulfides you’ve designed look really good, but the H-bond networks need a little work. We’ve noticed that the H-bond networks are usually not very big, and often are at the protein surface, near the water that surrounds the protein. Instead, H-bond networks should be buried in the core of the interface where they can be most effective at discouraging off-target assembly. Remember, the polar network residues will form BUNS if they try to bind against a chain with non-matching polar residues. If you want some inspiration for making effective H-bond networks, you can review some of the published H-bond networks designed by IPD scientists.3
Thank you all for your solutions to these puzzles! We’re accumulating a list of designs that we plan to test in the wet lab in the coming months. But there’s still a lot of room, so keep up the good work! Check out the latest Puzzle 1963: Two-sided Interface Design now!
1. Goodsell, D. S. & Olson, A. J. Structural symmetry and protein function. Annu. Rev. Biophys. Biomol. Struct. 29, 105–153 (2000)
2. Correnti, C. E. et al. Engineering and functionalization of large circular tandem repeat protein nanoparticles. Nat. Struct. Mol. Biol. 27, 342–350 (2020)
3. Boyken, S. E. et al. De novo design of protein homo-oligomers with modular hydrogen-bond network-mediated specificity. Science 352, 680–687 (2016)
2020 Snowflake Challenge Results
Dev Josh here with the results of our 2020 Snowflake Challenge! A big thanks to everyone who participated, I hope you had as much fun making these snowflakes as I had admiring them!
This challenge was a sequel to our 2014 Snowflake Challenge, and this year y'all crushed it again! Like last time, I tried to have each dev pick a favorite, but some of these snowflakes were so impressive, they got picked multiple times. Let's start with the runners-up:
These are already so great, thanks to everyone who submitted a snowflake!
Next up, our Bronze winners:
Coming up next, our Silver winners, who received multiple mentions from the team:
The self-described "Summer Snowflake" by AlkiP0Ps:
Last but not least, our Gold winners!
This marvel by Vincero had my vote among others. Vincero provided a title and description for their own work:
Finally, for the very top favorite of the team and the winner of this year's Snowflake Challenge, we congratulate fiendish_ghoul for this gorgeous snowflake.
Thanks again to everyone who participated! This was a ton of fun and we hope to do it again soon!
Until next time, happy folding!
(Posted by joshmiller | Tue, 01/19/2021 - 18:22)
Coronavirus Designable Linker Puzzles
This week we are introducing a brand new Designable Linker Puzzle! This kind of puzzle involves two or more protein domains that are fixed in space, and players are challenged to link them with a rigid, well-folded linker that preserves the orientation of the starting domains.
Linking Coronavirus Spike Binders
Puzzle 1912b is the first puzzle of this type. This example is particularly special, as we are asking you to link two of the best known SARS-CoV-2 spike binders. These computationally designed binders came from scientists at the Institute for Protein Design, and currently exhibit some of the best binding affinities of any known SARS-CoV-2 spike binder. The original binders are currently being developed for possible COVID-19 tests or therapeutics.
It took a large number of supercomputing hours to generate these binders, and less than 0.1% of those that were tested showed any binding affinity for the target. This goes to show just how hard binder design is! You can read more about these binders in this previous blog post. With these binders now in hand, we want to see how much we can improve them.
A model of how two designed proteins can bind the SARS-CoV-2 spike. The spike chains are shown in green, magenta and cyan. Colored in salmon are two designed binders, LCB1 and LCB3. The binders have been truncated and augmented with helices to bring their termini closer together.
The starting structure of Puzzle 1912 has a linker connecting two frozen α-helix bundles. These α-helix bundles are truncated parts of two spike binders designed by scientists at the IPD, LCB1 and LCB3. The puzzle also includes small sections of the target spike protein, although we don't need to make any more binding interactions with the spike. The goal of the puzzle is to design a rigid linker that keeps the binders in the starting orientation.
The binding affinity measures the tightness of binding between two chains, and is directly related to the change in free energy between the bound and unbound states (also known as DDG, described previously). If we can find a rigid linker that holds the two binders in a fixed orientation, we can roughly double the DDG to significantly increase the binding affinity of the linked binder.
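As a rough worked version of this argument, the relation below is the standard thermodynamic link between a dissociation constant K_d and the binding free energy. It assumes an idealized rigid linker that lets the two binding free energies simply add, with no penalty. The 1 nM values are illustrative placeholders, not measured affinities for LCB1 or LCB3.

```latex
\Delta G_{\mathrm{bind}} = RT \ln\!\left(\frac{K_d}{c^\circ}\right), \qquad c^\circ = 1\ \mathrm{M}

\Delta G_{\mathrm{linked}} \approx \Delta G_1 + \Delta G_2
\;\Longrightarrow\;
K_{d,\mathrm{linked}} \approx \frac{K_{d,1}\,K_{d,2}}{c^\circ}

% Illustrative numbers only: K_{d,1} = K_{d,2} = 10^{-9}\ \mathrm{M}
K_{d,\mathrm{linked}} \approx \frac{10^{-9}\ \mathrm{M} \times 10^{-9}\ \mathrm{M}}{1\ \mathrm{M}} = 10^{-18}\ \mathrm{M}
```

Because the free energy sits inside a logarithm, doubling it squares the dissociation constant rather than halving it. This is a best-case estimate: a real linker is never perfectly rigid, so the actual gain would be expected to fall short of it.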
The loopy linker in the starting structure won’t work because it will be too flexible in solution. The two binder domains will flop around and behave independently, like two separate binders. However, if the linker were well-folded and rigid, the two binder domains could behave like a single protein with double the binding surface.
The starting structure for Puzzle 1912. The helical binding domains and small portions of the spike are frozen. In green is the designable linker that needs to be folded.
How to Score Well
Designed linkers with lots of secondary structure (sheets or helices) will score better and be more likely to do the job. We're looking for linkers that hold the binders in the proper orientation with more rigidity than a flexible alanine chain. We are using a few objectives to encourage well-folded linkers, including the Core Exists and SS Design Objectives. The BUNS Objective is also active on the linker.
We look forward to seeing how Foldit players solve this problem! Promising designs may be tested at the IPD for improved binding to the spike. A tighter binder would be especially useful for detecting small amounts of coronavirus in a fast and sensitive diagnostic test.
The Future of Designable Linker Puzzles
Rigid linker design is an outstanding problem in protein design. It is made especially difficult by the fact that the connected domains are constrained to their starting position, and the designed linker cannot clash with other chains nearby (like the binding target).
Scientists have been trying to develop computational methods to design rigid linkers from scratch, but have not had much success. These methods suffer from limitations that don't apply to Foldit players, and we think that human ingenuity and hands-on problem solving might be the answer to this problem.
Check out the Designable Linker: Coronavirus Spike Binder puzzle now!
Happy Folding!
(Posted by neilpg628 | Tue, 11/03/2020 - 21:43)
Update on Aflatoxin Challenge
The Siegel Lab is back again with an update on the Foldit Aflatoxin Challenge!
We were really thankful for all of your designs and even gave a few back to you as prediction-style puzzles in Rounds 14-17. These puzzles challenged you to predict the apo structure of the designs -- the protein structure when aflatoxin is not present. We have some news regarding our results, now that we have transferred them from in silico to in vitro!
Prediction of apo structures
When we compiled all of your puzzle solutions from Round 13 and narrowed them down to the most promising entries, we wanted to do a pilot study of a few of the most radical designs to determine how well the laccase starting protein would handle large structural changes.
Many of these designs expressed and were active in our reporter assay, which we were happy to see; however, all had lost the ability to degrade aflatoxin. We believed that the active site was changed in a way that didn’t allow aflatoxin to fit for catalysis. Your solutions for our apo structure puzzles readily confirmed this.
Below, the original player design is shown in green, with the other colors depicting the best scoring player apo solutions. We can see that these top-scoring apo structures are crowding the position where aflatoxin is supposed to sit. Clearly aflatoxin would have a hard time fitting in those active sites, which was very consistent with what we were seeing in our assays.
New testing results
Using this information, we ordered 56 player designs from Puzzle 1739 and tested them all using our high throughput methods. Fortunately, several had activity on aflatoxin, and all of these were grown and assayed at larger scale to ensure accuracy.
Of the approximately 20 active designs, 3 were found to be the most active and highest expressing, making them excellent candidates for more design! Two of these come from LociOiling, and the third is a design by Phyx.
We want to understand how the active sites for these 3 designs may look when aflatoxin is not present, so we are releasing new apo prediction puzzles based on these designs in the coming weeks. We hope you will give these puzzles a try and help us in the next step of the Aflatoxin Degradation Project!
(Posted by bkoep | Tue, 10/06/2020 - 21:52)
Introducing Foldit Metrics
Foldit Metrics are a new kind of Objective. They will appear in the Objectives dropdown, under the score panel at the top of the screen.
Just like normal Objectives, Metrics calculate useful properties of your solution, and can award bonuses that boost your score. However, Metrics are different from other Objectives in that they are much slower to compute.
Normally we like to ensure that Foldit can calculate your score (the base Foldit score plus all Objectives) in less than 30 milliseconds -- brief enough that it appears to your brain as "immediate." That way Foldit can constantly update your score in real time as you fold your protein.
However, some kinds of protein calculations simply can’t be completed in that time. Anything that takes more than about 100 milliseconds would cause a noticeable delay, and Foldit gameplay would become frustratingly “choppy” as the scoring struggles to keep up with your folding. We’ve developed Foldit Metrics as a way to handle these slow calculations without interrupting regular gameplay.
Our latest devprev update includes support for Metrics, and we’ve posted a non-competitive puzzle for devprev users to try out the new features. After some time in devprev, we will release Metrics in a main update so we can start using them in our Science Puzzles.
Puzzles with Metrics will behave a little differently than other puzzles. Below we describe the Metrics features and discuss the new challenges they bring to Foldit gameplay.
Hand-folding with Metrics
Since Metrics are too slow to compute in real time, Foldit runs them in the background. Whenever you make a substantial change to your solution, the Metrics will start calculating in the background, while the rest of Foldit continues to respond to mouse clicks and keystrokes.
Until the calculation completes, your score at the top of the screen will be greyed out and will not update. When all Metric calculations are completed, the score will update and regain its usual color.
You can continue folding your solution while the Metrics are calculating.
When the Metrics finish, the calculations will automatically restart for your latest solution. Note that the Metrics will skip over any intermediate solutions, so you don’t have to worry about accumulating a backlog of Metrics to slog through. [CORRECTION: Metrics will continuously calculate in the background. When a calculation is complete, it will be permanently associated with the solution in case you want to go back.] If you want to see the Metrics for your current solution, you can just stop folding, and the Metrics should catch up in a second or two. If you don’t want to wait a second or two for score updates, you can disable Metrics while you are hand folding. [CORRECTION: You can disable Metrics while you are hand folding, but either way your score will keep updating without the Metric score while the Metrics are running.] While a Metric is disabled, your score will update in real time like in regular puzzles, but the score will be invalid. To trigger a one-time Metric calculation while it is disabled, click the “Run” button next to the Metric.
Using recipes with Metrics
Existing recipes will ignore the new Metrics by default. You can run any normal recipe in a Metrics puzzle, and it should run just as fast as in any other puzzle.
This comes with an important caveat:
Existing Lua functions like current.GetScore do not include Metrics bonuses.
That means that the value returned by current.GetScore may not match the competitive score at the top of your screen. And the value returned by creditbest.GetScore may not match your competitive score on the Foldit leaderboards. Recipes will need to be modified to support Metrics.
In order to get your competitive score in a recipe, you will need to add together the value of current.GetScore and metric.GetBonusTotal. But be careful -- accessing Metric bonuses in a recipe can drastically increase the recipe’s run time! Every time you access a Metric bonus in a recipe, the recipe stops to wait for the Metric to compute.
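The sum described above can be sketched as a small helper. This is a minimal sketch rather than an official recipe: the fallback tables and their numbers are made-up stand-ins so the snippet also runs in a plain Lua interpreter, and inside Foldit the real current and metric tables would be used instead.

```lua
-- Stand-in stubs so the sketch also runs outside Foldit. Inside Foldit the
-- real `current` and `metric` tables exist and these fallbacks are ignored.
-- The returned numbers are made up for illustration.
local current = current or { GetScore = function() return 10000 end }
local metric = metric or { GetBonusTotal = function() return 250 end }

-- Competitive score = base score + Metric bonuses.
-- metric.GetBonusTotal blocks the recipe until every Metric has finished,
-- so each call to this helper can be slow.
local function competitive_score()
  return current.GetScore() + metric.GetBonusTotal()
end

print("competitive score: " .. competitive_score())
```

Since every call waits on the Metrics, a recipe would probably want to call a helper like this sparingly, for example once at the end of a run rather than inside a tight loop.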
Metrics are distinct from filters in Foldit recipes, and have separate Lua functions. Functions like filter.DisableAll will have no effect on Metrics, and filter.GetNames will not return the names of any Metrics. Our first release includes three new Lua functions for Metrics:
metric.GetNames
Return type: table
Description: Returns a table containing the names of all metrics in the puzzle.
metric.GetBonus
Parameters: string name [only names of metrics are recognized, others produce Lua errors]
Return type: number
Description: Triggers the (slow) computation of the named metric. Blocks computation of the script until the metric is finished computing, then returns the metric score.
metric.GetBonusTotal
Return type: number
Description: Triggers the (slow) computation of all metrics. Blocks computation of the script until all metrics are finished computing, then returns the sum of all metric scores.
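A recipe might combine the three functions to report each Metric separately. The sketch below takes metric.GetBonusTotal from the post and assumes metric.GetNames and metric.GetBonus as the names of the other two functions described above. The stub table and its Metric names and values are hypothetical and exist only so the snippet runs outside Foldit.

```lua
-- Hypothetical stand-in for Foldit's `metric` table (made-up names/values).
-- Inside Foldit the real table is used and this fallback is ignored.
local metric = metric or {
  GetNames = function() return { "Shape Complementarity", "DDG" } end,
  GetBonus = function(name) return name == "DDG" and 150 or 100 end,
  GetBonusTotal = function() return 250 end,
}

-- Collect the bonus of every Metric in the puzzle, one at a time.
-- Each metric.GetBonus call blocks until that Metric has finished computing.
local function metric_report()
  local report, sum = {}, 0
  for _, name in ipairs(metric.GetNames()) do
    local bonus = metric.GetBonus(name)
    report[name] = bonus
    sum = sum + bonus
  end
  return report, sum
end

local report, sum = metric_report()
for name, bonus in pairs(report) do
  print(name, bonus)
end
print("sum of per-metric bonuses", sum)
```

Querying Metrics one by one like this is mainly useful for seeing which Metric is holding a solution back. When only the total matters, a single metric.GetBonusTotal call does the same job.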
Learning to play with Metrics
It will take some time for us to figure out the best way to use Metrics in Foldit. We think that they will help us produce better solutions in Science Puzzles, but this has to be balanced with gameplay and fair competition in Foldit.
Compared to the base Foldit score, Metrics are much slower to compute, but the good news is that we don’t think they need to be calculated as frequently. Although we’d like to strive for solutions with decent Metrics, we don’t necessarily want to grind away at them to squeeze out tiny gains.
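One way a recipe could act on this is to refresh the Metric bonus only every few steps and reuse the last value in between. This is a hedged sketch of that idea, not a recommended setting: CHECK_EVERY, the loop length, and the stub values are all hypothetical, and the stubs only exist so the snippet runs outside Foldit.

```lua
-- Stand-in stubs with made-up values; ignored inside Foldit.
local current = current or { GetScore = function() return 10000 end }
local metric = metric or { GetBonusTotal = function() return 250 end }

local CHECK_EVERY = 10   -- hypothetical: refresh Metrics once per 10 steps
local cached_bonus = 0   -- last Metric bonus total we computed
local metric_calls = 0   -- how many slow Metric computations we triggered

-- Base score is read every step; the slow Metric total is refreshed only on
-- steps 1, 11, 21, ... and the cached value is reused in between.
local function score_at_step(step)
  if step % CHECK_EVERY == 1 then
    cached_bonus = metric.GetBonusTotal()  -- slow, blocking call
    metric_calls = metric_calls + 1
  end
  return current.GetScore() + cached_bonus
end

local last
for step = 1, 30 do
  -- a real recipe would shake, wiggle, or mutate the solution here
  last = score_at_step(step)
end
print(last, metric_calls)
```

The trade-off is that the cached bonus can be stale between refreshes, so a value computed this way is only an estimate of the competitive score until the next Metric check.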
Likewise, we don’t want to place too much importance on Metrics. The Foldit base score is still our primary tool for judging solutions, although we know from lab experiments that some Metrics have informative thresholds.
For example, we’ve seen that most successful binder designs tend to have a shape complementarity (SC) Metric > 0.60. However, it’s not clear that increasing SC beyond this threshold is helpful, and we certainly don’t want to sacrifice other design features (like a well-packed, hydrophobic core) for good SC.
With this in mind, we’ll be starting with Metrics that award a flat bonus at a threshold value. [NOTE: We are also trying out metrics that award increasing bonuses UP TO a threshold]. For example, we may award a set bonus for a binder with SC of at least 0.60, but you will not get a bigger bonus for increasing SC further than that. Once you find an initial solution that comfortably meets the threshold, we hope that you can turn off the Metric and only check it periodically while you optimize other features of your solution.
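The two bonus shapes mentioned here can be sketched as follows. Only the 0.60 SC threshold comes from the post. The bonus size and the linear shape of the ramp are assumptions made for illustration, and Foldit's actual bonus functions may differ.

```lua
local THRESHOLD = 0.60   -- SC threshold quoted in the post
local MAX_BONUS = 500    -- hypothetical bonus size

-- Flat bonus: all or nothing at the threshold.
local function flat_bonus(sc)
  if sc >= THRESHOLD then
    return MAX_BONUS
  end
  return 0
end

-- Ramped bonus: grows with the Metric (linearly, by assumption) and is
-- capped once the threshold is reached.
local function ramped_bonus(sc)
  local fraction = math.max(0, math.min(sc / THRESHOLD, 1))
  return MAX_BONUS * fraction
end

for _, sc in ipairs({ 0.30, 0.59, 0.60, 0.75 }) do
  print(sc, flat_bonus(sc), ramped_bonus(sc))
end
```

In both shapes, pushing SC past 0.60 earns nothing extra, which matches the stated intent that players should not sacrifice other design features for a higher SC.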
Beyond that, we’re not sure about the best strategies for folding with Metrics! Scientists traditionally use them to weed out poor designs from big batches, but never spend time tweaking those designs to improve their Metrics. This is an experiment and we don’t know where it will lead.
We’ll be counting on players for feedback about what works and what doesn’t. Please don’t hesitate to leave us feedback or suggestions, or to ask questions in the comments below!
Devprev users can check out the new Metrics now in the [DEVPREV] LCB1 Binder with Metrics puzzle.
(Posted by bkoep | Thu, 10/01/2020 - 09:02)