The AlphaFold prediction tool in Foldit
We are announcing a brand new Foldit feature that will enable players to use the revolutionary AlphaFold algorithm from DeepMind!
Update: The AlphaFold feature is now available for all users in select Foldit puzzles. (It was initially released to devprev users ahead of the main update.)
AlphaFold v2.0 is an algorithm to predict the folded structure of a protein from its sequence, and was developed by the company DeepMind in 2020.
Previously, in the 2018 CASP competition for protein structure prediction, DeepMind had made a splash with their initial version of AlphaFold, outperforming dozens of research groups from around the world. The DeepMind group specializes in a type of algorithm called a neural network, and they showed that this type of algorithm held huge potential for the field of protein structure prediction. We wrote a blog post about the initial AlphaFold algorithm when DeepMind published it in January 2020.
After this initial success, DeepMind completely restructured their algorithm, and at the 2020 CASP competition they amazed the world with an even bigger leap forward. The new AlphaFold v2.0 is able to predict protein structures with astounding accuracy. The 2020 CASP results promised big advances for protein research, and the scientific community has been anxiously waiting for DeepMind to release the details about AlphaFold v2.0.
AlphaFold for protein design
AlphaFold is especially accurate for predicting natural proteins, where it can draw on the rich information in evolutionary patterns. But we’ve also found it to be very good at predicting the structures of designed proteins—even though these proteins have no evolutionary history. In fact, when we check against solved structures of designed proteins, we find that AlphaFold is usually more accurate than the design model itself!
Figure 2. Comparing the accuracy of AlphaFold predicted models and design models for 22 designed proteins with solved structures. The diagonal represents the line of equality. Points above the diagonal are cases where the AlphaFold prediction is more accurate than the design model.
We’ve also found that AlphaFold may be able to help us pick out designs that will fail lab testing. Whenever AlphaFold predicts a structure, the algorithm also produces a confidence value for the prediction. We see that AlphaFold tends to report a higher prediction confidence for successful protein designs.
In 2019, we tested 148 Foldit designs in the lab and found 56 were successful designs—a total success rate of about 38%. If we had rejected designs with AlphaFold confidence under 80%, then we still would have found 50 successful designs, with a success rate of over 60%!
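The effect of a confidence cutoff like this can be sketched in a few lines of Python. This is only an illustration: the `success_rate` helper and the `toy` records are hypothetical and are not the 2019 dataset. Only the totals (56 successes out of 148 designs) come from the text above.

```python
def success_rate(designs, min_confidence=0.0):
    """Return (successes, designs kept, success rate) at a confidence cutoff.

    `designs` is a list of (confidence_percent, succeeded) pairs.
    """
    kept = [(conf, ok) for conf, ok in designs if conf >= min_confidence]
    successes = sum(1 for _, ok in kept if ok)
    rate = successes / len(kept) if kept else 0.0
    return successes, len(kept), rate


# Hypothetical toy records, for illustration only (NOT real lab data).
toy = [
    (92.0, True),
    (85.0, True),
    (81.0, False),
    (70.0, False),
    (55.0, True),
    (40.0, False),
]

print(success_rate(toy))        # no cutoff: every design is kept
print(success_rate(toy, 80.0))  # keep only designs with confidence >= 80%

# The overall rate reported in the text: 56 successes out of 148 designs.
overall_rate = 56 / 148
print(f"{overall_rate:.1%}")
```

Raising the cutoff discards some successful designs but can raise the fraction of tested designs that succeed, which is the trade-off described above.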
A new Foldit feature
We are excited to announce a new Foldit feature that will let you get AlphaFold predictions for proteins you design in Foldit.
Certain puzzles will display a new DeepMind AlphaFold button in the Main Menu. This button opens up a dialog with a list of your saved solutions on the right-hand side. To request an AlphaFold prediction for a solution, select the solution and click the Upload for AlphaFold button. This will send your solution to the Foldit server and remotely run the AlphaFold algorithm.
A new solution will appear in the left-hand list and show the message “Pending…” while AlphaFold makes its prediction. It will take at least a few minutes to run, and the wait time may be longer depending on how busy the server is.
You may have up to 5 AlphaFold uploads pending at once; if you currently have 5 AlphaFold uploads pending, you must wait for one to complete before making another submission. Click the Refresh Solutions button to check if your AlphaFold job is done.
When the AlphaFold algorithm has completed, the left-hand solution will display two values:
Confidence is AlphaFold’s own estimate about the accuracy of its prediction. Figure 3 above suggests that designs with higher confidence are more likely to fold successfully. Players should aim for confidence values of 80% or higher.
Similarity measures how closely the AlphaFold prediction matches your designed structure. If similarity is low, then AlphaFold has predicted that your design sequence will fold into a different shape than your designed structure.
To load the AlphaFold prediction into the Foldit puzzle, select the left-hand AlphaFold solution and click the Load button at the bottom of the dialog. Note that AlphaFold predictions may not score as well as solutions that have been optimized in Foldit. If you decide to work off of the AlphaFold solution, we recommend a quick Wiggle and Shake of the raw AlphaFold model.
The AlphaFold confidence and similarity values will not affect your Foldit score in any way. For the time being, the AlphaFold feature is simply a tool that you can use to get feedback about your solution, and to see how your design sequence is predicted to fold up.
Unlike typical Foldit tools, the AlphaFold algorithm runs remotely on an online server.
Normally, when you run Foldit on your computer, all of the Foldit computations are performed by your computer. If your internet connection fails in the middle of a puzzle, you can still continue to use all of the Foldit tools.
This AlphaFold feature is different, and the actual computations will run on a server hosted at the UW Institute for Protein Design (IPD). So, when you click the Upload for AlphaFold button, your solution is sent to the IPD server, which runs the AlphaFold algorithm and then sends the result back to your computer.
The biggest reason for this is that the AlphaFold algorithm is... big. Even the basic slimmed-down version requires several GB of disk space. If we wanted to distribute the AlphaFold software with Foldit, that would increase the download size of Foldit by 10x.
Another reason is that the AlphaFold algorithm runs much less efficiently on common CPUs than on GPUs, which many players may not have. If you ran AlphaFold on your CPU at home, it might take an hour to get a result back. However, if we use our GPUs at IPD, the actual processing will go much faster. Since most of our recent Science puzzles have had fewer than 100 active players at a time, we think that players can get results faster if we process AlphaFold jobs on our server GPUs.
This is an exciting time for the world of protein research! DeepMind has inspired other research groups, including the IPD, to explore similar kinds of neural network algorithms for protein structure prediction. As more researchers publish their findings and learn from one another, we can probably expect to see even more accurate algorithms in the future.
AlphaFold is already transforming the study of natural proteins, and has provided researchers with confident predictions of important proteins with unknown structures. But in the field of protein design, we are still learning how to make the best use of these advances. We hope that Foldit players will find the AlphaFold predictions helpful for designing creative new proteins!
Please note that the new AlphaFold feature is experimental, and it may change or even disappear in the future. Foldit is sharing the server GPUs with other research projects, and we may need to adjust our usage or develop new strategies for running GPU-heavy computations.
(Posted by bkoep | Sat, 07/31/2021 - 22:39 | 13 comments)
Experiment results for MERS-CoV binders
We have lab results for Foldit MERS-CoV binder designs! Several months ago, we challenged Foldit players to design proteins that could bind to the spike protein of the MERS coronavirus and block infection (this is similar to our previous challenge to bind the COVID-19 viral spike). After the puzzles ended, we used yeast-display FACS experiments to test the most promising Foldit designs and see if they stick to the MERS spike protein.
Long story short, this experiment did not reveal any successful binders. Read on for details about these designs and the lab experiments we used to test them, including some new ideas we used to boost our chances of success. Evidence suggests that this particular MERS-CoV spike protein is an especially difficult binding target. And this latest experiment highlights the challenge of the protein binder design problem.
Starting in October 2020, we ran 7 rounds of MERS-CoV Binder Design puzzles. Prior to these puzzles, we had challenges to design binders for the SARS-CoV-2 spike and the IL6 receptor. But these MERS-CoV puzzles were some of the first Foldit puzzles to award bonuses for binder metrics, like SASA and Shape Complementarity.
Since then, we’ve replaced SASA and Shape Complementarity with the new Contact Surface Objective, which is faster to run and seems to be a better predictor of binder success. We were only able to run one MERS-CoV puzzle with the Contact Surface Objective, but the results from that puzzle looked especially good. (In fact, we ended up testing more designs from that puzzle than any other puzzle in the series!)
After the round 7 puzzle closed, we ran some additional analysis on all of Foldit players’ solutions to select the most promising designs. Our selection criteria included binder metrics like DDG, Contact Surface, and BUNS. We also ran some other calculations, like secondary structure prediction, that indicate whether a design is likely to fold correctly.
In the end, we selected 59 solutions that we believed could bind to the MERS-CoV spike protein, and queued the designs for testing:
2010667_c0023 Bletchley Park,infjamc
2010727_c0156 Bruno Kestemont
2010727_c0545 Bruno Kestemont
2010727_c0840 Mike Lewis,Enzyme
2010816_c0034 Bruno Kestemont
Boosting Foldit binder designs
If you read our previous blog post about design throughput, you might recall that the success rate for binder design experiments is around 0.1%. We think the main source of failure is from designs not folding up exactly as designed. Even a tiny inaccuracy in folding can be ruinous for binding, for example if it creates a clash or an unsatisfied polar atom.
Researchers are working hard to improve our ability to select good binders before testing, so we can increase this success rate. But right now even the best-looking design only has a 1 in 1000 chance of correctly binding the target. This also means that we’d like to be testing at least 1000 designs in each experiment. So, even though Foldit players created 59 excellent binder designs, it is still unlikely that our experiment will reveal a successful binder from a batch of this size.
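To see why batch size matters so much, here is a rough back-of-the-envelope sketch. It assumes every design is an independent trial with the same 0.1% chance of binding, which is a simplification; the function name is just for illustration.

```python
def p_at_least_one_hit(n_designs, hit_rate=0.001):
    """Chance that a batch of n_designs contains at least one binder.

    Assumes each design succeeds independently with probability hit_rate
    (a simplifying assumption, not a measured property of the designs).
    """
    return 1.0 - (1.0 - hit_rate) ** n_designs


for n in (59, 1000):
    print(f"{n} designs: {p_at_least_one_hit(n):.1%} chance of at least one hit")
```

Under these assumptions, a batch of 59 designs has only about a 6% chance of containing a binder, while a batch of 1000 designs has roughly a 63% chance.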
In order to boost our design numbers and make the most of Foldit players’ work, we used a new grafting technique to recombine Foldit binder designs with automated design scaffolds.
Using high-throughput folding experiments, scientists at the Institute for Protein Design (IPD) have accumulated a database with millions of automatically-designed proteins that seem to be well-folded in the lab. Even though these proteins don’t do anything (like bind a target), they are good starting points for further design, and serve as useful scaffolds for modification. For many protein design projects at the IPD, scientists prefer to start from one of these well-behaved protein scaffolds rather than try to design a new fold from scratch.
From each of the 59 parent Foldit designs, we extracted the portion that makes the most binding interactions with the target. Then we looked to see if we could computationally graft that portion onto the scaffolds in our database. This method is finicky, and the graft has to match the scaffold backbone very closely for it to have any chance of working. Even though there are millions of proteins in the scaffold database, some Foldit designs cannot be matched to any scaffold.
This technique lets us recycle Foldit-designed interfaces into many unique designs with different folds and different sequences. By recombining the Foldit designs with the scaffolds, we were able to multiply our 59 parent designs into 873 grafted designs.
For good measure, we also redesigned each of the 59 parent designs using the IPD’s latest machine-learning algorithm. Typically, this does not change the parent design drastically, but it still provides a little bit more sequence diversity for the experiment. That brought us to a total of 989 designs to test for binding against the MERS-CoV spike protein.
Below is a preview of the data. You can download the data for all 989 designs here.
pdb_id          counts1  counts2  counts3  counts4  counts5  counts6  ddg      contact_surface  BUNS
2010629_c0001   37       0        0        0        0        0        -44.673  401.919          3
2010629_c0107   4        0        0        0        0        0        -56.559  567.339          7
2010629_c0138   0        0        0        0        0        0        -41.261  422.217          6
2010629_c0143   34       0        0        0        0        0        -42.075  430.841          6
2010629_c0407   2        0        0        0        0        0        -44.809  424.300          7
2010629_c0998   1        0        0        0        0        0        -39.433  356.197          4
2010629_c1072   0        0        0        0        0        0        -36.672  440.411          5
2010629_c1101   0        0        0        0        0        0        -37.413  381.733          7
2010667_c0002   4        0        0        0        0        0        -43.552  481.502          3
...
- Enrichment at 1000 nM target
- Enrichment at 1000 nM target
- Binding at 1000 nM target
- Binding at 200 nM target
- Binding at 40 nM target
For details about the binding experiment and how to interpret these numbers, see previous blog posts here and here. To recap: yeast cells display our designs on their surface, and we do successive rounds of sorting to collect yeast cells that appear to stick to the target. After each round of sorting, we use DNA sequencing to track which designs were collected. A high number for a design indicates that we collected many yeast cells displaying that design, meaning the design appears to bind the target.
Generally speaking, if a design successfully binds the target, we expect to see steady high numbers for that design across all six rounds of the experiment. An unsuccessful design will have decreasing numbers and eventually drop out of the sorting rounds.
Unfortunately, none of our 989 designs appeared to bind to the MERS-CoV spike protein.
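This way of reading the counts can be sketched in Python using a few rows from the preview table above. The `persists` helper and its `min_count` threshold are illustrative assumptions, not the analysis actually used for this experiment.

```python
# Sequencing counts for rounds 1-6, copied from the data preview above.
preview = {
    "2010629_c0001": [37, 0, 0, 0, 0, 0],
    "2010629_c0107": [4, 0, 0, 0, 0, 0],
    "2010629_c0138": [0, 0, 0, 0, 0, 0],
    "2010629_c0143": [34, 0, 0, 0, 0, 0],
    "2010667_c0002": [4, 0, 0, 0, 0, 0],
}


def persists(counts, min_count=1):
    """True if the design was collected in every sorting round.

    min_count is a hypothetical threshold; a real analysis would likely
    demand much higher counts and compare against controls.
    """
    return all(c >= min_count for c in counts)


candidates = [pdb_id for pdb_id, counts in preview.items() if persists(counts)]
print(candidates)  # prints [] because every previewed design drops out after round 1
```

A successful binder would show nonzero (ideally steady, high) counts in all six columns; none of the previewed designs do.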
A difficult target
In parallel with the Foldit designs, IPD scientists also tested about 30,000 designs that were created with an automated design method. From those 30,000 designs, only 11 showed any binding--and only weak binding at that. That translates to a success rate considerably lower than the usual 0.1%, and hints that the MERS-CoV spike protein is an especially difficult target.
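The arithmetic behind that comparison is short enough to write out. The last step, applying the automated-design hit rate to our 989 designs, is purely an illustrative assumption; nothing here measures the true hit rate of Foldit designs.

```python
ipd_hits = 11
ipd_tested = 30_000
usual_rate = 0.001  # the "usual" 0.1% success rate for binder design

ipd_rate = ipd_hits / ipd_tested
print(f"IPD hit rate against MERS-CoV spike: {ipd_rate:.3%}")
print(f"Usual rate is about {usual_rate / ipd_rate:.1f}x higher")

# Illustrative assumption: if our 989 designs had the same per-design hit
# rate as the automated designs, how many binders would we expect to see?
expected_hits = 989 * ipd_rate
print(f"Expected hits among 989 designs: {expected_hits:.2f}")
```

At that rate, the expected number of binders in a 989-design batch is well under one, so finding no binders is not surprising for this target.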
The 11 IPD binders all have especially high Contact Surface values (around 500 or greater). This was a bit surprising, since previous data had suggested a Contact Surface value of 400 can be sufficient for good binding. This new data will help us improve our binder design puzzles in Foldit, and in the future we’ll be challenging Foldit players to strive for binder designs with even stronger binder metrics!
(Posted by bkoep | Sun, 05/09/2021 - 01:06 | 0 comments)
Competition results for influenza HA binder design
Friday, March 26 was the last day of our Influenza HA binder design competition! After Puzzle 1968 closed, we collected all of the solutions that were shared with scientists and tallied the valid submissions from each player.
The final rankings
LociOiling - 43 designs
CharlieFortsConscience - 32 designs
ucad - 21 designs
Dudit - 20 designs
spvincent - 10 designs
Bruno Kestemont - 10 designs
nspc - 8 designs
BootsMcGraw - 7 designs
silent gene - 7 designs
ichwilldiesennamen - 6 designs
akaaka - 5 designs
Enzyme - 5 designs
Galaxie - 5 designs
robgee - 3 designs
dcrwheeler - 3 designs
zippyc137 - 3 designs
Anfinsen_slept_here - 2 designs
OWM3 - 2 designs
irk-ele - 2 designs
NinjaGreg - 1 design
georg137 - 1 design
martinzblavy - 1 design
Jpilkington - 1 design
grogar7 - 1 design
alcor29 - 1 design
stomjoh - 1 design
Blipperman - 1 design
Norrjane - 1 design
phi16 - 1 design
infjamc - 1 design
sgeldhof - 1 design
blazegeek - 1 design
Congratulations to LociOiling, who submitted an astounding 43 designed binders for influenza HA!
What did we learn from this competition?
To recap, the aim of this competition was to trial an experimental reward system that encourages players to create the greatest number of quality designs, rather than focus on creating the single highest-scoring design (as in normal Foldit puzzles).
We think this could be a way to make Foldit more effective for protein design research problems, because Foldit is currently limited by design throughput (not by the quality of top-scoring designs). Optimizing for the highest Foldit score works well for protein prediction problems, but the problem of protein design is not so straightforward; a higher-scoring design is not always better. In addition, there is a secondary concern that competitive players tend to optimize solutions so tenaciously that late-game refinement exceeds the limits of our score function.
The competition puzzle was set up to mirror the previous Puzzle 1962: Influenza HA Binder Design: Round 3. Both puzzles used the same score function and Objectives. The only difference between the two puzzles was a scoring offset of 7,500 points (so a 10,000 point competition solution is equivalent to a 17,500 point solution in Puzzle 1962), and the competition puzzle ran for two weeks instead of just one. Using Puzzle 1962 as a control, we can look at the competition results to answer the two big questions about our experimental reward system:
1. Does the competition reward system actually increase throughput?
2. Are competition submissions still high-quality solutions?
Let’s start with question #2.
Are competition submissions still high-quality solutions?
Yes, competition designs appear just as promising as designs from regular puzzles.
This was largely enforced by rule #1 of the competition, which set a threshold of at least 10,000 points for all valid submissions. Foldit scientists chose this threshold based on the results of the previous Puzzle 1962. It seemed 10,000 points could be achieved only if you were able to satisfy most of the Objectives and also attain a reasonable base score.
Note that 10,000 points is still a very high bar for this puzzle, and most of the soloists in Puzzle 1968 were unable to reach this score. All of the players to reach this level have been playing Foldit for at least 6 months, and many of them are experienced veterans. (Bravo to akaaka, who joined Foldit in September 2020--the “youngest” Foldit player to submit a valid competition solution!)
We should also clarify that many solutions below the 10,000 point threshold are still scientifically valuable and will be analyzed by Foldit scientists as possible candidates for lab testing. The 10,000 point threshold does not represent a cutoff for “scientifically useful” solutions. Rather, past this threshold we think further optimization is not very helpful, and a player could contribute more to research by working on another solution.
So, we know that all of the valid submissions scored at least 10,000 points, which should correspond to promising designs. But let's spot check a couple of values to be certain they are reasonable...
Among valid solutions, the worst DDG value was -32.4 kcal/mol, and the worst Contact Surface value was 336. While these values do fall short of their targets (DDG < -40; Contact Surface > 400), these are still promising numbers that could indicate a successful binder. The majority of submissions met the targets for both of these difficult binder design Objectives.
This gives us confidence that the 10,000 point threshold was stringent enough to ensure that all submissions were high quality designs. Note that Foldit scientists will still run additional analyses on these solutions before selecting designs for lab testing.
Does the competition reward system actually increase throughput?
Yes, players created quality designs at almost triple the rate of a normal puzzle.
After any Foldit puzzle closes, we comb through all the puzzle solutions to pull out distinct designs, using protein sequence and structural alignment to sort out duplicate and unfinished solutions. After the competition puzzle ran for two weeks, we identified 242 distinct solutions with at least 10,000 points (this includes solutions from players who opted out of the competition and played Puzzle 1968 normally). By contrast, in one week our “control” Puzzle 1962 yielded 43 distinct protein designs above the equivalent score threshold. Accounting for the difference in puzzle duration (121 designs per week versus 43), this works out to a 2.8-fold increase in rate.
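For anyone who wants to check the numbers, the throughput comparison is a simple per-week rate calculation (variable names are just for illustration):

```python
competition_designs, competition_weeks = 242, 2  # Puzzle 1968
control_designs, control_weeks = 43, 1           # Puzzle 1962

competition_rate = competition_designs / competition_weeks  # designs per week
control_rate = control_designs / control_weeks              # designs per week

fold_increase = competition_rate / control_rate
print(f"{competition_rate:.0f} vs {control_rate:.0f} designs per week")
print(f"Fold increase: {fold_increase:.1f}x")
```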
This is a good sign! It indicates that Foldit does have the capacity for greater design throughput, and that a tweak to our reward system could make Foldit more effective for research in protein design. However, the experimental system used here may still need some adjustments...
Was the “puzzle reset” rule effective against duplicated work?
Mostly. But there were several instances where a player, after submitting a solution, restarted the puzzle and rebuilt almost the exact same solution from scratch!
The puzzle reset rule was intended to force players to make multiple distinct designs. Without this rule, we were afraid that each player would make only a single 10,000 point solution, and then repeatedly submit it with trivial changes. In effect, this would boost their competition standing without actually making a meaningful scientific contribution.
Nevertheless, there were some cases where a player submitted two valid solutions with almost the exact same sequence and structure, even though they were designed completely independently after a puzzle reset. This strategy circumvents the purpose of the puzzle reset rule. If we want a reward system that accurately reflects the scientific contribution of each player, we will need to make some changes to the system used in this competition.
A successful experiment
Congratulations again to our champion LociOiling and all of the players who participated in the competition!
One thing that is still missing from this analysis is player feedback. We invite all players (participants and observers) to leave a comment below with your thoughts about this competition. Was gameplay significantly different than in normal puzzles? Did you enjoy it more or less? Do you have suggestions that would make this kind of competition more fun, or more productive?
Keep up the great folding, and practice your binder design skills in the latest Puzzle 1973: Tie2 Binder Design: Round 1!
(Posted by bkoep | Sun, 03/28/2021 - 20:59 | 18 comments)
Influenza HA binder design competition
We are announcing a special competition for the newest binder design puzzle! We are challenging players to design as many binders as possible for influenza hemagglutinin (HA).
Unlike puzzle rankings, your competition ranking will NOT be determined by your best score in the puzzle. Instead, the winner of the competition will be the soloist player who submits the greatest number of valid solutions before the puzzle closes on March 26 at 23:00 GMT.
There are two rules for a valid submission:
1. The solution must have a score greater than 10,000.
2. You must reset the puzzle for each submission.
Rule #2 means that each submission must be restarted from scratch, and no work may be shared between submissions. Foldit keeps track of each solution's history, and we will reject multiple submissions that come from a common "intermediate" solution. Loading a saved solution or clicking on the Undo Graph will NOT reset the solution history. You must use the Reset Puzzle button to begin each new submission from scratch.
To participate in the competition, simply submit each 10,000 point solution using the Upload for Scientist button in the Save Menu, and include the word “submission” somewhere in the upload title. For logistical reasons, we will only consider soloist solutions in the special competition. Evolved solutions from two or more players will not count as valid submissions.
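Put together, the submission rules amount to a simple checklist. The sketch below is only a hypothetical model of those rules: the `Upload` record, its field names, and the case-insensitive title match are assumptions for illustration, and this is not how Foldit actually tracks solution history.

```python
from dataclasses import dataclass

SCORE_THRESHOLD = 10_000  # rule #1: score must be greater than 10,000


@dataclass
class Upload:
    """Hypothetical record of one solution shared with scientists."""
    title: str
    score: float
    is_soloist: bool          # evolved solutions do not count
    started_from_reset: bool  # rule #2: puzzle was reset before this design


def is_valid_submission(upload: Upload) -> bool:
    """Apply the competition rules to a single upload."""
    return (
        upload.score > SCORE_THRESHOLD
        and upload.started_from_reset
        and upload.is_soloist
        # Assumption: the word may appear in any letter case.
        and "submission" in upload.title.lower()
    )


example = Upload("HA binder submission 3", 10_250.0, True, True)
print(is_valid_submission(example))
```

An upload that falls short on any one check (score, reset, soloist, or title) would not be counted toward the competition ranking.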
The competition rankings and submissions will be showcased in a special blog post after the competition ends. The winner will be highlighted in the April 2021 Lab Report, where bkoep will take a close look at the designs from the winning player.
Note that Puzzle 1968: Influenza HA Binder Design Competition will also function like a regular puzzle. If you do not want to participate in the special competition, the puzzle will still reward points as usual, based on your best score when the puzzle closes.
The backstory: Protein design throughput
This competition will serve as a kind of experiment for Foldit, as we think about ways to make Foldit more effective for scientific research.
Currently, one of the big problems facing protein design in Foldit is throughput. We simply aren't generating enough designs to test in the lab. For a typical binder design experiment, we can expect a success rate of about 0.1% for binders that satisfy all of our binder metrics. That means we need to test thousands of designs in order to find a hit, and a typical Foldit puzzle only produces a couple hundred designs.
At the same time, we suspect that a lot of late-game optimization in Foldit design puzzles is wasted effort, and this work may not actually improve the final protein design. We’ve noticed that, after initial construction and refinement of a protein design, many players resort to heavy-duty scripts that run for days on end, making tiny changes to squeeze out the last few points and get to the top of the puzzle leaderboards. If that late-game optimization does not lead to higher-quality designs, then we would like to redirect that effort towards new designs.
In the past, we've experimented with the Move Limit Objective as a possible approach to this problem. The Move Limit prevents players from spending time running heavy-duty optimization scripts, because these scripts will quickly burn through the allotted moves. We had hoped this would refocus player efforts toward multiple puzzle attempts.
While this seems to be moderately effective, the Move Limit has some problems. There's no strong incentive to actually restart a puzzle once you hit the Move Limit. It's also difficult to calibrate the actual number of moves that should be allotted, since different players with different play styles will naturally require different numbers of moves to make a good protein design.
A different approach
A more radical, but more direct, approach is to revise the overall reward system in Foldit (at least for protein design problems) to encourage multiple solutions for each puzzle. In this kind of system, the goal of competitive players (make many good designs) would be better aligned with the goal of Foldit scientists (test many good designs). So, instead of awarding points based only on your best score, perhaps we should award points for multiple high-scoring solutions.
This competition will serve as a kind of pilot experiment for such a reward system, where rankings reflect the number of solutions contributed to each research problem. We’ll be looking to see how this system impacts puzzle results, and whether it has any unintended effects on gameplay. (Of course, we also hope this competition will produce lots of binder designs for influenza HA!)
The competition will remain open for two weeks. Players will have until March 26 to create as many 10,000 point solutions as possible. Play Puzzle 1968: Influenza HA Binder Design Competition now!
Edit: See the followup blog post for the final results of this competition!
(Posted by bkoep | Fri, 03/12/2021 - 23:48 | 5 comments)
Two-sided protein interface design
This blog post explains some of the background science behind recent Two-sided Interface Design puzzles, like Puzzle 1963. IPD scientist Ryan Kibler elaborates on the goals of these puzzles, how they might be used, and the special challenges we face when designing proteins that bind each other to form organized protein assemblies.
Protein assemblies in nature
A major theme of recent Foldit puzzles is designing symmetric proteins. Through playing these puzzles, you have no doubt realized that symmetry allows you to build a large protein assembly by designing a single chain to bind with itself. Nature has apparently realized this, too. About 63% of the different kinds of proteins naturally produced by E. coli bacteria exist as symmetric homo-oligomers (“assemblies of the same chain”) [1].
Hetero-oligomers (“assemblies of different chains”) are much rarer but usually serve important biological functions. A great example of this is the LSm family of proteins. In humans and other eukaryotes, seven different chains assemble into a hetero-heptameric (“seven different parts”) LSm ring. These LSm rings have many functions, but they often involve binding to RNA and acting as hand-holds for other proteins to grab and make modifications to the RNA, or carry it from place to place.
Curiously, LSm proteins are found in all types of organisms (eukaryotes, archaea, and bacteria) but they don’t all form hetero-heptameric rings. In archaea they form homo-heptameric (“seven of the same part”) rings, where multiple copies of the same protein bind each other. Since eukaryotes are thought to have evolved from archaea, this suggests an interesting evolutionary tale where the gene encoding the homo-heptameric LSm protein in archaea was duplicated and diversified to the point where each one of the seven proteins prefers to assemble with different partners rather than with identical copies of itself.
Designing protein assemblies at the IPD
Designing hetero-oligomers with more than a few chains is a difficult task because there is a lot that can go wrong when you design so many interfaces simultaneously. Instead, scientists at the IPD have taken inspiration from this evolutionary story of LSm and are taking a different, incremental approach. Rather than designing all the interfaces at once to make a hetero-oligomer in one step, we have broken the problem down into smaller sub-problems. We start with a single homo-oligomer and redesign both sides of the interface many different times, then make sure the interface forms correctly through experimentation, and finally we recombine the interfaces into a single protein. Using this strategy, we have already successfully transformed a homo-trimer into a hetero-trimer.
Figure 1. A schematic diagram shows how we create hetero-oligomers from a homo-oligomer starting structure. We can break the problem down into smaller pieces, and start by creating lots of different homo-oligomers. After we test the different homo-oligomers in the lab, we can recombine their different interfaces to create hetero-oligomers.
The two keys to this strategy are: (1) keeping the protein backbone fixed, except for the parts that make up the interface, so that the different interfaces can be copied and pasted onto the same starting structure; and (2) making the interfaces as diverse as possible in order to prevent unintended off-target assembly between the wrong interfaces.
Key #1 is easily accomplished by freezing the non-interface region to hold the backbone in place. We have been using the same alpha helical backbone for all interfaces, allowing only small tweaks to the starting structure. But key #2 is harder. So far, we have relied on H-bond networks to prevent unintended off-target assembly. This works because if two wrong chains attempt to bind, their non-matching arrangements of polar residues become energetically unfavorable BUNS and prevent binding. Only when the two correct chains come together does the H-bond network form and produce a stable interface with zero BUNS.
Assemblies with increased complexity
On our path to increasingly more complicated hetero-oligomers, we are now trying to make a hetero-tetramer (“four different parts”). The more chains there are in an assembly, the harder it is to make them bind in the correct arrangement, because there are exponentially more chances for off-target assemblies. We believe we will need more than H-bond networks to prevent off-target assemblies.
One very good way of further diversifying the interfaces is to make them physically different by changing the protein backbone. As bkoep mentioned in Lab Report #17, scientists’ designs use mostly alpha helices at the interfaces, because we have a good handle on how to generate alpha helical backbones, and we have a good understanding of how to design sequences for alpha helices. Unfortunately, we’ve observed that interfaces made entirely out of alpha helices are prone to off-target assembly with similar-looking alpha helical interfaces. If we want to get a large number of specific interfaces, it will not be enough to simply make small tweaks to our starting alpha helices.
Two-sided interface design puzzles
This is where Foldit comes in. Foldit players are extremely good at making diverse protein backbones, so we’re challenging you to redesign the interfaces of our homo-tetramers, in a series of Two-sided Interface Design puzzles!
The most recent and most promising starting structure is a homo-tetramer called RC4_20, a circular tandem repeat protein (known internally as a “donut”). This protein was originally designed at the IPD by Phil Bradley, PhD,2 and later modified by Alexis Courbet, PhD, to have a larger interface, making its tetrameric form more stable and also easier to see under an electron microscope.
Figure 2. This homo-tetramer "donut" protein was designed by IPD scientists. We can use it as a starting point and create a hetero-tetramer by redesigning the interface between chains. Highlighted in red are the frozen stubs that were provided in the starting structure of Puzzle 1959.
To work within the constraints of key #1 (making sure that different interfaces can be copied and pasted onto the same starting structure), our Foldit puzzles start with a section that is frozen and a section that you can refold. Keep in mind that this is only a small part of the larger assembly (Figure 2). To convert the homo-oligomer RC4_20 into a hetero-oligomer, we simply swap out the original interface with the different interfaces designed in Foldit.
Initial design results
We’ve been happy to see many players attempting to install beta sheets at the interface. This is exactly the sort of diversity we were looking for! The design below, from an anonymous player, has a beta sheet with the outer edge pointing out to surrounding water and the inner edge placed against the interface. The polar atoms on the inner edge can participate in an H-bond network if they are satisfied by polar residues from chain B. Unfortunately, this design didn’t satisfy all of these backbone polar atoms. In future puzzles, we would love to see players use the edge strand of a beta sheet as part of an H-bond network!
Figure 3. A creative design with a beta sheet at the interface. A variety of backbone shapes will help us create well-behaved assemblies that can avoid off-target binding. The highlighted inner edge strand has polar atoms along the backbone that are buried at the interface. Those polar atoms would be more stable as part of an H-bond network.
Another feature we like to see is tight packing between both starter alpha helices on chain A. While it’s hard to tell from the truncated starting structure we gave you, making good contacts with both of these alpha helices is important to maintain structural rigidity. If your refolded interface only contacts one of these alpha helices, that alpha helix could act like a hinge, letting your interface swing out of the correct position to bind the other half. The solution below from LociOiling has good packing between both alpha helices of the starting stub and an alpha helix backing up the interface. Also, we like the placement of a key tryptophan buried at the interface. It would be really great to see an H-bond network sprout out of that sidechain nitrogen in future designs!
Figure 4. A design with great sidechain packing by LociOiling. On the left, the highlighted alpha helix packs closely against both alpha helices on the chain A starting stub, and also backs up the alpha helices that actually contact chain B. On the right, a TRP sidechain is buried at the interface for tight binding, but the polar N atom is unsatisfied and would like to make an H-bond.
In the most recent Two-sided Interface Design puzzles, we’ve asked Foldit players to install H-bond networks across the interface, in addition to refolding the backbone, to make the interfaces as different as possible and avoid off-target assembly. We’ve also asked for one disulfide bond across the interface, because we know that a disulfide is important for encouraging assembly of this particular RC4_20 donut.
In solutions from Puzzle 1959, the disulfides you’ve designed look really good, but the H-bond networks need a little work. We’ve noticed that the H-bond networks are usually not very big, and often are at the protein surface, near the water that surrounds the protein. Instead, H-bond networks should be buried in the core of the interface where they can be most effective at discouraging off-target assembly. Remember, the polar network residues will form BUNS if they try to bind against a chain with non-matching polar residues. If you want some inspiration for making effective H-bond networks, you can review some of the published H-bond networks designed by IPD scientists.3
Thank you all for your solutions to these puzzles! We’re accumulating a list of designs that we plan to test in the wet lab in the coming months. But there’s still a lot of room, so keep up the good work! Check out the latest Puzzle 1963: Two-sided Interface Design now!
1. Goodsell, D. S. & Olson, A. J. Structural symmetry and protein function. Annu. Rev. Biophys. Biomol. Struct. 29, 105–153 (2000)
2. Correnti, C. E. et al. Engineering and functionalization of large circular tandem repeat protein nanoparticles. Nat. Struct. Mol. Biol. 27, 342–350 (2020)
3. Boyken, S. E. et al. De novo design of protein homo-oligomers with modular hydrogen-bond network-mediated specificity. Science 352, 680–687 (2016)