Competition results for influenza HA binder design

Friday, March 26 was the last day of our Influenza HA binder design competition! After Puzzle 1968 closed, we collected all of the solutions that were shared with scientists and tallied the valid submissions from each player.

The final rankings

LociOiling - 43 designs
CharlieFortsConscience - 32 designs
ucad - 21 designs
Dudit - 20 designs
spvincent - 10 designs
Bruno Kestemont - 10 designs
nspc - 8 designs
BootsMcGraw - 7 designs
silent gene - 7 designs
ichwilldiesennamen - 6 designs
akaaka - 5 designs
Enzyme - 5 designs
Galaxie - 5 designs
robgee - 3 designs
dcrwheeler - 3 designs
zippyc137 - 3 designs
Anfinsen_slept_here - 2 designs
OWM3 - 2 designs
irk-ele - 2 designs
NinjaGreg - 1 design
georg137 - 1 design
martinzblavy - 1 design
Jpilkington - 1 design
grogar7 - 1 design
alcor29 - 1 design
stomjoh - 1 design
Blipperman - 1 design
Norrjane - 1 design
phi16 - 1 design
infjamc - 1 design
sgeldhof - 1 design
blazegeek - 1 design

Congratulations to LociOiling, who submitted an astounding 43 designed binders for influenza HA!

What did we learn from this competition?

To recap, the aim of this competition was to trial an experimental reward system that encourages players to create the greatest number of quality designs, rather than focusing on creating the single highest-scoring design (as in normal Foldit puzzles).

We think this could be a way to make Foldit more effective for protein design research problems, because Foldit is currently limited by design throughput (not by the quality of top-scoring designs). Optimizing for the highest Foldit score works well for protein prediction problems, but the problem of protein design is not so straightforward; a higher-scoring design is not always better. In addition, there is a secondary concern that competitive players tend to optimize solutions so tenaciously that late-game refinement exceeds the limits of our score function.

The competition puzzle was set up to mirror the previous Puzzle 1962: Influenza HA Binder Design: Round 3. Both puzzles used the same score function and Objectives. The only difference between the two puzzles was a scoring offset of 7,500 points (so a 10,000 point competition solution is equivalent to a 17,500 point solution in Puzzle 1962), and the competition puzzle ran for two weeks instead of just one. Using Puzzle 1962 as a control, we can look at the competition results to answer the two big questions about our experimental reward system:

1. Does the competition reward system actually increase throughput?
2. Are competition submissions still high-quality solutions?
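To compare the two puzzles directly, competition scores have to be shifted by the 7,500-point offset. The sketch below shows that conversion. The function names are hypothetical and exist only for illustration; the only fact taken from the puzzles is the offset itself.

```python
# Hypothetical helpers: only the 7,500-point offset comes from the puzzle setup.
COMPETITION_OFFSET = 7_500  # Puzzle 1962 score = Puzzle 1968 score + 7,500


def to_puzzle_1962_scale(competition_score: float) -> float:
    """Map a Puzzle 1968 (competition) score onto the Puzzle 1962 scale."""
    return competition_score + COMPETITION_OFFSET


def to_competition_scale(puzzle_1962_score: float) -> float:
    """Map a Puzzle 1962 score onto the Puzzle 1968 (competition) scale."""
    return puzzle_1962_score - COMPETITION_OFFSET


print(to_puzzle_1962_scale(10_000))  # 17500
```

Because the score function and Objectives were identical, a simple additive shift is enough to line up the two score thresholds.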

Let’s start with question #2.

Are competition submissions still high-quality solutions?

Yes, competition designs appear just as promising as designs from regular puzzles.

This was largely enforced by rule #1 of the competition, which set a threshold of at least 10,000 points for all valid submissions. Foldit scientists chose this threshold based on the results of the previous Puzzle 1962. It appeared that 10,000 points could be reached only by satisfying most of the Objectives while also attaining a reasonable base score.

Note that 10,000 points is still a very high bar for this puzzle, and most of the soloists in Puzzle 1968 were unable to reach this score. All of the players to reach this level have been playing Foldit for at least 6 months, and many of them are experienced veterans. (Bravo to akaaka, who joined Foldit in September 2020 and is the “youngest” Foldit player to submit a valid competition solution!)

We should also clarify that many solutions below the 10,000 point threshold are still scientifically valuable and will be analyzed by Foldit scientists as possible candidates for lab testing. The 10,000 point threshold does not represent a cutoff for “scientifically useful” solutions. Rather, past this threshold we think further optimization is not very helpful, and a player could contribute more to research by working on another solution.

So, we know that all of the valid submissions scored at least 10,000 points, which should correspond to promising designs. But let's spot check a couple of values to be certain they are reasonable...

Among valid solutions, the worst DDG value was -32.4 kcal/mol, and the worst Contact Surface value was 336. While these values do fall short of their targets (DDG < -40; Contact Surface > 400), these are still promising numbers that could indicate a successful binder. The majority of submissions met the targets for both of these difficult binder design Objectives.

This gives us confidence that the 10,000 point threshold was stringent enough to ensure that all submissions were high quality designs. Note that Foldit scientists will still run additional analyses on these solutions before selecting designs for lab testing.

Does the competition reward system actually increase throughput?

Yes, players created quality designs at almost triple the rate of a normal puzzle.

After any Foldit puzzle closes, we comb through all the puzzle solutions to pull out distinct designs, using protein sequence and structural alignment to sort out duplicate and unfinished solutions. After the competition puzzle ran for two weeks, we identified 242 distinct solutions with at least 10,000 points (this includes solutions from players who opted out of the competition and played Puzzle 1968 normally). By contrast, in one week our “control” Puzzle 1962 yielded 43 distinct protein designs above the equivalent score threshold. Accounting for the difference in puzzle duration (121 designs per week versus 43), this works out to a rate increase of about 2.8x.
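The 2.8x figure is just a per-week comparison. A minimal sketch of that arithmetic, using only the counts and durations reported above (the helper name is illustrative):

```python
def weekly_rate(distinct_designs: int, weeks: float) -> float:
    """Distinct designs above the score threshold, per week of puzzle time."""
    return distinct_designs / weeks


competition = weekly_rate(242, weeks=2)  # Puzzle 1968: 121 designs per week
control = weekly_rate(43, weeks=1)       # Puzzle 1962: 43 designs per week
ratio = competition / control

print(f"{ratio:.1f}x")  # 2.8x
```

Normalizing by duration matters here: comparing the raw counts (242 vs 43) would overstate the gain, because the competition puzzle stayed open twice as long.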

This is a good sign! It indicates that Foldit does have the capacity for greater design throughput, and that a tweak to our reward system could make Foldit more effective for research in protein design. However, the experimental system used here may still need some adjustments...

Was the “puzzle reset” rule effective against duplicated work?

Mostly. But there were several instances where a player, after submitting a solution, restarted the puzzle and rebuilt almost the exact same solution from scratch!

The puzzle reset rule was intended to force players to make multiple distinct designs. Without this rule, we were afraid that each player would make only a single 10,000 point solution, and then repeatedly submit it with trivial changes. In effect, this would boost their competition standing without actually making a meaningful scientific contribution.

Nevertheless, there were some cases where a player submitted two valid solutions with almost the exact same sequence and structure, even though they were designed completely independently after a puzzle reset. This strategy circumvents the purpose of the puzzle reset rule. If we want a reward system that accurately reflects the scientific contribution of each player, we will need to make some changes to the system used in this competition.
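One way to flag this kind of near-duplicate is a sequence-identity check between a player's submissions. The sketch below is a simplified illustration, not the pipeline Foldit scientists actually use: it assumes equal-length designs (no alignment gaps), ignores structure entirely, and uses a hypothetical 90% identity threshold.

```python
from itertools import combinations


def sequence_identity(a: str, b: str) -> float:
    """Fraction of positions with the same residue.

    Assumes both designs have the same length (no gaps); a real pipeline
    would align the sequences first and also compare structures.
    """
    if len(a) != len(b):
        raise ValueError("sequences must have equal length for this sketch")
    matches = sum(x == y for x, y in zip(a, b))
    return matches / len(a)


def near_duplicates(designs: dict, threshold: float = 0.9) -> list:
    """Return pairs of design IDs whose identity meets the (hypothetical) threshold."""
    pairs = []
    for (id_a, seq_a), (id_b, seq_b) in combinations(designs.items(), 2):
        if sequence_identity(seq_a, seq_b) >= threshold:
            pairs.append((id_a, id_b))
    return pairs


designs = {
    "submission_1": "MEELLKRAEELIKRG",
    "submission_2": "MEELLKRAEELIKRA",  # rebuilt after a reset, one substitution
    "submission_3": "MSPEKIAELVRKALG",
}
print(near_duplicates(designs))  # [('submission_1', 'submission_2')]
```

A check like this catches duplicates regardless of how they were built, which is exactly the gap that the history-based reset rule leaves open.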

A successful experiment

Congratulations again to our champion LociOiling and all of the players who participated in the competition!

One thing that is still missing from this analysis is player feedback. We invite all players (participants and observers) to leave a comment below with your thoughts about this competition. Was gameplay significantly different than in normal puzzles? Did you enjoy it more or less? Do you have suggestions that would make this kind of competition more fun, or more productive?

Keep up the great folding, and practice your binder design skills in the latest Puzzle 1973: Tie2 Binder Design: Round 1!

(Posted by bkoep | Sun, 03/28/2021 - 20:59)

Influenza HA binder design competition

We are announcing a special competition for the newest binder design puzzle! We are challenging players to design as many binders as possible for influenza hemagglutinin (HA).

Unlike regular puzzle rankings, your competition ranking will NOT be determined by your best score in the puzzle. Instead, the winner of the competition will be the soloist player who submits the greatest number of valid solutions before the puzzle closes March 26 at 23:00 GMT.

There are two rules for a valid submission:
1. The solution must have a score greater than 10,000.
2. You must reset the puzzle for each submission.

Rule #2 means that each submission must be restarted from scratch, and no work may be shared between submissions. Foldit keeps track of each solution's history, and we will reject multiple submissions that come from a common "intermediate" solution. Loading a saved solution or clicking on the Undo Graph will NOT reset the solution history. You must use the Reset Puzzle button to begin each new submission from scratch.

To participate in the competition, simply submit each 10,000 point solution using the Upload for Scientist button in the Save Menu, and include the word “submission” somewhere in the upload title. For logistical reasons, we will only consider soloist solutions in the special competition. Evolved solutions from two or more players will not count as valid submissions.
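Taken together, these rules amount to a simple filter over each player's uploads. The sketch below is a hypothetical model, not how Foldit actually stores solutions: the `Upload` record, its `history` field (a set of IDs for intermediate states), and the first-come-first-accepted policy for shared history are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Upload:
    """Hypothetical record of one "Upload for Scientist" save."""
    title: str
    score: float
    players: int  # 1 for a soloist solution, 2+ for an evolved solution
    history: frozenset = field(default_factory=frozenset)  # IDs of intermediate states


def count_valid_submissions(uploads: list) -> int:
    """Count uploads that satisfy the competition rules (sketch)."""
    seen_states = set()
    valid = 0
    for up in uploads:
        # Case-insensitive match is an assumption of this sketch.
        if "submission" not in up.title.lower():
            continue  # not entered in the competition
        if up.players != 1:
            continue  # evolved solutions do not count
        if up.score <= 10_000:
            continue  # rule #1: score must be greater than 10,000
        if up.history & seen_states:
            continue  # rule #2: shares an intermediate with an earlier submission
        seen_states |= up.history
        valid += 1
    return valid


uploads = [
    Upload("submission A", 10_450, 1, frozenset({"a0", "a1"})),
    Upload("submission B", 10_200, 1, frozenset({"a1", "b1"})),  # reuses a1: rejected
    Upload("submission C", 9_800, 1, frozenset({"c0"})),         # below threshold
    Upload("submission D", 11_000, 1, frozenset({"d0", "d1"})),
]
print(count_valid_submissions(uploads))  # 2
```

Tracking shared intermediate states, rather than comparing final scores, mirrors why loading a saved solution does not count as a fresh start: the loaded state is already in the history.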

The competition rankings and submissions will be showcased in a special blog post after the competition ends. The winner will be highlighted in the April 2021 Lab Report, where bkoep will take a close look at the designs from the winning player.

Note that Puzzle 1968: Influenza HA Binder Design Competition will also function like a regular puzzle. If you do not want to participate in the special competition, the puzzle will still reward points as usual, based on your best score when the puzzle closes.

The backstory: Protein design throughput

This competition will serve as a kind of experiment for Foldit, as we think about ways to make Foldit more effective for scientific research.

Currently, one of the big problems facing protein design in Foldit is throughput. We simply aren't generating enough designs to test in the lab. For a typical binder design experiment, we can expect a success rate of about 0.1% for binders that satisfy all of our binder metrics. That means we need to test thousands of designs in order to find a hit, and a typical Foldit puzzle only produces a couple hundred designs.
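To see why a 0.1% success rate calls for thousands of designs, it helps to estimate the chance of getting at least one hit. The sketch below makes a simplifying assumption that is not from the post: every tested design succeeds independently with the same 0.1% probability.

```python
def expected_hits(n_designs: int, hit_rate: float = 0.001) -> float:
    """Average number of successful binders among n tested designs."""
    return n_designs * hit_rate


def prob_at_least_one_hit(n_designs: int, hit_rate: float = 0.001) -> float:
    """Chance of one or more hits, assuming independent designs (simplification)."""
    return 1.0 - (1.0 - hit_rate) ** n_designs


for n in (200, 1_000, 3_000):
    # Roughly 18%, 63%, and 95% chance of at least one hit, respectively.
    print(n, expected_hits(n), round(prob_at_least_one_hit(n), 2))
```

Under this idealized model, a couple hundred designs gives less than a one-in-five chance of finding any binder, while a few thousand designs makes at least one hit very likely.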

At the same time, we suspect that a lot of late-game optimization in Foldit design puzzles is wasted effort, and this work may not actually improve the final protein design. We’ve noticed that, after initial construction and refinement of a protein design, many players resort to heavy-duty scripts that run for days on end, making tiny changes to squeeze out the last few points and get to the top of the puzzle leaderboards. If that late-game optimization does not lead to higher-quality designs, then we would like to redirect that effort towards new designs.

Move limits

In the past, we've experimented with the Move Limit Objective as a possible approach to this problem. The Move Limit prevents players from spending time running heavy-duty optimization scripts, because these scripts will quickly burn through the allotted moves. We had hoped this would refocus player efforts toward multiple puzzle attempts.

While this seems to be moderately effective, the Move Limit has some problems. There's no strong incentive to actually restart a puzzle once you hit the Move Limit. It's also difficult to calibrate the actual number of moves that should be allotted, since different players with different play styles will naturally require different numbers of moves to make a good protein design.

A different approach

A more radical, but more direct, approach is to revise the overall reward system in Foldit (at least for protein design problems) to encourage multiple solutions for each puzzle. In this kind of system, the goal of competitive players (make many good designs) would be better aligned with the goal of Foldit scientists (test many good designs). So, instead of awarding points based only on your best score, perhaps we should award points for multiple high-scoring solutions.

This competition will serve as a kind of pilot experiment for such a reward system, where rankings reflect the number of solutions contributed to each research problem. We’ll be looking to see how this system impacts puzzle results, and whether it has any unintended effects on gameplay. (Of course, we also hope this competition will produce lots of binder designs for influenza HA!)

The competition will remain open for two weeks. Players will have until March 26 to create as many 10,000 point solutions as possible. Play Puzzle 1968: Influenza HA Binder Design Competition now!

Edit: See the follow-up blog post for the final results of this competition!

(Posted by bkoep | Fri, 03/12/2021 - 23:48)

Two-sided protein interface design

This blog post explains some of the background science behind recent Two-sided Interface Design puzzles, like Puzzle 1963. IPD scientist Ryan Kibler elaborates on the goals of these puzzles, how they might be used, and the special challenges we face when designing proteins that bind each other to form organized protein assemblies.

Protein assemblies in nature

A major theme of recent Foldit puzzles is designing symmetric proteins. Through playing these puzzles, you have no doubt realized that symmetry allows you to build a large protein assembly by designing a single chain to bind with itself. Nature has apparently realized this, too. About 63% of the different kinds of proteins naturally produced by E. coli bacteria exist as symmetric homo-oligomers (“assemblies of the same chain”).[1]

Hetero-oligomers (“assemblies of different chains”) are much rarer but usually serve important biological functions. A great example of this is the LSm family of proteins. In humans and other eukaryotes, seven different chains assemble into a hetero-heptameric (“seven different parts”) LSm ring. These LSm rings have many functions, but they often involve binding to RNA and acting as hand-holds for other proteins to grab and make modifications to the RNA, or carry it from place to place.

Curiously, LSm proteins are found in all types of organisms (eukaryotes, archaea, and bacteria) but they don’t all form hetero-heptameric rings. In archaea they form homo-heptameric (“seven of the same part”) rings, where multiple copies of the same protein bind each other. Since eukaryotes are thought to have evolved from archaea, this suggests an interesting evolutionary tale where the gene encoding the homo-heptameric LSm protein in archaea was duplicated and diversified to the point where each one of the seven proteins prefers to assemble with different partners rather than with identical copies of itself.

Designing protein assemblies at the IPD

Designing hetero-oligomers with more than a few chains is a difficult task because there is a lot that can go wrong when you design so many interfaces simultaneously. Instead, scientists at the IPD have taken inspiration from this evolutionary story of LSm and are taking a different, incremental approach. Rather than designing all the interfaces at once to make a hetero-oligomer in one step, we have broken the problem down into smaller sub-problems. We start with a single homo-oligomer and redesign both sides of the interface many different times, then make sure the interface forms correctly through experimentation, and finally we recombine the interfaces into a single protein. Using this strategy, we have already successfully transformed a homo-trimer into a hetero-trimer.

Figure 1. A schematic diagram shows how we create hetero-oligomers from a homo-oligomer starting structure. We can break the problem down into smaller pieces, and start by creating lots of different homo-oligomers. After we test the different homo-oligomers in the lab, we can recombine their different interfaces to create hetero-oligomers.

The two keys to this strategy are: (1) keeping the protein backbone fixed, except for the parts that make up the interface, so that the different interfaces can be copied and pasted onto the same starting structure; and (2) making the interfaces as diverse as possible in order to prevent unintended off-target assembly between the wrong interfaces.

Key #1 is easily accomplished by freezing the non-interface region to hold the backbone in place. We have been using the same alpha helical backbone for all interfaces, allowing only small tweaks to the starting structure. But key #2 is harder. So far, we have relied on H-bond networks to prevent unintended off-target assembly. This works because if two wrong chains attempt to bind, their non-matching arrangements of polar residues become energetically unfavorable BUNS and prevent binding. Only when the two correct chains come together does the H-bond network form and produce a stable interface with zero BUNS.
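The specificity argument can be illustrated with a deliberately simplified toy model. Nothing here is a real energy function: each interface face is written as a string of positions, where `D` is a hypothetical H-bond donor, `A` an acceptor, and `-` a nonpolar residue, and a polar position counts as satisfied only if the facing position on the partner chain is complementary.

```python
# Toy model only: real BUNS depend on 3D geometry, not aligned string positions.
COMPLEMENT = {"D": "A", "A": "D"}  # a donor is satisfied by an acceptor, and vice versa


def toy_buns(face_a: str, face_b: str) -> int:
    """Count polar positions on either face left without a complementary partner."""
    if len(face_a) != len(face_b):
        raise ValueError("faces must have the same number of positions")
    unsatisfied = 0
    for a, b in zip(face_a, face_b):
        for polar, partner in ((a, b), (b, a)):
            if polar in COMPLEMENT and COMPLEMENT[polar] != partner:
                unsatisfied += 1
    return unsatisfied


chain_a = "-D--A-"
chain_b = "-A--D-"     # designed partner: every polar position is matched
off_target = "--AD--"  # a different network: polar positions do not line up

print(toy_buns(chain_a, chain_b))     # 0
print(toy_buns(chain_a, off_target))  # 4
```

In this picture, the designed pair buries no unsatisfied polar groups, while the mismatched pairing leaves four, which is the kind of penalty that is meant to discourage off-target binding.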

Assemblies with increased complexity

On our path to increasingly complicated hetero-oligomers, we are now trying to make a hetero-tetramer (“four different parts”). The more chains there are in an assembly, the harder it is to make them bind in the correct arrangement, because there are exponentially more chances for off-target assemblies. We believe we will need more than H-bond networks to prevent off-target assemblies.

One very good way of further diversifying the interfaces is to make them physically different by changing the protein backbone. As bkoep mentioned in Lab Report #17, scientists’ designs use mostly alpha helices at the interfaces, because we have a good handle on how to generate alpha helical backbones, and we have a good understanding of how to design sequences for alpha helices. Unfortunately, we’ve observed that interfaces which are made entirely out of alpha helices are prone to off-target assembly with similar-looking alpha helical interfaces. If we want to get a large number of specific interfaces, it will not be enough to simply make small tweaks to our starting alpha helices.

Two-sided interface design puzzles

This is where Foldit comes in. Foldit players are extremely good at making diverse protein backbones, so we’re challenging you to redesign the interfaces of our homo-tetramers, in a series of Two-sided Interface Design puzzles!

The most recent and most promising starting structure is a homo-tetramer called RC4_20, a circular tandem repeat protein (known internally as a “donut”). This protein was originally designed at the IPD by Phil Bradley, PhD,[2] and later modified by Alexis Courbet, PhD, to have a larger interface, making its tetrameric form more stable and also easier to see under an electron microscope.

Figure 2. This homo-tetramer "donut" protein was designed by IPD scientists. We can use it as a starting point and create a hetero-tetramer by redesigning the interface between chains. Highlighted in red are the frozen stubs that were provided in the starting structure of Puzzle 1959.

To work within the constraints of key #1 (making sure that different interfaces can be copied and pasted onto the same starting structure), our Foldit puzzles start with a section that is frozen and a section that you can refold. Keep in mind that this is only a small part of the larger assembly (Figure 2). To convert the homo-oligomer RC4_20 into a hetero-oligomer, we simply swap out the original interface with the different interfaces designed in Foldit.

Initial design results

We’ve been happy to see many players attempting to install beta sheets at the interface. This is exactly the sort of diversity we were looking for! The design below, from an anonymous player, has a beta sheet with the outer edge pointing out to surrounding water and the inner edge placed against the interface. The polar atoms on the inner edge can participate in an H-bond network if they are satisfied by polar residues from chain B. Unfortunately, this design didn’t satisfy all of these backbone polar atoms. In future puzzles, we would love to see players use the edge strand of a beta sheet as part of an H-bond network!

Figure 3. A creative design with a beta sheet at the interface. A variety of backbone shapes will help us create well-behaved assemblies that can avoid off-target binding. The highlighted inner edge strand has polar atoms along the backbone that are buried at the interface. Those polar atoms would be more stable as part of an H-bond network.

Another feature we like to see is tight packing between both starter alpha helices on chain A. While it’s hard to tell from the truncated starting structure we gave you, making good contacts with both of these alpha helices is important to maintain structural rigidity. If your refolded interface only contacts one of these alpha helices, that alpha helix could act like a hinge and cause your interface to swing around and not be in the correct position to bind the other half. The solution below from LociOiling has good packing between both alpha helices of the starting stub and an alpha helix backing up the interface. Also, we like the placement of a key tryptophan buried at the interface. It would be really great to see an H-bond network sprout out of that sidechain nitrogen in future designs!

Figure 4. A design with great sidechain packing by LociOiling. On the left, the highlighted alpha helix packs closely against both alpha helices on the chain A starting stub, and also backs up the alpha helices that actually contact chain B. On the right, a TRP sidechain is buried at the interface for tight binding, but its polar N atom is unsatisfied and would like to make an H-bond.

Future work

In the most recent Two-sided Interface Design puzzles, we’ve asked Foldit players to install H-bond networks across the interface, in addition to refolding the backbone, to make the interfaces as different as possible and avoid off-target assembly. We’ve also asked for one disulfide bond across the interface, and that’s because we know that a disulfide is important for encouraging assembly of this particular RC4_20 donut.

In solutions from Puzzle 1959, the disulfides you’ve designed look really good, but the H-bond networks need a little work. We’ve noticed that the H-bond networks are usually not very big, and often are at the protein surface, near the water that surrounds the protein. Instead, H-bond networks should be buried in the core of the interface where they can be most effective at discouraging off-target assembly. Remember, the polar network residues will form BUNS if they try to bind against a chain with non-matching polar residues. If you want some inspiration for making effective H-bond networks, you can review some of the published H-bond networks designed by IPD scientists.[3]

Thank you all for your solutions to these puzzles! We’re accumulating a list of designs that we plan to test in the wet lab in the coming months. But there’s still a lot of room, so keep up the good work! Check out the latest Puzzle 1963: Two-sided Interface Design now!

1. Goodsell, D. S. & Olson, A. J. Structural symmetry and protein function. Annu. Rev. Biophys. Biomol. Struct. 29, 105–153 (2000)
2. Correnti, C. E. et al. Engineering and functionalization of large circular tandem repeat protein nanoparticles. Nat. Struct. Mol. Biol. 27, 342–350 (2020)
3. Boyken, S. E. et al. De novo design of protein homo-oligomers with modular hydrogen-bond network-mediated specificity. Science 352, 680–687 (2016)

(Posted by rdkibler | Tue, 03/02/2021 - 22:09)

2020 Snowflake Challenge Results

Hey folders!

Dev Josh here with the results of our 2020 Snowflake Challenge! A big thanks to everyone who participated, I hope you had as much fun making these snowflakes as I had admiring them!

This challenge was a sequel to our 2014 Snowflake Challenge, and this year y'all crushed it again! Like last time, I tried to have each dev pick a favorite, but some of these snowflakes were so impressive, they got picked multiple times. Let's start with the runners-up:

These are already so great, thanks to everyone who submitted a snowflake!

Next up, our Bronze winners:

Our community manager agcohn821 likes this snowflake by Evica:

Small molecule developer Sciren likes this one by phi16:

Our newest dev milkshake calls this one by WBarme1234 "Snowpizza":

Small molecule scientist rmoretti on the other hand prefers something that could potentially fold up in real life. He calls this piece by infjamc "Winter at 77 K":

Coming up next, our Silver winners, who received multiple mentions from the team:

The self-described "Summer Snowflake" by AlkiP0Ps:

And this masterpiece by cjddig, which beta_helix calls "Snowflake Under a Microscope":

Last but not least, our Gold winners!

This marvel by Vincero had my vote among others. Vincero provided a title and description for their own work:

Finally, for the very top favorite of the team and the winner of this year's Snowflake Challenge, we congratulate fiendish_ghoul for this gorgeous snowflake.

Thanks again to everyone who participated! This was a ton of fun and we hope to do it again soon!

Until next time, happy folding!

(Posted by joshmiller | Tue, 01/19/2021 - 18:22)

Coronavirus Designable Linker Puzzles

This week we are introducing a brand new Designable Linker Puzzle! This kind of puzzle involves two or more protein domains that are fixed in space, and players are challenged to link them with a rigid, well-folded linker that preserves the orientation of the starting domains.

Linking Coronavirus Spike Binders

Puzzle 1912b is the first puzzle of this type. This example is particularly special, as we are asking you to link two of the best-known SARS-CoV-2 spike binders. These computationally designed binders came from scientists at the Institute for Protein Design, and currently exhibit some of the best binding affinities of any known SARS-CoV-2 spike binder. The original binders are currently being developed for possible COVID-19 tests or therapeutics.

It took a large number of supercomputing hours to generate these binders, and less than 0.1% of those that were tested showed any binding affinity for the target. This goes to show just how hard binder design is! You can read more about these binders in this previous blog post. With these binders now in hand, we want to see how much we can improve them.

A model of how two designed proteins can bind the SARS-CoV-2 spike. The spike chains are shown in green, magenta and cyan. Colored in salmon are two designed binders, LCB1 and LCB3. The binders have been truncated and augmented with helices to bring their termini closer together.

The starting structure of Puzzle 1912 has a linker connecting two frozen α-helix bundles. These α-helix bundles are truncated parts of two spike binders designed by scientists at the IPD, LCB1 and LCB3. The puzzle also includes small sections of the target spike protein, although we don't need to make any more binding interactions with the spike. The goal of the puzzle is to design a rigid linker that keeps the binders in the starting orientation.

The binding affinity measures the tightness of binding between two chains, and is directly related to the change in free energy between the bound and unbound states (also known as DDG, described previously). If we can find a rigid linker that holds the two binders in a fixed orientation, we can roughly double the DDG to significantly increase the binding affinity of the linked binder.
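The relationship between binding free energy and affinity explains why doubling DDG matters so much. Using the standard relation ΔG = RT ln(Kd), with Kd in molar units, doubling ΔG squares Kd, which for sub-molar Kd values means a far tighter binder. The sketch below assumes an illustrative ΔG of -10 kcal/mol for a single binder domain; that value is hypothetical, it is not a Foldit DDG score, and perfect additivity is an idealization.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def kd_from_dg(dg_kcal: float, temp_k: float = 298.15) -> float:
    """Dissociation constant (molar) from binding free energy, via dG = RT ln(Kd)."""
    return math.exp(dg_kcal / (R_KCAL * temp_k))


single = kd_from_dg(-10.0)  # one binder domain (illustrative value, about 47 nM)
linked = kd_from_dg(-20.0)  # idealized rigid linker: DDG doubled

print(f"single: {single:.2e} M, linked: {linked:.2e} M")
```

In this idealized picture, linked Kd equals single Kd squared, so a nanomolar binder would move into the femtomolar range. In practice a linker is unlikely to capture the full doubling, which is why rigidity (rather than a floppy loop) is the key design goal.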

The loopy linker in the starting structure won’t work because it will be too flexible in solution. The two binder domains will flop around and behave independently, like two separate binders. However, if the linker were well-folded and rigid, the two binder domains could behave like a single protein with double the binding surface.

The starting structure for Puzzle 1912. The helical binding domains and small portions of the spike are frozen. In green is the designable linker that needs to be folded.

How to Score Well

Designed linkers with lots of secondary structure (sheets or helices) will score better and be more likely to do the job. We're looking for linkers that hold the binders in the proper orientation with more rigidity than a flexible alanine chain. We are using a few objectives to encourage well-folded linkers, including the Core Exists and SS Design Objectives. The BUNS Objective is also active on the linker.

We look forward to seeing how Foldit players solve this problem! Promising designs may be tested at the IPD for improved binding to the spike. A tighter binder would be especially useful for detecting small amounts of coronavirus in a fast and sensitive diagnostic test.

The Future of Designable Linker Puzzles

Rigid linker design is an outstanding problem in protein design. It is made especially difficult by the fact that the connected domains are constrained to their starting position, and the designed linker cannot clash with other chains nearby (like the binding target).

Scientists have been trying to develop computational methods to design rigid linkers from scratch, but have not had much success. These methods suffer from limitations that don't apply to Foldit players, and we think that human ingenuity and hands-on problem solving might be the answer to this problem.

Check out the Designable Linker: Coronavirus Spike Binder puzzle now!

Happy Folding!

(Posted by neilpg628 | Tue, 11/03/2020 - 21:43)
