Coronavirus Designable Linker Puzzles
This week we are introducing a brand new Designable Linker Puzzle! This kind of puzzle involves two or more protein domains that are fixed in space, and players are challenged to link them with a rigid, well-folded linker that preserves the orientation of the starting domains.
Linking Coronavirus Spike Binders
Puzzle 1912b is the first puzzle of this type. This example is particularly special, as we are asking you to link two of the best known SARS-CoV-2 spike binders. These computationally designed binders came from scientists at the Institute for Protein Design (IPD), and currently exhibit some of the best binding affinities of any known SARS-CoV-2 spike binder. The original binders are currently being developed for possible COVID-19 tests or therapeutics.
It took a large number of supercomputing hours to generate these binders, and less than 0.1% of those that were tested showed any binding affinity for the target. This goes to show just how hard binder design is! You can read more about these binders in this previous blog post. With these binders now in hand, we want to see how much we can improve them.
A model of how two designed proteins can bind the SARS-CoV-2 spike. The spike chains are shown in green, magenta and cyan. Colored in salmon are two designed binders, LCB1 and LCB3. The binders have been truncated and augmented with helices to bring their termini closer together.
The starting structure of Puzzle 1912 has a linker connecting two frozen α-helix bundles. These α-helix bundles are truncated parts of two spike binders designed by scientists at the IPD, LCB1 and LCB3. The puzzle also includes small sections of the target spike protein, although we don't need to make any more binding interactions with the spike. The goal of the puzzle is to design a rigid linker that keeps the binders in the starting orientation.
The binding affinity measures the tightness of binding between two chains, and is directly related to the change in free energy between the bound and unbound states (also known as DDG, described previously). If we can find a rigid linker that holds the two binders in a fixed orientation, we can roughly double the DDG to significantly increase the binding affinity of the linked binder.
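As a rough worked example of why doubling the binding free energy matters so much: in standard thermodynamics the dissociation constant is K_d = exp(ΔG/RT) (relative to a 1 M standard state), so doubling ΔG squares K_d. The sketch below assumes free energies in kcal/mol. Foldit's DDG is reported in score units that only approximate real free energies, and the -6 kcal/mol starting value is hypothetical.

```python
import math

R_KCAL = 0.0019872  # gas constant in kcal/(mol*K)

def dissociation_constant(delta_g_kcal, temp_k=298.15):
    """Return K_d in molar units for a binding free energy (negative = favorable)."""
    return math.exp(delta_g_kcal / (R_KCAL * temp_k))

# Hypothetical numbers, chosen only to illustrate the scaling.
single = dissociation_constant(-6.0)    # one binder on its own
linked = dissociation_constant(-12.0)   # idealized rigid fusion: twice the free energy

print(f"single binder K_d: {single:.2e} M")
print(f"linked binder K_d: {linked:.2e} M")
```

A smaller K_d means tighter binding. A real linked binder will not capture the full doubling, but the exponential relationship is why even a partial gain in free energy can tighten binding by orders of magnitude.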
The loopy linker in the starting structure won’t work because it will be too flexible in solution. The two binder domains will flop around and behave independently, like two separate binders. However, if the linker were well-folded and rigid, the two binder domains could behave like a single protein with double the binding surface.
The starting structure for Puzzle 1912. The helical binding domains and small portions of the spike are frozen. In green is the designable linker that needs to be folded.
How to Score Well
Designed linkers with lots of secondary structure (sheets or helices) will score better and be more likely to do the job. We're looking for linkers that hold the binders in the proper orientation with more rigidity than a flexible alanine chain. We are using a few objectives to encourage well-folded linkers, including the Core Exists and SS Design Objectives. The BUNS Objective is also active on the linker.
We look forward to seeing how Foldit players solve this problem! Promising designs may be tested at the IPD for improved binding to the spike. A tighter binder would be especially useful for detecting small amounts of coronavirus in a fast and sensitive diagnostic test.
The Future of Designable Linker Puzzles
Rigid linker design is an outstanding problem in protein design. It is made especially difficult by the fact that the connected domains are constrained to their starting positions, and the designed linker cannot clash with other chains nearby (like the binding target).
Scientists have been trying to develop computational methods to design rigid linkers from scratch, but have not had much success. These automated methods suffer from limitations that don't apply to Foldit players, and we think that human ingenuity and hands-on problem solving might be the answer to this problem.
Check out the Designable Linker: Coronavirus Spike Binder puzzle now!
Happy Folding!
( Posted by neilpg628 | Tue, 11/03/2020 - 21:43 )
Update on Aflatoxin Challenge
The Siegel Lab is back again with an update on the Foldit Aflatoxin Challenge!
We were really thankful for all of your designs and even gave a few back to you as prediction-style puzzles in Rounds 14-17. These puzzles challenged you to predict the apo structure of the designs -- the protein structure when aflatoxin is not present. We have some news regarding our results, now that we have transferred them from in silico to in vitro!
Prediction of apo structures
When we compiled all of your puzzle solutions from Round 13 and narrowed them down to the most promising entries, we wanted to do a pilot study of a few of the most radical designs to determine how well the laccase starting protein would handle large structural changes.
Many of these designs expressed and were active in our reporter assay, which we were happy to see; however, all had lost the ability to degrade aflatoxin. We believed that the active site was changed in a way that didn’t allow aflatoxin to fit for catalysis. Your solutions for our apo structure puzzles readily confirmed this.
Below, the original player design is shown in green, with the other colors depicting the best scoring player apo solutions. We can see that these top-scoring apo structures are crowding the position where aflatoxin is supposed to sit. Clearly aflatoxin would have a hard time fitting in those active sites, which was very consistent with what we were seeing in our assays.
New testing results
Using this information, we ordered 56 player designs from Puzzle 1739 and tested them all using our high-throughput methods. Fortunately, several had activity on aflatoxin, and all of these were grown and assayed at larger scale to ensure accuracy.
Of the approximately 20 active designs, 3 were found to be the most active and highest expressing, making them excellent candidates for more design! Two of these come from LociOiling, and the third is a design by Phyx.
We want to understand how the active sites for these 3 designs may look when aflatoxin is not present, so we are releasing new apo prediction puzzles based on these designs in the coming weeks. We hope you will give these puzzles a try and help us in the next step of the Aflatoxin Degradation Project!
( Posted by bkoep | Tue, 10/06/2020 - 21:52 )
Introducing Foldit Metrics
Foldit Metrics are a new kind of Objective. They will appear in the Objectives dropdown, under the score panel at the top of the screen.
Just like normal Objectives, Metrics calculate useful properties of your solution, and can award bonuses that boost your score. However, Metrics are different from other Objectives in that they are much slower to compute.
Normally we like to ensure that Foldit can calculate your score (the base Foldit score plus all Objectives) in less than 30 milliseconds -- brief enough that it appears to your brain as "immediate." That way Foldit can constantly update your score in real time as you fold your protein.
However, some kinds of protein calculations simply can’t be completed in that time. Anything that takes more than about 100 milliseconds would cause a noticeable delay, and Foldit gameplay would become frustratingly “choppy” as the scoring struggles to keep up with your folding. We’ve developed Foldit Metrics as a way to handle these slow calculations without interrupting regular gameplay.
Our latest devprev update includes support for Metrics, and we’ve posted a non-competitive puzzle for devprev users to try out the new features. After some time in devprev, we will release Metrics in a main update so we can start using them in our Science Puzzles.
Puzzles with Metrics will behave a little differently than other puzzles. Below we describe the Metrics features and discuss the new challenges they bring to Foldit gameplay.
Hand-folding with Metrics
Since Metrics are too slow to compute in real time, Foldit runs them in the background. Whenever you make a substantial change to your solution, the Metrics will start calculating in the background, while the rest of Foldit continues to respond to mouse clicks and keystrokes.
Until the calculation completes, your score at the top of the screen will be greyed out and will not update. When all Metric calculations are completed, the score will update and regain its usual color.
You can continue folding your solution while the Metrics are calculating.
When the Metrics finish, the calculations will automatically restart for your latest solution. Note that the Metrics will skip over any intermediate solutions, so you don’t have to worry about accumulating a backlog of Metrics to slog through. [CORRECTION: Metrics will continuously calculate in the background. When a calculation is complete, it will be permanently associated with the solution in case you want to go back.]
If you want to see the Metrics for your current solution, you can just stop folding, and the Metrics should catch up in a second or two. If you don’t want to wait a second or two for score updates, you can disable Metrics while you are hand-folding. [CORRECTION: You can disable Metrics while you are hand-folding, but either way your score will keep updating, without the Metric bonus, while the Metrics are running.] While a Metric is disabled, your score will update in real time like in regular puzzles, but the score will be invalid. To trigger a one-time Metric calculation while it is disabled, click the “Run” button next to the Metric.
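The background behavior described above can be pictured with a small sketch. This is not Foldit's actual implementation: it is a minimal Python model, assuming a single worker thread and a made-up `slow_metric` function, that shows how a slow calculation can run without blocking the caller and how a finished result stays attached to the solution it was computed for.

```python
from concurrent.futures import ThreadPoolExecutor
import time

def slow_metric(solution):
    """Hypothetical stand-in for a slow Metric; here it just counts residues."""
    time.sleep(0.05)  # pretend this is an expensive calculation
    return len(solution)

class MetricCache:
    """Runs the metric on a background thread and remembers results per solution."""

    def __init__(self):
        self._pool = ThreadPoolExecutor(max_workers=1)
        self._futures = {}  # solution -> Future holding its metric value

    def request(self, solution):
        # Start a calculation only if this solution has not been seen before.
        if solution not in self._futures:
            self._futures[solution] = self._pool.submit(slow_metric, solution)

    def is_ready(self, solution):
        future = self._futures.get(solution)
        return future is not None and future.done()

    def value(self, solution):
        # Blocks until the calculation finishes, like pausing to let Metrics catch up.
        return self._futures[solution].result()

    def close(self):
        self._pool.shutdown(wait=True)

cache = MetricCache()
cache.request("MKTAYIAK")      # returns immediately; "folding" can continue
cache.request("MKTAYIAKQR")    # queued behind the first calculation
first = cache.value("MKTAYIAK")
second = cache.value("MKTAYIAKQR")
cache.close()
```

Because results are keyed by solution, going back to an earlier solution costs nothing extra: its value is already stored.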
Using recipes with Metrics
Existing recipes will ignore the new Metrics by default. You can run any normal recipe in a Metrics puzzle, and it should run just as fast as in any other puzzle.
This comes with an important caveat:
Existing Lua functions like current.GetScore do not include Metrics bonuses.
That means that the value returned by current.GetScore may not match the competitive score at the top of your screen. And the value returned by creditbest.GetScore may not match your competitive score on the Foldit leaderboards. Recipes will need to be modified to support Metrics.
In order to get your competitive score in a recipe, you will need to add together the value of current.GetScore and metric.GetBonusTotal. But be careful -- accessing Metric bonuses in a recipe can drastically increase the recipe’s run time! Every time you access a Metric bonus in a recipe, the recipe stops to wait for the Metric to compute.
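One way to limit the cost of Metric lookups is to request the bonus only when the base score has improved. The sketch below models that idea in Python, rather than recipe Lua, so that it can run outside the game. `FakeGame` and its methods are hypothetical stand-ins: `get_score` plays the role of current.GetScore and `get_bonus_total` plays the role of metric.GetBonusTotal. The scores are invented.

```python
class FakeGame:
    """Hypothetical stand-in for the recipe API, for illustration only."""

    def __init__(self, states):
        self._states = states  # list of (base_score, metric_bonus) per step
        self._index = 0
        self.metric_calls = 0  # how many times the slow lookup was paid for

    def step(self):
        self._index += 1

    def get_score(self):         # stands in for current.GetScore (fast)
        return self._states[self._index][0]

    def get_bonus_total(self):   # stands in for metric.GetBonusTotal (slow)
        self.metric_calls += 1
        return self._states[self._index][1]

def best_competitive_score(game, steps):
    best_base = float("-inf")
    best_total = float("-inf")
    for _ in range(steps):
        base = game.get_score()
        if base > best_base:
            # Only wait for the slow Metric when the base score improved.
            best_base = base
            best_total = max(best_total, base + game.get_bonus_total())
        game.step()
    return best_total

states = [(9000.0, 0.0), (8990.0, 500.0), (9100.0, 500.0), (9050.0, 500.0)]
game = FakeGame(states)
result = best_competitive_score(game, len(states))
```

This is a heuristic, not a guarantee. It skips the second state, whose lower base score plus a bonus would have beaten the first, so a recipe that cares about such cases would need to check Metrics more often and accept the longer run time.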
Metrics are distinct from filters in Foldit recipes, and have separate Lua functions. Functions like filter.DisableAll will have no effect on Metrics, and filter.GetNames will not return the names of any Metrics. Our first release includes three new Lua functions for Metrics:
Return type: table
Description: Returns a table containing the names of all metrics in the puzzle.
Parameters: string name [only names of metrics are recognized, others produce Lua errors]
Return type: number
Description: Triggers the (slow) computation of the named metric. Blocks computation of the script until the metric is finished computing, then returns the metric score.
Return type: number
Description: Triggers the (slow) computation of all metrics. Blocks computation of the script until all metrics are finished computing, then returns the sum of all metric scores.
Learning to play with Metrics
It will take some time for us to figure out the best way to use Metrics in Foldit. We think that they will help us produce better solutions in Science Puzzles, but this has to be balanced with gameplay and fair competition in Foldit.
Compared to the base Foldit score, Metrics are much slower to compute, but the good news is that we don’t think they need to be calculated as frequently. Although we’d like to strive for solutions with decent Metrics, we don’t necessarily want to grind away at them to squeeze out tiny gains.
Likewise, we don’t want to invest too much importance in Metrics. The Foldit base score is still our primary tool for judging solutions, although we know from lab experiments that some Metrics have informative thresholds.
For example, we’ve seen that most successful binder designs tend to have a shape complementarity (SC) Metric > 0.60. However, it’s not clear that increasing SC beyond this threshold is helpful, and we certainly don’t want to sacrifice other design features (like a well-packed, hydrophobic core) for good SC.
With this in mind, we’ll be starting with Metrics that award a flat bonus at a threshold value. [NOTE: We are also trying out metrics that award increasing bonuses UP TO a threshold]. For example, we may award a set bonus for a binder with SC of at least 0.60, but you will not get a bigger bonus for increasing SC further than that. Once you find an initial solution that comfortably meets the threshold, we hope that you can turn off the Metric and only check it periodically while you optimize other features of your solution.
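The two bonus styles mentioned above can be written down as simple functions. The exact formulas Foldit uses are not given here, so this is only a sketch: the bonus size of 500 points and the linear ramp are assumptions, and only the SC threshold of 0.60 comes from the text.

```python
def flat_bonus(value, threshold=0.60, bonus=500.0):
    """Full bonus once the metric reaches the threshold, nothing before it."""
    return bonus if value >= threshold else 0.0

def ramp_bonus(value, threshold=0.60, bonus=500.0):
    """Bonus grows linearly with the metric and is capped at the threshold.

    The linear shape is an assumption made for illustration.
    """
    fraction = min(max(value / threshold, 0.0), 1.0)
    return bonus * fraction

for sc in (0.30, 0.59, 0.60, 0.75):
    print(sc, flat_bonus(sc), ramp_bonus(sc))
```

In both versions, pushing SC past 0.60 earns nothing extra, which matches the goal of not sacrificing other design features for a higher Metric.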
Beyond that, we’re not sure about the best strategies for folding with Metrics! Scientists traditionally use them to weed out poor designs from big batches, but never spend time tweaking those designs to improve their Metrics. This is an experiment and we don’t know where it will lead.
We’ll be counting on players for feedback about what works and what doesn’t. Please don’t hesitate to leave us feedback or suggestions, or to ask questions in the comments below!
Devprev users can check out the new Metrics now in the [DEVPREV] LCB1 Binder with Metrics puzzle.
( Posted by bkoep | Thu, 10/01/2020 - 09:02 )
Experiment results for coronavirus spike binders
The experimental results are in for Foldit players’ 99 binders against the coronavirus spike protein! If you’ve been following along, you know this experiment was planned for earlier this summer, but got held up by some technical problems with our DNA supplier. Well, we found a workaround, got new materials, and ran the binding experiment to test whether any of the 99 Foldit designs bind to the SARS-CoV-2 spike protein.
Unfortunately, we did not see appreciable binding from any of the 99 Foldit designs. Below we’ll walk through the details of the experiment, and we’ll also discuss some exciting news about a successful binder designed by IPD scientists.
Our binding experiment uses two techniques called yeast display and fluorescence activated cell sorting (FACS). You can read more about those techniques in a previous blog post.
In short, we put custom DNA into hundreds of thousands of yeast cells, which then display our protein designs on their surface. After mixing our yeast with fluorescent target protein, we can quickly sort through the yeast cells and pick out those that bind to the target.
Figure 1. (A) Schematic of FACS experiment and (B) example scatter plot of fluorescence from a FACS sort. Each point is a yeast cell, with green fluorescence (expression) on the x-axis, and red fluorescence (binding) on the y-axis. Points in the top right corner represent cells with both red and green fluorescence, indicating good expression and binding.
After each sort, we sequence the DNA of just the collected cells (i.e. the cells that showed expression and binding signal). These DNA sequences can be mapped back to the protein designs that were displayed on the yeast cells.
We count how many times we read each design in the sequencing data. A design with a high number of sequencing counts means that a lot of yeast cells displaying this design were collected, and indicates a successful binder.
Below is a preview of the data. You can download the data for all 99 designs here.
pdb_id         counts1  counts2  counts3*  counts4  counts5  counts6  BUNS      DDG      SASA     SC
2008926_c0022       10        0         0        0        0        0     7  -33.546  1314.890  0.661
2008926_c0023       21        1         0        0        0        0     8  -33.030  1391.938  0.663
2008926_c0026       30       13         0        0        0        0    12  -37.822  1621.635  0.584
2008926_c0034     1073     2357         0        0        0        1    12  -44.100  1656.985  0.648
2008926_c0036        3        3         0        0        0        0     9  -46.865  1574.854  0.648
2008926_c0037      590     4026         0       45       52      144     7  -36.222  1633.888  0.569
2008926_c0040      343      323         1        0        0        0    10  -35.853  1568.804  0.645
2008926_c0042       57      199         0        0        0        0     6  -31.511  1407.946  0.490
2008926_c0052        2        0         0        0        0        0     6  -31.936  1445.994  0.555
...
*Note: There was a sequencing error for sort #3, which is why the counts are mostly zeros in the counts3 column. The counts3 numbers do not represent the actual collected fraction from sort #3, and we should disregard those numbers. Fortunately, since sort #3 was an enrichment sort and we have good data for later sorts, we don’t need those counts to interpret the experiment results.
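To illustrate how counts like these can be read, here is a minimal Python sketch that lists which designs from the preview rows still have sequencing reads in a given sort. It uses only the nine preview rows shown above, not the full 99-design dataset, and the helper names `parse` and `survivors` are made up for this example.

```python
# Count columns (counts1..counts6) copied from the nine preview rows.
PREVIEW = """\
2008926_c0022 10 0 0 0 0 0
2008926_c0023 21 1 0 0 0 0
2008926_c0026 30 13 0 0 0 0
2008926_c0034 1073 2357 0 0 0 1
2008926_c0036 3 3 0 0 0 0
2008926_c0037 590 4026 0 45 52 144
2008926_c0040 343 323 1 0 0 0
2008926_c0042 57 199 0 0 0 0
2008926_c0052 2 0 0 0 0 0
"""

def parse(text):
    """Map each design name to its list of six sequencing counts."""
    designs = {}
    for line in text.splitlines():
        name, *counts = line.split()
        designs[name] = [int(c) for c in counts]
    return designs

def survivors(designs, sort_number):
    """Designs with at least one read in the given sort (1-based), most reads first."""
    column = sort_number - 1
    hits = [(counts[column], name)
            for name, counts in designs.items() if counts[column] > 0]
    return [name for _, name in sorted(hits, reverse=True)]

designs = parse(PREVIEW)
final_hits = survivors(designs, 6)   # the 100 nM binding sort
print(final_hits)
```

The counts3 column is kept in the data but should not be queried, for the reason given in the note above.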
We used a different sorting schedule here than we did in the previous IL6R experiment. In the IL6R experiment, Foldit designs were pooled with a number of IPD designs and were sorted together at the same time. We screened that entire pool against a range of binding conditions (target concentrations from 0.1 to 1000 nM).
In this spike binder experiment, we were able to purify the starting pool so that it was made up almost entirely of Foldit designs. We also took some extra steps to enrich the starting pool, and we only screened against high concentrations of target after enrichment.
- Enrichment at 1000 nM target
- Enrichment at 1000 nM target
- Enrichment at 1000 nM target
- Binding at 1000 nM target
- Binding at 100 nM target
Instead of going directly from the starting pool into binding sorts at different concentrations of target, we first carried out several rounds of enrichment sorting in order to amplify any potential binders. An enrichment sort is very similar to a binding sort, where we select yeast cells that have both expression and binding signal. The experimental conditions are a little more lenient for binding during an enrichment sort.
The important part of enrichment is that the selected fraction of each enrichment sort provides the input for the following sort. If we do this several times in a row, we can drastically enrich the composition of the pool to favor anything that binds even a little bit. This is a way to increase the presence of any weak binders, and helps to ensure we don’t miss anything that was underrepresented in the starting pool.
Figure 2. Diagram of sort procedure. Each bar represents a pool of cells that undergoes sorting. In sort #1, we collect only cells that show high expression (green fluorescence), and these cells become the input for sort #2. Sorts #2-4 are enrichment sorts which should exponentially increase the presence of any binders in the pool. After enrichment, sorts #5 and #6 screen for cells that show binding signal at different concentrations of target.
For each of the sorts in the figure above, we've also noted the percentage of cells that were collected from the sort. In expression sort #1, we collected cells based only on whether they display any protein on their surface (green fluorescence). In sorts #2-6, we collected cells based on whether they bind to the target (red fluorescence).
If there are any successful binders in the starting pool, their prevalence should increase exponentially during enrichment sorts. After a few rounds of enrichment, successful binders will grow to dominate the pool so that the majority of cells show binding.
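A toy model shows why this growth is exponential. Suppose each enrichment sort keeps a fixed share of binder cells and a much smaller share of everything else. The keep rates and starting fraction below are invented for illustration and are not measured values from this experiment. The model also ignores regrowth between sorts, which is assumed not to change the pool's composition.

```python
def binder_fraction(start_fraction, rounds, keep_binder=0.5, keep_other=0.02):
    """Fraction of the pool made up of binders after repeated enrichment sorts.

    keep_binder and keep_other are hypothetical per-sort collection rates.
    """
    fraction = start_fraction
    for _ in range(rounds):
        kept_binders = fraction * keep_binder
        kept_others = (1.0 - fraction) * keep_other
        fraction = kept_binders / (kept_binders + kept_others)
    return fraction

# A binder that starts as 0.1% of the pool, followed over three sorts.
for n in range(4):
    print(n, round(binder_fraction(0.001, n), 3))
```

With these made-up rates, the ratio of binders to non-binders is multiplied by 25 in every sort, so a rare binder comes to dominate the pool after three rounds. If the starting fraction is zero, it stays at zero no matter how many sorts are run.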
Unfortunately, after three rounds of enrichment, we still see that <5% of cells show any binding signal at 1000 nM target concentration. This is a clear sign that nothing in the pool binds significantly at 1000 nM target ("easy" binding conditions).
Figure 3. FACS data for Foldit spike binders. Each point represents a single yeast cell displaying a Foldit binder on its surface. The x-axis is intensity of green fluorescence (how much binder is expressed on the cell surface) and the y-axis is red fluorescence (how much target is bound at the cell surface). If there were any successful binders in the pool, we would expect to see a large population in the top right corner of each plot.
Looking at the sequencing counts, we see that a handful of designs did become more prominent during enrichment and show up consistently in the final binding sorts. This does indicate that these designs tend to stick to the target somewhat more than other designs in the pool. However, these low numbers are consistent with what we could expect from unfolded non-specific binding, or very weak binding. It is unlikely these designs are folding and sticking to the target as intended, and we cannot expect to improve them by optimization.
A successful IPD-designed binder
In separate news, scientists at the IPD have successfully designed a binder for the coronavirus spike protein! This result was recently posted as an online preprint (meaning the paper has not yet been peer-reviewed).
Rather than design individual proteins by hand, the IPD scientists used supercomputers to automatically generate millions of designs, then checked whether the designs had good binder metrics. Over 90% of the designs were thrown out because they didn’t meet binder metric criteria. The best designs were then tested for binding using the same kinds of FACS experiments we used to test Foldit designs.
Note that this design strategy is not very efficient and requires heavy computational resources. From the millions of initial designs and the 100,000 that were tested, the researchers found only about 100 designs that showed any binding in the lab.
Afterward, scientists did some additional optimization on the best binders, trying all different mutations at every site on the protein. The final optimized designs can bind to the coronavirus spike extremely tightly--even more tightly than natural antibodies!
Lab tests showed that the binders can stop live virus from infecting human cells in a test tube, but these binders still need to be tested in animals before they can be considered drug candidates for clinical trials.
Figure 4. Coronavirus spike binder designed by IPD scientists. On the left, the designed protein binder LCB1 sits at the receptor binding domain (RBD) of the coronavirus spike protein. On the right, lab tests show that this protein (pink trace) is a potent inhibitor of viral infection in human cell culture. Further tests are needed to determine efficacy and safety in whole organisms.
What does this mean for Foldit?
This binder from IPD scientists is great news, and these results help to outline the future direction of binder design in Foldit.
First, the scientists’ method gives us more confidence in Foldit design tools. The automated design methods use the same score function that is used to calculate your Foldit score. And the researchers selected designs using the same binder metrics we've discussed previously (DDG, SASA, and shape complementarity).
But the strategy of the IPD scientists has some shortcomings. Although these automated methods worked great against the coronavirus spike protein, there are many other binder targets that are poorly suited for this approach.
The automated methods work almost exclusively with small 3-helix bundle designs. Other binder targets have convex shapes that aren’t so compatible with a 3-helix bundle fold, or they have protrusions that require special attention. Some binder targets are covered with polar residues that are extremely difficult to satisfy using automatic design.
Those hard problems, where our algorithms fail, are precisely the problems where we think Foldit can excel. We’re looking forward to challenging Foldit players with those tricky problems, and we can get started once we’ve fully integrated the binder metrics into Foldit (we’re almost there -- we appreciate your patience!).
In the meantime, we’ve created a sandbox (non-scoring) puzzle so you can explore the IPD binder in Foldit. Check out the LCB1 Coronavirus Spike Binder puzzle, and get ready for binder metrics to come back in future puzzles!
( Posted by bkoep | Mon, 08/31/2020 - 15:57 )
Foldit Education Mode
Although Foldit was originally made for science, we always knew it had potential as a learning tool. Until recently, we haven’t done a lot to help teachers use Foldit in their classrooms. We added Custom Contests so teachers could make their own puzzles, but this still takes a lot of time and energy.
The Foldit team had been talking about making a version of Foldit for education, but when the pandemic hit it became clear students across the world needed more remote learning options. So we accelerated our plans, and today we are proud to announce the release of Education Mode!
Figure 1: The Education Mode version of Wiggle teaches you both how to wiggle, and what it’s actually doing.
Education Mode will be launched as a separate app from the main Foldit game. This may change in the future, but for now, if you want to use Education Mode you need to have it installed separately. The downloads can be found on our new educator’s page here.
The core idea of Education Mode is to teach a section of a protein biochemistry class through Foldit. We hope this is helpful not only for students, but for anyone curious about the basic science behind protein biochemistry. Even if you’ve been playing Foldit for a while, check out Education Mode for some bonus science and tutorials!
Figure 2: New Primary Structure Puzzle. This is a protein design puzzle, but the purpose is to help you think about which amino acids fit best where in a protein and why based on the underlying chemistry. You’ll notice that the design wheel has had all of the pictures removed to encourage you to visualize the amino acids.
Education Mode has 29 puzzles in 9 tiers. Many of these puzzles are variants of the campaign puzzles, which are designed to teach Foldit gameplay. However, you’re also likely going to learn some biochemistry along the way! In the typical campaign puzzles, we don’t emphasize the biochemistry learning part of it so that you can get to the game quicker and without having to feel like you’re going through a biochemistry class. In Education Mode, the tips focus on teaching you the biochemistry behind the puzzle while learning to play Foldit along the way.
Figure 3: New Idealizing Structure Angles Puzzle. This puzzle is an evolution of the Structure and Idealize campaign puzzle, but now expanded to relate secondary structure to the Rama map, and how to use it.
The Education Mode puzzles start on atomic interactions (like clashes and hydrogen bonds), then focus on amino acid structure before proceeding through different levels of protein structure (primary, secondary, and tertiary structure). Finally, there are a few puzzles on how proteins actually fold in nature, and a final puzzle on protein binding to DNA.
A new feature that you won’t find in the campaign levels is that on many of the puzzles, you can explore the puzzle before clicking through the tutorial, and then reset the puzzle to start scoring. This is so you can explore and experiment before attempting the puzzle for real. You’ll notice that the education tips have both forward and back buttons, and some of them now have pictures to illustrate more abstract concepts! Like the campaign mode, once you’ve completed a puzzle, it will prompt you to move to the next one, but you can also keep playing the puzzle to see if you can improve your score even more.
Some of these puzzles are intentionally hard. We’ve enabled the Save function so that you can take a break and reload your progress, and we have also made it so that you can skip puzzles, in case you want to move on to another topic.
Figure 4: New Tertiary Structure Puzzle. This puzzle is geared specifically to teach students about the difference between secondary and tertiary structure in proteins.
You might notice that we’ve disabled some popular tools (like Wiggle) in many of the Education Mode puzzles. This is to encourage more hand-folding and critical thinking about your choices as opposed to letting the computer do it for you.
Some tools, like Blueprint, are missing from Education Mode because we are still developing lessons for them. For now, the regular Campaign levels are still the best way to learn these tools.
One last feature that we added to Education Mode is extra camera controls. Pressing Shift+Home makes the camera rock back and forth. Pressing Alt/Option+Home sets the camera into a spin motion. Press the hotkey again to stop the motion. We hope that these new features can help you better visualize the 3D space of your protein!
( Posted by horowsah | Sat, 08/01/2020 - 12:53 )