Experiment results for coronavirus spike binders
The experimental results are in for Foldit players’ 99 binders against the coronavirus spike protein! If you’ve been following along, you know this experiment was planned for earlier this summer, but got held up by some technical problems with our DNA supplier. Well, we found a workaround, got new materials, and ran the binding experiment to test whether any of the 99 Foldit designs bind to the SARS-CoV-2 spike protein.
Unfortunately, we did not see appreciable binding from any of the 99 Foldit designs. Below we’ll walk through the details of the experiment, and we’ll also discuss some exciting news about a successful binder designed by IPD scientists.
Our binding experiment uses two techniques called yeast display and fluorescence activated cell sorting (FACS). You can read more about those techniques in a previous blog post.
In short, we put custom DNA into hundreds of thousands of yeast cells, which then display our protein designs on their surface. After mixing our yeast with fluorescent target protein, we can quickly sort through the yeast cells and pick out those that bind to the target.
Figure 1. (A) Schematic of FACS experiment and (B) example scatter plot of fluorescence from a FACS sort. Each point is a yeast cell, with green fluorescence (expression) on the x-axis, and red fluorescence (binding) on the y-axis. Points in the top right corner represent cells with both red and green fluorescence, indicating good expression and binding.
After each sort, we sequence the DNA of just the collected cells (i.e. the cells that showed expression and binding signal). These DNA sequences can be mapped back to the protein designs that were displayed on the yeast cells.
We count how many times we read each design in the sequencing data. A high number of sequencing counts for a design means that many yeast cells displaying that design were collected, which indicates a successful binder.
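The counting step can be sketched in a few lines of Python. This is a minimal illustration, not the lab's actual pipeline: the function name `count_designs`, the lookup table `design_by_sequence`, and the toy reads are all hypothetical, and it assumes each read matches a design's DNA sequence exactly (a real analysis would need to tolerate sequencing errors).

```python
from collections import Counter


def count_designs(reads, design_by_sequence):
    """Tally sequencing reads per design, skipping reads that match no design."""
    counts = Counter()
    for read in reads:
        design = design_by_sequence.get(read)
        if design is not None:
            counts[design] += 1
    return counts


# Hypothetical toy data: two designs and five reads from one collected pool.
design_by_sequence = {"ATGGCT": "design_A", "ATGCCA": "design_B"}
reads = ["ATGGCT", "ATGCCA", "ATGGCT", "TTTTTT", "ATGGCT"]

counts = count_designs(reads, design_by_sequence)
for design, n in counts.most_common():
    print(design, n)
```

Running this once per sort would give one column of counts per sort, like the columns in the table of results.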
Below is a preview of the data. You can download the data for all 99 designs here.
pdb_id         counts1  counts2  counts3*  counts4  counts5  counts6  BUNS  DDG      SASA      SC
2008926_c0022  10       0        0         0        0        0        7     -33.546  1314.890  0.661
2008926_c0023  21       1        0         0        0        0        8     -33.030  1391.938  0.663
2008926_c0026  30       13       0         0        0        0        12    -37.822  1621.635  0.584
2008926_c0034  1073     2357     0         0        0        1        12    -44.100  1656.985  0.648
2008926_c0036  3        3        0         0        0        0        9     -46.865  1574.854  0.648
2008926_c0037  590      4026     0         45       52       144      7     -36.222  1633.888  0.569
2008926_c0040  343      323      1         0        0        0        10    -35.853  1568.804  0.645
2008926_c0042  57       199      0         0        0        0        6     -31.511  1407.946  0.490
2008926_c0052  2        0        0         0        0        0        6     -31.936  1445.994  0.555
...
*Note: There was a sequencing error for sort #3, which is why the counts are mostly zeros in the counts3 column. The counts3 numbers do not represent the actual collected fraction from sort #3, and we should disregard those numbers. Fortunately, since sort #3 was an enrichment sort and we have good data for later sorts, we don’t need those counts to interpret the experiment results.
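As a rough illustration of how one might scan these counts, the sketch below takes the preview rows above and lists the designs that still have reads in the final binding sorts. The names `PREVIEW`, `UNRELIABLE_SORTS`, and `persistent_designs` are made up for this example, and the "at least one read in every listed sort" cutoff is an assumption for illustration, not a criterion from our analysis.

```python
# Preview rows copied from the table above: pdb_id -> [counts1, ..., counts6].
PREVIEW = {
    "2008926_c0022": [10, 0, 0, 0, 0, 0],
    "2008926_c0023": [21, 1, 0, 0, 0, 0],
    "2008926_c0026": [30, 13, 0, 0, 0, 0],
    "2008926_c0034": [1073, 2357, 0, 0, 0, 1],
    "2008926_c0036": [3, 3, 0, 0, 0, 0],
    "2008926_c0037": [590, 4026, 0, 45, 52, 144],
    "2008926_c0040": [343, 323, 1, 0, 0, 0],
    "2008926_c0042": [57, 199, 0, 0, 0, 0],
    "2008926_c0052": [2, 0, 0, 0, 0, 0],
}

# Sort #3 had a sequencing error, so its counts must not be interpreted.
UNRELIABLE_SORTS = {3}


def persistent_designs(table, final_sorts=(5, 6)):
    """Return designs with at least one read in every listed sort (1-indexed)."""
    if UNRELIABLE_SORTS & set(final_sorts):
        raise ValueError("counts from sort #3 are unreliable")
    return [
        pdb_id
        for pdb_id, counts in table.items()
        if all(counts[sort - 1] > 0 for sort in final_sorts)
    ]


print(persistent_designs(PREVIEW))
```

On the preview rows, only 2008926_c0037 has reads in both sort #5 and sort #6. The full data set for all 99 designs would need the same treatment before drawing any conclusions.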
We used a different sorting schedule here than we did in the previous IL6R experiment. In the IL6R experiment, Foldit designs were pooled with a number of IPD designs and were sorted together at the same time. We screened that entire pool against a range of binding conditions (target concentrations from 0.1 to 1000 nM).
In this spike binder experiment, we were able to purify the starting pool so that it was made up almost entirely of Foldit designs. We also took some extra steps to enrich the starting pool, and we only screened against high concentrations of target after enrichment.
- Enrichment at 1000 nM target
- Enrichment at 1000 nM target
- Enrichment at 1000 nM target
- Binding at 1000 nM target
- Binding at 100 nM target
Instead of going directly from the starting pool into binding sorts at different concentrations of target, we first carried out several rounds of enrichment sorting in order to amplify any potential binders. An enrichment sort is very similar to a binding sort: in both, we select yeast cells that have both expression and binding signal. The difference is that the experimental conditions for binding are a little more lenient during an enrichment sort.
The important part of enrichment is that the selected fraction of each enrichment sort provides the input for the following sort. If we do this several times in a row, we can drastically enrich the composition of the pool to favor anything that binds even a little bit. This is a way to increase the presence of any weak binders, and helps to ensure we don’t miss anything that was underrepresented in the starting pool.
Figure 2. Diagram of sort procedure. Each bar represents a pool of cells that undergoes sorting. In sort #1, we collect only cells that show high expression (green fluorescence), and these cells become the input for sort #2. Sorts #2-4 are enrichment sorts which should exponentially increase the presence of any binders in the pool. After enrichment, sorts #5 and #6 screen for cells that show binding signal at different concentrations of target.
For each of the sorts in the figure above, we've also noted the percentage of cells that were collected from the sort. In expression sort #1, we collected cells based only on whether they display any protein on their surface (green fluorescence). In sorts #2-6, we collected cells based on whether they bind to the target (red fluorescence).
If there are any successful binders in the starting pool, their prevalence should increase exponentially during enrichment sorts. After a few rounds of enrichment, successful binders will grow to dominate the pool so that the majority of cells show binding.
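To see why repeated enrichment is so powerful, here is a toy model in Python. It is only a sketch under made-up assumptions: the function `enrich` is hypothetical, and the starting binder fraction (1%) and the per-sort collection probabilities (90% for binders, 3% for everything else) are illustrative numbers, not measurements from this experiment.

```python
def enrich(binder_fraction, p_collect_binder, p_collect_other, rounds):
    """Track the binder fraction of the pool across repeated enrichment sorts.

    Each sort keeps binders with probability p_collect_binder and all other
    cells with probability p_collect_other; the collected cells become the
    input pool for the next sort.
    """
    fractions = [binder_fraction]
    for _ in range(rounds):
        f = fractions[-1]
        kept_binders = f * p_collect_binder
        kept_others = (1 - f) * p_collect_other
        fractions.append(kept_binders / (kept_binders + kept_others))
    return fractions


# Assumed values: 1% binders to start, three rounds of enrichment.
trajectory = enrich(0.01, p_collect_binder=0.9, p_collect_other=0.03, rounds=3)
for round_number, fraction in enumerate(trajectory):
    print(f"after {round_number} enrichment sorts: {fraction:.1%} binders")
```

Under these assumed numbers, binders grow from 1% of the pool to a clear majority within two rounds. If the pool contains no true binders, the fraction stays at zero no matter how many rounds we run.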
Unfortunately, after three rounds of enrichment, we still see that <5% of cells show any binding signal at 1000 nM target concentration. This is a clear sign that nothing in the pool binds significantly at 1000 nM target ("easy" binding conditions).
Figure 3. FACS data for Foldit spike binders. Each point represents a single yeast cell displaying a Foldit binder on its surface. The x-axis is intensity of green fluorescence (how much binder is expressed on the cell surface) and the y-axis is red fluorescence (how much target is bound at the cell surface). If there were any successful binders in the pool, we would expect to see a large population in the top right corner of each plot.
Looking at the sequencing counts, we see that a handful of designs did become more prominent during enrichment and show up consistently in the final binding sorts. This does indicate that these designs tend to stick to the target somewhat more than other designs in the pool. However, these low numbers are consistent with what we could expect from non-specific binding by unfolded proteins, or from very weak binding. It is unlikely these designs are folding and sticking to the target as intended, and we cannot expect to improve them by optimization.
A successful IPD-designed binder
In separate news, scientists at the IPD have successfully designed a binder for the coronavirus spike protein! This result was recently posted as an online preprint (meaning the paper has not yet been peer-reviewed).
Rather than design individual proteins by hand, the IPD scientists used supercomputers to automatically generate millions of designs, then checked whether the designs had good binder metrics. Over 90% of the designs were thrown out because they didn’t meet binder metric criteria. The best designs were then tested for binding using the same kinds of FACS experiments we used to test Foldit designs.
Note that this design strategy is not very efficient and requires heavy computational resources. From the millions of initial designs and the 100,000 that were tested, the researchers found only about 100 designs that showed any binding in the lab.
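For a sense of scale, the success rate implied by those round numbers can be worked out directly. The helper `hit_rate` below is just an illustrative name, and the inputs are the approximate figures quoted above, so the result is a ballpark estimate only.

```python
def hit_rate(hits, tested):
    """Fraction of tested designs that showed binding."""
    return hits / tested


# Approximate figures from the text: about 100 binders out of 100,000 tested.
rate = hit_rate(100, 100_000)
print(f"{rate:.1%} of tested designs showed binding")
```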
Afterward, scientists did some additional optimization on the best binders, trying all different mutations at every site on the protein. The final optimized designs can bind to the coronavirus spike extremely tightly--even more tightly than natural antibodies!
Lab tests showed that the binders can stop live virus from infecting human cells in a test tube, but these binders still need to be tested in animals before they can be considered drug candidates for clinical trials.
Figure 4. Coronavirus spike binder designed by IPD scientists. On the left, the designed protein binder LCB1 sits at the receptor binding domain (RBD) of the coronavirus spike protein. On the right, lab tests show that this protein (pink trace) is a potent inhibitor of viral infection in human cell culture. Further tests are needed to determine efficacy and safety in whole organisms.
What does this mean for Foldit?
This binder from IPD scientists is great news, and these results help to outline the future direction of binder design in Foldit.
First, the scientists’ method gives us more confidence in Foldit design tools. The automated design methods use the same score function that is used to calculate your Foldit score. And the researchers selected designs using the same binder metrics we've discussed previously (DDG, SASA, and shape complementarity).
But the strategy of the IPD scientists has some shortcomings. Although these automated methods worked great against the coronavirus spike protein, there are many other binder targets that are poorly suited for this approach.
The automated methods work almost exclusively with small 3-helix bundle designs. Other binder targets have convex shapes that aren’t so compatible with a 3-helix bundle fold, or they have protrusions that require special attention. Some binder targets are covered with polar residues that are extremely difficult to satisfy using automatic design.
Those hard problems, where our algorithms fail, are precisely the problems where we think Foldit can excel. We’re looking forward to challenging Foldit players with those tricky problems, and we can get started once we’ve fully integrated the binder metrics into Foldit (we’re almost there -- we appreciate your patience!).
In the meantime, we’ve created a sandbox (non-scoring) puzzle so you can explore the IPD binder in Foldit. Check out the LCB1 Coronavirus Spike Binder puzzle, and get ready for binder metrics to come back in future puzzles!
Posted by bkoep | Mon, 08/31/2020 - 15:57