Experiment results for IL6R binders
The results from our IL6R binder experiment are back! This experiment tested 100 Foldit designs from the first two rounds of our Coronavirus Anti-inflammatory puzzles, to see if any of them bind to the IL6R target.
In short, we did not see any successful binding from the Foldit designs. This is unfortunate, but we should not be too discouraged! Read on for more details about the experiment, and what these results mean for Foldit (hint: more binder design puzzles!).
This is a long blog post, broken into a few different sections. First, we’ll explain some background about DNA libraries and fluorescence-activated cell sorting techniques that were used for this experiment. Then we’ll go over the experiment results for protein expression and target binding. Finally, we’ll close out with some discussion about these results, and thoughts about what’s next for Foldit.
In order to test lots of proteins at once, we order a custom DNA library. A DNA library is a mixed pool containing thousands of different DNA genes that encode our designed proteins.
In this experiment, the library includes genes for 100 Foldit player designs and thousands of designs from IPD researchers. All of these designs are intended to bind to the IL6R target.
We insert this mixture of genes into a yeast culture so that each yeast cell gets a gene for just one binder design.
We insert our designed gene alongside a companion gene that encodes a yeast membrane protein. When these genes are decoded, our designed protein is linked to the companion membrane protein. The yeast cell exports these to the cell membrane, so that our designed binder is displayed on the outside of the yeast cell, but is still tethered to the companion protein embedded in the membrane.
Although we expect the yeast cell to have lots of binders on the surface, those binders should all be identical since they came from the same gene.
Figure 1. A DNA library is a mixture with DNA genes encoding thousands of protein designs. The genes are inserted into yeast cells so that the yeast cells can decode the genes and express the designed proteins. The yeast cells export the designed proteins to the cell membrane so that they are displayed on the yeast surface.
Now we have a culture with millions and millions of yeast cells, which are displaying our library with thousands of different binder designs. Each yeast cell displays only one of the designs from the library; but there may be many identical yeast cells that each display the same design.
Fluorescence-activated cell sorting (FACS)
Now that our designed protein is displayed on the yeast surface, we tag the protein with a fluorescent molecule that emits green light. The intensity of green fluorescence corresponds to the amount of protein displayed on the yeast surface (higher intensity = more protein).
In a separate tube, our target protein (IL6R) is free-floating in solution, and we tag it with a different fluorescent molecule that emits red light.
Then we mix the free-floating target IL6R with our yeast cells. We expect the target will stick to binders that are displayed on the yeast surface. However, if one of our designed proteins does not bind the target, then no target molecules will stick to that yeast cell.
Now we'd like to measure how much target is stuck to each yeast cell. We use a microfluidics device to pass yeast cells, one at a time, in front of a sensitive photometer, which measures the intensity of green and red fluorescence in two separate measurements.
These two measurements are typically plotted as a scatter plot. Each point represents one yeast cell, where the x-axis is intensity of green fluorescence (the amount of displayed protein), and the y-axis is intensity of red fluorescence (the amount of bound target).
Figure 2. (A) Green-tagged designs are tethered to the yeast surface, while red-tagged target is free-floating. If a design successfully binds the target, then a yeast cell will have high-intensity green and red fluorescence. (B) FACS scatter plot of yeast fluorescence measurements. Each point is a yeast cell, with green fluorescence (expression) on the x-axis, and red fluorescence (binding) on the y-axis. Points in the top right corner represent cells with both red and green fluorescence, indicating good expression and binding. (Note that the colors in the plot represent point density; for example, the patch of red near the center of the plot means there are lots of overlapping points in this region.)
After taking these measurements, the cell sorter can redirect each individual yeast cell to one of two buckets (“select” or “reject”), based on its fluorescence. Normally, we are looking for cells that have strong expression (intense green) and strong binding (intense red). So we want to select the top right quadrant of the scatter plot, and reject everything else.
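The gating rule described above can be sketched in a few lines of Python. This is only an illustration, not the cell sorter's actual software: the `Cell` class, the `sort_cell` function, and the gate values are hypothetical names and numbers chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    green: float  # green fluorescence: amount of displayed protein (expression)
    red: float    # red fluorescence: amount of bound target (binding)


def sort_cell(cell: Cell, green_gate: float, red_gate: float) -> str:
    """Send a cell to "select" only if it falls in the top right quadrant."""
    if cell.green >= green_gate and cell.red >= red_gate:
        return "select"
    return "reject"


# Hypothetical gate positions and readings, in arbitrary fluorescence units.
GREEN_GATE = 1000.0
RED_GATE = 1000.0

cells = [
    Cell(green=5000.0, red=4000.0),  # expresses well and binds the target
    Cell(green=5000.0, red=50.0),    # expresses well, but no target bound
    Cell(green=80.0, red=60.0),      # poor expression
]
buckets = [sort_cell(c, GREEN_GATE, RED_GATE) for c in cells]
print(buckets)
```

For an expression-only sort, the same function could be called with `red_gate=0.0`, so that only the green gate matters.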
After sorting, we end up with a “select” bucket of all the yeast cells displaying successful binders (these were cells with intense red and green fluorescence, indicating that they express well and stick to the target).
The last step of this experiment is to figure out which proteins were displayed on those cells. There were thousands of designs in our library; which ones stick to the target?
For this, we use DNA sequencing to read the genes of everything in our “select” bucket. If we read a gene encoding one of our designs, then we know that a yeast cell displaying our design was sorted into the select bucket, and so it must have had strong red and green fluorescence.
The final output of our experiment is a list of genes that were found in the "select" bucket, and the number of times we read each gene. If our bucket contains multiple, identical yeast cells with the same gene, then we expect to see multiple reads of that gene.
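As a rough sketch of this counting step, the snippet below tallies reads per design. The gene sequences, design names, and the `count_reads` helper are all made up for illustration; a real sequencing pipeline involves considerably more processing than an exact-match lookup.

```python
from collections import Counter

# Hypothetical lookup from gene sequence to design ID (sequences are made up).
GENE_TO_DESIGN = {
    "ATGGCTAAA": "design_A",
    "ATGCCGTTT": "design_B",
}


def count_reads(reads):
    """Count sequencing reads per design, ignoring reads that match no design."""
    counts = Counter()
    for read in reads:
        design_id = GENE_TO_DESIGN.get(read)
        if design_id is not None:
            counts[design_id] += 1
    return counts


# Two cells carrying design_A, one carrying design_B, and one unrecognized read.
reads = ["ATGGCTAAA", "ATGGCTAAA", "ATGCCGTTT", "ATGNNNNNN"]
counts = count_reads(reads)
print(counts["design_A"], counts["design_B"])
```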
Below is a preview of the data from this experiment. You can download the data for all 100 Foldit designs here.
design_id      counts1  counts2  counts3  counts4  counts5  counts6  DDG      SASA      SC     BUNS
2009432_c0003  21       0        0        0        0        0        -26.908  946.664   0.600  9
2009432_c0004  57       3        0        0        0        0        -35.443  1198.221  0.669  8
2009432_c0006  29       0        3        0        0        0        -40.365  1386.322  0.647  10
2009432_c0007  17       0        5        0        1        0        -53.948  1635.076  0.679  15
2009432_c0009  67       0        0        0        0        0        -31.730  1032.899  0.665  6
2009432_c0010  94       0        0        0        0        0        -31.894  1267.798  0.672  10
2009432_c0011  57       0        0        0        0        0        -30.796  1122.379  0.553  9
2009432_c0012  111      1        0        0        0        0        -37.067  1340.479  0.641  10
2009432_c0014  5        0        0        0        0        0        -44.323  1378.069  0.554  13
2009432_c0016  16       0        0        0        0        0        -39.257  1460.892  0.649  10
...
In the table above, you can see that each design has six “counts” columns. These correspond to six different FACS experiments with the IL6R binder library, which we'll describe below:
- Expression (no target)
- Binding at 1000 nM
- Binding at 100 nM
- Binding at 10 nM
- Binding at 1 nM
- Binding at 0.1 nM
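If you download the data, a row can be split into labeled counts along these lines. This sketch assumes that counts1 is the expression sort (experiment #1) and that counts2 through counts6 follow the binding concentrations from 1000 nM down to 0.1 nM. The `EXPERIMENTS` labels and the `parse_row` helper are names invented for this example.

```python
# Assumed mapping of counts1..counts6 to the six sorts, in order.
EXPERIMENTS = [
    "expression",
    "binding_1000nM",
    "binding_100nM",
    "binding_10nM",
    "binding_1nM",
    "binding_0.1nM",
]


def parse_row(line):
    """Split one whitespace-delimited row into a design ID and labeled counts."""
    fields = line.split()
    design_id = fields[0]
    counts = [int(x) for x in fields[1:7]]
    return design_id, dict(zip(EXPERIMENTS, counts))


# One row copied from the preview table.
design_id, counts = parse_row(
    "2009432_c0007 17 0 5 0 1 0 -53.948 1635.076 0.679 15"
)
print(design_id, counts)
```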
Sorting for expression
In experiment #1, we try to measure how well the yeast can express and display our designed proteins. We don’t mix the target IL6R protein with our yeast and we don’t measure red fluorescence for binding. We only select yeast with strong green fluorescence, collecting cells that have lots of designed protein displayed on their surface.
The expression experiment is a helpful control for the later binding experiments, but it can also tell us something about how well our proteins behave. Stable, well folded proteins are easily displayed by the yeast, and these yeast will have strong green fluorescence. In contrast, unstable, poorly folded proteins are less likely to be displayed, and will show weaker fluorescence.
For many of the Foldit designs, the sequencing counts from experiment #1 are a little low. The median expression count for a design in this entire library was about 50, and only a third of the Foldit designs met this threshold. This suggests that some of these protein designs are not folding very well.
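To make the comparison concrete, here is a small sketch that checks the ten designs from the preview table against the library median of about 50. It treats "meeting the threshold" as a count of at least 50, which is an assumption, and the `fraction_at_or_above` helper is a name invented here. Note that the ten preview rows are only a small sample, so the result will not necessarily match the one-third figure for all 100 Foldit designs.

```python
# counts1 (expression sort) for the ten Foldit designs in the preview table.
preview_expression_counts = {
    "2009432_c0003": 21,
    "2009432_c0004": 57,
    "2009432_c0006": 29,
    "2009432_c0007": 17,
    "2009432_c0009": 67,
    "2009432_c0010": 94,
    "2009432_c0011": 57,
    "2009432_c0012": 111,
    "2009432_c0014": 5,
    "2009432_c0016": 16,
}

LIBRARY_MEDIAN = 50  # approximate library-wide median quoted in the post


def fraction_at_or_above(counts, threshold):
    """Fraction of designs whose count is at least the threshold."""
    hits = sum(1 for c in counts.values() if c >= threshold)
    return hits / len(counts)


fraction = fraction_at_or_above(preview_expression_counts, LIBRARY_MEDIAN)
print(f"{fraction:.0%} of the preview designs reach the library median")
```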
This is in line with our expectations. When Foldit players design monomer proteins from scratch, we see about a 50% success rate for good folding in the lab (50% is very good by protein design standards!). Binder design is harder than bare monomer design, because we generally have to sacrifice folding stability to optimize binding. So we should expect that fewer than 50% of binder designs will fold properly.
Sorting for binding
After selecting for expression, we can start selecting designs from our library based on binding.
This time we mix our yeast cells with red-tagged target IL6R that is free in solution. In the early experiments we mix with a high concentration of the target (1000 nM).
A binding measurement at high concentrations of target is a lenient test for binding. There are lots of target molecules floating around, so even weak binders are likely to have some target stuck to them.
After letting the yeast cells equilibrate with the target in solution, we pass the yeast through the cell sorter and measure the intensity of both red and green light. If a cell lights up for both expression and binding (in the top right quadrant), then we send it to the select bucket for sequencing.
Figure 3. FACS scatter plots. (A) The fluorescence measurements from expression experiment #1. We see two clusters of cells in the bottom left and bottom right quadrants, representing cells with poor expression and high expression, respectively. We select everything in the bottom right quadrant. Note that this experiment does not include any IL6R target, so there is no red fluorescent signal for binding (there are no cells in the top left or top right quadrants). (B) The fluorescence measurements from binding experiment #2. After incubating the yeast cells with target IL6R, we see that some cells have both green and red fluorescence (the top right quadrant). This indicates both strong expression and also strong binding.
We typically repeat the binding experiment, reducing the concentration of target each time. Binding measurements at low concentrations of target provide a stringent test for binding. At 0.1 nM target concentration, we are likely to see binder and target stuck together only if they bind very tightly.
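One way to see why low concentrations are a stringent test is the textbook model for simple 1:1 binding at equilibrium, where the fraction of binder occupied by target is [target] / ([target] + Kd). The sketch below applies that model to the five concentrations used here. It assumes the target is in large excess over the displayed binder, and the two Kd values are hypothetical examples, not measurements from this experiment.

```python
def fraction_bound(target_nM, kd_nM):
    """Equilibrium fraction of displayed binder occupied by target (1:1 binding)."""
    return target_nM / (target_nM + kd_nM)


# Target concentrations used in the five binding sorts.
CONCENTRATIONS_NM = [1000, 100, 10, 1, 0.1]

# Hypothetical dissociation constants, for illustration only.
HYPOTHETICAL_KDS_NM = {"weak binder": 100.0, "tight binder": 0.1}

for label, kd in HYPOTHETICAL_KDS_NM.items():
    for conc in CONCENTRATIONS_NM:
        print(f"{label}: {conc} nM target -> {fraction_bound(conc, kd):.3f} bound")
```

Under this model, the hypothetical weak binder is mostly occupied at 1000 nM but nearly empty at 0.1 nM, while the tight binder is still half occupied at 0.1 nM.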
We see very low sequencing counts for all of the Foldit designs, even at high concentration of target, which indicates zero binders. Some designs show a couple of reads in one or two of the binding experiments, but this is within the range of noise that we would expect for zero binders.
Why didn't the Foldit designs bind to the target?
These results are slightly disappointing, but we should not be too discouraged!
Although none of our Foldit designs bound to the IL6R target, we did see a few binders from the designs by IPD researchers. Below are the counts from the tightest IPD binder:
design_id   counts1  counts2  counts3  counts4  counts5  counts6  DDG      SASA      SC     BUNS
IPD_design  144      38       69       56       13       52       -39.114  1720.442  0.640  9
Figure 4. An IPD-designed protein binder with exceptional binder metrics, which appears to bind IL6R. The IL6R library included thousands of proteins designed by IPD researchers with highly optimized binder metrics. Only a handful of designs successfully bound to the target.
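To put these counts side by side with the Foldit designs, the sketch below sums the reads from the five binding sorts (counts2 through counts6) for the IPD design and for the three Foldit preview designs that had any binding reads at all. The `summarize` helper is a name invented for this example, and raw read totals are only a rough comparison, since they are not normalized between sorts.

```python
# counts2..counts6 (the five binding sorts), copied from the tables in this post.
binding_counts = {
    "IPD_design": [38, 69, 56, 13, 52],
    "2009432_c0004": [3, 0, 0, 0, 0],
    "2009432_c0007": [0, 5, 0, 1, 0],
    "2009432_c0012": [1, 0, 0, 0, 0],
}


def summarize(counts):
    """Total binding reads, and the number of sorts with at least one read."""
    return {
        "total_reads": sum(counts),
        "sorts_with_reads": sum(1 for c in counts if c > 0),
    }


summary = {name: summarize(c) for name, c in binding_counts.items()}
for name, stats in summary.items():
    print(name, stats)
```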
Why did we see binding from IPD designs but not from Foldit designs? The IPD designs had exceptional binder metrics. Recall from our previous blogpost that certain metrics seem to correlate with good binding (DDG, SASA, BUNS, shape complementarity). If we rank the tested designs using these metrics, we find that this IPD design outranks all but three of our Foldit designs.
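This post does not spell out how the designs were ranked, so the following is just one simple possibility: a rank-sum across the four metrics, shown on three of the preview designs plus the IPD design. It assumes that lower DDG, higher SASA, higher shape complementarity, and fewer BUNS are better. The `rank_sum` function and the `HIGHER_IS_BETTER` table are names invented for this sketch.

```python
# Metric values copied from the tables in this post.
designs = {
    "2009432_c0003": {"DDG": -26.908, "SASA": 946.664, "SC": 0.600, "BUNS": 9},
    "2009432_c0007": {"DDG": -53.948, "SASA": 1635.076, "SC": 0.679, "BUNS": 15},
    "2009432_c0009": {"DDG": -31.730, "SASA": 1032.899, "SC": 0.665, "BUNS": 6},
    "IPD_design": {"DDG": -39.114, "SASA": 1720.442, "SC": 0.640, "BUNS": 9},
}

# Assumed direction of "better" for each metric.
HIGHER_IS_BETTER = {"DDG": False, "SASA": True, "SC": True, "BUNS": False}


def rank_sum(designs):
    """For each design, add up how many designs beat it on each metric.

    A lower total means a better overall rank; tied values do not count as beaten.
    """
    totals = {name: 0 for name in designs}
    for metric, higher in HIGHER_IS_BETTER.items():
        for name, values in designs.items():
            v = values[metric]
            beaten_by = sum(
                1
                for other in designs.values()
                if (other[metric] > v if higher else other[metric] < v)
            )
            totals[name] += beaten_by
    return totals


totals = rank_sum(designs)
for name in sorted(totals, key=totals.get):
    print(name, totals[name])
```

A rank-sum weighs all four metrics equally, which is a design choice of this sketch; a different weighting could order the designs differently.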
In order to design successful protein binders in Foldit, we will need to focus on these binder metrics. If we can make these metrics available in Foldit puzzles, we are confident that Foldit players will be able to optimize them just as well as IPD researchers. To that end, the Foldit team has been working to add new Objectives that can compute all of these metrics in Foldit. We should be able to release the first prototype Objectives in an update very soon!
Another important consideration here is the sheer number of IPD designs tested. The library for this experiment included thousands of IPD designs, and all of them had top-tier binding metrics like the one above. Even with those thousands of designs, we only got a few binder hits out of the library.
Unfortunately, such high failure rates are typical for protein binder experiments. We have to remember that protein design is a difficult challenge with many pitfalls, and our understanding of protein folding and binding is imperfect. To succeed in protein binder design, we will need to generate lots of designs to test.
What's next for Foldit?
The Foldit designs in this experiment came from just the first two rounds of the anti-inflammatory puzzles, back in April. Since then, we’ve seen even more great designs from Foldit players, and we’ll continue to run binder design puzzles as we work to improve the Foldit tools.
Soon Foldit will have prototype Objectives for calculating DDG, SASA, and shape complementarity. Already, it seems that players have been able to use the new BUNS Objective to improve designs in recent weeks.
We’re excited to keep pressing on the problem of protein binder design! We are used to tackling hard problems in Foldit, and we tend to learn a lot about proteins in the process. We think that Foldit players have a lot to contribute in this arena, and we’ll be looking to tackle new (and harder) targets in the coming months.
Remember that we also have an experiment under way to test Foldit-designed binders for the coronavirus spike protein, and we should have results from that experiment soon. So stay tuned for more, and happy folding!
(Posted by bkoep | Tue, 06/30/2020 - 22:34)