Experiment results for IL6R binders
The results from our IL6R binder experiment are back! This experiment tested 100 Foldit designs from the first two rounds of our Coronavirus Anti-inflammatory puzzles, to see if any of them bind to the IL6R target.
In short, we did not see any successful binding from the Foldit designs. This is unfortunate, but we should not be too discouraged! Read on for more details about the experiment, and what these results mean for Foldit (hint: more binder design puzzles!).
This is a long blog post, broken into a few different sections. First, we’ll explain some background about DNA libraries and fluorescence-activated cell sorting techniques that were used for this experiment. Then we’ll go over the experiment results for protein expression and target binding. Finally, we’ll close out with some discussion about these results, and thoughts about what’s next for Foldit.
In order to test lots of proteins at once, we order a custom DNA library. A DNA library is a mixed pool containing thousands of different DNA genes that encode our designed proteins.
In this experiment, the library includes genes for 100 Foldit player designs and thousands of designs from IPD researchers. All of these designs are intended to bind to the IL6R target.
We insert this mixture of genes into a yeast culture so that each yeast cell gets a gene for just one binder design.
We insert our designed gene alongside a companion gene that encodes a yeast membrane protein. When these genes are decoded, our designed protein is linked to the companion membrane protein. The yeast cell exports these to the cell membrane, so that our designed binder is displayed on the outside of the yeast cell, but is still tethered to the companion protein embedded in the membrane.
Although we expect the yeast cell to have lots of binders on the surface, those binders should all be identical since they came from the same gene.
Figure 1. A DNA library is a mixture with DNA genes encoding thousands of protein designs. The genes are inserted into yeast cells so that the yeast cells can decode the genes and express the designed proteins. The yeast cells export the designed proteins to the cell membrane so that they are displayed on the yeast surface.
Now we have a culture with millions and millions of yeast cells, which are displaying our library with thousands of different binder designs. Each yeast cell displays only one of the designs from the library; but there may be many identical yeast cells that each display the same design.
Fluorescence-activated cell sorting (FACS)
Now that our designed protein is displayed on the yeast surface, we tag the protein with a fluorescent molecule that emits green light. The intensity of green fluorescence corresponds to the amount of protein displayed on the yeast surface (higher intensity = more protein).
In a separate tube, our target protein (IL6R) is free-floating in solution, and we tag it with a different fluorescent molecule that emits red light.
Then we mix the free-floating target IL6R with our yeast cells. We expect the target will stick to binders that are displayed on the yeast surface. However, if one of our designed proteins does not bind the target, then no target molecules will stick to that yeast cell.
Now we'd like to measure how much target is stuck to each yeast cell. We use a microfluidics device to pass yeast cells, one at a time, in front of a sensitive photometer, which measures the intensity of green and red fluorescence in two separate measurements.
These two measurements are typically plotted as a scatter plot. Each point represents one yeast cell, where the x-axis is intensity of green fluorescence (the amount of displayed protein), and the y-axis is intensity of red fluorescence (the amount of bound target).
Figure 2. (A) Green-tagged designs are tethered to the yeast surface, while red-tagged target is free-floating. If a design successfully binds the target, then a yeast cell will have high-intensity green and red fluorescence. (B) FACS scatter plot of yeast fluorescence measurements. Each point is a yeast cell, with green fluorescence (expression) on the x-axis, and red fluorescence (binding) on the y-axis. Points in the top right corner represent cells with both red and green fluorescence, indicating good expression and binding. (Note that the colors in the plot represent point density; for example, the patch of red near the center of the plot means there are lots of overlapping points in this region.)
After taking these measurements, the cell sorter can redirect each individual yeast cell to one of two buckets (“select” or “reject”), based on their fluorescence. Normally, we are looking for cells that have strong expression (intense green) and strong binding (intense red). So we want to select the top right quadrant of the scatter plot, and reject everything else.
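To make the quadrant selection concrete, here is a minimal sketch in Python. It is only an illustration: the function name `gate`, the threshold values, and the example intensities are all hypothetical, and on a real cell sorter the gates are drawn by the experimenter on the scatter plot.

```python
# Hypothetical illustration of quadrant gating; thresholds and data are made up.
def gate(green, red, green_min=1000.0, red_min=1000.0):
    """Return 'select' for cells in the top right quadrant, else 'reject'."""
    if green >= green_min and red >= red_min:
        return "select"
    return "reject"

# Each tuple is one yeast cell: (green intensity, red intensity).
cells = [(5000.0, 4000.0), (5000.0, 50.0), (80.0, 60.0), (90.0, 3000.0)]
buckets = [gate(g, r) for g, r in cells]
print(buckets)
```

Only the first cell, which is bright in both channels, lands in the "select" bucket. A cell that expresses well but shows no red signal is rejected, and so is a cell with red signal but poor expression.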
After sorting, we end up with a “select” bucket of all the yeast cells displaying successful binders (these were cells with intense red and green fluorescence, indicating that they express well and stick to the target).
The last step of this experiment is to figure out which proteins were displayed on those cells. There were thousands of designs in our library; which ones stick to the target?
For this, we use DNA sequencing to read the genes of everything in our “select” bucket. If we read a gene encoding one of our designs, then we know that a yeast cell displaying our design was sorted into the select bucket, and so it must have had strong red and green fluorescence.
The final output of our experiment is a list of genes that were found in the "select" bucket, and the number of times we read each gene. If our bucket contains multiple, identical yeast cells with the same gene, then we expect to see multiple reads of that gene.
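As a rough sketch of that final tallying step, suppose each sequencing read has already been matched to a design. The design names and the list of reads below are hypothetical; real reads are DNA sequences that first have to be mapped back to the library.

```python
from collections import Counter

# Hypothetical reads from the "select" bucket, already mapped to design IDs.
reads = ["design_A", "design_B", "design_A", "design_A", "design_C"]

# Count how many times each design was read.
counts = Counter(reads)
for design_id, n in counts.most_common():
    print(design_id, n)
```

A design that never shows up in the bucket simply has a count of zero.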
Below is a preview of the data from this experiment. You can download the data for all 100 Foldit designs here.
design_id      counts1  counts2  counts3  counts4  counts5  counts6  DDG      SASA      SC     BUNS
2009432_c0003  21       0        0        0        0        0        -26.908  946.664   0.600  9
2009432_c0004  57       3        0        0        0        0        -35.443  1198.221  0.669  8
2009432_c0006  29       0        3        0        0        0        -40.365  1386.322  0.647  10
2009432_c0007  17       0        5        0        1        0        -53.948  1635.076  0.679  15
2009432_c0009  67       0        0        0        0        0        -31.730  1032.899  0.665  6
2009432_c0010  94       0        0        0        0        0        -31.894  1267.798  0.672  10
2009432_c0011  57       0        0        0        0        0        -30.796  1122.379  0.553  9
2009432_c0012  111      1        0        0        0        0        -37.067  1340.479  0.641  10
2009432_c0014  5        0        0        0        0        0        -44.323  1378.069  0.554  13
2009432_c0016  16       0        0        0        0        0        -39.257  1460.892  0.649  10
...
In the table above, you can see that each design has six “counts” columns. These correspond to six different FACS experiments with the IL6R binder library, which we'll describe below:
- Expression (no target)
- Binding at 1000 nM
- Binding at 100 nM
- Binding at 10 nM
- Binding at 1 nM
- Binding at 0.1 nM
Sorting for expression
In experiment #1, we try to measure how well the yeast can express and display our designed proteins. We don’t mix the target IL6R protein with our yeast and we don’t measure red fluorescence for binding. We only select yeast with strong green fluorescence, collecting cells that have lots of designed protein displayed on their surface.
The expression experiment is a helpful control for the later binding experiments, but it can also tell us something about how well our proteins behave. Stable, well folded proteins are easily displayed by the yeast, and these yeast will have strong green fluorescence. In contrast, unstable, poorly folded proteins are less likely to be displayed, and will show weaker fluorescence.
For many of the Foldit designs, the sequencing counts from experiment #1 are a little low. The median expression count for a design in this entire library was about 50, and only a third of the Foldit designs met this threshold. This suggests that some of these protein designs are not folding very well.
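Here is a small sketch of that comparison, using only the ten designs from the preview table above. The cutoff of 50 is the approximate library median quoted in this post; the variable names are our own. Note that ten rows are not representative of all 100 Foldit designs, so the fraction printed here should not be read as the overall result.

```python
# counts1 (expression) for the ten designs shown in the preview table.
expression_counts = {
    "2009432_c0003": 21, "2009432_c0004": 57, "2009432_c0006": 29,
    "2009432_c0007": 17, "2009432_c0009": 67, "2009432_c0010": 94,
    "2009432_c0011": 57, "2009432_c0012": 111, "2009432_c0014": 5,
    "2009432_c0016": 16,
}
LIBRARY_MEDIAN = 50  # approximate library-wide median quoted in the post

# Keep the designs whose expression count reaches the library median.
well_expressed = sorted(
    d for d, n in expression_counts.items() if n >= LIBRARY_MEDIAN
)
print(len(well_expressed), "of", len(expression_counts), "preview designs reach the median")
```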
This is in line with our expectations. When Foldit players design monomer proteins from scratch, we see about a 50% success rate for good folding in the lab (50% is very good by protein design standards!). Binder design is harder than bare monomer design, because we generally have to sacrifice folding stability to optimize binding. So we should expect that fewer than 50% of binder designs will fold properly.
Sorting for binding
After selecting for expression, we can start selecting designs from our library based on binding.
This time we mix our yeast cells with red-tagged target IL6R that is free in solution. In the early experiments we mix with a high concentration of the target (1000 nM).
A binding measurement at high concentrations of target is a lenient test for binding. There are lots of target molecules floating around, so even weak binders are likely to have some target stuck to them.
After letting the yeast cells equilibrate with the target in solution, we pass the yeast through the cell sorter and measure the intensity of both red and green light. If a cell lights up for both expression and binding (in the top right quadrant), then we send it to the select bucket for sequencing.
Figure 3. FACS scatter plots. (A) The fluorescence measurements from expression experiment #1. We see two clusters of cells in the bottom left and bottom right quadrants, representing cells with poor expression and high expression, respectively. We select everything in the bottom right quadrant. Note that this experiment does not include any IL6R target, so there is no red fluorescent signal for binding (there are no cells in the top left or top right quadrants). (B) The fluorescence measurements from binding experiment #2. After incubating the yeast cells with target IL6R, we see that some cells have both green and red fluorescence (the top right quadrant). This indicates both strong expression and also strong binding.
We typically repeat the binding experiment, reducing the concentration of target each time. Binding measurements at low concentrations of target provide a stringent test for binding. At 0.1 nM target concentration, we are likely to see binder and target stuck together only if they bind very tightly.
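To see why the target concentration sets the stringency, it helps to look at the standard equilibrium model for simple one-to-one binding, where the fraction of displayed binder occupied by target is [T] / ([T] + Kd). The sketch below assumes this simple model and assumes the target is in large excess over the binder. The two Kd values are hypothetical and are not measurements from this experiment.

```python
def fraction_bound(target_nM, kd_nM):
    """Fraction of binder occupied at equilibrium for simple 1:1 binding."""
    return target_nM / (target_nM + kd_nM)

concentrations_nM = [1000, 100, 10, 1, 0.1]  # the five binding sorts

# Hypothetical dissociation constants (Kd), chosen only for illustration.
for label, kd in [("weak binder", 500.0), ("tight binder", 0.5)]:
    row = [round(fraction_bound(c, kd), 3) for c in concentrations_nM]
    print(label, row)
```

Under these assumptions, the weak binder is mostly occupied at 1000 nM but holds almost no target at 0.1 nM, while the tight binder still holds a measurable amount at the lowest concentration.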
We see very low sequencing counts for all of the Foldit designs, even at high concentration of target, which indicates zero binders. Some designs show a couple of reads in one or two of the binding experiments, but this is within the range of noise that we would expect for zero binders.
Why didn't the Foldit designs bind to the target?
These results are slightly disappointing, but we should not be too discouraged!
Although none of our Foldit designs bound to the IL6R target, we did see a few binders from the designs by IPD researchers. Below are the counts from the tightest IPD binder:
design_id   counts1  counts2  counts3  counts4  counts5  counts6  DDG      SASA      SC     BUNS
IPD_design  144      38       69       56       13       52       -39.114  1720.442  0.640  9
Figure 4. An IPD-designed protein binder with exceptional binder metrics, which appears to bind IL6R. The IL6R library included thousands of proteins designed by IPD researchers with highly optimized binder metrics. Only a handful of designs successfully bound to the target.
Why did we see binding from IPD designs but not from Foldit designs? The IPD designs had exceptional binder metrics. Recall from our previous blogpost that certain metrics seem to correlate with good binding (DDG, SASA, BUNS, shape complementarity). If we rank the tested designs using these metrics, we find that this IPD design outranks all but three of our Foldit designs.
In order to design successful protein binders in Foldit, we will need to focus on these binder metrics. If we can make these metrics available in Foldit puzzles, we are confident that Foldit players will be able to optimize them just as well as IPD researchers. To that end, the Foldit team has been working to add new Objectives that can compute all of these metrics in Foldit. We should be able to release the first prototype Objectives in an update very soon!
Another important consideration here is the sheer number of IPD designs tested. The library for this experiment included thousands of IPD designs, and all of them had top-tier binding metrics like the one above. Even with those thousands of designs, we only got a few binder hits out of the library.
Unfortunately, such high failure rates are typical for protein binder experiments. We have to remember that protein design is a difficult challenge with many pitfalls, and our understanding of protein folding and binding is imperfect. To succeed in protein binder design, we will need to generate lots of designs to test.
What's next for Foldit?
The Foldit designs in this experiment came from just the first two rounds of the anti-inflammatory puzzles, back in April. Since then, we’ve seen even more great designs from Foldit players, and we’ll continue to run binder design puzzles as we work to improve the Foldit tools.
Soon Foldit will have prototype Objectives for calculating DDG, SASA, and shape complementarity. Already, it seems that players have been able to use the new BUNS Objective to improve designs in recent weeks.
We’re excited to keep pressing on the problem of protein binder design! We are used to tackling hard problems in Foldit, and we tend to learn a lot about proteins in the process. We think that Foldit players have a lot to contribute in this arena, and we’ll be looking to tackle new (and harder) targets in the coming months.
Remember that we also have an experiment under way to test Foldit-designed binders for the coronavirus spike protein, and we should have results from that experiment soon. So stay tuned for more, and happy folding!
( Posted by bkoep 83 833 | Tue, 06/30/2020 - 22:34 | 9 comments )
Reaction Design Tool
The new Reaction Design Tool is live, and we have a puzzle headed your way! First, a little bit about the tool itself. You all have been amazing in the realm of protein design, and now it’s time to step into the world of small molecule design. One approach in small molecule design is to modify or individually place each atom. This is a great approach, but it can have some shortcomings, like the creation of chemically infeasible molecules. The last thing we want is to create a wonderfully scoring small molecule that wouldn’t be possible in the real world, or worse, would explode! So, the way around this is to use a reaction-based approach. With this approach you will be given fragments of a small molecule, and it’s up to you to find the best way to combine them. The great thing about these fragments, or reactants as they will be known in the game, is that they are already determined to be synthesizable. This means that the small molecules you create can be produced in a lab, and possibly used for therapeutics.
The layout of the tool is in three major parts. First, at the top of the tool is the Reaction Panel. This panel allows you to choose the base of your new small molecule, or ligand as it will be known in game. These reaction options are the center of your new ligand. The reactions are surrounded by black spheres. Let’s call these linking atoms. These linking atoms are simply there to denote where your chosen fragments will connect to your reaction base. Note: linking atoms will mostly appear this way, but not always.
The second major part of the tool is the Reactant Panel. The reactant panel is where each of the fragments is stored. In some puzzles you will only have one reactant to choose from, while in others you will be able to combine two or three. These reactants also display linking atoms, so you can see how the reactant you choose will connect with the reaction base.
The last part of the tool is the Accept Button. The accept button allows you to realize your creation in the context of the protein. Once you have selected your reaction base and your reactants, you will notice that not only is your ligand glowing blue, but it is now in the shape of the ligand you are trying to create. Once you are satisfied, click the accept button, and you will have created your new ligand! To get the best results you will need to optimize your newly created ligand, just like you would optimize a rebuilt/remixed loop. Wiggle, shake, and move your ligand to discover if it really is the best design for the protein.
Here are some tips to get you started. Just like the Genie from Aladdin, ligands have “phenomenal cosmic power!” OK, well, maybe not, but they are quite powerful when it comes to their interactions with proteins. However, just like the Genie, they have itty bitty living spaces. This living space is known as the activity pocket. You will need to design a ligand that best fits in this activity pocket. One way to do this is to look for hydrogen bonding. Hydrogen bonds help the ligand bind to the protein and therefore are immensely important. If, after wiggling, the ligand gets pushed out of the pocket, or if it appears to be bending and stretching in odd ways, try lowering the wiggle power. A little nudge can go a long way. Also, the Reaction Design Tool tries its best to fit your newly designed ligand to its starting structure. This means that a different starting structure could produce a better resulting ligand, because it is oriented differently in the activity pocket.
We really hope you all enjoy working with this new tool and are extremely excited to see what you all come up with. Expect more small molecule/ligand design puzzles in the future.
Be sure to check out the new Reaction Design tutorial in the Campaign Menu.
Happy folding everyone!
( Posted by jtscott 83 2324 | Wed, 06/24/2020 - 20:03 | 7 comments )
Anti-inflammatory designs queued for testing!
In our last blog post, we announced a similar experiment for binders to the coronavirus spike protein. (We had some issues getting the necessary materials for that experiment, but we've come up with a workaround and that experiment is back on track!) These latest designs will be tested using the same kind of experiment, using yeast display and flow cytometry techniques, but swapping in the IL6R target instead of spike protein. See our previous YouTube video for more.
IL6R is a protein found on human immune cells, and plays a role in the "cytokine storm" that can cause dangerous inflammation in severe cases of COVID-19. A protein that binds to IL6R might be useful as a drug to temper this inflammation. We'll be testing these 100 Foldit player-designed proteins to see if they bind to the IL6R in a controlled lab setting. It will be several weeks before the experiment results come in, but we'll continue running more anti-inflammatory puzzles to try and develop even better designs.
2009432_c0004 ZeroLeak7, puxatudo, Phyx, PLAYER_21, w1seguy, Bruno Kestemont
2009432_c0009 PLAYER_5, TheGUmmer
2009432_c0012 CharlieFortsConscience, Bletchley Park, georg137, spvincent
2009432_c0016 Bruno Kestemont
2009432_c0030 Steven Pletsch, AntiVaccine
2009432_c0034 Crossed Sticks
2009432_c0096 silent gene
2009432_c0101 PLAYER_4, spdenne
2009432_y6445 silent gene
2009565_c0001 ZeroLeak7, Bruno Kestemont, mirp, RockOn
2009565_c0002 Mike Lewis, Formula350, Skippysk8s, actiasluna, Joanna_H, Jpilkington
2009565_c0025 Scopper, NinjaGreg, RockOn
2009565_c0063 Mike Lewis
2009565_c0067 Bletchley Park
2009565_c0074 actiasluna, Formula350
2009565_c0075 Mike Lewis, Joanna_H, Jpilkington, ManVsYard
2009565_c0115 actiasluna, Formula350
2009565_y1631 silent gene
Coronavirus binder designs queued for testing!
After the first three rounds of our Coronavirus Binder Design challenge, we've selected 99 of the most promising Foldit player solutions for experimental testing!
Once a Foldit puzzle closes, we run some further analysis to figure out which designs are the most likely to fold and bind to the target. You can read more about some of that analysis on our previous blog post. To select promising designs, we consider Foldit score in addition to metrics that correlate with proper folding and others that correlate with binding.
We've combined those metrics to choose 33 designs from each of rounds one, two, and three of the Coronavirus Binder Design challenge. In total, 99 Foldit binder designs will be tested at the UW Institute for Protein Design, with the same experiments that have already begun for computationally-designed binders.
It will be a few more weeks before genes arrive and we can begin experiments on the Foldit designs. In the meantime, we'll continue to work on designing better binders in Foldit, so stay tuned for more puzzles! Be sure to review our tips for designing successful binders and watch coronavirus expert Lexi Walls, Ph.D. discuss early Foldit designs!
Below are the 99 designed proteins that we'll test for binding to the SARS-CoV-2 spike protein. Remember to fill out our username sharing form if you want to see your username in Foldit updates!
2008926_c0069 Galaxie, robgee, alwen
2008926_c0071 silent gene
2008926_c0193 silent gene
2008984_c0002 Caraline_nelson, Phyx, mirp, PLAYER_17, jeff101, silent gene
2008984_c0003 Bletchley Park, spvincent
2008984_c0036 silent gene, edpalas
2008984_c0046 Steven Pletsch, PLAYER_18
2008984_c0058 PLAYER_12, frood66
2008984_c0239 PLAYER_15, frood66
2008984_y9747 silent gene
2008984_y9800 silent gene
2009030_c0002 Bletchley Park, PLAYER_5, georg137, spvincent
2009030_c0005 Bletchley Park
2009030_c0009 Steven Pletsch, PLAYER_18
2009030_c0020 Galaxie, jamiexq
2009030_c0049 actiasluna, Jpilkington, ManVsYard
2009030_c0073 Crossed Sticks
2009030_c0105 Steven Pletsch
2009030_y0378 silent gene
2009030_y3873 PLAYER_10, PLAYER_17, Bruno Kestemont
2009030_y5708 Bruno Kestemont
Analysis of protein binder designs
Today, the Coronavirus Binder Design: Round 3 puzzle closed, and now Foldit scientists will carry out further computational analysis to try and pick out the most promising designs!
This blog post digs into some of the analysis we do after a Foldit puzzle closes, and how we select the most promising Foldit player designs for testing in the lab.
As you know, the goal of Foldit is to fold your protein to optimize the score, which consists of a base score plus any bonuses or penalties from the Objectives.
The base Foldit score comes from a sophisticated energy function which takes into account things like clashing, electrostatics, and H-bonding. This is used to compute the energy of a solution. In structure prediction puzzles, the base score is all we need to optimize, since we know that a real protein will fold into the shape with the optimal energy.
Objectives add to the base score, rewarding features of a solution that are not accounted for in our energy function. This is especially helpful in protein design puzzles, which are a bit more complicated than structure prediction. In protein design, it is not enough to simply optimize energy — we have to think about the entire energy landscape of our designed protein. We use Objectives to promote features (like a buried core) that are known to improve the energy landscape of a designed protein.
Similarly, when we design protein binders, we like to calculate additional metrics that are not in the base score but that tend to correlate with strong binding. These metrics are not currently available as Foldit Objectives (we are working on it!), so this analysis is carried out by Foldit scientists after a puzzle closes.
Note that the following binder metrics only address the interactions between two folded proteins. They assume that the designed protein will be correctly folded, which is not always a given. We run a different set of analyses (discussed previously) to predict whether the binder will fold properly. However, we already have ample evidence that Foldit players can design well-folded proteins!
Binding Energy (DDG)
This calculates how the energy of the entire system is affected by binding and best reflects the actual physics of a molecular binding interaction. A more negative DDG (or ΔΔG) indicates stronger binding.
We start by calculating the energy of both proteins in the bound state (ΔGbound), with the binder and target in contact. Then we calculate the energy of both proteins in the unbound state (ΔGunbound), with the binder and target free in solution. The DDG is the difference, or delta (Δ), between these two numbers (ΔGbound - ΔGunbound). If the DDG is negative, it means that the bound state is more stable than the unbound state, so the binder should spontaneously stick to the target.
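As a tiny worked example of that subtraction, here is a sketch in Python. The two energy values are hypothetical numbers chosen only to show the sign convention; they are not taken from any Foldit design.

```python
def ddg(g_bound, g_unbound):
    """Binding energy: energy of the bound state minus the unbound state."""
    return g_bound - g_unbound

# Hypothetical energies in arbitrary energy units, for illustration only.
g_bound = -350.0
g_unbound = -310.0

# A negative result means the bound state is more stable than the unbound state.
print(ddg(g_bound, g_unbound))
```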
Interface Surface Area (SASA)
We also see that tight binding is correlated with the size of the binding interface. The larger the interface between two proteins, the tighter they tend to bind one another.
Our main concern here is the amount of water that is liberated from the protein surface upon binding. Normally the surface of every protein is surrounded by a “shell” of water molecules that have limited ways to make H-bonds with the protein surface. These water molecules have lower entropy than water molecules in bulk solvent. When two proteins bind together, they hide some of the protein surface that was previously exposed to shell water molecules. Those low-entropy waters are now free to diffuse into bulk solvent, thus increasing the entropy of the system and stabilizing the bound state.
For this reason, we measure the size of the interface in terms of solvent-accessible surface area, or SASA. This measures ONLY the part of the surface that is accessible to water (so small nooks and crannies are omitted). Similar to the DDG calculation above, we first measure the total SASA for the binder and target in the bound state, and then again in the unbound state. The difference in SASA between the bound and unbound states is proportional to the amount of water that is freed when the binder and target come into contact.
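The bookkeeping for this measurement can be sketched as follows. The function name and the surface areas are hypothetical, and the sketch assumes that the SASA of each protein and of the complex has already been computed by some other tool.

```python
def buried_sasa(sasa_binder_free, sasa_target_free, sasa_complex):
    """Surface area hidden from water when binder and target come together."""
    return (sasa_binder_free + sasa_target_free) - sasa_complex

# Hypothetical surface areas in square angstroms, for illustration only.
binder_free = 4000.0
target_free = 12000.0
complex_bound = 14500.0
print(buried_sasa(binder_free, target_free, complex_bound))
```

The complex exposes less surface than the two free proteins combined, and that difference is the interface area we are interested in.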
Shape Complementarity (SC)
Shape complementarity (SC) measures how well two objects fit together. A glove, for example, has very high shape complementarity for a hand. If two proteins have complementary shapes (SC approaching 1.0), then they will fit together snugly, making close packing interactions and efficiently displacing surface water molecules.
We measure the SC of two proteins by comparing their surface contours along the interface (as defined in this 1993 paper). Mathematically speaking, we consider a vector that is perpendicular to the surface of the binder, and a corresponding vector at the surface of the target. If these two vectors point in the same direction, then the surface contours of binder and target are similar at this region. By comparing vector pairs spread across the interface, we arrive at a single number describing how well the shape of the binder fits against the shape of the target.
Shape complementarity. The upper part of this interface has a high shape complementarity, and corresponding pairs of vectors (like a and a') point in the same direction. The lower part of this interface has low shape complementarity; vector pairs in this region (like c and c') point in different directions.
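The vector comparison can be illustrated with a simplified sketch. This is not the full calculation from the 1993 paper: the distance weighting, its constant `w`, the use of a single median, and the example vectors are all assumptions made for illustration. The target vector here follows the convention described above, so that a well-matched pair points in the same direction.

```python
import math
from statistics import median

def pair_score(n_binder, n_target, distance, w=0.5):
    """Dot product of two unit vectors, down-weighted by the gap between surfaces."""
    dot = sum(a * b for a, b in zip(n_binder, n_target))
    return dot * math.exp(-w * distance ** 2)

def shape_complementarity(pairs):
    """Median pair score over sampled interface points."""
    return median(pair_score(nb, nt, d) for nb, nt, d in pairs)

# Hypothetical pairs: (binder vector, target vector, gap in angstroms).
well_matched = [((0, 0, 1), (0, 0, 1), 0.0)] * 3
mismatched = [((0, 0, 1), (1, 0, 0), 0.0)] * 3
print(shape_complementarity(well_matched), shape_complementarity(mismatched))
```

Pairs of vectors that point the same way across a tight interface score near 1, while perpendicular vectors or a wide gap between the surfaces pull the score toward 0.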
Buried Unsatisfied Polar Atoms (BUNS)
Polar atoms like oxygens and nitrogens are most stable when they make hydrogen bonds, either with the water surrounding the protein, or with other polar atoms in the protein. If the interface between binder and target has polar atoms that cannot make hydrogen bonds, then binding is very unlikely.
We recently devoted an entire blog post to BUNS, so we won’t go into the details here. The important thing is that all polar atoms at the binding interface should make hydrogen bonds!
Binders against SARS-CoV-2 spike protein
In rounds one and two of the Coronavirus Binder Design challenge, Foldit players came up with thousands of solutions that achieve high scores within Foldit. This means they already have highly optimized energies and satisfy our protein design Objectives.
We’ve been calculating the binding metrics described above for those designs to see which ones are most likely to actually bind the target. Since we have a high-resolution crystal structure of the CoV spike protein target bound to the human ACE2 receptor, we can also calculate these binder metrics for the natural ACE2 interface.
This is an excellent binder design! Compared to the natural ACE2 receptor, this design is predicted to bind even more tightly, with a DDG of -45.0 kcal/mol! This interface has a slightly smaller surface area than ACE2, but 1794 Å² is still impressive. The natural ACE2 interface has a very high shape complementarity score of 0.73, but this Foldit player design is able to match it! And finally, we see that this design has fewer unsatisfied polar atoms at the interface, which should also work in our favor.
We’d like to caution readers that, even with these metrics, we are still not very good at predicting binders. Protein binder design is a very hard problem — one at the forefront of computational biology — and there are other physical factors that are difficult to account for. Even if our metrics look good on paper or on a computer, only laboratory testing will tell us whether these designer proteins actually fold and bind to the target.
Now that the Round 3 puzzle has closed, we will calculate binder metrics for those results as well. Then we will order genes for the best designs so that we can test them in the lab for binding! Meanwhile, check out the newest Coronavirus Binder Design: Round 4 puzzle, online now!
IMPORTANT: Please fill out the Foldit usernames and data analysis form, if you have not already! Out of concern for players’ privacy, we will not share the Foldit usernames associated with tested designs unless those players have given us permission in the form.
( Posted by bkoep 83 833 | Thu, 03/19/2020 - 23:22 | 17 comments )