Analysis of protein binder designs
Today, the Coronavirus Binder Design: Round 3 puzzle closed, and now Foldit scientists will carry out further computational analysis to try to pick out the most promising designs!
This blog post digs into some of the analysis we do after a Foldit puzzle closes, and how we select the most promising Foldit player designs for testing in the lab.
As you know, the goal of Foldit is to fold your protein to optimize the score, which consists of a base score plus any bonuses or penalties from the Objectives.
The base Foldit score comes from a sophisticated energy function which takes into account things like clashing, electrostatics, and H-bonding. This is used to compute the energy of a solution. In structure prediction puzzles, the base score is all we need to optimize, since we know that a real protein will fold into the shape with the optimal energy.
Objectives add to the base score, rewarding features of a solution that are not accounted for in our energy function. This is especially helpful in protein design puzzles, which are a bit more complicated than structure prediction. In protein design, it is not enough to simply optimize energy — we have to think about the entire energy landscape of our designed protein. We use Objectives to promote features (like a buried core) that are known to improve the energy landscape of a designed protein.
Similarly, when we design protein binders, we like to calculate additional metrics that are not in the base score but that tend to correlate with strong binding. These metrics are not currently available as Foldit Objectives (we are working on it!), so this analysis is carried out by Foldit scientists after a puzzle closes.
Note that the following binder metrics only address the interactions between two folded proteins. They assume that the designed protein will be correctly folded, which is not always a given. We run a different set of analyses (discussed previously) to predict whether the binder will fold properly. However, we already have ample evidence that Foldit players can design well-folded proteins!
Binding Energy (DDG)
The binding energy describes how the energy of the entire system changes upon binding. Of the metrics discussed here, it best reflects the actual physics of a molecular binding interaction. A more negative DDG (or ΔΔG) indicates stronger binding.
We start by calculating the energy of both proteins in the bound state (ΔGbound), with the binder and target in contact. Then we calculate the energy of both proteins in the unbound state (ΔGunbound), with the binder and target free in solution. The DDG is the difference, or delta (Δ), between these two numbers (ΔGbound - ΔGunbound). If the DDG is negative, it means that the bound state is more stable than the unbound state, so the binder should spontaneously stick to the target.
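As a rough illustration of the arithmetic above, here is a minimal Python sketch. The function names and the energy values are made up for this example; they are not taken from the Foldit analysis pipeline, and the real energies come from the energy function rather than being typed in by hand.

```python
def binding_ddg(dg_bound: float, dg_unbound: float) -> float:
    """Return DDG = dG_bound - dG_unbound, in the same units as the inputs."""
    return dg_bound - dg_unbound


def favors_binding(ddg: float) -> bool:
    """A negative DDG means the bound state is more stable than the unbound state."""
    return ddg < 0


# Hypothetical energies (kcal/mol), chosen only to show the sign convention.
example_bound = -845.0    # binder and target in contact
example_unbound = -800.0  # binder and target free in solution
example_ddg = binding_ddg(example_bound, example_unbound)
print(example_ddg, favors_binding(example_ddg))
```

Because the bound state in this made-up example has the lower energy, the DDG comes out negative and the sketch reports that binding is favored.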
Interface Surface Area (SASA)
We also see that tight binding is correlated with the size of the binding interface. The larger the interface between two proteins, the tighter they tend to bind one another.
Our main concern here is the amount of water that is liberated from the protein surface upon binding. Normally the surface of every protein is surrounded by a “shell” of water molecules that have limited ways to make H-bonds with the protein surface. These water molecules have lower entropy than water molecules in bulk solvent. When two proteins bind together, they hide some of the protein surface that was previously exposed to shell water molecules. Those low-entropy waters are now free to diffuse into bulk solvent, thus increasing the entropy of the system and stabilizing the bound state.
For this reason, we measure the size of the interface in terms of solvent-accessible surface area, or SASA. This measures ONLY the part of the surface that is accessible to water (so small nooks and crannies are omitted). Similar to the DDG calculation above, we first measure the total SASA for the binder and target in the bound state, and then again in the unbound state. The difference in SASA between the bound and unbound states is proportional to the amount of water that is freed when the binder and target come into contact.
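The bookkeeping here can be sketched in a few lines of Python. This assumes the three SASA values have already been computed by some other tool; the function name, the sign convention (unbound minus bound, so that a larger interface gives a larger positive number), and the example values are all assumptions made for illustration.

```python
def buried_sasa(sasa_binder_alone: float,
                sasa_target_alone: float,
                sasa_complex: float) -> float:
    """Surface area (in Å^2) hidden from water when binder and target come together.

    The unbound state is the sum of the two isolated proteins; the bound
    state is the complex measured as a single object.
    """
    sasa_unbound = sasa_binder_alone + sasa_target_alone
    return sasa_unbound - sasa_complex


# Hypothetical SASA values in Å^2, for illustration only.
example_interface = buried_sasa(5000.0, 8000.0, 11500.0)
print(example_interface)
```

In this made-up case, 1500 Å² of surface that was exposed to shell water in the unbound state is covered up in the complex.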
Shape Complementarity (SC)
Shape complementarity (SC) measures how well two objects fit together. A glove, for example, has very high shape complementarity for a hand. If two proteins have complementary shapes (SC approaching 1.0), then they will fit together snugly, making close packing interactions and efficiently displacing surface water molecules.
We measure the SC of two proteins by comparing their surface contours along the interface (as defined in this 1993 paper). Mathematically speaking, we consider a vector that is perpendicular to the surface of the binder, and a corresponding vector at the surface of the target. If these two vectors point in the same direction, then the surface contours of binder and target are similar at this region. By comparing vector pairs spread across the interface, we arrive at a single number describing how well the shape of the binder fits against the shape of the target.
Shape complementarity. The upper part of this interface has a high shape complementarity, and corresponding pairs of vectors (like a and a') point in the same direction. The lower part of this interface has low shape complementarity; vector pairs in this region (like c and c') point in different directions.
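To make the vector comparison more concrete, here is a heavily simplified Python sketch. It assumes we already have sample points on each surface along with outward-pointing unit normals, which is not how a real implementation obtains its surfaces. The distance weighting (0.5 per Å²) and the use of the median reflect our reading of the 1993 definition and should be treated as assumptions; all names are hypothetical. Since both sets of normals point outward here, we flip the partner's normal before comparing, so that a perfect fit gives "same direction".

```python
import math
from statistics import median


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _dist2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def one_way_scores(points_a, normals_a, points_b, normals_b, weight=0.5):
    """Score each point on surface A against its nearest point on surface B."""
    scores = []
    for p, n in zip(points_a, normals_a):
        # Nearest sample point on the partner surface (brute-force search).
        j = min(range(len(points_b)), key=lambda k: _dist2(p, points_b[k]))
        d2 = _dist2(p, points_b[j])
        # Flip B's outward normal so that facing surfaces give alignment = +1.
        flipped = tuple(-c for c in normals_b[j])
        scores.append(_dot(n, flipped) * math.exp(-weight * d2))
    return scores


def shape_complementarity(points_a, normals_a, points_b, normals_b, weight=0.5):
    """Average of the two one-way median scores (A against B, B against A)."""
    a_to_b = median(one_way_scores(points_a, normals_a, points_b, normals_b, weight))
    b_to_a = median(one_way_scores(points_b, normals_b, points_a, normals_a, weight))
    return (a_to_b + b_to_a) / 2


# Toy interface: two flat patches touching at the same sample points.
patch = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
facing_up = [(0.0, 0.0, 1.0)] * 3
facing_down = [(0.0, 0.0, -1.0)] * 3
sideways = [(1.0, 0.0, 0.0)] * 3

sc_good = shape_complementarity(patch, facing_up, patch, facing_down)
sc_poor = shape_complementarity(patch, facing_up, patch, sideways)
print(sc_good, sc_poor)
```

Two flat patches pressed face to face score 1.0, while a partner whose surface is oriented at a right angle scores 0. A real calculation works on a dense molecular surface and only on the interface region, so this toy should not be expected to reproduce published SC values.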
Buried Unsatisfied Polar Atoms (BUNS)
Polar atoms like oxygens and nitrogens are most stable when they make hydrogen bonds, either with the water surrounding the protein, or with other polar atoms in the protein. If the interface between binder and target has polar atoms that cannot make hydrogen bonds, then binding is very unlikely.
We recently devoted an entire blog post to BUNS, so we won’t go into the details here. The important thing is that all polar atoms at the binding interface should make hydrogen bonds!
Binders against SARS-CoV-2 spike protein
In rounds one and two of the Coronavirus Binder Design challenge, Foldit players came up with thousands of solutions that achieve high scores within Foldit. This means they already have highly optimized energies and satisfy our protein design Objectives.
We’ve been calculating the binding metrics described above for those designs to see which ones are most likely to actually bind the target. Since we have a high-resolution crystal structure of the CoV spike protein target bound to the human ACE2 receptor, we can also calculate these binder metrics for the natural ACE2 interface.
This is an excellent binder design! Compared to the natural ACE2 receptor, this design is predicted to bind even more tightly, with a DDG of -45.0 kcal/mol! This interface has a slightly smaller surface area than ACE2, but 1794 Å² is still impressive. The natural ACE2 interface has a very high shape complementarity score of 0.73, but this Foldit player design is able to match it! And finally, we see that this design has fewer unsatisfied polar atoms at the interface, which should also work in our favor.
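One way to picture this kind of side-by-side comparison is the Python sketch below. It only encodes which direction is "better" for each metric, as described in the sections above. The class, the function, and every number in the example are hypothetical placeholders; they are not the measured values for ACE2 or for any Foldit design, and this is not the actual selection procedure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BinderMetrics:
    ddg: float             # kcal/mol; more negative is better
    interface_sasa: float  # Å^2; larger is better
    sc: float              # shape complementarity; higher is better
    buns: int              # buried unsatisfied polar atoms; fewer is better


def at_least_as_good(design: BinderMetrics, reference: BinderMetrics) -> dict:
    """For each metric, report whether the design matches or beats the reference."""
    return {
        "ddg": design.ddg <= reference.ddg,
        "interface_sasa": design.interface_sasa >= reference.interface_sasa,
        "sc": design.sc >= reference.sc,
        "buns": design.buns <= reference.buns,
    }


# Placeholder numbers for illustration only.
reference = BinderMetrics(ddg=-30.0, interface_sasa=2000.0, sc=0.70, buns=5)
design = BinderMetrics(ddg=-35.0, interface_sasa=1900.0, sc=0.70, buns=3)
comparison = at_least_as_good(design, reference)
print(comparison)
```

In this made-up case the design wins on DDG and BUNS, ties on SC, and falls short on interface area. A mixed picture like that is a good reminder that no single metric tells the whole story.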
We’d like to caution readers that, even with these metrics, we are still not very good at predicting binders. Protein binder design is a very hard problem — one at the forefront of computational biology — and there are other physical factors that are difficult to account for. Even if our metrics look good on paper or on a computer, only laboratory testing will tell us whether these designer proteins actually fold and bind to the target.
Now that the Round 3 puzzle has closed, we will calculate binder metrics for those results as well. Then we will order genes for the best designs so that we can test them in the lab for binding! Meanwhile, check out the newest Coronavirus Binder Design: Round 4 puzzle, online now!
IMPORTANT: Please fill out the Foldit usernames and data analysis form, if you have not already! Out of concern for players’ privacy, we will not share the Foldit usernames associated with tested designs unless those players have given us permission in the form.
(Posted by bkoep | Thu, 03/19/2020 - 23:22)