Analysis of protein binder designs

Today, the Coronavirus Binder Design: Round 3 puzzle closed, and now Foldit scientists will carry out further computational analysis to try and pick out the most promising designs!

This blog post digs into some of the analysis we do after a Foldit puzzle closes, and how we select the most promising Foldit player designs for testing in the lab.

Binder metrics

As you know, the goal of Foldit is to fold your protein to optimize the score, which consists of a base score plus any bonuses or penalties from the Objectives.

The base Foldit score comes from a sophisticated energy function which takes into account things like clashing, electrostatics, and H-bonding. This is used to compute the energy of a solution. In structure prediction puzzles, the base score is all we need to optimize, since we know that a real protein will fold into the shape with the optimal energy.

Objectives add to the base score, rewarding features of a solution that are not accounted for in our energy function. This is especially helpful in protein design puzzles, which are a bit more complicated than structure prediction. In protein design, it is not enough to simply optimize energy — we have to think about the entire energy landscape of our designed protein. We use Objectives to promote features (like a buried core) that are known to improve the energy landscape of a designed protein.

Similarly, when we design protein binders, we like to calculate additional metrics that are not in the base score but that tend to correlate with strong binding. These metrics are not currently available as Foldit Objectives (we are working on it!), so this analysis is carried out by Foldit scientists after a puzzle closes.

Note that the following binder metrics only address the interactions between two folded proteins. They assume that the designed protein will be correctly folded, which is not always a given. We run a different set of analyses (discussed previously) to predict whether the binder will fold properly. However, we already have ample evidence that Foldit players can design well-folded proteins!

Binding Energy (DDG)

This calculates how the energy of the entire system is affected by binding and best reflects the actual physics of a molecular binding interaction. A more negative DDG (or ΔΔG) indicates stronger binding.

We start by calculating the energy of both proteins in the bound state (ΔGbound), with the binder and target in contact. Then we calculate the energy of both proteins in the unbound state (ΔGunbound), with the binder and target free in solution. The DDG is the difference, or delta (Δ), between these two numbers (ΔGbound - ΔGunbound). If the DDG is negative, it means that the bound state is more stable than the unbound state, so the binder should spontaneously stick to the target.
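The subtraction can be sketched in a few lines of Python. This is only an illustration of the sign convention: the function name `binding_ddg` and the energy values are hypothetical, and the real energies come from the Foldit energy function, which is not reproduced here.

```python
def binding_ddg(g_bound: float, g_unbound: float) -> float:
    """Return DDG = G_bound - G_unbound (same units as the inputs)."""
    return g_bound - g_unbound


# Hypothetical energies in kcal/mol, chosen only to show the sign convention.
g_bound = -520.0    # binder and target in contact
g_unbound = -490.0  # binder and target separated, free in solution

ddg = binding_ddg(g_bound, g_unbound)
print(ddg)  # -30.0: the bound state is lower in energy, so binding is favored
```

A positive result would mean the separated proteins are more stable than the complex, so no binding would be expected.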

Interface Surface Area (SASA)

We also see that tight binding is correlated with the size of the binding interface. The larger the interface between two proteins, the tighter they tend to bind one another.

Our main concern here is the amount of water that is liberated from the protein surface upon binding. Normally the surface of every protein is surrounded by a “shell” of water molecules that have limited ways to make H-bonds with the protein surface. These water molecules have lower entropy than water molecules in bulk solvent. When two proteins bind together, they hide some of the protein surface that was previously exposed to shell water molecules. Those low-entropy waters are now free to diffuse into bulk solvent, thus increasing the entropy of the system and stabilizing the bound state.

For this reason, we measure the size of the interface in terms of solvent-accessible surface area, or SASA. This measures ONLY the part of the surface that is accessible to water (so small nooks and crannies are omitted). Similar to the DDG calculation above, we first measure the total SASA for the binder and target in the bound state, and then again in the unbound state. The difference in SASA between the bound and unbound states is proportional to the amount of water that is freed when the binder and target come into contact.
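To make the bound-versus-unbound comparison concrete, here is a rough Python sketch. It uses a simple point-sampling approximation (in the style of the Shrake-Rupley method), which is not the analytical intersecting-spheres calculation that Foldit scientists describe in the comments below. The atom tuples, the 1.4 Å probe radius, and the function names are all assumptions made for illustration.

```python
import math


def sphere_points(n):
    """Roughly even points on a unit sphere (golden-spiral layout)."""
    golden = math.pi * (3.0 - math.sqrt(5.0))
    points = []
    for i in range(n):
        z = 1.0 - 2.0 * (i + 0.5) / n
        r = math.sqrt(1.0 - z * z)
        points.append((r * math.cos(golden * i), r * math.sin(golden * i), z))
    return points


def sasa(atoms, probe=1.4, n_points=200):
    """Approximate SASA (in square angstroms) of (x, y, z, radius) atoms."""
    unit = sphere_points(n_points)
    total = 0.0
    for i, (x, y, z, radius) in enumerate(atoms):
        big_r = radius + probe  # surface traced by the center of a water probe
        exposed = 0
        for ux, uy, uz in unit:
            px, py, pz = x + big_r * ux, y + big_r * uy, z + big_r * uz
            # A sample point is hidden if it falls inside any other
            # atom's probe-expanded sphere.
            hidden = any(
                (px - ax) ** 2 + (py - ay) ** 2 + (pz - az) ** 2 < (ar + probe) ** 2
                for j, (ax, ay, az, ar) in enumerate(atoms)
                if j != i
            )
            if not hidden:
                exposed += 1
        total += 4.0 * math.pi * big_r * big_r * exposed / n_points
    return total


def buried_sasa(binder, target):
    """Surface area hidden from water when binder and target come together."""
    return sasa(binder) + sasa(target) - sasa(binder + target)


# Toy example: one "atom" per protein, centers 4 angstroms apart.
binder = [(0.0, 0.0, 0.0, 2.0)]
target = [(4.0, 0.0, 0.0, 2.0)]
print(buried_sasa(binder, target) > 0.0)  # True: some surface is buried
```

Sampling more points per atom gives a smoother estimate at the cost of speed, which hints at why a fast and stable SASA calculation is hard to offer inside the game.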

Shape Complementarity (SC)

Shape complementarity (SC) measures how well two objects fit together. A glove, for example, has very high shape complementarity for a hand. If two proteins have complementary shapes (SC approaching 1.0), then they will fit together snugly, making close packing interactions and efficiently displacing surface water molecules.

We measure the SC of two proteins by comparing their surface contours along the interface (as defined in this 1993 paper). Mathematically speaking, we consider a vector that is perpendicular to the surface of the binder, and a corresponding vector at the surface of the target. If these two vectors point in the same direction, then the surface contours of binder and target are similar at this region. By comparing vector pairs spread across the interface, we arrive at a single number describing how well the shape of the binder fits against the shape of the target.
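The vector comparison can be sketched as follows. This is a simplified toy version loosely modeled on the idea described above, not the exact published definition or the code Foldit scientists run. It assumes that each pair already consists of two unit vectors oriented so that a perfect fit gives identical directions, plus the distance between the two surface points. The distance weighting and its `weight` parameter are assumptions made for illustration.

```python
import math
from statistics import median


def shape_complementarity(pairs, weight=0.5):
    """Summarize how well paired surface normals agree across an interface.

    Each item in `pairs` is (binder_normal, target_normal, distance), where
    the normals are unit 3-vectors and distance is in angstroms.
    """
    scores = []
    for binder_normal, target_normal, distance in pairs:
        # Dot product is 1.0 for identical directions, 0.0 for perpendicular.
        agreement = sum(a * b for a, b in zip(binder_normal, target_normal))
        # Down-weight pairs whose surface points are far apart (assumed form).
        scores.append(agreement * math.exp(-weight * distance ** 2))
    # A median keeps a few badly matched pairs from dominating the result.
    return median(scores)


# Hypothetical vector pairs: two well matched, one perpendicular.
pairs = [
    ((0.0, 0.0, 1.0), (0.0, 0.0, 1.0), 0.0),
    ((0.0, 1.0, 0.0), (0.0, 1.0, 0.0), 0.0),
    ((0.0, 0.0, 1.0), (1.0, 0.0, 0.0), 0.0),
]
print(shape_complementarity(pairs))  # 1.0
```

In this toy version, a perfectly matched interface scores 1.0 and an interface whose vector pairs are all perpendicular scores 0.0.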

Shape complementarity. The upper part of this interface has a high shape complementarity, and corresponding pairs of vectors (like a and a') point in the same direction. The lower part of this interface has low shape complementarity; vector pairs in this region (like c and c') point in different directions.

Buried Unsatisfied Polar Atoms (BUNS)

Polar atoms like oxygens and nitrogens are most stable when they make hydrogen bonds, either with the water surrounding the protein, or with other polar atoms in the protein. If the interface between binder and target has polar atoms that cannot make hydrogen bonds, then binding is very unlikely.

We recently devoted an entire blog post to BUNS, so we won’t go into the details here. The important thing is that all polar atoms at the binding interface should make hydrogen bonds!
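As a rough illustration of what counting BUNS involves, the sketch below tallies polar atoms that are both buried and lacking a hydrogen bond. The `Atom` fields, the burial cutoff of 1.0 Å², and the example atoms are hypothetical. The actual analysis has to detect burial and hydrogen bonds from the 3D structure, which this sketch takes as given.

```python
from dataclasses import dataclass


@dataclass
class Atom:
    name: str       # label used only for display
    element: str    # chemical element symbol, e.g. "N", "O", "C"
    sasa: float     # solvent-accessible surface area of this atom
    n_hbonds: int   # number of hydrogen bonds this atom makes


def count_buns(atoms, burial_cutoff=1.0):
    """Count polar atoms (N or O) that are buried and make no hydrogen bonds."""
    return sum(
        1
        for atom in atoms
        if atom.element in {"N", "O"}
        and atom.sasa < burial_cutoff
        and atom.n_hbonds == 0
    )


# Hypothetical interface atoms.
interface = [
    Atom("backbone O", "O", sasa=0.0, n_hbonds=1),   # buried but satisfied
    Atom("sidechain N", "N", sasa=0.2, n_hbonds=0),  # buried and unsatisfied
    Atom("sidechain O", "O", sasa=12.5, n_hbonds=0), # exposed: water can bond
    Atom("sidechain C", "C", sasa=0.0, n_hbonds=0),  # not polar
]
print(count_buns(interface))  # 1
```

Only the second atom counts here, because it is polar, hidden from water, and has no hydrogen-bonding partner.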

Binders against SARS-CoV-2 spike protein

In rounds one and two of the Coronavirus Binder Design challenge, Foldit players came up with thousands of solutions that achieve high scores within Foldit. This means they already have highly optimized energies and satisfy our protein design Objectives.

We’ve been calculating the binding metrics described above for those designs to see which ones are most likely to actually bind the target. Since we have a high-resolution crystal structure of the CoV spike protein target bound to the human ACE2 receptor, we can also calculate these binder metrics for the natural ACE2 interface.

Below is one exceptional design by Foldit player stomjoh, from 1808: Coronavirus Binder Design: Round 2, that scores well in all of our binder metrics!

This is an excellent binder design! Compared to the natural ACE2 receptor, this design is predicted to bind even more tightly, with a DDG of -45.0 kcal/mol! This interface has a slightly smaller surface area than ACE2, but 1794 Å² is still impressive. The natural ACE2 interface has a very high shape complementarity score of 0.73, but this Foldit player design is able to match it! And finally, we see that this design has fewer unsatisfied polar atoms at the interface, which should also work in our favor.

We’d like to caution readers that, even with these metrics, we are still not very good at predicting binders. Protein binder design is a very hard problem — one at the forefront of computational biology — and there are other physical factors that are difficult to account for. Even if our metrics look good on paper or on a computer, only laboratory testing will tell us whether these designer proteins actually fold and bind to the target.

Now that the Round 3 puzzle has closed, we will calculate binder metrics for those results as well. Then we will order genes for the best designs so that we can test them in the lab for binding! Meanwhile, check out the new newer newest Coronavirus Binder Design: Round 4 puzzle, online now!

IMPORTANT: Please fill out the Foldit usernames and data analysis form, if you have not already! Out of concern for players’ privacy, we will not share the Foldit usernames associated with tested designs unless those players have given us permission in the form.

( Posted by bkoep | Thu, 03/19/2020 - 23:22 | 8 comments )
Comment by spvincent
Joined: 12/07/2007
Groups: Contenders

Might this solution be a potential candidate for an "all hands" puzzle, as suggested by O Seki To in this forum post?

Comment by bkoep
Joined: 11/15/2012
Groups: None
Good suggestion!

We've thought about that, but I don't actually think that would lead to better results. In short, we're currently limited by quantity of solutions—not quality.

It seems that individual players and teams are perfectly capable of coming up with valid models that cover a wide diversity of structures (which is important). I'm afraid that in an "all-hands" puzzle, many players would spend their efforts refining models that don't need it, and in the end we would sacrifice diversity of models.

There's a big question about how important "late-game" optimization is in Foldit. You can always grind away at optimizing a solution and squeak out a few more points (with diminishing returns, of course), but at a certain point, this optimization doesn't actually improve the scientific validity of the solution. From a scientific standpoint, we think players' time would be better spent developing another solution from scratch. This is why we introduced the Sketchbook puzzles, which require players to restart the puzzle after hitting a move limit.

Comment by Susume
Joined: 10/02/2011
How is surface calculated?

Is the surface calculated as a mesh of polygons? Intersecting spheres? Something else?

Is it based on all atoms? Heavy atoms? Sidechain lengths?

Comment by bkoep
Joined: 11/15/2012
Groups: None
Great question!

Basically, we model all atoms as intersecting spheres and calculate their accessible surface analytically. This is pretty slow to do, and is one of the main reasons we haven't rolled out a SASA Objective in Foldit (yet). There are faster alternatives that use a voxel grid to approximate the protein surface, but these are numerically unstable and could be easily exploited by a clever script.

The citation for the intersecting spheres software we use (Dalphaball) is here. However, I recommend this review by the same authors, which gives some more background and is easier to read (but somewhat longer).

Joined: 09/24/2012
Groups: Go Science
A suggestion

You could implement an optional tool to calculate SASA on demand. It would then not count in the scoring system, nor would it slow down our computations. I would find it useful as a guide for choosing between different starting designs before raising their scores with recipes.

As you wrote in the landscape blog, players don't hesitate to lose points in order to (visually) select a better pattern. Any tool that could help us select the best design scientifically would be useful. We often hesitate between different possible strategies (like putting a helix on one side or the other of the sheets, or adding binding contacts on one side or the other of a central binding helix). Visual intuition is not enough, even with the best views. And bonuses are slow to compute.

I would be willing to invest in a 30-minute calculation of some additional complex metrics to help me choose where to put most of my CPU resources.

Personally, I usually work on different tracks in parallel: the best-scoring track plus additional parallel tracks based on "intuition" about promising designs. I do the same when choosing a share to evolve in the early to mid game: among the shares from other players, I select the most "beautiful" one, and I might invest several days in evolving it even if the group's top design currently has many more points.

Comment by Susume
Joined: 10/02/2011
Great idea

I love this idea. I would absolutely run a SASA calculation that takes several minutes just a few times during the course of a puzzle, to see how well I am matching the target molecule.

Joined: 04/15/2020
Groups: None
Foldit logic

I don't have a great understanding of how the game works. But in theory, if you're able to create a game that tests combinations against a virus to determine whether each is a viable antiviral, then I would think you would be able to create an algorithm that automatically generates all possible combinations and runs a comparison to check for viable ones. From a development standpoint, it sounds like you have all the information you would need to create the logic, but I would guess the problem is that the amount of computing required to run all possible combinations and comparisons would be astronomical. This would be a great project to try to run on quantum processors.

Joined: 05/23/2020
Groups: None
Similarly to what thanrahan1515 said, I was wondering if you could implement a reinforcement-learning neural network along the lines of AlphaZero to solve puzzles like these (since they come in the form of a game with all the variables given) instead of brute-forcing.

Developed by: UW Center for Game Science, UW Institute for Protein Design, Northeastern University, Vanderbilt University Meiler Lab, UC Davis
Supported by: DARPA, NSF, NIH, HHMI, Amazon, Microsoft, Adobe, RosettaCommons