Analysis of protein binder designs

Today, the Coronavirus Binder Design: Round 3 puzzle closed, and now Foldit scientists will carry out further computational analysis to try to pick out the most promising designs!

This blog post digs into some of the analysis we do after a Foldit puzzle closes, and how we select the most promising Foldit player designs for testing in the lab.

Binder metrics

As you know, the goal of Foldit is to fold your protein to optimize the score, which consists of a base score plus any bonuses or penalties from the Objectives.

The base Foldit score comes from a sophisticated energy function which takes into account things like clashing, electrostatics, and H-bonding. This is used to compute the energy of a solution. In structure prediction puzzles, the base score is all we need to optimize, since we know that a real protein will fold into the shape with the optimal energy.

Objectives add to the base score, rewarding features of a solution that are not accounted for in our energy function. This is especially helpful in protein design puzzles, which are a bit more complicated than structure prediction. In protein design, it is not enough to simply optimize energy — we have to think about the entire energy landscape of our designed protein. We use Objectives to promote features (like a buried core) that are known to improve the energy landscape of a designed protein.

Similarly, when we design protein binders, we like to calculate additional metrics that are not in the base score but that tend to correlate with strong binding. These metrics are not currently available as Foldit Objectives (we are working on it!), so this analysis is carried out by Foldit scientists after a puzzle closes.

Note that the following binder metrics only address the interactions between two folded proteins. They assume that the designed protein will be correctly folded, which is not always a given. We run a different set of analyses (discussed previously) to predict whether the binder will fold properly. However, we already have ample evidence that Foldit players can design well-folded proteins!

Binding Energy (DDG)

This calculates how the energy of the entire system is affected by binding and best reflects the actual physics of a molecular binding interaction. A more negative DDG (or ΔΔG) indicates stronger binding.

We start by calculating the energy of both proteins in the bound state (ΔGbound), with the binder and target in contact. Then we calculate the energy of both proteins in the unbound state (ΔGunbound), with the binder and target free in solution. The DDG is the difference, or delta (Δ), between these two numbers (ΔGbound - ΔGunbound). If the DDG is negative, it means that the bound state is more stable than the unbound state, so the binder should spontaneously stick to the target.
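The DDG bookkeeping boils down to one subtraction. Here is a minimal Python sketch. It assumes you already have total energies for the complex and for each protein scored on its own. The function name and the example energies are hypothetical, and the sketch says nothing about how those energies are computed.

```python
def binding_ddg(g_complex: float, g_binder: float, g_target: float) -> float:
    """Return DDG = G_bound - G_unbound.

    g_complex: energy of binder and target scored together, in contact (bound state).
    g_binder, g_target: energies of each protein scored alone (unbound state).
    Any consistent energy unit works; the result is in the same unit.
    """
    g_bound = g_complex
    g_unbound = g_binder + g_target  # the two free proteins do not interact
    return g_bound - g_unbound


# Hypothetical energies (kcal/mol), chosen only to illustrate the arithmetic.
ddg = binding_ddg(g_complex=-1250.0, g_binder=-600.0, g_target=-605.0)
print(ddg)  # -45.0: negative, so the bound state is more stable
```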

Interface Surface Area (SASA)

We also see that tight binding is correlated with the size of the binding interface. The larger the interface between two proteins, the tighter they tend to bind one another.

Our main concern here is the amount of water that is liberated from the protein surface upon binding. Normally the surface of every protein is surrounded by a “shell” of water molecules that have limited ways to make H-bonds with the protein surface. These water molecules have lower entropy than water molecules in bulk solvent. When two proteins bind together, they hide some of the protein surface that was previously exposed to shell water molecules. Those low-entropy waters are now free to diffuse into bulk solvent, thus increasing the entropy of the system and stabilizing the bound state.

For this reason, we measure the size of the interface in terms of solvent-accessible surface area, or SASA. This measures ONLY the part of the surface that is accessible to water (so small nooks and crannies are omitted). Similar to the DDG calculation above, we first measure the total SASA for the binder and target in the bound state, and then again in the unbound state. The difference in SASA between the bound and unbound states is proportional to the amount of water that is freed when the binder and target come into contact.
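To make the bound-minus-unbound bookkeeping concrete, here is a minimal Python sketch of buried interface SASA. It swaps in a simple numerical method (Shrake-Rupley-style point sampling on probe-expanded atom spheres) rather than an exact analytical calculation. The 1.4 Å probe radius, the 200 sample points, the (x, y, z, radius) atom format, and the function names are all assumptions for illustration. This version counts the area buried on both partners together.

```python
import math

PROBE_RADIUS = 1.4  # Å; a commonly used water-probe radius (assumption)


def unit_sphere_points(n):
    """Spread n points roughly evenly over a unit sphere (golden-spiral lattice)."""
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))
    points = []
    for i in range(n):
        z = 1.0 - 2.0 * (i + 0.5) / n
        ring = math.sqrt(1.0 - z * z)
        theta = golden_angle * i
        points.append((ring * math.cos(theta), ring * math.sin(theta), z))
    return points


def sasa(atoms, n_points=200, probe=PROBE_RADIUS):
    """Approximate solvent-accessible surface area (Å²).

    atoms: list of (x, y, z, radius) tuples in Å. Each atom sphere is grown by
    the probe radius; a sample point counts as exposed if no other grown
    sphere contains it.
    """
    unit = unit_sphere_points(n_points)
    grown = [(x, y, z, r + probe) for x, y, z, r in atoms]
    total = 0.0
    for i, (xi, yi, zi, ri) in enumerate(grown):
        # Only spheres that overlap sphere i can hide any of its points.
        neighbours = [
            (xj, yj, zj, rj)
            for j, (xj, yj, zj, rj) in enumerate(grown)
            if j != i and (xi - xj) ** 2 + (yi - yj) ** 2 + (zi - zj) ** 2 < (ri + rj) ** 2
        ]
        exposed = 0
        for ux, uy, uz in unit:
            px, py, pz = xi + ri * ux, yi + ri * uy, zi + ri * uz
            if all((px - xj) ** 2 + (py - yj) ** 2 + (pz - zj) ** 2 >= rj ** 2
                   for xj, yj, zj, rj in neighbours):
                exposed += 1
        total += 4.0 * math.pi * ri ** 2 * exposed / n_points
    return total


def buried_sasa(binder, target, **kwargs):
    """Interface SASA: area exposed when the proteins are apart but hidden in the complex."""
    unbound = sasa(binder, **kwargs) + sasa(target, **kwargs)
    bound = sasa(binder + target, **kwargs)
    return unbound - bound


# Two toy "atoms" 3 Å apart partially hide each other's surface from water.
print(buried_sasa([(0.0, 0.0, 0.0, 1.7)], [(3.0, 0.0, 0.0, 1.7)]) > 0)  # True
```

With real proteins you would pass every atom of each chain. Point sampling is easy to follow but slow and only approximate, which is part of why the choice of SASA algorithm matters in practice.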

Shape Complementarity (SC)

Shape complementarity (SC) measures how well two objects fit together. A glove, for example, has very high shape complementarity for a hand. If two proteins have complementary shapes (SC approaching 1.0), then they will fit together snugly, making close packing interactions and efficiently displacing surface water molecules.

We measure the SC of two proteins by comparing their surface contours along the interface (as defined in this 1993 paper). Mathematically speaking, we consider a vector that is perpendicular to the surface of the binder, and a corresponding vector at the surface of the target. If these two vectors point in the same direction, then the surface contours of binder and target are similar at this region. By comparing vector pairs spread across the interface, we arrive at a single number describing how well the shape of the binder fits against the shape of the target.

Shape complementarity. The upper part of this interface has a high shape complementarity, and corresponding pairs of vectors (like a and a') point in the same direction. The lower part of this interface has low shape complementarity; vector pairs in this region (like c and c') point in different directions.
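Here is a minimal Python sketch of the vector-pair comparison, following our reading of the 1993 statistic. Each surface point on one protein is paired with the nearest point on the other, the two unit normals are compared by a dot product, and the result is down-weighted by distance as exp(-w·d²). The per-point scores are summarized by their median in each direction, and the two medians are averaged. The sketch assumes both surfaces are already sampled as (point, outward unit normal) pairs, and it flips the target's normal so that a perfect fit scores +1, matching the "same direction" convention in the figure. The value w = 0.5 Å⁻², the function names, and skipping surface generation and edge trimming are all simplifications on our part.

```python
import math
from statistics import median

W = 0.5  # Å^-2; distance-weighting constant (assumed value)


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _dist2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def directional_scores(surf_a, surf_b, w=W):
    """Score every (point, outward unit normal) on surface A against surface B."""
    scores = []
    for point_a, normal_a in surf_a:
        # Pair with the nearest sampled point on the other surface.
        point_b, normal_b = min(surf_b, key=lambda s: _dist2(point_a, s[0]))
        flipped_b = tuple(-c for c in normal_b)  # facing surfaces -> parallel vectors
        scores.append(_dot(normal_a, flipped_b) * math.exp(-w * _dist2(point_a, point_b)))
    return scores


def shape_complementarity(surf_a, surf_b, w=W):
    """Average of the median scores computed in both directions (A->B and B->A)."""
    return (median(directional_scores(surf_a, surf_b, w)) +
            median(directional_scores(surf_b, surf_a, w))) / 2.0


# Toy check: two flat 3x3 patches facing each other 1 Å apart.
binder = [((x, y, 0.0), (0.0, 0.0, 1.0)) for x in range(3) for y in range(3)]
target = [((x, y, 1.0), (0.0, 0.0, -1.0)) for x in range(3) for y in range(3)]
print(round(shape_complementarity(binder, target), 3))  # 0.607, i.e. exp(-0.5)
```

Even perfectly matched normals lose credit when the surfaces sit apart, which is why a snug fit matters as much as a matching contour.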

Buried Unsatisfied Polar Atoms (BUNS)

Polar atoms like oxygens and nitrogens are most stable when they make hydrogen bonds, either with the water surrounding the protein, or with other polar atoms in the protein. If the interface between binder and target has polar atoms that cannot make hydrogen bonds, then binding is very unlikely.

We recently devoted an entire blog post to BUNS, so we won’t go into the details here. The important thing is that all polar atoms at the binding interface should make hydrogen bonds!
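As a rough illustration of the idea (not the method from our BUNS post), here is a minimal Python sketch that counts buried polar atoms with no plausible hydrogen-bond partner. It assumes each atom is already labeled as buried or not, and as a donor and/or acceptor. It treats any donor-acceptor pair within 3.5 Å as hydrogen-bonded. Real hydrogen-bond detection also checks angles and energies, so the cutoff, the atom format, and the function names are all assumptions.

```python
import math

HBOND_CUTOFF = 3.5  # Å; donor-acceptor heavy-atom distance treated as an H-bond (assumption)


def can_pair(a, b):
    """True if one atom can donate a hydrogen bond that the other can accept."""
    return (a["donor"] and b["acceptor"]) or (a["acceptor"] and b["donor"])


def count_buried_unsatisfied(atoms, cutoff=HBOND_CUTOFF):
    """Count buried polar atoms that have no H-bond partner within the cutoff.

    atoms: list of dicts with keys "xyz" (tuple), "donor", "acceptor", "buried" (bools).
    Exposed polar atoms are skipped because they can H-bond with surrounding water.
    """
    polar = [a for a in atoms if a["donor"] or a["acceptor"]]
    unsatisfied = 0
    for a in polar:
        if not a["buried"]:
            continue
        if not any(b is not a and can_pair(a, b) and math.dist(a["xyz"], b["xyz"]) <= cutoff
                   for b in polar):
            unsatisfied += 1
    return unsatisfied


def atom(xyz, donor=False, acceptor=False, buried=True):
    return {"xyz": xyz, "donor": donor, "acceptor": acceptor, "buried": buried}


# A buried N-H...O pair satisfies both atoms; a lone buried oxygen does not.
example = [
    atom((0.0, 0.0, 0.0), donor=True),
    atom((2.9, 0.0, 0.0), acceptor=True),
    atom((10.0, 0.0, 0.0), acceptor=True),
    atom((20.0, 0.0, 0.0), acceptor=True, buried=False),
]
print(count_buried_unsatisfied(example))  # 1
```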

Binders against SARS-CoV-2 spike protein

In rounds one and two of the Coronavirus Binder Design challenge, Foldit players came up with thousands of solutions that achieve high scores within Foldit. This means they already have highly optimized energies and satisfy our protein design Objectives.

We’ve been calculating the binding metrics described above for those designs to see which ones are most likely to actually bind the target. Since we have a high-resolution crystal structure of the CoV spike protein target bound to the human ACE2 receptor, we can also calculate these binder metrics for the natural ACE2 interface.

Below is one exceptional design by Foldit player stomjoh, from puzzle 1808: Coronavirus Binder Design: Round 2, which scores well in all of our binder metrics!

This is an excellent binder design! Compared to the natural ACE2 receptor, this design is predicted to bind even more tightly, with a DDG of -45.0 kcal/mol! This interface has a slightly smaller surface area than ACE2, but 1794 Å² is still impressive. The natural ACE2 interface has a very high shape complementarity score of 0.73, but this Foldit player design is able to match it! And finally, we see that this design has fewer unsatisfied polar atoms at the interface, which should also work in our favor.

We’d like to caution readers that, even with these metrics, we are still not very good at predicting binders. Protein binder design is a very hard problem — one at the forefront of computational biology — and there are other physical factors that are difficult to account for. Even if our metrics look good on paper or on a computer, only laboratory testing will tell us whether these designer proteins actually fold and bind to the target.

Now that the Round 3 puzzle has closed, we will calculate binder metrics for those results as well. Then we will order genes for the best designs so that we can test them in the lab for binding! Meanwhile, check out the new newer newest Coronavirus Binder Design: Round 4 puzzle, online now!

IMPORTANT: Please fill out the Foldit usernames and data analysis form, if you have not already! Out of concern for players’ privacy, we will not share the Foldit usernames associated with tested designs unless those players have given us permission in the form.

(Posted by bkoep | Thu, 03/19/2020 - 23:22 | 19 comments)
spvincent
Joined: 12/07/2007
Groups: Contenders
Might this solution be a potential candidate for an "all hands" puzzle, as suggested by O Seki To in this forum post?

bkoep
Joined: 11/15/2012
Groups: Foldit Staff
Good suggestion!

We've thought about that, but I don't actually think that would lead to better results. In short, we're currently limited by quantity of solutions—not quality.

It seems that individual players and teams are perfectly capable of coming up with valid models that cover a wide diversity of structures (which is important). I'm afraid that in an "all-hands" puzzle, many players would spend their efforts refining models that don't need it, and in the end we would sacrifice diversity of models.

There's a big question about how important "late-game" optimization is in Foldit. You can always grind away at optimizing a solution and squeaking out a few more points (with diminishing returns, of course)—but at a certain point, this optimization doesn't actually improve the scientific validity of the solution. From a scientific standpoint, we think players' time would be better spent developing another solution from scratch. This is why we introduced the Sketchbook puzzles, that require players to restart the puzzle after hitting a move limit.

Susume
Joined: 10/02/2011
How is surface calculated?

Is the surface calculated as a mesh of polygons? Intersecting spheres? Something else?

Is it based on all atoms? Heavy atoms? Sidechain lengths?

bkoep
Joined: 11/15/2012
Groups: Foldit Staff
Great question!

Basically, we model all atoms as intersecting spheres and calculate their accessible surface analytically. This is pretty slow to do, and is one of the main reasons we haven't rolled out a SASA Objective in Foldit (yet). There are faster alternatives that use a voxel grid to approximate the protein surface, but these are numerically unstable and could be easily exploited by a clever script.

The citation for the intersecting spheres software we use (Dalphaball) is here. However, I recommend this review by the same authors, which gives some more background and is easier to read (but somewhat longer).

Joined: 09/24/2012
Groups: Go Science
A suggestion

You could implement an optional tool to calculate SASA on demand. It would not count toward the score, nor slow down our computations. I would find it useful for choosing between different starting designs before raising their scores with recipes.

As you wrote in the landscape blog, players don't hesitate to lose points in order to (visually) select a better pattern. Any tool that could help us select the scientifically best design would be useful. Sometimes we hesitate between different possible strategies (like putting a helix on one side or the other of the sheets, or binding additionally on one side or the other of a central binding helix). Visual intuition is not enough, even with the best views. And bonuses are slow to compute.

I would be willing to invest in a 30-minute calculation of some additional complex metrics to help me choose where to put most of my CPU resources.

Personally, I usually work on several tracks in parallel: the best-scoring track plus additional tracks based on my "intuition" of promising designs. I do the same when choosing a share to evolve in the early to middle game: among the shares by other players, I select the most "beautiful" one, and I might invest several days in evolving it even if the group's top design currently has many more points.

Susume
Joined: 10/02/2011
Great idea

I love this idea. I would absolutely run a SASA calculation that takes several minutes just a few times during the course of a puzzle, to see how well I am matching the target molecule.

Joined: 05/23/2020
Groups: None
Similarly to what thanrahan1515 said, I was wondering if you could implement a reinforcement-learning neural network along the lines of AlphaZero to solve puzzles like these (since they come in the form of a game with all the variables given) instead of brute-forcing them.

Joined: 09/24/2012
Groups: Go Science
Hi neckro178

Sorry, I deleted the message from thanrahan1515 by mistake (mainly because it was not from a player). Questions like "why don't you simply run your algorithm?" are answered in the introduction to Foldit, the wiki, the FAQ, the blog, and elsewhere. You can see it when playing the tutorials: wiggle runs the Foldit algorithm. But human 'visual' intervention is needed to help the computer search in the right direction.

A 'human-made' intuitive design can save a lot of CPU time. Algorithms can get stuck in an "energy valley", trying again and again to go deeper when the solution is actually in another valley.
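A toy one-dimensional example makes the "energy valley" point concrete. In this minimal Python sketch, plain gradient descent stands in for a local minimizer (it is not Foldit's wiggle), and the double-well energy function is invented for illustration. The deeper valley sits near x = -1 and a shallower one near x = +1, so the starting point decides where the minimizer ends up.

```python
def energy(x):
    """Invented double-well landscape: deeper valley near x = -1, shallower near x = +1."""
    return (x * x - 1.0) ** 2 + 0.3 * x


def gradient(x):
    """Derivative of energy(x)."""
    return 4.0 * x * (x * x - 1.0) + 0.3


def descend(x, step=0.01, iterations=5000):
    """Plain gradient descent: always walks downhill, so it cannot leave its valley."""
    for _ in range(iterations):
        x -= step * gradient(x)
    return x


stuck = descend(0.5)    # starts on the slope of the shallower valley
better = descend(-0.5)  # a different starting point reaches the deeper valley
print(stuck > 0, better < 0, energy(better) < energy(stuck))  # True True True

# Several independent starts (like building new solutions from scratch); keep the best.
best = min((descend(x0) for x0 in (-1.5, -0.5, 0.5, 1.5)), key=energy)
```

No amount of extra iterations moves `stuck` into the deeper valley; only a different starting point does.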

Here is some explanation of why it is hard to simply run an algorithm on supercomputers:

Joined: 09/24/2012
Groups: Go Science
Question on SASA

Should it be low or high?

From the text, I understood that it should be low (bound proteins expose less area to water than separate proteins).

But in the comparison of ACE2 and the Foldit design, it sounds like the Foldit design does worse (with a smaller SASA).

bkoep
Joined: 11/15/2012
Groups: Foldit Staff
High SASA is better

We are interested in the surface area of the binding interface only. This is the amount of area that is exposed in the unbound state, and buried in the bound state. In other words, how much does SASA change upon binding.

If the binding interface buries a lot of surface area (high SASA), a lot of water is released from the protein surface and this leads to tighter binding. If the binding interface has a small surface area (low SASA), a small amount of water is released upon binding and this contributes very little to binding strength.

LociOiling
Joined: 12/27/2012
DDG unbound

The description of DDG says:

ΔGbound - ΔGbound

I think that should be:

ΔGbound - ΔGunbound

bkoep
Joined: 11/15/2012
Groups: Foldit Staff
Good catch


LociOiling
Joined: 12/27/2012
shape complementarity UOM?

Unfortunately, the link to the 1993 article mentioned gets just the abstract for most of us. The abstract does mention a "new statistic Sc", but that's about it.

It's possible to get a shape complementarity of -1 when starting with an extended chain. Otherwise, what's the range of possible values?

LociOiling
Joined: 12/27/2012
some more info on shape complementarity

I found this letter somewhat helpful:

Under heading "Cross comparison of Sc values", there's a list of the parameters that affect the value of Sc. So it seems there's no true unit of measure, just the results of a calculation done a particular way.

Still, it would be helpful to know what values to expect in Foldit.

bkoep
Joined: 11/15/2012
Groups: Foldit Staff
Good question!

The true Shape Complementarity ranges from 0.0 to 1.0, where 0.0 is a poor fit and 1.0 is perfect hand-in-glove. In practice, I haven't seen anything above 0.80 for a protein-protein binding interface.

The Objective will report an "error" value of -1 when your binder is too far away and is not making contact with the target. Mathematically speaking, it's still possible to calculate an SC value for two distant surfaces, but the result would be nonsensical and possibly misleading.

The SC value comes from the dot product of two vectors, so it is unitless.

LociOiling
Joined: 12/27/2012
target values?

Hovering over the new metrics once they are calculated gives you suggested targets.

For example, on puzzle 1880, the hover bubbles show these values:

* "target DDG is -40 or less"
* "target SASA: 1500 or greater"
* "target shape complementarity is 0.6 or greater"

Are these values set as part of the puzzle design? Or are they just fixed values? I see the puzzle 1877 devprev shadow had the same values.

The SASA value in particular seems like it may be a stretch on the coronavirus puzzles, but then I haven't hit the DDG target either. Shape complementarity is easy by comparison.

bkoep
Joined: 11/15/2012
Groups: Foldit Staff
Set values

We may set these targets specifically for each puzzle. But the values you listed are a good rule of thumb for all binder design problems.

For what it's worth, it does seem that the SASA Objective is producing unrealistically low numbers on Puzzles 1877 and 1880. This is probably a bug, but we're still looking into it!
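For anyone who wants to check a set of numbers against these rules of thumb, here is a minimal Python sketch. The default thresholds are the hover-text targets listed earlier in this thread (DDG -40 or less, SASA 1500 or greater, shape complementarity 0.6 or greater). Because targets may be set per puzzle, they are plain parameters. The class and function names are hypothetical. As an example it uses the stomjoh design from the blog post, taking its SC as 0.73 since the post says it matches ACE2.

```python
from dataclasses import dataclass


@dataclass
class BinderMetrics:
    ddg: float   # binding energy; more negative is better
    sasa: float  # buried interface area in Å²; larger is better
    sc: float    # shape complementarity, 0.0 to 1.0; larger is better


def meets_targets(m, ddg_max=-40.0, sasa_min=1500.0, sc_min=0.6):
    """Return a pass/fail flag for each metric against rule-of-thumb targets."""
    return {
        "ddg": m.ddg <= ddg_max,
        "sasa": m.sasa >= sasa_min,
        "sc": m.sc >= sc_min,
    }


design = BinderMetrics(ddg=-45.0, sasa=1794.0, sc=0.73)
print(meets_targets(design))  # {'ddg': True, 'sasa': True, 'sc': True}
```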

Developed by: UW Center for Game Science, UW Institute for Protein Design, Northeastern University, Vanderbilt University Meiler Lab, UC Davis
Supported by: DARPA, NSF, NIH, HHMI, Amazon, Microsoft, Adobe, Boehringer Ingelheim, RosettaCommons