Foldit Blog
Blog Feed
This is the place where we will describe some of the outcomes and results of your folding work, provide a glimpse of future challenges and developments, and in general give you a better sense of where we are and where foldit hopes to go in the future.

Coronavirus binder designs queued for testing!

After the first three rounds of our Coronavirus Binder Design challenge, we've selected 99 of the most promising Foldit player solutions for experimental testing!

Once a Foldit puzzle closes, we run some further analysis to figure out which designs are the most likely to fold and bind to the target. You can read more about some of that analysis on our previous blog post. To select promising designs, we consider Foldit score in addition to metrics that correlate with proper folding and others that correlate with binding.

We've combined those metrics to choose 33 designs from each of rounds one, two, and three of the Coronavirus Binder Design challenge. In total, 99 Foldit binder designs will be tested at the UW Institute for Protein Design, with the same experiments that have already begun for computationally-designed binders.

It will be a few more weeks before genes arrive and we can begin experiments on the Foldit designs. In the meantime, we'll continue to work on designing better binders in Foldit, so stay tuned for more puzzles! Be sure to review our tips for designing successful binders and watch coronavirus expert Lexi Walls, Ph.D. discuss early Foldit designs!

Below are the 99 designed proteins that we'll test for binding to the SARS-CoV-2 spike protein. Remember to fill out our username sharing form if you want to see your username in Foldit updates!

2008926_c0022 johnmitch
2008926_c0023 johnmitch
2008926_c0026 PLAYER_7
2008926_c0034 PLAYER_4
2008926_c0036 ZeroLeak7
2008926_c0037 NinjaGreg
2008926_c0040 Galaxie
2008926_c0042 bertro
2008926_c0052 Galaxie
2008926_c0056 PLAYER_9
2008926_c0059 stomjoh
2008926_c0063 PLAYER_1
2008926_c0066 PLAYER_4
2008926_c0069 Galaxie, robgee, alwen
2008926_c0071 silent gene
2008926_c0079 Galaxie
2008926_c0108 mirp
2008926_c0113 LociOiling
2008926_c0141 PLAYER_6
2008926_c0142 dcrwheeler
2008926_c0175 spvincent
2008926_c0193 silent gene
2008926_c0197 PLAYER_10
2008926_c0200 PLAYER_4
2008926_c0252 christioanchauvin
2008926_c0253 Waya
2008926_c0352 actiasluna
2008926_c0382 PLAYER_4
2008926_c0425 PLAYER_4
2008926_c0429 stomjoh
2008926_c0450 stomjoh
2008926_c0474 robgee
2008926_y3560 PLAYER_14
2008984_c0001 actiasluna
2008984_c0002 Caraline_nelson, Phyx, mirp, PLAYER_17, jeff101, silent gene
2008984_c0003 Bletchley Park, spvincent
2008984_c0007 johnmitch
2008984_c0017 Migi
2008984_c0019 Waya
2008984_c0023 stomjoh
2008984_c0030 Phyx
2008984_c0034 Aubade01
2008984_c0036 silent gene, edpalas
2008984_c0043 ZeroLeak7
2008984_c0044 PLAYER_19
2008984_c0046 Steven Pletsch, PLAYER_18
2008984_c0058 PLAYER_12, frood66
2008984_c0087 Skippysk8s
2008984_c0089 actiasluna
2008984_c0103 mirp
2008984_c0129 MurloW
2008984_c0140 robgee
2008984_c0147 spmm
2008984_c0167 PLAYER_13
2008984_c0182 ZeroLeak7
2008984_c0205 NinjaGreg
2008984_c0239 PLAYER_15, frood66
2008984_c0250 PLAYER_9
2008984_c0313 spvincent
2008984_c0580 robgee
2008984_c0626 PLAYER_19
2008984_y5208 spvincent
2008984_y5300 spvincent
2008984_y7170 mirp
2008984_y9747 silent gene
2008984_y9800 silent gene
2009030_c0002 Bletchley Park, PLAYER_5, georg137, spvincent
2009030_c0005 Bletchley Park
2009030_c0008 ZeroLeak7
2009030_c0009 Steven Pletsch, PLAYER_18
2009030_c0015 Tehnologik1
2009030_c0017 actiasluna
2009030_c0020 Galaxie, jamiexq
2009030_c0021 PLAYER_16
2009030_c0027 dcrwheeler
2009030_c0030 Susume
2009030_c0031 crpainter
2009030_c0040 PLAYER_3
2009030_c0049 actiasluna, Jpilkington, ManVsYard
2009030_c0073 Crossed Sticks
2009030_c0083 alcor29
2009030_c0085 Waya
2009030_c0102 retiredmichael
2009030_c0105 Steven Pletsch
2009030_c0115 PLAYER_13
2009030_c0116 PLAYER_20
2009030_c0147 PLAYER_21
2009030_c0226 actiasluna
2009030_c0242 PLAYER_2
2009030_c0271 NinjaGreg
2009030_c0314 zeroblue
2009030_c0346 PLAYER_19
2009030_c0554 PLAYER_22
2009030_y0378 silent gene
2009030_y0533 PLAYER_21
2009030_y0703 fiendish_ghoul
2009030_y3873 PLAYER_10, PLAYER_17, Bruno Kestemont
2009030_y5708 Bruno Kestemont
2009030_y6236 Keresto

(Posted by bkoep | Wed, 03/25/2020 - 01:17)

Analysis of protein binder designs

Today, the Coronavirus Binder Design: Round 3 puzzle closed, and now Foldit scientists will carry out further computational analysis to try and pick out the most promising designs!

This blog post digs into some of the analysis we do after a Foldit puzzle closes, and how we select the most promising Foldit player designs for testing in the lab.

Binder metrics

As you know, the goal of Foldit is to fold your protein to optimize the score, which consists of a base score plus any bonuses or penalties from the Objectives.

The base Foldit score comes from a sophisticated energy function which takes into account things like clashing, electrostatics, and H-bonding. This is used to compute the energy of a solution. In structure prediction puzzles, the base score is all we need to optimize, since we know that a real protein will fold into the shape with the optimal energy.

Objectives add to the base score, rewarding features of a solution that are not accounted for in our energy function. This is especially helpful in protein design puzzles, which are a bit more complicated than structure prediction. In protein design, it is not enough to simply optimize energy — we have to think about the entire energy landscape of our designed protein. We use Objectives to promote features (like a buried core) that are known to improve the energy landscape of a designed protein.

Similarly, when we design protein binders, we like to calculate additional metrics that are not in the base score but that tend to correlate with strong binding. These metrics are not currently available as Foldit Objectives (we are working on it!), so this analysis is carried out by Foldit scientists after a puzzle closes.

Note that the following binder metrics only address the interactions between two folded proteins. They assume that the designed protein will be correctly folded, which is not always a given. We run a different set of analyses (discussed previously) to predict whether the binder will fold properly. However, we already have ample evidence that Foldit players can design well-folded proteins!

Binding Energy (DDG)

This calculates how the energy of the entire system is affected by binding and best reflects the actual physics of a molecular binding interaction. A more negative DDG (or ΔΔG) indicates stronger binding.

We start by calculating the energy of both proteins in the bound state ($\Delta G_{\text{bound}}$), with the binder and target in contact. Then we calculate the energy of both proteins in the unbound state ($\Delta G_{\text{unbound}}$), with the binder and target free in solution. The DDG is the difference, or delta (Δ), between these two numbers: $\Delta\Delta G = \Delta G_{\text{bound}} - \Delta G_{\text{unbound}}$. If the DDG is negative, it means that the bound state is more stable than the unbound state, so the binder should spontaneously stick to the target.
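To make the arithmetic concrete, here is a minimal Python sketch. It assumes we already have total energies for the complex and for each protein on its own; the function name and the energies in the example are made up for illustration.

```python
def binding_ddg(e_complex, e_binder, e_target):
    """Binding energy: energy of the bound state minus the unbound state.

    The unbound state has the binder and target far apart, so its energy
    is just the sum of the two separate energies. Negative = favorable.
    """
    e_bound = e_complex
    e_unbound = e_binder + e_target
    return e_bound - e_unbound


# Hypothetical energies (kcal/mol), not taken from any real design:
ddg = binding_ddg(e_complex=-500.0, e_binder=-200.0, e_target=-255.0)
print(ddg)  # -45.0
```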

Interface Surface Area (SASA)

We also see that tight binding is correlated with the size of the binding interface. The larger the interface between two proteins, the tighter they tend to bind one another.

Our main concern here is the amount of water that is liberated from the protein surface upon binding. Normally the surface of every protein is surrounded by a “shell” of water molecules that have limited ways to make H-bonds with the protein surface. These water molecules have lower entropy than water molecules in bulk solvent. When two proteins bind together, they hide some of the protein surface that was previously exposed to shell water molecules. Those low-entropy waters are now free to diffuse into bulk solvent, thus increasing the entropy of the system and stabilizing the bound state.

For this reason, we measure the size of the interface in terms of solvent-accessible surface area, or SASA. This measures ONLY the part of the surface that is accessible to water (so small nooks and crannies are omitted). Similar to the DDG calculation above, we first measure the total SASA for the binder and target in the bound state, and then again in the unbound state. The difference in SASA between the bound and unbound states is proportional to the amount of water that is freed when the binder and target come into contact.
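One classic way to compute SASA is the Shrake-Rupley method: scatter test points over a sphere around each atom, grown by the radius of a water probe, and count the points that no neighboring atom covers. The sketch below is our own minimal version, not the code Foldit scientists run; atoms are plain (x, y, z, radius) tuples, and the 1.4 Å probe is the conventional water radius.

```python
import math


def sphere_points(n):
    """n roughly evenly spaced points on a unit sphere (golden-spiral method)."""
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))
    points = []
    for k in range(n):
        z = 1.0 - 2.0 * (k + 0.5) / n
        r = math.sqrt(1.0 - z * z)
        theta = golden_angle * k
        points.append((r * math.cos(theta), r * math.sin(theta), z))
    return points


def sasa(atoms, probe=1.4, n_points=500):
    """Shrake-Rupley SASA (Å^2) for atoms given as (x, y, z, radius)."""
    unit = sphere_points(n_points)
    total = 0.0
    for i, (xi, yi, zi, ri) in enumerate(atoms):
        big_r = ri + probe  # surface traced by the center of a water probe
        # Only atoms whose probe-expanded spheres overlap this one can hide it.
        neighbors = [
            (xj, yj, zj, (rj + probe) ** 2)
            for j, (xj, yj, zj, rj) in enumerate(atoms)
            if j != i and math.dist((xi, yi, zi), (xj, yj, zj)) < big_r + rj + probe
        ]
        exposed = 0
        for ux, uy, uz in unit:
            px, py, pz = xi + big_r * ux, yi + big_r * uy, zi + big_r * uz
            if all((px - xj) ** 2 + (py - yj) ** 2 + (pz - zj) ** 2 >= r2
                   for xj, yj, zj, r2 in neighbors):
                exposed += 1
        total += 4.0 * math.pi * big_r ** 2 * exposed / n_points
    return total


def buried_sasa(binder, target, **kwargs):
    """SASA hidden by binding: unbound (separate) minus bound (together)."""
    unbound = sasa(binder, **kwargs) + sasa(target, **kwargs)
    bound = sasa(binder + target, **kwargs)
    return unbound - bound
```

Calling `buried_sasa` on two single-atom "proteins" that touch returns a positive area; move them 100 Å apart and it drops to zero, since no water is freed.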

Shape Complementarity (SC)

Shape complementarity (SC) measures how well two objects fit together. A glove, for example, has very high shape complementarity for a hand. If two proteins have complementary shapes (SC approaching 1.0), then they will fit together snugly, making close packing interactions and efficiently displacing surface water molecules.

We measure the SC of two proteins by comparing their surface contours along the interface (as defined by Lawrence and Colman in a 1993 paper). Mathematically speaking, we consider a vector that is perpendicular to the surface of the binder, and a corresponding vector at the surface of the target. If these two vectors point in the same direction, then the surface contours of binder and target are similar at this region. By comparing vector pairs spread across the interface, we arrive at a single number describing how well the shape of the binder fits against the shape of the target.

Shape complementarity. The upper part of this interface has a high shape complementarity, and corresponding pairs of vectors (like a and a') point in the same direction. The lower part of this interface has low shape complementarity; vector pairs in this region (like c and c') point in different directions.
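Here is a simplified sketch of the idea in Python. It assumes each surface is already sampled as points with outward-pointing unit normals, pairs each point with the nearest point on the partner, and weights each dot product by exp(-w·d²). The weight w = 0.5 Å⁻² and the median/average recipe follow our reading of the Lawrence and Colman method; real implementations also build dot surfaces and trim the edges of the interface, which we skip here.

```python
import math
from statistics import median


def _one_way_sc(points_a, normals_a, points_b, normals_b, w):
    """Median local complementarity of surface A against surface B."""
    scores = []
    for pa, na in zip(points_a, normals_a):
        # Pair each point on A with the nearest surface point on B.
        k = min(range(len(points_b)), key=lambda j: math.dist(pa, points_b[j]))
        nb = normals_b[k]
        d = math.dist(pa, points_b[k])
        # Normals point outward from each protein, so flip B's normal:
        # a perfect fit then gives vectors pointing the same way (dot = +1).
        dot = -(na[0] * nb[0] + na[1] * nb[1] + na[2] * nb[2])
        # Down-weight pairs whose surfaces are far apart (w in 1/Å^2).
        scores.append(dot * math.exp(-w * d * d))
    return median(scores)


def shape_complementarity(points_a, normals_a, points_b, normals_b, w=0.5):
    """Average of the two one-way medians (A against B, and B against A)."""
    return 0.5 * (_one_way_sc(points_a, normals_a, points_b, normals_b, w)
                  + _one_way_sc(points_b, normals_b, points_a, normals_a, w))
```

Two flat surfaces pressed face to face score 1.0; pulling them 1 Å apart lowers the score to exp(-0.5), and normals facing the wrong way score -1.0.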

Buried Unsatisfied Polar Atoms (BUNS)

Polar atoms like oxygens and nitrogens are most stable when they make hydrogen bonds, either with the water surrounding the protein, or with other polar atoms in the protein. If the interface between binder and target has polar atoms that cannot make hydrogen bonds, then binding is very unlikely.

We recently devoted an entire blog post just about BUNS, so we won’t go into the details here. The important thing is that all polar atoms at the binding interface should make hydrogen bonds!

Binders against SARS-CoV-2 spike protein

In rounds one and two of the Coronavirus Binder Design challenge, Foldit players came up with thousands of solutions that achieve high scores within Foldit. This means they already have highly optimized energies and satisfy our protein design Objectives.

We’ve been calculating the binding metrics described above for those designs to see which ones are most likely to actually bind the target. Since we have a high-resolution crystal structure of the CoV spike protein target bound to the human ACE2 receptor, we can also calculate these binder metrics for the natural ACE2 interface.

Below is one exceptional design by Foldit player stomjoh, from Puzzle 1808: Coronavirus Binder Design: Round 2, that scores well in all of our binder metrics!

This is an excellent binder design! Compared to the natural ACE2 receptor, this design is predicted to bind even more tightly, with a DDG of -45.0 kcal/mol! This interface has a slightly smaller surface area than ACE2, but 1794 Å² is still impressive. The natural ACE2 interface has a very high shape complementarity score of 0.73, but this Foldit player design is able to match it! And finally, we see that this design has fewer unsatisfied polar atoms at the interface, which should also work in our favor.

We’d like to caution readers that, even with these metrics, we are still not very good at predicting binders. Protein binder design is a very hard problem — one at the forefront of computational biology — and there are other physical factors that are difficult to account for. Even if our metrics look good on paper or on a computer, only laboratory testing will tell us whether these designer proteins actually fold and bind to the target.

Now that the Round 3 puzzle has closed, we will calculate binder metrics for those results as well. Then we will order genes for the best designs so that we can test them in the lab for binding! Meanwhile, check out the new newer newest Coronavirus Binder Design: Round 4 puzzle, online now!

IMPORTANT: Please fill out the Foldit usernames and data analysis form, if you have not already! Out of concern for players’ privacy, we will not share the Foldit usernames associated with tested designs unless those players have given us permission in the form.

(Posted by bkoep | Thu, 03/19/2020 - 23:22)

The BUNS Objective

Buried unsatisfied polar atoms (also "Buried Unsats" or BUNS) are oxygen and nitrogen atoms that don’t make hydrogen bonds. These atoms can prevent a protein design from folding correctly, so we’re introducing a new Objective to reward protein designs with no BUNS! The Buried Unsats Objective is currently available to devprev users for testing, but soon it will be released to all players!

Polar atoms and satisfaction

Most proteins are made up of just a few different elements: carbon, oxygen, nitrogen, a bit of sulfur, and lots of hydrogen. In Foldit we separate these elements into two groups: polar and nonpolar.

Carbon and sulfur atoms are nonpolar. This means they are very good about sharing electrons with neighboring atoms, so electrical charge is balanced.

Oxygen and nitrogen (and hydrogens attached to these) are polar. When an oxygen has a hydrogen attached, it shares its electrons unevenly. The electrons tend to hang out closer to the oxygen than to the hydrogen, so that there is a slight imbalance of charges.

Polar atoms. (a) Orange carbons share electrons evenly with their partners, which is why orange sidechains are nonpolar. (b) Red oxygen atoms tend to hoard electrons, leading to an uneven distribution of charges. Chemists use the Greek letter δ to show unbalanced "partial charges." (c) Blue nitrogen atoms also hoard electrons, and will draw electrons away from an attached hydrogen, giving the hydrogen a positive partial charge. (d) A polar hydrogen is most stable when shared between two polar atoms, creating a hydrogen bond between the hydrogen and a neighboring oxygen.

This imbalance of charges is a little unstable on its own, but another polar atom nearby can help out to balance the charges! This is the basis of a hydrogen bond. The hydrogen is less stable on its own, but becomes more stable when shared between two polar atoms.

In protein science, we tend to use the word “satisfaction” to talk about polar atoms. An unsatisfied polar atom does not make hydrogen bonds, and it has unstable, unbalanced charges. A polar atom is satisfied when it makes hydrogen bonds, and these charges are balanced.

Unsatisfied polar atoms in proteins

Polar satisfaction is very important in protein folding! When we look at natural protein structures, we see that nearly every polar atom is completely satisfied.

This makes sense, when we consider that a protein naturally folds into its most stable structure. There are lots of ways to fold up a protein with unsatisfied polar atoms, but these will be less stable than a fold where every polar atom is satisfied. A structure with many buried unsats might not even fold at all because polar atoms would rather make hydrogen bonds with the water in the unfolded state.

Polar satisfaction is the reason why helix and sheet structures are so common in proteins! These structures are able to satisfy the polar nitrogens and oxygens in the protein backbone.

This is also the reason that blue sidechains like to be on the outside of the protein, exposed to surrounding water! All of the blue sidechains have polar oxygens and nitrogens on them. Since water is also polar, the blue sidechains can make hydrogen bonds with the surrounding water.

It is possible for blue sidechains to fold in the core of the protein, but only if every polar atom makes hydrogen bonds.

Polar satisfaction. Helices (a) and sheets (b) are stable because they can satisfy all of the polar atoms on the protein backbone. (c) Blue sidechains like to be on the protein surface, but can be buried if they satisfy all of their polar atoms. This buried ARG makes 5 hydrogen bonds to satisfy all of its polar atoms.

This is especially important to remember in protein design! Sometimes a blue sidechain seems to fit really well in the protein core, even though it fails to make enough hydrogen bonds to be satisfied.

It can be difficult to keep track of all the polar atoms that need to make hydrogen bonds in a protein. Unsatisfied polar atoms are one of the main reasons that Foldit player designs fail computational analysis before lab testing. We often see amazing protein designs from Foldit players that look very promising—except for one or two buried polar atoms that can’t make hydrogen bonds.

The Buried Unsats Objective

The Buried Unsats Objective detects unsatisfied polar atoms in a Foldit solution. It only detects polar atoms in the protein core, and ignores polar atoms on the protein surface (because these can make hydrogen bonds with the surrounding water).
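A rough way to count BUNS in code is to call a polar atom "buried" when many other atoms crowd around it, and "satisfied" when another polar atom sits within hydrogen-bonding distance. This is a hypothetical simplification, not the Buried Unsats Objective's actual algorithm: it ignores hydrogens, donor/acceptor roles, and bond angles, and every cutoff below is an illustrative guess.

```python
import math

POLAR = {"N", "O"}


def count_buried_unsats(atoms, hbond_cutoff=3.5, burial_radius=10.0, burial_count=16):
    """Count buried polar atoms that have no plausible H-bond partner.

    atoms: list of (element, (x, y, z), residue_index).
    Every threshold here is a hypothetical illustration value.
    """
    unsats = 0
    for i, (elem, xyz, res) in enumerate(atoms):
        if elem not in POLAR:
            continue
        # Burial: crowded by many other atoms means water can't reach it.
        neighbors = sum(1 for j, (_, other, _) in enumerate(atoms)
                        if j != i and math.dist(xyz, other) <= burial_radius)
        if neighbors < burial_count:
            continue  # surface atom: it can H-bond with water instead
        # Satisfied if any polar atom on another residue is close enough.
        has_partner = any(
            e2 in POLAR and r2 != res and math.dist(xyz, other) <= hbond_cutoff
            for e2, other, r2 in atoms
        )
        if not has_partner:
            unsats += 1
    return unsats
```

A lone oxygen buried inside a ring of carbons counts as one unsat; adding a nitrogen from another residue 2.9 Å away satisfies both atoms and brings the count to zero.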

Solutions with zero Buried Unsats get a score bonus, and the bonus shrinks as the number of buried unsatisfied polar atoms grows. In the Objectives panel, click ‘Show’ to highlight all of the unsatisfied polar atoms in your solution.

Polar atoms with an unbalanced hydrogen will glow blue; polar atoms that can help out will glow red. Match a blue- with a red-glowing atom to create a hydrogen bond and satisfy the polar atoms!

Some polar atoms are attached to multiple hydrogens that need to be satisfied with hydrogen bonds. When you click ‘Show’, the Buried Unsats Objective will change some of your View Settings to display all sidechains and bondable hydrogens. You can reset these options in the View tab.

This filter will display sidechains and show all hydrogens and bondable atoms. These can be reset by toggling the boxes in the view tab.

(Posted by neilpg628 | Fri, 03/13/2020 - 01:44)

First look at coronavirus solutions

Today was the final day for Puzzle 1805b: Coronavirus Spike Protein Binder Design! If you didn't get a chance to participate, or if you have more ideas to try, don't worry. There is a new and improved Round 2 puzzle where you can continue to work on your designs!

Now that our first puzzle has closed, scientists at the Institute for Protein Design at the University of Washington School of Medicine will take a close look at Foldit players' solutions. At first this will involve an intensive computational analysis of Foldit players' designs. We will try to assess both whether the designed proteins will fold correctly and whether they might bind to the coronavirus target.

Promising solutions will then advance to laboratory testing, where we will manufacture select Foldit player-designed proteins and test to see if they stick to coronavirus spike protein. (Don't worry, we can safely experiment with the coronavirus spike protein without exposing ourselves to live virus.)

In the meantime, we've already had a chance to look at some initial solutions from Foldit players. Below, we take a brief look at some of our favorite solutions so far, what we like about them, and what can be improved.

Hydrophobic packing

In Foldit, orange sidechains are hydrophobic. This means that they like to be buried, away from the water that surrounds the outside of the protein. Proteins naturally tend to fold up in ways that bury these orange hydrophobics in the protein core. This is called the hydrophobic effect.

This same hydrophobic effect can also drive protein molecules to stick together! If two proteins each have small, complementary patches of orange hydrophobic sidechains on their surface (exposed to surrounding water), then the two proteins will tend to stick together in order to hide these sidechains.

For a designed protein to fold, we need to make sure it has a significant core with lots of buried orange sidechains. And for it to bind against the coronavirus target, it will also need to bury orange sidechains at the binding interface. However, if the designed protein has too many orange sidechains on its surface, it will misfold.
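A quick check along these lines can be scripted. The sketch below assumes we already have a per-residue relative solvent exposure (0 = fully buried, 1 = fully exposed) from some SASA tool; the hydrophobic residue set and both cutoffs are assumptions for illustration, not Foldit's definitions. Exposed hydrophobics at the intended binding interface may be deliberate, so the report only flags them.

```python
# Assumed hydrophobic ("orange") residue set; Foldit's exact coloring may differ.
HYDROPHOBIC = set("AVILMFW")


def hydrophobic_report(sequence, rel_sasa, core_cutoff=0.2, surface_cutoff=0.4):
    """Count buried vs. exposed hydrophobic residues.

    sequence: one-letter amino-acid codes.
    rel_sasa: per-residue relative exposure, 0.0 (buried) to 1.0 (exposed).
    Cutoffs are illustrative, not Foldit's values.
    """
    core = surface = 0
    exposed_positions = []
    for i, (aa, exposure) in enumerate(zip(sequence, rel_sasa)):
        if aa not in HYDROPHOBIC:
            continue
        if exposure < core_cutoff:
            core += 1
        elif exposure > surface_cutoff:
            surface += 1
            exposed_positions.append(i + 1)  # 1-based residue number
    return {"core_hydrophobics": core,
            "surface_hydrophobics": surface,
            "exposed_positions": exposed_positions}
```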

Below is an excellent designed protein! It has a significant core of buried orange hydrophobics, and the surface mostly consists of blue sidechains. In addition, almost all of the residues in the design form alpha-helices; this is a very stable configuration for the protein backbone. So, if we were to synthesize this protein in the lab, there's a good chance it would fold up into this desired shape!

At the binding interface (on the right), we can see that this design makes close contacts with two bulky hydrophobic sidechains (highlighted in purple) on the coronavirus target. This will likely help to bury the bulky hydrophobics away from the surrounding water, and may result in tight binding between the designed protein and coronavirus target!

Hydrogen bonds

The coronavirus spike protein is an especially difficult target to bind because there are not very many orange hydrophobics on its surface. In Foldit, blue sidechains are hydrophilic. These sidechains have polar oxygen and nitrogen atoms that can make very stable hydrogen bonds with the water surrounding the protein. For this reason, blue sidechains normally like to be exposed on the surface of the protein.

If we want to bind to the coronavirus target, our designed protein will probably have to bury some of these blue sidechains away from water. In other words, our designed binder will disrupt the stable hydrogen bonds that the blue sidechains normally make with water. The only way to compensate for this is by making hydrogen bonds to all of the oxygens and nitrogens that are buried at the interface.

In Foldit, you can see polar oxygen and nitrogen atoms by setting your View Settings to Hydro/Score+CPK coloring. This will color all oxygen atoms red and all nitrogen atoms blue. Polar oxygens and nitrogens on the coronavirus target will need to be matched with polar atoms on the design to make hydrogen bonds!

Below is another excellent design from Foldit players! Again, this design has lots of orange sidechains buried in the core of the protein, with blue sidechains on the surface. This design also has lots of structure: the protein forms beta-sheets in addition to alpha-helices, which is another stable arrangement. So we think this design is likely to fold up correctly if tested in the lab!

If we look at the binding interface for this design (on the right), we see some very nice hydrogen bonding with some polar oxygens on the coronavirus target! Since these oxygens can normally make hydrogen bonds with surrounding water, it's very important that our designed binder can make these replacement hydrogen bonds.

However, if we look closer, we can see that our design introduces new polar atoms that are buried at the binding site, and not all of them make hydrogen bonds! The hydrogens numbered 1 through 4 are not making hydrogen bonds, and this could interfere with binding. This designed binder will tend to float away from the coronavirus target, so that all of these polar atoms can make hydrogen bonds with the surrounding water.

Binding site

This last binder design also looks very promising at first glance. We see that the designed protein itself has lots of orange sidechains buried in the core, with blue sidechains on the surface, so it is likely to fold up correctly. We also see that there are lots of orange sidechains that are buried at the interface with the coronavirus target, so this should result in really tight binding between the design and the target!

However, this design binds to the wrong side of the coronavirus target! We see that this protein is designed next to the frozen section of the coronavirus protein, away from the flexible sidechains at the target binding site. If we overlay this design with the normal human receptor (highlighted in purple), we see that there is no overlap between the design and the human receptor!

This means that the coronavirus protein is capable of binding to both the design and the human receptor at the same time. So even if this design binds to coronavirus protein, it will probably not block the infection pathway of the virus.
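The overlap test can be sketched as a simple distance check. This assumes the design complex and the receptor complex have already been superimposed on the shared coronavirus target, so all coordinates live in one frame; the 3 Å cutoff and the function names are hypothetical.

```python
import math


def overlapping_atoms(design, receptor, cutoff=3.0):
    """Indices of design atoms within `cutoff` Å of any receptor atom.

    Both coordinate lists must already be in the same frame, e.g. after
    superimposing the shared target from the two complexes.
    """
    return [i for i, a in enumerate(design)
            if any(math.dist(a, b) < cutoff for b in receptor)]


def likely_blocks_receptor(design, receptor, cutoff=3.0, min_overlap=1):
    """A binder can only block the receptor if the two would collide."""
    return len(overlapping_atoms(design, receptor, cutoff)) >= min_overlap
```

A design that sits on the far side of the target, like the one above, returns no overlapping atoms, so the receptor could still bind alongside it.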

In the new Round 2 puzzle, we amended the coronavirus target so that these off-target residues do not contribute to your Foldit score. In order to get the best score (and design an effective antiviral protein) players should focus on the flexible blue and orange sidechains at the normal binding interface.

Good luck, and happy folding!

(Posted by bkoep | Thu, 03/05/2020 - 23:16)

AlphaFold: Machine learning for protein structure prediction

In 2018, a group of computer scientists at DeepMind revealed a new method for protein structure prediction, called AlphaFold. In that year’s CASP competition, which benchmarks the state-of-the-art for protein structure prediction, AlphaFold swept the competition, generating more accurate predictions than any other research group.

AlphaFold has received considerable attention for this achievement, and a few weeks ago they published a scientific paper with the details of their new method. Since protein structure prediction often appears in Foldit puzzles, we wanted to review the AlphaFold method with Foldit players!

This blog post is meant to summarize this exciting progress from AlphaFold, with an overview of their method, and some thoughts about the expected impact on protein research.

Machine Learning and Neural Nets

AlphaFold comes from DeepMind, a company well-known for tackling hard problems with machine learning algorithms. In 2016, a DeepMind program called AlphaGo famously beat a world-champion player of Go, a classic Chinese board game that is notoriously difficult for computer programs.

Machine learning (ML) is a branch of computer science that deals with self-improving algorithms. An ML algorithm is set up to perform a well-defined task, with a well-defined measure of performance. Over a “training” period, the algorithm is able to evaluate its own performance at the task and iteratively make changes that improve its performance.

One popular type of ML algorithm is a neural net, so called because it is inspired by the organization of neurons in the brain. Just like a web of neurons that communicate through synapses, a neural net is a web of virtual “nodes” that pass signals to one another. Typically, each node performs a simple mathematical operation on received signals (for example, testing if the sum of the signals exceeds some threshold), then passes on the new signal to downstream nodes. Training a neural net involves tuning the operations at each node so that the entire network produces the desired output from the training inputs.

A diagram of a simple neural network (from Wikimedia Commons). Signals are passed between nodes, each of which performs some simple (nonlinear) operation on the received signal and passes on the result. This network contains a single hidden layer of 4 nodes; the AlphaFold neural net contains hundreds of layers with thousands of nodes.
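To show what "signals passing between nodes" means, here is a toy network in Python built from threshold nodes like the ones described above. The weights are picked by hand rather than trained, and the network is much smaller than the one in the diagram; it simply shows how a hidden layer lets the net compute XOR, something no single threshold node can do.

```python
def step(x):
    """Fire (1) if the summed input exceeds the threshold of 0, else stay silent (0)."""
    return 1 if x > 0 else 0


def layer(inputs, weights, biases):
    """Each node sums its weighted inputs, adds a bias, and applies step()."""
    return [step(sum(w * x for w, x in zip(node_w, inputs)) + b)
            for node_w, b in zip(weights, biases)]


def forward(inputs, network):
    """Pass signals through each layer in turn."""
    signal = inputs
    for weights, biases in network:
        signal = layer(signal, weights, biases)
    return signal


# Hand-picked (not trained) weights that compute XOR with one hidden layer:
# hidden node 1 acts like OR, hidden node 2 like AND, output = OR and not AND.
XOR_NET = [
    ([[1, 1], [1, 1]], [-0.5, -1.5]),  # hidden layer: 2 nodes
    ([[1, -1]], [-0.5]),               # output layer: 1 node
]

for a in (0, 1):
    for b in (0, 1):
        print(a, b, forward([a, b], XOR_NET)[0])
```

Training would find weights like these automatically, by nudging them until the outputs match the training examples.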

Neural nets have been very useful for abstracting information from complex inputs. A popular application of neural nets is the image recognition problem: the input is a 2D array of colored pixels, and the task is to classify the depicted object.

The AlphaFold algorithm is a neural net, very similar to the kind used for image recognition. In this case, the input is information about the protein sequence, and the task is to predict the distance between every pair of residues in the folded protein.

Predicted Contacts vs. Predicted Distances

Many Foldit players will already be familiar with the concept of predicted contacts. These are residues in a protein that are predicted to be close to one another (“in contact”) in the folded structure, even if they are not neighbors in the protein sequence.

These predictions come from covariance patterns that emerge during evolution. We can observe these patterns by comparing very similar protein sequences in different organisms. For instance, we could compare the hemoglobin sequence in humans, chimps, dogs, mice, etc., and look for positions that tend to co-vary (i.e. two residues that seem to change together, as if they depend on one another). Strong covariance between two residues usually suggests that those residues interact with one another in the folded structure, through side-chain packing, H-bonding, electrostatics, etc.

Cartoon diagram of covariance (from GREMLIN). (Left) In these two related protein structures, the red and green residues interact with one another. When one of these mutates during the course of evolution, its partner may also have to mutate to maintain the interaction. (Right) Even when we don’t know the structure of these proteins, we can see evidence of this interaction when we compare lots of related protein sequences. The two positions in the dashed boxes display strong covariance.
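One simple way to score covariance between two alignment columns is mutual information: it is zero when the columns vary independently and grows when they change together. This is a deliberately simple stand-in; tools like GREMLIN use more sophisticated statistical models that better separate direct from indirect couplings. The alignment below is made up.

```python
import math
from collections import Counter


def mutual_information(alignment, i, j):
    """Mutual information (in nats) between alignment columns i and j."""
    n = len(alignment)
    col_i = [seq[i] for seq in alignment]
    col_j = [seq[j] for seq in alignment]
    p_i = Counter(col_i)
    p_j = Counter(col_j)
    p_ij = Counter(zip(col_i, col_j))
    mi = 0.0
    for (a, b), count in p_ij.items():
        p_ab = count / n
        # Compare the joint frequency to what independence would predict.
        mi += p_ab * math.log(p_ab / ((p_i[a] / n) * (p_j[b] / n)))
    return mi


# Toy alignment of four related sequences (invented for illustration).
alignment = ["AKL", "AKV", "EEL", "EEV"]
print(mutual_information(alignment, 0, 1))  # ≈ 0.693 (ln 2): columns co-vary
print(mutual_information(alignment, 0, 2))  # 0.0: columns are independent
```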

One of the key insights of the AlphaFold group was to take these predictions a step further: Instead of using covariance to predict whether two residues are “in contact” (a simple yes/no), AlphaFold attempts to predict the distance between the two residues (a range of values between 2 and 20 Å). These predictions are more difficult to make, but successful predictions provide much richer information about the folded protein structure.
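The two kinds of prediction are related: adding up a predicted distance distribution over the short-distance bins gives a contact probability. This sketch assumes distance bins between 2 and 20 Å and uses an 8 Å cutoff, a common contact definition in CASP; the bins and probabilities in the example are invented.

```python
def contact_probability(bin_edges, probs, contact_cutoff=8.0):
    """Probability mass of a predicted distance distribution below the cutoff.

    bin_edges: len(probs) + 1 increasing distances in Å.
    probs: probability of each distance bin (should sum to 1).
    """
    total = 0.0
    for lo, hi, p in zip(bin_edges, bin_edges[1:], probs):
        if hi <= contact_cutoff:
            total += p
        elif lo < contact_cutoff:
            # Bin straddles the cutoff: assume uniform mass within the bin.
            total += p * (contact_cutoff - lo) / (hi - lo)
    return total


def expected_distance(bin_edges, probs):
    """Mean predicted distance, using each bin's midpoint."""
    return sum(p * (lo + hi) / 2 for lo, hi, p in zip(bin_edges, bin_edges[1:], probs))


edges = list(range(2, 21, 2))                     # 2, 4, ..., 20 Å
probs = [0, 0, 0.5, 0.5, 0, 0, 0, 0, 0]           # mass split between 6-8 and 8-10 Å
print(contact_probability(edges, probs), expected_distance(edges, probs))
```

A single distance distribution carries both answers: here a 50% chance of contact and an expected separation of about 8 Å, which a yes/no contact map alone could not express.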

We should note that, in 2018, a few other research groups were also using neural networks to predict distances—not just AlphaFold. The second insight of AlphaFold concerns their ability to generate a folded protein structure from predicted distances. They represent each distance prediction as a smooth restraint function, which allows them to employ a simple technique called gradient descent, directly folding the protein into a structure compatible with their predicted distances.

Predicted distances for residue pairs. (a) Similar to a contact map, this plot shows the predicted distance between every pair of residues in the structure. (b) For each pair of residues, the neural net produces a probability distribution of distances. For the pair of residues marked by the blue star in (a), we can see the probability distribution favors a distance of about 8 Å. (c) The probability distribution is converted to a smooth restraint function, where the lowest point of the function corresponds to the favored distance (in this case, 8 Å). A simple gradient descent algorithm allows AlphaFold to efficiently fold a protein structure that optimizes all of their distance predictions.
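The folding step can be illustrated with a toy version. As our own simplification, this sketch replaces AlphaFold's restraints (derived from the predicted distributions) with simple harmonic penalties, (d - d₀)², and moves point coordinates directly; three points with target distances of 3, 4, and 5 Å stand in for a protein.

```python
import math


def restraint_energy(coords, restraints):
    """Sum of squared deviations from each target distance."""
    return sum((math.dist(coords[i], coords[j]) - d0) ** 2 for i, j, d0 in restraints)


def fold_by_gradient_descent(coords, restraints, lr=0.02, steps=5000):
    """Repeatedly step every point downhill on the restraint energy."""
    coords = [list(c) for c in coords]
    for _ in range(steps):
        grad = [[0.0, 0.0, 0.0] for _ in coords]
        for i, j, d0 in restraints:
            r = math.dist(coords[i], coords[j])
            if r == 0.0:
                continue  # gradient undefined when two points coincide
            # d/dx_i of (r - d0)^2 is 2 (r - d0) (x_i - x_j) / r
            scale = 2.0 * (r - d0) / r
            for k in range(3):
                g = scale * (coords[i][k] - coords[j][k])
                grad[i][k] += g
                grad[j][k] -= g
        for c, g in zip(coords, grad):
            for k in range(3):
                c[k] -= lr * g[k]
    return coords


restraints = [(0, 1, 3.0), (0, 2, 4.0), (1, 2, 5.0)]
start = [(0.0, 0.0, 0.0), (1.0, 0.2, 0.0), (0.3, 1.0, 0.1)]
folded = fold_by_gradient_descent(start, restraints)
print(restraint_energy(start, restraints), restraint_energy(folded, restraints))
```

Because each penalty is smooth, its slope tells every point which way to move; that is what makes gradient descent possible, as opposed to the all-or-nothing yes/no of a contact.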

Finally, AlphaFold combines their distance predictions with the Rosetta energy function (the same energy function used by Foldit) to refine their final folded structure.

AlphaFold Performance in CASP

The Critical Assessment of protein Structure Prediction (CASP) is an opportunity for different researchers to compare their structure prediction methods in a head-to-head competition. The CASP organizers collect unpublished protein structures and challenge researchers to predict the structures based on their protein sequence. Because the true protein structures are unpublished, all the predictions are “blind,” and all the participants can evaluate their methods on a level playing field, starting from the same information.

AlphaFold’s neural net was able to make remarkably accurate distance predictions for many of the targets of the 2018 CASP competition, and this led them toward protein models that were very similar to the true structure. The best way to visualize AlphaFold’s success is to look at their summed Z-score for all targets in the Free Modeling category.

Rankings from the 2018 CASP Free Modeling category (from CASP13). The y-axis shows the summed Z-score across all targets in the category, with all competing groups on the x-axis. The leftmost bar represents the AlphaFold group.
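For readers unfamiliar with summed Z-scores, here is a minimal Python sketch of the idea. We assume each group receives a quality score for each target (for example GDT_TS, a common CASP similarity measure), and we follow one common convention of clipping negative Z-scores to zero so that a single bad prediction doesn’t dominate the total. The official CASP13 assessment used its own specific procedure, so treat this as an illustration only; the group names and scores are invented.

```python
import statistics

def summed_z_scores(scores_by_target, clip_negative=True):
    """scores_by_target: {target_id: {group_name: quality score}}.

    Returns {group_name: Z-score summed over all targets}.
    """
    totals = {}
    for group_scores in scores_by_target.values():
        values = list(group_scores.values())
        mean = statistics.mean(values)
        spread = statistics.pstdev(values)  # population standard deviation
        for group, score in group_scores.items():
            # a target where every group scored the same contributes nothing
            z = 0.0 if spread == 0 else (score - mean) / spread
            if clip_negative:
                z = max(z, 0.0)
            totals[group] = totals.get(group, 0.0) + z
    return totals

# Invented scores for three hypothetical groups on two targets.
scores = {
    "T1": {"GroupA": 80.0, "GroupB": 60.0, "GroupC": 40.0},
    "T2": {"GroupA": 55.0, "GroupB": 50.0, "GroupC": 45.0},
}
totals = summed_z_scores(scores)
for group, total in sorted(totals.items(), key=lambda kv: -kv[1]):
    print(f"{group}: {total:.2f}")
```

Because each target is normalized by its own spread of scores, a group earns credit for beating the field on hard and easy targets alike, which is why the leftmost bar in the figure stands out so clearly.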

This is an incredible achievement, and AlphaFold represents a significant step forward in protein structure prediction, but the structure prediction problem is still far from “solved.” For most natural proteins, AlphaFold relies heavily on covariance patterns and often struggles when the target has very few related sequences (covariance is harder to detect with only a few related sequences). However, even with zero related sequences AlphaFold can still make distance predictions, albeit with lower confidence. AlphaFold showed this by correctly predicting the structure of Foldit3, a protein designed by Foldit players, with no related sequences and no covariance information!
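Covariance here means that two positions in a family of related sequences tend to mutate together, often because those residues touch in the folded structure. A classic, very simple way to measure it is the mutual information between two columns of a multiple sequence alignment. The Python sketch below uses it purely as an illustration (AlphaFold’s actual sequence features are more sophisticated), and the tiny alignment is invented.

```python
import math
from collections import Counter

def column_mutual_information(msa, i, j):
    """Mutual information (in nats) between columns i and j of an alignment.

    msa: list of equal-length aligned sequences, one string per sequence.
    """
    n = len(msa)
    pair_counts = Counter((seq[i], seq[j]) for seq in msa)
    counts_i = Counter(seq[i] for seq in msa)
    counts_j = Counter(seq[j] for seq in msa)
    mi = 0.0
    for (a, b), count in pair_counts.items():
        p_ab = count / n
        p_a, p_b = counts_i[a] / n, counts_j[b] / n
        mi += p_ab * math.log(p_ab / (p_a * p_b))
    return mi

# Invented alignment: columns 1 and 3 swap K/E together (like a salt bridge),
# while columns 0 and 1 vary independently of each other.
msa = ["MKLE", "MELK", "AKVE", "AEIK"]
print(column_mutual_information(msa, 1, 3))  # coupled columns
print(column_mutual_information(msa, 0, 1))  # independent columns
```

With only two sequences, say `["MKLE", "AEIK"]`, columns 0 and 1 would also score log 2 ≈ 0.69, the highest value two sequences can give, even though nothing real links them. That is the small-sample problem described above: with few related sequences, genuine covariance is hard to tell apart from coincidence.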

One scientific limitation of AlphaFold is that it suffers from the “black box” problem. Neural nets like the AlphaFold algorithm are considered “black box” techniques because their inner workings are hard to interpret: it is very difficult to deconstruct a neural network to figure out exactly what concepts the algorithm is “learning” about proteins. In other words, AlphaFold has improved our ability to predict a protein structure from its sequence, but it hasn’t directly increased our understanding of how protein sequence relates to structure.

Impact of AlphaFold

Since AlphaFold’s debut in 2018, many other research groups have begun experimenting with machine learning for predicting residue distances. Just this month, shortly after AlphaFold published their method, researchers at the Baker Lab published trRosetta, which builds on the AlphaFold method (see PDF from the Baker Lab website).

The Baker Lab researchers realized that a neural net could be trained to predict not just the distance between two residues, but also the relative orientation of those two residues. By training an algorithm to predict both distance and orientation between residues, the Baker group was able to make protein models with even greater accuracy!

Building on AlphaFold with trRosetta. (a) The AlphaFold neural net predicts only the distance between residue pairs. We can also train the neural net to predict the orientation of residue pairs (defined by several angles and torsions). (b) These angle and torsion predictions can also be converted into smooth restraint functions, which is key for applying the predictions to a protein model. (c) The orientation predictions improve the accuracy of final protein models for a set of CASP targets.
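Here is a rough Python sketch of the geometry behind panel (a). Following our reading of the trRosetta paper, we assume the pair features are the Cβ-Cβ distance d, a dihedral ω through Cα(i), Cβ(i), Cβ(j), Cα(j), a dihedral θ through N(i), Cα(i), Cβ(i), Cβ(j), and a planar angle φ through Cα(i), Cβ(i), Cβ(j). The helper names and the dictionary format for atom coordinates are our own. Note that θ and φ depend on which residue comes first, so each residue pair has two of each.

```python
import math

def _sub(a, b):
    return [a[k] - b[k] for k in range(3)]

def _dot(a, b):
    return sum(a[k] * b[k] for k in range(3))

def _cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def _norm(a):
    return math.sqrt(_dot(a, a))

def planar_angle(p1, p2, p3):
    """Angle p1-p2-p3 at the vertex p2, in radians."""
    v1, v2 = _sub(p1, p2), _sub(p3, p2)
    cos_a = _dot(v1, v2) / (_norm(v1) * _norm(v2))
    return math.acos(max(-1.0, min(1.0, cos_a)))  # clamp against rounding error

def dihedral(p1, p2, p3, p4):
    """Torsion angle p1-p2-p3-p4 about the p2-p3 axis, in radians [-pi, pi]."""
    b1, b2, b3 = _sub(p2, p1), _sub(p3, p2), _sub(p4, p3)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)
    y = _norm(b2) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.atan2(y, x)

def pair_orientation(res_i, res_j):
    """Distance and orientation features of residue i relative to residue j.

    res_i, res_j: dicts mapping atom names "N", "CA", "CB" to (x, y, z) in Å.
    """
    d = _norm(_sub(res_j["CB"], res_i["CB"]))
    omega = dihedral(res_i["CA"], res_i["CB"], res_j["CB"], res_j["CA"])
    theta = dihedral(res_i["N"], res_i["CA"], res_i["CB"], res_j["CB"])
    phi = planar_angle(res_i["CA"], res_i["CB"], res_j["CB"])
    return d, omega, theta, phi

# Toy coordinates (not realistic bond geometry), chosen so the answers are easy to check.
res_i = {"N": (1.0, 1.0, 0.0), "CA": (1.0, 0.0, 0.0), "CB": (0.0, 0.0, 0.0)}
res_j = {"N": (1.0, 1.0, 5.0), "CA": (1.0, 0.0, 5.0), "CB": (0.0, 0.0, 5.0)}
d, omega, theta, phi = pair_orientation(res_i, res_j)
print(d, math.degrees(omega), math.degrees(theta), math.degrees(phi))
```

As panel (b) notes, predicted angles like these are converted into smooth restraint functions, in the same spirit as the distance restraints.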

The CASP competition returns in the summer of 2020, and it will be very exciting to see how other groups have incorporated AlphaFold’s progress into their own prediction methods!

However, Foldit is unlikely to see any immediate changes as a direct result of AlphaFold’s success.

Since Foldit was launched in 2008, our focus has gradually shifted away from protein structure prediction. The main reason is that we think Foldit players have more to contribute to other problems, like protein design or building models into cryoEM data. It’s likely that we can use distance predictions to help with these tasks (for example, to check whether distance predictions for a designed sequence are compatible with the designed structure), but for now we are still evaluating the most effective ways to use neural nets for these kinds of problems!
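As a purely hypothetical illustration of that idea (not a Foldit feature or an existing tool), a compatibility check could compare the most likely predicted distance for each residue pair against the corresponding distance in the designed model. The sketch below assumes one coordinate per residue and an arbitrary 2 Å tolerance; a real check would more likely use the full probability distributions.

```python
import math

def fraction_compatible(designed_coords, predicted, tolerance=2.0):
    """Fraction of residue pairs whose designed distance agrees with the prediction.

    designed_coords: list of (x, y, z) per residue in Å (e.g. C-beta atoms).
    predicted: dict mapping residue pairs (i, j) to a predicted distance in Å.
    tolerance: hypothetical cutoff in Å for calling a pair "compatible".
    """
    if not predicted:
        return 0.0
    hits = 0
    for (i, j), d_pred in predicted.items():
        d_design = math.dist(designed_coords[i], designed_coords[j])  # Python 3.8+
        if abs(d_design - d_pred) <= tolerance:
            hits += 1
    return hits / len(predicted)

# Invented example: a straight three-residue segment and three predicted distances.
design = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
prediction = {(0, 1): 3.8, (0, 2): 7.0, (1, 2): 10.0}
print(fraction_compatible(design, prediction))
```

A low fraction would suggest that the network expects the designed sequence to fold differently from the designed model, which could flag the design for a closer look.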

Special thanks goes to Baker Lab scientist Ivan Anishchenko for contributions to this blog post!

(Posted by bkoep | Fri, 01/31/2020 - 00:25 | 8 comments)
Developed by: UW Center for Game Science, UW Institute for Protein Design, Northeastern University, Vanderbilt University Meiler Lab, UC Davis
Supported by: DARPA, NSF, NIH, HHMI, Amazon, Microsoft, Adobe, Boehringer Ingelheim, RosettaCommons