The Foldit cryo-EM paper
The latest Foldit research paper, about Cryo-EM Density puzzles, was published today in the journal PLOS Biology! The paper is open-access, meaning that anybody can read and share it for free, from the journal website.
The paper is a formal research article, so it is written in technical language meant for other scientists, and skips over some background info. Below, we cover the main points so that everyone can appreciate this accomplishment by Foldit players!
Electron density in Foldit
The paper is about recent Foldit puzzles in the Electron Density category, where players fold the target protein into a 3D “cloud” of density that maps the shape of the folded protein. The paper reports solutions from Puzzles 1572, 1588, 1598, and 1606.
Foldit Puzzle 1598: Cryo-EM Freestyle with Density
This is not the first time Foldit players have wowed us in an electron density puzzle! Some of you may remember Puzzle 1152: Foldit vs. UMich Electron Density Challenge from back in 2015. In that contest, players built solutions into a high-resolution (1.9 Å) density map from x-ray diffraction experiments. Foldit players outperformed UMich undergraduates, expert crystallographers, and state-of-the-art computer algorithms! Those results were published in a previous paper.
This previous result gave us a clue that electron density might be a sweet spot for Foldit players, so we started to look at other kinds of density maps...
Cryo-electron microscopy (cryo-EM) is another technique for getting density maps and solving protein structures. In a cryo-EM experiment, a sample of protein in solution is spread on a thin metal wafer and quickly cooled to cryogenic temperatures to quench all molecular motion, freezing all of the protein atoms in a sheet of vitreous ice. Then we bombard the frozen sample with a beam of high-energy electrons, which scatter when they collide with the atoms of the protein. A detector measures the electron scattering, and the result is a grainy 2D “micrograph” of the wafer and any proteins on its surface.
Example cryo-EM micrograph of the S. entomophila antifeeding prophage, used to generate the maps for the puzzles in this paper. Used with permission of Ambroise Desfosses and Irina Gutsche (source).
If we collect enough of these raw micrographs (think millions), then we can align all of the individual protein molecules and average them together to get a clearer 2D picture of the protein. Finally, we combine all the 2D images to arrive at a 3D reconstruction of the protein, in the form of a density cloud—very similar to the electron density clouds that we get from x-ray diffraction experiments!
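The benefit of averaging can be illustrated with a toy calculation. The sketch below is not the reconstruction pipeline used for these maps. It assumes the images are already perfectly aligned, uses a made-up 64-pixel "image", and adds simple Gaussian noise; all names and numbers are illustrative assumptions. It only shows why averaging many noisy copies recovers a signal that is invisible in any single copy.

```python
import random

def noisy_copy(signal, noise_sd, rng):
    """Return the signal with independent Gaussian noise added to each pixel."""
    return [s + rng.gauss(0.0, noise_sd) for s in signal]

def average_images(images):
    """Average a list of equally sized, pre-aligned images pixel by pixel."""
    n = len(images)
    return [sum(pixels) / n for pixels in zip(*images)]

def rms_error(estimate, truth):
    """Root-mean-square difference between an estimate and the true signal."""
    return (sum((e - t) ** 2 for e, t in zip(estimate, truth)) / len(truth)) ** 0.5

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical "true" image: 64 pixels with values 0 or 1.
truth = [0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 1.0, 0.0] * 8

# Noise is twice as large as the signal, so one image alone is mostly noise.
single = noisy_copy(truth, 2.0, rng)

# Averaging 400 noisy copies shrinks the noise by roughly sqrt(400) = 20.
averaged = average_images([noisy_copy(truth, 2.0, rng) for _ in range(400)])

print("error of one image:     ", round(rms_error(single, truth), 2))
print("error of averaged image:", round(rms_error(averaged, truth), 2))
```

In a real experiment the hard part is the alignment step that this sketch skips: each protein molecule lands on the wafer in an unknown position and orientation, which has to be estimated before any averaging can happen.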
Unlike x-ray diffraction, cryo-EM experiments are fairly easy to set up (no protein crystals needed!). But cryo-EM has been unpopular for protein structure research because it yields a lower-resolution, “blobbier” density cloud than x-ray diffraction. However, that started to change around 2012, when a technological breakthrough gave us improved electron-scattering detectors and higher resolution maps. Since then, cryo-EM has taken off, and the number of new cryo-EM protein structures has been doubling every 2 years (by contrast, new x-ray diffraction structures have plateaued since 2013).
Cryo-EM and Foldit players
Even with the recent improvements, cryo-EM maps are not quite as clear as x-ray diffraction maps. The highest resolution typically achieved by cryo-EM is about 3.0 Å. Since covalently bonded atoms are separated by < 2 Å, that means we still can’t make out the positions of individual atoms simply by looking at the map. Instead, we have to infer the positions of the atoms, using our knowledge of physics and protein structure to find a plausible model that fits the map.
Building a plausible protein structure into a low-resolution map is difficult and prone to errors. If a microscopist focuses too much on fitting the density cloud, they might end up with a strained (high energy) model that is physically unrealistic. On the other hand, a computer algorithm that optimizes energy can have a hard time fitting a model into the density map.
This is where Foldit players come in! We know from previous work that Foldit players are adept at interpreting density maps, and the Foldit score function should help guide players toward plausible, low-energy models.
In Puzzles 1572, 1588, 1598, and 1606, we provided Foldit players with cryo-EM maps for four proteins that make up the S. entomophila antifeeding prophage (a complex needle-like structure used by bacteria to inject toxins into a target cell). We then compared Foldit player solutions with those of expert microscopists and a handful of automated algorithms.
Comparison of solutions from different methods. (Top) The top Foldit solution from Puzzle 1588 and the model built by the scientist look pretty similar at this zoomed-out scale. (Bottom) A closer look shows that subtle deviations between the models can make a significant difference. In the bottom-right image, an automated algorithm (magenta) had trouble matching the density, and left some regions of the map completely empty.
Foldit players take gold!
In each of the four puzzles, Foldit player solutions had the best balance of plausibility and fit-to-density! If you’re curious, the scientists came in second, and the algorithms came in last (but there was a lot of variance between different algorithms).
Foldit players achieve plausibility and high fit-to-density for AFP7 (Puzzle 1588). (Left) Microscopists build strained models that have many clashes. (Right) Automatic algorithms like Rosetta and Phenix build models with poor fit-to-density (according to three different measures of map correlation). Foldit players build realistic models with few clashes, and still fit the density with a high map correlation.
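For readers wondering what a "map correlation" looks like in practice, here is a minimal sketch. It assumes that both the experimental map and the density predicted from a model have been sampled at the same grid points and flattened into plain lists of numbers, and it uses an ordinary Pearson correlation. The paper uses three different measures of map correlation; this sketch is not claimed to match any of them, and the density values are invented for illustration.

```python
def map_correlation(model_density, map_density):
    """Pearson correlation between two density maps sampled on the same grid."""
    if len(model_density) != len(map_density):
        raise ValueError("both maps must be sampled at the same grid points")
    n = len(model_density)
    mean_a = sum(model_density) / n
    mean_b = sum(map_density) / n
    cov = sum((a - mean_a) * (b - mean_b)
              for a, b in zip(model_density, map_density))
    var_a = sum((a - mean_a) ** 2 for a in model_density)
    var_b = sum((b - mean_b) ** 2 for b in map_density)
    if var_a == 0 or var_b == 0:
        raise ValueError("correlation is undefined for a completely flat map")
    return cov / (var_a * var_b) ** 0.5

# Hypothetical density values at six grid points.
experimental_map = [0.1, 0.8, 0.9, 0.2, 0.0, 0.7]
well_fit_model = [0.0, 1.0, 1.0, 0.0, 0.0, 1.0]    # atoms sit where the density is
poorly_fit_model = [1.0, 0.0, 0.0, 1.0, 0.0, 1.0]  # atoms mostly miss the density

print(map_correlation(well_fit_model, experimental_map))
print(map_correlation(poorly_fit_model, experimental_map))
```

A value near 1 means the model puts its atoms where the map has density. A value near 0 or below means the model and the map mostly disagree.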
We also want to point out that the Foldit rankings were incredibly accurate in these puzzles! As most players are aware, the best-scoring solution in Foldit is not necessarily the most accurate scientifically (because the Foldit score function is not a perfect reflection of reality). This is why we run our scientific analysis on all of the high-scoring solutions, to see what actually looks best against the scientific data: sometimes it’s rank #2, and sometimes it’s rank #20. However, in all four of these cryo-EM puzzles, the #1 top-scoring Foldit solution also had the best scientific evaluation!
This is important because it supports the accuracy of the Foldit score function. Foldit players can have more confidence that when their score goes up, so does the scientific value of their solution. It should also give more confidence to other scientists who might want to collaborate with Foldit players in the future. We hope this is just the beginning for Foldit cryo-EM!
Finally, we want to thank all the Foldit players who participated in these cryo-EM puzzles! Even if you didn’t work directly on the models presented in the paper, your folding helps to drive the competition that leads to high-scoring solutions. We love to see Foldit players continuing to share ideas and set high standards for each other! Some of the Foldit players who worked on the solutions in the paper have written up their folding strategies, which you can read in the paper supplement.
Posted by bkoep | Mon, 11/11/2019 - 19:29
The Aflatoxin Challenge Returns!
The Aflatoxin Challenge is back! Since we left off last November, the Siegel Lab at UC Davis has been hard at work testing designs from Foldit players. Unfortunately, they ran into a major setback (all too common in scientific research), and had to go back to the drawing board to rethink their strategy. But they are back now with a new enzyme scaffold that is better suited to degrade the aflatoxin molecule, and they're asking Foldit players to redesign the enzyme so that it can bind aflatoxin more strongly!
Aflatoxin contamination in the food supply chain has resulted in health issues approaching epidemic status in developing countries, and vast food stores are deemed unsafe for consumption in regulated markets. There remain no effective means of aflatoxin removal that also maintain the food quality required for commercial products. Using modern synthetic biology tools, a UC Davis team of scientists, in collaboration with the Mars Global Food Safety Center, has spearheaded efforts to develop novel remediation tools.
In 2017, the Siegel Lab characterized a diverse panel of ~50 hydrolytic enzymes for expression and solubility. Then, a consortium with Mars, UC Davis, UW, Northeastern, ThermoFisher, FAO and PACA was developed around Foldit, so that citizen-scientist Foldit players might engineer new functionality into these hydrolytic enzymes and allow them to degrade the harmful aflatoxin molecule.
After the first 12 design rounds in Foldit, >500 designed proteins were tested—but not a single active enzyme was found! The UC Davis team went back and retested some fundamental assumptions that had been made when looking at the hydrolytic enzymes. They found that, at neutral pH, hydrolysis is not thermodynamically favorable for aflatoxin B1, and therefore it would have been impossible to develop a hydrolytic catalyst.
An alternative reaction
With this knowledge in hand, a new class of enzymes was targeted that catalyzes oxidative reactions, and requires nothing beyond O2. A set of ~20 diverse naturally occurring oxidative enzymes were synthesized and characterized. In initial activity screens, 2 of these were found to degrade all detectable aflatoxin. Today, we are restarting the Aflatoxin Challenge with a new Round 13 puzzle, in which players can redesign one of these active enzymes to improve hypothesized interactions with the aflatoxin molecule.
There is still a long way to go before this enzyme is efficient and specific enough for use in industrial settings. We are looking to the Foldit community to help us redesign the binding pocket. We hope Foldit players can introduce new packing interactions and hydrogen bonds with aflatoxin, to stabilize its hypothesized orientation and prime it for oxidation. We look forward to seeing what Foldit players can come up with! Play the new Aflatoxin Challenge: Round 13 puzzle now!
As in the previous aflatoxin puzzles, all Foldit player designs will be public domain. By participating in these Aflatoxin Challenge puzzles, the players agree that all player designs will be available permanently in the public domain, and the players will not seek intellectual property protection over the designs created as part of the challenge.
Posted by bkoep | Mon, 09/30/2019 - 16:42
Protein Design Critique: IL-7R Binder Redesign
You’re doing great so far! I've looked at your solutions from the first 4 puzzle rounds, and I think a lot of your designs are going to work! I just wanted to remind everyone that, in addition to the Foldit score you get on each puzzle, in the end you'll also get a binding score based on our testing of these designs in the lab!
Designing a protein to fold precisely is a difficult problem! When we test your protein, we are testing whether the sequence you chose folds into the shape of your solution. In Foldit, you can change your solution into whatever shape you want, but in the lab your sequence might not fold into the shape you wanted. It took scientists decades to figure out the shape a given protein sequence folds into (they call this the Protein Folding Problem). The good news for you though is that the Protein Folding Problem has a really simple answer:
I want to emphasize a few guidelines you can use to ensure your designed fold is the most favorable state:
· Secondary structure - use lots of alpha-helices or beta sheets
· Puzzle score - try to have the best score for your chosen fold
· Short loops - you'll need to use loops, but keep them as short as possible
Next I’ll show some examples and give my thoughts on a few designs from Foldit players. Please note that all of these designs have been chosen because they showcase a single weakness in an otherwise excellent design. We don't mean to disparage anyone's designs—on the contrary, the solutions highlighted in this critique are among our favorites!
A study of two 3-helix bundles
While both of these structures emphasize secondary structure and well-packed cores, design A is more likely to fold because of its shorter loops.
The reason we prefer secondary structure to loops is that loops typically have many alternate conformations (decoys) that score the same or even better than the design model. Shorter loops mean fewer decoys and a better chance of folding as intended. For instance, one can imagine how the loop of design B could misfold so that the third helix is on the wrong side of the bundle.
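One rough way to check a design against the "short loops" guideline is to measure its loops directly. The sketch below assumes a design has been summarized as a per-residue secondary structure string, with H for helix, E for sheet, and L for loop. The two example strings are hypothetical 3-helix bundles invented for illustration; they are not the actual designs A and B.

```python
from itertools import groupby

def loop_lengths(ss):
    """Lengths of each consecutive run of loop residues ('L') in the string."""
    return [len(list(run)) for kind, run in groupby(ss) if kind == "L"]

def loop_summary(ss):
    """Longest loop and the fraction of all residues that sit in loops."""
    loops = loop_lengths(ss)
    return {
        "longest_loop": max(loops, default=0),
        "loop_fraction": sum(loops) / len(ss),
    }

# Hypothetical 62-residue 3-helix bundles (not real player designs).
short_loops = "L" + "H" * 18 + "LLL" + "H" * 18 + "LLL" + "H" * 18 + "L"
long_loop = "L" + "H" * 18 + "LLL" + "H" * 16 + "L" * 9 + "H" * 14 + "L"

print(loop_summary(short_loops))
print(loop_summary(long_loop))
```

Both toy designs have the same number of residues, but the second spends more of them on a single long loop, which is the kind of region where alternate conformations are most likely.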
Bad beta-sheet, better beta-sheet, best beta-sheet
Beta sheets are a tricky secondary structure, because they require distant parts of the protein chain to come together. The point I want to highlight here again is that shorter loops are almost always better. In design C, there are too many loop residues between the helices and sheets. These loop residues are likely to rearrange themselves in real life.
Design D has shorter loops, but I still see a few backbone H-bond pairs that are unsatisfied here. (Also, I'm not so sure about that ARG / GLU zipper there. ARG / GLU like to form helices, so I'd probably go with HIS / THR...)
Design E is an optimized Baker Lab design (not from the IL-7R series), but I wanted to include it to demonstrate my point. Look at how short those loops are! This is a difficult fold to master, but Foldit players like challenges, right?
4-helix bundles, the good, the bad, and the ugly
When it comes to 4-helix bundles (and really all designed proteins), the name of the game is compactness. You want your design to resemble a ball, with every portion stabilized by at least 2 other secondary structure elements. Design H fails at just that; it's too long and unsupported. This structure will almost certainly fold into something more compact in real life.
Design G also breaks this rule, because it leaves a large portion of the structure thin and unsupported. Those two helices would have been better placed on top of the protein, as in the good example here.
Yes, design F would be better if the helices were longer, but we didn't give players enough residues for that (unfortunately, we're limited to small proteins for our lab experiment). If you run out of residues for good helix packing, you can try beta-sheets. However, previous experiments have shown that helices are more robust than beta-sheets. So if the choice is between an okay beta-sheet and an okay helix, I'd go for the helix.
Don't try to make additional target contacts
First let me say that these designs are very interesting in that they make additional contacts with the target. Especially in design I, I'm not even sure I could design that with all the tools I have! But, I want to remind everyone that in this design challenge, folding is more important than binding.
You've already been given two helices that are guaranteed to bind the IL-7R. If you can just fold the rest of the protein into a stable fold then you'll have a binder!
Great 3-helix bundle, but that long loop isn't going to fly
Finally, one more design to really hammer home the message of shorter loops. Design K looks great with three well-packed helices, but look a little closer and you'll see that a long loop is required to stretch back and meet the third helix. I'll admit, this protein has a chance to work, but with a loop that long, who knows where the final helix will actually fold...
We have a lot more puzzles planned for this series, and we look forward to seeing more designs from Foldit players! Round 5 just closed, and we'll get started on the analysis of those solutions right away. In the meantime, check out the Round 6 puzzle, which is online now!
Posted by bcov | Fri, 08/16/2019 - 17:57
Redesigning IL-7R Binders
Hi Foldit players! We need your help redesigning protein binders!
I'm bcov, a graduate student in the Baker Lab. My PhD project is to make proteins that stick to other proteins. In my work, I’m given the model of a natural target protein and my task is to design a new protein that will bind to it. It turns out this problem is really hard because not only do my designed proteins need to bind to the target, but they have to properly fold first! Fortunately, I can use a high-throughput binding experiment that allows me to test 100,000 different proteins at once.
At the moment, I’m interested in studying the folding aspect of this problem. I have a clever experiment planned where I should be able to confirm the atomic accuracy of a designed protein even when it’s mixed with thousands of other proteins. For this experiment, I will need lots of binder designs that have different folds, but that share a common binding interface. I'm planning a series of Foldit puzzles in which players can redesign my binders while preserving the binding interface.
My designed binders target a protein called interleukin 7 receptor (IL-7R), which helps to regulate the human immune system, and is an important target for cancer therapy.
Here are the details of the experiment:
· I have 11 designed proteins that are confirmed to bind the target IL-7R
· I want to leave my designed binding interface the same, but redesign the rest of the protein
· In each puzzle, your task is to design the rest of the protein so that it folds the interface-side in precisely the right conformation
· Your designs will be tested for binding against IL-7R
· You will get a binding score based on how well your design binds in the wet lab
The binding score here is really cool actually. After we run the binding experiments at the end of the puzzle series, you will receive cold, hard data from the biochemistry lab about the binding strength of your design. Well-folded proteins that fold precisely into the puzzle structure will likely score the highest. Details about the binding score will be released later, but in general, there are three categories:
1. Your design did not bind to IL-7R
2. Your design bound to IL-7R but was worse than my design
3. Your design bound to IL-7R and was better than my design
If you end up in category 3, congrats! You beat me :P
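The three categories can be written down as a tiny decision rule. This is only a sketch of the logic described above, not the actual scoring method, whose details have not been released. The function name, the idea of a single "binding signal" number where higher means stronger binding, the detection threshold, and the handling of ties are all assumptions made for illustration.

```python
def binding_category(design_signal, original_signal, detection_threshold):
    """Sort a design into one of the three categories listed above.

    All three inputs are hypothetical measurements on the same scale,
    where a higher number means stronger binding to IL-7R.
    """
    if design_signal < detection_threshold:
        return 1  # did not bind to IL-7R
    if design_signal > original_signal:
        return 3  # bound, and better than the original design
    return 2      # bound, but no better than the original (ties land here by assumption)

# Made-up numbers: the original binder measures 5.0, detection threshold is 1.0.
for signal in (0.1, 3.0, 7.0):
    print(signal, "-> category", binding_category(signal, 5.0, 1.0))
```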
Nearly all Foldit player designs will be tested experimentally. This is possible because we can test all the designs at the same time in our high-throughput binding experiment. Designs that look especially good will be tested multiple times with various mutations to increase data consistency.
Due to time constraints, puzzles in this series will be shorter than our normal week-long puzzles, and will only run for 4 days at a time. We'd like to generate as many variants as possible for the original 11 binders. So, don't worry if you miss a puzzle; there will be plenty more to follow up!
Check out the first puzzle of the series, Puzzle 1704: IL-7R Binder Redesign: Round 1, which is out now! Happy folding!
Update (8/16/2019): Read the follow-up post, Protein Design Critique: IL-7R Binder Redesign
Protein Design Critique: Cubane FeS Binder
A few weeks ago, we challenged Foldit players to design a protein that could bind an iron-sulfur (FeS) cluster, in Puzzle 1688: Cubane FeS Binder Design. A cubane-type [4Fe-4S] iron-sulfur cluster is a "cube" made out of alternating iron and sulfur atoms, and is bound by carefully placed cysteine residues in a protein. Iron-sulfur metalloproteins are responsible for electron transfer in light-harvesting, cellular respiration, and many other processes. We'd like to design an iron-sulfur protein so that we can better understand electron transfer in proteins. By changing the environment around the iron-sulfur cluster, we could tune the electron transfer properties of the protein, which could open the door to metabolic engineering and new chemistry!
We asked Dr. Anindya Roy, the Baker Lab’s expert on redox proteins, to take a look at Foldit players’ designs from Puzzle 1688. Below are some comments from Anindya, which we hope players will take into account for the Round 2 puzzle, which is online now!
We were very excited about the structural diversity of designs by Foldit players, who developed a variety of different protein folds! Many natural redox proteins adopt a ferredoxin fold, with a secondary structure pattern of (β-α-β)2, and we were worried that Foldit players might also favor the same ferredoxin fold. We were happy to see lots of helical bundles and other α/β folds with different secondary structure patterns, because these folds might have properties that are not possible with the ferredoxins typically found in nature. We encourage players to keep exploring helical bundles and other folds!
Room for improvement
In these initial designs, the two main areas for improvement are excessive loops and incomplete burial of the FeS cluster.
The cubane FeS cluster should be buried inside the protein core as much as possible. If the FeS cluster is to be used to catalyze a chemical reaction, then we want the active site to be protected from the water surrounding the protein. The top-scoring design by toshiue and Wilm, shown below, does a good job of burying the FeS cluster. The frozen FeS-binding loop is highlighted in blue and purple, with helices packing nicely against the cluster on three sides, shielding it from water.
If we zoom in on the FeS cluster, we can see some other nice features of this design. We like to see large, aromatic residues packed near the FeS cluster, like the TRP residue at the left of this protein. Players should try to design aromatic PHE, TRP, and TYR residues around the FeS cluster. Also, because the FeS cluster is negatively charged, it can be stabilized with complementary positively-charged residues, like the LYS residue shown beneath the cluster here. Players should also design positively charged LYS and ARG residues near the FeS cluster.
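As a rough self-check, a design's surroundings can be tallied by counting which residue types sit close to the cluster. The sketch below assumes each residue is reduced to a three-letter name plus one representative 3D position in ångströms, and it uses an 8 Å cutoff. The cutoff, the coordinates, and the function name are illustrative assumptions, not values from the puzzle or from Foldit.

```python
import math

AROMATIC = {"PHE", "TRP", "TYR"}
POSITIVE = {"LYS", "ARG"}

def residues_near_cluster(residues, cluster_center, cutoff=8.0):
    """Count aromatic and positively charged residues within `cutoff` of the cluster.

    `residues` is a list of (three-letter name, (x, y, z)) pairs, with one
    representative position per residue. The 8.0 cutoff is an assumed value.
    """
    counts = {"aromatic": 0, "positive": 0}
    for name, position in residues:
        if math.dist(position, cluster_center) > cutoff:
            continue  # too far away to interact with the cluster
        if name in AROMATIC:
            counts["aromatic"] += 1
        elif name in POSITIVE:
            counts["positive"] += 1
    return counts

# Hypothetical residues around a cluster placed at the origin.
toy_design = [
    ("TRP", (4.0, 0.0, 0.0)),   # aromatic, close to the cluster
    ("LYS", (0.0, -5.0, 0.0)),  # positively charged, close to the cluster
    ("PHE", (20.0, 0.0, 0.0)),  # aromatic, but too far away to count
    ("ALA", (1.0, 1.0, 1.0)),   # close, but neither aromatic nor charged
]

print(residues_near_cluster(toy_design, (0.0, 0.0, 0.0)))
```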
Unfortunately, we’re afraid this design has too many residues in loops, and not enough secondary structure. We can see in the first image that the frozen loop has been extended to make an even longer loop, which is unlikely to fold as intended. In order for these protein designs to fold up with high stability, we want to minimize the number of loop residues in the structure. The more residues in helices and sheets, the better!
Below is a design by Galaxie and grogar7 that has a much smaller proportion of loop residues. The FeS-binding loop is flanked closely by long, stable helices on either side, and all of the other helices are connected by minimal loops. This design would have a much better chance of folding up into a stable structure.
However, in this design the FeS cluster is not completely buried, and will be exposed to the water surrounding the protein. This means we have less control over the electron transfer properties of the FeS cluster, which makes it harder to design an enzyme that can catalyze chemical reactions. One way to improve this design would be to extend the helices on either side of the FeS cluster in order to bury it away from the surrounding solvent.
This design also features lots of positively charged LYS and ARG residues at the binding site, which help to stabilize the negatively charged FeS cluster. Keep in mind that these charged residues have polar atoms that like to make hydrogen bonds. On the protein surface, they can make hydrogen bonds with the surrounding water; but if they’re buried away from solvent then they need to make hydrogen bonds within the protein!
In summary, we encourage Foldit players to design more helical bundles and other folds, with a focus on minimizing loop residues and burying the FeS cluster in the protein core! Large aromatic residues (PHE, TRP, TYR) and positively charged residues (LYS, ARG) help to stabilize the FeS binding site! Play Puzzle 1701: Cubane FeS Binder Design: Round 2 now!
Posted by bkoep | Fri, 07/19/2019 - 22:57