Continuing the battle against Ebola
As you know, the current Ebola outbreak in Western Africa has now claimed over 1,000 lives, making it the worst Ebola outbreak to date. Currently, no proven treatment or vaccine exists to combat Ebola. It has been some time since our last Ebola design puzzles (http://fold.it/portal/node/997612 and https://fold.it/portal/node/997525) were posted. These, in combination with the initial hotspot-finding puzzle (https://fold.it/portal/node/996919), have yielded some great leads that we are currently working to test in the wet lab. In particular, some of the best player hotspots yielded excellent starting points for the design of small, cyclic peptides that we're quite excited about.
The process of going from a candidate design to a drug ready for deployment in the field is a long one. We start by screening a large number of candidates for binding to the Ebola surface glycoprotein, using high-throughput methods. Because the Ebola virus is obviously quite dangerous, we can't work with it ourselves in our wet lab. We therefore use Ebola proteins that have been made in harmless strains of bacteria, using recombinant DNA technology. As for the candidate designs, we express these in baker's yeast, also using recombinant techniques. Because we are using high-throughput methods at this stage, we expect some false positives from our screen. Some candidates, for example, are just generally "sticky", and bind to pretty much anything. These would not be useful as drugs, since they'd stick to all sorts of other things in the body, but they show up in the initial screen as a hit.
Once we have some possible hits from our initial screen, we express and purify larger amounts of these second-round candidates for more careful experimental validation using lower-throughput techniques. At this stage, we need to do careful controls to confirm that the second-round candidates aren't just generally sticky. We also start to get quantitative at this point, to see whether a design binds tightly enough to be a useful drug, or whether it needs further redesign to improve its binding affinity.
Through collaborators, we will ultimately test candidates that pass all of our tests in cell culture, then in animals infected with the virus. The road to human trials is longer still, since it's necessary to show that a drug is safe before it can be given to sick people. Even with a disease as horrible as Ebola, one has to be sure that the treatment isn't worse than the disease. We're not yet anywhere near the stage at which candidates could be administered to people suffering from the Ebola virus -- but with time, effort, and patience, we hope to get there.
In the meantime, now that CASP is winding down, we will be posting a few more Ebola puzzles in the next few days. In particular, I'm curious about whether players could do better than the automated algorithms at designing a heavily disulfide cross-linked peptide to bind to the Ebola glycoprotein... We look forward to finding out!
( Posted by v_mulligan 134 3362 | Thu, 08/14/2014 - 04:32 | 0 comments )
The story of a Foldit design
As discussed in previous blog posts, some Foldit player solutions from design puzzles are chosen for synthesis in the Baker Lab. One design in particular, from Puzzle 854, has recently yielded some promising results in the wet lab. Below, we follow this design's journey from video game to test tube:
1. Once we select a design to be synthesized in the lab, we extract the amino acid sequence from the design and reverse translate this into a sequence of DNA bases (i.e. a gene), adding a special tag that will come in handy later.
2. We order this gene as a DNA molecule from a gene synthesis company, and splice the gene into a larger, circular piece of DNA called a plasmid. The plasmid now containing our gene is inserted into E. coli bacteria, which we allow to grow and reproduce in an incubator. These bacteria will transcribe and translate our gene as if it were one of their own, producing our design as a polypeptide chain; if the protein design is good, the polypeptide chain will naturally fold up into the design structure.
3. Once the bacteria have grown to saturation and produced a large amount of our protein, we break open the bacteria cells and separate our protein from the other bits of E. coli using the special tag we added in step 1. At this stage, we can see whether the bacteria were able to produce our protein and whether the protein is soluble. Unstructured proteins will usually be degraded by the E. coli or otherwise form insoluble aggregates.
SDS-PAGE. A mixed protein sample (stained blue) is passed through a polyacrylamide gel from top to bottom, such that smaller proteins travel faster through the gel. Here, three samples are shown: all soluble proteins from a bacterial cell (left), proteins lacking the special tag we added in step 1 (middle), and proteins with the special tag (right). Although the first two samples have many bands (different proteins) spread across the length of the gel, the sample on the right is dominated by a single large band (our protein) near the bottom of the gel.
4. We use size-exclusion chromatography (SEC), which separates proteins based on their size, to get rid of other protein impurities. This step also gives us information about the oligomeric state of our protein (unstable proteins with exposed hydrophobic residues tend to self-associate into oligomers). Structured monomers will behave differently on the column than oligomers or unstructured aggregates.
SEC Trace. Proteins are passed through a matrix such that larger proteins travel faster through the matrix and are collected sooner (at the left end of the x-axis); absorbance of UV light is used to measure the protein concentration (y-axis) of samples as they are collected. You can see that in the case of our protein, this step was hardly necessary because the protein is unusually pure, evidenced by a single dominant peak at 14 mL. Furthermore, the placement of the peak at 14 mL corresponds precisely to the expected size of the design, indicating the protein is monomeric.
5. After we have purified our protein, we can use circular dichroism (CD) to measure its secondary structure content. This technique measures a protein's absorption of circularly-polarized light and can tell us about the amount of α-helix or β-sheet in the protein. This measure also allows us to monitor how the protein unfolds when we raise the temperature.
Circular Dichroism. Different elements of protein structure interact differently with circularly-polarized light. At 25°C (blue trace, top) our protein shows a CD profile characteristic of a protein with a large α-helical content. At 95°C (red trace, top), the shape of the profile is less pronounced, indicating a loss of secondary structure; the secondary structure is recovered upon cooling back to 25°C (green trace, top). The bottom trace shows how the CD signal at 220 nm changes as we raise the temperature of the protein sample. The gradual, broad slope of this trace indicates noncooperative, multi-state unfolding.
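To make step 1 above a little more concrete, here is a minimal Python sketch of going from an amino acid sequence to a gene. It is only an illustration, not the lab's actual pipeline: the codon table picks one valid codon per amino acid (real gene design also weighs codon usage in the host and avoids problematic DNA motifs), and the six-histidine tag is an assumption, since the post only says "a special tag".

```python
# One valid codon per amino acid (standard genetic code).
# The particular choices here are illustrative, not an optimized table.
CODON_FOR = {
    "A": "GCG", "R": "CGT", "N": "AAC", "D": "GAT", "C": "TGC",
    "Q": "CAG", "E": "GAA", "G": "GGC", "H": "CAT", "I": "ATT",
    "L": "CTG", "K": "AAA", "M": "ATG", "F": "TTT", "P": "CCG",
    "S": "AGC", "T": "ACC", "W": "TGG", "Y": "TAT", "V": "GTG",
}
STOP_CODON = "TAA"

# Assumed purification tag (six histidines); the post does not name the tag.
ASSUMED_TAG = "HHHHHH"


def reverse_translate(protein: str, tag: str = ASSUMED_TAG) -> str:
    """Return DNA encoding: start Met (if missing) + protein + tag + stop."""
    full = protein.upper() + tag
    if not full.startswith("M"):
        full = "M" + full  # translation starts at a methionine codon
    try:
        codons = [CODON_FOR[aa] for aa in full]
    except KeyError as err:
        raise ValueError(f"unknown amino acid: {err.args[0]}") from None
    return "".join(codons) + STOP_CODON
```

For example, `reverse_translate("MKV", tag="")` gives `ATGAAAGTGTAA`: three codons for Met-Lys-Val followed by a stop codon.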
The protein described above is being prepared for crystallization. If successful, we may be able to obtain a high-resolution crystal structure of the protein and make a comparison with the designed structure. Please note that we are continuing to work with other proteins that may be less well-behaved, and hope to order new designs soon!
Check out the latest design puzzle here!
( Posted by bkoep 134 1051 | Wed, 06/18/2014 - 00:25 | 3 comments )
Dartmouth College Scientists need your help with a Brain Cancer-related puzzle!
Having been a (novice) Foldit player myself, I was aware of the potential utility of this powerful computational platform in modeling proteins even before I joined my current laboratory. When the topic of needing a structure for our protein came up in a meeting, of course Foldit was at the top of my mind. We hope that the players of Foldit can help us uncover the structural basis of Id2 activity and thereby inform the development of novel targeted therapies for Glioblastoma.
Inhibitor of DNA Binding 2 (Id2), a helix-loop-helix (HLH) protein, inhibits normal gene expression by binding and suppressing other transcription factors. Recent data from our laboratory show Id2 is a key protein in the pathogenesis of a subset of aggressive brain cancers (Glioblastoma). The structure of the HLH domain of Id2 is well characterized. However, the unknown terminal regions are very important for regulation of the protein. Having the structures for the terminal regions will help us understand how the regulation of Id2 alters its structure and function. This will help us identify regions of Id2, or proteins that interact with Id2, which are important for degradation. We could use this knowledge to develop drugs that promote the degradation of Id2 to treat Glioblastoma.
Try out our Brain Cancer-Related Phosphorylated Id2 puzzle here. Thanks!
( Posted by cymbal_king 134 1942 | Thu, 05/29/2014 - 16:06 | 1 comment )
Wiggle Power Results
We recently posted a bunch of De-novo puzzles where the "High Wiggle Power" option was disabled. Hopefully the results from those puzzles will explain why we've given High Wiggle Power a time-out during CASP11.
Below are RMSD plots for De-novo Freestyle 36 puzzles 864: Low Power & 868: High Power. The green dots represent your many different Foldit predictions, and for all these RMSD plots, you want to be as close to the left as possible (an RMSD=0.0 would be a perfect match to the native).
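For anyone curious what the x-axis of these plots measures, here is a minimal Python sketch of an RMSD calculation. It assumes the model and the native have already been superimposed and that their atoms are listed in the same order; structure comparisons normally include a superposition step first, which is left out here. The function name and input format are made up for this illustration and are not taken from Foldit or Rosetta.

```python
import math


def rmsd(model, native):
    """Root-mean-square deviation between two matched lists of (x, y, z) points.

    Assumes both structures are already superimposed and list the same
    atoms in the same order. Coordinates are typically in angstroms.
    """
    if len(model) != len(native) or not model:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared_distance_sum = sum(
        (a - b) ** 2
        for point_m, point_n in zip(model, native)
        for a, b in zip(point_m, point_n)
    )
    # Average the squared distances over atoms, then take the square root.
    return math.sqrt(squared_distance_sum / len(model))
```

Identical structures give an RMSD of 0.0, and the value grows as atoms in the model drift away from their positions in the native.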
You can see that the top-scoring Foldit solution (the lowest Rosetta energy) doesn't change much between puzzles RMSD-wise. So although the top score for puzzle 868: High Power was 9,208 (compared to 9,098 in puzzle 864: Low Power), the top-scoring solution is not any closer to the native.
In general (and this was the case for all the Low Power/High Power plot comparisons) although the scores were better in the High Power rounds, the models were not any more accurate. We hypothesized that this could be happening because we allowed you to load in solutions from the Low Power rounds, and therefore the High Power round was mostly "drilling" down the energy landscape of those previous models (since doing that would obviously improve the in-game score!).
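One simple way to put a number on "better score, but not more accurate" is to ask where the top-scoring solution ranks by RMSD among all submitted solutions. The Python sketch below does that for a list of (Foldit score, RMSD to native) pairs. The function name and input format are hypothetical, and this is not the analysis we actually ran, just an illustration of the idea.

```python
def rmsd_rank_of_top_scorer(solutions):
    """Rank (1 = closest to native) of the highest-scoring solution by RMSD.

    `solutions` is a list of (foldit_score, rmsd_to_native) pairs.
    Higher Foldit score is better; lower RMSD is better.
    """
    if not solutions:
        raise ValueError("need at least one solution")
    # Pick the solution with the highest in-game score.
    _, top_rmsd = max(solutions, key=lambda pair: pair[0])
    # Count how many solutions are strictly closer to the native.
    closer = sum(1 for _, r in solutions if r < top_rmsd)
    return closer + 1
```

A rank of 1 means the top-scoring solution is also the most accurate one; a large rank means many lower-scoring solutions were closer to the native.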
This is why for 880: De-novo Freestyle 38: High Power we did not let you load in solutions from the previous Low Power round.
You can see in the plots below that this High Power round had fewer green dots, but unfortunately the results are actually much worse than the Low Power round:
On the left, 876: De-novo Freestyle 38: Low Power has a very nice plot where the top-scoring Foldit model is one of the left-most points. This is not the case on the right, where the top-scoring Foldit model from 880: De-novo Freestyle 38: High Power is much further from the native than in the Low Power round.
The exciting news, however, is that the results from the Predicted Contacts rounds have been very promising!
You can see this below for the De-novo Freestyle 37 puzzles:
On the left, the top-scoring solutions for 867: De-novo Freestyle 37: Low Power are not the left-most points on the plot (they are quite far away from the native topology) but given predicted contacts, your results on the right for 875b: De-novo Freestyle 37: Predicted Contacts look great!
So we are looking forward to the Contact-assisted CASP11 targets, and hopefully this post explains why we'll give High Wiggle Power a rest during CASP11. The CASP season is long and busy enough that we don't want to waste your time gaining Foldit points without getting more accurate solutions!
Lastly, Seth and I wanted to thank all of you in the DC area who stopped by when we presented Foldit (and debuted nanocrafter) at the 3rd USA Science & Engineering Festival.
Next time we promise to give everyone a little bit more advance notice, and we'll make sure to have a camera ready from day 1. At least we managed to snap a photo with Galaxie on the last day:
Thanks for all your hard work, everybody... and keep up the great folding!
( Posted by beta_helix 134 3362 | Tue, 05/13/2014 - 16:10 | 8 comments )