Mini CASP-like competition starts today
We have exciting news!
We were able to obtain sequences for proteins that are currently
unsolved, but will be released to the Protein Data Bank very soon!
In fact, today's first puzzle will be released to the PDB next week,
so we need you to start folding quickly!
After the first puzzle (released today) expires on August 8th, we will
release another puzzle for which you will have more time. The second
protein will not be deposited into the PDB until the end of the month.
We will give you starting models produced by Rosetta@home, and
since we have no idea which predictions are correct, we will try to
give you as diverse a set as possible.
We are very excited to be able to run this blind experiment where the
proteins will be released so soon!
This will be perfect practice for everyone, as CASP9 starts next May,
and it will help us figure out which Rosetta models are the best to use
as starting points for Foldit.
The key secret to finding best structures: Exploration
This is the story of Puzzle 138 (Rosetta Decoy 2). It is fascinating because it shows where we as developers should be focusing our design energy, and it also indicates that a change of playing strategy may benefit all results. The short version: pay close attention to solutions that have a relatively good score (but perhaps not the best) and that look noticeably different in structure from other solutions.
To demonstrate the point, here is the plot of the initial solutions of Puzzle 138. The vertical axis is the Foldit score (expressed as a Rosetta energy function, where lower is better). The horizontal axis is the root mean square deviation (RMSD) of the backbone atoms from the native structure, so the smaller it is, the closer we are to the solution in the sense of per-atom distance. The blue dot is the native structure from x-ray crystallography, so it could still have some slight errors, but it's the closest we know how to get to the real thing. The graph shows that there is a large group of solutions that are slight modifications of the starting solution; that is, the atoms don't really move a lot. Still, we see that to get to the native, a lot of movement needs to happen (towards the left on the graph). In fact, we see that Foldit solutions are hitting a virtual wall, where the energy keeps dropping without the structure approaching the solution. You will also notice that a lone explorer has a solution that, according to the Foldit score, is not the best, but on the graph appears much closer to the solution.
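For readers curious about how the horizontal axis is measured, here is a minimal sketch of an RMSD calculation. It assumes the two structures are already superimposed and are given as equal-length lists of (x, y, z) backbone atom coordinates; the function name `backbone_rmsd` is made up for this illustration and is not part of Foldit or Rosetta.

```python
import math


def backbone_rmsd(model, native):
    """Root mean square deviation between two lists of (x, y, z) coordinates.

    Assumption: both lists describe the same backbone atoms in the same
    order, and the structures have already been superimposed.
    """
    if len(model) != len(native) or not model:
        raise ValueError("need two non-empty coordinate lists of equal length")

    squared_distance_sum = 0.0
    for (x1, y1, z1), (x2, y2, z2) in zip(model, native):
        squared_distance_sum += (x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2

    # Average the squared per-atom distances, then take the square root.
    return math.sqrt(squared_distance_sum / len(model))
```

An RMSD of zero means every atom sits exactly on its native position; larger values mean the model is, on average, further away.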
The question in our minds was this: if we started from this lone island discovered by Foldit players, could the players do much better? So we posed another puzzle with the starting point shown in black in the second image. The answer is emphatically yes! The cyan points on the graph show the solutions after the second puzzle, which effectively show that the solution was found.
This finding holds a key to all successful strategies. It's good to explore, and to actually move the protein in significant ways. You may get more points with hard-to-notice shakes and wiggles, but to truly explore better solutions one must venture into the unknown. It also shows that although a solution may not be good right away, a significantly better one may be found after a bit more game play. So, exploration is the key.
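The "lone island" idea can be pictured as a simple filter: keep solutions whose energy is within some window of the best one, but whose shape is far from it. The sketch below is only an illustration under stated assumptions. The `Solution` class, the `find_islands` function, and both thresholds are hypothetical, coordinates are assumed to be pre-superimposed, and this is not how Foldit itself ranks or shares solutions.

```python
import math
from dataclasses import dataclass


@dataclass
class Solution:
    name: str
    energy: float  # Rosetta-style energy, lower is better
    coords: list   # backbone (x, y, z) tuples, assumed pre-superimposed


def rmsd(a, b):
    """RMSD between two equal-length lists of (x, y, z) coordinates."""
    total = sum((p - q) ** 2 for u, v in zip(a, b) for p, q in zip(u, v))
    return math.sqrt(total / len(a))


def find_islands(solutions, energy_window, min_rmsd):
    """Return solutions that score close to the best but look different.

    energy_window: how much worse than the best energy is still acceptable.
    min_rmsd: how far (in RMSD) a solution must be from the best-scoring
    one to count as a different shape. Both are illustrative knobs.
    """
    best = min(solutions, key=lambda s: s.energy)
    return [
        s for s in solutions
        if s is not best
        and s.energy - best.energy <= energy_window
        and rmsd(s.coords, best.coords) >= min_rmsd
    ]
```

Because no native structure is available in a blind puzzle, the sketch measures distance from the best-scoring solution rather than from the native.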
In the near future we will focus on ways to encourage exploration within the game play. For Foldit players, I would suggest paying close attention to solutions that are close in score to the best known solution (and likely somewhat worse), but that are significantly different in shape. Every time one of these solutions is found, it should be carefully explored. Chances are the true native solution may be nearby.
( Posted by zoran | Thu, 07/16/2009 - 19:21 | 5 comments )
Design of inhibitors of proteins from pathogens
This is an exciting time in computational protein design. We now have atomically detailed structures of many of the proteins that promote virulence of human pathogens. In some rare cases, we even have the structures of these proteins bound to inhibitors that reduce or abolish virulence. Such structures provide us with a unique window through which we can see the pathogen's Achilles' heel. If we could graft the core parts of the interaction surface from the inhibitor onto a protein that is easy to manufacture and robust to the vagaries of storage and drug administration, we should have a good chance of combating serious illness and its transmission resulting from these pathogens.
Pandemic influenza of the H5N1 and H1N1 strains, responsible for such scourges as Spanish, avian, and swine flu, presents one of several such well-characterized cases. Key to flu's virulence are a pair of proteins named hemagglutinin and neuraminidase (the H and N in the strain names, respectively). Recently, the structure of hemagglutinin bound to an inhibitory antibody has been revealed. This structure shows in atomic detail how and why this antibody inhibits influenza, whereas most of the antibodies produced by our bodies against hemagglutinin fail to inhibit it. The trouble is that antibodies are prohibitively expensive to manufacture and are fragile, so this antibody cannot be administered as a drug to patients. However, now that hemagglutinin's soft spot has been revealed, we can design other proteins that present the same key surface as the antibody, but on a different, easier-to-handle protein scaffold.
As a first step in any protein design work that we do, we try to recapitulate the key interactions that are known to stabilize the protein complex. Think of this as being a sanity check for us to ensure that if we were 'lucky' and happened to get just the right protein scaffold for inhibiting our target, we would in fact be able to redesign it correctly as an inhibitor. To do this, we strip off the residues that form the key interactions with the target protein. We then go one by one and attempt to redesign them, not knowing what the true identity is. You can now try out this process for yourselves! The new puzzle entitled "Flu Virus Design" will present you with the antibody bound to hemagglutinin, but some of its surface positions will be shaved to alanine. Can you use Rosetta and your intuition to rebuild these sidechains correctly? Let us know by playing!
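One simple way to put a number on this kind of sanity check is to ask what fraction of the stripped positions come back with their original amino acid. The sketch below assumes sequences are given as one-letter strings of equal length and positions are 0-based indices; the function name `sequence_recovery` is hypothetical, and this is an illustration rather than the measure we actually use.

```python
def sequence_recovery(native, designed, positions):
    """Fraction of the given positions where the design matches the native.

    native, designed: one-letter amino acid strings of equal length.
    positions: 0-based indices of the residues that were stripped and
    then redesigned (an assumption of this sketch).
    """
    if len(native) != len(designed):
        raise ValueError("sequences must have the same length")
    if not positions:
        raise ValueError("need at least one redesigned position")

    matches = sum(1 for i in positions if native[i] == designed[i])
    return matches / len(positions)
```

A recovery of 1.0 would mean every redesigned position matches the native identity. Lower values do not necessarily mean the design is bad, only that it differs from what nature chose.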
Finally, we are now using advanced protein-redesign strategies to produce proteins that would present the same key surfaces that interact with hemagglutinin. Once we have a few that we think are acceptable we would love to let you have a go at those as well! We'll keep you posted.
For more reading material on influenza hemagglutinin, visit this Wikipedia page:
Can we design?
This last round of design puzzles is set up to test a specific aspect of FoldIt. It is a test for us and for the players. From prediction puzzles, we know players can find the target fold when a sequence is given, and now we are wondering if players can find the right structure AND the right sequence. The task for the players is to rebuild a beta-strand and choose a new set of amino acids on it in the context of a partially ripped-open core, namely the three positions highlighted in the first figure. The task for us is to figure out what needs to change in FoldIt to allow players to distinguish desirable features.
The second figure below compares the native structure, a highly ranked player solution, and another player solution at ~1000th rank. This puzzle is interesting because we know the native protein can be improved to be more stable. The target is a small, 56-amino-acid protein called protein G, which is widely used in protein biochemistry labs as a model system for studying proteins. Players generally do well and place the strand in the right place, and it is clear that people come up with different solutions to the problem. Now the question is this: the native protein has a small void at the end of the chain, and one of the best player solutions captures that. But is it actually a better structure than the 1000th-ranked solution, which uses a different strategy to push out the void? It is very hard to say one way or the other. From a designer's perspective, the structure with no void is better because, if you look more closely, it also has fewer buried unsatisfied polar atoms. Yet we know the native works, so whatever is closer to it has very good odds.
The third figure below compares the starting puzzle (in dark grey), the native (in magenta), and the two aforementioned solutions (high score in green, 1000th in blue). It shows that the models differ only subtly, with the 1000th-ranked structure being slightly more compact. To really call this, we'll have to test these sequences in the wet lab, and this may happen in the future. The task for us is to understand what's lacking in our ability to differentiate the good designs from the bad, knowing that the score isn't a perfect model of the real world.
( Posted by possu | Fri, 06/19/2009 - 01:52 | 5 comments )
Quest to Native results!
The results from the recent Quest to the Native puzzles have been very interesting. In these challenges we gave you a starting structure generated by Rosetta@home that was far from the native structure, but included a ghost of the native protein to guide your folding.
The figure below shows the final results for "Quest to the Native 1".
On the y-axis is the Foldit score, and on the x-axis is the distance from the native structure (closer to the left is better). The green dots represent the more than 13,000 predictions that Foldit players have generated. The red dot at the bottom right is the starting Foldit puzzle.
There are two notable things about these results. First, it is clear that you have collectively mastered the tools implemented in fold.it so that you can generate a very accurate model starting with a quite inaccurate one. Second, there is an important lesson in the distributions of scores in the figure. Starting from the puzzle (the red dot), many players were able to increase the score without making very large changes in the structure (the score increases, but the distance from the native structure stays roughly the same). A second group of players made much larger changes in the structure, and were able to achieve much higher scores in doing so.
Of course, in real-life puzzles the structure won’t be known already, so there won’t be a guide. But the two points above still hold. First, it is almost certain that you can get to the correct structure from the puzzle starting points with the tools in fold.it, but of course you have to try out a number of possibilities because there is no guide. Second, while you can likely improve your score by searching close to the starting point, this is probably not where the real jackpot is. Getting very high scores and reaching the native structure is likely to require more substantial changes in structure, similar to those in the Quest to the Native puzzles.
So the most important take-home message is to explore as much as possible, and don’t be afraid to stray from the starting puzzle conformation. In all-hands competitions, consider the various possible starting points, not just the highest scoring, and again don’t spend most of your time really close to the starting point.
As in science and many other aspects of life, innovation is the key to success!
I want to thank all of you again for playing fold.it. You are, I believe, showing the way to a completely new and powerful approach to key biomedical research problems and scientific problems more generally. We will soon be writing a paper describing your collective efforts that will announce this new approach to the scientific community.
( Posted by David Baker | Mon, 06/01/2009 - 17:54 | 5 comments )