The key secret to finding best structures: Exploration
This is a story about Puzzle 138 (Rosetta Decoy 2). It is fascinating because it shows where we as developers should be focusing our design energy, and it also suggests that a change in game-playing strategy may benefit all results. The short version: pay close attention to solutions that have a relatively good score (but perhaps not the best) yet appear noticeably different in structure from other solutions.
To demonstrate the point, here is the plot of the initial solutions of puzzle 138. The vertical axis is the foldit score (expressed as a Rosetta energy function, where lower is better). The horizontal axis is the root mean square distance of backbone atoms from the native structure, so the smaller it is, the closer we are to the solution in the sense of per-atom distance. The blue dot is the native structure from x-ray crystallography, so it could still have some slight errors, but it's the closest we know how to get to the real thing. The graph shows that there is a large group of solutions that are slight modifications of the starting solution; that is, the atoms don't really move a lot. Still, we see that to get to the native, lots of movement needs to happen (towards the left on the graph). In fact, we see that foldit solutions are hitting a virtual wall, where the score improves (the energy drops) without approaching the solution. You will also notice that a lone explorer has a solution that, according to the foldit score, is not the best, but on the graph appears much closer to the solution.
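For readers who want to see what the horizontal axis measures, here is a minimal sketch of a root mean square distance calculation. It assumes the two structures are already superimposed and that their backbone atoms are listed in the same order; the function name `rmsd` and the toy coordinates are illustrative only, not part of Foldit or Rosetta.

```python
import math

def rmsd(coords_a, coords_b):
    """Root mean square distance between two lists of (x, y, z) points.

    Assumes both structures are already superimposed and that the atoms
    are listed in the same order (no alignment is performed here).
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and the same length")
    total = 0.0
    for (ax, ay, az), (bx, by, bz) in zip(coords_a, coords_b):
        # Squared distance between the matching pair of atoms.
        total += (ax - bx) ** 2 + (ay - by) ** 2 + (az - bz) ** 2
    return math.sqrt(total / len(coords_a))

# Toy example with two atoms: the second atom is displaced by 2 units.
native = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
model = [(0.0, 0.0, 0.0), (1.0, 0.0, 2.0)]
distance = rmsd(native, model)
```

A value of zero means the two sets of atoms coincide; larger values mean the atoms sit further apart on average.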
The question in our minds was this: if we started from this lone island discovered by foldit players, could the players do much better? So we posed another puzzle with the starting point shown in black in the second image. The answer is emphatically yes! The cyan points on the graph show the solutions after the second puzzle, which effectively show that the solution was found.
This finding holds a key to all successful strategies. It's good to explore, and to actually move the protein in significant ways. You may get more points with hard-to-notice shakes and wiggles, but to truly explore better solutions one must venture into the unknown. It also shows that although a solution may not look good right away, a significantly better solution may be found after a bit more game play. So, exploration is the key.
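One way to make the idea of a "lone island" concrete is to filter a pool of solutions for those that score nearly as well as the leader but are structurally far from it. The sketch below is only an illustration: the `Solution` record, the `find_islands` function, and the two thresholds are hypothetical, and the distance is measured from the current best-scoring solution because in a real puzzle the native is not known.

```python
from typing import NamedTuple

class Solution(NamedTuple):
    name: str
    energy: float        # Rosetta-style energy: lower is better
    rmsd_to_best: float  # structural distance from the best-scoring solution

def find_islands(solutions, energy_window, min_rmsd):
    """Return solutions that score close to the best but look very different.

    energy_window: how much worse than the best energy a solution may be.
    min_rmsd: how far from the best-scoring structure it must be.
    Both thresholds are illustrative and would need tuning per puzzle.
    """
    if not solutions:
        return []
    best_energy = min(s.energy for s in solutions)
    islands = [
        s for s in solutions
        if s.energy <= best_energy + energy_window and s.rmsd_to_best >= min_rmsd
    ]
    # Most promising (lowest energy) islands first.
    return sorted(islands, key=lambda s: s.energy)

# Made-up pool: A is the leader, B is a small tweak of A,
# C is a decent-scoring outlier, D is different but scores poorly.
pool = [
    Solution("A", -120.0, 0.0),
    Solution("B", -119.0, 0.4),
    Solution("C", -112.0, 4.5),
    Solution("D", -80.0, 5.0),
]
islands = find_islands(pool, energy_window=10.0, min_rmsd=3.0)
```

With this made-up pool, only solution C passes both filters, which makes it the kind of candidate worth using as a new starting point.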
In the near future we will focus on ways to encourage exploration within the game play. For the foldit players, I would suggest paying close attention to solutions that are relatively close in score to the best known solution (and likely somewhat worse), but that are significantly different in shape. Every time one of these solutions is found, it should be carefully explored. Chances are the true native solution is nearby. (Posted by zoran | Thu, 07/16/2009 - 19:21 | 5 comments)
Design of inhibitors of proteins from pathogens
This is an exciting time in computational protein design. We now have atomically detailed structures of many of the proteins that promote virulence of human pathogens. In some rare cases, we even have the structures of these proteins bound to inhibitors that reduce or abolish virulence. Such structures provide us with a unique window through which we can see the pathogen's Achilles' heel. If we could graft the core parts of the interaction surface from the inhibitor onto a protein that is easy to manufacture and robust to the vagaries of storage and drug administration, we should have a good chance of combating serious illness and its transmission resulting from these pathogens.
Pandemic influenza of the H5N1 and H1N1 strains, responsible for such scourges as Spanish, avian, and swine flu, presents one of several such well-characterized cases. Key to flu's virulence are a pair of proteins named hemagglutinin and neuraminidase (the H and N in the strain names, respectively). Recently, the structure of hemagglutinin has been revealed bound to an inhibitory antibody. This structure shows in atomic detail how and why this antibody inhibits influenza, whereas most of the antibodies produced by our bodies against hemagglutinin fail to inhibit it. The trouble is that antibodies are prohibitively expensive to manufacture and are fragile, so this antibody cannot be administered as a drug to patients. However, now that hemagglutinin's soft spot has been revealed, we can design other proteins that present the same key surface as the antibody, but on a different, easier-to-handle protein scaffold.
As a first step in any protein design work that we do, we try to recapitulate the key interactions that are known to stabilize the protein complex. Think of this as being a sanity check for us to ensure that if we were 'lucky' and happened to get just the right protein scaffold for inhibiting our target, we would in fact be able to redesign it correctly as an inhibitor. To do this, we strip off the residues that form the key interactions with the target protein. We then go one by one and attempt to redesign them, not knowing what the true identity is. You can now try out this process for yourselves! The new puzzle entitled "Flu Virus Design" will present you with the antibody bound to hemagglutinin, but some of its surface positions will be shaved to alanine. Can you use Rosetta and your intuition to rebuild these sidechains correctly? Let us know by playing!
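The recapitulation test described above can be summarized as "shave, redesign, compare". The sketch below shows only the bookkeeping around that idea, assuming sequences are written as one-letter amino acid strings and positions are 0-based. The function names and the example sequence are made up for illustration, and the redesign step itself (done by Rosetta or by players) is not modeled; it is represented by a hand-written `designed` string.

```python
def shave_to_alanine(sequence, positions):
    """Return a copy of the sequence with the given 0-based positions set to 'A'."""
    residues = list(sequence)
    for i in positions:
        residues[i] = "A"
    return "".join(residues)

def recovery(native, designed, positions):
    """Fraction of shaved positions where the design matches the native residue.

    This checks amino acid identity only; it says nothing about whether the
    sidechain conformation was rebuilt correctly.
    """
    if not positions:
        raise ValueError("at least one position is required")
    matches = sum(1 for i in positions if native[i] == designed[i])
    return matches / len(positions)

# Hypothetical 10-residue stretch with three interface positions.
native = "MKTWYELSDR"
interface = [3, 4, 7]
start = shave_to_alanine(native, interface)   # what the puzzle would begin from
designed = "MKTWFELSDR"                       # stand-in for one redesign result
score = recovery(native, designed, interface)
```

In this toy case the redesign recovers two of the three shaved residues. A high recovery rate on a known complex is the sanity check: it suggests the design procedure would recognize the right answer if it came across it.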
Finally, we are now using advanced protein-redesign strategies to produce proteins that would present the same key surfaces that interact with hemagglutinin. Once we have a few that we think are acceptable we would love to let you have a go at those as well! We'll keep you posted.
For more reading material on influenza hemagglutinin, visit this Wikipedia page:
Can we design?
This last round of design puzzles is set up to test a specific aspect of FoldIt. It is a test for us and for the players. From prediction puzzles, we know players can find the target fold when a sequence is given, and now we are wondering if players can find the right structure AND the right sequence. The task for the players is to rebuild a beta-strand and choose a new set of amino acids on it in the context of a partially ripped-open core, namely the three positions highlighted in the first figure. And the task for us is to figure out what needs to change in FoldIt to allow players to distinguish desirable features.
In the second figure below are comparisons between the native structure, a highly ranked player solution, and another player solution at ~1000th rank. This puzzle is interesting because we know the native protein can be improved to be more stable. The target is a small 56-amino-acid protein called protein G, which is widely used in protein biochemistry labs as a model system for studying proteins. Players generally do well and put the strand in the right place, and it is clear that people come up with different solutions to the problem. Now the question is this: the native protein has a small void at the end of the chain, and one of the best player solutions captures that. But is it actually a better structure than the 1000th-ranked solution, which uses a different strategy to push out the void? It is very hard to say one way or the other. From a designer's perspective, the structure with no void is better because, if you look more closely, it also has fewer buried unsatisfied polar atoms. Yet we know the native works, so whatever is closer to it has very good odds.
In the third figure below, I am showing the comparison between the starting puzzle (in dark grey), the native (in magenta), and the two aforementioned solutions (high score in green, 1000th in blue). It shows that the models differ only subtly, with the 1000th-ranked structure being slightly more compact. To really call this, we'll have to test these sequences in the wet lab, and this may happen in the future. The task for us is to understand what's lacking in our ability to differentiate the good designs from the bad -- knowing that the score isn't a perfect model of the real world. (Posted by possu | Fri, 06/19/2009 - 01:52 | 5 comments)
Quest to Native results!
The results from the recent Quest to the Native puzzles have been very interesting. In these challenges we gave you a starting structure generated by Rosetta@home that was far from the native structure, but included a ghost of the native protein to guide your folding.
The figure below shows the final results for "Quest to the Native 1".
On the y-axis is the Foldit score and on the x-axis is the distance from the native structure (closer to the left is better). The green dots represent the more than 13,000 predictions that Foldit players have generated. The red dot at the bottom right is the starting Foldit puzzle.
There are two notable things about these results. First, it is clear that you have collectively mastered the tools implemented in fold.it so that you can generate a very accurate model starting with a quite inaccurate one. Second, there is an important lesson in the distributions of scores in the figure. Starting from the puzzle (the red dot), many players were able to increase the score without making very large changes in the structure (the score increases, but the distance from the native structure stays roughly the same). A second group of players made much larger changes in the structure, and were able to achieve much higher scores in doing so.
Of course, in real-life puzzles the structure won’t be known already, so there won’t be a guide. But the two points above still hold. First, it is almost certain that you can get to the correct structure from the puzzle starting points with the tools in fold.it, but of course you have to try out a number of possibilities because there is no guide. Second, while you can likely improve your score by searching close to the starting point, this is probably not where the real jackpot is: getting very high scores and reaching the native structure is likely to require more substantial changes in structure, similar to those in the Quest to the Native puzzles.
So the most important take-home message is to explore as much as possible, and don’t be afraid to stray from the starting puzzle conformation. In all-hands competitions, consider the various possible starting points, not just the highest scoring, and again don’t spend most of your time really close to the starting point.
Like in science and many other aspects of life, innovation is the key to success!
I want to thank all of you again for playing fold.it. You are, I believe, showing the way to a completely new and powerful approach to key biomedical research problems and scientific problems more generally. We will soon be writing a paper describing your collective efforts that will announce this new approach to the scientific community. (Posted by David Baker | Mon, 06/01/2009 - 17:54 | 5 comments)
HIV design challenge
One of the biggest challenges facing protein design today is to model protein backbones. Unlike prediction, where a sequence is given, a design puzzle has more hidden traps since amino acids can change, allowing multiple (potentially false) answers to nearly identical problems. We are not at a stage where we know definitively how to choose the solutions that will work when produced and tested in the lab. Historically the complexity of a design problem is reduced by holding the backbones fixed. With the FoldIt game, we are asking users to sample flexible backbone designs with score-guided intuition to tackle this problem: how do we build one protein segment at a time under the constraint of a native scaffolding while maintaining the "foldability" of a sequence? Similar to prediction puzzles, we set up scores based on known metrics for assessing the quality of these models, and ultimately try to correlate these measures with experimental data to understand the underlying design principles, while iteratively improving them at the same time.
Using GP120, the HIV protein responsible for entry into host cells, as a model system is significant in that 1) it remains a viable candidate for an AIDS vaccine, and 2) it has an unusual topology. Having the tools to understand and engineer this molecule would contribute greatly to both AIDS research and protein engineering. Among several mechanisms used, the molecule uses a number of variable loops to distract the immune system from forging an effective response. In a nutshell, GP120s are located on the very outside of a viral particle, the envelope, and when the immune system sees them, a wave of antibodies is produced to try to neutralize the pathogen. The problem, however, is that although we make antibodies against this molecule, they are mostly directed at loops that are not related to the central machinery (which is hidden) responsible for invading host cells. To focus the response, our strategy in designing the vaccine is to expose the elements directly responsible (marked with CD4bs in the attached figure) by creating virus-free proteins that resemble GP120 but lack its cloaking machinery (by editing structural regions marked with A, B, C, D in the figure, for example). In other words, we are interested in trimming away these loops while preserving the area on the surface vulnerable to neutralizing antibodies. We are hopeful because antibodies that can neutralize a wide variety of HIV strains do exist. The idea is to create a "mold" based on the known broadly neutralizing antibodies and present it to the immune system for the production of similar antibodies. The term "reverse vaccinology" has been coined to describe this strategy -- we know some antibodies work; now we try to produce copies of them by guiding the human body to make them.
As described previously, maintaining a protein structure while doing extensive remodeling work remains a challenge. However, one can intuitively imagine "protein sculpting" being useful in many applications of protein design. Besides the AIDS challenge presented above, we can apply the same shaping strategy to improve enzyme active sites to make them more active, to alter cellular signaling by modulating the strength of proteins interacting with each other, and to create protein chimeras by shaping different parts to fit together for new functions, just to name a few. We are starting to address these problems with FoldIt. Designs that are judged plausible will be systematically studied by actually making them in the lab and testing for their (hopefully improved) functions. The goal is to learn enough about proteins through this process to fundamentally improve our understanding of protein biochemistry, and potentially to create a vaccine or an enzyme along the way. (Posted by possu | Thu, 05/21/2009 - 18:20 | 1 comment)