The Physics of Foldit Part I: Why Is It So Darn Difficult To Come Up With A Scoring System?
Today during the scientist chat, players asked a number of excellent questions about the physics behind Foldit. In particular, several people were curious about score function development. I think one of the big questions for many people is this: why should it be so hard to figure out how to score a protein properly? After all, we know the physics: the equations of quantum mechanics and quantum electrodynamics, which ultimately determine almost all of the behavior of a protein molecule, have been known for half a century or more. We should be able to plug values into the equations and get a physically meaningful number out, right? Why, then, do we have to keep tweaking and improving the scoring?
It turns out that things aren't quite that simple, for a number of reasons. The first is that the actual quantum mechanical equations can be solved exactly only for very simple systems (basically, a single electron bound to a single proton: a hydrogen atom). There are very good approximations that can be applied to get solutions for more complicated systems, like helium or the dihydrogen molecule. Beyond this, there are looser approximations that can give results for complicated molecular systems, though the computational cost of applying them is astronomical. Worse, they scale horribly, meaning that as you add more atoms to the system, the computational cost of the calculation goes up at an ever-increasing rate. Both Rosetta (the automated software that the Baker lab has developed for protein structure prediction and design) and Foldit (the game, which shares much of its core machinery with Rosetta) rely on being able to compute the scoring function quickly, at low computational cost. So we need approximations that can be calculated easily without compromising accuracy too much.
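To make "scale horribly" concrete, here is a small back-of-the-envelope sketch. It assumes the cost of a method grows as N**k, where N is roughly proportional to the number of atoms, and uses the formal exponents commonly quoted in textbooks (about 4 for Hartree-Fock, 5 for MP2, 7 for CCSD(T), and 2 for a simple pairwise classical term with no distance cutoff; exact solutions grow exponentially). Real programs use tricks that change these numbers, so treat the output as an order-of-magnitude illustration, not a benchmark.

```python
# Formal (textbook) scaling exponents: cost ~ N**k, where N grows roughly in
# proportion to the number of atoms. Real codes use tricks that change these
# exponents, so these are order-of-magnitude illustrations only.
SCALING_EXPONENTS = {
    "pairwise classical terms (no cutoff)": 2,
    "Hartree-Fock": 4,
    "MP2": 5,
    "CCSD(T)": 7,
}


def cost_multiplier(exponent: int, size_factor: float) -> float:
    """How many times more expensive a calculation becomes when the system
    is made size_factor times larger, assuming cost ~ N**exponent."""
    return float(size_factor) ** exponent


if __name__ == "__main__":
    # Compare doubling the system with making it ten times larger
    # (e.g. a 10-residue peptide versus a 100-residue protein).
    for method, k in SCALING_EXPONENTS.items():
        doubled = cost_multiplier(k, 2)
        tenfold = cost_multiplier(k, 10)
        print(f"{method:38s} x2 atoms: {doubled:>6,.0f}x   x10 atoms: {tenfold:>12,.0f}x")
```

Under these assumptions, making the system ten times larger costs about 100 times more for a pairwise term but about ten million times more for a method with seventh-power scaling.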
And so we make a number of approximations. Many involve treating proteins as classical (Newtonian) systems rather than as quantum mechanical systems: in a nutshell, we pretend that a protein is a collection of billiard-ball-like atoms with well-defined positions, rather than a blob of probability distributions described by wave equations. This is a pretty good approximation, as long as we make sure that the "billiard balls" attract and repel one another like soft, spongy objects rather than hard-edged ones, and it greatly simplifies the math, turning the problem into one that is relatively easy to calculate.
Beyond this, we use "knowledge-based" scoring terms for some things, rather than "physics-based" ones. For example, while the interaction between any pair of amino acids might depend on very complicated physics, we don't necessarily need a complicated physics-based calculation to figure out whether those two amino acids are likely to interact. We already have that information, in the form of statistics that we can extract from libraries of known protein structures. Knowledge-based terms like the "pair potential", which are based on statistics taken from observation rather than derived from physical principles, simplify the computation enormously (though in recent years, we've been moving towards a more physics-based scoring function and removing some of the knowledge-based terms).
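The two ideas above, "soft" classical atoms and knowledge-based terms, can be sketched in a few lines. This is a generic illustration under stated assumptions, not Foldit's or Rosetta's actual code: the "soft" interaction is the textbook 12-6 Lennard-Jones form (production scoring functions typically use modified versions), contrasted with an infinitely hard billiard-ball contact; the pair potential uses the standard inverse-Boltzmann construction, E = -kT ln(observed/expected). The distances, parameters, and contact counts below are hypothetical.

```python
import math

# Gas constant in kcal/(mol*K); kT is about 0.59 kcal/mol near room temperature.
R_KCAL = 0.0019872


def hard_sphere(r: float, contact: float) -> float:
    """'Billiard ball' contact energy: infinite if the spheres overlap, else zero."""
    return math.inf if r < contact else 0.0


def lennard_jones(r: float, epsilon: float, sigma: float) -> float:
    """Textbook 12-6 Lennard-Jones energy. It is finite and smooth for r > 0:
    steeply (but not infinitely) repulsive at short range, weakly attractive
    near contact, with a minimum of -epsilon at r = 2**(1/6) * sigma."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def pair_potential(observed: float, expected: float, temperature: float = 298.15) -> float:
    """Inverse-Boltzmann (knowledge-based) energy for one residue pair:
    E = -kT * ln(observed / expected). Pairs seen more often than expected
    in known structures get a favourable (negative) energy."""
    if observed <= 0 or expected <= 0:
        raise ValueError("counts must be positive")
    return -R_KCAL * temperature * math.log(observed / expected)


if __name__ == "__main__":
    sigma, epsilon = 3.4, 0.1  # hypothetical parameters (angstroms, kcal/mol)
    for r in (3.0, 3.4, 3.8, 5.0):
        print(f"r={r:.1f}  hard={hard_sphere(r, sigma)}  LJ={lennard_jones(r, epsilon, sigma):.3f}")

    # Hypothetical contact counts: (observed in a structure database, expected at random).
    counts = {("LYS", "GLU"): (120, 80), ("LYS", "LYS"): (40, 80)}
    for pair, (obs, exp) in counts.items():
        print(pair, f"{pair_potential(obs, exp):+.3f} kcal/mol")
```

The point of the comparison is that the hard sphere jumps from zero to infinity at contact, whereas the soft potential changes smoothly, which is what lets a minimizer nudge slightly overlapping atoms apart. The pair potential needs only counts, not any physics.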
One of the biggest approximations that we need to make is to treat the protein like an object floating in a vacuum. In reality, the water surrounding the protein is incredibly important: so important, in fact, that it provides the main driving force for a protein molecule to fold (the so-called "hydrophobic effect"). We make an enormous approximation by replacing explicitly modelled water molecules with a "solvation term" that tries to approximate the effect of those thousands of water molecules.
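One simple way to build such a solvation term is to weight each atom's solvent-accessible surface area by a per-element parameter. The sketch below does that, computing areas with the Shrake-Rupley method (test points on each atom's probe-expanded sphere count as exposed unless another atom covers them). This is not the solvation model Rosetta or Foldit actually uses; it is a minimal illustration, and the solvation parameters are hypothetical values chosen only so that exposing nonpolar carbon is penalized and exposing polar oxygen is favoured.

```python
import math

PROBE_RADIUS = 1.4  # radius of a water-sized probe sphere, in angstroms

# Hypothetical, illustrative solvation parameters in kcal/(mol*A^2):
# positive = exposing this atom to water is unfavourable (nonpolar carbon),
# negative = exposure is favourable (polar oxygen). Not Rosetta's values.
SOLVATION_PARAMS = {"C": 0.016, "O": -0.006}


def unit_sphere_points(n: int):
    """Roughly evenly spaced points on a unit sphere (golden-spiral method)."""
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))
    points = []
    for i in range(n):
        z = 1.0 - 2.0 * (i + 0.5) / n
        ring = math.sqrt(1.0 - z * z)
        theta = golden_angle * i
        points.append((ring * math.cos(theta), ring * math.sin(theta), z))
    return points


def accessible_areas(atoms, n_points: int = 960):
    """Shrake-Rupley solvent-accessible surface area of each atom.

    atoms: list of (element, x, y, z, radius). A test point on an atom's
    probe-expanded sphere counts as exposed if no other expanded sphere covers it.
    """
    unit = unit_sphere_points(n_points)
    areas = []
    for i, (_, xi, yi, zi, ri) in enumerate(atoms):
        big_r = ri + PROBE_RADIUS
        exposed = 0
        for ux, uy, uz in unit:
            px, py, pz = xi + big_r * ux, yi + big_r * uy, zi + big_r * uz
            covered = any(
                (px - xj) ** 2 + (py - yj) ** 2 + (pz - zj) ** 2 < (rj + PROBE_RADIUS) ** 2
                for j, (_, xj, yj, zj, rj) in enumerate(atoms)
                if j != i
            )
            if not covered:
                exposed += 1
        areas.append(4.0 * math.pi * big_r ** 2 * exposed / n_points)
    return areas


def solvation_energy(atoms, n_points: int = 960) -> float:
    """Implicit-solvent estimate: sum over atoms of parameter(element) * area."""
    areas = accessible_areas(atoms, n_points)
    return sum(SOLVATION_PARAMS[atom[0]] * area for atom, area in zip(atoms, areas))


if __name__ == "__main__":
    apart = [("C", 0.0, 0.0, 0.0, 1.7), ("C", 20.0, 0.0, 0.0, 1.7)]
    touching = [("C", 0.0, 0.0, 0.0, 1.7), ("C", 3.4, 0.0, 0.0, 1.7)]
    print(f"two separated carbons: {solvation_energy(apart):.3f} kcal/mol")
    print(f"two touching carbons:  {solvation_energy(touching):.3f} kcal/mol")
```

Even this toy model captures the qualitative hydrophobic effect: packing two nonpolar carbons together hides part of their surface from the (implicit) water and lowers the energy, with no water molecules ever being simulated.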
In the next blog post in this series, I'll talk more about water, and why it's so important -- and about why its effect is so difficult to model computationally.
--v_mulligan (Vikram K. Mulligan) (Posted by v_mulligan | Wed, 01/22/2014 - 21:10)