Selection of Foldit Designs for Further Testing

Started by Susume

Susume Lv 1

This is a follow-up question to the March Science Chat. In the chat, bkoep outlined how a Foldit solution is selected for further testing in Rosetta and the wet lab:

'There are two different pathways for a Foldit design to reach the wet lab. The first is that it is high-scoring. The cream of the crop is automatically submitted to Rosetta@home, mainly as a way to benchmark our progress from week to week. We don't even really inspect these designs. The second pathway involves manual inspection. Here we're looking for designs that look like promising folders to us. We start with the shared solutions. Then we move on to the bulk solutions, which are clustered (to remove identical or near-identical solutions), and ranked by score. We go down the list of clustered solutions (which naturally includes the top-ranking solutions) and inspect anything that looks "plausible"—usually around 100 models.' –bkoep
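The bulk-solution step bkoep describes (cluster to remove near-duplicates, rank by score, walk down the list) can be sketched roughly as below. This is a minimal illustration, not the Foldit team's code: the `Design` record, the `identity` and `cluster_and_rank` helpers, the 0.9 threshold, and the use of sequence identity as the similarity measure are all assumptions. Real clustering would more likely compare 3D structures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Design:
    """Hypothetical stand-in for one submitted Foldit solution."""
    name: str
    sequence: str
    score: float  # Foldit points: higher is better


def identity(a: str, b: str) -> float:
    """Fraction of positions with the same residue (stand-in for structural similarity)."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cluster_and_rank(designs, threshold=0.9, limit=100):
    """Greedy clustering: walk designs from best to worst score and keep one only
    if it is less than `threshold` identical to every representative kept so far."""
    representatives = []
    for d in sorted(designs, key=lambda d: d.score, reverse=True):
        if all(identity(d.sequence, r.sequence) < threshold for r in representatives):
            representatives.append(d)
        if len(representatives) == limit:
            break  # roughly the ~100 models that get inspected by eye
    return representatives


designs = [
    Design("a", "AAAAAKKKKK", 9000.0),
    Design("b", "AAAAAKKKKE", 8990.0),  # 90% identical to "a": a near-duplicate
    Design("c", "LLLLLEEEEE", 8500.0),
]
print([d.name for d in cluster_and_rank(designs)])  # ['a', 'c']
```

Walking the list best-first means each cluster is represented by its highest-scoring member, which is why the clustered list "naturally includes the top-ranking solutions."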

My question is, do you base the manual selection on objective measures (such as loop quality, realistic sidechain positions, etc.), or on expert judgement by the scientists, or some of both?

Also, can you give us a ballpark estimate of how many of the ~100 manually inspected models you typically send to Rosetta? Do you feel you are testing enough lower-scoring designs to form a good estimate of Foldit's false-negative rate (how often the score function misses a good design)?

Susume Lv 1

Someone pointed out that I seem to be saying the scientists are not objective; I don't mean that at all! I'm just curious how much of the selection process has been "mathematized" versus how much relies on expert insights that have not yet been turned into measurable quantities.

Bruno Kestemont Lv 1

I can have a visually perfect "3 sheets, 2 helices" design that scores much lower than the top solution.
Besides the score, I imagine that well-balanced subscores and a nearly perfect filter bonus are relevant criteria.
Good hydrogen bonds between sheets and sidechains (H-bond networks), with some orange (hydrophobic) residues inside and plenty of blue (hydrophilic) ones outside, might be visual criteria as well, I suppose.

I wonder which view settings the scientists use to visually select promising designs.

bkoep Staff Lv 1

The manual inspection is qualitative and definitely somewhat subjective, although I think we make an effort to focus on explicit criteria. In the past, we have had to keep an eye out for GLY in secondary structure, long disordered loops, and egregious backbone geometry. At this point, the current array of filters and score function amendments keeps most of these in check.

These days, a design is typically passed over because it suffers from one of the following flaws: significantly deformed secondary structure (e.g. bent helices, sheets with poor twist); any buried polar atoms that do not make hydrogen bonds; any secondary structure element (helix or strand) made up entirely of polar residues. Of the ~100 models that undergo visual inspection, usually 10-20 promising-looking designs are accepted and queued for Rosetta@home analysis.
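As a rough sketch, the three rejection criteria could be written as a checklist like the one below. This is hypothetical: bkoep describes a visual inspection, not a script. The `Model` fields, the `POLAR` residue set, and the `segments`/`flaws` helpers are assumptions; buried polar atoms without hydrogen bonds are taken as a count computed elsewhere, and deformed secondary structure is left as a reviewer's yes/no call because it is hard to reduce to a single number.

```python
from dataclasses import dataclass

# Assumption: a simple polar-residue set (one-letter codes). Foldit's own
# hydrophilic/hydrophobic classification may differ.
POLAR = set("DEKRHNQST")


@dataclass
class Model:
    """Hypothetical per-model data a reviewer might have on hand."""
    sequence: str             # one-letter amino-acid codes
    secstruct: str            # per residue: 'H' helix, 'E' strand, 'L' loop
    unsat_buried_polars: int  # buried polar atoms with no hydrogen bond (precomputed)
    deformed_ss: bool         # reviewer's call: bent helices, poorly twisted sheets


def segments(secstruct):
    """Return half-open (start, end) ranges for each contiguous helix or strand run."""
    runs = []
    start = 0
    for i in range(1, len(secstruct) + 1):
        if i == len(secstruct) or secstruct[i] != secstruct[start]:
            if secstruct[start] in "HE":
                runs.append((start, i))
            start = i
    return runs


def flaws(m):
    """List the reasons, if any, this model would be passed over."""
    reasons = []
    if m.deformed_ss:
        reasons.append("deformed secondary structure")
    if m.unsat_buried_polars > 0:
        reasons.append("buried polar atoms without hydrogen bonds")
    for start, end in segments(m.secstruct):
        if all(res in POLAR for res in m.sequence[start:end]):
            # Report 1-based residue numbers, as a viewer would show them.
            reasons.append(f"all-polar {m.secstruct[start]} segment at residues {start + 1}-{end}")
    return reasons


model = Model("MEEKKRGSLAVIL", "LHHHHHLLEEEEE", unsat_buried_polars=0, deformed_ss=False)
print(flaws(model))  # ['all-polar H segment at residues 2-6']
```

In this sketch, only the deformed-secondary-structure check stays a judgment call; the other two are mechanical once their inputs are available. Models with an empty `flaws()` list would be the candidates for the 10-20 that go on to Rosetta@home.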

We haven't tried to quantify any kind of false negative rate in the Foldit score function. While that might be an interesting statistic to note, I don't think it would be a proper measure of Foldit's usefulness. We don't expect the Foldit score function to necessarily be a good discriminator in all of protein design space (i.e. "anything above this score threshold will fold; anything below will not"). We've been tuning the score function almost exclusively to reject false positives; so long as the top-scoring solutions are good, we haven't worried too much about the things that aren't making the cut.
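For reference, the standard definitions behind this exchange are below, treating "passes the score cutoff" as a prediction that a design will fold. Here TP counts designs that pass and fold, FP those that pass but fail in the lab, and FN those that miss the cutoff but would have folded. Estimating the false-negative rate needs lab results for designs below the cutoff, whereas tuning the score function to reject false positives targets precision among the top-scoring designs.

```latex
\mathrm{FNR} = \frac{\mathrm{FN}}{\mathrm{FN} + \mathrm{TP}},
\qquad
\mathrm{precision} = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FP}}
```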

Of course, a corollary is that Foldit confines players to a tiny slice of protein design space, but that's where we have to start. Once we're confident in having mastered "idealized" protein design, then we can start to expand into more complex design space. But a generalized score function for protein design is something of a holy grail for the field, and I don't think that we can reasonably expect to stumble upon it anytime soon.

Susume Lv 1

This is very helpful; thanks! I guess anyone who finds the above flaws in their protein can try fixing them by hand, then bring the score back up as much as they can (maybe in another track) and share the result with the scientists. Buried polar atoms (you can see polar atoms by using the score/hydro+CPK color scheme and showing sidechains) and all-polar (all-hydrophilic) secondary structure elements can both creep into a design just from running mutate scripts on it.