7 replies
Tlaloc
Joined: 08/04/2008
Groups: Mojo Risin'

One thing that has become apparent in CASP9 is that the top-ranking Foldit score does not necessarily belong to the protein structure closest to the native structure. As measured by GDT (global distance test), some of the lower-scoring proteins have higher GDTs. How correlated are score and GDT? Is it .98 or is it .15? If the correlation is low, are we really working on the right problem here?

In Foldit, we are trying to maximize the Foldit score, which is to say minimize the energy score. If, however, a lower energy score is not indicative of being closer to the native structure, is anything we are coming up with really of use?

In CASP9, the Foldit team is submitting the top 5 scoring proteins on each puzzle. Could it be that the 10th-highest-scoring protein has the highest GDT? Or the 50th?

Educate me.
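One way to put a number on the question above is to compute both a Pearson (linear) and a Spearman (rank) correlation between the scores and GDTs of a set of models, and to check where the best-GDT model falls in the score ranking. The sketch below does that on hypothetical, made-up numbers for six models; the function names are illustrative and nothing here reflects real CASP9 data.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def ranks(values):
    """1-based ranks (1 = smallest value); tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result

def spearman(xs, ys):
    """Spearman rank correlation: the Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))

def score_rank_of_best_gdt(scores, gdts):
    """Position (1 = highest Foldit score) of the best-GDT model in the score ranking."""
    best = max(range(len(gdts)), key=lambda i: gdts[i])
    by_score = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return by_score.index(best) + 1

# Hypothetical, made-up numbers for six models of one puzzle (NOT real CASP9 data).
scores = [9120, 9105, 9080, 9050, 9010, 8990]   # Foldit score: higher is better
gdts = [41.2, 38.5, 44.0, 36.1, 47.3, 35.0]     # GDT vs. the native: higher is better

print(f"Pearson:  {pearson(scores, gdts):.2f}")
print(f"Spearman: {spearman(scores, gdts):.2f}")
print("Score rank of best-GDT model:", score_rank_of_best_gdt(scores, gdts))
```

With these invented numbers the model closest to the native is only 5th by score, which is exactly the situation described above. The rank correlation is included because "does a better score mean a better model?" is a question about ordering rather than about a straight-line relationship.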

Madde
Joined: 05/29/2008
Groups: Void Crushers

The correlation of energy level/score and GDT is not like the red line in the picture but like the green graph. That's why it's possible that a lower scoring solution can be nearer to the native.

And FoldCentral doesn't just submit the top 5 scoring solutions. I think they look for clusters of similar solutions, pick the highest-scoring solution in each cluster, compare these to each other, and submit the 4 models that look most promising plus the overall top-scoring solution.

Correct me if I'm wrong.
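The cluster-then-pick procedure described above can be sketched roughly as follows. This is an illustration under several assumptions, not FoldCentral's actual method: models are treated as already superimposed lists of (x, y, z) coordinates, similarity is plain coordinate RMSD with a hypothetical `threshold`, clustering is a simple greedy pass, and the human "which look most promising" comparison is replaced by ranking the cluster representatives by score.

```python
import math

def rmsd(a, b):
    """Coordinate RMSD between two models given as lists of (x, y, z) tuples.
    Assumes the models are already superimposed (no fitting is done here)."""
    sq = sum((x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2
             for (x1, y1, z1), (x2, y2, z2) in zip(a, b))
    return math.sqrt(sq / len(a))

def pick_diverse(models, threshold, n=5):
    """models: list of (name, score, coords); a higher Foldit score is better.

    Walk the models from best score to worst. A model within `threshold` RMSD of an
    already-chosen representative counts as a member of that cluster and is skipped;
    otherwise it becomes a new representative. Because of the score ordering, each
    representative is the top scorer of its cluster, and the first one is the
    overall top scorer."""
    reps = []
    for name, score, coords in sorted(models, key=lambda m: m[1], reverse=True):
        if all(rmsd(coords, rep_coords) > threshold for _, _, rep_coords in reps):
            reps.append((name, score, coords))
    return [name for name, _, _ in reps[:n]]

# Hypothetical toy models with two "atoms" each.
models = [
    ("A", 100, [(0, 0, 0), (1, 0, 0)]),
    ("B", 99, [(0, 0, 0.1), (1, 0, 0.1)]),   # near-duplicate of A
    ("C", 95, [(5, 0, 0), (6, 0, 0)]),
    ("D", 90, [(0, 5, 0), (1, 5, 0)]),
]
print(pick_diverse(models, threshold=1.0))  # -> ['A', 'C', 'D']
```

A plain "top N by score" would send both A and its near-duplicate B; the clustering step spends those slots on structurally different models instead, which is the point of submitting a diverse set.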

beta_helix
Joined: 05/09/2008
Groups: None
great question!

Tlaloc, this is one of the big problems in our field and is one of the main reasons we are allowed to submit 5 models to CASP and not just our top-scoring prediction.

Take this case from CASP8, for example:

The blue line is an amazing prediction, far better than that of any other group in the world, but it was that group's model 2 prediction; their model 1 was not as close to the native (and presumably did not have as good a score).

There is even a prediction category at CASP specifically dedicated to this problem:

"Quality assessment of models in general (without knowing native structures) and the reliability of predicting certain residues in particular (QA)."

and it is a very difficult category, because if we had a perfect metric to distinguish incorrect models from less-incorrect models, we might have solved the folding problem already!

The reason we believe the Rosetta energy function is useful (which Madde's plot demonstrates perfectly) is that the Rosetta energy of models very close to the native is generally much lower than that of models far from the native.
I posted an example of this in the blog a while back: http://fold.it/portal/node/985116

If you want an even more in-depth explanation of why you can be very close to the native (but not score as well), you can check out the YouTube clip that Madde found of the talk I gave at Gnomedex:

You can skip to around the 10:45 mark, where I talk about "why is protein folding so hard".

Later in that talk I even show a case where the 9th-ranked Foldit player (I think it was vertex) had the prediction closest to the native for a particular puzzle, so you are quite right to worry whether the 10th- or 50th-scoring one might be closer.

This is why we are doing exactly what Madde mentioned: not just submitting the top 5 scoring Foldit predictions for each CASP target, but trying to submit a diverse set of models along with the top-scoring Foldit prediction.

Once CASP9 wraps up (the last target was released today by the organizers), we will post all the CASP9 natives that have been solved and released to the PDB as "Quest to the Native" puzzles, so that you can hopefully see that if you had been able to reach the native topology, you would have gotten a higher Foldit score.

Tlaloc, I hope this helps and that you feel like all your work is really of use, because we believe that it is!

Madde
Joined: 05/29/2008
Groups: Void Crushers

"[...] the Rosetta energy of models that are very close to the native is generally much lower than models that are far from the native."

I'm wondering if this is also the case for a single chain of a polymer.

Look at the native (green) structure of target T0527 (which is a polymer - I deleted the second protein chain) in the pictures here:
I think that not a single Foldit player with a sense of aesthetics would allow this helix to stay that exposed.

Joined: 09/18/2009
Groups: SETI.Germany

Right, Madde.
The first thing I'd do is move this helix antenna further inside.

Joined: 06/17/2010

Native model is broken! For sure! :]

kbob
Joined: 09/03/2010
Groups: None

Look at a typical protein energy landscape (pic A).


There's a global energy minimum and various local minima.
The local minimum closest to the global one isn't necessarily the deepest of all the local minima.

Look at the "palm tree" disconnectivity graph (pic B).
The path connecting local minima can lead you the wrong way: far away from the global minimum, yet deep on the energy scale.
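Both points can be seen in a tiny example. The sketch below is a made-up 1-D stand-in for an energy landscape (not a real protein landscape and not a disconnectivity graph), searched with a simple steepest-descent walk on a grid; the energies and function names are hypothetical.

```python
# Toy 1-D "energy landscape": position i along some coordinate, ENERGIES[i] its energy.
# The numbers are invented purely for illustration.
ENERGIES = [9, 4, 6, 2, 7, -3, 8, 1, 5, 0, 10]

def local_minima(energies):
    """Positions whose energy is strictly lower than every neighbour that exists."""
    inf = float("inf")
    result = []
    for i, e in enumerate(energies):
        left = energies[i - 1] if i > 0 else inf
        right = energies[i + 1] if i + 1 < len(energies) else inf
        if e < left and e < right:
            result.append(i)
    return result

def steepest_descent(energies, start):
    """Keep stepping to the lower of the two neighbours until neither is lower."""
    i = start
    while True:
        neighbours = [j for j in (i - 1, i + 1) if 0 <= j < len(energies)]
        best = min(neighbours, key=lambda j: energies[j])
        if energies[best] >= energies[i]:
            return i
        i = best

minima = local_minima(ENERGIES)                               # [1, 3, 5, 7, 9]
glob = min(range(len(ENERGIES)), key=lambda i: ENERGIES[i])    # 5 (E = -3)
others = [m for m in minima if m != glob]
deepest = min(others, key=lambda m: ENERGIES[m])               # 9 (E = 0), 4 steps from the global minimum
closest = min(others, key=lambda m: abs(m - glob))             # 3 (E = 2), only 2 steps away
print(minima, glob, deepest, closest)
print("descent from 8 ends at", steepest_descent(ENERGIES, 8))  # 9, not the global minimum
```

The deepest non-global minimum (position 9) is farther from the global minimum than the shallower one at position 3, and a walk starting at position 8 slides into that deep but distant basin, much like a Foldit model that scores well while sitting in the wrong fold.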

admin
Joined: 11/10/2007
Groups: vi users

Firas, is it possible that this sort of thing is simply due to the crystallisation process distorting structures near the termini (assuming the "native" was obtained by X-ray crystallography)?


Developed by: UW Center for Game Science, UW Institute for Protein Design, Northeastern University, Vanderbilt University Meiler Lab, UC Davis
Supported by: DARPA, NSF, NIH, HHMI, Amazon, Microsoft, Adobe, Boehringer Ingelheim, RosettaCommons