Suggestion for SC Score system

Case number:699969-2010517
Opened by:nspc
Opened on:Tuesday, October 6, 2020 - 17:06
Last modified:Sunday, October 11, 2020 - 21:38

------ Concepts -------

-The Rosetta score is accurate and works well for finding a protein that is stable on its own.

-Most of the metrics are useful for eliminating bad solutions.
For that purpose the score system is good, for example "Core existence".

For some metrics where the max value is harder to reach (like BUNS or SC), eliminating bad solutions is not enough.
We need a more accurate score system to compare solutions and make sure we keep the best one.

The score system is like a heuristic in artificial intelligence. If the score system compares solutions accurately,
there is less chance that a recipe misses a very good solution.

------ Playing experience -------

When playing, I often saw these situations:

After using some recipes, the score system gives more points when the protein is slightly detached, because that resolves some BUNS.
Recipes often mutate residues to hydrophobic ones too, to resolve them. This should give fewer points.

If we detach the protein completely, we get the max score for BUNS. I think we should get less.

For SC, if only one sidechain is near the target and it has a very good SC, it will give the max score, because the SC can be good with only this sidechain.
This is a problem when a second sidechain comes into contact and the total SC is lower: it will give a lower score.
So a recipe can consider it a worse solution and will go back to the detached position.

I think SC should be low if there are not enough sidechains in the contact zone, just as it is 0 when there is no contact.
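
The ranking problem described above can be sketched in a few lines. This is only an illustration: the SC values, the contact counts, and the cap of 14 sidechains are made-up numbers, and the weighting is one possible choice, not how Foldit computes the SC objective.

```python
# Hypothetical cap on how many contacting sidechains count toward the score.
MAX_CONTACTS = 14

def raw_key(solution):
    """Rank by the raw SC value only (the behavior described in the post)."""
    return solution["sc"]

def weighted_key(solution):
    """Rank by SC scaled by the fraction of contacting sidechains (assumed rule)."""
    fraction = min(solution["contacts"], MAX_CONTACTS) / MAX_CONTACTS
    return solution["sc"] * fraction

# Two made-up solutions: one sidechain with excellent SC,
# two sidechains with a lower overall SC.
one_sidechain = {"name": "one sidechain", "sc": 0.75, "contacts": 1}
two_sidechains = {"name": "two sidechains", "sc": 0.60, "contacts": 2}
candidates = [one_sidechain, two_sidechains]

best_raw = max(candidates, key=raw_key)
best_weighted = max(candidates, key=weighted_key)

print("raw SC prefers:", best_raw["name"])
print("weighted SC prefers:", best_weighted["name"])
```

With raw SC alone the single-sidechain solution wins, so a recipe would move back toward it. With the contact-weighted key the two-sidechain solution wins.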

------ Suggestion for SC Score -------

In the Metric UI we can show an additional number near the SC value:
the "number of sidechains included in the SC computation" (that is, the sidechains near the target).

The more sidechains we have near the target, the more points we can get (algorithm to be defined).

If the protein is detached, we get 0 points.
We can set a max value for the number of sidechains needed.

For example, we can define a max of 14 sidechains. If a player has only 7 sidechains in contact, we give only 50% of the SC score (500 points if SC is 0.6).
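
Here is a minimal sketch of that linear rule. The function name and parameters are hypothetical, and it assumes the full SC bonus is 1000 points and is reached at SC 0.6, which is what the "500 points" figure above implies. The real Foldit objective may be defined differently.

```python
def scaled_sc_points(sc, contacts, max_contacts=14, sc_target=0.6, max_points=1000):
    """Return SC points scaled by the number of sidechains in contact.

    Assumptions (not taken from Foldit itself):
    - the full bonus is max_points, reached when sc >= sc_target
    - the bonus grows linearly with contacts up to max_contacts
    """
    if contacts <= 0:
        # Detached protein: no sidechains in contact, so no points.
        return 0
    sc_fraction = min(max(sc / sc_target, 0.0), 1.0)
    contact_fraction = min(contacts / max_contacts, 1.0)
    return round(max_points * sc_fraction * contact_fraction)


print(scaled_sc_points(0.6, 7))   # half of the sidechains needed
print(scaled_sc_points(0.6, 14))  # enough sidechains for the full bonus
print(scaled_sc_points(0.6, 0))   # detached
```

With these assumed values, 7 contacts at SC 0.6 gives 500 points, 14 or more contacts gives 1000, and a detached protein gives 0.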

I think that if we do that, it is a little like the Rosetta score system, where each sidechain has its own part of the score,
but with a limit on the number of sidechains included, because the whole protein doesn't need to be in contact.

Scientists can then tune these values to get something more accurate.
Of course, the function that gives the score depending on the number of sidechains can be non-linear.
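
As one possible non-linear shape, the contact weight could follow a power curve, so that the first few sidechains in contact count more than the last ones. The function name, the exponent of 0.5, and the cap of 14 are all placeholder assumptions that scientists would need to tune.

```python
def contact_weight(contacts, max_contacts=14, exponent=0.5):
    """Non-linear weight in [0, 1] for the number of sidechains in contact.

    exponent < 1 rewards the first contacts more strongly,
    exponent > 1 rewards them less, exponent == 1 is the linear rule.
    All parameter values here are hypothetical.
    """
    if contacts <= 0:
        return 0.0
    fraction = min(contacts / max_contacts, 1.0)
    return fraction ** exponent


for n in (0, 1, 7, 14):
    print(n, round(contact_weight(n), 3))
```

The weight would then multiply the SC score in the same way as the linear 50% factor.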

The same idea can be used for BUNS too.

Thanks for reading my suggestion :)

(Tue, 10/06/2020 - 17:06  |  1 comment)

bkoep
Joined: 11/15/2012
Groups: Foldit Staff

Thanks for the suggestion, nspc!

SC Metric
You're right that the SC score can be finicky, especially if your interface is incomplete. SC does not give you the full picture about the interface, but that's okay -- SC is not supposed to give you the full picture. That's why binder puzzles will also include SASA and DDG, to encourage large interfaces with lots of contacts.

The Foldit score combines many different subscores and Objectives. We cannot fully judge a solution by looking at just one part of it. Just because SC goes up does not mean your solution is a better binder design. We have to consider all of the score parts.

You suggest that the SC metric should combine the raw SC value and the number of sidechains at the interface. I agree that this would be a better judge of the interface than SC alone. But my counter-suggestion is that you could accomplish the same thing by combining the SC metric and the SASA metric. This is how Foldit scoring is intended to work.

"After using some recipes, the score system give more points when a protein is a bit detached, because that resolve some BUNS. . . . This should give less points."

If your protein scores better when it is detached, that is an indication that your protein is more stable when detached (i.e. your protein will not bind the target in reality). To me this sounds like the scoring is working correctly -- even if it's not doing what you would like it to do.

The highest scoring solutions will have zero BUNS and also be attached to the target. But we cannot expect "partial credit" for getting halfway there and solving just one of these problems. Our score will be bad (it should be bad) until we solve both together.

If the Foldit score could smoothly guide you to a high-scoring solution (the score always goes up as you get closer), then protein design would be an easy problem and we wouldn't need human brainpower to help us solve it.

Finally, there is some good news that scientists at the IPD have recently come up with a new metric that sort of combines the SASA and SC metrics. I think it should behave more like what you expect, and we hope to include it in Foldit soon.

