Core filter issues

Case number: 699969-2003993
Opened by: spvincent
Opened on: Tuesday, August 1, 2017 - 21:14
Last modified: Sunday, August 20, 2017 - 19:24

One suggestion and one question about the way the Core Filter works on design puzzles:

1) I think it would be helpful if the Core Filter penalty were made a continuous function, as opposed to the current situation where it jumps around in multiples of 50. The current implementation plays havoc with scripts. For example, as you compress a protein with bands, the Core Filter value suddenly jumps by 50 while the Energy score drops by a lesser amount. The result is an unstable new best solution that has a lot of bands but is stuck as far as wiggle is concerned: you have to turn off filters, delete bands, and then wiggle to get anywhere. And when you subsequently turn filters back on, the transient gain in the Core Filter value is lost.

2) What is the algorithm used to calculate the Core Filter? Knowing how this works might make it easier to construct proteins that satisfy it. I have a couple of solutions on the current design puzzle that, although they look nice and compact, nevertheless fail to satisfy the Core Filter by a considerable margin, whereas a more elongated structure, somewhat counterintuitively, does satisfy it.

(Tue, 08/01/2017 - 21:14  |  3 comments)

bkoep's picture
Joined: 11/15/2012
Groups: Foldit Staff

Thanks for your suggestions! This issue does seem to come up pretty regularly. A serious shortcoming of the filters in general is that they all operate on step functions. I agree that the current Core Existence filter is particularly bad, and could probably be improved—at least so that the penalty step size is smaller.

The Core Existence filter first classifies each residue into one of three layers: core, boundary, or surface. The classification of a residue is based on the number of neighboring Cα atoms that lie within some distance of its sidechain, or rather, within a cone that radiates out along the sidechain's Cα-Cβ axis (we usually want to ignore atoms "behind" the residue that do not interact with the sidechain). The neighbor "counting" is actually continuous (a residue can have 3.46 neighbors, for example), but the layer classification is applied with a hard threshold on this neighbor count. You can see which residues are classified as core/boundary/surface with the "Show" checkbox for the Core Existence filter.
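To make the cone idea concrete, here is a rough Python sketch of how a continuous neighbor count followed by a hard layer threshold could look. It is only an illustration under stated assumptions: the function names, the sigmoid distance weighting, the cosine-based cone weighting, and every numeric default (midpoint, steepness, cutoffs) are placeholders chosen for the example, not confirmed Foldit values. It also assumes every residue has a Cβ position; the thread does not say how glycine is handled.

```python
import math

def _sub(a, b):
    """Component-wise difference of two 3D points."""
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _norm(v):
    """Euclidean length of a 3D vector."""
    return math.sqrt(v[0] ** 2 + v[1] ** 2 + v[2] ** 2)

def cone_neighbor_count(ca, cb, other_cas,
                        dist_mid=9.0, dist_steep=1.0,
                        angle_shift=0.5, angle_exp=2.0):
    """Continuous count of Calpha atoms in a soft cone along the Calpha->Cbeta axis.

    All default parameters are illustrative assumptions.
    """
    axis = _sub(cb, ca)
    axis_len = _norm(axis)
    total = 0.0
    for other in other_cas:
        vec = _sub(other, cb)
        dist = _norm(vec)
        if dist == 0.0:
            continue  # skip a degenerate overlapping point
        cos_theta = sum(a * v for a, v in zip(axis, vec)) / (axis_len * dist)
        # Atoms "behind" the residue get zero weight; atoms straight ahead get 1.
        angle_term = max(0.0, (cos_theta + angle_shift) / (1.0 + angle_shift)) ** angle_exp
        # Smooth falloff with distance instead of a hard radius.
        dist_term = 1.0 / (1.0 + math.exp(dist_steep * (dist - dist_mid)))
        total += dist_term * angle_term
    return total

def classify_layer(count, core_cutoff=5.2, surface_cutoff=2.0):
    """Hard-threshold the continuous count into a layer (cutoffs are assumed)."""
    if count >= core_cutoff:
        return "core"
    if count <= surface_cutoff:
        return "surface"
    return "boundary"
```

With these placeholder cutoffs, a residue with 3.46 neighbors would land in the boundary layer, and a tiny change in count near a cutoff flips the layer outright, which is where the step behavior comes from.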

After all residues are classified, the Core Existence filter awards a large bonus if a certain number of residues (usually >30% of total residues) are classified in the core layer. If you don't qualify for the bonus, you receive incremental penalties for every residue below the 30% cutoff. We could probably back off on these bonus and penalty values, which have been pretty steep (160 pts!) in recent design puzzles.
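As a rough sketch of the bonus/penalty step described above (not the actual Foldit code), the scoring could be written like this in Python. The function name is hypothetical, and the 30% cutoff, the 160-point bonus, and the 50-point per-residue penalty are placeholder values loosely taken from numbers mentioned in this thread; the real values vary by puzzle, and whether the cutoff is "at least" or "strictly more than" 30% is an assumption here.

```python
def core_existence_score(n_core, n_total,
                         min_core_percent=30,
                         bonus=160.0,
                         penalty_per_residue=50.0):
    """Step-function score: bonus if enough core residues, else a per-residue penalty.

    All default values are illustrative assumptions.
    """
    # Smallest whole number of residues that reaches the percentage (integer ceiling).
    required = (min_core_percent * n_total + 99) // 100
    if n_core >= required:
        return bonus
    missing = required - n_core
    return -penalty_per_residue * missing
```

For a 100-residue design under these assumptions, 30 core residues earns the bonus, 29 gives -50, and 27 gives -150, so the score only ever moves in whole steps.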

This might seem like a silly way of measuring the size of a protein core, but there are a few reasons for it. The biggest reason is that these layer (core/boundary/surface) designations are used extensively in a lot of Rosetta protein design protocols. So the machinery is there, and its behavior is well known; and on top of that, we like to keep Foldit and Rosetta protocols as analogous as possible for easy comparison. The same layer system is used elsewhere in Foldit, as well: by the HBNet filter and sometimes by the SS Design filter.

The neighbor-counting system is also fast and reasonably informative about the protein core, as compared to something like SASA calculations (which are slow) or radius of gyration (which is less informative). However, it's not hard to imagine a different metric that bypasses the layer classification, and agglomerates neighbor counts more continuously—or at least in smaller steps. Hopefully we'll be able to try something like that in the future!
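One way to picture such a metric, purely as a hypothetical sketch and not something Foldit is known to implement: replace the hard core/not-core decision with a soft membership between 0 and 1, average it over residues, and scale the penalty by how far that average falls short of the target. The function names, the logistic form, and all defaults below are assumptions made for illustration.

```python
import math

def soft_core_fraction(neighbor_counts, core_cutoff=5.2, sharpness=2.0):
    """Average soft 'core-ness' over residues; each residue contributes 0..1."""
    if not neighbor_counts:
        return 0.0
    memberships = [
        1.0 / (1.0 + math.exp(-sharpness * (count - core_cutoff)))
        for count in neighbor_counts
    ]
    return sum(memberships) / len(memberships)

def smooth_core_penalty(neighbor_counts, target_fraction=0.30, max_penalty=160.0):
    """Penalty that grows continuously with the shortfall below the target."""
    shortfall = target_fraction - soft_core_fraction(neighbor_counts)
    if shortfall <= 0.0:
        return 0.0  # target met: no penalty
    return -max_penalty * shortfall / target_fraction
```

Under this sketch, nudging every residue's neighbor count up by a small amount changes the penalty by a small amount, so a script compressing the protein would see a gradual trade-off against the Energy score instead of a sudden jump.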

spvincent's picture
Joined: 12/07/2007
Groups: Contenders

Thanks for the explanation of the filter, bkoep (though I'm not quite sure how things would work with glycine). Hopefully the step size fix is an easy one.

