Susume's picture
Joined: 10/02/2011

In contact puzzles, the distance between two residues is measured between the beta carbons (the first joint in the sidechain), and the cutoff distance to give us credit for the contact is based on the sum of the sizes of the two sidechains involved. The max distance to get credit for a leucine-leucine contact is much greater than the max distance to get credit for an alanine-alanine contact, because leucine is much longer than alanine.
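As a rough illustration of the rule described above, here is a minimal sketch. The sidechain sizes are made-up placeholder numbers, and the function names are my own; I do not know the actual values or formula Foldit uses, only that the cutoff is based on the sum of the two sidechain sizes.

```python
import math

# Placeholder sidechain sizes in angstroms. These are illustrative
# assumptions, NOT the values Foldit actually uses.
SIDECHAIN_SIZE = {"G": 1.0, "A": 1.5, "V": 2.5, "L": 4.0}

def contact_cutoff(res_a, res_b):
    """Assumed cutoff: the sum of the two sidechain sizes."""
    return SIDECHAIN_SIZE[res_a] + SIDECHAIN_SIZE[res_b]

def gets_credit(cb_a, cb_b, res_a, res_b):
    """True if the beta-carbon distance is within the cutoff for this pair."""
    return math.dist(cb_a, cb_b) <= contact_cutoff(res_a, res_b)

# The same 6 angstrom separation earns credit for leucine-leucine
# but not for alanine-alanine, because the alanine cutoff is smaller.
leu_leu = gets_credit((0, 0, 0), (6, 0, 0), "L", "L")
ala_ala = gets_credit((0, 0, 0), (6, 0, 0), "A", "A")
```

With these placeholder sizes, the leucine pair has a cutoff of 8.0 and the alanine pair only 3.0, which is the asymmetry the rest of this post is about.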

One unfortunate result of this is that it is very difficult or impossible to get credit for contacts involving glycine or alanine, because the calculated distance cutoff is so small. But not all homologs have glycine or alanine in those positions - having a predicted contact there does not mean that spot has to be glycine or alanine, only that glycine or alanine in that spot does not mess up the protein. The average distance of that contact in similar viable proteins is actually larger, because it includes other sidechains that occur at that location in other homologs. Having a smaller sidechain there does not necessarily mean those two residues physically pull closer together in our protein; it just means having less contact there is not enough to break the protein.

If a given position in a protein has glycine in our version, but shows frequencies across homologs of 50% leucine, 25% valine, 20% alanine, and 5% glycine, we should not have to get the contact as close as a glycine can be, but only as close as that contact would be on average across the homologs. The cutoff distance in this case should be based 50% on the sidechain length of leucine, 25% on the length of valine, 20% on the length of alanine, and only 5% on the length of glycine. Similarly, the contribution of the second sidechain involved should be based on the frequencies of the different sidechain sizes at the second position.
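The weighting suggested above can be sketched as follows. Again, the sidechain sizes are placeholder numbers I made up for illustration, the function names are hypothetical, and taking the cutoff as the plain sum of the two expected sizes is my assumption about how the existing rule would carry over.

```python
# Placeholder sidechain sizes in angstroms. These are illustrative
# assumptions, NOT the values Foldit actually uses.
SIDECHAIN_SIZE = {"G": 1.0, "A": 1.5, "V": 2.5, "L": 4.0}

def expected_size(freqs):
    """Frequency-weighted average sidechain size at one position.

    freqs maps one-letter residue codes to their frequency across
    homologs; the weights are normalized in case they do not sum to 1.
    """
    total = sum(freqs.values())
    return sum(SIDECHAIN_SIZE[aa] * f for aa, f in freqs.items()) / total

def weighted_cutoff(freqs_a, freqs_b):
    """Assumed cutoff: the sum of the expected sizes at the two positions."""
    return expected_size(freqs_a) + expected_size(freqs_b)

# The example from the post: glycine in our protein, but mostly
# larger sidechains at that position across the homologs.
pos_a = {"L": 0.50, "V": 0.25, "A": 0.20, "G": 0.05}
# Hypothetical second position that is glycine in every homolog.
pos_b = {"G": 1.0}

size_a = expected_size(pos_a)            # 0.5*4.0 + 0.25*2.5 + 0.2*1.5 + 0.05*1.0
cutoff = weighted_cutoff(pos_a, pos_b)
```

With these placeholder sizes, the first position counts as about 2.975 instead of 1.0 for glycine alone, so the cutoff for the pair comes out near 3.975 instead of 2.0.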

This is important so that we don't lose the information contained in that predicted contact. A contact that is as close as the average of that contact across homologs should be identifiable by score in Foldit. If we can't get the contact bonus for it (because the sidechain we happen to have is smaller than the average of what actually occurs there across homologs), solutions where the pair is close enough to resemble the homologs score the same as solutions where the pair is far apart. That predicted contact can no longer affect the score, so the information in that prediction is not being used.

I have only considered cases where the sidechain in our protein is much smaller than the average of the sidechains across homologs. The question of what the cutoff should be when our sidechain is larger than the average of sidechains across homologs also deserves consideration.

