Possible bug in scoring function
Case number: 699969-987935
Topic: General
Opened by: Tlaloc
Status: Closed
Type: Bug
Opened on: Tuesday, June 29, 2010 - 03:47
Last modified: Tuesday, November 2, 2010 - 21:56
If you call get_segment_score_part("reference", i), with "reference" as the first argument, it returns the reference part of the segment score. Seth says that this number is always the same for a given amino acid type.
I composed this table (reference score, one-letter code, amino acid, hydrophobic or hydrophilic):
-17 C Cysteine Phobic
-9 W Tryptophan Phobic
-6 F Phenylalanine Phobic
-5 Y Tyrosine Phobic
-5 H Histidine Philic
-2 I Isoleucine Phobic
-2 V Valine Phobic
-1 A Alanine Phobic
0 P Proline Phobic
1 G Glycine Phobic
1 L Leucine Phobic
2 T Threonine Philic
3 M Methionine Phobic
3 S Serine Philic
6 K Lysine Philic
6 D Aspartate Philic
8 E Glutamate Philic
8 N Asparagine Philic
9 R Arginine Philic
9 Q Glutamine Philic
The one that stands out here is Histidine. It has a reference number of -5 but is hydrophilic, while all the others with negative scores are hydrophobic. Is this a mistake? If so, it affects the scoring of the entire protein: if Histidine should have a score of 5 instead of -5, then the score of every protein is 10 too low for each Histidine segment it contains.
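To make the arithmetic concrete, here is a minimal sketch in Python (not Foldit script code). The function name `total_shift` and the example sequence are made up for illustration, and it assumes the reference value is simply added once per segment, which is how the post above reads it.

```python
def total_shift(sequence, residue, current, proposed):
    """How far off the total score would be if `residue` really should
    score `proposed` instead of `current` (assumes one flat reference
    value per segment of that type)."""
    return sequence.count(residue) * (current - proposed)


# Hypothetical sequence with three histidines (H), not a real puzzle.
example_sequence = "MHHKLH"
print(total_shift(example_sequence, "H", current=-5, proposed=5))
```

With three histidines, the sketch gives -30, i.e. 10 points too low per histidine, if the suspected sign error were real.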
I'm probably wrong, but it seemed weird and should be checked.
Threonine and Methionine should also be checked.
Revised table. The scores actually have another digit, which you can retrieve by running the function but not by pressing Tab while hovering over the segment:
-17.0 C Cysteine Phobic 0.17
-9.1 W Tryptophan Phobic 1.50
-6.3 F Phenylalanine Phobic 2.50
-5.6 H Histidine Philic -1.70
-5.1 Y Tyrosine Phobic 0.08
-2.9 V Valine Phobic 2.30
-2.4 I Isoleucine Phobic 3.10
-1.6 A Alanine Phobic 1.00
-0.2 P Proline Phobic 0.29
1.0 L Leucine Phobic 2.20
1.7 G Glycine Phobic 0.67
2.7 T Threonine Philic -0.75
3.4 M Methionine Phobic 1.10
3.7 S Serine Philic -1.10
6.5 K Lysine Philic -4.60
6.7 D Aspartate Philic -3.00
8.1 E Glutamate Philic -2.60
8.9 N Asparagine Philic -2.70
9.7 Q Glutamine Philic -2.90
9.8 R Arginine Philic -7.50
The last column is just how hydrophilic or hydrophobic that amino acid is: positive values are hydrophobic, negative values are hydrophilic.
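For anyone who wants to check the pattern mechanically, here is a small Python sketch (again, not Foldit script code). It copies the rows of the revised table above and lists every amino acid that breaks the working assumption "negative reference score means hydrophobic, positive means hydrophilic". The name `pattern_breakers` is made up, and the assumption itself is only the pattern noticed in this thread, not something the game documents.

```python
# Rows copied from the revised table above:
# (reference score, one-letter code, class, hydrophobicity value)
REVISED_TABLE = [
    (-17.0, "C", "Phobic", 0.17),
    (-9.1, "W", "Phobic", 1.50),
    (-6.3, "F", "Phobic", 2.50),
    (-5.6, "H", "Philic", -1.70),
    (-5.1, "Y", "Phobic", 0.08),
    (-2.9, "V", "Phobic", 2.30),
    (-2.4, "I", "Phobic", 3.10),
    (-1.6, "A", "Phobic", 1.00),
    (-0.2, "P", "Phobic", 0.29),
    (1.0, "L", "Phobic", 2.20),
    (1.7, "G", "Phobic", 0.67),
    (2.7, "T", "Philic", -0.75),
    (3.4, "M", "Phobic", 1.10),
    (3.7, "S", "Philic", -1.10),
    (6.5, "K", "Philic", -4.60),
    (6.7, "D", "Philic", -3.00),
    (8.1, "E", "Philic", -2.60),
    (8.9, "N", "Philic", -2.70),
    (9.7, "Q", "Philic", -2.90),
    (9.8, "R", "Philic", -7.50),
]


def pattern_breakers(rows):
    """Return the codes whose class does not match the assumed pattern:
    negative reference -> Phobic, zero or positive reference -> Philic."""
    breakers = []
    for reference, code, hydro_class, _hydrophobicity in rows:
        expected = "Phobic" if reference < 0 else "Philic"
        if hydro_class != expected:
            breakers.append(code)
    return breakers


print(pattern_breakers(REVISED_TABLE))
```

With these rows it flags H on the negative side and L, G and M on the positive side, so Histidine is the only hydrophilic amino acid with a negative reference score.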
What's the difference between those values and the hydropathy index?
http://foldit.wikia.com/wiki/Amino_acids
These values do not have anything to do with the hydrophobicity of the different amino acids. They are actually weights that only affect the score when mutating from one sidechain to another (so they have nothing to do with prediction puzzles).
For example, when you mutate from a smaller sidechain to a big one (such as Tryptophan), you are going to have a lot more interactions that will be reflected in the score function, so Trp is penalized with -9.1.
Phe, His & Tyr all have similar penalties as well.
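As a rough illustration of what that means for a single mutation, here is a Python sketch (not Foldit script code). It assumes that the reference part of a segment's score is just the weight of its current amino acid, so a mutation changes that part by the difference between the two weights. The name `reference_delta` is made up, the weights are taken from the revised table earlier in this thread, and all other score parts, which also change on mutation, are ignored.

```python
# Reference weights for a few amino acids, from the revised table
# earlier in this thread.
REFERENCE_WEIGHT = {
    "A": -1.6,  # Alanine
    "G": 1.7,   # Glycine
    "F": -6.3,  # Phenylalanine
    "H": -5.6,  # Histidine
    "Y": -5.1,  # Tyrosine
    "W": -9.1,  # Tryptophan
}


def reference_delta(old, new):
    """Change in the reference part only when one segment is mutated
    from `old` to `new` (assumption: weights simply swap)."""
    return round(REFERENCE_WEIGHT[new] - REFERENCE_WEIGHT[old], 1)


# Mutating Alanine to Tryptophan costs reference points up front;
# the new sidechain interactions have to earn them back.
print(reference_delta("A", "W"))
```

Under that assumption, going from Alanine to Tryptophan changes the reference part by about -7.5, and going back the other way returns the same amount.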
These reference weights were set after careful optimizations on a large benchmark of native proteins.
Good catch, I would also think so.
By the way... are the score-parts explained somewhere?