Seq. SAVKRTIKGT HHWLLLTILT SLLVLVQSTQ WSLFFFLYEN AFLPFAMGII
TOPCONS iiiiiiiiii iMMMMMMMMM MMMMMMMMMM MMooooooMM MMMMMMMMMM
OCTOPUS iiiiiiiiii iMMMMMMMMM MMMMMMMMMM MMooooooMM MMMMMMMMMM
Philius iiiiiiiiii iMMMMMMMMM MMMMMMMMMM Mooooooooo MMMMMMMMMM
PolyPhobius iiiiiiiiii iMMMMMMMMM MMMMMMMMMo oooooooooM MMMMMMMMMM
SCAMPI iiiiiiiiii MMMMMMMMMM MMMMMMMMMM MooooooooM MMMMMMMMMM
SPOCTOPUS iiiiiiiiii iMMMMMMMMM MMMMMMMMMM MMooooooMM MMMMMMMMMM
PDB-homology
51 91
Seq. AMSAFAMMFV KHKHAFLCLF LLPSLATVAY FNMVYMPASW VMRIMTWLDM
TOPCONS MMMMMMMMMi iiiiiMMMMM MMMMMMMMMM MMMMMMoooo oooooooooo
OCTOPUS MMMMMMMMMi iiiiiMMMMM MMMMMMMMMM MMMMMMoooo oooooooooo
Philius MMMMMMMMMM iiiiiiMMMM MMMMMMMMMM MMMMMMMMoo oooooooooo
PolyPhobius MMMMMMMMMi iiiiiMMMMM MMMMMMMMMM MMMMMooooo oooooooooo
SCAMPI MMMMMMMMMM iiiiMMMMMM MMMMMMMMMM MMMMMooooo oooooooooo
SPOCTOPUS MMMMMMMMMi iiiiiMMMMM MMMMMMMMMM MMMMMMoooo oooooooooo
PDB-homology
101 141
Seq. VDTSLSGFKL KDCVMYASAV VLLILMTART VYDDGARRVW TLMNVLTLVY
TOPCONS oooooooMMM MMMMMMMMMM MMMMMMMMii iiiiiMMMMM MMMMMMMMMM
OCTOPUS oooooooMMM MMMMMMMMMM MMMMMMMMii iiiiiiMMMM MMMMMMMMMM
Philius oooooooooo ooMMMMMMMM MMMMMMMMMM MMiiiiiiii iiiiiiiiii
PolyPhobius oooooooMMM MMMMMMMMMM MMMMMMMMii iiiiiiiMMM MMMMMMMMMM
SCAMPI oooooooMMM MMMMMMMMMM MMMMMMMMii iiiiMMMMMM MMMMMMMMMM
SPOCTOPUS oooooooMMM MMMMMMMMMM MMMMMMMMii iiiiiiMMMM MMMMMMMMMM
PDB-homology
151 191
Seq. KVYYGNALDQ AISMWALIIS VTSNYSGVVT TVMFLARGIV FMCVEYCPIF
TOPCONS MMMMMMoMMM MMMMMMMMMM MMMMMMMMiM MMMMMMMMMM MMMMMMMMMM
OCTOPUS MMMMMMMooo oooMMMMMMM MMMMMMMMiM MMMMMMMMMM MMMMoooooo
Philius iiiiiiiiii iiiiMMMMMM MMMMMMMMMM MMMMMMMMMM MMMMMMMMMo
PolyPhobius MMMMMooooo oooooMMMMM MMMMMMMMMM MMMMMMMMMM MMMMMMMiii
SCAMPI MMMMMooooo oooooooooo oooooooooo oooooooooo oooooooooo
SPOCTOPUS MMMMMMMooo oooMMMMMMM MMMMMMMMiM MMMMMMMMMM MMMMoooooo
PDB-homology
201 241
Seq. FITGNTLQCI MLVYCFLGYF CTCYFGLFCL LNRYFRLTLG VYDYLVSTQE
TOPCONS ooooooooMM MMMMMMMMMM MMMMMMMMMi iiiiiiiiii iiiiiiiiii
OCTOPUS oooooooMMM MMMMMMMMMM MMMMMMMMii iiiiiiiiii iiiiiiiiii
Philius oooooooooM MMMMMMMMMM MMMMMMMMMM Miiiiiiiii iiiiiiiiii
PolyPhobius iiiiiiiiMM MMMMMMMMMM MMMMMMMMMM Mooooooooo oooooooooo
SCAMPI oooooooooo MMMMMMMMMM MMMMMMMMMM Miiiiiiiii iiiiiiiiii
SPOCTOPUS oooooooMMM MMMMMMMMMM MMMMMMMMii iiiiiiiiii iiiiiiiiii
PDB-homology
251 281
Seq. FRYMNSQGLL PPKNSIDAFK LNIKLLGVGG KPCIKVATVQ
TOPCONS iiiiiiiiii iiiiiiiiii iiiiiiiiii iiiiiiiiii
OCTOPUS iiiiiiiiii iiiiiiiiii iiiiiiiiii iiiiiiiiii
Philius iiiiiiiiii iiiiiiiiii iiiiiiiiii iiiiiiiiii
PolyPhobius oooooooooo oooooooooo oooooooooo oooooooooo
SCAMPI iiiiiiiiii iiiiiiiiii iiiiiiiiii iiiiiiiiii
SPOCTOPUS iiiiiiiiii iiiiiiiiii iiiiiiiiii iiiiiiiiii
I think residues 221-290 in the post above match
the 70-residue sequence given for this puzzle:
0000000001 1111111112 2222222223
1234567890 1234567890 1234567890
--------------------------------------------
Conf 9501189999 9998199515 5124106999
PSIPRED CCCEEHHHHH HHHHHCCCCE ECCEEECHHH
Seq. CTCYFGLFCL LNRYFRLTLG VYDYLVSTQE
TOPCONS MMMMMMMMMi iiiiiiiiii iiiiiiiiii
OCTOPUS MMMMMMMMii iiiiiiiiii iiiiiiiiii
Philius MMMMMMMMMM Miiiiiiiii iiiiiiiiii
PolyPhobius MMMMMMMMMM Mooooooooo oooooooooo
SCAMPI MMMMMMMMMM Miiiiiiiii iiiiiiiiii
SPOCTOPUS MMMMMMMMii iiiiiiiiii iiiiiiiiii
--------------------------------------------
2222222222 2222222222 2222222222
2222222223 3333333334 4444444445
1234567890 1234567890 1234567890
3333333334 4444444445 5555555556 6666666667
1234567890 1234567890 1234567890 1234567890
-------------------------------------------------------
Conf 9999857999 9995799999 8887204268 6246425329
PSIPRED HHHHHHCCCC CCCCHHHHHH HHHHHCCCCC CCEEEEEECC
Seq. FRYMNSQGLL PPKNSIDAFK LNIKLLGVGG KPCIKVATVQ
TOPCONS iiiiiiiiii iiiiiiiiii iiiiiiiiii iiiiiiiiii
OCTOPUS iiiiiiiiii iiiiiiiiii iiiiiiiiii iiiiiiiiii
Philius iiiiiiiiii iiiiiiiiii iiiiiiiiii iiiiiiiiii
PolyPhobius oooooooooo oooooooooo oooooooooo oooooooooo
SCAMPI iiiiiiiiii iiiiiiiiii iiiiiiiiii iiiiiiiiii
SPOCTOPUS iiiiiiiiii iiiiiiiiii iiiiiiiiii iiiiiiiiii
-------------------------------------------------------
2222222222 2222222222 2222222222 2222222222
5555555556 6666666667 7777777778 8888888889
1234567890 1234567890 1234567890 1234567890
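The claimed match between residues 221-290 and the 70-residue puzzle sequence can be checked with a few lines of Python. This is only a sketch: the two strings are typed in by hand from the "Seq." rows above, and the helper name `residues` is made up for illustration.

```python
# Full 290-residue sequence, copied from the "Seq." rows of the first post.
FULL = (
    "SAVKRTIKGT" "HHWLLLTILT" "SLLVLVQSTQ" "WSLFFFLYEN" "AFLPFAMGII"
    "AMSAFAMMFV" "KHKHAFLCLF" "LLPSLATVAY" "FNMVYMPASW" "VMRIMTWLDM"
    "VDTSLSGFKL" "KDCVMYASAV" "VLLILMTART" "VYDDGARRVW" "TLMNVLTLVY"
    "KVYYGNALDQ" "AISMWALIIS" "VTSNYSGVVT" "TVMFLARGIV" "FMCVEYCPIF"
    "FITGNTLQCI" "MLVYCFLGYF" "CTCYFGLFCL" "LNRYFRLTLG" "VYDYLVSTQE"
    "FRYMNSQGLL" "PPKNSIDAFK" "LNIKLLGVGG" "KPCIKVATVQ"
)

# The 70-residue sequence given for the puzzle.
PUZZLE = (
    "CTCYFGLFCL" "LNRYFRLTLG" "VYDYLVSTQE"
    "FRYMNSQGLL" "PPKNSIDAFK" "LNIKLLGVGG" "KPCIKVATVQ"
)


def residues(seq, first, last):
    """Return residues first..last using 1-based, inclusive numbering."""
    return seq[first - 1:last]


print(len(FULL), len(PUZZLE))
print(residues(FULL, 221, 290) == PUZZLE)
```

If the strings were copied correctly, the second line printed is `True`, meaning the puzzle sequence is exactly the last 70 residues of the 290-residue protein.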
One question I have is what the letters M i and o mean above.
i = inside
o = outside
M = membrane
Considering the protein is a membrane one.
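To make the i / o / M letters easier to read, a topology row can be collapsed into labelled residue ranges. The snippet below is a minimal sketch using the TOPCONS row for residues 201-250 from the first post; the function name `segments` and the `LABELS` table are just illustrative choices.

```python
from itertools import groupby

# Meaning of the letters, as given in the reply above.
LABELS = {"i": "inside", "o": "outside", "M": "membrane"}


def segments(topology, first_residue=1):
    """Collapse a topology string into (label, start, end) ranges (1-based, inclusive)."""
    out = []
    pos = first_residue
    for letter, run in groupby(topology):
        length = len(list(run))
        out.append((LABELS[letter], pos, pos + length - 1))
        pos += length
    return out


# TOPCONS row for residues 201-250, copied from the first post.
topcons_201_250 = (
    "ooooooooMM" "MMMMMMMMMM" "MMMMMMMMMi" "iiiiiiiiii" "iiiiiiiiii"
)

for label, start, end in segments(topcons_201_250, first_residue=201):
    print(f"{label:8s} {start}-{end}")
```

For this row the ranges come out as outside 201-208, membrane 209-229 and inside 230-250, so by TOPCONS's prediction only the first nine residues of the puzzle sequence (221-229) sit in the membrane.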
It is the tail domain of the large 290-residue protein. So the reason the first 11 residues are hydrophobic is probably that they are buried somewhere inside the whole protein.
I do not think that it is a membrane protein, as the given sequence is from Non-Structural Protein 6 (NSP6).
A great resource for me is the following: https://www.nytimes.com/interactive/2020/04/03/science/coronavirus-genome-bad-news-wrapped-in-protein.html
As stated there, NSP6 works with NSP3 and NSP4 to produce virus bubbles.
Does anyone know what exactly that means?
There are also the RNA sequences given.
Translating it with: https://web.expasy.org/translate/
shows that the given sequence
(CTCYFGLFCLLNRYFRLTLGVYDYLVSTQEFRYMNSQGLL
PPKNSIDAFKLNIKLLGVGGKPCIKVATVQ) is at the end of the 5'3' Frame 1.
A question I have is why we were not given the preceding 10 amino acids from the open reading frame starting at position 210.
The sequence should then be:
5'3' Frame 1, start_pos=210
MLVYCFLGYFCTCYFGLFCLLNRYFRLTLGVYDYLVSTQEFRYMNSQGLL
PPKNSIDAFKLNIKLLGVGGKPCIKVATVQ
Since CTCYFG... starts at position 221,
the sequence given above has MLVYCF...
starting at position 211, not 210.
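The arithmetic behind that correction can be written out as a small check. This is a sketch that assumes the 1-based numbering of the 290-residue protein used in the prediction tables; the variable names are made up for illustration.

```python
# The proposed longer sequence (ORF starting at the Met), copied from the post above.
ORF = (
    "MLVYCFLGYF" "CTCYFGLFCL" "LNRYFRLTLG" "VYDYLVSTQE"
    "FRYMNSQGLL" "PPKNSIDAFK" "LNIKLLGVGG" "KPCIKVATVQ"
)

# Position of CTCYFG... in the full protein, as stated in the thread.
PUZZLE_START = 221

# Number of residues that precede CTCYFG... in the ORF.
offset = ORF.index("CTCYFG")

# 1-based position of the ORF's first residue (M) in the full protein.
orf_start = PUZZLE_START - offset

print(offset, orf_start)
```

Ten residues (MLVYCFLGYF) precede CTCYFG..., so the Met sits at 221 - 10 = 211.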
I know this is just a schematic image, but why would they put NSP6 between the cytoplasm and the reticulum lumen?
This would make NSP6 a transmembrane protein (endoplasmic reticulum membrane), I guess.
This is an article about the original SARS virus (2008), but it seems reasonable to speculate that the current virus behaves similarly: it invades the ER membrane and restructures it, using the resulting network of altered ER membrane as a place to replicate:
https://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.0060226
Here is a more recent summary (2015) of ways that positive RNA viruses use the ER membrane to form protective containers in which to synthesize their RNA, either by intruding or by extruding the membrane. When they extrude the membrane, they form DMVs (double-membrane vesicles), which are like protective bubbles isolated from the cytoplasm where their RNA can be synthesized without being attacked.
https://pubmed.ncbi.nlm.nih.gov/25287059/
So some of the coronavirus proteins must be involved in attacking and restructuring the ER membrane, and possibly also as structural elements of the resulting altered membrane and the DMVs.
When you provide us with a partial protein, would you change the scoring function to give a smaller weight to the exposed part of the score? Or is there another way to identify the part of the partial protein that might be "inside" the whole protein?
We use our standard score function for these puzzles.
This is potentially a problem, if this domain makes significant contacts with the rest of the protein, or with the membrane. Those contacts would not be scored appropriately with our standard score function, and could be problematic for predicting how the domain will fold.
However, many natural proteins consist of smaller domains that are perfectly capable of folding independently. If that's the case, then we should be able to use our standard score function.
Would it help to post a follow-up puzzle with
the same protein sequence as in this puzzle
that instead scores the protein as if it were
a membrane protein? I think such a follow-up
puzzle would give higher scores to solutions
with buried hydrophilic (blue) residues and
exposed hydrophobic (orange) ones.
I agree with you, Jeff101. The only thing I'm thinking is that, if it's a transmembrane protein, the scoring system should be more complex than that.
Considering the different types of membrane proteins (peripheral and integral, and their different relations to the bilayer), the scoring system couldn't be onefold (pun not intended! haha).
There should be two scoring systems, one for the part embedded in the membrane (the intrinsic part of an integral membrane protein, let's say), and another for the extrinsic part of the protein.
You should get a high score if the hydrophobics are sticking out in the INtrinsic part of the protein.
You should get a high score if the hydrophilics are sticking out in the EXtrinsic part of the protein.
So, two scoring systems.
Or they could try to lock the extrinsic part and score the intrinsic, and vice versa, using two different scoring objectives.
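As a toy illustration of the two-regime idea (this is not Foldit's score function, and nothing here comes from the Foldit team), one could reward exposed hydrophobics in membrane positions and exposed hydrophilics everywhere else. The residue grouping in `HYDROPHOBIC`, the +1/-1 weights, and the `exposed` flags are all assumptions made up for the sketch.

```python
# Rough, assumed grouping of hydrophobic residues (one-letter codes).
HYDROPHOBIC = set("AVLIMFWC")


def exposure_score(seq, topology, exposed):
    """Toy score: +1 for each exposed residue that suits its environment, -1 otherwise.

    seq      -- amino-acid sequence
    topology -- same-length string of 'M' (membrane), 'i' or 'o' (not in membrane)
    exposed  -- same-length list of booleans, True if the side chain points outward
    """
    score = 0
    for aa, region, is_exposed in zip(seq, topology, exposed):
        if not is_exposed:
            continue  # buried residues are ignored in this toy version
        hydrophobic = aa in HYDROPHOBIC
        if region == "M":
            # Intrinsic part: exposed hydrophobics face the lipid, so reward them.
            score += 1 if hydrophobic else -1
        else:
            # Extrinsic part: exposed hydrophilics face water, so reward them.
            score += -1 if hydrophobic else 1
    return score


print(exposure_score("LLKK", "MMii", [True, True, True, True]))
print(exposure_score("KKLL", "MMii", [True, True, True, True]))
```

The first call scores a layout where hydrophobics sit in the membrane part and hydrophilics in the soluble part; the second scores the reverse, which this scheme penalizes.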
The NSP6 protein definitely associates with the membrane, but we think that this portion might form a well-folded domain in the cytosolic region of the cell.
This is far from proven, but there is some evidence to support it. This study from 2009 (working with a related coronavirus) used some cool techniques to study the orientation of the full protein in the membrane:
https://pubmed.ncbi.nlm.nih.gov/19386712/
It is still possible that this portion of the protein interacts with the membrane. But in this puzzle, we would like to predict how this domain might fold up if it were cytosolic.
We could consider a follow-up puzzle with a modified "membrane" score function, but Foldit is not very well suited to model the membrane environment. (As puxatudo points out, things get complicated.)