1841b: Coronavirus NSP6 Prediction
Status: Closed

Summary

Name: 1841b: Coronavirus NSP6 Prediction
Status: Closed
Created: 05/21/2020
Points: 100
Expired: 05/28/2020 - 23:00
Difficulty: Intermediate
Description: Note: This puzzle replaces Puzzle 1841, which was accidentally posted with an incorrect sequence.

Fold this coronavirus protein! This is a portion of a larger protein encoded in the viral genome of SARS-CoV-2. It is encoded in a region of the genome called NSP6, but the protein's structure and function are still unknown. If we knew how this protein folds, we might be able to figure out its exact function. The puzzle's starting structure shows secondary structure (SS) predictions from PSIPRED, and hints at which parts of the protein might fold into helices or sheets. Refold this protein to find high-scoring solutions, which will tell us how this protein is most likely to fold!

Sequence:
CTCYFGLFCLLNRYFRLTLGVYDYLVSTQEFRYMNSQGLLPPKNSIDAFKLNIKLLGVGGKPCIKVATVQ
Categories: Overall, Prediction

Top Groups

Rank  Group                   Score  Points
1     Go Science              9,500  100
2     Anthropic Dreams        9,430  77
3     Gargleblasters          9,284  58
4     Void Crushers           9,212  43
5     L'Alliance Francophone  9,209  31


Comments

bkoep
Joined: 11/15/2012
Groups: None
PSIPRED predictions

Conf: 950118999999981995155124106999999985799999957999998887204268
Pred: CCCEEHHHHHHHHHHCCCCEECCEEECHHHHHHHHHCCCCCCCCHHHHHHHHHHHCCCCC
  AA: CTCYFGLFCLLNRYFRLTLGVYDYLVSTQEFRYMNSQGLLPPKNSIDAFKLNIKLLGVGG
              10        20        30        40        50        60


Conf: 6246425329
Pred: CCEEEEEECC
  AA: KPCIKVATVQ
              70
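
For anyone who wants the PSIPRED string as residue ranges, here is a minimal Python sketch. The `segments` helper and the variable names are made up for illustration and are not part of PSIPRED or Foldit; the Pred string is copied from the output above (H = helix, E = strand, C = coil).

```python
from itertools import groupby

# PSIPRED "Pred" line for the 70-residue puzzle sequence, in blocks of 10.
PRED = (
    "CCCEEHHHHH" "HHHHHCCCCE" "ECCEEECHHH" "HHHHHHCCCC"
    "CCCCHHHHHH" "HHHHHCCCCC" "CCEEEEEECC"
)


def segments(pred):
    """Return (state, first_residue, last_residue) runs, 1-indexed."""
    out = []
    pos = 1
    for state, run in groupby(pred):
        length = len(list(run))
        out.append((state, pos, pos + length - 1))
        pos += length
    return out


# Keep only the predicted helices and strands as (first, last) ranges.
helices = [(a, b) for s, a, b in segments(PRED) if s == "H"]
strands = [(a, b) for s, a, b in segments(PRED) if s == "E"]

print("helices:", helices)
print("strands:", strands)
```

The ranges use puzzle numbering (1-70), so they line up with the ruler under the AA line.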
Batz
Joined: 02/16/2012
Groups: Go Science
Membrane protein?

It looks like the first 11 residues might be embedded in a lipid membrane, together with the part of the protein that is missing from this puzzle.

SS_Prediction

Batz
Joined: 02/16/2012
Groups: Go Science
Seq. SAVKRTIKGT

Seq. SAVKRTIKGT HHWLLLTILT SLLVLVQSTQ WSLFFFLYEN AFLPFAMGII
TOPCONS iiiiiiiiii iMMMMMMMMM MMMMMMMMMM MMooooooMM MMMMMMMMMM
OCTOPUS iiiiiiiiii iMMMMMMMMM MMMMMMMMMM MMooooooMM MMMMMMMMMM
Philius iiiiiiiiii iMMMMMMMMM MMMMMMMMMM Mooooooooo MMMMMMMMMM
PolyPhobius iiiiiiiiii iMMMMMMMMM MMMMMMMMMo oooooooooM MMMMMMMMMM
SCAMPI iiiiiiiiii MMMMMMMMMM MMMMMMMMMM MooooooooM MMMMMMMMMM
SPOCTOPUS iiiiiiiiii iMMMMMMMMM MMMMMMMMMM MMooooooMM MMMMMMMMMM
PDB-homology

51 91
Seq. AMSAFAMMFV KHKHAFLCLF LLPSLATVAY FNMVYMPASW VMRIMTWLDM
TOPCONS MMMMMMMMMi iiiiiMMMMM MMMMMMMMMM MMMMMMoooo oooooooooo
OCTOPUS MMMMMMMMMi iiiiiMMMMM MMMMMMMMMM MMMMMMoooo oooooooooo
Philius MMMMMMMMMM iiiiiiMMMM MMMMMMMMMM MMMMMMMMoo oooooooooo
PolyPhobius MMMMMMMMMi iiiiiMMMMM MMMMMMMMMM MMMMMooooo oooooooooo
SCAMPI MMMMMMMMMM iiiiMMMMMM MMMMMMMMMM MMMMMooooo oooooooooo
SPOCTOPUS MMMMMMMMMi iiiiiMMMMM MMMMMMMMMM MMMMMMoooo oooooooooo
PDB-homology

101 141
Seq. VDTSLSGFKL KDCVMYASAV VLLILMTART VYDDGARRVW TLMNVLTLVY
TOPCONS oooooooMMM MMMMMMMMMM MMMMMMMMii iiiiiMMMMM MMMMMMMMMM
OCTOPUS oooooooMMM MMMMMMMMMM MMMMMMMMii iiiiiiMMMM MMMMMMMMMM
Philius oooooooooo ooMMMMMMMM MMMMMMMMMM MMiiiiiiii iiiiiiiiii
PolyPhobius oooooooMMM MMMMMMMMMM MMMMMMMMii iiiiiiiMMM MMMMMMMMMM
SCAMPI oooooooMMM MMMMMMMMMM MMMMMMMMii iiiiMMMMMM MMMMMMMMMM
SPOCTOPUS oooooooMMM MMMMMMMMMM MMMMMMMMii iiiiiiMMMM MMMMMMMMMM
PDB-homology

151 191
Seq. KVYYGNALDQ AISMWALIIS VTSNYSGVVT TVMFLARGIV FMCVEYCPIF
TOPCONS MMMMMMoMMM MMMMMMMMMM MMMMMMMMiM MMMMMMMMMM MMMMMMMMMM
OCTOPUS MMMMMMMooo oooMMMMMMM MMMMMMMMiM MMMMMMMMMM MMMMoooooo
Philius iiiiiiiiii iiiiMMMMMM MMMMMMMMMM MMMMMMMMMM MMMMMMMMMo
PolyPhobius MMMMMooooo oooooMMMMM MMMMMMMMMM MMMMMMMMMM MMMMMMMiii
SCAMPI MMMMMooooo oooooooooo oooooooooo oooooooooo oooooooooo
SPOCTOPUS MMMMMMMooo oooMMMMMMM MMMMMMMMiM MMMMMMMMMM MMMMoooooo
PDB-homology

201 241
Seq. FITGNTLQCI MLVYCFLGYF CTCYFGLFCL LNRYFRLTLG VYDYLVSTQE
TOPCONS ooooooooMM MMMMMMMMMM MMMMMMMMMi iiiiiiiiii iiiiiiiiii
OCTOPUS oooooooMMM MMMMMMMMMM MMMMMMMMii iiiiiiiiii iiiiiiiiii
Philius oooooooooM MMMMMMMMMM MMMMMMMMMM Miiiiiiiii iiiiiiiiii
PolyPhobius iiiiiiiiMM MMMMMMMMMM MMMMMMMMMM Mooooooooo oooooooooo
SCAMPI oooooooooo MMMMMMMMMM MMMMMMMMMM Miiiiiiiii iiiiiiiiii
SPOCTOPUS oooooooMMM MMMMMMMMMM MMMMMMMMii iiiiiiiiii iiiiiiiiii
PDB-homology

251 281
Seq. FRYMNSQGLL PPKNSIDAFK LNIKLLGVGG KPCIKVATVQ
TOPCONS iiiiiiiiii iiiiiiiiii iiiiiiiiii iiiiiiiiii
OCTOPUS iiiiiiiiii iiiiiiiiii iiiiiiiiii iiiiiiiiii
Philius iiiiiiiiii iiiiiiiiii iiiiiiiiii iiiiiiiiii
PolyPhobius oooooooooo oooooooooo oooooooooo oooooooooo
SCAMPI iiiiiiiiii iiiiiiiiii iiiiiiiiii iiiiiiiiii
SPOCTOPUS iiiiiiiiii iiiiiiiiii iiiiiiiiii iiiiiiiiii

jeff101
Joined: 04/20/2012
Groups: Go Science
Above comments reformatted:
I think residues 221-290 in the post above match 
the 70-residue sequence given for this puzzle:

            0000000001 1111111112 2222222223
            1234567890 1234567890 1234567890
--------------------------------------------
Conf        9501189999 9998199515 5124106999 
PSIPRED     CCCEEHHHHH HHHHHCCCCE ECCEEECHHH 
Seq.        CTCYFGLFCL LNRYFRLTLG VYDYLVSTQE
TOPCONS     MMMMMMMMMi iiiiiiiiii iiiiiiiiii
OCTOPUS     MMMMMMMMii iiiiiiiiii iiiiiiiiii
Philius     MMMMMMMMMM Miiiiiiiii iiiiiiiiii
PolyPhobius MMMMMMMMMM Mooooooooo oooooooooo
SCAMPI      MMMMMMMMMM Miiiiiiiii iiiiiiiiii
SPOCTOPUS   MMMMMMMMii iiiiiiiiii iiiiiiiiii
--------------------------------------------
            2222222222 2222222222 2222222222
            2222222223 3333333334 4444444445
            1234567890 1234567890 1234567890

            3333333334 4444444445 5555555556 6666666667
            1234567890 1234567890 1234567890 1234567890
-------------------------------------------------------        
Conf        9999857999 9995799999 8887204268 6246425329
PSIPRED     HHHHHHCCCC CCCCHHHHHH HHHHHCCCCC CCEEEEEECC
Seq.        FRYMNSQGLL PPKNSIDAFK LNIKLLGVGG KPCIKVATVQ
TOPCONS     iiiiiiiiii iiiiiiiiii iiiiiiiiii iiiiiiiiii
OCTOPUS     iiiiiiiiii iiiiiiiiii iiiiiiiiii iiiiiiiiii
Philius     iiiiiiiiii iiiiiiiiii iiiiiiiiii iiiiiiiiii
PolyPhobius oooooooooo oooooooooo oooooooooo oooooooooo
SCAMPI      iiiiiiiiii iiiiiiiiii iiiiiiiiii iiiiiiiiii
SPOCTOPUS   iiiiiiiiii iiiiiiiiii iiiiiiiiii iiiiiiiiii
-------------------------------------------------------
            2222222222 2222222222 2222222222 2222222222
            5555555556 6666666667 7777777778 8888888889
            1234567890 1234567890 1234567890 1234567890

One question I have is what the letters M i and o mean above.
puxatudo
Joined: 04/07/2014
Groups: Go Science
Meaning of 'M', 'i' and 'o'

i = inside
o = outside
M = membrane

That is, assuming the protein is a membrane protein.
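
To compare the six predictors column by column, a small majority-vote sketch in Python follows. The `consensus` function is hypothetical, the '?' tie marker is an arbitrary convention, and the strings are the predictions for the first 30 puzzle residues (221-250 of the full protein) as copied from the reformatted table.

```python
from collections import Counter

# Topology calls for puzzle residues 1-30 (M = membrane, i = inside, o = outside).
PREDICTIONS = {
    "TOPCONS":     "MMMMMMMMMi iiiiiiiiii iiiiiiiiii",
    "OCTOPUS":     "MMMMMMMMii iiiiiiiiii iiiiiiiiii",
    "Philius":     "MMMMMMMMMM Miiiiiiiii iiiiiiiiii",
    "PolyPhobius": "MMMMMMMMMM Mooooooooo oooooooooo",
    "SCAMPI":      "MMMMMMMMMM Miiiiiiiii iiiiiiiiii",
    "SPOCTOPUS":   "MMMMMMMMii iiiiiiiiii iiiiiiiiii",
}


def consensus(predictions):
    """Majority label per residue; '?' where the top two labels tie."""
    rows = [p.replace(" ", "") for p in predictions.values()]
    if len({len(r) for r in rows}) != 1:
        raise ValueError("all prediction strings must have the same length")
    out = []
    for column in zip(*rows):
        ranked = Counter(column).most_common()
        top_label, top_count = ranked[0]
        tied = len(ranked) > 1 and ranked[1][1] == top_count
        out.append("?" if tied else top_label)
    return "".join(out)


print(consensus(PREDICTIONS))
```

With these inputs the vote is 'M' for residues 1-9, tied for residues 10-11, and 'i' from residue 12 on, which is roughly the "first 11 residues" region discussed above.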

Serca
Joined: 02/03/2020
Groups: Go Science
It is the tail domain of the

It is the tail domain of the larger 290-residue protein. So the first 11 residues are probably hydrophobic because they are buried somewhere inside the whole protein.
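
One rough way to check the hydrophobicity claim is to average Kyte-Doolittle hydropathy values over the first 11 residues and over the rest of the puzzle sequence. This is a sketch only: the Kyte-Doolittle scale is a stand-in chosen for illustration, not Foldit's score function, and `mean_hydropathy` is a hypothetical helper.

```python
# Kyte-Doolittle hydropathy scale (positive = more hydrophobic).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# The 70-residue puzzle sequence.
PUZZLE = (
    "CTCYFGLFCLLNRYFRLTLGVYDYLVSTQEFRYMNSQGLL"
    "PPKNSIDAFKLNIKLLGVGGKPCIKVATVQ"
)


def mean_hydropathy(seq):
    """Average Kyte-Doolittle value over a sequence of one-letter codes."""
    return sum(KD[aa] for aa in seq) / len(seq)


head_mean = mean_hydropathy(PUZZLE[:11])   # residues 1-11
tail_mean = mean_hydropathy(PUZZLE[11:])   # residues 12-70

print(f"residues 1-11:  {head_mean:.2f}")
print(f"residues 12-70: {tail_mean:.2f}")
```

A clearly higher average for residues 1-11 is consistent with that stretch being buried or membrane-embedded, but it does not distinguish between the two.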

Joined: 03/22/2020
Groups: Go Science
No membrane protein

I do not think that it is a membrane protein, as the given sequence is from non-structural protein 6 (NSP6).
A great resource for me is the following: https://www.nytimes.com/interactive/2020/04/03/science/coronavirus-genome-bad-news-wrapped-in-protein.html
As stated there, NSP6 works with NSP3 and NSP4 to produce virus bubbles.
Does anyone know what exactly that means?

There are also the RNA sequences given.
Translating it with: https://web.expasy.org/translate/
shows that the given sequence
(CTCYFGLFCLLNRYFRLTLGVYDYLVSTQEFRYMNSQGLL
PPKNSIDAFKLNIKLLGVGGKPCIKVATVQ) is at the end of the 5'3' Frame 1.

One question I have: why were we not given the preceding 10 amino acids, from the open reading frame starting at position 210?
The sequence should then be:
5'3' Frame 1, start_pos=210
MLVYCFLGYFCTCYFGLFCLLNRYFRLTLGVYDYLVSTQEFRYMNSQGLL
PPKNSIDAFKLNIKLLGVGGKPCIKVATVQ

jeff101
Joined: 04/20/2012
Groups: Go Science
MLVYCF... starts at position 211

Since CTCYFG... starts at position 221,
the sequence given above has MLVYCF...
starting at position 211 not 210.
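
The numbering can be double-checked with a short Python sketch. It assumes the last 90 residues (201-290) of the full sequence exactly as listed in the topology table earlier in the thread; `position_of` is a hypothetical helper.

```python
# Residues 201-290 of the full protein, in blocks of 10 as posted above.
TAIL_START = 201
TAIL = (
    "FITGNTLQCI" "MLVYCFLGYF" "CTCYFGLFCL" "LNRYFRLTLG" "VYDYLVSTQE"
    "FRYMNSQGLL" "PPKNSIDAFK" "LNIKLLGVGG" "KPCIKVATVQ"
)

# The 70-residue puzzle sequence.
PUZZLE = (
    "CTCYFGLFCL" "LNRYFRLTLG" "VYDYLVSTQE" "FRYMNSQGLL"
    "PPKNSIDAFK" "LNIKLLGVGG" "KPCIKVATVQ"
)


def position_of(fragment, tail=TAIL, start=TAIL_START):
    """1-indexed position of fragment in the full protein numbering."""
    index = tail.find(fragment)
    if index < 0:
        raise ValueError(f"{fragment!r} not found")
    return start + index


puzzle_start = position_of(PUZZLE)
orf_start = position_of("MLVYCF")

print("CTCYFG... starts at", puzzle_start)
print("MLVYCF... starts at", orf_start)
```

Both numbers come from a plain substring search, so they only hold if the posted sequence blocks are numbered correctly.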

puxatudo
Joined: 04/07/2014
Groups: Go Science
Just an image but...

I know this is just a schematic image, but why would they put NSP6 between the cytoplasm and the reticulum lumen?
This would make NSP6 a transmembrane protein (endoplasmic reticulum membrane), I guess.

https://viralzone.expasy.org/resources/R1a_topology.png

Susume
Joined: 10/02/2011
Virus invades and changes ER membrane

This is an article about the original SARS virus (2008), but it seems reasonable to speculate that the current virus behaves similarly: it invades the ER membrane and restructures it, using the resulting network of altered ER membrane as a place to replicate:

https://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.0060226

Here is a more recent summary (2015) of ways that positive RNA viruses use the ER membrane to form protective containers in which to synthesize, either by intruding or by extruding. When they extrude the membrane, they form DMV (double membrane vesicles), which are like protective bubbles isolated from the cytoplasm where their RNA can be synthesized without being attacked.

https://pubmed.ncbi.nlm.nih.gov/25287059/

So some of the coronavirus proteins must be involved in attacking and restructuring the ER membrane, and possibly also as structural elements of the resulting altered membrane and the DMVs.

Joined: 09/24/2012
Groups: Go Science
Q to Foldit team

When you provide us with a partial protein, would you change the scoring function to give a smaller weight to the exposed part of the score? Or is there another way to identify the part of the semi-protein that might be "inside" the whole protein?

bkoep
Joined: 11/15/2012
Groups: None
No

We use our standard score function for these puzzles.

This is potentially a problem, if this domain makes significant contacts with the rest of the protein, or with the membrane. Those contacts would not be scored appropriately with our standard score function, and could be problematic for predicting how the domain will fold.

However, many natural proteins consist of smaller domains that are perfectly capable of folding independently. If that's the case, then we should be able to use our standard score function.

jeff101
Joined: 04/20/2012
Groups: Go Science
Should you post a follow-up puzzle for this protein sequence?

Would it help to post a follow-up puzzle with
the same protein sequence as in this puzzle
that instead scores the protein as if it were
a membrane protein? I think such a follow-up
puzzle would give higher scores to solutions
with buried hydrophilic (blue) residues and
exposed hydrophobic (orange) ones.

puxatudo
Joined: 04/07/2014
Groups: Go Science
More complex than that

I agree with you, Jeff101. My only thought is that, if it's a transmembrane protein, the scoring system should be more complex than that.

Considering the different types of membrane proteins (peripheral and integral, and their different relations to the bilayer), the scoring system couldn't be onefold (pun not intended! haha).
There should be two scoring systems, one for the part embedded in the membrane (for the intrinsic part of an integral membrane protein, let's say), and another for the extrinsic part of the protein.

You should get a high score if the hydrophobics are sticking out in the INtrinsic part of the protein.
You should get a high score if the hydrophilics are sticking out in the EXtrinsic part of the protein.

So, two scoring systems.

Or they could try to lock the extrinsic part and score the intrinsic, and vice-versa. Using two different scoring objectives.

bkoep
Joined: 11/15/2012
Groups: None
Probably cytosolic

The NSP6 protein definitely associates with the membrane, but we think that this portion might form a well-folded domain in the cytosolic region of the cell.

This is far from proven, but there is some evidence to support it. This study from 2009 (working with a related coronavirus) used some cool techniques to study the orientation of the full protein in the membrane:
https://pubmed.ncbi.nlm.nih.gov/19386712/

It is still possible that this portion of the protein interacts with the membrane. But in this puzzle, we would like to predict how this domain might fold up if it were cytosolic.

We could consider a follow-up puzzle with a modified "membrane" score function, but Foldit is not very well suited to model the membrane environment. (As puxatudo points out, things get complicated.)

Developed by: UW Center for Game Science, UW Institute for Protein Design, Northeastern University, Vanderbilt University Meiler Lab, UC Davis
Supported by: DARPA, NSF, NIH, HHMI, Amazon, Microsoft, Adobe, RosettaCommons