1787: CRISPR-Cas Transposase Part II
Status: Closed

Summary

Name: 1787: CRISPR-Cas Transposase Part II
Status: Closed
Created: 01/15/2020
Points: 100
Expired: 01/23/2020 - 23:00
Difficulty: Intermediate
Description: Fold this transposase protein from a CRISPR-Cas complex! This puzzle presents the second half of the transposase protein, as a follow-up to Puzzle 1784: CRISPR-Cas Transposase Part I.

CRISPR-Cas is a mixed complex of RNA and proteins, which work together to make a very precise cut in a cell's DNA. Scientists recently discovered a variant of CRISPR-Cas that coopts a new protein called a transposase. In addition to cutting DNA, the transposase also allows precise insertion of new material into a target DNA strand. This new variant could lead to more efficient gene editing with CRISPR-Cas! New cryoEM experiments have shed some light on the transposase structure, which was previously unknown.

We're asking Foldit players to help solve how this transposase protein folds! The transposase is a large 400-residue protein, so this puzzle only includes half of the full protein sequence. Later, we'll post additional puzzles with cryoEM density for the transposase protein!

Sequence:
GHEAACTVSNWLAGHESKPLPNLPKSYRWGLVHWWMGIKDSEFDHFSFVQFFSNWPRSFHSIIEDEVEFNLEHAVVSTSELRLKDLLGRLFFGSIRLPERNLQHNIILGELLCYLENRLWQDKGLIANLKMNALEATVMLNCSLDQIASMVEQRILKPNRKSKPNSPLDVTDYLFHFGDIFCLWLAEFQSDEFNRSFYVSRW
Categories: Overall, Prediction

Top Groups

Rank  Group                   Score   Points
1     Go Science              12,198  100
2     Gargleblasters          11,875  68
3     L'Alliance Francophone  11,648  44
4     Anthropic Dreams        11,638  27
5     Marvin's bunch          11,590  16


Comments

Joined: 01/19/2019
Groups: Go Science
smartblast match (TniQ, C-terminal half)

For a quick hint, SmartBLAST takes about a second on this protein and returns a perfect match for WP_000479715[193-394], a protein from Vibrio cholerae HE-45.

The DiANNA disulfide bridge predictor does not return very high confidence matches for residues in this fragment. A bridge is weakly predicted at 305-374 in full-length numbering, which works out to segments 113-182 in this puzzle.
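
The two number pairs look like the same bridge in two numbering schemes. Here is a minimal Python sketch of the conversion, assuming puzzle segment 1 corresponds to full-length residue 193 (as the SmartBLAST range 193-394 suggests). The function name is just for illustration. It also checks that both positions are cysteines in the puzzle sequence.

```python
# Puzzle 1787 sequence, copied from the puzzle description.
PUZZLE_SEQ = (
    "GHEAACTVSNWLAGHESKPLPNLPKSYRWGLVHWWMGIKDSEFDHFSFVQFFSNWPRSFH"
    "SIIEDEVEFNLEHAVVSTSELRLKDLLGRLFFGSIRLPERNLQHNIILGELLCYLENRLW"
    "QDKGLIANLKMNALEATVMLNCSLDQIASMVEQRILKPNRKSKPNSPLDVTDYLFHFGDI"
    "FCLWLAEFQSDEFNRSFYVSRW"
)

# Assumption: puzzle segment 1 is residue 193 of the full-length protein.
FRAGMENT_START = 193


def to_puzzle_segment(full_length_residue):
    """Convert full-length residue numbering to 1-based puzzle segment numbering."""
    return full_length_residue - FRAGMENT_START + 1


for residue in (305, 374):
    segment = to_puzzle_segment(residue)
    # Python strings are 0-based, Foldit segments are 1-based.
    print(residue, segment, PUZZLE_SEQ[segment - 1])
# prints:
# 305 113 C
# 374 182 C
```

Both positions land on a cysteine, which is at least consistent with a predicted disulfide pair.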

This half does not look well-conserved enough for people to put it into protein family databases. The N-terminal half does have a Pfam hit (PF06527), and you can go there to look for suspiciously important residues on the HMM in a next puzzle. PMID 11160934 describes the TniABQ complex, and it seems that there are no catalytic DDE motifs for you to group together in TniQ.

RockOn
Joined: 06/01/2011
Groups: Go Science
Sha-Zam Batman

What do I need to do to understand the quick hint comment you gave us? Reading, hanging out in chat more, having more fun?

Thanks

Joined: 04/20/2012
Groups: Go Science
Google/Bing Links

I ran some keyword searches and found the links below:

https://blast.ncbi.nlm.nih.gov/smartblast/smartBlast.cgi?CMD=Web&PAGE_TYPE=BlastDocs

http://clavius.bc.edu/~clotelab/DiANNA/

http://pfam.xfam.org/

Googling "dde catalytic" gave many hits about transposases.
I think dde stands for D-D-E, Asp-Asp-Glu, or
Aspartic Acid-Aspartic Acid-Glutamic Acid,
a 3-amino acid-sequence or triad involved in
Mg2+ (magnesium) and Mn2+ (manganese) capture.

LociOiling
Joined: 12/27/2012
Groups: Beta Folders
PDB matches

As with puzzle 1784, the protein in this puzzle matches to PDB entries 6PIF, 6PIG, and 6PIJ. All three proteins are discussed in an article just published in Nature.

Similar to the results for 1784, the puzzle 1787 protein partially matches chains I and J of 6PIF/6PIG/6PIJ. (All three of the PDB entries consist of multiple protein chains "in complex" with RNA and DNA.)

Jpred is probably the easiest tool for finding this kind of match. The output from AA Edit or a similar recipe can be used as input to Jpred.

Here's an example of the detailed output from Jpred:

>6pij_I mol:protein length:358  TniQ monomer I
          Length = 358

 Score =  400 bits (1027), Expect = e-111
 Identities = 195/202 (96%), Positives = 195/202 (96%), Gaps = 7/202 (3%)

Query: 1   GHEAACTVSNWLAGHESKPLPNLPKSYRWGLVHWWMGIKDSEFDHFSFVQFFSNWPRSFH 60
           GHEAACTVSNWLAGHESKPLPNLPKSYRWGLVHWWMGIKDS  DHFSFVQFFSNWPRSFH
Sbjct: 164 GHEAACTVSNWLAGHESKPLPNLPKSYRWGLVHWWMGIKDS--DHFSFVQFFSNWPRSFH 221

Query: 61  SIIEDEVEFNLEHAVVSTSELRLKDLLGRLFFGSIRLPERNLQHNIILGELLCYLENRLW 120
           SIIEDEVEFNLEHAVVSTSELRLKDLLGRLFFGSIRLPERNLQHNIILGELLCYLENRLW
Sbjct: 222 SIIEDEVEFNLEHAVVSTSELRLKDLLGRLFFGSIRLPERNLQHNIILGELLCYLENRLW 281

Query: 121 QDKGLIANLKMNALEATVMLNCSLDQIASMVEQRILKPNRKSKPNSPLDVTDYLFHFGDI 180
           QDKGLIANLKMNALEATVMLNCSLDQIASMVEQRILKPNRKSK     DVTDYLFHFGDI
Sbjct: 282 QDKGLIANLKMNALEATVMLNCSLDQIASMVEQRILKPNRKSK-----DVTDYLFHFGDI 336

Query: 181 FCLWLAEFQSDEFNRSFYVSRW 202
           FCLWLAEFQSDEFNRSFYVSRW
Sbjct: 337 FCLWLAEFQSDEFNRSFYVSRW 358

The Foldit protein is shown on the "Query" lines, while the match to PDB 6PIJ, chain I, is shown on the "Sbjct" lines. The Foldit protein is 202 segments, but only 195 segments match 6PIJ, chain I (6PIJ_I). The match has two gaps, with segments that are found in the Foldit protein, but not in 6PIJ_I. The gaps account for the missing seven segments.
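
The identity and gap counts can be re-derived from the alignment rows themselves. This is a rough Python sketch, with the Query and Sbjct rows copied from the output above and joined into single strings. The function name is made up for illustration. Since the Query row has no gaps, a position in the alignment is the same as the Foldit segment number.

```python
# Query and Sbjct rows copied from the alignment above ('-' marks a gap).
QUERY = (
    "GHEAACTVSNWLAGHESKPLPNLPKSYRWGLVHWWMGIKDSEFDHFSFVQFFSNWPRSFH"
    "SIIEDEVEFNLEHAVVSTSELRLKDLLGRLFFGSIRLPERNLQHNIILGELLCYLENRLW"
    "QDKGLIANLKMNALEATVMLNCSLDQIASMVEQRILKPNRKSKPNSPLDVTDYLFHFGDI"
    "FCLWLAEFQSDEFNRSFYVSRW"
)
SBJCT = (
    "GHEAACTVSNWLAGHESKPLPNLPKSYRWGLVHWWMGIKDS--DHFSFVQFFSNWPRSFH"
    "SIIEDEVEFNLEHAVVSTSELRLKDLLGRLFFGSIRLPERNLQHNIILGELLCYLENRLW"
    "QDKGLIANLKMNALEATVMLNCSLDQIASMVEQRILKPNRKSK-----DVTDYLFHFGDI"
    "FCLWLAEFQSDEFNRSFYVSRW"
)


def summarize_alignment(query, sbjct):
    """Return (identity count, list of 1-based (start, end) gap runs in sbjct)."""
    identities = sum(q == s for q, s in zip(query, sbjct))
    gap_runs = []
    start = None
    for pos, residue in enumerate(sbjct, start=1):
        if residue == "-" and start is None:
            start = pos                      # a gap run begins here
        elif residue != "-" and start is not None:
            gap_runs.append((start, pos - 1))  # the run ended one position back
            start = None
    if start is not None:                    # gap run reaching the end of the row
        gap_runs.append((start, len(sbjct)))
    return identities, gap_runs


identities, gap_runs = summarize_alignment(QUERY, SBJCT)
print(identities)  # prints 195
print(gap_runs)    # prints [(42, 43), (164, 168)]
```

The two gap runs cover Foldit segments 42-43 and 164-168, seven segments in all.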

If you dig deep into the PDB entries, you'll find more detail about the gaps. The ATOM records in the PDB give the 3D position of each atom in the protein. ATOM records also report which amino acid the atom belongs to, and give the amino acid a sequence number in the protein chain. (There are also SEQRES records which list the amino acids, but don't explicitly number them.)

Looking at the ATOM records for 6PIJ_I, there are several gaps in the sequence numbering. The ATOM records start with residue 3, and run continuously to residue 165. This first section doesn't provide a good match to the puzzle 1787 protein, matching only four segments. This section does match puzzle 1784, covering segments 1 to 172 out of 188 segments.

Then ATOM numbering picks up with residue 196, and runs to 236. The amino acids in this second section of ATOM records correspond to the first 41 segments of puzzle 1787 protein.

The second section covers Foldit up to the first gap in the Jpred results, segments 1 to 41.

The ATOM records then skip from residue 236 to residue 239, and continue to residue 358. These residues start just after the first gap, and match Foldit segments 44 to 163, where the second gap begins.

The ATOM records jump from 358 to residue 364, and run to residue 397. These residues match the Foldit protein after the second gap, Foldit segments 169 to 202.

The gaps in ATOM records mean the corresponding amino acids simply aren't included in the PDB model. For example, if you look at 6PIJ_I in Jmol or a similar viewer, you'll see residue 236 followed by empty space, then the model resumes with residue 239. A similar thing happens between residues 358 and 364, the second gap.
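
Finding these jumps by hand is tedious, so here is a minimal Python sketch of the idea. It relies on the fixed-column PDB format, where the chain ID is in column 22 and the residue number is in columns 23-26 of each ATOM record. The helper names are hypothetical, and the sample lines are toy records with made-up coordinates and residue names, not data from 6PIJ. Only the residue numbers follow the ranges described above. Insertion codes and multiple models are ignored.

```python
def residue_numbers(pdb_lines, chain_id):
    """Collect residue numbers for one chain, in file order, without repeats."""
    numbers = []
    for line in pdb_lines:
        if line.startswith("ATOM") and line[21] == chain_id:
            number = int(line[22:26])  # columns 23-26, zero-based slice 22:26
            if not numbers or numbers[-1] != number:
                numbers.append(number)
    return numbers


def numbering_gaps(numbers):
    """Return (before, after) pairs where consecutive residue numbers jump."""
    return [(a, b) for a, b in zip(numbers, numbers[1:]) if b - a > 1]


def toy_atom_line(serial, chain_id, res_seq):
    """Build a toy CA-only ATOM record (made-up residue name and coordinates)."""
    return "ATOM  {:5d}  CA  ALA {}{:4d}       0.000   0.000   0.000".format(
        serial, chain_id, res_seq
    )


# Toy chain I using the residue ranges described in this comment.
toy_numbers = list(range(196, 237)) + list(range(239, 359)) + list(range(364, 398))
toy_lines = [toy_atom_line(i + 1, "I", n) for i, n in enumerate(toy_numbers)]

print(numbering_gaps(residue_numbers(toy_lines, "I")))
# prints [(236, 239), (358, 364)]
```

To try it on a real entry, read the lines of a downloaded PDB file and pass them in place of the toy lines.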

The header of the PDB entry for 6PIJ contains CAVEAT records that note the gaps:

CAVEAT   8 6PIJ                        [...] RESIDUES SER I236 AND ASP I239     
CAVEAT   9 6PIJ    THAT ARE NEXT TO EACH OTHER IN THE SAMPLE SEQUENCE ARE NOT   
CAVEAT  10 6PIJ    PROPERLY LINKED: DISTANCE BETWEEN C AND N IS 8.46. RESIDUES  
CAVEAT  11 6PIJ    LYS I358 AND ASP I364 THAT ARE NEXT TO EACH OTHER IN THE     
CAVEAT  12 6PIJ    SAMPLE SEQUENCE ARE NOT PROPERLY LINKED: DISTANCE BETWEEN C  
CAVEAT  13 6PIJ    AND N IS 11.88. [...]

To sum it up, the residue numbers that you see in a viewer like Jmol or PyMOL are determined by the ATOM records of the PDB entry. Even when there's a good match to a Foldit protein, matching up PDB residue numbers to Foldit segment numbers is not always easy. The PDB doesn't necessarily number the residues sequentially, and numbering often doesn't start with one or even zero. With some effort, it's possible to match up the PDB entry on display to what you see in Foldit.

LociOiling
Joined: 12/27/2012
Groups: Beta Folders
tabular version

Here's how the residue numbers in 6PIJ_I correspond to the segment numbers in Foldit puzzle 1787.

6PIJ_I residues  1787 segments  length
3-165            n.a.           n.a.
196-236          1-41           41
239-358          44-163         120
364-397          169-202        34
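
The table can be turned into a small lookup in either direction. This is only a sketch in Python. The block list is copied from the table, and the function names are made up for illustration. Residues or segments that fall in a gap, or in the 3-165 section that has no counterpart in puzzle 1787, return None.

```python
# (first PDB residue, last PDB residue, first Foldit segment), from the table above.
BLOCKS = [
    (196, 236, 1),
    (239, 358, 44),
    (364, 397, 169),
]


def pdb_to_foldit(residue):
    """Map a 6PIJ_I residue number to a puzzle 1787 segment, or None."""
    for first, last, first_segment in BLOCKS:
        if first <= residue <= last:
            return first_segment + (residue - first)
    return None


def foldit_to_pdb(segment):
    """Map a puzzle 1787 segment to a 6PIJ_I residue number, or None."""
    for first, last, first_segment in BLOCKS:
        last_segment = first_segment + (last - first)
        if first_segment <= segment <= last_segment:
            return first + (segment - first_segment)
    return None


print(pdb_to_foldit(239))  # prints 44
print(foldit_to_pdb(42))   # prints None (segment 42 is in the first gap)
```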
Joined: 05/19/2009
Groups: Contenders
Ethical question

What is the point of putting out puzzles to solve *by ourselves* if some folks seemingly have looked up the resulting structures on the internet and simply recreate what they can see (or derive) online?

Why would I waste my time building models truly from scratch based on my own insights if the 'best' teams seemingly do very well by copying?

What exactly is the Foldit team trying to learn from our folding currently? That helps in determining whether to spend time on something or not.

Joined: 09/24/2012
Groups: Go Science
Not sure "copying" from external sources is helping a lot

I don't think players gained much from "copying" from the internet (look at LociOiling's result on this puzzle, one of his worst results in recent months).

Visual copies are far from enough to gain upper ranking on a puzzle.

After all, remix and rebuild tools propose "copies" of nature sequences as well.

These comments try to guide us to design plausible solutions that could be close enough to native in order to help science.

(I must admit that the current ones are too complicated for me - I didn't use them - but these are public comments that any player might use).

LociOiling
Joined: 12/27/2012
Groups: Beta Folders
wait for the electron density...

As Bruno says, the PDB probably won't be very helpful on these puzzles, but it's probably worth a look, at least to understand the context a little better.

On puzzle 1784, I simply treated it as a de-novo, and cynically made up a lot of helixes. That one took 15th place. On 1787, I tried to emulate one of the I chains (from 6PIJ, I think), and finished 29th.

As noted above and in some chat comments, there are lots of issues with 6PIF, 6PIG, and 6PIJ that make it difficult to sort out what's going on.

These three PDB matches are brand-new results, with the Nature article getting published last December, only weeks before puzzle 1784 started. For the three authors listed in the PDB entries, 6PIF, 6PIG, and 6PIJ are their only PDB publications. As the puzzle comments note, the study used cryoEM, so things may be a little fuzzy.

My scan of the PDB shows chains A, B, C, D, E, F, G, H, I, J, 1, 2, and 3 for 6PIJ. The lettered chains are proteins, the numbered chains are RNA and DNA. (The rcsb.org site doesn't seem to want to discuss chains 2 and 3, but it's not clear why.)

Chains I and J contain at least partial matches for the sections seen in puzzles 1784 and 1787. Chain I is "TniQ monomer 1", and chain J is "TniQ monomer 2". The two chains start out the same, with the first 163 matching, but then they diverge.

The numbering in the PDB is also confusing. The residues aren't necessarily numbered in order at the ATOM record level, as noted above. This makes looking at the model in Jmol or Pymol a little challenging. As also noted above, some residues found in Foldit are simply missing from the PDB models (no ATOM records).

I spent a fair amount of time these past two weeks tinkering with my PDB Reader recipe. I was hoping to get an update out, but I still haven't quite achieved my primary objective of reliably extracting secondary structure info. (PDB residue numbering leaves a lot to be desired.)

The real test will come when we see the electron density version of these puzzles. These puzzles have a lot of rings, which may help. Will we see only half the density to fit half the protein, or do we get the whole cloud? Time will tell....

[edit: the study involving PDB 6PIG and the others was published in December, 2019, not January, 2020]

Developed by: UW Center for Game Science, UW Institute for Protein Design, Northeastern University, Vanderbilt University Meiler Lab, UC Davis
Supported by: DARPA, NSF, NIH, HHMI, Amazon, Microsoft, Adobe, RosettaCommons