For a quick hint, the protein hits smart BLAST in a second and returns a perfect match for WP_000479715[193-394], a protein from Vibrio cholerae HE-45.
The DiANNA disulfide bridge predictor does not return very high confidence matches for residues in this fragment. A bridge is weakly predicted at 305-374, 113-182 or something for this puzzle.
This half does not look well-conserved enough for people to put it into protein family databases. The N-terminal half does have a Pfam hit (pf06527), and you can go there to look for suspiciously important residues on the HMM in a future puzzle. PMID 11160934 describes the TrinABQ complex, and it seems that there are no catalytic DDE motifs for you to group together in TriQ.
What do I need to do to understand the quick hint comment you gave us? Reading, hanging out in chat more, having more fun?
I ran some keyword searches and found the links below:
Googling "dde catalytic" gave many hits about transposases.
I think DDE stands for D-D-E, Asp-Asp-Glu, or aspartic acid-aspartic acid-glutamic acid, a three-amino-acid sequence or triad involved in Mg2+ (magnesium) and Mn2+ (manganese) capture.
As with puzzle 1784, the protein in this puzzle matches to PDB entries 6PIF, 6PIG, and 6PIJ. All three proteins are discussed in an article just published in Nature.
Similar to the results for 1784, the puzzle 1787 protein partially matches chains I and J of 6PIF/6PIG/6PIJ. (All three of the PDB entries consist of multiple protein chains "in complex" with RNA and DNA.)
Jpred is probably the easiest tool for finding this kind of match. The output from AA Edit or a similar recipe can be used as input to Jpred.
Here's an example of the detailed output from Jpred:
>6pij_I mol:protein length:358 TniQ monomer I
Length = 358
Score = 400 bits (1027), Expect = e-111
Identities = 195/202 (96%), Positives = 195/202 (96%), Gaps = 7/202 (3%)
Query: 1 GHEAACTVSNWLAGHESKPLPNLPKSYRWGLVHWWMGIKDSEFDHFSFVQFFSNWPRSFH 60
Sbjct: 164 GHEAACTVSNWLAGHESKPLPNLPKSYRWGLVHWWMGIKDS--DHFSFVQFFSNWPRSFH 221
Query: 61 SIIEDEVEFNLEHAVVSTSELRLKDLLGRLFFGSIRLPERNLQHNIILGELLCYLENRLW 120
Sbjct: 222 SIIEDEVEFNLEHAVVSTSELRLKDLLGRLFFGSIRLPERNLQHNIILGELLCYLENRLW 281
Query: 121 QDKGLIANLKMNALEATVMLNCSLDQIASMVEQRILKPNRKSKPNSPLDVTDYLFHFGDI 180
Sbjct: 282 QDKGLIANLKMNALEATVMLNCSLDQIASMVEQRILKPNRKSK-----DVTDYLFHFGDI 336
Query: 181 FCLWLAEFQSDEFNRSFYVSRW 202
Sbjct: 337 FCLWLAEFQSDEFNRSFYVSRW 358
The Foldit protein is shown on the "Query" line, while the match to PDB 6PIJ, chain I is shown on the "Sbjct" line. The Foldit protein is 202 segments, but only 195 segments match 6PIJ, chain I (6PIJ_I). The match has two gaps, covering segments that are found in the Foldit protein but not in 6PIJ_I. The gaps account for the missing seven segments.
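For anyone who wants to check the counts, here is a minimal Python sketch that tallies identities and gaps from the alignment above. The two strings are the "Query" and "Sbjct" lines pasted together; the function name `summarize_alignment` is made up for this example and isn't part of Jpred or any other tool.

```python
# Aligned sequences copied from the "Query" and "Sbjct" lines above.
QUERY = (
    "GHEAACTVSNWLAGHESKPLPNLPKSYRWGLVHWWMGIKDSEFDHFSFVQFFSNWPRSFH"
    "SIIEDEVEFNLEHAVVSTSELRLKDLLGRLFFGSIRLPERNLQHNIILGELLCYLENRLW"
    "QDKGLIANLKMNALEATVMLNCSLDQIASMVEQRILKPNRKSKPNSPLDVTDYLFHFGDI"
    "FCLWLAEFQSDEFNRSFYVSRW"
)
SBJCT = (
    "GHEAACTVSNWLAGHESKPLPNLPKSYRWGLVHWWMGIKDS--DHFSFVQFFSNWPRSFH"
    "SIIEDEVEFNLEHAVVSTSELRLKDLLGRLFFGSIRLPERNLQHNIILGELLCYLENRLW"
    "QDKGLIANLKMNALEATVMLNCSLDQIASMVEQRILKPNRKSK-----DVTDYLFHFGDI"
    "FCLWLAEFQSDEFNRSFYVSRW"
)


def summarize_alignment(query, sbjct):
    """Count identical positions and list gap runs in the subject.

    Gap runs are reported as (first, last) positions in the query,
    counting from 1, which is how Foldit numbers segments.
    """
    identities = 0
    gap_runs = []
    qpos = 0  # current 1-based position in the ungapped query
    for q, s in zip(query, sbjct):
        if q != "-":
            qpos += 1
        if s == "-":
            if gap_runs and gap_runs[-1][1] == qpos - 1:
                gap_runs[-1][1] = qpos  # extend the current gap run
            else:
                gap_runs.append([qpos, qpos])  # start a new gap run
        elif q == s:
            identities += 1
    return identities, [tuple(run) for run in gap_runs]


identities, gap_runs = summarize_alignment(QUERY, SBJCT)
print(identities, gap_runs)
```

The two gap runs come out as Foldit segments 42 to 43 and 164 to 168, which adds up to the seven gapped positions in the Jpred summary line.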
If you dig deep into the PDB entries, you'll find more detail about the gaps. The ATOM records in the PDB give the 3D position of each atom in the protein. ATOM records also report which amino acid the atom belongs to, and give the amino acid a sequence number in the protein chain. (There are also SEQRES records which list the amino acids, but don't explicitly number them.)
Looking at the ATOM records for 6PIJ_I, there are several gaps in the sequence numbering. The ATOM records start with residue 3, and run continuously to residue 165. This first section doesn't provide a good match to the puzzle 1787 protein, matching only four segments. This section does match puzzle 1784, covering segments 1 to 172 out of 188 segments.
Then ATOM numbering picks up with residue 196, and runs to 236. The amino acids in this second section of ATOM records correspond to the first 41 segments of puzzle 1787 protein.
The second section covers the Foldit protein up to the first gap in the Jpred results, segments 1 to 41.
The ATOM records then skip from residue 236 to residue 239, and continue to residue 358. These residues start just after the first gap, and match Foldit segments 44 to 163, where the second gap begins.
The ATOM records jump from 358 to residue 364, and run to residue 397. These residues match the Foldit protein after the second gap, Foldit segments 169 to 202.
The gaps in ATOM records mean the corresponding amino acids simply aren't included in the PDB model. For example, if you look at 6PIJ_I in Jmol or a similar viewer, you'll see residue 236 followed by empty space, then the model resumes with residue 239. A similar thing happens between residues 358 and 364, the second gap.
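As a rough illustration, the breaks in ATOM numbering can be found with a few lines of Python. This sketch relies on the fixed-column PDB layout (chain ID in column 22, residue number in columns 23 to 26). The function names are invented for this example, and the sample lines are synthetic stand-ins, not copied from 6PIJ.

```python
def numbering_breaks(pdb_lines, chain_id):
    """Return (previous, next) residue numbers wherever the ATOM
    numbering for one chain is not consecutive, in file order."""
    breaks = []
    previous = None
    for line in pdb_lines:
        # Only ATOM records long enough to hold a residue number.
        if not line.startswith("ATOM  ") or len(line) < 26:
            continue
        if line[21] != chain_id:  # column 22: chain identifier
            continue
        res_seq = int(line[22:26])  # columns 23-26: residue number
        if previous is not None and res_seq not in (previous, previous + 1):
            breaks.append((previous, res_seq))
        previous = res_seq
    return breaks


def fake_atom_line(serial, res_name, chain_id, res_seq):
    """Build a truncated, synthetic ATOM line for the demo below."""
    return f"ATOM  {serial:5d}  CA  {res_name:>3s} {chain_id}{res_seq:4d}"


# Synthetic lines imitating the first gap described above.
example = [
    fake_atom_line(1, "ASP", "I", 235),
    fake_atom_line(2, "SER", "I", 236),
    fake_atom_line(3, "ASP", "I", 239),
    fake_atom_line(4, "HIS", "I", 240),
    fake_atom_line(5, "SER", "J", 300),  # other chain, ignored
]
breaks = numbering_breaks(example, "I")
print(breaks)  # [(236, 239)]
```

With a downloaded PDB-format file, the same function could be fed the open file object instead of the synthetic list. Because it compares each residue number to the one before it in file order, it also flags places where the numbering jumps backwards.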
The header of the PDB entry for 6PIJ contains CAVEAT records that note the gaps:
CAVEAT 8 6PIJ [...] RESIDUES SER I236 AND ASP I239
CAVEAT 9 6PIJ THAT ARE NEXT TO EACH OTHER IN THE SAMPLE SEQUENCE ARE NOT
CAVEAT 10 6PIJ PROPERLY LINKED: DISTANCE BETWEEN C AND N IS 8.46. RESIDUES
CAVEAT 11 6PIJ LYS I358 AND ASP I364 THAT ARE NEXT TO EACH OTHER IN THE
CAVEAT 12 6PIJ SAMPLE SEQUENCE ARE NOT PROPERLY LINKED: DISTANCE BETWEEN C
CAVEAT 13 6PIJ AND N IS 11.88. [...]
To sum it up, the residue numbers that you see in a viewer like Jmol or PyMOL are determined by the ATOM records of the PDB entry. Even when there's a good match to a Foldit protein, matching up PDB residue numbers to Foldit segment numbers is not always easy. The PDB doesn't necessarily number the residues sequentially, and numbering often doesn't start with one or even zero. With some effort, it's possible to match up the PDB entry on display to what you see in Foldit.
Here's how the residue numbers in 6PIJ_I correspond to the segment numbers in Foldit puzzle 1787.
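As a rough sketch, the ranges from the ATOM record walk-through above can be written as a small Python lookup. The three ranges are taken from the description above; the names `RANGES` and `pdb_to_foldit` are made up for illustration.

```python
# (first PDB residue, last PDB residue, first Foldit segment),
# taken from the ATOM record ranges described above for 6PIJ_I.
RANGES = [
    (196, 236, 1),    # Foldit segments 1 to 41
    (239, 358, 44),   # Foldit segments 44 to 163
    (364, 397, 169),  # Foldit segments 169 to 202
]


def pdb_to_foldit(residue):
    """Map a 6PIJ_I residue number to a puzzle 1787 segment number.

    Returns None for residues outside the three matched ranges.
    """
    for first_res, last_res, first_seg in RANGES:
        if first_res <= residue <= last_res:
            return first_seg + (residue - first_res)
    return None


for res in (196, 236, 239, 358, 364, 397):
    print(res, "->", pdb_to_foldit(res))
```

In all three ranges the difference works out to the same constant: Foldit segment = PDB residue number minus 195. Foldit segments 42 to 43 and 164 to 168 have no counterpart in the model, since those residues have no ATOM records.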
What is the point of putting out puzzles to solve *by ourselves* if some folks seemingly have looked up the resulting structures on the internet and simply recreate what they can see (or derive) online ?
Why would I waste my time building models truly from scratch based on my own insights if the 'best' teams seemingly do very well on copying ?
What exactly is the Foldit team trying to learn from our folding currently ? That helps in determining whether to spend time on something or not.
I don't think players derived much gain from "copying" from the internet (if you look at the result of lociOiling on this puzzle, it was one of his worst results in the last few months).
Visual copies are far from enough to gain upper ranking on a puzzle.
After all, remix and rebuild tools propose "copies" of nature sequences as well.
These comments try to guide us to design plausible solutions that could be close enough to native in order to help science.
(I must admit that the current ones are too complicated for me - I didn't use them - but these are public comments that any player might use).
As Bruno says, the PDB probably won't be very helpful on these puzzles, but it's probably worth a look, at least to understand the context a little better.
On puzzle 1784, I simply treated it as a de-novo, and cynically made up a lot of helixes. That one took 15th place. On 1787, I tried to emulate one of the I chains (from 6PIJ, I think), and finished 29th.
As noted above and in some chat comments, there are lots of issues with 6PIF, 6PIG, and 6PIJ that make it difficult to sort out what's going on.
These three PDB matches are brand-new results, with the Nature article getting published last December, only weeks before puzzle 1784 started. For the three authors listed in the PDB entries, 6PIF, 6PIG, and 6PIJ are their only PDB publications. As the puzzle comments note, the study used cryoEM, so things may be a little fuzzy.
My scan of the PDB shows chains A, B, C, D, E, F, G, H, I, J, 1, 2, and 3 for 6PIJ. The lettered chains are proteins, the numbered chains are RNA and DNA. (The rcsb.org site doesn't seem to want to discuss chains 2 and 3, but it's not clear why.)
Chains I and J contain at least partial matches for the sections seen in puzzles 1784 and 1787. Chain I is "TniQ monomer 1", and chain J is "TniQ monomer 2". The two chains start out the same, with the first 163 matching, but then they diverge.
The numbering in the PDB is also confusing. The residues aren't necessarily numbered in order at the ATOM record level, as noted above. This makes looking at the model in Jmol or Pymol a little challenging. As also noted above, some residues found in Foldit are simply missing from the PDB models (no ATOM records).
I spent a fair amount of time these past two weeks tinkering with my PDB Reader recipe. I was hoping to get an update out, but I still haven't quite achieved my primary objective of reliably extracting secondary structure info. (PDB residue numbering leaves a lot to be desired.)
The real test will come when we see the electron density version of these puzzles. These puzzles have a lot of rings, which may help. Will we see only half the density to fit half the protein, or do we get the whole cloud? Time will tell....
[edit: the study involving PDB 6PIG and the others was published in December, 2019, not January, 2020]