Could we please have two weeks on this puzzle? It is complex enough to require several days of hand work, possibly with multiple restarts, and large enough to require several days of running scripts. With the planned downtime on top of that, you will not be getting everyone's best work.
My best score on 1258 was not my best work, but my first effort, because that was the only one I had time to run a full complement of scripts on. My best effort was a share-with-scientists version that never had time to reach its top score. I would hate to have the same thing happen with 1261.
Hi Susume!
The puzzle will be extended for a week. I will update the text on the description of the puzzle and extend the time. Thanks for letting me know about this.
Just for info:
There are 302 predicted contacts with potentially 10027 points for them.
So based on the best piece of 1258: 9670 points.
The hurdle is at 19697 points...
Good Folding!!!
:)
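The figures above appear to add up as follows: the hurdle matches the maximum contact bonus plus the best 1258 score. A minimal sketch of that arithmetic, assuming the hurdle really is a simple sum (the variable names are mine, and the per-contact average is only an average, not how the bonus is actually distributed):

```python
# Numbers quoted in the post above.
predicted_contacts = 302
max_contact_points = 10027
best_1258_score = 9670

# Assumption: hurdle = full contact bonus + best score from puzzle 1258.
hurdle = max_contact_points + best_1258_score

# Average bonus per predicted contact (individual contacts may be weighted differently).
points_per_contact = max_contact_points / predicted_contacts

print(hurdle)  # 19697
print(round(points_per_contact, 1))
```

So reaching the hurdle would mean satisfying essentially every predicted contact without losing anything relative to the best 1258 score.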
Just a question. If this protein is in a lipid environment (at least partly, as I understood from the discussion between Susume, jeff, Afligen, etc.), can we still trust the Rosetta hiding subscore?
Should it be deactivated in the Rosetta calculation?
Hi Bruno,
The anchored portion of the protein that is in the lipid environment is removed and only the soluble portion is present in the current sequence.
As a side note, Foldit/Rosetta can handle lipid environments if the score function is modified; one day I hope to post a membrane protein, but for that, it would be important to visualize the lipid bilayer, which is currently not implemented.
Hi Everyone,
As mentioned in a previous comment for phase 1 of this puzzle, a paper to look at is:
http://www.sfu.ca/~mpaetzel/publications/Paetzel_SPase_Review_ChemRev_2002.pdf
"The bottom sequence in Figure 4 is M. tuberculosis LepB, and it’s aligned with the E. coli sequence (1st sequence). There’s a more recent review paper by Auclair et al (2012) with a similar figure, although in that one they refer to the M. tb protein as “Sip” instead of “LepB”."
Hope that helps out!
Fig. 4 shows an alignment with E. coli. My problem is that I cannot interpret this (or do we have to find the related 3D picture of that portion of E. coli on the internet?). Should it be available with an alignment tool? Or would it be useful if you posted a picture of this portion here?
Is the following the Auclair et al (2012) review article mentioned above?
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3323777/
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3323777/pdf/pro0021-0013.pdf
Thanks!
Hi jeff101,
Both of those articles provide some clues :)
of solved box B, box C, etc., with the related segment numbers in our puzzle - that would be great for a visual copy
Do you think all of them, or only some of them, should be reproduced in our models?
I think that those contacts are the closest clues to how this protein is folded and should be retained. In my own modeling of low homology models, I always try to keep the binding pocket and the residues used in catalysis as close as possible. This is because in low homology models, the binding pocket is almost always very well conserved.
Here is an example of what was done on a project I worked on. At the time, the human serotonin protein did not have a crystal structure and the closest homolog crystal structure had ~25% similarity. However, if you looked at the binding pocket residues (identified through mutational/experimental studies), the similarity jumped up to ~60%. Therefore, when we modeled the protein, we tried very hard to make sure all those residues were kept in contact.
After reading several articles and looking at homologs, as near as I can tell the following are likely conserved. I am not a biologist, so take it with a grain of salt:
93 lys and 15 ser - these form the "catalytic dyad" (part of the active site) and should be close enough for their tips to form a hydrogen bond.
13 ser may also be involved, so it may be good to have it near 93 as well.
93 lys is likely partially buried rather than on the surface of the protein, though its tip needs to be near the surface.
156 gly is in contact with 93 lys, close enough that if you were to put a sidechain on the gly it would clash with the lys sidechain.
162 ser sidechain is in contact with both 93 lys and 15 ser sidechains, and is part of the active site.
164 asp and 166 arg sidechains form a salt bridge, which I think shows up in Foldit as a hydrogen bond. This helps shape the backbone so that 162 ser is in the right place.
These sidechains may form a hydrophobic pocket (called s1) near the catalytic dyad:
16 met, 92 val, 20 leu, 11 ile
These sidechains may form another hydrophobic pocket (called s3) near the catalytic dyad:
9 tyr, 11 ile, 31 ile, 50 val, 92 val, and the beta carbon of 90 asp
Our part of the protein may have a side with exposed hydrophobics for sticking against the membrane (this is in addition to the part of the protein that stays embedded within the membrane, which has been left off of our model).
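For anyone who exports a model and wants to check the residue pairs listed above outside of Foldit, here is a rough sketch. It assumes you have the model as PDB-format text numbered the same way as the puzzle; the function names and the 8 angstrom CA-CA cutoff are my own choices for illustration, not anything Foldit or the articles specify. A real check of hydrogen bonds or salt bridges would need sidechain atoms rather than CA atoms.

```python
import math

# Residue pairs from the list above (puzzle numbering).
CONSERVED_PAIRS = [
    (93, 15),    # lys-ser catalytic dyad
    (93, 13),    # ser 13 may also be involved
    (93, 156),   # gly packed against the lys
    (162, 93),   # ser 162 contacts the lys ...
    (162, 15),   # ... and ser 15
    (164, 166),  # asp-arg salt bridge
]


def parse_ca_coords(pdb_text):
    """Return {residue_number: (x, y, z)} for CA atoms in PDB-format text."""
    coords = {}
    for line in pdb_text.splitlines():
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            resnum = int(line[22:26])
            coords[resnum] = (
                float(line[30:38]),
                float(line[38:46]),
                float(line[46:54]),
            )
    return coords


def check_pairs(coords, pairs=CONSERVED_PAIRS, cutoff=8.0):
    """Return (i, j, distance, within_cutoff) for each pair present in coords.

    cutoff is an assumed CA-CA distance in angstroms, used only as a coarse filter.
    """
    report = []
    for i, j in pairs:
        if i in coords and j in coords:
            d = math.dist(coords[i], coords[j])
            report.append((i, j, d, d <= cutoff))
    return report
```

Calling check_pairs(parse_ca_coords(text)) on a saved model gives a quick list of which of these pairs are still far apart.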
Predicted 3D models assisted by predicted contacts, and predicted contact map (for 1261, not 1262) ... fwiw
http://raptorx2.uchicago.edu/ContactMap/myjobs/74298828_105939/
Interesting source !
With the articles coming in, and more information, could we please have a third round for this puzzle? Those of us with less science background could learn much from another try.
I'm not a scientist. I tried a sheet plane with helices on either side in phase one, which didn't do too badly. It could likely be improved. I don't even mind a scientist taking my phase one fold and showing how to improve it.
Sadly, I had to quit this puzzle days ago as a solo player. The number of residues plus the contact map made it too large for my CPU.
We'd love to put all hands on deck, so if you could take something and show us how to apply the science, many of us would be happy to try and refold.
Skip
I would appreciate it if the next few Phases of this Challenge keep letting us load saves from all previous Phases (Puzzles 1258 and 1261 so far).
Thanks again!
Jeff
The blog post said there would be a phase with starts based on homologs. I hope there will be at least one phase with both homolog starts and contacts, because applying the contact data seems like the best way to modify a homolog template to make it more accurate.
Closed my client for an hour, restarted, got an automatic update, and was reset to the beginning of the puzzle (1261). None of my saved solutions are listed.
As mentioned above, please consider letting us have a continuation of 1258 and 1261. I am just beginning to get a handle on contact maps, and lost my entire puzzle 4 hours ago in a restart and update. I would like to see all this end on a positive note, and actually see if I've learned something going forward. Version 3, please.
puzzle size? combined with contact map?
Some homologs, like the E. coli one, have their catalytic lysine at a different location than the TB protein, and therefore must have a partially different shape. Is it possible to give more weight to contacts predicted from homologs with similar active site residue locations, and less weight to contacts from homologs with active site residues located elsewhere?
http://foldit.wikia.com/wiki/Distance_Maps and http://memorize.com/distance-and-contact-maps show Distance and Contact Maps alongside their respective 3D images for many different proteins. Images are classified as in the SCOPe database (http://scop.berkeley.edu), and http://memorize.com/distance-and-contact-maps can shuffle the images in multiple-choice and matching modes to help you learn the patterns in them.
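In case it helps to see how the two kinds of map relate, here is a small sketch: a distance map is the table of pairwise distances between residues, and a contact map is that table thresholded at some cutoff. The 8 angstrom cutoff and the minimum sequence separation of 3 are assumptions chosen for illustration, and the function names are hypothetical; the sites above may use different settings.

```python
import math


def distance_map(coords):
    """Pairwise distances between residue coordinates, as a list of lists."""
    n = len(coords)
    return [[math.dist(coords[i], coords[j]) for j in range(n)] for i in range(n)]


def contact_map(dist, cutoff=8.0, min_separation=3):
    """True where two residues are within cutoff and far enough apart in sequence.

    min_separation hides trivial contacts between sequence neighbors.
    """
    n = len(dist)
    return [
        [abs(i - j) >= min_separation and dist[i][j] <= cutoff for j in range(n)]
        for i in range(n)
    ]


# Toy example: four residues along a line, and a fifth folded back near the first.
coords = [(0, 0, 0), (3, 0, 0), (6, 0, 0), (9, 0, 0), (0, 4, 0)]
dist = distance_map(coords)
contacts = contact_map(dist)
```

In the toy example, residue 5 contacts residue 1 because the chain folds back, which is the kind of off-diagonal feature that sheets and helix packing produce in the real maps.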