1258: Tuberculosis Challenge - Phase 1
Status: Closed


Created: 07/11/2016
Points: 100
Expired: 07/19/2016 - 23:00
Difficulty: Intermediate
Description: This puzzle starts with an unfolded sequence with secondary structure assigned by PSIPRED. The target protein is LepB, which is currently being investigated for drug discovery against tuberculosis (TB). TB is caused by the bacillus Mycobacterium tuberculosis and killed more than 1.5 million people in 2014. No crystal structure currently exists for this target. Models created by Foldit players will be used to help solve the structure when crystals become available.
Categories: Overall, Prediction

Top Groups

Rank  Group                        Score  Points
2     Anthropic Dreams             9,562  73
3     Go Science                   9,496  52
5     Another Hour Another Point   9,234  24




Susume
Joined: 10/02/2011
AA Sequence

Primary sequence is:

Joined: 04/20/2012
Groups: Go Science
More info on this protein:

The above sequence of 213 amino acids seems to be residues 82-294 in Fig. 4 of
http://jb.asm.org/content/194/10/2614.full.pdf, a range that includes boxes B, C, D, and E.
Box B contains Ser94 & Ser96 while Box D contains Lys174.

p.2617 of the above article says "Amino acid alignments of LepB with SPaseI
of other bacterial species identified a short intracellular domain and a large
extracellular domain containing the conserved regions boxes B, C, D, and E. The
predicted catalytically active residues of the characteristic serine-lysine dyad
are located in box B and box D." p.2617 also says "These results indicate that
the Ser94, Ser96, and Lys174 residues are essential for LepB function."

p.2618 says "we hypothesize that Ser94 and Lys174 form the catalytic center of the
protein, while Ser96 likely stabilizes the interaction with the preprotein and the
catalytic serine residue". pp.2618-9 say "the active site is located on the outside
of the cytoplasmic membrane, making it relatively accessible for small molecules."

Finally, p.2614 says "Stepwise translocation of the preprotein across the membrane
is driven by SecA-mediated ATP hydrolysis. After translocation, LepB cleaves the
signal peptide from the preprotein, releasing the mature protein into the periplasm."

All these things make me think the protein in Puzzle 1258 is an extracellular one.
They also make me think Ser94 and Lys174 should be close to each other.

Joined: 01/12/2015
Groups: None
This is great information and you are on the right track!

Susume
Joined: 10/02/2011
serines and lysine in contact?

Ser94, Ser96, and Lys174 in the article correspond to 13 S, 15 S, and 93 K in our protein. If they form a dyad, does that mean they are in contact? And on the outside of the protein, so they can interact with other proteins?
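A quick way to keep the two numbering schemes straight: assuming, as suggested above, that puzzle residue 1 is article residue 82, every article number is shifted by 81. The sketch below (the helper names are made up for illustration) checks the 213-residue length and reproduces the Ser94/Ser96/Lys174 mapping.

```python
# Assumption: the puzzle covers article residues 82-294 of LepB, so puzzle
# residue 1 corresponds to article residue 82 (an offset of 81).
ARTICLE_START = 82
ARTICLE_END = 294
OFFSET = ARTICLE_START - 1

# Number of residues in the puzzle (inclusive range).
PUZZLE_LENGTH = ARTICLE_END - ARTICLE_START + 1


def article_to_puzzle(article_resnum: int) -> int:
    """Convert a full-length LepB residue number (article) to a puzzle index."""
    if not ARTICLE_START <= article_resnum <= ARTICLE_END:
        raise ValueError(f"residue {article_resnum} is outside the puzzle range")
    return article_resnum - OFFSET


def puzzle_to_article(puzzle_resnum: int) -> int:
    """Convert a puzzle residue index back to the article's numbering."""
    if not 1 <= puzzle_resnum <= PUZZLE_LENGTH:
        raise ValueError(f"residue {puzzle_resnum} is outside the puzzle")
    return puzzle_resnum + OFFSET


if __name__ == "__main__":
    print("puzzle length:", PUZZLE_LENGTH)
    for name, resnum in [("Ser94", 94), ("Ser96", 96), ("Lys174", 174)]:
        print(name, "-> puzzle residue", article_to_puzzle(resnum))
```

Running it prints a length of 213 and then puzzle residues 13, 15 and 93.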

Batz
Joined: 02/16/2012
Groups: Go Science

Do the scientists know whether we have to form disulfide bridges for this protein?
That would be really helpful, because there are 6 cysteines in the sequence.

Joined: 01/12/2015
Groups: None
Hi Batz,

There is no evidence that there are disulfide bonds in LepB. One of the scientists working on this problem responded:

To predict whether a protein forms disulfide bonds, scientists look at which cellular compartment the protein resides in and the oxidative properties of that compartment.
1. Check whether TB is Gram+ or Gram-.
2. Predict the protein's compartment.
3. Check for the presence of disulfides.

Joined: 04/20/2012
Groups: Go Science
Disulfides in LepB?

Is there any evidence that there are NO disulfide bonds in LepB?
Also, how does being Gram+ or Gram- affect disulfide content?
Finally, is TB Gram+ or Gram-?

Joined: 09/24/2012
Groups: Go Science
both (may be more Gram-)

"The phylogenetic position of Mycobacterium tuberculosis relative to other bacteria is controversial. Its cell wall has characteristics of both Gram-positive and Gram-negative bacteria. In the standard reference of bacterial phylogeny based on 16S ribosomal RNA sequence comparison, M. tuberculosis belongs to the high G+C Gram-positive bacteria that form a monophyletic group with the low G+C Gram-positive bacteria such as Bacillus subtilis. Some analyses indicate no particular relationship between these two groups. The availability of the complete genome sequence of M. tuberculosis allows us to reexamine this issue from genomic perspectives, as genome-based phylogenies may be more representative of the evolutionary history of whole organisms than molecular trees. In the genome tree constructed based on conserved gene content, M. tuberculosis is more related to Gram-negative than to Gram-positive bacteria as reflected by the evolutionary distance between nearest ancestral units. This conclusion may be supported by another analysis showing that M. tuberculosis shares relatively more orthologous genes for energy production and conversion with Gram-negative bacteria, in particular, Escherichia coli and Pseudomonas aeruginosa, than with Gram-positive bacteria" (Fu et al, 2002)

Fu, L. M.; Fu-Liu, C. S. (2002-01-01). "Is Mycobacterium tuberculosis a closer relative to Gram-positive or Gram-negative bacterial pathogens?". Tuberculosis (Edinburgh, Scotland) 82 (2-3): 85–90.

toshiue
Joined: 01/31/2016
Groups: Go Science
Gram positive/negative

excellent answer, thanks for such...

Joined: 04/20/2012
Groups: Go Science

With 6 cysteines, there is
1 way to have no disulfides,
15 ways to have 1 disulfide,
45 ways to have 2 disulfides, and
15 ways to have 3 disulfides.
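These counts follow from a standard counting formula, assuming each cysteine can be in at most one disulfide: choose the 2k bonded cysteines out of n, then pair them up in (2k-1)!! ways.

```latex
N(n,k) = \binom{n}{2k}\,(2k-1)!! = \frac{n!}{(n-2k)!\,k!\,2^{k}}, \qquad (-1)!! = 1

N(6,0) = 1,\quad
N(6,1) = \frac{720}{4!\cdot 1!\cdot 2} = 15,\quad
N(6,2) = \frac{720}{2!\cdot 2!\cdot 4} = 45,\quad
N(6,3) = \frac{720}{0!\cdot 3!\cdot 8} = 15

\sum_{k=0}^{3} N(6,k) = 76
```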

If we number the cysteines 1-6
so that 12,34 means 2 disulfides
(one between cysteines 1 & 2 and
another between cysteines 3 & 4),
below are all the different ways:

1 disulfide (15 ways): 
12 13 14 15 16
23 24 25 26
34 35 36
45 46
56

2 disulfides (45 ways):
12,34 12,35 12,36 12,45 12,46 12,56
13,24 13,25 13,26 13,45 13,46 13,56
14,23 14,25 14,26 14,35 14,36 14,56
15,23 15,24 15,26 15,34 15,36 15,46
16,23 16,24 16,25 16,34 16,35 16,45
23,45 23,46 23,56
24,35 24,36 24,56
25,34 25,36 25,46
26,34 26,35 26,45
34,56 35,46 36,45

3 disulfides (15 ways):
12,34,56 12,35,46 12,36,45
13,24,56 13,25,46 13,26,45
14,23,56 14,25,36 14,26,35
15,23,46 15,24,36 15,26,34
16,23,45 16,24,35 16,25,34
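A short script can generate the same lists and confirm the 1/15/45/15 counts. This is a sketch assuming the cysteines are labelled 1-6 as in the post; `disulfide_patterns` is a made-up helper name, and the two-character labels only work for fewer than 10 cysteines.

```python
from itertools import combinations


def disulfide_patterns(n_cys: int, n_bonds: int):
    """Yield each way of forming n_bonds disjoint disulfides among n_cys cysteines.

    Patterns use the notation from the post, e.g. "12,34" means bonds 1-2 and 3-4.
    Labels are single digits, so this notation assumes n_cys <= 9.
    """
    pairs = list(combinations(range(1, n_cys + 1), 2))
    for combo in combinations(pairs, n_bonds):
        cysteines = [c for pair in combo for c in pair]
        # Skip combinations that would put one cysteine in two bonds.
        if len(cysteines) == len(set(cysteines)):
            yield ",".join(f"{a}{b}" for a, b in combo)


if __name__ == "__main__":
    for k in range(4):
        patterns = list(disulfide_patterns(6, k))
        print(f"{k} bond(s): {len(patterns)}")
```

Comparing the generated lists with hand-written ones is an easy way to catch a missing entry.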

bandsomeSS (https://fold.it/portal/recipe/101275)
is a Recipe for banding disulfide bonds.

bandsome (https://fold.it/portal/recipe/43861)
has a web page with discussion and links about
disulfide bonds. It says that more disulfide
bonds form in an oxidizing environment (like in
the blood, spinal fluid, extracellular medium,
lumen of the rough endoplasmic reticulum,
mitochondrial intermembrane space, secretory
proteins, lysosomal proteins, exoplasmic domains
of membrane proteins, hair, and feathers) than
in a reducing environment (like in the cytosol
and most cellular compartments).

Joined: 01/12/2015
Groups: None
This is pretty cool, Jeff101, and good work. Our scientists also thought along similar lines; however:

"LepB is located in the cell wall, with its N-terminus in the cytosol, and both are known to be reducing environments.
So I don't think so, but there might be a part of LepB sticking out into the extracellular compartment, and some disulfides might form." -anonymous scientist :)

According to predictions using PSORT, it doesn't look likely that any disulfides would form.

Susume
Joined: 10/02/2011
possible disulfide

Residues 23-26 look like they may be an insertion, with only 119 of 855 homologs having all four residues present. Of the 33 homologs having C at residue 26, all of them also have C at residue 23. Of the 36 homologs having C at residue 23, 33 of them have C at residue 26. So this might be a candidate for a disulfide, since the C's strongly tend to occur together or not at all. Not sure what use a disulfide would be on residues so close together, though.

The number of homologs having C in the other places is:
Residue 105: 131 have C
Residue 137: 38 have C
Residue 169: 12 have C
Residue 173: 1 has C
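The co-occurrence check above can be expressed as a small count over two alignment columns. This sketch assumes the homologs are already aligned to the puzzle sequence as equal-length strings with '-' for gaps (how the jpred alignment is exported is not covered here); the toy sequences are invented and do not reproduce the 33/36 numbers.

```python
def cys_cooccurrence(aligned_seqs, col_a, col_b):
    """Count sequences with C at col_a, with C at col_b, and with C at both.

    Columns are 1-based puzzle residue numbers; each aligned sequence is assumed
    to have exactly one character per puzzle residue, with '-' for gaps.
    """
    has_a = has_b = both = 0
    for seq in aligned_seqs:
        a = seq[col_a - 1] == "C"
        b = seq[col_b - 1] == "C"
        has_a += a
        has_b += b
        both += a and b
    return has_a, has_b, both


if __name__ == "__main__":
    # Invented toy alignment with candidate cysteines at columns 3 and 6.
    toy = ["MKCAVCLE", "MKCAVCLE", "MKCAVALE", "MK-AV-LE"]
    has_a, has_b, both = cys_cooccurrence(toy, 3, 6)
    print(f"C at 3: {has_a}, C at 6: {has_b}, both: {both}")
    print(f"fraction of C-at-6 sequences that also have C at 3: {both / has_b:.2f}")
    print(f"fraction of C-at-3 sequences that also have C at 6: {both / has_a:.2f}")
```

For the toy data this reports 3, 2 and 2: every sequence with C at column 6 also has C at column 3, the same one-directional pattern described for residues 23 and 26.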

Susume
Joined: 10/02/2011
CXXC disulfides in periplasm

Apparently, in E. coli there are proteins in the periplasm (the space between the inner and outer membranes) with CXXC-motif disulfides whose job is to help form the correct disulfides in other proteins that don't finish folding until they have been transported through the inner membrane into the periplasm. LepB seems to clip the N-terminal signal peptide off proteins entering the periplasm; maybe, with its conserved CXXC motif at residues 23-26, it also assists in forming disulfides.
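Spotting CXXC motifs in a sequence takes a single regular expression. This is a sketch with invented test sequences (the puzzle's own sequence is not reproduced on this page); a lookahead is used so overlapping motifs such as CXXCXXC are all reported. On the puzzle sequence it should include (23, 26) if, as described above, residues 23 and 26 are cysteines.

```python
import re


def find_cxxc(seq: str):
    """Return 1-based (start, end) positions of every C-x-x-C motif in seq.

    The zero-width lookahead lets overlapping motifs (e.g. CxxCxxC) all match.
    """
    return [(m.start() + 1, m.start() + 4) for m in re.finditer(r"(?=C..C)", seq)]


if __name__ == "__main__":
    print(find_cxxc("MKACGPCLE"))   # one motif, at 4-7
    print(find_cxxc("CAACAAC"))     # two overlapping motifs, at 1-4 and 4-7
```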

spvincent
Joined: 12/07/2007
Groups: Contenders
I think a puzzle this size is quite intractable, particularly when the secondary structure is so ill-defined. How about a few starting structures in the alignment palette?

Joined: 01/12/2015
Groups: None
Hi Spvincent,

I think you are hitting upon one of the reasons why this specific target is so difficult. This target is bound to the cell membrane of Mycobacterium tuberculosis through an 80-residue linker. For this puzzle, we have already taken that 80-residue linker out to focus on the fold of the protein. Additionally, the protein is only ~25% identical to the closest homolog with a crystal structure.

For the first phase, I would like to see what types of interesting ideas come from Foldit. Why? Because this is a difficult problem, and Foldit players think about the puzzles differently than I or the other scientists working on this puzzle do. This is a great asset, especially in a field where ideas have become a little stagnant. I value the models and ideas from Foldit players.

Phase 2 will incorporate folded proteins from phase 1 along with a homology model that I created and a couple of template proteins (remember, the templates are bad because of the low homology).
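For anyone wondering what "~25% identical" means in practice: percent identity is commonly the fraction of aligned positions where two sequences carry the same residue. The sketch below assumes the two sequences are already aligned (equal-length strings, '-' for gaps) and divides by the number of gap-free columns; other conventions, such as dividing by the shorter sequence's length, give somewhat different numbers. The sequences in the example are invented.

```python
def percent_identity(aln_a: str, aln_b: str) -> float:
    """Percent identity of two pre-aligned sequences, over gap-free columns."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have the same length")
    # Keep only columns where both sequences have a residue.
    columns = [(a, b) for a, b in zip(aln_a, aln_b) if a != "-" and b != "-"]
    if not columns:
        return 0.0
    identical = sum(a == b for a, b in columns)
    return 100.0 * identical / len(columns)


if __name__ == "__main__":
    # Invented toy alignment: 2 identities out of 3 gap-free columns.
    print(round(percent_identity("AC-E", "ACDW"), 1))
```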

Joined: 04/20/2012
Groups: Go Science
Will there be a Predicted Contacts puzzle too?

Several recent de novo puzzles (1252,1243,1237,1231,1224) have been followed by
Predicted Contacts puzzles (1255,1246,1240/1240b,1234,1227), where Contact Maps
are predicted using co-evolution data. Will there be a Predicted Contacts puzzle
for 1258's protein as well?

Joined: 01/12/2015
Groups: None
I do not know...I will look into this.

Joined: 04/20/2012
Groups: Go Science
Puzzle Size:

This puzzle has about twice the usual number of amino acids in it.
Why not give us about twice the usual amount of time to work on it?
Also, please let us load this puzzle's solutions into future puzzles.


Joined: 01/12/2015
Groups: None
Hi Jeff101,

I have no problem extending the puzzle for a week. Phase 2 will include models from phase 1 to work on. I know this is a hard puzzle (I have been playing it too, and for the life of me I can't get my score that high...).

tokens
Joined: 11/28/2011
I don't see the reason for extending the time for this puzzle. I doubt anyone will get close to anything resembling the correct fold with the little info we have been given. I'd rather go on to phase 2, where hopefully you will give us more info to work with.

Joined: 09/24/2012
Groups: Go Science
Voting for extending the puzzle deadline

With the discussion here, we could start some new designs (jeff101 said he would).
The top scores seem very low for such a long puzzle.
With the holiday period as well, this could be an argument for extending the deadline a little (2 days?).

Joined: 04/24/2014
Groups: None

It looks like the puzzle will close as normal and move on to phase two tomorrow.

Joined: 05/26/2008
Noobie Question

How accurate do you estimate the predicted secondary structures to be? 80%, 90%, 95%? Just curious how much room there is to play with them and still remain within acceptable ranges.

spvincent
Joined: 12/07/2007
Groups: Contenders
They are guidelines only and need to be taken with a pinch of salt. Sometimes when you have long helices you can be pretty confident in the prediction but if, as here, you have scattered small fragments they don't mean too much. In some predictions the results come with a percentage figure that represents the confidence but that isn't provided here. There are public servers that will let you paste in the primary sequence (as supplied by Susume in the first comment) and provide a secondary structure prediction that may be different from the one provided. Here's one:


Joined: 01/12/2015
Groups: None
The predictions that I used came from PSIPRED, which I use for no other reason than that it was what I was taught in graduate school. Here is the prediction that I used (edit: the formatting is bad. The original submission I made to PSIPRED can be found here, so everyone doesn't have to resubmit the sequence: http://bioinf.cs.ucl.ac.uk/psipred/result/4d4511ee-4775-11e6-8df1-00163e110593):

Website: http://bioinf.cs.ucl.ac.uk/psipred/

If you would like to take a look at it.
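On the question of confidence: PSIPRED's plain-text ("horizontal") output includes a per-residue confidence digit from 0 (low) to 9 (high) on a Conf: line above the Pred: line of H/E/C states. Assuming those two lines have been copied out and joined into single strings, a sketch like the following keeps only the helix and strand runs above a chosen cutoff. The cutoff of 7 and the example strings are arbitrary illustrations, not recommended values or real PSIPRED output.

```python
def confident_segments(pred: str, conf: str, min_conf: int = 7):
    """Return 1-based (start, end, ss) runs of H or E in which every residue
    has a confidence digit >= min_conf. Coil ('C') residues are skipped."""
    if len(pred) != len(conf):
        raise ValueError("prediction and confidence strings differ in length")
    segments = []
    start = None  # 0-based start of the run currently being extended
    for i, (ss, c) in enumerate(zip(pred, conf)):
        ok = ss in "HE" and int(c) >= min_conf
        if ok and start is not None and ss == pred[start]:
            continue  # extend the current run
        if start is not None:
            segments.append((start + 1, i, pred[start]))  # close the run
            start = None
        if ok:
            start = i
    if start is not None:
        segments.append((start + 1, len(pred), pred[start]))
    return segments


if __name__ == "__main__":
    print(confident_segments("CCHHHHCCEEEC", "998765989994"))
```

Lowering the cutoff to 0 returns every predicted helix and strand, which makes it easy to see how much of a prediction rests on low-confidence residues.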

brow42
Joined: 09/19/2011
Groups: None
proline lines

Since free_radical is being so helpful, here's my question.

There's a region with an i,i+4,i+8,i+11 pattern of pro-pro-pro-xxx repeated x3, putting about 12 prolines almost colinear when in an alpha helix. Does this pattern have a particular binding partner? Would it bend the helix? Does it have a name, or does this never occur?
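The "almost colinear" intuition can be checked with helical-wheel arithmetic: in an ideal alpha helix each residue advances about 100 degrees around the axis (3.6 residues per turn) and about 1.5 A along it. This sketch uses those textbook values and treats the stretch as a perfect straight helix; it ignores the fact that proline is generally disfavored inside alpha helices, so treat it only as geometry.

```python
# Ideal alpha-helix geometry (approximate textbook values).
DEGREES_PER_RESIDUE = 100.0   # 360 degrees / 3.6 residues per turn
RISE_PER_RESIDUE = 1.5        # angstroms along the helix axis


def helix_position(offset: int):
    """Angle (0-360 degrees) around the axis and height (angstroms) of residue
    i+offset relative to residue i, for an ideal straight alpha helix."""
    angle = (offset * DEGREES_PER_RESIDUE) % 360.0
    height = offset * RISE_PER_RESIDUE
    return angle, height


if __name__ == "__main__":
    for off in (0, 4, 8, 11):
        angle, height = helix_position(off)
        print(f"i+{off:<2d} angle {angle:5.1f} deg, height {height:4.1f} A")
```

Under these assumptions i+4, i+8 and i+11 sit 40, 80 and 20 degrees around from residue i, so they share one face of the helix without lining up exactly, and an 11-residue repeat drifts about 20 degrees per copy.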

Joined: 01/12/2015
Groups: None
Hi brow42,

I don't know the answer to this question, but I am asking around. Hopefully we can come up with one.


Susume
Joined: 10/02/2011
pro-rich area not common among homologs

1258 has 7 prolines in residues 117-148 (I think this is the area brow is looking at), in the motif
PXXXPXXXPXXXXXXPXXXPXXXXXXXPXXXP. It would certainly be interesting if that area could form a proline-rich helix.

Of the 865 homologs that jpred found for our sequence, only 20 have 4 or more prolines in those 7 spots, and only 5 have 5 or more of them, so it is not a common motif. (Note these are not homologs with solved structures, just known sequences.)
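The homolog check described above amounts to counting prolines at fixed alignment columns. In this sketch the proline positions are derived from the motif as posted (starting at residue 117), while the alignment format (equal-length strings in puzzle numbering) and the toy sequences are assumptions made for illustration.

```python
# Proline positions (puzzle numbering) read off the posted motif for residues 117-148.
MOTIF = "PXXXPXXXPXXXXXXPXXXPXXXXXXXPXXXP"
MOTIF_START = 117
PRO_POSITIONS = [MOTIF_START + i for i, ch in enumerate(MOTIF) if ch == "P"]


def count_with_min_prolines(aligned_seqs, positions, min_pro):
    """Count aligned sequences with at least min_pro prolines at the given
    1-based positions (each sequence has one character per puzzle residue)."""
    return sum(
        1
        for seq in aligned_seqs
        if sum(seq[p - 1] == "P" for p in positions) >= min_pro
    )


def toy_sequence(pro_positions, length=148):
    """Build an invented all-alanine sequence with prolines at pro_positions."""
    residues = ["A"] * length
    for p in pro_positions:
        residues[p - 1] = "P"
    return "".join(residues)


if __name__ == "__main__":
    print(PRO_POSITIONS)
    toy = [toy_sequence(PRO_POSITIONS), toy_sequence(PRO_POSITIONS[:4]),
           toy_sequence(PRO_POSITIONS[:2])]
    for k in (4, 5):
        print(f">= {k} prolines:", count_with_min_prolines(toy, PRO_POSITIONS, k))
```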

toshiue
Joined: 01/31/2016
Groups: Go Science
Speaking of expiration times...

the expiration time of 23:00...PDT, as usual?

Joined: 01/12/2015
Groups: None
This is because I have only posted 1 puzzle before, and I had a little trouble posting this one. I don't mind extending the time for this phase.

Joined: 04/20/2012
Groups: Go Science
Some papers to read?

In http://jb.asm.org/content/194/10/2614.full.pdf cited above,
references 5, 25, 27, and 36 sounded interesting to me. I was
able to download them all from home. The next trick is to read
them. Perhaps some of you will have time to read them before I do:

(5) http://jb.asm.org/content/175/16/4957.full.pdf

(25) http://www.nature.com/nature/journal/v396/n6712/pdf/396707b0.pdf

(27) http://www.sfu.ca/~mpaetzel/publications/Paetzel_SPase_Review_ChemRev_2002.pdf

(36) https://www.researchgate.net/publication/7634944_Type_I_signal_peptidase_An_overview

The following I found while searching for the others:

Joined: 04/20/2012
Groups: Go Science
Thanks jmbrownlee333 (Anfinsen_slept_here) !

I only knew about the http://jb.asm.org/content/194/10/2614.full.pdf
article above because jmbrownlee333 gave a link to it in Veteran Chat
a few days ago. Thanks to jmbrownlee333 for sharing this link!

Joined: 09/24/2012
Groups: Go Science
Just a comment and a question from a "non biochemist" player

All your discussion seems very complicated (but interesting) to me. Frankly speaking, I just tried designs from "artistic" inspiration, using idealSS and remixes to help the protein show me some direction to move in. It gives relatively good point results, but I suppose there is 1 chance in 100,000,000 that my design is the right one!
I suppose that this is the purpose of this puzzle: many crowd-sourced "non-expert" solutions ("out of the box") and, who knows, this could inspire the biochemists for some parts of the protein.

BTW, a question: our best scores seem very small compared to the number of segments.

Does Nature always find the minimum energy, or are some natural proteins in a kind of "local minimum"?

Susume
Joined: 10/02/2011
Another active site?

There is a group of mostly blue sidechains that is extremely conserved across homolog sequences: 156-166, GDNRxxSxDSR. Do we have any idea what these are for, or any clue where they should go?

They are even more strongly conserved than the serine-lysine catalytic dyad, half of which (the lysine) is in a different part of the sequence than in the TB protein in about a quarter of the homologs (including, by the way, the related E. coli protein whose structure is solved).

The Nature article by Paetzel et al that jeff101 posted a link to, which is about the E. coli protein, describes two pockets that help the E. coli protein stick to its targets (which are the same kind of targets our TB protein wants to stick to) - but those are both hydrophobic pockets. Could this mostly blue part of the protein also be involved in sticking to targets? It must do something both important and very specific, or the amino acids would vary more across homologs than they do.
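One simple way to put a number on "extremely conserved" is, for each column, the fraction of homologs carrying the most common residue. This is a sketch assuming an equal-length alignment in puzzle numbering with '-' for gaps; it ignores gaps and sequence weighting, and more careful conservation scores (for example entropy-based ones) exist. The toy alignment is invented; for the real data one would pass the homolog alignment and columns 156-166.

```python
from collections import Counter


def column_conservation(aligned_seqs, start, end):
    """For each 1-based column from start to end, return
    (column, most common residue, fraction of non-gap sequences with it)."""
    profile = []
    for col in range(start, end + 1):
        residues = [s[col - 1] for s in aligned_seqs if s[col - 1] != "-"]
        if not residues:
            profile.append((col, "-", 0.0))
            continue
        residue, count = Counter(residues).most_common(1)[0]
        profile.append((col, residue, count / len(residues)))
    return profile


if __name__ == "__main__":
    # Invented toy alignment of four short sequences.
    toy = ["GDNR", "GDNK", "GENR", "G-NR"]
    for col, residue, frac in column_conservation(toy, 1, 4):
        print(col, residue, f"{frac:.2f}")
```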

Joined: 04/24/2014
Groups: None
FYI Puzzle Close Time

I've been told the puzzle is closing at the posted time tomorrow as normal, with no extension.

Joined: 06/06/2013
Groups: Gargleblasters
Thanks to Jeff, Susume & JMBrownlee -- feedback request from Dev

Thanks to Jeff for the combination calculations. I think my head hurts thinking of the combinations. My probability course predated calculators and consisted of a little yellow pamphlet of about 30 pages, and for statistics we used a book of tables to look things up in case our estimating skills had us look at the wrong band of the slide rule. I graduated from college with a math minor and without a calculator, as they were still too expensive, even for the professors.
And thanks to Susume for her insights into where the binding pocket for the catalyst might be (and to JMBrownlee as well). I don't think any of this fixes my fold for TB :D, but it makes me feel better about having to do what I could with the little time I had this week. I would love a redo of the TB puzzle with more direction and enough time for those of us with slow computers to fold a solid start. I'll pass up a few smaller puzzles, as long as it is not an ED and doesn't require trying to learn to read anything in stick mode. All I know is that what I have is likely of little use to science, and knowing why would be immensely useful to me.

Supported by: UW Center for Game Science, UW Department of Computer Science and Engineering, UW Baker Lab, VU Meiler Lab,
DARPA, NSF, NIH, HHMI, Microsoft, Adobe, RosettaCommons