Susume
Joined: 10/02/2011

I wanted to comment on the mutate function in relation to the Marburg puzzles (but really this applies to all design puzzles). Vmulligan commented that some sheets have an edge strand that does not contribute to the hydrophobic core. I think this is caused by the mutate function, which commonly puts blue sidechains on both the inner and outer surfaces of edge strands. Most people run scripts that use the mutate function repeatedly on design puzzles, and if the blue AAs score even a tiny bit better they are kept by the script. Once the blue sidechains are in place, wiggle will tend to turn them (sidechain and backbone both) further away from the core and toward the water, whereas if they were orange, wiggle would keep them turned more inward.

The only way I kept my 1108 sheets orange on one side (given that they were all edge strands) was by setting them to orange manually and not allowing any scripts to use the mutate function from then on. Players may not realize there is a reason to disallow mutate in the scripts, or they may be unwilling to forgo the points that mutate brings. Even after using mutate scripts, they could manually switch those inner surface AAs to orange, do a shake and wiggle, and share that version with scientists, although the backbone would have been optimized for the blue AAs. I suspect the scientists could use Rosetta to substitute hydrophobes on those edge strands and re-optimize if the solution is otherwise promising.

Maybe the core existence filter could give some points for having a large enough number of sidechains in the core, and additional points for those sidechains actually being orange. Then the scripts would prefer the orange ones in those edge positions where they are in contact with the core on one side and water on the other side.
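To make the suggestion above concrete, here is a minimal Python sketch of such a two-part bonus. It is only a toy model: the function name, the threshold and the point values are all made up for illustration, and it is not how Foldit's core existence filter is actually implemented.

```python
def core_filter_bonus(residues, min_core=6, base_points=100.0, per_orange=10.0):
    """Toy two-part bonus (all numbers are hypothetical).

    residues: list of (is_buried, is_orange) boolean pairs, one per residue.
    """
    # Keep the "is orange" flag of every residue whose sidechain is in the core.
    in_core = [is_orange for is_buried, is_orange in residues if is_buried]
    # Part 1: some points for having enough sidechains in the core at all.
    base = base_points if len(in_core) >= min_core else 0.0
    # Part 2: additional points for each core sidechain that is actually orange.
    return base + per_orange * sum(in_core)


blue_core = [(True, False)] * 6    # six buried sidechains, all blue
orange_core = [(True, True)] * 6   # six buried sidechains, all orange
print(core_filter_bonus(blue_core), core_filter_bonus(orange_core))
```

With a bonus shaped like this, a script comparing two otherwise equal solutions would prefer the one with orange sidechains at the core-facing positions.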

Another problem with the mutate function is that it will put a glycine anywhere there is an unusual backbone angle. Once the glycine is there, the wiggle function has no motivation to make that residue have a more normal backbone angle. I prefer to manually change almost all glycines to something else and require the subsequent rebuilds and wiggles to find a backbone position that the more rigid AA can take, but again I have to not let mutate run in any scripts after that or the glycine is likely to come back. The fragment filter rejects the backbone if it is too unnatural, but wiggle doesn't "know" about that, so you have to luck into a good rebuild before the fragment will be fixed, and if the glycine stays there is no guarantee it will stay fixed.

Mutate also puts glycine anywhere there is no room for a sidechain, and again once the glycine is there, there is no motivation for wiggle or for rebuild scripts to make more room, and the glycine just gets more entrenched as the scripts optimize things around it. Glycines make designed proteins floppy and less likely to fold up as predicted, so I hate getting stuck with them.

In short, if you want to build solutions that meet the scientists' requirements, the Mutate function may be working against you, and after the early game you may be better off without it. Fix your hydrophobes and glycines by hand, maybe add in a couple of prolines for stiffness, and let your non-mutating scripts make them work.

Joined: 06/24/2008
Groups: Void Crushers

I have also noticed that glycines are shoved everywhere. What I have also learned is that you cannot change them to something else too soon if you wish to use any script that does mutations. You need to wait until the score is sufficiently high and stable, then change them to other residues.

I do not know biology; but why do glycines score so well?

Also, should sheets always have side chains pointing in alternate directions: one in, one out?

v_mulligan
Joined: 03/04/2009
Groups: None
Re: Glycines

See my comments below regarding glycines. You ask a good question.

Regarding sheets, yes, sheets generally alternate their side-chain directions. Occasionally you'll have a funny kink in a sheet that puts two side-chains in a row on the same side of a sheet, but too many kinks can ruin a design. Sheets also tend to be stabilized by beta-branched amino acid residues (valine and threonine, and, to a lesser extent, isoleucine) and destabilized by glycine (too flexible), proline (prevents the hydrogen bonding), or, to a lesser extent, alanine (tends to favour helices). Our scoring reflects this a little bit, but not strongly. If you're following Susume's strategy for designing sheets that are hydrophobic on one side and hydrophilic on the other, I'd recommend manually mutating one side to valines and the other to threonines, shaking and wiggling, and then eventually considering other substitutions. (Please take this with the caveat that there might be other strategies that are also very effective, though -- I don't know the single best recipe for success!)
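As a rough illustration of the valine/threonine starting point described above, the following Python sketch writes out an alternating pattern for one strand. The function name and parameters are hypothetical, and it assumes a perfectly regular strand with no kinks, so treat it as a starting template rather than a recipe.

```python
def strand_pattern(length, first_faces_core=True, core_aa="V", surface_aa="T"):
    """Alternate core-facing and water-facing residues along an ideal strand.

    Assumes strict in/out alternation (no kinks), which real sheets
    do not always obey.
    """
    seq = []
    for i in range(length):
        # Even positions share the orientation of the first residue.
        faces_core = (i % 2 == 0) == first_faces_core
        seq.append(core_aa if faces_core else surface_aa)
    return "".join(seq)


print(strand_pattern(5))         # VTVTV
print(strand_pattern(4, False))  # TVTV
```

The resulting one-letter string would still have to be set by hand (or by a non-mutating script) and followed by a shake and wiggle, as suggested above.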

frood66
Joined: 09/20/2011
Groups: Marvin's bunch

When working on a 'science' result (and often on a score result at the start) I ban all glycine and alanine. I agree with everything you say, Susume... could not have put it better.

Joined: 06/06/2013
Groups: Gargleblasters

I found the best way to keep glycines out of an early mutate (I do ban gly and ala often) is to set an initial fold with aromatics. The aromatics keep the spacing a bit wide, and while they may distort the shape, they also guard against the possibility that the only thing that would fit is a glycine. The aromatics can be mutated out later, but they help keep me, as an intermediate sort of folder, from packing the fold too tight at the start.
Susume's comments seem really helpful and sensible. For those like me who struggle a bit more, forcing in some aromatics early on might also help in getting more useful folds.

v_mulligan
Joined: 03/04/2009
Groups: None
We do this too

Many people in the Baker lab have hit on a similar strategy when designing proteins using Rosetta (the design software that underlies Foldit). We either disable alanines and glycines completely during early rounds of design, or we penalize them heavily by turning up their reference energy values. Usually we use leucine (for helices) or valine (for sheets) instead of phenylalanine to help get the spacing of the core, but it's the same general strategy. We'd like to improve our packing algorithm so that you don't have to do this, but for now, I'm glad that players are hitting on the same strategies (and sharing them).

v_mulligan
Joined: 03/04/2009
Groups: None
Good observations!

As always, Susume has made some very good observations. I just wanted to comment a bit on the glycine issue: those saying that the mutate function puts in too many glycines are absolutely right. It's a problem we see in our designs, too. The mutate function (which is really the "packer" in Rosetta, the protein design software that underlies Foldit) tries to search through many combinations of residue identities and side-chain conformations to give the one that scores the best (lowest energy/highest Foldit score). Glycines tend to be put in more often than they should for two reasons:

First, glycine lacks a sidechain, so anywhere that a sidechain would clash with other residues, the mutate function favours glycine. (The steric clash results in a very high repulsive energy term value, which the mutate function interprets as a pretty negative Foldit score. Glycine is the only one that doesn't show that penalty if the clash is with the first sidechain carbon.)

Second, glycine is the only achiral amino acid of the standard 20: where all of the other amino acid types have a "handedness" to them, differing from their mirror image forms, glycine is completely indistinguishable from its mirror image form. This means that glycine also has less stringent conformational preferences: where any other amino acid will prefer to twist one way but not the other, glycine is as happy twisting either way. If you already have a backbone that twists the "wrong" way, the mutate function will find that any other amino acid type will give a very poor backbone score at the backwards twist, but glycine will score fine, and so it will tend to settle on glycine.

Now, we have a fudge factor called the "reference energy" -- basically a penalty that we add for every glycine. I personally think that the glycine reference energy is set a bit too low (i.e. glycine is not penalized enough), and I tend to manually increase this value when I'm designing proteins myself. I also usually increase this value in the puzzles that I post, so that the mutate function won't favour glycine quite so much. Unfortunately, I have to guess at the best value for the glycine reference energy. Based on the feedback in this thread, I've bumped up the glycine reference energy a bit more for the new, 17-residue Marburg puzzle that just went live, so hopefully the mutate function won't put in quite as many glycines.
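The effect of the reference energy can be seen in a toy model. The Python sketch below is not the Rosetta packer: the function name and the energy numbers are invented for illustration, and a real packer searches combinations of residues and rotamers rather than one site at a time. It only shows how adding a per-residue-type penalty, or banning a type outright, changes which residue wins at a cramped position.

```python
def pick_residue(site_energies, reference_energies, banned=()):
    """Return the residue type with the lowest total energy at one site.

    site_energies: hypothetical energy of each residue type at this site
                   (lower is better; clashes would make it large).
    reference_energies: per-type penalty added on top (missing types get 0).
    banned: residue types that may not be chosen at all.
    """
    best_aa, best_total = None, None
    for aa, energy in site_energies.items():
        if aa in banned:
            continue
        total = energy + reference_energies.get(aa, 0.0)
        if best_total is None or total < best_total:
            best_aa, best_total = aa, total
    if best_aa is None:
        raise ValueError("every residue type is banned at this site")
    return best_aa


# Made-up numbers for a tight spot where larger sidechains clash slightly.
tight_spot = {"G": 0.5, "V": 1.2, "L": 1.5}
print(pick_residue(tight_spot, {"G": 0.0}))                # G
print(pick_residue(tight_spot, {"G": 1.0}))                # V
print(pick_residue(tight_spot, {"G": 0.0}, banned={"G"}))  # V
```

Raising the glycine penalty from 0.0 to 1.0 in this toy case is enough to make valine win, which is the same kind of nudge as bumping up the glycine reference energy in a puzzle.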

For more information on glycine, take a look at the Wikipedia article on Ramachandran plots: https://en.wikipedia.org/wiki/Ramachandran_plot. This is a two-dimensional graph of the backbone energy of an amino acid residue type, as a function of the two rotatable mainchain bonds, phi and psi. You'll notice that the plots for all other amino acids have two clear low-energy regions (one associated with alpha-helices and one with beta-sheets), but glycine's plot is symmetric, with four low-energy regions (the additional two corresponding to mirror-image helices and sheets, since glycine lacks the handedness of the other amino acids).
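The symmetry argument can be sketched in a few lines of Python. The rectangular regions below are crude, hand-picked boxes standing in for the alpha and beta basins, not real Ramachandran energies, and the function names are made up. The only point is that mirroring a conformation flips the sign of both phi and psi, so an achiral residue is happy wherever either the point or its mirror image is.

```python
# Very rough (phi range, psi range) boxes in degrees; illustrative only.
REGIONS = {
    "alpha": ((-160, -45), (-70, -20)),
    "beta": ((-180, -45), (90, 180)),
}


def in_chiral_region(phi, psi):
    """True if (phi, psi) falls in one of the toy alpha or beta boxes."""
    for (phi_lo, phi_hi), (psi_lo, psi_hi) in REGIONS.values():
        if phi_lo <= phi <= phi_hi and psi_lo <= psi <= psi_hi:
            return True
    return False


def favourable(phi, psi, is_glycine=False):
    """Toy check of whether a backbone conformation is low-energy."""
    if in_chiral_region(phi, psi):
        return True
    # Glycine is its own mirror image, so the mirrored point (-phi, -psi)
    # is just as good for it as the original.
    return is_glycine and in_chiral_region(-phi, -psi)


print(favourable(-60, -45))                  # True: ordinary helix
print(favourable(60, 45))                    # False: mirror-image twist
print(favourable(60, 45, is_glycine=True))   # True: glycine does not mind
```

In this toy picture, a backbone stuck at the mirror-image twist scores badly for every residue type except glycine, which is why mutate tends to settle on glycine there.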

v_mulligan
Joined: 03/04/2009
Groups: None
On the design strategy

Incidentally, Susume's suggestions for a design strategy are worth paying attention to. Sheets that are amphipathic (meaning that they have hydrophobic/orange and hydrophilic/blue character on different faces) are often much more likely to fold than sheets that are all hydrophilic, and it is important that every strand have some hydrophobic character. Unfortunately, if other backbone elements aren't already placed perfectly, the game will tend to think that both sides of a sheet are facing the surrounding water and should be covered with hydrophilic (blue) amino acids. The ability to say rationally, "This face should be hydrophobic and this face should be hydrophilic," is the sort of thing that we rely on you, the human players, to do, since our automatic algorithms tend not to be intelligent enough. (That's not to say that we're not trying to find ways of making them better, and at least in middle strands of sheets, the game will often put in hydrophobics where they should be. But you, the players, are still smarter than the computers!)

I'll add to this that this also applies to helices: helices that are hydrophobic (orange) down one side and hydrophilic (blue) down another are often more likely to form and more likely to pack against other hydrophobic surfaces.
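For helices the in/out pattern is less obvious than for sheets, because an alpha-helix has about 3.6 residues per turn, so each residue is rotated roughly 100 degrees from the previous one. The Python sketch below uses that idealized geometry to mark which positions land on one face. The function name, the 90 degree half-width, and the choice of leucine and lysine as placeholder residues are assumptions for illustration only.

```python
def helix_pattern(length, core_aa="L", surface_aa="K", half_width=90):
    """Mark residues of an ideal alpha-helix that face one chosen side.

    Assumes 100 degrees of rotation per residue (3.6 residues per turn) and
    that residue 0 points straight at the core. Residues within half_width
    degrees of that direction are treated as core-facing.
    """
    seq = []
    for i in range(length):
        angle = (i * 100) % 360
        # Smallest angular distance from the core direction (0 degrees).
        offset = min(angle, 360 - angle)
        seq.append(core_aa if offset < half_width else surface_aa)
    return "".join(seq)


print(helix_pattern(8))  # LKKLLKKL
```

A real helix will not point residue 0 exactly at the core, so in practice the pattern would be shifted until the orange positions line up with the face that actually packs against the rest of the protein.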

Joined: 09/24/2012
Groups: Go Science
Once again: thanks, Susume!

By chance, I used to avoid the mutate function in design puzzles, just because I was testing a kind of "mutate no wiggle" script in order to mutate residues one by one to what I want. That helped me get good results in some design puzzles, by luck, as I notice now! In recent months I started using mutate again, only to save a lot of time... and my final results seem poorer.

What we can do with scripts like mnw is "keep hydrophobicity" or prohibit mutations to a list of AAs. But we lose the optimization embedded in the default mutate all function.

What I have not yet managed to do is a normal "mutate all" with AA prohibitions. Could this be made an option of the default mutate function?

Something like:

structure.mutate(hydrophobe, ((list of prohibited AAs)))

hydrophobe = nil, 1, 2 (1 = only try hydrophobic, 2 = only try hydrophilic)
list of AAs = g, ...
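To show what such an option might do, here is a small Python sketch of the requested behaviour. It is not Foldit's API: `allowed_residues`, `mutate_all`, the `score` callback and the hydrophobic/hydrophilic split used here are all hypothetical, and the real mutate function optimizes sidechain conformations as well as identities. The sketch only tries each allowed residue type at each position and keeps a change when the score improves.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
HYDROPHOBIC = set("AFILMVWY")  # illustrative split, not Foldit's own list


def allowed_residues(hydrophobe=None, prohibited=""):
    """Residue types a mutate call may try under the proposed options."""
    banned = set(prohibited.upper())
    out = []
    for aa in AMINO_ACIDS:
        if aa in banned:
            continue
        if hydrophobe == 1 and aa not in HYDROPHOBIC:
            continue  # 1 = only try hydrophobic
        if hydrophobe == 2 and aa in HYDROPHOBIC:
            continue  # 2 = only try hydrophilic
        out.append(aa)
    return out


def mutate_all(sequence, score, hydrophobe=None, prohibited=""):
    """Greedy position-by-position mutate; score(seq) is higher-is-better."""
    seq = list(sequence)
    choices = allowed_residues(hydrophobe, prohibited)
    for i in range(len(seq)):
        best_aa, best_score = seq[i], score("".join(seq))
        for aa in choices:
            trial = seq[:i] + [aa] + seq[i + 1:]
            trial_score = score("".join(trial))
            if trial_score > best_score:  # keep only strict improvements
                best_aa, best_score = aa, trial_score
        seq[i] = best_aa
    return "".join(seq)


def loves_glycine(seq):
    """Toy score that rewards glycine, mimicking the complaint in this thread."""
    return seq.count("G")


print(mutate_all("AVG", loves_glycine))                  # GGG
print(mutate_all("AVG", loves_glycine, prohibited="g"))  # AVG
```

Note that in this sketch a prohibited residue that is already present is left alone unless something allowed scores better, so existing glycines would still need to be changed by hand first.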

Joined: 03/30/2013
Groups: Go Science

While I agree with Susume that mutate needs to be used judiciously, I think it is important to remember that nature gave us glycine for a reason. We should not be afraid of using it. It is especially useful in constructing tight turns.

