Protein Design Critique: IL-7R Binder Redesign

Hey Foldit players, bcov here with an update on the IL-7R Binder Redesign puzzle series.

You’re doing great so far! I've looked at your solutions from the first 4 puzzle rounds, and I think a lot of your designs are going to work! I just wanted to remind everyone that, in addition to the Foldit score you get on each puzzle, in the end you'll also get a binding score based on our testing of these designs in the lab!

Designing a protein to fold precisely is a difficult problem! When we test your protein, we are testing whether the sequence you chose folds into the shape of your solution. In Foldit, you can change your solution into whatever shape you want, but in the lab your sequence might not fold into the shape you wanted. It took scientists decades to figure out the shape a given protein sequence folds into (they call this the Protein Folding Problem). The good news for you though is that the Protein Folding Problem has a really simple answer:

Proteins fold to their most energetically favorable state!
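"Most favorable" has a quantitative meaning. At equilibrium, the fraction of molecules in each state is proportional to exp(-E/kT), so a design only dominates if no competing state comes close to it in energy. Below is a minimal sketch with made-up energies (hypothetical values, in kcal/mol). It assumes kT of about 0.593 kcal/mol, which is roughly RT at 25 °C. These are not Foldit or Rosetta score units.

```lua
-- Boltzmann population sketch: the fraction of molecules found in each
-- state is proportional to exp(-E / kT). All energies are hypothetical.
KT = 0.593 -- kcal/mol, roughly RT at 25 C (assumption for illustration)

function boltzmann_fractions(energies, kT)
  kT = kT or KT
  -- subtract the lowest energy first so exp() cannot overflow
  local emin = math.huge
  for _, e in ipairs(energies) do
    if e < emin then emin = e end
  end
  local weights, total = {}, 0
  for i, e in ipairs(energies) do
    weights[i] = math.exp(-(e - emin) / kT)
    total = total + weights[i]
  end
  -- normalize so the fractions sum to 1
  for i = 1, #weights do
    weights[i] = weights[i] / total
  end
  return weights
end

-- design model at -10, two decoys close in energy, one far away
local p = boltzmann_fractions({-10, -9.5, -9.5, -5})
print(string.format("design state: %.2f", p[1]))
```

With these numbers the design model holds only about half of the population, even though it is the single lowest-energy state. The reason is that two decoys sit within 0.5 kcal/mol of it. The decoy at -5 contributes almost nothing.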

I want to emphasize a few guidelines you can use to ensure your designed fold is the most favorable state:
· Secondary structure - use lots of alpha-helices or beta sheets
· Puzzle score - try to have the best score for your chosen fold
· Short loops - you'll need to use loops, but keep them as short as possible

Next I’ll show some examples and give my thoughts on a few designs from Foldit players. Please note that all of these designs have been chosen because they showcase a single weakness in an otherwise excellent design. We don't mean to disparage anyone's designs—on the contrary, the solutions highlighted in this critique are among our favorites!

A study of two 3-helix bundles

While both of these structures emphasize secondary structure and well-packed cores, design A is more likely to fold because of its shorter loops.

The reason we prefer secondary structure to loops is that loops typically have many alternate conformations (decoys) that score as well as, or even better than, the design model. Shorter loops mean fewer decoys and a better chance of folding as intended. For instance, one can imagine how the loop of design B could misfold so that the third helix is on the wrong side of the bundle.
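A crude counting argument shows why loop length matters so much. Suppose each loop residue can independently adopt about 3 distinct backbone states. This figure is a simplifying assumption, not a Rosetta or Foldit number. Under that assumption, the number of loop conformations grows exponentially with loop length.

```lua
-- Rough count of loop conformations, assuming each loop residue can
-- independently adopt STATES_PER_RESIDUE backbone states (an assumption).
STATES_PER_RESIDUE = 3

function loop_conformations(loop_length, states)
  states = states or STATES_PER_RESIDUE
  return states ^ loop_length
end

for _, n in ipairs({2, 4, 8}) do
  print(n .. "-residue loop: ~" .. string.format("%.0f", loop_conformations(n)) .. " conformations")
end
```

Under this assumption, going from a 4-residue loop to an 8-residue loop multiplies the count by 81 (from 81 to 6561). Each extra conformation is a potential decoy.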

Bad beta-sheet, better beta-sheet, best beta-sheet

Beta sheets are a tricky secondary structure, because they require distant parts of the protein chain to come together. The point I want to highlight here again is that shorter loops are almost always better. In design C, there are too many loop residues between the helices and sheets. These loop residues are likely to rearrange themselves in real life.

Design D has shorter loops, but I still see a few backbone H-bond pairs that are unsatisfied here. (Also, I'm not so sure about that ARG / GLU zipper there. ARG / GLU like to form helices, so I'd probably go with HIS / THR...)

Design E is an optimized Baker Lab design (not from the IL-7R series), but I wanted to include it to demonstrate my point. Look at how short those loops are! This is a difficult fold to master, but Foldit players like challenges, right?

4-helix bundles, the good, the bad, and the ugly

When it comes to 4-helix bundles (and really all designed proteins), the name of the game is compact. You want your design to resemble a ball, with every portion stabilized by at least 2 other secondary structures. Design H fails this test: it's too long and unsupported. This structure will almost certainly fold into something more compact in real life.
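One common way to put a number on "compact" is the radius of gyration (Rg) of the Cα atoms. Rg is the root-mean-square distance of the atoms from their centroid, and for the same number of residues a lower Rg means a more ball-like shape. Rg only measures overall compactness. It does not check the "supported by at least 2 other secondary structures" rule. The sketch below assumes you already have Cα coordinates as {x, y, z} tables in Å; extracting them (e.g. from a PDB file) is left out, and the points are made up.

```lua
-- Radius of gyration: RMS distance of the points from their centroid.
function radius_of_gyration(coords)
  local n = #coords
  local cx, cy, cz = 0, 0, 0
  for _, c in ipairs(coords) do
    cx, cy, cz = cx + c[1], cy + c[2], cz + c[3]
  end
  cx, cy, cz = cx / n, cy / n, cz / n
  local sum = 0
  for _, c in ipairs(coords) do
    local dx, dy, dz = c[1] - cx, c[2] - cy, c[3] - cz
    sum = sum + dx * dx + dy * dy + dz * dz
  end
  return math.sqrt(sum / n)
end

-- the same 8 points arranged as a stretched line vs. the corners of a cube
local line, cube = {}, {}
for i = 0, 7 do line[#line + 1] = {3.8 * i, 0, 0} end
for _, x in ipairs({0, 3.8}) do
  for _, y in ipairs({0, 3.8}) do
    for _, z in ipairs({0, 3.8}) do cube[#cube + 1] = {x, y, z} end
  end
end
print(radius_of_gyration(line), radius_of_gyration(cube))
```

For these made-up points, the stretched line has an Rg of about 8.7 Å and the cube about 3.3 Å.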

Design G also fails this rule, as it leaves a large portion of the structure thin and unsupported. Those two helices would have been better placed on top of the protein, as in the good example here.

Yes, design F would be better if the helices were longer, but we didn't give players enough residues for that (unfortunately, we're limited to small proteins for our lab experiment). If you run out of residues for good helix packing, you can try beta-sheets. That said, previous experiments have shown that helices are more robust than beta-sheets. So if the choice is between an okay beta-sheet and an okay helix, I'd go for the helix.

Don't try to make additional target contacts

First let me say that these designs are very interesting in that they make additional contacts with the target. Especially in design I, I'm not even sure I could design that with all the tools I have! But, I want to remind everyone that in this design challenge, folding is more important than binding.

You've already been given two helices that are guaranteed to bind the IL-7R. If you can just fold the rest of the protein into a stable fold then you'll have a binder!

Great 3-helix bundle, but that long loop isn't going to fly

Finally, one more design to really hammer home the message of shorter loops. Design K looks great with three well packed helices, but look a little closer and you'll see that a long loop is required to stretch back and meet the third helix. I'll admit, this protein has a chance to work, but with a loop that long, who knows where the final helix will actually fold...



We have a lot more puzzles planned for this series, and we look forward to seeing more designs from Foldit players! Round 5 just closed, and we'll get started on the analysis of those solutions right away. In the meantime, check out the Round 6 puzzle, which is online now!

( Posted by  bcov 79 940  |  Fri, 08/16/2019 - 17:57  |  12 comments )
Joined: 05/19/2009
Groups: Contenders
What tools do you use to make the Lab's designs?

Hi bcov, you write, "I'm not even sure I could design that with all the tools I have!" What tools do you use to design your proteins? Do you use our version of Foldit or a more mathematical approach? I sometimes wish I had more exact control over which parts are stable and which parts can move.
Regards,
BP

bcov
Joined: 11/08/2016
Groups: None

Hi Bletchley,

My tools are much more mathematical as you said. I'm also trying these puzzles just like you guys, but my strategy relies much more on using computers than critical thinking. Additionally, I have a library of 40,000 65aa proteins that I can use as a guide.

There are two primary ways that I'm tackling this problem:
1. High speed building. Here, I simply take chunks of my 40,000 proteins and graft them onto the puzzle. Almost none of them fit, but a few do. Then I look for combinations of two that fit together. For all pairs that work, I then go forward with sequence design, and anything that passes my filters I call finished.
2. High speed grafting. Here, I take my 40,000 proteins and see if any are exact matches for the puzzle. I get even fewer hits here as this is a very constrained search, but the few I do get are basically ready to go. All I need to do is copy the puzzle AAs onto my protein and it's ready to order.
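The two strategies above can be sketched as filters over a library. Everything in this sketch is hypothetical. The predicate arguments `fits_puzzle`, `pair_compatible`, `passes_filters` and `matches_exactly` stand in for the real geometric checks, sequence design and filters, which aren't described in detail here.

```lua
-- Hypothetical sketch of the two library searches; the predicates are
-- placeholders for real geometry checks, sequence design and filters.

-- 1. High speed building: keep chunks that fit on their own,
--    then keep pairs of fitting chunks that also fit together.
function build_candidates(chunks, fits_puzzle, pair_compatible, passes_filters)
  local fitting = {}
  for _, chunk in ipairs(chunks) do
    if fits_puzzle(chunk) then fitting[#fitting + 1] = chunk end
  end
  local finished = {}
  for i = 1, #fitting do
    for j = i + 1, #fitting do
      local pair = {fitting[i], fitting[j]}
      if pair_compatible(pair[1], pair[2]) and passes_filters(pair) then
        finished[#finished + 1] = pair
      end
    end
  end
  return finished
end

-- 2. High speed grafting: keep whole proteins that already match the puzzle.
function graft_candidates(proteins, matches_exactly)
  local hits = {}
  for _, protein in ipairs(proteins) do
    if matches_exactly(protein) then hits[#hits + 1] = protein end
  end
  return hits
end
```

This sketch treats pairs as unordered to keep it simple. A real search would probably also need to consider which chunk comes first along the chain.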

Anything that I produce looks great by the things I talked about in my blog post, but there's a problem. Given my 40,000 proteins and the puzzle, there are only so many combinations that are going to work and they all look pretty similar. This is why I was hoping Foldit players would be able to help. The space of protein designs that will work is huge and I know I'm only looking at a small corner of it.

As to why I couldn't make those designs: design I is the real gem here. You can't see it from the picture, but this design has loops that cover the surface, making H-bonds with just about every available atom. I wouldn't be able to make this because none of the "chunks" that I insert would match the target exactly like that. It really takes careful thought and subtle tweaks to make something like that (or absurd amounts of sampling). Anyway, I don't think that protein would fold, but it looks really cool.

actiasluna
Joined: 03/05/2015
Groups: Gargleblasters
The problem with the frozen structure

...for me, anyway, is that segment 102 faces an awkward direction and forces longer loops in any structure I've tried there. Given what is written above about loops, any structure attached via seg 102 may not fold on the side of the structure the person folding it initially intended.

So far, it seems difficult to make the ball-like structure you recommend, at least from the connections given. I'm assuming the locked structure is there to preserve the bonds you want to keep. If so, I understand the intention, but it may be one cause of the problems with the folds shown above.

The other cause is the limited number of additional segments. With an "awkward" loop on 102 and a limit of 5 segs, this is a tough one.

And, thinking more on this: what if some of those locked helix segs were able to move? If at least a couple more were unlocked, they could wiggle and mutate to allow fewer loop segs at those turns.

Unless I'm going about it all wrong, the blueprint tools aren't much help at those locked loop points either. This is way above my knowledge but I'll ask anyway - is there a way to make it so that in game these cut points can be joined as in puzzles without locked bits? Maybe figure out a way that the last 2 segs of the locked side of the cutpoint can be closed in play, but the segs can't be deleted? This would help visualize the structure, as cut points often look different when closed. Likely this question has been asked before about other such puzzles.

bkoep
Joined: 11/15/2012
Groups: None
Great feedback!

This is a very difficult problem for all the reasons you mentioned (and that's one of the reasons we're asking Foldit players for help!).

One suggestion for an awkward-facing cutpoint (like segment 102) is to try extending the helix through the cutpoint. Instead of starting your loop immediately at the cutpoint, you could first add one or two residues with a helix backbone (the red region of the Rama Map), so that the end of the helix points in a better direction. Then again, this is tough when you have a small budget for extra residues!
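To check whether residues added past the cutpoint stay in the helical region, you can compare their backbone dihedrals to ideal right-handed α-helix values (roughly φ = -57°, ψ = -47°). The box bounds below are an assumption chosen for illustration. They are not the exact region the Rama Map colors red, and the sketch does not read dihedrals from Foldit.

```lua
-- Hypothetical bounds (degrees) for a loose alpha-helical box on the
-- Ramachandran plot, centred near phi = -57, psi = -47.
HELIX_PHI = {-100, -30}
HELIX_PSI = {-80, -10}

function is_helical(phi, psi)
  return phi >= HELIX_PHI[1] and phi <= HELIX_PHI[2]
     and psi >= HELIX_PSI[1] and psi <= HELIX_PSI[2]
end

-- Count how many consecutive residues after the cutpoint continue the
-- helix; dihedrals is a list of {phi, psi} pairs in chain order.
function helix_extension_length(dihedrals)
  local n = 0
  for _, d in ipairs(dihedrals) do
    if not is_helical(d[1], d[2]) then break end
    n = n + 1
  end
  return n
end
```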

Part of the problem is simply visualization, like you point out. Although you can technically keep building the helix on the other side of the cutpoint, it still looks like your helix is broken by the cutpoint. Maybe we can find a way to let players close the cutpoint visually, while still keeping the cutpoint open behind the scenes (the cutpoint is important for keeping the frozen pieces in place while you manipulate the designable parts).

I also agree that we can improve how some of the tools (like the Blueprint) interact with cutpoints. These kinds of cutpoints are likely to keep showing up in future puzzles, so it's probably worth some attention.

LociOiling
Joined: 12/27/2012
Groups: Beta Folders
more complaints

These puzzles have their share of annoying little quirks.

The permanent cutpoints don't really give you a clue about whether they could be closed. Having two different colorings, like a regular cutpoint, would be helpful here.

The rebuild tool is disabled on these puzzles. The remix tool is there, but it will never find a replacement for segments involved in a permanent cutpoint. So it can be tough to correct bad backbone and ideality scores near the two permanent cutpoints.

Turning on "show backbone issues", and then clicking on the ! icons is one way to clear up problems, especially near the start.

One weird feature is the binder helices losing secondary structure. Some recipes set everything to loop but fail to restore the structure at the end, and the designable sections of the binder helices can be set to loop this way. For some reason, the auto structures tool won't reset the structure of these segments. I'm guessing it has something to do with the locked segments on either side. It's still possible to set the structure to helix manually, but it's kind of a pain.

jeff101
Joined: 04/20/2012
Groups: Go Science
Recipe "AA Copy Paste Compare v 1.1.1 -- Brow42" might help:

The recipe "AA Copy Paste Compare v 1.1.1 -- Brow42" at https://fold.it/portal/recipe/38147 might help you restore your secondary structures more quickly than doing it manually.

Joined: 09/24/2012
Groups: Go Science
And this recipe

Recipe: Reset my SS in note 19:

https://fold.it/portal/recipe/49556

It's quick to save and recall your own SS.
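Below is a minimal sketch of how saving and restoring secondary structure can work in a recipe. It is not the code of the recipes linked above. It assumes the Foldit Lua v2 functions `structure.GetCount()`, `structure.GetSecondaryStructure(i)` and `structure.SetSecondaryStructure(i, ss)`, so check the Foldit Lua reference for their exact behavior, especially on locked segments.

```lua
-- Save the secondary structure of every segment as one string, e.g. "LHHHHL".
-- Assumes the Foldit Lua v2 'structure' functions named above.
function save_ss()
  local ss = {}
  for i = 1, structure.GetCount() do
    ss[i] = structure.GetSecondaryStructure(i)
  end
  return table.concat(ss)
end

-- Restore a saved string, touching only segments whose structure changed.
-- Returns the number of segments that were reset.
function restore_ss(saved)
  local changed = 0
  for i = 1, math.min(#saved, structure.GetCount()) do
    local want = saved:sub(i, i)
    if structure.GetSecondaryStructure(i) ~= want then
      structure.SetSecondaryStructure(i, want)
      changed = changed + 1
    end
  end
  return changed
end
```

In a recipe, you would call `local s = save_ss()` before running anything that sets everything to loop, then call `restore_ss(s)` at the end. To try the functions outside Foldit, define a stand-in `structure` table that provides those three functions.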

spvincent
Joined: 12/07/2007
Groups: Contenders
Exterior sidechains

I've always been under the impression that the identity of the outside sidechains didn't matter too much, and that whatever the mutate tool saw fit to put there was fine. But the comment under structure D about replacing Arg/Glu with His/Thr suggests otherwise: are there recognized pairs of amino acids we should be using for the outside of sheets?

bkoep
Joined: 11/15/2012
Groups: None
SS Propensity

For the most part, you're correct that Mutate will give you reasonable residues at any position.

However, if we look at the distribution of amino acids in natural α-helices and β-sheets, then we do see some subtle SS preferences for certain amino acids. Here's a table that summarizes some early data from Chou and Fasman's research in the 1970s:
http://www.bmrb.wisc.edu/referenc/choufas.shtml

We've also seen similar results in more recent, better-controlled protein design experiments. Rocklin et al. found some surprising trends in their design of mini-proteins (this is why we sometimes prohibit SER and THR in α-helices). See Figure 4 in this paper:
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5568797/
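As a small illustration of the SER/THR rule mentioned above, the sketch below flags S or T positions that fall inside helices. It takes a one-letter sequence and a matching secondary-structure string (H = helix, E = strand, L = loop). That input format is an assumption, and the sketch does not read anything from Foldit.

```lua
-- Return the positions where SER (S) or THR (T) sits in a helix (H).
function ser_thr_in_helices(sequence, ss)
  assert(#sequence == #ss, "sequence and ss must be the same length")
  local flagged = {}
  for i = 1, #sequence do
    local aa = sequence:sub(i, i):upper()
    if ss:sub(i, i) == "H" and (aa == "S" or aa == "T") then
      flagged[#flagged + 1] = i
    end
  end
  return flagged
end

local hits = ser_thr_in_helices("MSEELLKTAGSL", "LHHHHHHHHLLL")
print(table.concat(hits, ", "))
```

Here the SER at position 11 is not flagged because it sits in a loop.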

To most players I recommend sticking with the Foldit Mutate tool, which does take some of this data into account, and is optimized to give you the best Foldit score. But if you are interested in exploring SS propensities of your designs, then you could try plugging your sequence into a SS-prediction server like PSIPRED:
http://bioinf.cs.ucl.ac.uk/psipred/

Joined: 09/24/2012
Groups: Go Science
A table of useful information about the residues

Hopefully, the AA information on the Wiki (for Lua scripting) is still correct:

https://foldit.fandom.com/wiki/Lua_Script_Library#fsl.Aminos_--_A_table_of_useful_information_about_the_residues.

It gives the following information about any amino acid:
{short, abbrev, longname, hydro, scale, pref, mol wt, isoelectric point (pI), van der Waals volume, abbrevcap, codon}

abbrev is used to call or recognize the AA in the Lua functions
abbrevcap is used in the residue info (shown when you press Tab on a residue)
longname is the AA name
hydro gives the hydrophobicity
pref gives the preferred secondary structure

It's used in several mutate recipes, such as:
Mutate No Wiggle
Maaa

Additionally, jeff101 collected more information from the literature. I used it in the Maaa recipe (probabilistic options). Here is a copy of (an extract from) jeff101's library:

--Library of Amino Acid properties by jeff101 (Foldit player)
--v1 22 Jan 2016
-----------------------------------------------------------------
--START Library of Amino Acid properties by jeff101
--Free for non-commercial use provided the source is acknowledged
--Source: jeff101 (Copyright) based on the following sources:
-- L75=Biochemistry by Lehninger 2nd Ed 1975 ISBN=0-87901-047-9
-- from Sigma 1998 catalog
-- JJR Frausto da Silva and RJP Williams' The Biological Chemistry of the Elements ISBN=0-19-855802-3
-- L&B=SJ Lippard and JM Berg's Principles of Bioinorganic Chemistry ISBN=0-935702-73-3
-- Alberts' Molecular Biology of the Cell ISBN=0-8240-3695-6
-- CRC 1980-1 61st Edition
-- C&SI=Cantor & Schimmel's Biophysical Chemistry Part I: The conformation of biological macromolecules ISBN=0-7167-1188-5
-- B&T=Branden & Tooze's Introduction to Protein Structure ISBN=0-8153-0270-3
-- Stryer=Lubert Stryer's Biochemistry 4th Edition ISBN=0-7167-2009-4
--
-- LUA transposed by Bruno Kestemont
--USE: All the AAs and properties are in the same order
--Copy-paste the properties you need in your recipes, comment the other ones in order to avoid duplications of names.
--EXTRACT: this is only an extract for Maaa
AAshort ={'a','c','d','e','f','g','h','i','k','l','m','n','p','q','r','s','t','v','w','y'} --
AAname ={'Alanine','Cysteine','Aspartic Acid','Glutamic Acid','Phenylalanine','Glycine','Histidine','Isoleucine','Lysine','Leucine','Methionine','Asparagine','Proline','Glutamine','Arginine','Serine','Threonine','Valine','Tryptophan','Tyrosine'} --p.73 L75

FreqInProt ={8.3,1.7,5.3,6.2,3.9,7.2,2.2,5.2,5.7,9,2.4,4.4,5.1,4,5.7,6.9,5.8,6.6,1.3,3.2} --frequency in proteins (%)
VolumeVDW ={67,86,91,109,135,48,118,124,135,124,124,96,90,114,148,73,93,105,163,141} --VDW volume in cubic A
Pa ={1.41,0.66,0.99,1.59,1.16,0.43,1.05,1.09,1.23,1.34,1.3,0.76,0.34,1.27,1.21,0.57,0.76,0.9,1.02,0.74} --Pa=a-helix preference
Pb ={0.72,1.4,0.39,0.52,1.33,0.58,0.8,1.67,0.69,1.22,1.14,0.48,0.31,0.98,0.84,0.96,1.17,1.87,1.35,1.45} --Pb=b-strand preference
Pt ={0.82,0.54,1.24,1.01,0.59,1.77,0.81,0.47,1.07,0.57,0.52,1.34,1.32,0.84,0.9,1.22,0.9,0.41,0.65,0.76} --Pt=reverse turn preference

fa ={0.522,0.278,0.351,0.549,0.402,0.19,0.446,0.358,0.383,0.48,0.429,0.263,0.212,0.421,0.282,0.282,0.295,0.409,0.409,0.22} --fa=freq helical
fb ={0.167,0.222,0.137,0.044,0.219,0.138,0.122,0.274,0.126,0.209,0.286,0.113,0.106,0.211,0.154,0.124,0.205,0.282,0.203,0.22} --fb=freq beta

Hfreq ={1.29,1.11,1.04,1.44,1.07,0.56,1.22,0.97,1.23,1.3,1.47,0.9,0.52,1.27,0.96,0.82,0.82,0.91,0.99,0.72} --alpha-helix frequency
Sfreq ={0.9,0.74,0.72,0.75,1.32,0.92,1.08,1.45,0.77,1.02,0.97,0.76,0.64,0.8,0.99,0.95,1.21,1.49,1.14,1.25} --beta-sheet frequency
Lfreq ={0.78,0.8,1.41,1,0.58,1.64,0.69,0.51,0.96,0.59,0.39,1.28,1.91,0.97,0.88,1.33,1.03,0.47,0.75,1.05} --beta-turn frequency
--END Amino Acid properties by jeff101 (EXTRACT)
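The tables above are parallel arrays in the same order as `AAshort`. The sketch below builds a lookup from one-letter code to propensity and then averages Pa (helix) and Pb (strand) over a stretch of sequence, in the spirit of a Chou-Fasman check. The Pa/Pb values are copied from the extract above. Plain averaging over a window is a simplification of the full Chou-Fasman procedure, and `by_letter` and `average_propensity` are hypothetical helpers, not part of Maaa.

```lua
AAshort = {'a','c','d','e','f','g','h','i','k','l','m','n','p','q','r','s','t','v','w','y'}
Pa = {1.41,0.66,0.99,1.59,1.16,0.43,1.05,1.09,1.23,1.34,1.3,0.76,0.34,1.27,1.21,0.57,0.76,0.9,1.02,0.74}
Pb = {0.72,1.4,0.39,0.52,1.33,0.58,0.8,1.67,0.69,1.22,1.14,0.48,0.31,0.98,0.84,0.96,1.17,1.87,1.35,1.45}

-- Build a one-letter-code -> value lookup from a parallel array.
function by_letter(values)
  local map = {}
  for i, aa in ipairs(AAshort) do map[aa] = values[i] end
  return map
end

-- Average a propensity table over a sequence; letters not in the table
-- are skipped, and nil is returned if nothing matched.
function average_propensity(sequence, map)
  local sum, n = 0, 0
  for aa in sequence:lower():gmatch("%a") do
    if map[aa] then sum, n = sum + map[aa], n + 1 end
  end
  if n == 0 then return nil end
  return sum / n
end

local PaMap, PbMap = by_letter(Pa), by_letter(Pb)
local seq = "AEELLKK"
print(string.format("<Pa> = %.2f, <Pb> = %.2f",
  average_propensity(seq, PaMap), average_propensity(seq, PbMap)))
```

For this helix-like stretch, the average helix preference (about 1.39) is well above the average strand preference (about 0.80).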

brow42
Joined: 09/19/2011
Groups: None
disordered loops

bcov and bkoep, can you discuss when, or if, unstructured loops like the https://en.wikipedia.org/wiki/Omega_loop can be allowed in a protein? For example, many of the low-density regions in an electron density map seem to be very floppy, yet they are apparently allowed in nature.

Does Rosetta's energy function penalize disordered loops more than nature does? Or is it simply impossible to determine where a floppy loop can exist without destabilizing the protein?

bkoep
Joined: 11/15/2012
Groups: None
Great question!

If two regions of a protein interact strongly enough, then you can sometimes insert an unstructured loop between them and they will still fold together. However, this usually works best if each of the regions can fold okay on its own (as an independent, well-folded domain). If the regions are small and poorly-folded on their own, then the unstructured loop is more likely to confound folding.

For some protein design projects, we need to fuse two protein domains together as a single chain, but we don't want to engineer a structured loop between them. In this case, we'll connect them with a "GS-linker" which is just a long stretch of alternating GLY and SER residues (GLY is very flexible and tends to be unstructured; SER is polar and helps to keep the linker soluble).
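As a small illustration of the GS-linker described above, the sketch below builds a strictly alternating GLY/SER linker of a given length and splices it between two domain sequences. The domain sequences are hypothetical placeholders. Exact linker patterns and lengths vary between projects, and repeats such as GGGGS are also common. The alternating pattern here simply follows the description above.

```lua
-- Build a flexible "GS-linker" of the given length by alternating
-- GLY (G) and SER (S), starting with G.
function gs_linker(length)
  local parts = {}
  for i = 1, length do
    parts[i] = (i % 2 == 1) and "G" or "S"
  end
  return table.concat(parts)
end

-- Fuse two hypothetical domain sequences with a 10-residue linker.
local fused = "MKTAYIAK" .. gs_linker(10) .. "PEDLMKRV"
print(fused)
```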

Rosetta (the modeling software underlying Foldit) is not very well suited for modeling disordered loops. Rosetta is pretty good at evaluating the energy of a single, rigid state, but people tend to prefer molecular dynamics simulations to model flexible or dynamic regions of a protein.

Also, disordered loops are generally less useful for designed proteins, where we are mostly interested in precise control over the entire shape of the protein. It's true that a lot of natural proteins have disordered loops and that these loops can be essential for natural protein functions; but the exact role of a dynamic loop is very difficult to study, much less to design for. In Foldit we strongly discourage players from designing long unstructured loops, because they confer little—if any—benefit, and they are very likely to disrupt folding.

Developed by: UW Center for Game Science, UW Institute for Protein Design, Northeastern University, Vanderbilt University Meiler Lab, UC Davis
Supported by: DARPA, NSF, NIH, HHMI, Amazon, Microsoft, Adobe, RosettaCommons