Remix and Fragment Insertion
This post will cover the new Remix tool and the idea of Fragment Insertion in protein design.
A Fragment is a shape for a piece of protein backbone. Fragments can be of any size. A fragment of size 3 will be a shape for 3 residues in a row, size 9 will be for 9 residues.
When we insert a fragment, we are copying the shape of that fragment onto a piece of backbone. Think of it as copy/paste for a piece of backbone shape. Below you can see 3 different fragments in turquoise that were copied onto the backbone by Remix.
The collection of fragments that we copy/paste from is called a fragment library.
We want our fragment library to be filled with the best fragments possible - fragments that we’re confident are good shapes that will give our folds the highest chance of success.
So where does our fragment library come from?
Often, the best approach to protein folding (or anything, really) is to take what works and reuse it.
We have thousands of proteins from nature whose shape we already know. We’re certain that those shapes work because we have physical proof. By looking at these known shapes, we can look for fragments that are common in many natural proteins. We take these and make our fragment library out of them.
Then, when someone needs a shape for a piece of backbone, we look into our library and find fragments which we can copy/paste onto our protein. The tool that does this looking up and copy/paste is called the fragment picker.
Foldit's Fragment Pickers
Rebuild was the first and original fragment picker in Foldit. Rebuild picks from a library of fragments of size 3. When you run rebuild on a piece of backbone, it picks a random sub-piece of size 3 within your selection, looks up a fragment, then copies and pastes it onto your protein.
There are two problems with Rebuild. The first is that only having fragments of size 3 means that if you want a bigger fragment, you’re going to have to combine several smaller fragments, which is less scientifically valid.
The second and larger problem is that Rebuild isn’t very particular about which shapes it chooses. You just ask for fragments of size 3 and it gives them to you*. Then Rebuild does a bunch of work behind the scenes to try and make the fragments fit into your selection.
You can see the actual fragments Rebuild is trying to insert (no behind the scenes work to make it fit) by putting a cutpoint at one end of the selection and disabling cutpoint forces. Take a look at some of the results below:
As you can see, these fragments really don't fit very well. The blue band represents the gap between where the endpoint is and where it needs to be. Making them "fit" requires destroying the original fragment in the process.
Remix tries to solve both of these problems. Firstly, Remix's fragment library has fragments from size 3 up to size 9.
Second, and more importantly, when you ask for a fragment out of Remix, it instead looks for a fragment that will naturally fit between the ends of your selection.
Here are some results of Remix without any modification after insertion:
All of these fragments just fit. The yellow band shows you the cutpoint is already close enough to be closed. In reality, we still "fix" the fragments from Remix as well, but they only need minor adjustment, so the fragment is left intact.
What this means is that Remix is much better at leaving you with more scientifically valid fragments.
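The difference between the two pickers can be sketched in a few lines of Python. This is only a toy illustration, not Foldit's actual implementation: fragments are reduced to lists of 3D points, "fit" is approximated by comparing a fragment's end-to-end distance with the gap to be bridged, and the names `pick_any`, `pick_fitting`, and `tolerance` are all hypothetical.

```python
import math
import random


def end_to_end(fragment):
    """Distance between the first and last point of a fragment."""
    return math.dist(fragment[0], fragment[-1])


def pick_any(library, size, rng):
    """Rebuild-like toy picker: any fragment of the right size, geometry ignored."""
    candidates = [f for f in library if len(f) == size]
    return rng.choice(candidates)


def pick_fitting(library, size, gap, tolerance=0.5):
    """Remix-like toy picker: keep only fragments whose ends already span the gap."""
    return [
        f for f in library
        if len(f) == size and abs(end_to_end(f) - gap) <= tolerance
    ]


# A tiny made-up library of size-3 fragments (coordinates are illustrative only).
library = [
    [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)],  # fully extended
    [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0)],  # sharply bent
    [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (5.0, 3.6, 0.0)],  # slightly bent
]

gap = 5.4  # assumed distance between the two ends of the selection
random_pick = pick_any(library, 3, random.Random(0))
fitting = pick_fitting(library, 3, gap)
print(len(fitting))
```

In this sketch the random picker can return a fragment that leaves a large gap at the cutpoint, while the filtering picker returns only the fragment whose ends already land close to where they need to be.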
* Rebuild does take your backbone sequence and secondary structure into account when doing a lookup, but no conformation information.
Using The New Remix
Remixing through the UI
To Remix a piece of backbone, select the piece and hit the Remix button (or, in the original interface, right click and hit the Remix button). This will pop up the Remix UI.
Let's go over the UI here. First off, there are arrow keys left and right. These let you cycle through the various fragments that Remix found for this selection. You can see which fragment you're currently looking at in the text below the buttons. The first fragment is always what you started with before you ran Remix, and won't change anything.
The Stop button accepts the currently shown fragment. You can also use the stop button in the upper left hand corner of the screen, and it will have the same effect.
Next to the text showing which fragment you have selected, you can also see a score. This score allows you to get an idea of how well fragments score without having to close the tool and shake the selection. Keep in mind it's only a rough estimate, and is only useful for comparing the results relatively. Your final score will likely be nothing like the score shown here.
Lastly, we have the Plus button. This button gives you access to the quicksave functionality of the new Remix tool.
When you press the button, you will see a new button pop up above.
Pressing this new Plus button will quicksave this fragment to Slot 1.
After saving, you can click that quicksave button to go back to that fragment. Pressing Plus for an additional fragment will give you the option of saving to a new quicksave slot, or overwriting an existing one. You can also press the Stop button that replaced the Plus in order to cancel the quicksave.
When you've got all the fragments you want, press Stop and these fragments will be available in your Quicksave slots. Pressing Ctrl-1 through Ctrl-8 will give you access to them.
Remixing with Scripts
Remix can also be accessed via scripts. Here's a quick tutorial on how to use it:
The function call for Remix is
When you run this function, it will Remix your selection and place up to num_results different results into your quicksave slots, starting at slot start_quicksave. It will return the number of results that were actually inserted, as sometimes there will not be as many as you have requested available.
Let's look at an example:
If there were 3 or more results, this would print "3" and place the results in quicksave slots 5,6,7.
If there were only two results available, it would print "2" and you would only have results in quicksave slots 5 and 6.
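The slot-filling behaviour described above can be modelled in plain Python. This is not Foldit script code and does not call the Foldit API: `remix_to_quicksaves` is a hypothetical stand-in, and the quicksave slots are modelled as an ordinary dictionary.

```python
def remix_to_quicksaves(available_fragments, start_quicksave, num_results, slots):
    """Store up to num_results fragments in consecutive slots.

    Returns the number of fragments actually stored, which may be
    smaller than num_results when fewer fragments are available.
    """
    inserted = available_fragments[:num_results]
    for offset, fragment in enumerate(inserted):
        slots[start_quicksave + offset] = fragment
    return len(inserted)


# Four fragments available, three requested, starting at slot 5.
slots = {}
count = remix_to_quicksaves(["frag_a", "frag_b", "frag_c", "frag_d"], 5, 3, slots)
print(count)          # 3
print(sorted(slots))  # [5, 6, 7]

# Only two fragments available, three requested.
slots_two = {}
count_two = remix_to_quicksaves(["frag_a", "frag_b"], 5, 3, slots_two)
print(count_two)          # 2
print(sorted(slots_two))  # [5, 6]
```

A script that uses the return value this way can check how many slots were actually filled before trying to load from them.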
Fragment picking is best used for figuring out the loops of the protein. Loop shapes vary a lot more than other secondary structure, and so finding good loops is harder, and using actual fragments from real proteins becomes more important.
In general, it is best to use the larger fragments, since that gives you a bigger piece of good backbone in a way that several smaller Remixes may not.
Don't put too much value in the estimated score shown in the Remix UI. Differences of less than 100 points aren't very meaningful.
If Remix doesn't find anything, try selecting one more or one fewer residue on either side of the selection. Often this will be enough to give you a better range of results. This is easy to do in the Selection Interface, but requires some secondary structure reassigning in the Original Interface.
Lastly, after inserting a fragment, any changes to that selection will put you further and further from the fragment. As such, it's best if you can find a fragment that requires minimal modification to make it into your final design.
( Posted by jflat06 80 669 | Wed, 05/11/2016 - 00:16 | 5 comments )
Drug Design Update: Merck Molecular Force Field
Today we are releasing a new way to wiggle small molecules, the Merck Molecular Force Field (MMFF). This wiggle action uses a different scoring scheme (MMFF) than the current wiggle button. In the current implementation of wiggle, it is possible for the small molecule to “fold” in on itself. With MMFF, this shouldn’t be a problem. MMFF is also very good at optimizing hydrogen bonds and the torsional space of the small molecule. There are some optimization problems that need to be worked out with MMFF (as you can see from the videos), but for now, give it a try and let us know what you think! Note that, for now, this is only available through the selection interface.
In addition to the new wiggle of molecules, there are numerous bug fixes in this release. These include fixes for:
1) When switching menus, custom geometry does not disappear. (changing from design to pulling the molecule around)
2) Game freezes when rotamers are generated
3) Too much output in logs
4) When loop building around the ligand, game crashes
5) Replacing atoms crashes - a lot
6) Selection interface quirks
Enjoy, and please post all your bugs in feedback, and your suggestions and science questions right here in the thread.
( Posted by free_radical 80 1495 | Tue, 05/03/2016 - 19:06 | 10 comments )
Drug Design Update: Tool Talk
We are now ready to deploy a series of tools for small molecule drug discovery! Our goal is to release new tools on a rolling basis for the next couple of weeks. Because the tools are still being tested and not guaranteed to be bug free, we have created an “experimental” user group. This user group is open to everyone who wants to test the drug discovery tools; however, we do ask that you report any bugs that you find or any suggestions that you have for the interface. After all the tools have been tested, we will release the drug discovery tools to Foldit’s main client.
You should expect this build to be experimental. This means that you have a chance of losing your designs. Because of this, it is highly recommended that you keep a separate install outside of your main client. This will ensure that you do not lose your main game puzzle progress.
In order to get access for the experimental group, follow the steps for access to the devprev build. Instead of replacing “main” with “devprev” in the options.txt file, type “experimental”.
"update_group" : "experimental"
The first tool is a very simple one, an expansion of the tool released with the very first small molecule drug discovery puzzle. It is simply named the “Ligand Design Tool”. This tool provides the ability to change the identity of elements, create bonds between atoms, add a predefined set of fragments, delete atoms, and delete bonds. The Ligand Design Tool is available in both the Selection Interface and in the original Foldit interface.
You may want to change your view to the ligand design view. This will let you see the hydrogens on the small molecule, which can be used to extend the ligand by a single atom. You can access this by changing the view options in the advanced options menu.
In addition to releasing the first tool, we have three new tutorials. The tutorials are a work in progress and will be updated when we have some down time. The tutorials are particularly exciting as they follow the progression of a scientific team as they design an inhibitor for the FK506 binding protein (FKBP).
The puzzle that we are using for testing is Dihydrofolate Reductase (DHFR). Read on to get some more information on DHFR!
Finally, here is what to expect in the upcoming weeks. We will have a new blog post for each topic explaining in detail the science behind each tool and the tool's purpose:
First, there is ligand minimization with a new force field, MMFF, along with a set of new filters to guide you in designing small molecules: the rule of five filter and a similarity filter. We will have an explanation of all these features later on, but here is a video of the MMFF minimization.
There is also the ligand queueing interface. This tool allows us to give you small molecules that have been pre-identified (through experiments or virtual high throughput screening) that might bind in the target protein. This tool will also allow you to share your small molecule designs between your teams.
Finally, we are also creating a tool that lets you design small molecules the way medicinal chemists do. This is called reaction-based drug design, and it provides a synthetic pathway that organic chemists can follow to make the small molecule you create.
Additionally, I should be around in chat on Friday (April 29, 2016) around 2PM Eastern to help answer questions! Think of it like an "office hour", where you can drop in and get things answered versus an actual scientist chat.
A Ramachandran plot is a way to examine the backbone conformation of each residue in a protein. It was first used by G.N. Ramachandran et al. in 1963 to describe stable arrangements of individual residues of a protein. Today, a Ramachandran plot is frequently used by crystallographers to identify protein models with an unrealistic backbone.
As many of you may recall, each residue of a protein has two rotatable bonds, which we designate φ and ψ. If we take a protein structure and measure the rotations about these bonds (between -180 and 180 degrees), then we can plot each residue with respect to its φ (x-axis) and ψ (y-axis). The result is a Ramachandran plot, where each black point is a residue of the protein:
Certain rotations are more stable than others: white areas of the Rama plot are unstable, and a residue in this space will have a bad backbone score; colored areas of the Rama plot are more stable, and a residue in this space will have a better backbone score.
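For readers who want to see how such a rotation is measured, here is a minimal Python sketch of a dihedral (torsion) angle computed from four atom positions. It uses the standard atan2 formulation, which returns values in the -180 to 180 degree range used by the plot. By the usual convention, φ is measured over the atoms C(i-1), N, CA, C and ψ over N, CA, C, N(i+1); the function names and the test coordinates below are illustrative, not taken from Foldit.

```python
import math


def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def dihedral_degrees(p0, p1, p2, p3):
    """Signed rotation about the p1-p2 bond, in degrees."""
    b1 = sub(p1, p0)
    b2 = sub(p2, p1)
    b3 = sub(p3, p2)
    n1 = cross(b1, b2)  # normal of the plane through p0, p1, p2
    n2 = cross(b2, b3)  # normal of the plane through p1, p2, p3
    y = math.sqrt(dot(b2, b2)) * dot(b1, n2)
    x = dot(n1, n2)
    return math.degrees(math.atan2(y, x))


# Four made-up points a quarter turn apart about the central bond.
quarter_turn = dihedral_degrees(
    (1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0), (0.0, 1.0, 1.0)
)
print(round(quarter_turn, 6))
```

Applying this function to the backbone atoms of every residue gives one (φ, ψ) pair per residue, which is exactly one point on the plot.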
The stable areas of the Rama Map in Foldit are divided into four regions, called ABEGO regions, and are colored accordingly:
- Red: Right-handed helix (characteristic of α-helix)
- Blue: Right-handed strand (characteristic of β-strand)
- Green: Left-handed helix (uncommon, except for GLY)
- Yellow: Left-handed strand (very uncommon, except for GLY)
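As a rough illustration, a residue can be assigned to one of these four regions from its φ and ψ values. The sketch below is an assumption-laden example rather than Foldit's own code: the boundary values are the ones commonly used for ABEGO binning, the letter-to-color pairing is inferred from the descriptions in the list above, and the fifth ABEGO letter (O, for a cis peptide bond) is left out because it depends on a third angle. Note also that the bins cover the whole plot, while only part of each bin is actually a stable, colored area.

```python
def abego_region(phi, psi):
    """Assign a (phi, psi) pair, in degrees, to an ABEGO letter.

    Boundary values are assumed, not taken from Foldit.
    """
    if phi < 0:
        # Left half of the plot: right-handed helix or strand.
        return "A" if -75 <= psi < 50 else "B"
    # Right half of the plot: left-handed helix or strand.
    return "G" if -100 <= psi < 100 else "E"


# Assumed pairing of ABEGO letters with the colors listed above.
REGION_COLORS = {"A": "red", "B": "blue", "G": "green", "E": "yellow"}

for phi, psi in [(-60, -45), (-120, 130), (60, 45), (60, 180)]:
    letter = abego_region(phi, psi)
    print(phi, psi, letter, REGION_COLORS[letter])
```

Running the letters for consecutive residues together gives a short string, which is a compact way to describe a stretch of backbone.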
Because the 20 different amino acid types have different properties, each amino acid type has a slightly different Rama profile. For example, most amino acids have a side chain that would clash with the backbone in a left-handed helix, so maps of these residues have only a faint green region. However, glycine has no side chain and can easily adopt a left-handed helix conformation, so its map has a large, intense green region.
Mouse over a point in the Rama Map to see its residue type and number in the upper right corner.
Click on a point to see the specific Rama profile for its amino acid type; this also selects the residue in Selection Mode.
Click and drag a point to change the φ and ψ rotations of a single residue's backbone.
The viewport at the top of the Rama Map will focus on a selected residue, and simply shows the local configuration of the protein backbone around the selected residue. Each residue in the viewport is colored according to the ABEGO region in which it lies. The ABEGO coloring scheme can also be applied to the main Foldit console in the View Options with View->AbegoColor.
When designing a protein, there are usually a number of different loop backbones that can connect α-helices and β-strands. However, we've found that certain types of loops occur frequently in native proteins, and that these "ideal" loops can be distinguished by ABEGO patterns. For example, the most common way to connect two β-strands is by a short hairpin, with two residues in left-handed helix (green) conformation.
The Foldit Rama Map includes a gallery of ideal loops, located in the drop menus in the upper right corner. Each drop menu displays a handful of ideal loops that can be used to connect some combination of α-helices and β-strands. These are provided as a reference for Foldit players, and we encourage players to try to incorporate these loop structures in their designs. Within each drop menu, the most common loops are listed at the top, but a less common loop may be preferred depending on the precise layout of α-helices and β-sheets in a design!
The Rama Map will be available to use in selected design puzzles. It can be accessed from the Actions menu in the Original Interface, or from the Main menu in the Selection Interface. Try out the new Rama Map in the latest design puzzle!
( Posted by bkoep 80 1005 | Wed, 03/16/2016 - 01:53 | 10 comments )
Sheets and Barrels
One of our players recently asked an interesting question in the Forum, about structural components that differentiate beta-sheets and beta-barrels. We posed this question to the Baker Lab's beta-barrel specialist Anastassia Vorobieva, and here's what she had to say...
Question, by brow42:
We recently had a design puzzle that preferred sheets. Some players made a sheet sandwich and some made beta barrel. We all made hydrophobic cores. But what structural component in real proteins lead to one or the other?
Answer, by bkoep:
I'm not the expert on this, but I can tell you what I do know. Perhaps I can track down another Baker lab scientist to follow up...
In many beta barrels, there are key positions that adopt irregular backbone conformations to reshape the beta sheet. Some positions adopt a "beta bulge," in which an extra residue is inserted between two residues of a beta strand. In the primary sequence, this residue would interrupt the normal pattern of alternating polar and nonpolar residues. There are also "glycine kink" positions, in which a glycine residue deforms the beta sheet by adopting a conformation unfavorable for other amino acids.
In beta sandwich proteins (and in many other structures with beta sheets), the "edge strands" of a beta sheet are often sprinkled with polar residues on the core-facing side of the sheet (which is normally nonpolar). Sometimes these are residues like TYR, which has a hydrophobic region that can contribute to core packing, as well as a polar atom that can extend out into solvent to make hydrogen bonds.
Complete answer, by Anastassia Vorobieva, PhD:
As bkoep pointed out, the presence of glycines in the middle of the beta-sheet (which is rare in beta-sandwiches), the position of the bulges and the presence of edge polar residues are good discrimination criteria between beta-sandwiches and beta-barrels.
However, there is no easy answer and we still have no clear idea of how these structural elements interact with each other. For example, beta-bulges are present in both beta-sandwiches and beta-barrels. Only their position matters. And some beta-barrels have polar residues in their core, especially those that bind small molecules. And to make everything even more confusing, some beta-barrels are able to close without the presence of glycines in the sheet!
To get a little bit more into details, beta-strands like to have a right twist. In other words, the side chains and the hydrogen bonds tend to rotate clockwise along a beta-strand.
This twist of the individual strands results in the "fan-shaped" beta-sheet mentioned by Susume. However, twist can be constrained in strands located in the middle of a sheet, as such strands have to interact with neighboring strands that have their own twist. In beta-barrels, the curvature necessary to close the barrel is hardly compatible with the individual twist of the beta-strands. As a result, there are some key positions in the barrel where the strand just can't continue to twist to the right and simultaneously interact with its two neighbors. There are several strategies in native proteins to "reset" the twist in such regions:
- Placing one glycine, which is the only residue that can twist to the left.
- Placing a bulge, which forces the right twist, sometimes at the expense of hydrogen bonds.
- Reducing the number of inter-strand hydrogen bonds. In the barrels that are able to close without glycines, the strands typically interact with a larger offset.
To design beta-barrel proteins de novo, we are currently working on strategies to predict the key regions in the sheet where the twist will become a problem.
For your models in Foldit, here are some ideas to find twist problems:
- The side chains and the hydrogen bonds rotate to the right along the strand.
- If the twists of two neighboring strands are not coordinated, the side chains of two interacting residues will tend to bend towards each other. When the twists of two neighboring strands are well coordinated, the side chains are parallel to each other.
- The bending of the side chains towards each other is likely to cause several problems in the structure. These side chains are likely to clash with each other, and the local torsion of the backbone is likely to be unfavorable. As a consequence, the Foldit score will probably be negatively affected if one tries to force closure of a sheet that is more likely to fold into an open sandwich.
To summarize, the presence of glycines and polar edge residues are good discriminators between barrels and sandwiches. If they are not sufficient, look for clashes and score problems that would indicate that you are trying to force the closure of a sheet that is not meant to be closed.
Regarding the loop length mentioned above, it should not have an influence in a properly twisting sheet, as the twist of the turn is compatible with the overall twist of the strand. However, spvincent is absolutely right in the case where the twist of the strands is not properly adjusted. Then the constraints built up by that inappropriate twist will be especially high in the loop region, and a longer loop will help.
( Posted by bkoep 80 1005 | Tue, 03/08/2016 - 22:13 | 2 comments )