Foldit Drug Design Part Two
My name is Sandeep Kothiwale (aka fragmentor). I am continuing the Foldit drug design blog this week. I am a graduate student at Vanderbilt University, where I am developing the drug design module of Foldit. This post describes the shake/wiggle feature for small molecules, which is analogous to the one for proteins.
Drug molecules (small molecules) bind to a target molecule (a protein in our case) and affect the function of the protein. This change in protein function leads to the desired physiological effect of relieving disease or its symptoms. For example, Imatinib (Gleevec) binds and blocks an enzyme whose over-activity causes leukemia.
As with imatinib, all drug molecules bind their targets in a specific pocket in a particular 3D arrangement. For successful drug design, one needs to recapitulate the expected binding pose of the putative drug (ligand) on the protein. This requires determining a 3D structure of the ligand that is able to bind the target. A spatial arrangement that the atoms in a molecule can adopt with respect to each other is called a conformation. A molecule can adopt multiple freely interconvertible conformations by rotations about individual single bonds. Thus, enumeration of 3D conformations is essential in modeling ligand binding in Foldit. As you might know, we use the wiggle feature for enumerating side-chain conformations. This is accomplished using a set of rules that have been identified for the 20 or so amino acids from known protein structures in the Protein Data Bank (PDB). As you can imagine, enumerating small-molecule conformations is substantially more complex than wiggle for 20 or so amino acid side chains because of the large chemical space of small molecules.
Foldit will use an algorithm that I helped develop for sampling conformations of ligands. It uses information contained in the Cambridge Structural Database (CSD), a repository of small-molecule crystal structures (on a side note, the CSD group has let us use their database free of charge!). The algorithm uses a CSD-derived database, the csd-rotamer library, which contains statistics about the most commonly seen conformations of small molecular fragments. Given a molecule of interest, the algorithm determines which smaller fragments are part of it and uses the information in the csd-rotamer library to sample conformations.
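To make the idea concrete, here is a minimal Python sketch of fragment-based conformation enumeration. It is not the actual Foldit or csd-rotamer code: the library contents, the fragment names, and the function `enumerate_conformations` are all hypothetical, and the torsion angles are placeholders rather than CSD statistics. Each rotatable bond is matched to a fragment, and every combination of that fragment's preferred torsion angles gives one candidate conformation.

```python
from itertools import product

# Hypothetical rotamer library: fragment name -> commonly observed
# torsion angles in degrees. The values are illustrative, not CSD statistics.
ROTAMER_LIBRARY = {
    "C-C": [60.0, 180.0, 300.0],
    "C-O": [60.0, 180.0, 300.0],
    "aryl-C": [0.0, 90.0],
}


def enumerate_conformations(rotatable_bonds, library=ROTAMER_LIBRARY):
    """Yield one dict per conformation, mapping bond label -> torsion angle.

    rotatable_bonds is a list of (bond_label, fragment_name) pairs.
    """
    labels = [label for label, _ in rotatable_bonds]
    angle_choices = [library[fragment] for _, fragment in rotatable_bonds]
    for angles in product(*angle_choices):
        yield dict(zip(labels, angles))


# A made-up ligand with two rotatable bonds.
ligand_bonds = [("b1", "C-C"), ("b2", "aryl-C")]
conformations = list(enumerate_conformations(ligand_bonds))
print(len(conformations))  # 3 angles for b1 times 2 angles for b2
```

The number of combinations grows multiplicatively with the number of rotatable bonds, which is one reason a library of preferred angles is more practical than scanning every angle.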
During the drug-design process, a ligand will be built by adding fragments to base fragments. One could hit the ligand-wiggle button to sample conformations of the ligand and let Rosetta (Foldit’s engine) choose the conformation that best fits the binding pocket. We have a video of this cool technology above (and at the link). The video first shows a fragment being added to the base small molecule (shown in orange), and then at 26s, the new fragment rotates. We are using HIV protease as a test case. Check it out!
( Posted by fragmentor 76 2091 | Mon, 02/02/2015 - 18:14 | 4 comments )
It's been several months since CASP11 ended in August of 2014, and in December assessors presented their full analysis of CASP predictions at the CASP11 meeting in Mexico. You can see results for all teams and assessor presentations on the CASP website, but we'd like to focus for a minute on Foldit's performance.
There were other "would-be" blue ribbons, in which Foldit players produced first-rate models that we failed to select as our best. Although they did not select it as their top model, the GoScience team submitted the best overall prediction for target TR769! Likewise, WeFold used a Foldit model to develop the best overall prediction for target TR837!
Overall, we were outperformed most notably in the Refinement category by the FEIG team, which uses a vast amount of supercomputing power to explicitly simulate protein dynamics; and in the Contact-Assisted category by the LEE team, which was able to take advantage of "ambiguous contacts" that were not addressed in our Foldit puzzles.
Note: These rankings are calculated using GDT_TS, which is just one metric for evaluating model quality. The CASP website explains some other metrics that might be used to evaluate models.
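For readers curious about how GDT_TS is put together, here is a simplified Python sketch. GDT_TS averages the fraction of residues that lie within 1, 2, 4, and 8 angstroms of their native positions. The function name `gdt_ts` and the example distances are made up for illustration, and the sketch assumes the per-residue distances come from a single superposition that has already been computed. The official calculation searches over many superpositions for each cutoff, which this sketch skips.

```python
def gdt_ts(distances, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """Average fraction of residues within each distance cutoff (in angstroms).

    distances holds one model-to-native C-alpha distance per residue,
    assumed to come from a single, already computed superposition.
    """
    if not distances:
        raise ValueError("need at least one residue distance")
    fractions = [
        sum(d <= cutoff for d in distances) / len(distances)
        for cutoff in cutoffs
    ]
    return sum(fractions) / len(cutoffs)


# Four made-up residue distances, for illustration only.
example = [0.5, 1.5, 3.0, 9.0]
print(gdt_ts(example))
```

The result is on the 0 to 1 scale used in the energy plots in this post.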
With a bit of closer examination, we've concluded our troubles in the Refinement category can be divided into several distinct cases:
In the first case, Foldit players were very good at exploring solutions close to the native, but the solutions scored poorly and were not submitted as CASP predictions. Looking at the energy plot for target TR769 below, we can see that players found solutions with GDT_TS as high as 0.94(!), although their energies were less favorable than solutions near the starting model.
Energy plot for TR769. Every red dot represents a Foldit solution plotted against GDT_TS (where a value closer to 1.0 indicates a model closer to the native structure) and Rosetta energy (where a very negative value corresponds to a very high Foldit score). The blue dots represent the five solutions that were submitted as CASP predictions by the FOLDIT team. The vertical black bar represents the GDT_TS of the starting model.
In the second case, which was more common, the "energy funnel" looked good—meaning that models with better GDT_TS had more favorable energies—but Foldit players simply weren't able to explore solutions close to the native structure. In the energy plot for target TR782, for example, we can see that the best-scoring solutions at the bottom of the funnel were also the most similar to the native. Unfortunately, most Foldit players tended to move the protein away from the native conformation.
Energy plot for TR782. Every red dot represents a Foldit solution plotted against GDT_TS (where a value closer to 1.0 indicates a model closer to the native structure) and Rosetta energy (where a very negative value corresponds to a very high Foldit score). The blue dots represent the five solutions that were submitted as CASP predictions by the FOLDIT team. The vertical black bar represents the GDT_TS of the starting model.
Lastly, we saw a few targets that were both difficult to explore and difficult to score. For target TR803, solutions appeared to score better and better as they diverged from the native structure, and most Foldit players spent time moving away from the native.
Energy plot for TR803. Every red dot represents a Foldit solution plotted against GDT_TS (where a value closer to 1.0 indicates a model closer to the native structure) and Rosetta energy (where a very negative value corresponds to a very high Foldit score). The blue dots represent the five solutions that were submitted as CASP predictions by the FOLDIT team. The vertical black bar represents the GDT_TS of the starting model.
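One rough way to put a number on whether an energy funnel "looks good" is to correlate GDT_TS with Rosetta energy across solutions. The Python sketch below is only an illustration of that idea, not the analysis we ran: the helper `pearson` and all data points are made up. A strongly negative correlation means that models closer to the native have more favorable (more negative) energies, while a positive correlation matches the third case, where solutions score better as they diverge from the native.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences of at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant data")
    return cov / sqrt(var_x * var_y)


# Made-up points for illustration only, not Foldit data.
gdt = [0.60, 0.70, 0.80, 0.90]
funnel_energy = [-100.0, -120.0, -140.0, -160.0]      # better GDT_TS, lower energy
misleading_energy = [-160.0, -140.0, -120.0, -100.0]  # better GDT_TS, higher energy

print(pearson(gdt, funnel_energy))      # negative: funnel points toward the native
print(pearson(gdt, misleading_energy))  # positive: scoring favors diverging models
```

A single correlation cannot tell the first two cases apart on its own, since it says nothing about how far toward the native players actually explored.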
In the Contact-Assisted category, we were happy to find that Foldit players could use predicted contacts to make huge improvements in their solutions. In most cases, we posted an initial "Ts" puzzle with a limited set of simulated contacts, and then followed it up with a more complete set of "Tc" contacts. In every instance, more contacts resulted in better predictions.
For example, compare Foldit solutions for Ts/Tc827 with T0827, which was posted without contacts under the guise of 1005: De-novo Freestyle 44. Not only did additional contacts result in further exploration toward the native structure, but the complete contacts also reshaped the energy funnel to strongly favor solutions closer to the native!
Energy plot for T0827, Ts827, Tc827. Every red dot represents a Foldit solution plotted against GDT_TS (where a value closer to 1.0 indicates a model closer to the native structure) and Rosetta energy (where a very negative value corresponds to a very high Foldit score). The blue dots represent the five solutions that were submitted as CASP predictions by the FOLDIT team. The vertical black bar represents the GDT_TS of the starting model.
In the future, we'll be working to see how we can improve scoring in cases like TR769, and how to encourage more exploration for targets like TR782. We're encouraged by the progress Foldit players have made in the use of predicted contacts, and are looking forward to applying this method in future non-CASP efforts. A big thanks to all of our players for their tireless contribution to structural biology research!
( Posted by bkoep 76 871 | Wed, 01/28/2015 - 01:43 | 4 comments )
Foldit drug design introduction
My name is Steven Combs (aka free_radical). I am currently a post doc with a dual appointment at Vanderbilt University and Eli Lilly. I have been working with David Baker’s lab and the developers of Foldit to enable drug design in Foldit.
During one of the developer chats, it was mentioned that players wanted more updates on new developments in Foldit. I will try to update everyone as much as possible on my progress for drug design in Foldit and explain some of the scientific ideas behind the implementations in the game.
To start off, I would like to explain one component that has changed in Rosetta (the underlying software for Foldit) to enable drug design. Rosetta assigns properties to atoms based on the type of atom. These properties can be anything from whether the atom is a hydrogen bond donor/acceptor to whether the atom likes to be exposed to water or not. Further, numerical values used in scoring a residue based on its atoms can be assigned. Many of these values used in scoring are derived from the CHARMM force field, which was developed by Dr. Karplus (who just recently received a Nobel Prize in chemistry!).
While these values help with scoring the residue and atoms, they say little about the configuration of an atom in relation to the other atoms bonded to it. This is extremely important in drug design: to decide which bonds or atoms can be added or deleted, we need to know the configuration of the original atom. For example, if an atom is double bonded to another atom, can that atom form a triple bond? Does it have any free electrons to participate in another interaction? When building small molecules for drug design, these properties, or chemical rules, need to be known.
To do this, I, along with members from the Meiler lab, have worked to put new atom types into Rosetta. I will use the amino acid TYR as an example of the new atom types. Below is a diagram of TYR with some of the atoms labeled with their properties assigned by Rosetta using the old atom type scheme.
Several properties are encoded onto the atom, such as the carbon being aromatic and the oxygen being polar. These properties are very useful when scoring the side-chain, but we also need to add on a layer for encoding the configuration of the atom.
The rules that we use to encode the configuration are based on the geometric arrangement of each atom relative to the atoms it is bonded to and on the number of electrons in the bonds (referred to as Gasteiger atom types). For our TYR example, the aroC retains the same original properties, but we also now know its geometry.
The new atom type is C_TrTrTrPi. This means that the carbon has three bonds in the trigonal configuration. Trigonal configuration refers to the VSEPR rules. The Pi at the end of the name means that there is one pi-orbital in the system, occupied by one electron. That pi-orbital is free to interact with other hydrogens or other pi-orbitals to form a cation-pi interaction or pi-pi interaction, all of which are important for drug design (more on this topic in the future). For the oxygen, it is now labeled O_Te2Te2TeTe. This means that there are two lone pairs in tetrahedral configuration (sp3, Te2Te2) and two bonds in tetrahedral configuration (TeTe).
While amino acids will not see much use for these types of descriptors for drug design, small molecules will. For example, let's look at a cyano group, which is a common group used in drug design.
In the cyano group, the old Rosetta designation for the atom is aroC, but the configuration of that atom is very different from the aroC seen in TYR! If we were to modify the atom, how would we know the configuration of the bonds? This is where the power of the new atom type comes into play. With the new atom typing, we now know that the carbon is linear (the DiDi portion; Di=digonal/linear) and that it has two pi-orbitals (PiPi). This means that if we add or replace atoms, we know exactly where to place the new atoms and what types of interactions this atom can make.
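To show how much information is packed into these names, here is a small Python sketch that splits an atom type name into its element and orbitals. This is not Rosetta code: the function `parse_atom_type` is hypothetical, it only knows the four tokens mentioned in this post (Te, Tr, Di, Pi), and the full name C_DiDiPiPi for the cyano carbon is assumed from the DiDi and PiPi portions described above. A trailing 2 is read as a doubly occupied orbital (a lone pair); otherwise the orbital is taken to hold one electron.

```python
import re

# Tokens from the post: Te = tetrahedral, Tr = trigonal,
# Di = digonal (linear), Pi = pi orbital. An optional '2' follows a token.
TOKEN = re.compile(r"(Te|Tr|Di|Pi)(2?)")


def parse_atom_type(name):
    """Split a name like 'C_TrTrTrPi' into (element, [(orbital, electrons), ...])."""
    element, _, orbitals = name.partition("_")
    if not element or not orbitals:
        raise ValueError(f"unexpected atom type name: {name!r}")
    parsed, pos = [], 0
    while pos < len(orbitals):
        match = TOKEN.match(orbitals, pos)
        if match is None:
            raise ValueError(f"unknown orbital token in {name!r}")
        # A trailing '2' marks a doubly occupied orbital (a lone pair).
        parsed.append((match.group(1), 2 if match.group(2) else 1))
        pos = match.end()
    return element, parsed


# The third name is assumed from the DiDi and PiPi portions in the post.
for name in ("C_TrTrTrPi", "O_Te2Te2TeTe", "C_DiDiPiPi"):
    element, orbitals = parse_atom_type(name)
    lone_pairs = sum(1 for _, electrons in orbitals if electrons == 2)
    print(name, element, [kind for kind, _ in orbitals], "lone pairs:", lone_pairs)
```

With the orbitals separated out like this, a rule such as "this atom has a singly occupied pi-orbital available" becomes a simple lookup.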
While these modifications may seem small, they greatly enhance the ability of Rosetta for drug design. With the new atom types, we can combine/add/delete/modify residues and small molecules rapidly and with ease.
For the upcoming weeks, are there specific topics that you would like to be addressed? What would everyone like to hear about? If anyone has any questions on this subject, I will be more than glad to address them!
( Posted by free_radical 76 2091 | Wed, 01/21/2015 - 17:41 | 11 comments )
Ebola puzzle 1000
We've been quiet about Ebola for a while. I just wanted to let folks know that we have gone over the results from Puzzle 1000, and players have produced some very promising starting points for design. In particular, the top-scoring solution, which came from the GoScience team, has a couple of hydrophobic amino acid residues providing very nice shape-complementarity to the binding pocket, and also happens to form a nice beta-hairpin (with a couple of good backbone hydrogen-bonds) that can serve as a great starting point for further design. The GoScience design is shown in purple in the cross-section below, with the Ebola glycoprotein in green.
The second-place team, the Contenders, also filled the pocket quite well, using two aliphatic amino acid side-chains rather than an aliphatic and an aromatic. This also was in a hairpin conformation. The fact that players were hitting on a consistent backbone conformation over and over also helps us: it tells us that this is the backbone conformation that tends to fit here, narrowing our search.
There were a number of other interesting designs, too, even though some weren't the top-scoring. L'Alliance Francophone, for example, created a good design that filled the cavity well while simultaneously forming some good hydrogen bonds between the target and the peptide. Please continue to share your most interesting designs with the scientists, whether or not they're the top-scoring!
( Posted by v_mulligan 76 2091 | Thu, 01/15/2015 - 21:49 | 6 comments )
New IRC Server!
Today we're deploying something that we've been working on behind the scenes recently - the new IRC server!
Our old IRC server was using unsupported and unmaintained software, which made it difficult to update when we needed new features or bug fixes.
You likely won't see much of a change with the new server, but here are a few things that will be different:
* If you want to connect to the new server with your own IRC client, you will need to connect to port 6665 instead of port 6667 on irc.fold.it (We may switch this to the default 6667 at a later date).
* If you're using your own IRC client, you will need to repeat the process of configuring it to identify on the new server (adding your IRC key). You may have added your IRC key for the old server, but this is a new server, so you'll have to do it again.
* You will now only be able to join #global, #veteran, and your group channel.
* Group admins will automatically have oper privileges in their own groups (the group leader is able to flag who is and who isn't admin on the website).
* The server won't automatically ban you if you fail to identify before joining your group channel. As long as you identify, you should be able to rejoin immediately.
* The upgrades include an additional feature for downloading old puzzles. You can now download old puzzles just like you download recipes off the site. You need to be logged into chat in your Foldit client, and also logged into the website. When you are, there is a link on the Puzzle page, immediately above the comments section. Clicking that link will download the puzzle. (This will be enabled soon). Note that we don't officially support old puzzles, but if the puzzle is fairly recent, it should still work!
* The server and chat should be more reliable overall, barring some initial kinks that may have to be worked out.
What won't change:
* Behavior while using the Foldit client chat.
* You'll still IDENTIFY with NickServ as usual when using your own IRC client.
* You can still join #global and #veteran without identifying.
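For players using their own IRC client, the connection details above can be sketched in Python as follows. This is an illustration only: the helper `build_login_lines`, the nickname, and the placeholder key are made up, and the `PRIVMSG NickServ :IDENTIFY ...` form is the common NickServ convention, which we assume applies here. The sketch only builds the raw IRC lines and does not open a network connection. A real client would wait for the server's welcome reply before identifying.

```python
def build_login_lines(nick, irc_key, channels=("#global",)):
    """Return the raw IRC lines a client would send, in order."""
    lines = [
        f"NICK {nick}",
        f"USER {nick} 0 * :{nick}",
        # Assumed NickServ syntax; check your client's documentation.
        f"PRIVMSG NickServ :IDENTIFY {irc_key}",
    ]
    lines.extend(f"JOIN {channel}" for channel in channels)
    return lines


HOST, PORT = "irc.fold.it", 6665  # from this post; the port may move to 6667 later

for line in build_login_lines("example_player", "YOUR_IRC_KEY", ("#global", "#veteran")):
    print(line)
```

Most IRC clients do all of this for you once you enter the host, port, and IRC key in their settings.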
Any new clients that start up will connect to this server automatically, but you'll have to restart your old clients. It may take a while for everyone to restart and migrate to the new server.
Some of our players have already helped to test the new server, but there might be some bugs that we haven't found yet. Feel free to respond to this post if you notice anything that isn't working properly!
( Posted by jflat06 76 623 | Tue, 01/13/2015 - 19:55 | 5 comments )