Introducing the Trim Tool!
Hey Foldit players!
We're very excited to announce the addition of Foldit's newest feature: the Trim tool.
Trimming and Untrimming make it much easier for players to work with larger proteins than ever before!
In the past, it's been difficult to work with large proteins in Foldit because core features (such as scoring and displaying proteins) are computationally intensive, and their cost grows with the size of the protein.
The Trim tool allows for the use of larger proteins without a sacrifice in performance by allowing users to trim a protein down to some set of selected residues.
Simply select the region you want to work with, and use the U hotkey to reduce the puzzle down to the selected region. Once you're done working with that region, hit the U hotkey again to Untrim and return to the full puzzle.
(Posted by apetrides 45 262 | Mon, 05/09/2022 - 17:16 | 0 comments)
KLHDC2: New Small Molecule Design Puzzle Series
We’re excited to announce the start of another round of small molecule design puzzles.
We are again looking at molecules which can bind E3 ligases to be used in the future design of PROTAC drugs. This time, we’re focusing on KLHDC2 (Kelch domain-containing protein 2). KLHDC2 is an excellent target for PROTAC binders, as it’s expressed in a wide variety of tissues (so useful for a range of degradation targets) and is conserved across species (useful for early testing in animal models). It’s also a rather tight binder to its target ligand, whose structure in complex with the protein is known. However, we currently only have peptide binders for this protein. Peptides are problematic for potential drugs, as they’re easily broken down by your body, and can have issues getting into cells.
That’s where you come in. In this puzzle series, we’re giving you the structure of the peptide in complex with the KLHDC2 protein. We’re hoping you can rebuild this molecule to preserve the nice binding interactions, while simultaneously making it less like a peptide.
As with the previous VHL design round, we’re working with collaborators at Boehringer Ingelheim (BI), who have volunteered to evaluate the compounds which players design, and synthesize those molecules which look promising. BI also has an existing collaboration with researchers at Oxford University, who have signed up to test the synthesized compounds in their KLHDC2 binding assays.
To help you along, here are some considerations to keep in mind:
Carboxylate interaction: The core of the interaction between the ligand and the protein is the hydrogen bonding and electrostatic (charge) interactions between the end carboxylate group and a pair of arginines and a serine in the protein. We recommend you keep that interaction (and the carboxylate structure) intact.
Hydrophobic interaction: Experiments indicate a large amount of binding energy comes from the interaction of the methionine sidechain with a hydrophobic pocket. We’d like you to maintain that hydrophobic interaction. The thioether (sulfur) in methionine isn’t ideal from a drug perspective, but you should be able to replace it with some other hydrophobic group.
Amides: The standard amide backbone of a protein is a liability for drugs. See if you can replace those amides while keeping the hydrogen bonds and other hydrophilic interactions they make. Note it’s mostly the standard alpha amino acid backbone which is a problem. Amides in other contexts have fewer issues in drugs.
Drug properties: We want to improve the efflux and membrane permeability properties of the molecule (how well the drug is taken up and moves around the body). Metrics such as TPSA, clogP, hydrogen bond donors and hydrogen bond acceptors are helpful proxies for how the drug will behave in the body. Keep an eye on these objectives, and make sure they stay in range. It’s tricky, because these metrics are a balancing act: you need to keep the good binding to the protein while threading the needle of the multiple properties.
Torsion Quality: One of the significant issues we saw with the VHL results was rotatable bonds which were in a strained position. The new Torsion Quality objective is a good way to keep track of how strained the rotatable bonds are. We’ve updated things such that wiggling should help, but there’s also a new “Tweak Ligand” tool which should allow you to rotate bonds into a better position. Of course, if you reduce the number of rotatable bonds, they can’t be strained.
Atom number and bad groups: We want to make sure we stay within the “typical” range of number of atoms for each element for drugs, to make sure that synthesis is easy. Also, to work well as a drug, we need to avoid certain groups, either because they’re too unstable to actually be synthesized, because they’ll react or fall apart as soon as they enter the body, or because they are toxic or otherwise complicate the drug development process. (Note that groups which are bad in open chains may be perfectly fine when present in the context of rings.)
We realize that’s a lot of objectives to keep track of, but we’re hopeful Foldit players are up to the task of balancing all these objectives and coming up with novel molecules which will still bind the protein well.
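One way to picture the balancing act is a checklist that tests every property against its allowed window. The sketch below is not Foldit's scoring code, and the puzzle objectives define their own ranges. The limits used here are the widely quoted rule-of-thumb values for oral drugs (Lipinski's rule of five and the Veber TPSA limit), and the property values for the example molecule are made up.

```python
# Rule-of-thumb upper limits for oral drugs (Lipinski / Veber).
# These are NOT the ranges used by the Foldit puzzle objectives;
# they only illustrate the idea of checking several windows at once.
LIMITS = {
    "clogP": (None, 5.0),
    "h_bond_donors": (None, 5),
    "h_bond_acceptors": (None, 10),
    "tpsa": (None, 140.0),
}


def out_of_range(properties, limits=LIMITS):
    """Return the names of properties that fall outside their window."""
    failures = []
    for name, (low, high) in limits.items():
        value = properties[name]
        if low is not None and value < low:
            failures.append(name)
        elif high is not None and value > high:
            failures.append(name)
    return failures


# Hypothetical property values for a peptide-like starting molecule.
peptide_like = {
    "clogP": -1.2,
    "h_bond_donors": 7,
    "h_bond_acceptors": 9,
    "tpsa": 190.0,
}
print(out_of_range(peptide_like))
```

For this invented molecule, the check reports too many hydrogen bond donors and too high a TPSA, which is the typical pattern for a peptide and the kind of thing a redesign would aim to fix.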
All participants and game sponsors of current and future small molecule design games commit to complying with the Foldit Terms of Service including those pertaining to intellectual property. All compounds created as part of the collaboration puzzles will be made publicly available. Experimental results from testing the molecules will also be released publicly.
(Posted by rmoretti 45 271 | Wed, 04/20/2022 - 19:55 | 0 comments)
VHL Ligand Design Updates
In January we completed a series of 10 puzzle rounds with the goal of redesigning a small molecule which binds to the von Hippel-Lindau E3 ubiquitin ligase. (Blog Post) The hope was that Foldit players would be able to make novel changes to the core of the molecule while simultaneously improving the molecule in ways which would make it better as an oral (by mouth) drug.
From all your design efforts, we received over 6500 unique molecules. Some were kinda funky, but there were a number of promising ones in the mix.
Our collaborators at Boehringer Ingelheim (BI) were interested in seeing everything Foldit players were able to produce, so we sent all 6500+ compounds to BI to be evaluated. The first thing to do was to run some automated filtering to get rid of compounds which have major issues. This included compounds which extended too far out of the binding pocket (~150 excluded), as well as those compounds which were too large (~1000) or too small (~150), or fell outside of the acceptable range for the numbers of atoms for various elements (~1000), or had too many rotatable bonds or rings (~1000). They were also filtered for compounds which were outside the desired range for the number of hydrogen bond donors & acceptors (~500) and clogP (~50). Finally, substructure searches were performed for groups which were reactive or otherwise would cause issues in a drug (~500) as well as other problematic groups such as long hydrophobic chains (e.g. aliphatic alkanes, aliphatic ethers, and aliphatic alkenes; ~500).
After all that filtering, approximately 1000 compounds were taken forward into more detailed evaluation. In this round of evaluation, the collaborators attempted to identify compounds which changed the central binding motif of the ligand (the hydroxyproline ring) while improving properties associated with better oral drugs, most notably the TPSA (topological polar surface area). The collaborators also double checked all of the original 6500 molecules, to make sure that the automated filtering didn’t accidentally throw out a promising molecule which could be easily fixed.
When evaluating the molecules, one major issue was noticed. There were a large number of molecules which may have had good binding scores by Foldit, but had strained bond rotations (torsions). It’s unlikely that the molecules would ever actually bind in that position due to the energetic strain those molecules were under. The Foldit score wasn’t capturing this energetic strain, but luckily there are methods developed by the Rarey group at the University of Hamburg which allow BI to evaluate how “abnormal” the torsions are. (We have now incorporated these methods into Foldit – see the recent release.)
From a combination of the torsion strain evaluation, the TPSA predictions, and other such metrics to predict how promising the compounds may be as an oral drug, our collaborators selected about 250 compounds. These included both compounds directly from Foldit players, as well as compounds which take their core idea from Foldit compounds, but which had been modified by BI chemists to improve certain properties or fix issues with synthesizability or the like. These compounds were then redocked in the protein to see if they would easily find the designed binding mode. Those compounds which could be redocked were further evaluated with more computationally involved binding energy prediction methods.
Finally, the promising compounds were evaluated and ranked by a number of experienced medicinal chemists at Boehringer Ingelheim, looking at how good a potential drug they might be, as well as how easy they might be to synthesize. From that evaluation, they came up with a ranked list of 19 compounds which were sent off to be synthesized. Congratulations to Bruno Kestemont, fiendish_ghoul, equilibria, NeLikomSheet and 5 other anonymous users (user name sharing form), whose work formed the basis of these molecules.
Synthesis of these compounds is now underway. Chemical synthesis is hard, so not all of these molecules might actually be created in the end. But early news is promising, and all of the compounds which are successfully synthesized will be submitted for testing in BI’s internal assays, both for ability to bind to the VHL E3 ligase protein, as well as how well they perform in efflux and permeability (the properties which affect how good an oral drug this might make, and what we were hoping to gauge through the TPSA measure).
As was mentioned at the start, all data generated from this project will be released publicly, with no restrictions on subsequent use. We don’t have the experimental assay data yet, but people interested in the full set of 6500 compounds which Foldit players have generated can download an SDF formatted file. (You should be able to open an SDF file in PyMol, Chimera or other structure viewing programs. Coordinates of the molecules should be placed for binding into the puzzle starting structure.)
(Posted by rmoretti 45 271 | Wed, 04/20/2022 - 19:46 | 1 comment)
Experiment results for IL-2R binders
We have lab results from our IL-2R binder experiments! In late 2021, we challenged Foldit players to design a protein binder for the IL-2 receptor, as a strategy to reduce the side effects of cancer immunotherapy. We sourced Foldit solutions to put together a pool of 1997 designs to test for IL-2R binding in the wet lab.
In short, we did not see any strong binders for the IL-2R target.
We tested the binder designs at the UW Institute for Protein Design, using fluorescence activated cell sorting (FACS). You can read more about the FACS technique in this previous blogpost. To recap, a FACS experiment lets us quickly sort through thousands and thousands of designs, which are displayed on yeast cells and tagged with fluorescent markers.
Below is a preview of the raw experiment results. You can download the data for all 1997 designs here.
pdb_id         counts1  counts2  counts3  counts4  counts5  counts6  counts7
2011731_c0001       31        0        0        0        0        0        0
2011731_c0007      125        0        0        0        0        0        0
2011731_c0008      112        0        0        0        0        0        0
2011731_c0013      162        0        0        0        0        0        0
2011731_c0018      270        0        0        0        0        0        0
2011731_c0019       97        0        0        0        0        0        0
2011731_c0024      292        0        0        0        0        0        0
2011731_c0026      146        0        0        0        0        0        0
2011731_c0029        2        0        0        0        0        0        0
...
The seven “counts” columns correspond to seven different FACS sorts, according to the following schedule:
- Enrichment at 1000 nM target
- Enrichment at 1000 nM target
- Binding at 1000 nM target
- Binding at 100 nM target
- Binding at 10 nM target
- Binding at 1 nM target
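For players who download the data, a short script can flag designs that show up in any sort after the first one. This is a minimal sketch: it assumes the file is whitespace-delimited with the header shown in the preview, it embeds a few preview rows as a string rather than reading the real file, and it makes no assumption about which column corresponds to which sort. The function names are hypothetical.

```python
# A few rows copied from the preview above; the real download has 1997 designs.
PREVIEW = """\
pdb_id counts1 counts2 counts3 counts4 counts5 counts6 counts7
2011731_c0001 31 0 0 0 0 0 0
2011731_c0007 125 0 0 0 0 0 0
2011731_c0029 2 0 0 0 0 0 0
"""


def parse_counts(text):
    """Parse whitespace-delimited rows into (column names, {pdb_id: counts})."""
    lines = [line.split() for line in text.splitlines() if line.strip()]
    header, rows = lines[0], lines[1:]
    table = {}
    for row in rows:
        table[row[0]] = [int(value) for value in row[1:]]
    return header[1:], table


def seen_after_first_sort(table):
    """Designs with a nonzero count in any column after counts1."""
    return [pdb_id for pdb_id, counts in table.items() if any(counts[1:])]


columns, table = parse_counts(PREVIEW)
print(columns)
print(seen_after_first_sort(table))
```

For the preview rows the second list is empty, since every design has zero counts after the first column.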
Designs by IPD scientists
Alongside the 1997 Foldit designs, we also tested 30,000 designs created by IPD scientists using an automated design method. From the 30,000 IPD designs, we detected 77 binder hits for the IL-2R target.
By and large, these hits reinforce what we already know about binder design. The 77 hits had AlphaFold confidence ranging 80-97% with an average confidence of 92% (vs. 88% among Foldit designs). And the hits all had high Contact Surface, ranging 400-600 with an average of 506 (vs. 432 among Foldit designs).
That's a good sign! Every week we see Foldit players design proteins with similar metrics. And now we're working on hard targets that are difficult for automated design, like the TGF receptor and CD22.
IPD researchers will be following up on the 77 hits to more precisely measure binding mode and affinity, and see if they can be improved for tighter and more specific binding. In the meantime, we'll continue to challenge Foldit players with binder targets! Players can look forward to more design tools, like the recent Neural Net Objective, to help us design ever better binders.
Thank you to all the Foldit players who participated in the IL-2R binder design puzzles. Keep up the great work, and happy folding!
(Posted by bkoep 45 251 | Thu, 02/24/2022 - 01:50 | 8 comments)
The Neural Net Objective
To help players reach high AlphaFold confidence, we are launching a new Neural Net Objective that can highlight the parts of your protein that are incompatible with an AlphaFold prediction.
A guide for AlphaFold confidence
Since we launched the DeepMind AlphaFold tool in Foldit last summer, players have been able to submit their protein designs for AlphaFold prediction. The confidence of an AlphaFold prediction seems to be a good indicator of design success.
Figure 1. Successful designs (blue) tend to yield AlphaFold predictions with higher confidence than design failures (orange). We would like a way to convert low-confidence solutions into high-confidence solutions.
This is nice because a high AlphaFold confidence gives us some human confidence that our designs will fold in the lab. For especially motivated Foldit players, it suggests when a work-in-progress has become “good enough” and it’s time to start over with another design.
However, a low confidence can be frustrating to work with, because the prediction doesn’t suggest how you can improve your design. You are on your own to try and guess what it is that AlphaFold doesn’t like in your design.
The Neural Net Objective is meant to guide Foldit players towards designs with higher AlphaFold confidence. This new Objective analyzes the underlying data in an AlphaFold prediction, and looks for local regions of your solution that contrast with this data.
Local design quality
Before digging in, let’s re-familiarize ourselves with the concept of local interactions in a protein.
Local interactions occur between residues that are close in sequence -- for example, H-bonding between residues in the same loop. By contrast, non-local interactions involve residues that are far from each other in the protein sequence (although they might end up close to one another after the protein folds).
Figure 2. An example protein design 2003796_1015 by Galaxie, illustrating local and non-local interactions. The green dashed line shows a local H-bond, between two residues that are close to one another in the protein sequence. The blue dashed line shows a non-local H-bond, between two residues from distant parts of the protein chain.
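In code, the distinction comes down to sequence separation: how far apart two residues are along the chain, regardless of how close they end up in 3D. A minimal sketch, where the cutoff of 9 residues is an illustrative assumption rather than a definition used by Foldit:

```python
LOCAL_CUTOFF = 9  # assumed cutoff in residues, for illustration only


def is_local(res_i, res_j, cutoff=LOCAL_CUTOFF):
    """True if two residues are close in sequence (not necessarily in space)."""
    return abs(res_i - res_j) <= cutoff


print(is_local(12, 15))  # neighbors in the same stretch of chain
print(is_local(12, 70))  # distant in sequence, even if they touch after folding
```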
Seasoned Foldit players might remember back in 2017 when we liked to make a big deal about fragment quality. During our design analysis, Foldit scientists would focus on local interactions in a protein design by breaking it down into small fragments (about 9 residues), which could be easily compared with fragments of natural proteins.
We found that the Foldit Rebuild tool was introducing unrealistic fragments into Foldit players’ designs, with shapes that didn’t match normal protein fragments. These unrealistic fragments were preventing the whole protein from folding.
It was convenient that we could isolate the problem to a local issue, because it suggested the problem might be corrected locally as well, without disrupting the rest of the protein design. In some cases, it seemed you could just swap out a single bad loop with a better one, and “rescue” the whole design.
Ultimately, the fragment quality analysis led us to revamp the old Rebuild tool into the newer Remix, and focus on “idealized” loops with well-known ABEGO patterns. Now, without the wacky fragments, Foldit players have been able to design creative new proteins with a high success rate, as described in the landmark 2019 Foldit design paper.
Of course, in the full picture, there is more to protein folding than local effects! A protein folding landscape includes lots of important “long-range” interactions that can’t be captured in small fragments. But, if we can pinpoint local problems with a design, these are usually the first places to make improvements.
Fast-forward to 2020, when protein design researchers were starting to discover the incredible power of deep neural networks like trRosetta. (AlphaFold v2.0 was already announced, but not published until 2021.)
Much neural net research originates from the field of 2D image recognition, and was later adapted to other problems (like protein folding). Instead of modeling the 3D protein structure directly, neural nets will often represent the protein structure as a 2D distogram that predicts the distance between every pair of residues in a protein (very similar to a Contact Map). We covered this idea more in-depth in our discussion of AlphaFold v1.0. Distograms are used heavily by AlphaFold v1.0 and trRosetta; the modern AlphaFold v2.0 adds a 3D representation as well, but still uses distograms internally.
One way of comparing a protein design to a neural net prediction is to measure how well the predicted distogram matches the actual distances in your model. If the distances in your model match the predicted distogram from the neural net, then the model is in agreement with the neural net.
Figure 3. Visualizing distogram agreement for the design in Figure 2 above. (Left) A heatmap plotting the cross entropy (CE) between the predicted distances from AlphaFold and the actual distances in Galaxie's model; darker cells indicate strong agreement while lighter cells show disagreement. The right heatmap shows the local distogram, ignoring interactions from residues that are distant in sequence. The green and blue squares highlight the same local and non-local interactions from Figure 2 above.
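To make the comparison concrete: for each residue pair, a distogram holds a probability for each distance bin, and the cross entropy is the negative log of the probability assigned to the bin that contains the model's actual distance. The sketch below uses plain Python. The bin edges, the example probabilities, and the function names are all assumptions for illustration, not the binning or code used by AlphaFold or Foldit.

```python
import math
from bisect import bisect_right

# Hypothetical bin edges in angstroms.
# Bins are [0, 4), [4, 8), [8, 12), and [12, infinity).
BIN_EDGES = [4.0, 8.0, 12.0]


def pair_cross_entropy(bin_probs, actual_distance, edges=BIN_EDGES, eps=1e-9):
    """-log of the probability the distogram gave to the model's distance bin."""
    bin_index = bisect_right(edges, actual_distance)
    return -math.log(max(bin_probs[bin_index], eps))


def local_mean_ce(distogram, distances, max_separation):
    """Average CE over residue pairs within max_separation in sequence.

    distogram[i][j] is a list of bin probabilities for the pair (i, j);
    distances[i][j] is the actual distance between i and j in the model.
    """
    n = len(distances)
    values = [
        pair_cross_entropy(distogram[i][j], distances[i][j])
        for i in range(n)
        for j in range(i + 1, n)
        if j - i <= max_separation
    ]
    return sum(values) / len(values)


# A prediction that is confident the pair sits 4-8 angstroms apart.
confident = [0.05, 0.85, 0.05, 0.05]
print(pair_cross_entropy(confident, 5.5))   # model agrees: low CE
print(pair_cross_entropy(confident, 10.0))  # model disagrees: high CE
```

Restricting the average to pairs that are close in sequence is what turns the full heatmap into the local distogram view.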
Using distograms to fix designs
In late 2020, we were joined at the IPD by a talented student and Foldit player Susan Kleinfelter, who discovered that we could use distograms to derive especially useful information about local design quality.
Susan noticed that, if you focused only on local interactions and ignored everything else, you could use a distogram to evaluate the local structure of a protein design -- similar to the way we previously evaluated fragment quality. In fact, Susan found that the local distograms from trRosetta were strongly correlated with fragment quality. The local distograms were very good at pointing out regions with local problems.
Figure 4. Distogram agreement predicts fragment quality. (Left) The distribution of local distogram cross entropy for >30,000 fragments from 4000 Foldit designs. Poor-quality fragments (RMSD > 2.0 Å) tend to show more disagreement with local distogram predictions (CE > 2.0); good-quality fragments tend to show better agreement with distogram predictions. (Right) The fragment quality and distogram agreement for every residue in Galaxie's design. Both fragment quality and distogram agreement indicate a problem region around residues 25-30.
Critically, Susan then showed that these local problems could be corrected with a local solution. She could completely rescue a failed design by mutating only the residues in the problematic region, leaving everything else untouched.
After AlphaFold 2 was published in 2021, we repeated Susan’s trRosetta experiments with the AlphaFold distograms, and showed that they were also very good at predicting fragment quality (although not quite as good as trRosetta...).
Figure 5. Redesigning problem regions in 4000 Foldit designs. (Left) The distribution of AlphaFold confidence for 4000 designs before and after we redesigned problem regions (distogram CE > 2.0). After redesign, significantly more designs pass our goal of 80% confidence. (Right) The distribution of retained sequence identity for the 4000 redesigned models. Our redesign only mutated a few residues in each solution, with most solutions retaining >80% of their original sequence.
Likewise, we’ve found that the problem regions identified by the AlphaFold distogram are sweet spots for redesign. By redesigning only the regions with poor distogram agreement, we were able to drastically improve the AlphaFold confidence with minimal changes to the overall design. In this dataset of 4000 previous Foldit designs, we mutated only 20% of residues on average, and the number of high-confidence designs went from 17% to 44%!
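The selection step described above can be sketched in a few lines: flag residues whose local cross entropy exceeds the 2.0 cutoff from Figure 5, then measure how much of the original sequence survives a redesign. The per-residue CE values and both sequences below are invented for illustration, and the function names are hypothetical. This is not the pipeline that was actually run.

```python
CE_CUTOFF = 2.0  # cutoff quoted in the Figure 5 caption


def problem_residues(per_residue_ce, cutoff=CE_CUTOFF):
    """1-based residue numbers whose local distogram CE exceeds the cutoff."""
    return [i + 1 for i, ce in enumerate(per_residue_ce) if ce > cutoff]


def sequence_identity(original, redesigned):
    """Fraction of positions that keep their original amino acid."""
    if len(original) != len(redesigned):
        raise ValueError("sequences must have the same length")
    same = sum(a == b for a, b in zip(original, redesigned))
    return same / len(original)


# Invented example: a 10-residue stretch with a problem at residues 3-4.
ce = [0.4, 0.6, 2.3, 2.8, 0.9, 0.5, 0.3, 0.7, 0.4, 0.6]
original = "MKLEEALKRG"
redesigned = "MKPDEALKRG"  # only the flagged residues were mutated

print(problem_residues(ce))
print(sequence_identity(original, redesigned))
```

In this toy case two of ten residues are flagged and mutated, so the redesign retains 80% of the original sequence.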
For some designs, like Galaxie’s design 2003796_1015 above, the difference is even more stark. By mutating just two residues in the offending region, we can bring the AlphaFold confidence of this design from 66% to 81%, turning a problematic design into a promising one!
The Neural Net Objective
The Neural Net Objective will be included on all future puzzles that allow AlphaFold submissions.
The Objective can only run if it has an AlphaFold prediction to work with, so most of the time it will simply report “No data”. Use the DeepMind AlphaFold tool to submit your solution for an AlphaFold prediction.
After AlphaFold finishes and you load the result (either Load Original or Load Prediction), then the Neural Net Objective will display the AlphaFold confidence in the upper-left Objectives Panel. Click Show to color your solution according to the distogram analysis. For convenience, the AlphaFold panel also includes a checkbox to Show Neural Net Objective.
A blue color reflects agreement between your solution and the AlphaFold distogram. A red color indicates disagreement, and suggests that residues in this area should be mutated.
Unfortunately, the AlphaFold distogram doesn’t tell us which amino acids will improve confidence, so you may have to play around with different mutations to find something that works. If no mutations can improve the distogram agreement, you might use the Remix tool to try a new backbone shape.
For now, the Neural Net Objective will not award any score bonus or penalty. But we hope that players will find it useful for improving the AlphaFold confidence of Foldit designs. Higher-confidence designs mean a higher success rate for lab testing, which will lead us to even more exciting science!
(Posted by bkoep 45 251 | Fri, 01/28/2022 - 23:09 | 4 comments)