Recipe: print protein 2.8
Created by LociOiling
Average rating: 5 (3 votes)

Profile

Name: print protein 2.8
ID: 102599
Created on: Thu, 01/11/2018 - 23:14
Updated on: Thu, 01/11/2018 - 23:14
Description:

Update of "print protein lua2 V0" by marie_s. Version 2.5 fixes rulers and breaks up long lines. Version 2.6 saves and restores to conserve moves for sketchbook. Version 2.7 speeds things up. Version 2.8 adds features suggested by recent puzzles.





Comments

Joined: 12/27/2012
Groups: Beta Folders
new features for complex ligand puzzles, RNA

Version 2.8 adds several new features which deal with new puzzle features we've seen recently. This post summarizes the changes; use the "parent" link to see more on the existing features.

Some of the puzzles have had a complex mix of locked and unlocked segments. Some have had large ligands, and in at least one case, the ligand had multiple rotamers. One puzzle was made entirely of RNA.

Ligands

Ligand reporting has been completely revamped. Each ligand segment is now reported separately. Ligands are no longer excluded from the score table or from most other functions.

Ligands appear in the amino acid string as "x", and since they're now included, you may want to remove them if you use JPred or some other prediction service.

RNA/DNA

For RNA, the base codes reported by Foldit ("ra", "rc", "rg", and "ru") are reported in the score table. They appear in the amino acid string as single-character codes, so again, you may want to be selective. (Similar logic is included for the DNA codes, but it hasn't been tested yet.)

To make things easier, there's a new "type" code, which indicates whether a segment is protein, RNA, DNA, or ligand, with codes "P", "R", "D", and "M", respectively. The codes appear in the score table and in the string section.

Locked backbone and sidechains

The recipe checks for locked segments, and if any are found, reports on them in a new column in the score table, and in the string section. The Foldit functions detect only locked backbone, so the recipe checks for locked sidechains by looking at rotamer counts.

In the score table, locks are reported in a single column. The entry "U U" means both backbone and sidechain appear to be unlocked; "L U" means locked backbone with unlocked sidechain, and "U L" means unlocked backbone with locked sidechain. Finally, "L L" means both backbone and sidechain are locked.
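The lock column described above can be sketched in a few lines. This is an illustrative Python sketch, not the recipe's own Lua code; the function name lock_entry is hypothetical, and the two flags are assumed to come from the backbone check and the rotamer-based sidechain check.

```python
def lock_entry(backbone_locked: bool, sidechain_locked: bool) -> str:
    """Return the score-table lock entry: backbone flag first, then sidechain flag."""
    def flag(locked: bool) -> str:
        return "L" if locked else "U"
    return f"{flag(backbone_locked)} {flag(sidechain_locked)}"


if __name__ == "__main__":
    # Locked backbone with an unlocked sidechain.
    print(lock_entry(True, False))
```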

Backbone locks and sidechain locks are reported separately in the strings section as well. The "modifiable" section also looks at ranges where the lock status of the backbone and the sidechains is different.

New columns

On the "More" page, you can now select the long name and abbreviation for each amino acid or nucleobase, which get added to the score report like the other optional columns.

Limitations

There are some limitations. The check for locked sidechains uses rotamer.GetCount, which slows things down, especially on large puzzles. For amino acids that normally have more than one rotamer, having only one rotamer almost certainly means the segment is locked. But glycine and alanine have only one rotamer, so they're always marked as unlocked.
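The sidechain heuristic and its glycine/alanine blind spot might look like the sketch below. It is written in Python for illustration, not taken from the recipe; the function name is hypothetical, and the rotamer count is assumed to be the value obtained from rotamer.GetCount for the segment.

```python
# Amino acids that have only one rotamer even when unlocked (glycine, alanine).
SINGLE_ROTAMER_AAS = {"g", "a"}


def sidechain_appears_locked(aa_code: str, rotamer_count: int) -> bool:
    """Guess whether a sidechain is locked from its rotamer count.

    Glycine and alanine are always reported as unlocked, because a single
    rotamer is normal for them and tells us nothing.
    """
    if aa_code.lower() in SINGLE_ROTAMER_AAS:
        return False
    # For every other amino acid, a single rotamer suggests a locked sidechain.
    return rotamer_count == 1


if __name__ == "__main__":
    print(sidechain_appears_locked("k", 1))  # lysine with one rotamer
    print(sidechain_appears_locked("g", 1))  # glycine is never flagged
```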

Full documentation

Version 2.8 of "print protein" includes better support for puzzles with locked segments. It also has better reporting for ligands. Ligand segments are now included in all relevant sections, where they were previously excluded. For RNA puzzles, the recipe now handles two-character base (nucleotide) codes.

print protein overview

This version of "print protein" is based on the classic "print protein lua2 V0" by marie_s.

The recipe displays detailed scoring information, including each segment's score and subscores. The subscores include categories like backbone, clashing, packing, hiding, and ideality.

The recipe also reports the protein's primary structure (amino acid sequence) and secondary structure (helixes, sheets, and loops).

The recipe offers copy-and-paste reporting for most of its key outputs. Complete output is also available in the recipe's scriptlog.

Thanks to spvincent, Timo van der Laan, and HerobrinesArmy for code and ideas. Thanks to brgreening for helping to illuminate the mystery of the total score.

ligands and segment count

The recipe has detailed reporting of ligands, looking for any segments with a secondary structure code of "M". Ligands are now included in all reports.

scoring information

The recipe detects active subscores using the logic found in "Tvdl enhanced DRW".

In some cases, this logic may suppress certain subscores, such as disulfides, when they have a low total value across all segments. The recipe reports the active subscores in the main dialog and the scriptlog.

The recipe calculates the "filter bonus" by toggling filters off and on and checking the total score. In theory, the total score is 8000 points plus the total of all segment subscores, plus the filter bonus. There is usually a discrepancy, which is reported as "dark" score.
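The score accounting above can be written out as arithmetic. The Python sketch below is illustrative only: the function names are hypothetical, and it assumes the filter bonus is the score with filters on minus the score with filters off, and that the "dark" score is the actual total minus the theoretical total.

```python
BASE_SCORE = 8000.0  # the fixed 8000 points mentioned in the text


def filter_bonus(score_filters_on: float, score_filters_off: float) -> float:
    """Assumed definition: the score difference caused by enabling filters."""
    return score_filters_on - score_filters_off


def dark_score(total_score: float, subscore_total: float, bonus: float) -> float:
    """Discrepancy between the actual total and 8000 + subscores + filter bonus."""
    expected = BASE_SCORE + subscore_total + bonus
    return total_score - expected


if __name__ == "__main__":
    bonus = filter_bonus(9500.0, 9300.0)
    print(bonus, dark_score(9500.0, 1250.0, bonus))
```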

The recipe also reports the Rosetta energy score (scoreboard.GetScore), normally a negative number. The recipe converts the Rosetta score to a Foldit score using the formula "FolditScore = 10 * ( 800 - RosettaScore )". Again, there's normally a discrepancy between this converted score and the current score reported by the Foldit client.
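The conversion formula is simple enough to check by hand. Here is a minimal Python sketch of it (the function name is hypothetical); note that a Rosetta score of zero maps to 8000 Foldit points.

```python
def rosetta_to_foldit(rosetta_score: float) -> float:
    """Apply the formula FolditScore = 10 * (800 - RosettaScore)."""
    return 10 * (800 - rosetta_score)


if __name__ == "__main__":
    # A negative Rosetta energy gives a Foldit score above 8000.
    print(rosetta_to_foldit(-250))
```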

sequence information

The recipe reports the primary sequence as a string of single-letter codes, and the secondary structure as a string with "H" for helix, "E" for sheet, "L" for loop, and "M" for molecule, indicating a ligand.

The recipe also uses single-letter codes for RNA or DNA bases. Since amino acids can have the same codes, if any ligand, RNA, or DNA segments are present, the recipe includes a "type" string which identifies what a particular segment represents. The codes are "P" for protein, "M" for molecule/ligand, "R" for RNA, and "D" for DNA.
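One way to build the "type" string is sketched below in Python. This is not the recipe's code: the function names are hypothetical, the DNA codes "da", "dc", "dg", and "dt" are an assumption (the text does not list them), and ligands are identified by the secondary structure code "M" as described earlier.

```python
RNA_CODES = {"ra", "rc", "rg", "ru"}  # base codes named in the text
DNA_CODES = {"da", "dc", "dg", "dt"}  # assumption: not listed in the text


def segment_type(aa_code: str, ss_code: str) -> str:
    """Classify one segment as M (ligand), R (RNA), D (DNA), or P (protein)."""
    if ss_code == "M":
        return "M"
    if aa_code in RNA_CODES:
        return "R"
    if aa_code in DNA_CODES:
        return "D"
    return "P"


def type_string(aa_codes, ss_codes) -> str:
    """Join the per-segment type codes into a single string."""
    return "".join(segment_type(a, s) for a, s in zip(aa_codes, ss_codes))


if __name__ == "__main__":
    print(type_string(["g", "ra", "x"], ["L", "L", "M"]))
```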

In the scriptlog, the sequence and secondary structure information is reported both as single strings, and as fixed-length lines with rulers. The single strings are for copy-and-paste into other tools, while the rulers make it easier to find a specific segment.
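The ruler layout can be sketched as follows. This Python example is an illustration, not the recipe's implementation: the function name and the default line width are hypothetical, and the ruler simply repeats the last digit of each segment number.

```python
def ruled_lines(sequence: str, width: int = 50) -> list:
    """Split a sequence into fixed-length lines, each preceded by a ruler line."""
    lines = []
    for start in range(0, len(sequence), width):
        chunk = sequence[start:start + width]
        # Ruler digit = last digit of the 1-based segment number.
        ruler = "".join(str((start + i + 1) % 10) for i in range(len(chunk)))
        lines.append(ruler)
        lines.append(chunk)
    return lines


if __name__ == "__main__":
    for line in ruled_lines("abcdefghijkl", width=10):
        print(line)
```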

The recipe also makes the primary sequence and secondary structure available in a copy-and-paste dialog.

The recipe issues warning messages to the scriptlog if a non-standard amino acid code or secondary structure code is found. The code "x" is substituted for a non-standard amino acid code. Ligands are represented by code "x".

The recipe also reports hydrophobicity as a string with "i" if hydrophobic, and "e" if not hydrophobic.

Locked segments are reported as single-character codes - "U" for unlocked, and "L" for locked. There are separate strings for locked backbone and locked sidechains. The same information is also presented in other sections.

modifiable sections

The recipe reports on modifiable sections, including locked and unlocked sections, zero-score sections, and mutable sections. The "mutable segments" report is now optional.

Some puzzles have locked sections with movable sidechains or locked sections that are mutable. Some recipes incorrectly assume that "locked" means not modifiable in any way.

main dialog and segment subscore report

The recipe displays a main dialog before the segment subscore report is produced. Along with reporting other information, the dialog lets you select which subscores are to be included in the report. The mini contact table and detailed mutable reports are also optionally available, as are density analysis reports for Electron Density puzzles.

The main dialog has a "more" button, which displays less frequently used options. The hydropathy index (a fixed value based on the AA code), atom count, and rotamer count can optionally be included, along with the long names and abbreviations for the amino acid or RNA/DNA base. You can select the delimiter character, with the tab character as the default. The number of decimal places reported is also adjustable.

The segment subscore report is available in a cut-and-paste dialog, or in the scriptlog. The report now includes a total line reflecting the column totals for the scoring components.
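The effect of the delimiter, decimal places, and total line options can be sketched in Python. The layout here (a "seg" column first, a "total" label on the last line) is an assumption for illustration, and the function name is hypothetical; the recipe's actual column layout may differ.

```python
def subscore_report(headers, rows, delimiter="\t", decimals=3) -> str:
    """Format per-segment subscores as delimited text with a column-total line."""
    def fmt(values):
        return delimiter.join(f"{v:.{decimals}f}" for v in values)

    lines = [delimiter.join(["seg"] + list(headers))]
    for seg, values in enumerate(rows, start=1):
        lines.append(f"{seg}{delimiter}{fmt(values)}")
    # Total line: sum each scoring column across all segments.
    totals = [sum(column) for column in zip(*rows)]
    lines.append(f"total{delimiter}{fmt(totals)}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(subscore_report(["backbone", "clashing"], [[1.5, -0.25], [2.0, 0.75]]))
```

With the default tab delimiter, the output can be pasted directly into most spreadsheets.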

The fixed-width option found in previous versions, for example reporting "12389" instead of "123.89" or "123,89", has been eliminated.

density analysis

For Electron Density puzzles, the recipe offers various types of density analysis.

The density analysis has several sections. The density report appears as a default option on puzzles with a density component.

The first section of density analysis looks at density by amino acid type. Some amino acids outscore others. For example, tyrosine might average a density subscore near 50, but glycine might have an average density under 20. The "density by AA" section lists each amino acid found in the puzzle, the number of segments with that AA, the total density score of those segments, and the mean density for that AA. It also lists the worst density score and the corresponding segment number, and the best density score and segment number.
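The "density by AA" statistics amount to a group-by over the segments. The Python sketch below illustrates the idea under assumed inputs (a list of AA codes and a parallel list of density subscores, with segments numbered from 1); the function name and the dictionary layout are hypothetical.

```python
from collections import defaultdict


def density_by_aa(aa_codes, density_scores) -> dict:
    """Summarize density subscores per amino acid type."""
    groups = defaultdict(list)
    for seg, (aa, score) in enumerate(zip(aa_codes, density_scores), start=1):
        groups[aa].append((score, seg))

    report = {}
    for aa, entries in groups.items():
        scores = [score for score, _ in entries]
        report[aa] = {
            "count": len(entries),
            "total": sum(scores),
            "mean": sum(scores) / len(scores),
            "worst": min(entries),  # (score, segment number)
            "best": max(entries),   # (score, segment number)
        }
    return report


if __name__ == "__main__":
    print(density_by_aa(["y", "g", "y"], [50.0, 10.0, 40.0]))
```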

The next three sections are similar, but show the density component for "aromatics" (rings) versus non-aromatics, aliphatics versus non-aliphatics, and hydrophobics versus hydrophilics.

Aromatic AAs typically have a much higher density score than non-aromatics. Aliphatics typically score lower than non-aliphatics. Hydrophobics and hydrophilics are close, with hydrophobics typically scoring a bit better.

The aromatics are "f" phenylalanine, "h" histidine, "w" tryptophan, and "y" tyrosine.

For this recipe, aliphatics are "v" valine, "l" leucine, and "i" isoleucine. (Not included: "g" glycine and "a" alanine.)
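The group comparisons can be sketched the same way, using the memberships listed above. This Python example is illustrative only; the function name is hypothetical, and it returns None for a group with no segments.

```python
AROMATICS = set("fhwy")  # phenylalanine, histidine, tryptophan, tyrosine
ALIPHATICS = set("vli")  # valine, leucine, isoleucine (glycine, alanine excluded)


def group_means(aa_codes, density_scores, members):
    """Return (mean density inside the group, mean density outside it)."""
    inside = [d for a, d in zip(aa_codes, density_scores) if a in members]
    outside = [d for a, d in zip(aa_codes, density_scores) if a not in members]

    def mean(values):
        return sum(values) / len(values) if values else None

    return mean(inside), mean(outside)


if __name__ == "__main__":
    codes = ["f", "g", "y", "v"]
    density = [60.0, 10.0, 40.0, 20.0]
    print(group_means(codes, density, AROMATICS))
```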

The first four sections of the density analysis are output in spreadsheet-ready format, similar to the main segment report.

The final section is the "density deviation" report. For each segment, this section shows a "+" if the density subscore is higher than the average for that AA, a "-" if lower, and an "=" if the density subscore is close to the mean.

The density deviation report looks something like this:

"density deviation (above/below mean density by AA)"
1234567890123456789012345678901234567890123456
-++-++-+++-+=+-+---+++=+++++++++++-+----+-=---

The density deviation section is intended to provide a quick indication of which sections are scoring best in terms of density.
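A deviation string like the one shown can be produced by comparing each segment with the mean for its amino acid. In this Python sketch, the function name and the tolerance that defines "close to the mean" are assumptions; the text does not say how close is close enough.

```python
def density_deviation(aa_codes, density_scores, tolerance=1.0) -> str:
    """Mark each segment "+", "-", or "=" relative to the mean density for its AA."""
    totals, counts = {}, {}
    for aa, score in zip(aa_codes, density_scores):
        totals[aa] = totals.get(aa, 0.0) + score
        counts[aa] = counts.get(aa, 0) + 1

    marks = []
    for aa, score in zip(aa_codes, density_scores):
        diff = score - totals[aa] / counts[aa]
        if abs(diff) <= tolerance:  # assumed threshold for "close to the mean"
            marks.append("=")
        elif diff > 0:
            marks.append("+")
        else:
            marks.append("-")
    return "".join(marks)


if __name__ == "__main__":
    print(density_deviation(["y", "y", "g", "y"], [60.0, 40.0, 15.0, 50.0]))
```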

The density analysis items are available for copy-and-paste, and can also be retrieved from the scriptlog.

cut-and-paste dialogs

When you click "OK" in the main dialog, the segment subscore report and other selected reports are produced. The cut-and-paste dialog then appears, with text boxes for the subscore report, and the primary and secondary structures of the protein. These fields can be copied and pasted into a spreadsheet or another tool.

If density analysis is selected, the results are reported in a separate cut-and-paste dialog.

To copy a given field from a cut-and-paste dialog, click in its text box. Then use ctrl+a on Windows or command+a on Mac to "select all". Then use ctrl+c or command+c to copy. The selected text can then be pasted into the tool or webpage of your choice.

The use of the tab character as the default delimiter produces more legible scriptlog output (in most tools), and also simplifies pasting data into most spreadsheets, such as Excel and Open Office Calc. At least in US English versions, spreadsheets typically recognize the comma-separated value (CSV) format automatically when pasting, and offer the tab character as the default delimiter.

scriptlog

Certain outputs, such as the mini-contact table and the detailed mutable report, are available only in the scriptlog.

The scriptlog file has the name "scriptlog.*trackname*.xml" where *trackname* is the current trackname, or "default" for the default track. The scriptlog file is located in the Foldit installation folder, for example c:\Foldit in a Windows environment.
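The naming rule can be expressed as a one-line helper. This Python sketch is illustrative; the function name is hypothetical, and the track name "main" in the usage line is an invented example.

```python
def scriptlog_filename(trackname=None) -> str:
    """Build the scriptlog file name, using "default" when no track is given."""
    return f"scriptlog.{trackname or 'default'}.xml"


if __name__ == "__main__":
    print(scriptlog_filename())
    print(scriptlog_filename("main"))  # "main" is a made-up track name
```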

Although the scriptlog is nominally an XML file, with XML tags at the beginning and end, the recipe output is normally just plain text. (A few recipes create XML tags in their output, however.) A normal text editor, such as Notepad on Windows, can be used to view the scriptlog. In some cases, you may need to manually select a tool to open the "XML" type. For example, in Windows, right-click the scriptlog file and select "Open with", then "Notepad".

Parent
Children

none


Developed by: UW Center for Game Science, UW Institute for Protein Design, Northeastern University, Vanderbilt University Meiler Lab, UC Davis
Supported by: DARPA, NSF, NIH, HHMI, Microsoft, Adobe, RosettaCommons