Recipe: print protein 2.7
Created by LociOiling


Name: print protein 2.7
ID: 102523
Created on: Thu, 11/02/2017 - 15:58
Updated on: Thu, 11/02/2017 - 22:58

Update of "print protein lua2 V0" by marie_s. Version 2.5 fixes rulers and breaks up long lines. Version 2.6 saves and restores to conserve moves for sketchbook. Version 2.7 speeds things up.

version 2.7 - now faster

Version 2.7 of "print protein" is much faster than previous versions. The score information gathered before the main dialog is now saved in a table. The table is then used to prepare the reports. In testing, this cut the report time from 8.6 seconds to 2.2 seconds. Tests were run on puzzle 1446 (320 segments), and averaged across 10 trials. The processing before the main dialog was nearly the same in both versions, taking just under 6 seconds.
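The gather-once approach described above can be illustrated with a minimal Python sketch. This is not the recipe's Lua code; the function names and the `fetch` callback (a stand-in for whatever call returns one segment subscore) are hypothetical.

```python
def gather_scores(segment_count, subscore_names, fetch):
    """Query every segment subscore once and keep the results in a table."""
    table = {}
    for seg in range(1, segment_count + 1):
        # fetch is an assumed callback standing in for the slow score lookup
        table[seg] = {name: fetch(seg, name) for name in subscore_names}
    return table


def column_totals(table, subscore_names):
    """Build a report from the saved table without any further score lookups."""
    return {
        name: sum(row[name] for row in table.values())
        for name in subscore_names
    }
```

Any later report reads from `table`, so the cost of the score lookups is paid only once.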

Version 2.7 also adds some new information, such as the puzzle number, user name, group name, competition type and rank, and group score and rank. The Rosetta score returned by scoreboard.GetScore is reported in the scriptlog and converted to a Foldit score. (It never seems to match current.GetEnergyScore.)

print protein overview

This version of "print protein" is based on the classic "print protein lua2 V0" by marie_s.

The recipe displays detailed scoring information, including each segment's score and subscores. The subscores include categories like backbone, clashing, packing, hiding, and ideality.

The recipe also reports the protein's primary structure (amino acid sequence) and secondary structure (helixes, sheets, and loops).

The recipe offers copy-and-paste reporting for most of its key outputs. Complete output is also available in the recipe's scriptlog.

Thanks to spvincent, Timo van der Laan, and HerobrinesArmy for code and ideas. Thanks to brgreening for helping to illuminate the mystery of the total score.

Some of the newer features of "print protein" are discussed in detail below.

ligands and segment count

The recipe first does detailed checking for ligand sections, looking for any segments with a secondary structure code of "M". The recipe calculates the total score for each range of ligands found. For a "normal" ligand, one "M" segment at the end of the protein, the recipe reduces the segment count by one. The recipe issues a warning message to the scriptlog for any other ligands found. The recipe also issues a warning if the "normal" ligand has a non-zero score, but this may in fact be "normal". At one point, the "auto structures" tool incorrectly converted the last segment to a ligand, which inspired these checks. The auto structures bug has been fixed.
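The ligand check can be sketched in Python as follows. This is an illustration only, not the recipe's Lua code; it assumes the secondary structure is available as a string with one code per segment, and the function names and warning text are hypothetical.

```python
def find_ligand_ranges(ss):
    """Return 1-based (start, end) ranges of consecutive 'M' codes."""
    ranges = []
    start = None
    for idx, code in enumerate(ss, start=1):
        if code == "M":
            if start is None:
                start = idx
        elif start is not None:
            ranges.append((start, idx - 1))
            start = None
    if start is not None:
        ranges.append((start, len(ss)))
    return ranges


def adjusted_segment_count(ss):
    """Drop one segment for a single trailing 'M'; warn about other ligands."""
    total = len(ss)
    count = total
    warnings = []
    for start, end in find_ligand_ranges(ss):
        if start == total and end == total:
            count = total - 1  # the "normal" ligand case
        else:
            warnings.append(f"unexpected ligand at segments {start}-{end}")
    return count, warnings
```

For example, `adjusted_segment_count("HHLLEEM")` treats segment 7 as the "normal" ligand and reports six protein segments.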

scoring information

The recipe detects active subscores using the logic found in "Tvdl enhanced DRW".

In some cases, this logic may suppress certain subscores, such as disulfides, when they have a low total value across all segments. The recipe reports the active subscores in the main dialog and the scriptlog.

The recipe calculates the "filter bonus" by toggling filters off and on, and then checks the total score. In theory, the total score is 8000 points plus the total of all segment subscores, plus the filter bonus. There is usually a discrepancy, which is reported as "dark" score.
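The arithmetic behind the "dark" score can be written out as a short Python sketch. The function names are hypothetical, and the sign convention (actual total minus expected total) is an assumption; the description above only says a discrepancy is reported.

```python
BASE_SCORE = 8000.0  # starting points, as stated above


def filter_bonus(score_filters_on, score_filters_off):
    """Bonus attributed to filters: the score change from toggling them."""
    return score_filters_on - score_filters_off


def dark_score(total_score, segment_subscore_total, bonus):
    """Part of the total score not explained by base + subscores + bonus."""
    expected = BASE_SCORE + segment_subscore_total + bonus
    return total_score - expected  # assumed sign convention
```

With a total score of 9500, segment subscores summing to 1400, and a filter bonus of 50, the expected total is 9450, leaving 50 points of "dark" score.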

The recipe also reports the Rosetta energy score returned by scoreboard.GetScore, normally a negative number. The recipe converts the Rosetta score to a Foldit score using the formula "FolditScore = 10 * ( 800 - RosettaScore )". Again, there's normally a discrepancy between this converted score and the current score reported by the Foldit client.
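The conversion formula is simple enough to check by hand. Here it is as a one-function Python sketch (the function name is hypothetical):

```python
def rosetta_to_foldit(rosetta_score):
    """Apply FolditScore = 10 * (800 - RosettaScore)."""
    return 10 * (800 - rosetta_score)
```

A Rosetta score of -150 converts to 10 * (800 + 150) = 9500 Foldit points, so a more negative Rosetta score means a higher Foldit score.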

sequence information

The recipe reports the primary sequence as a string of single-letter amino acid codes, and the secondary structure as a string with "H" for helix, "E" for sheet, and "L" for loop. The recipe also reports hydrophobicity as a string with "i" if hydrophobic, and "e" if not hydrophobic.

In the scriptlog, the sequence and secondary structure information is reported both as single strings, and as fixed-length lines with rulers. The single strings are for copy-and-paste into other tools, while the rulers make it easier to find a specific segment.
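One way to break a long sequence into fixed-length lines with rulers is sketched below in Python. The ruler style (a "|" at every tenth segment, a ":" at every fifth) and the default width of 50 are assumptions for illustration; the recipe's actual ruler layout may differ.

```python
def ruler_for(start, length):
    """Ruler for segments start..start+length-1, marked by absolute position."""
    marks = []
    for pos in range(start, start + length):
        if pos % 10 == 0:
            marks.append("|")
        elif pos % 5 == 0:
            marks.append(":")
        else:
            marks.append(".")
    return "".join(marks)


def lines_with_rulers(text, width=50):
    """Split text into chunks, pairing each with its first segment and ruler."""
    blocks = []
    for offset in range(0, len(text), width):
        chunk = text[offset:offset + width]
        blocks.append((offset + 1, ruler_for(offset + 1, len(chunk)), chunk))
    return blocks
```

Each block carries the number of its first segment, so a specific segment can be located by counting ruler marks from there.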

The recipe also makes the primary sequence and secondary structure available in a copy-and-paste dialog.

The recipe issues warning messages to the scriptlog if a non-standard amino acid code or secondary structure code is found. The code "x" is substituted for a non-standard amino acid code. (Some previous puzzles have had segments with an amino acid code of "unk".)
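The substitution of "x" for non-standard codes might look like this in Python. This is a sketch, not the recipe's code; it assumes lowercase single-letter codes for the 20 standard amino acids, and the warning text is hypothetical.

```python
STANDARD_AA = set("acdefghiklmnpqrstvwy")  # the 20 standard one-letter codes


def clean_sequence(codes):
    """Build the sequence string, replacing non-standard codes with 'x'."""
    sequence = []
    warnings = []
    for idx, code in enumerate(codes, start=1):
        if code in STANDARD_AA:
            sequence.append(code)
        else:
            sequence.append("x")
            warnings.append(
                f"segment {idx}: non-standard amino acid code '{code}'"
            )
    return "".join(sequence), warnings
```

Substituting a single character keeps the sequence string aligned with the segment numbers, even when a code such as "unk" is longer than one letter.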

modifiable sections

The recipe reports on modifiable sections, including locked and unlocked sections, zero-score sections, and mutable sections. The "mutable segments" report is now optional.

Some puzzles have locked sections with movable sidechains or locked sections that are mutable. Some recipes incorrectly assume that "locked" means not modifiable in any way. Locked sections with zero subscores are likely the only truly non-modifiable sections.

main dialog and segment subscore report

The recipe displays a main dialog before the segment subscore report is produced. Along with reporting other information, the dialog lets you select which subscores are to be included in the report. The mini contact table and detailed mutable reports are also optionally available, as are density analysis reports for Electron Density puzzles.

The main dialog has a "more" button, which displays less frequently used options. The hydropathy index (a fixed value based on the AA code), atom count, and rotamer count can optionally be included, and you can select the delimiter character, with the tab character as the default. The number of decimal places reported is also adjustable.

The segment subscore report is available in a cut-and-paste dialog, or in the scriptlog. The report now includes a total line reflecting the column totals for the scoring components.
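A delimited report with a selectable delimiter, adjustable decimal places, and a total line can be sketched in Python as follows. The column layout (segment number, amino acid code, then subscores) and the function names are assumptions, not the recipe's actual format.

```python
def format_row(label, aa, values, delimiter="\t", decimals=3):
    """Join one report row using the chosen delimiter and precision."""
    fields = [str(label), aa] + [f"{v:.{decimals}f}" for v in values]
    return delimiter.join(fields)


def build_report(rows, delimiter="\t", decimals=3):
    """rows is a list of (segment, aa, [subscores]); appends a total line."""
    lines = [
        format_row(seg, aa, values, delimiter, decimals)
        for seg, aa, values in rows
    ]
    # Column totals across all segments, one per subscore
    totals = [sum(col) for col in zip(*(values for _, _, values in rows))]
    lines.append(format_row("total", "", totals, delimiter, decimals))
    return lines
```

With the default tab delimiter, each line pastes into separate spreadsheet columns.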

The fixed-width option found in previous versions, for example reporting "12389" instead of "123.89" or "123,89", has been eliminated.

density analysis

For Electron Density puzzles, the recipe offers various types of density analysis.

The density analysis has several sections. The density report appears as a default option on puzzles with a density component.

The first section of density analysis looks at density by amino acid type. Some amino acids outscore others. For example, tyrosine might average a density subscore near 50, but glycine might have average density under 20. The "density by AA" section lists each amino acid found in the puzzle, the number of segments with that AA, the total density score of those segments, and the mean density for that AA. It also lists the worst density score and the corresponding segment number, and best density score and segment number.
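The "density by AA" statistics can be sketched in Python like this. The data layout (a sequence string plus a parallel list of density subscores) and the function name are assumptions for illustration.

```python
from collections import defaultdict


def density_by_aa(sequence, density):
    """Per amino acid: count, total, mean, and worst/best (score, segment)."""
    groups = defaultdict(list)
    for seg, (aa, score) in enumerate(zip(sequence, density), start=1):
        groups[aa].append((score, seg))
    report = {}
    for aa, entries in groups.items():
        total = sum(score for score, _ in entries)
        report[aa] = {
            "count": len(entries),
            "total": total,
            "mean": total / len(entries),
            "worst": min(entries),  # lowest density score and its segment
            "best": max(entries),   # highest density score and its segment
        }
    return report
```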

The next three sections are similar, but show the density component for "aromatics" (rings) versus non-aromatics, aliphatics versus non-aliphatics, and hydrophobics versus hydrophilics.

Aromatic AAs typically have a much higher density score than non-aromatics. Aliphatics typically score lower than non-aliphatics. Hydrophobics and hydrophilics are close, with hydrophobics typically scoring a bit better.

The aromatics are "f" phenylalanine, "h" histidine, "w" tryptophan, and "y" tyrosine.

For this recipe, aliphatics are "v" valine, "l" leucine, and "i" isoleucine. (Not included: "g" glycine and "a" alanine.)
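The aromatic and aliphatic groupings listed above translate directly into set membership. The following Python sketch compares the mean density of a group against everything else; the function name and data layout are hypothetical.

```python
AROMATICS = set("fhwy")   # phenylalanine, histidine, tryptophan, tyrosine
ALIPHATICS = set("vli")   # valine, leucine, isoleucine (as defined above)


def split_mean(sequence, density, members):
    """Mean density for segments in the group and for all other segments."""
    inside = [d for aa, d in zip(sequence, density) if aa in members]
    outside = [d for aa, d in zip(sequence, density) if aa not in members]

    def mean(values):
        return sum(values) / len(values) if values else None

    return mean(inside), mean(outside)
```

The same function covers each of the comparisons by passing a different set of amino acid codes.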

The first four sections of the density analysis are output in spreadsheet-ready format, similar to the main segment report.

The final section is the "density deviation" report. For each segment, this section shows a "+" if the density subscore is higher than the average for that AA, a "-" if lower, and an "=" if the density subscore is close to the mean.

The density deviation report looks something like this:

"density deviation (above/below mean density by AA)"

The density deviation section is intended to provide a quick indication of which sections are scoring best in terms of density.
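The "+", "-", and "=" marks can be derived as in the Python sketch below. The description does not say how close to the mean counts as "=", so the `tolerance` parameter and its default value are assumptions, as is the function name.

```python
def density_deviation(sequence, density, tolerance=1.0):
    """One mark per segment: '+' above, '-' below, '=' near the AA's mean."""
    totals = {}
    for aa, score in zip(sequence, density):
        running, count = totals.get(aa, (0.0, 0))
        totals[aa] = (running + score, count + 1)
    means = {aa: running / count for aa, (running, count) in totals.items()}

    marks = []
    for aa, score in zip(sequence, density):
        diff = score - means[aa]
        if abs(diff) <= tolerance:  # assumed threshold for "close"
            marks.append("=")
        elif diff > 0:
            marks.append("+")
        else:
            marks.append("-")
    return "".join(marks)
```

Because each segment is compared only with other segments of the same amino acid, a glycine can earn a "+" even though its raw density is far below that of a typical tyrosine.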

The density analysis items are available for copy-and-paste, and can also be retrieved from the scriptlog.

cut-and-paste dialogs

When you click "OK" in the main dialog, the segment subscore report and other selected reports are produced. The cut-and-paste dialog then appears, with text boxes for the subscore report, and the primary and secondary structures of the protein. These fields can be copied and pasted into a spreadsheet or another tool.

If density analysis is selected, the results are reported in a separate cut-and-paste dialog.

To copy a given field from a cut-and-paste dialog, click in its text box. Then use ctrl+a on Windows or command+a on Mac to "select all". Then use ctrl+c or command+c to copy. The selected text can then be pasted into the tool or webpage of your choice.

The use of the tab character as the default delimiter produces more legible scriptlog output (in most tools), and also simplifies pasting data into most spreadsheets, such as Excel and Open Office Calc. At least in US English versions, spreadsheets typically recognize the comma-separated value (CSV) format automatically when pasting, and offer the tab character as the default delimiter.

Certain outputs, such as the mini-contact table and the detailed mutable report, are available only in the scriptlog.

The scriptlog file has the name "scriptlog.*trackname*.xml" where *trackname* is the current trackname, or "default" for the default track. The scriptlog file is located in the Foldit installation folder, for example c:\Foldit in a Windows environment.
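The naming rule can be expressed as a small Python sketch. The function names are hypothetical, and the installation folder is just the example given above, not a guaranteed location.

```python
from pathlib import PureWindowsPath


def scriptlog_name(trackname=None):
    """File name for a track; the default track uses 'default'."""
    return f"scriptlog.{trackname or 'default'}.xml"


def scriptlog_path(install_folder, trackname=None):
    """Full Windows-style path inside the given installation folder."""
    return str(PureWindowsPath(install_folder) / scriptlog_name(trackname))
```

For example, `scriptlog_path(r"c:\Foldit", "main")` gives `c:\Foldit\scriptlog.main.xml`.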

Although the scriptlog is nominally an XML file, with XML tags at the beginning and end, the recipe output is normally just plain text. (A few recipes create XML tags in their output, however.) A normal text editor, such as notepad on Windows, can be used to view the scriptlog. In some cases, you may need to manually select a tool to open the "XML" type. For example, in Windows, right-click the scriptlog file and select "Open with", then "Notepad".


Developed by: UW Center for Game Science, UW Institute for Protein Design, Northeastern University, Vanderbilt University Meiler Lab, UC Davis
Supported by: DARPA, NSF, NIH, HHMI, Amazon, Microsoft, Adobe, Boehringer Ingelheim, RosettaCommons