Recipe: NetSurfP 1.2
Created by LociOiling 2 1
Name: NetSurfP 1.2
ID: 102397
Created on: Tue, 06/27/2017 - 20:08
Updated on: Tue, 06/27/2017 - 20:08

Convert NetSurfP webpage output into a secondary structure prediction and a copy-and-paste spreadsheet format. V1.1 handles blank lines gracefully and adds a confidence prediction. V1.2 lets you copy-and-paste the entire NetSurfP page, and handles a missing CRLF or newline on the last data line.

allows copy-and-paste of entire NetSurfP page

Version 1.2 of the NetSurfP recipe lets you copy the entire NetSurfP page and paste it into the recipe. Previous versions expected only the "data" part of the page, and were a bit sensitive at that. Thanks to Susume for suggesting the improvements.

NetSurfP is yet another web-based secondary structure prediction service. (JPred is another.)

NetSurfP outputs its results in a columnar format. The predictions for helix, sheet, and loop are expressed as probabilities.

(NetSurfP also predicts the "surface accessibility" of a given residue, which seems to be more or less the inverse of the likelihood the residue is buried in the hydrophobic core.)

The formatting for the NetSurfP results doesn't lend itself to being pasted directly into a spreadsheet.

This recipe does three things. First, it converts the NetSurfP output to a tab-delimited format that can be pasted into a spreadsheet. Second, it creates a secondary structure string. Third, it creates a confidence prediction string. For each segment in the input, the confidence ranges from 0 to 9, with 0 being low confidence.
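
As an illustration of the first step, here is a minimal Python sketch of turning one whitespace-aligned NetSurfP data line into a tab-delimited row. It is not the recipe's own code. The function name to_tab_delimited is hypothetical, and the sketch assumes the sequence name contains no spaces, so that every data line splits into exactly ten columns.

```python
def to_tab_delimited(line):
    """Collapse the whitespace-aligned columns of one data line into tabs."""
    # Assumption: no column (including the sequence name) contains a space.
    fields = line.split()
    return "\t".join(fields)


sample = "E T  Sequence               1    0.865 120.003   0.476   0.003   0.003   0.994"
print(to_tab_delimited(sample))
```

A spreadsheet that receives the resulting line on paste should place each of the ten values in its own column.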

The secondary structure string can be copied and pasted into SS Edit 1.2 to change the secondary structure of your protein.

To use NetSurfP 1.2, run a NetSurfP prediction. NetSurfP needs the primary structure of the protein as input, as a string of one-character amino acid codes. You can use print protein 2.4 or AA Edit 1.2 to get the required primary structure string.

Once NetSurfP completes its prediction, copy the output to the clipboard. With version 1.2 of the recipe, you can simply select the entire NetSurfP results page (control + a or the equivalent) and copy it.

Start the recipe, and paste the NetSurfP output into the text box on the first screen. When you click OK, the second screen displays three text boxes: one containing the spreadsheet output, one containing the secondary structure string, and one containing the confidence string. The contents of the text boxes can be copied and pasted, and they also appear in the recipe's scriptlog.

The secondary structure string is created by picking the secondary structure type with the highest probability for each segment. The picking logic is quite simple, and doesn't worry about ties or close finishes.
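
The picking step might look roughly like the Python sketch below, which is an illustration rather than the recipe's actual code. The letter codes H (helix), E (sheet), and L (loop) are an assumption here, as is the tie-breaking order: Python's max keeps the first of equal values, so ties go to helix, then sheet. The probabilities are columns 8 to 10 of the first three sample data lines.

```python
# Assumed one-letter codes: H = helix, E = sheet, L = loop.
def pick_structure(p_helix, p_sheet, p_coil):
    """Return the code of the structure type with the highest probability."""
    candidates = [("H", p_helix), ("E", p_sheet), ("L", p_coil)]
    # max() returns the first maximal item, so ties are not treated specially.
    best_code, _ = max(candidates, key=lambda pair: pair[1])
    return best_code


# (helix, sheet, coil) probabilities for three residues
rows = [
    (0.003, 0.003, 0.994),
    (0.694, 0.003, 0.303),
    (0.782, 0.003, 0.216),
]
ss_string = "".join(pick_structure(*row) for row in rows)
print(ss_string)
```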

The confidence string is simply the first digit after the decimal point of the winning probability for each segment, so 0.994 gives confidence "9", and 0.590 gives "5".
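
A small Python sketch of that rule follows, working on the probability as text so that no floating-point rounding is involved. The function name confidence_digit is hypothetical. How the recipe treats a probability of 1.000 is not described here, so mapping it to "9" is purely an assumption of this sketch.

```python
def confidence_digit(probability_text):
    """Return the first digit after the decimal point as a one-character string."""
    whole, _, fraction = probability_text.strip().partition(".")
    if whole not in ("", "0"):
        # Assumption: a probability of 1.000 is capped at the top confidence.
        return "9"
    return fraction[0] if fraction else "0"


# Winning probabilities for the first four sample residues
winners = ["0.994", "0.694", "0.782", "0.858"]
confidence_string = "".join(confidence_digit(p) for p in winners)
print(confidence_string)
```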

This recipe depends heavily on the NetSurfP output format. Any changes to NetSurfP output may require revisions to the recipe.

Sample NetSurfP output:

# For publication of results, please cite:
# A generic method for assignment of reliability scores applied to solvent accessibility predictions.
# Bent Petersen, Thomas Nordahl Petersen, Pernille Andersen, Morten Nielsen and Claus Lundegaard
# BMC Structural Biology 2009, 9:51 doi:10.1186/1472-6807-9-51
# Column 1: Class assignment - B for buried or E for Exposed - Threshold: 25% exposure, but not based on RSA
# Column 2: Amino acid
# Column 3: Sequence name
# Column 4: Amino acid number
# Column 5: Relative Surface Accessibility - RSA
# Column 6: Absolute Surface Accessibility
# Column 7: Z-fit score for RSA prediction
# Column 8: Probability for Alpha-Helix
# Column 9: Probability for Beta-strand
# Column 10: Probability for Coil
E T  Sequence               1    0.865 120.003   0.476   0.003   0.003   0.994
E E  Sequence               2    0.758 132.423   0.324   0.694   0.003   0.303
E E  Sequence               3    0.741 129.488   0.588   0.782   0.003   0.216
E R  Sequence               4    0.409  93.707   0.281   0.858   0.002   0.139
E K  Sequence               5    0.380  78.063   0.459   0.923   0.002   0.076
E K  Sequence               6    0.597 122.844   1.114   0.938   0.007   0.055
E E  Sequence               7    0.609 106.340   1.316   0.970   0.001   0.030
B I  Sequence               8    0.066  12.284  -0.022   0.970   0.001   0.030
E Q  Sequence               9    0.436  77.941   0.867   0.970   0.001   0.030
E K  Sequence              10    0.613 126.012   1.216   0.970   0.001   0.030
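
Given output like the sample above, one way to pull the data lines out of a whole pasted page is sketched below in Python. This is an illustration under stated assumptions, not the recipe's implementation: the function name extract_data_rows is hypothetical, and a line is treated as data only if it has ten whitespace-separated columns, starts with B or E, has an integer in column 4, and has numbers in columns 5 to 10. Anything else (comment lines, blank lines, other page text) is skipped.

```python
def extract_data_rows(page_text):
    """Return the data lines of a pasted NetSurfP page as lists of ten fields."""
    rows = []
    # splitlines() copes with CRLF, bare newlines, and a missing final newline.
    for line in page_text.splitlines():
        fields = line.split()
        if len(fields) != 10 or fields[0] not in ("B", "E"):
            continue  # blank lines, "#" comment lines, other page text
        try:
            int(fields[3])                      # column 4: amino acid number
            [float(value) for value in fields[4:]]  # columns 5 to 10
        except ValueError:
            continue  # ten words, but not a data line
        rows.append(fields)
    return rows


page = (
    "Some page heading\r\n"
    "# Column 2: Amino acid\r\n"
    "\r\n"
    "E T  Sequence               1    0.865 120.003   0.476   0.003   0.003   0.994\r\n"
    "B I  Sequence               8    0.066  12.284  -0.022   0.970   0.001   0.030"
)
for row in extract_data_rows(page):
    print(row)
```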

Sample scriptlog output:

---secondary structure prediction---
---prediction confidence---
