Name: DistMap1 101679 Fri, 12/11/2015 - 15:50 Mon, 01/25/2016 - 02:29 DistMap1q.txt 1/24/16 312am code

What does DistMap1 do?

DistMap1 was adapted from DistMap0 (http://fold.it/portal/recipe/101659).
DistMap1 finds the distances between every pair of alpha-carbons in the protein
and makes several maps of these distances. Its output includes lines like those below:

``````DistMap1: Puzzle 973: Post-CASP Tc853: Server Model Contacts
with Puzzle ID 998376 on win_x86 at 01/24/16 20:08:30
has score 11250.722, 0 bands, and 152 residues.

Use chway to pick which character set (see chlist[chway] below) to use.
Each character will represent a different distance range in the charts.
Some character sets have 10 distance ranges while others have only 5.
Some sets contain non-ANSI characters, which use more bytes than ANSI ones,
making scriptlog.default.xml larger for the same total # of characters.

The following character sets use only ANSI characters:

The following character sets include non-ANSI characters:

Got the following inputs by 01/24/16 20:09:12 for 152 residues:
use only ANSI characters=false, spacing=0,
labels=2, make full-size chart=false,
makecsv=0, csvspacing=2, epsway=0, epsdim=152,
skip5num=0, and sortway=2.

Gathering data and doing calculations...
Got list of 152 AA's and their secondary structures.
Got 11476 distances among 152 alpha-carbons (01/24/16 20:09:22).

Some of the alpha-carbon distance map(s) below use 10 shades:
0 means distance is in range (-Inf, 2).
1 means distance is in range   [ 2, 4).
2 means distance is in range   [ 4, 6).
3 means distance is in range   [ 6, 8).
4 means distance is in range   [ 8,10).
5 means distance is in range   [10,15).
6 means distance is in range   [15,20).
7 means distance is in range   [20,30).
8 means distance is in range   [30,50).
9 means distance is in range   [50,Inf).
Some of the alpha-carbon distance map(s) below use 5 shades:
0 means distance is in range (-Inf, 4).
1 means distance is in range   [ 4, 8).
2 means distance is in range   [ 8,15).
3 means distance is in range   [15,30).
4 means distance is in range   [30,Inf).
All nonzero distances range from 3.75 to 44.71
for the Ca-Ca pairs 35-36 and 58-121 respectively.
For these distances, rms value=21.66, average=20.17, rms deviation=7.89,
so  max= 44.71, avg+rmsdev= 28.07, avg= 20.17, avg-rmsdev= 12.28, min=  3.75,
and max= 44.71,  upper 1/4= 25.79, med= 19.93,  lower 1/4= 14.36, min=  3.75.

L152d-░░░░░░─░▒░░░░░▒▒░─░░░░▒▒▒░░░▒░░░░█
L147d-░░░░░░░▒░░░░░░░▒▒░░░░░░░▒▓▒▒▒░░▒█░
L143k-──░░▒▒░░░─░░──░░░▒▒▒░░░░░▒▒░░░▒█▒░
L138g-───░░░─░░──────░░░░▒░░░░▒▒░▒▒▒█▒░░
L134t-░░░░░░─░░─────░░░░░▓▒▒░░▒░░░▓█▒░░░
E129v-░░░░░░░░░░░░░░░░░░░▒▒▒▒▒▓░░▒█▓▒░▒▒
E125v-──░░░░─░░░░───░░░░░▒░░░░▒▓▒█▒░▒░▒░
L120n-───░░░─░───────░░░░░░───░▓█▒░░░▒▒░
E115n-──░░░░░░░░░░──░▒░░░░░░░░▒█▓▓░░▒▒▓░
E111t-░░░░░░░░░░░░░░░▒░░░▒░░░▒█▒░▒▓▒▒░▒▒
L106d-▒░░░░░─░░░░░░░▒░──░░░▒▒█▒░─░▒░░░░▒
L102g-▒▒▒▒░░░░░░░░░░░░░░░░▒▒█▒░░─░▒░░░░▒
H097k-░░▒▒▒░░░░░░░░░░░░░░▒▓█▒▒░░─░▒▒░░░░
H093t-░░░░▒░░░░─░░░░░░░░▒▒█▓▒░░░░░▒▒░░░░
H088e-──░░░░░░░─░░──░░░░▒█▒▒░░▒░░▒▒▓▒▒░░
L083t-──░░▒▒░░──░░░──░░▓█▒▒░░░░░░░░░░▒░░
L079d-──░░▒▒▒░──░░░──░▒█▓░░░░─░░░░░░░▒░─
E074d-─░░░░▒▒▒░░▒░░░░▒█▒░░░░░─░░░░░░░░▒░
E070d-░░▒▒░▒░▓▒▒▒░░░▒█▒░░░░░░░▒▒░░░░░░▒▒
H065k-▒▒▒░░░─░▓▒░░░▒█▒░──░░░░▒░░─░░░─░░▒
E060p-▒█▒░░░░░▒▒░░▒█▒░░───░░░░░───░───░░
H056f-░▒▒▒░░░▒░░▒▒█▒░░░░░─░░░░░───░───░░
L051n-░░▒▒▒▒▒▒░░▒█▒░░░░░░░░░░░░░──░──░░░
L047s-░░▒▒▒▒▒▓░░█▒▒░░▒▒░░░░░░░░░─░░──░░░
E042y-░░░░░░─░▓█░░░▒▒▒░────░░░░░─░░───░░
E038n-░▒▒░░░░░█▓░░░▒▓▒░──░░░░░░░─░░░░░░▒
E033t-░░▒▒▒▒▒█░░▓▒▒░░▓▒░░░░░░░░░░░░░░░▒░
H028e-─░░░▒▒█▒░─▒▒░░─░▒▒░░░░░─░░──░──░░─
L024i-░░░▓▓█▒▒░░▒▒░░░▒▒▒▒░░░░░░░░░░░░▒░░
H019i-░░▒▓█▓▒▒░░▒▒░░░░░▒▒░▒▒░░░░░░░░░▒░░
H015k-░▒▒█▓▓░▒░░▒▒▒░░▒░░░░░▒▒░░░░░░░░░░░
H010l-▒▓█▒▒░░▒▒░▒▒▒▒▒▒░░░░░▒▒░░░─░░░─░░░
E006t-▒█▓▒░░░░▒░░░▒█▒░░───░░▒░░───░░──░░
L001g-█▒▒░░░─░░░░░░▒▒░────░░▒▒░───░░──░░
||||||||||||||||||||||||||||||||||
LEHHHLHEEELLHEHEELLHHHLLEELEELLLLL
0000000000000000000000111111111111
0011122334455667778899001122233445
1605948382716050493837261505948372
gtlkiietnysnfpkdddtetkgdtnnvvtgkdd

152-675785650 DistMap1: for all 152 amino acids
133-767884605    min Ca-Ca dist=  3.75 for  35-36
114-867876066    max Ca-Ca dist= 44.71 for  58-121
095-757770645 Ca-Ca dist ranges (chway=1 rev=0):
077-868807788    0=0 for  0-2,  1=1 for  2-4
058-677087887    2=2 for  4-6,  3=3 for  6-8
039-670787775    4=4 for  8-10, 5=5 for 10-15
020-707765667    6=6 for 15-20, 7=7 for 20-30
001-076687876    8=8 for 30-50, 9=9 for 50-Inf

152-░░▒░─▒░▒█ DistMap1: for all 152 amino acids
133-░░░──▒░█▒    min Ca-Ca dist=  3.75 for  35-36
114-─░░─░░█░░    max Ca-Ca dist= 44.71 for  58-121
095-░▒░░░█░▒▒ Ca-Ca dist ranges (chway=24 rev=0):
077-─░──█░░──    █=0 for  0-4
058-░░░█─░──░    ▓=1 for  4-8
039-░░█░─░░░▒    ▒=2 for  8-15
020-░█░░░▒░░░    ░=3 for 15-30
001-█░░░─░─░░    ─=4 for 30-Inf

152-▓▓▒▓█▒▓▒─ DistMap1: for all 152 amino acids
133-▓▓▓██▒▓─▒    min Ca-Ca dist=  3.75 for  35-36
114-█▓▓█▓▓─▓▓    max Ca-Ca dist= 44.71 for  58-121
095-▓▒▓▓▓─▓▒▒ Ca-Ca dist ranges (chway=24 rev=1):
077-█▓██─▓▓██    ─=0 for  0-4
058-▓▓▓─█▓██▓    ░=1 for  4-8
039-▓▓─▓█▓▓▓▒    ▒=2 for  8-15
020-▓─▓▓▓▒▓▓▓    ▓=3 for 15-30
001-─▓▓▓█▓█▓▓    █=4 for 30-Inf``````

The final map above is small enough to appear in the Recipe Output window,
so it can be included in a snapshot of the protein when a puzzle ends.
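
As a rough illustration of how a small chart like the ones above could be built, here is a minimal Python sketch. It is not DistMap1's own code (DistMap1 is a Foldit recipe). The function names, the toy coordinates, and the rule for picking evenly spaced residues are all assumptions, although that rule does reproduce the row labels 001, 020, 039, ..., 152 seen in the sample output. The 5-shade ranges are taken from the log above.

```python
import math

# Upper bounds of the 5-shade ranges from the sample log: <4, <8, <15, <30, else.
BOUNDS_5 = [4.0, 8.0, 15.0, 30.0]
SHADES_5 = "01234"  # hypothetical ANSI-only character set, one character per range


def pairwise_distances(coords):
    """Return a full symmetric matrix of Euclidean distances."""
    n = len(coords)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(coords[i], coords[j])
            dist[i][j] = dist[j][i] = d
    return dist


def shade(d, bounds=BOUNDS_5, chars=SHADES_5):
    """Map a distance to the character for its half-open range [lo, hi)."""
    for k, hi in enumerate(bounds):
        if d < hi:
            return chars[k]
    return chars[len(bounds)]


def sample_indices(n, rows):
    """Pick `rows` evenly spaced 1-based residue numbers from 1..n (assumed rule)."""
    if rows < 2:
        return [1]
    if rows >= n:
        return list(range(1, n + 1))
    step = (n - 1) / (rows - 1)
    return [1 + math.floor(k * step + 0.5) for k in range(rows)]


def small_map(dist, rows):
    """Render a rows x rows chart with the highest residue number on top."""
    picks = sample_indices(len(dist), rows)
    lines = []
    for i in reversed(picks):
        cells = "".join(shade(dist[i - 1][j - 1]) for j in picks)
        lines.append(f"{i:03d}-{cells}")
    return lines


# Toy input: 20 points spaced 3.8 apart on a straight line (not a real protein).
coords = [(3.8 * k, 0.0, 0.0) for k in range(20)]
for line in small_map(pairwise_distances(coords), 5):
    print(line)
```

For n alpha-carbons there are n(n-1)/2 distinct pairs, which for 152 residues gives the 11476 distances reported in the log.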

Below is a sample snapshot made right after the above DistMap1 run:
http://fold.it/portal/files/chatimg/irc_421719_1453688652.png

Default options

Jeff, it's impressive to see the chart in the log.

Could you set the default options so that we see these charts by default, please (without being obliged to export the output to external software)?

Using the defaults outputs only ANSI characters.

I set the defaults to output only ANSI characters.
chlist[23] and chlist[24] for chway=23 and 24
contain shaded blocks, which are non-ANSI characters.
I think allowing any non-ANSI characters makes the
scriptlog file double in size.
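
The size difference can be checked with a short Python sketch. It assumes the log is stored as UTF-8, which is not confirmed here. The two sample rows are made up for illustration.

```python
def utf8_size(text):
    """Number of bytes needed to store `text` as UTF-8 (assumed log encoding)."""
    return len(text.encode("utf-8"))


ansi_row = "0011234"    # made-up chart row using digits only
block_row = "█▓▒░─░▒"   # made-up chart row using shaded blocks

for row in (ansi_row, block_row):
    print(len(row), "characters ->", utf8_size(row), "bytes")
```

Under that assumption each shaded block takes 3 bytes where a digit takes 1, so how much the whole file grows depends on how much of it is chart rows rather than plain text.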

DistMap1 can make *.eps plots:

Below are some *.eps plots made by DistMap1 for the same protein structure as above.
Note the resemblance between these distance maps and the Contact Map shown in the Foldit GUI above.
Click on the plots below to see more details.

Programs like GSview can display *.eps plots and convert them to other formats (like *.png).
The DistMap1 inputs epsway and epsdim control how these plots appear:

epsdim picks how many residues appear in the plot.
Here all epsdim are 152 to include all the protein's alpha-carbons.

epsway=1 uses 5 shades (5 distance ranges).
epsway=2-4 use 10 shades (10 distance ranges),
but epsway=4 gives the most contrast to short distances,
and epsway=2 gives the least contrast to short distances.
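
For readers curious what an *.eps distance map can look like inside, here is a minimal Python sketch that writes one gray square per residue pair. It is not the code DistMap1 uses. The function names, the gray levels, and the cell size are assumptions, and it does not reproduce how epsway changes contrast. Only the 5 distance ranges come from the log above.

```python
# Upper bounds of the 5 distance ranges from the sample log.
BOUNDS = [4.0, 8.0, 15.0, 30.0]
# Hypothetical gray levels: 0.0 is black (short distance), 1.0 is white (long).
GRAYS = [0.0, 0.25, 0.5, 0.75, 1.0]


def gray_for(d, bounds=BOUNDS, grays=GRAYS):
    """Return the gray level for the range that contains distance d."""
    for k, hi in enumerate(bounds):
        if d < hi:
            return grays[k]
    return grays[len(bounds)]


def eps_distance_map(dist, cell=4):
    """Build the text of an EPS file with one filled square per matrix element."""
    n = len(dist)
    size = n * cell
    lines = ["%!PS-Adobe-3.0 EPSF-3.0", f"%%BoundingBox: 0 0 {size} {size}"]
    for i in range(n):          # row i is drawn at height i*cell, so row 1 is at the bottom
        for j in range(n):
            g = gray_for(dist[i][j])
            lines.append(f"{g:.2f} setgray {j * cell} {i * cell} {cell} {cell} rectfill")
    lines += ["showpage", "%%EOF"]
    return "\n".join(lines) + "\n"


# Made-up 3x3 distance matrix, just to show the output format.
demo = [[0.0, 5.0, 20.0],
        [5.0, 0.0, 9.0],
        [20.0, 9.0, 0.0]]
print(eps_distance_map(demo))
```

Saving the printed text to a file ending in .eps should give something a viewer like GSview can open.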

DistMap1 can make 3 types of charts for input into Excel:

The first type of chart is a 'peel-the-onion' chart starting like below,
listing pairs of alpha-carbons and the distances between them,
with largest distances first, and smaller distances later.
Once an alpha-carbon is listed, it will not be listed again.
Surface amino acids should give larger distances and so
should be listed before core ones.

``````SS,AA#,AA,SS,AA#,AA,  dist
H, 58, s, L,121, t, 44.71
L, 55, d, L,119, d, 42.85
L, 54, d, L,123, s, 41.89
L,  1, g, L,118, s, 41.68
E,  4, k, L,120, n, 41.59
L, 59, t, L,138, g, 40.52
L, 53, n, L,137, n, 40.17
H, 57, s, L,122, k, 40.08
L,  3, e, L, 77, p, 39.02
L, 30, e, L,132, d, 37.10
H, 56, f, L,136, g, 36.60
E,  5, m, L,117, k, 36.32``````
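
The 'peel-the-onion' ordering described above can be sketched in Python as a greedy pass over all pairs, longest distance first. This is an illustration of the rule as stated, not DistMap1's own code. The function name and the toy data are assumptions.

```python
def peel_the_onion(dist):
    """List pairs by decreasing distance, using each residue at most once.

    `dist` is a symmetric matrix; residue numbers in the result are 1-based.
    """
    n = len(dist)
    pairs = sorted(
        ((dist[i][j], i, j) for i in range(n) for j in range(i + 1, n)),
        reverse=True,
    )
    used = set()
    rows = []
    for d, i, j in pairs:
        if i in used or j in used:
            continue  # one of these residues was already listed
        used.update((i, j))
        rows.append((i + 1, j + 1, d))
    return rows


# Toy data: four points on a line at positions 0, 1, 3 and 7.
positions = [0.0, 1.0, 3.0, 7.0]
toy = [[abs(a - b) for b in positions] for a in positions]
for first, second, d in peel_the_onion(toy):
    print(f"{first:3d},{second:3d},{d:6.2f}")
```

With an even number of residues every residue ends up in exactly one pair, which matches the sample above, where no residue number repeats.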

The second type of chart is an 'AA-stats' chart starting like below,
listing statistics like the maximum, minimum, average, median,
rms deviation, upper quartile, and lower quartile for the
distances from each alpha-carbon to all the other ones.

``````SS,AA#,AA,  dmax,davg+d,  davg,davg-d,  dmin,d=rmsd,  drms,  dmax, dtopq,  dmed, dbotq,  dmin
L,121, t, 44.71, 36.00, 26.77, 17.53,  3.81,  9.24, 28.32, 44.71, 33.01, 27.63, 21.81,  3.81
H, 58, s, 44.71, 34.90, 25.30, 15.69,  3.82,  9.60, 27.06, 44.71, 32.70, 25.61, 18.50,  3.82
L,119, d, 44.09, 35.32, 26.06, 16.80,  3.79,  9.26, 27.65, 44.09, 32.54, 26.30, 20.46,  3.79
L, 55, d, 43.97, 34.82, 25.41, 16.00,  3.85,  9.41, 27.09, 43.97, 32.58, 25.77, 19.64,  3.85
L,120, n, 43.42, 35.11, 26.12, 17.14,  3.81,  8.99, 27.63, 43.42, 32.56, 26.74, 20.84,  3.81
L, 54, d, 43.11, 33.50, 24.43, 15.35,  3.84,  9.07, 26.06, 43.11, 31.09, 24.83, 18.21,  3.84``````
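
Here is a minimal Python sketch of per-residue statistics like those in the 'AA-stats' chart. The function name is an assumption, and so is the quartile method, since the rule DistMap1 uses for quartiles is not stated here. The sample rows are consistent with drms² = davg² + rmsd² (for residue 121, 28.32² is about 26.77² + 9.24²), so the sketch uses the population form of the deviation.

```python
import math
import statistics


def residue_stats(dist, i):
    """Statistics of the distances from residue i (0-based) to all the others."""
    others = [d for j, d in enumerate(dist[i]) if j != i]
    avg = statistics.fmean(others)
    dev = statistics.pstdev(others)  # rms deviation about the average
    rms = math.sqrt(sum(d * d for d in others) / len(others))
    # Quartile method is an assumption; other methods give slightly different values.
    botq, med, topq = statistics.quantiles(others, n=4, method="inclusive")
    return {
        "dmax": max(others), "davg+d": avg + dev, "davg": avg,
        "davg-d": avg - dev, "dmin": min(others), "rmsd": dev,
        "drms": rms, "dtopq": topq, "dmed": med, "dbotq": botq,
    }


# Toy data: four points on a line at positions 0, 1, 3 and 7.
positions = [0.0, 1.0, 3.0, 7.0]
toy = [[abs(a - b) for b in positions] for a in positions]
for name, value in residue_stats(toy, 0).items():
    print(f"{name:>7} = {value:6.2f}")
```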

The third type of chart is an 'all-by-all-distances' chart starting like below,
giving a grid (here 152x152) with one row and one column for each amino acid.
This grid's elements are the distances between each pair of alpha-carbons
for the amino acids of each element's row and column. This grid is symmetrical
about its diagonal, and its diagonal's elements should all be zeros.

``````SS,AA#,AA,   L1g,   L2e,   L3e,   E4k,   E5m,   E6t,   H7n,   H8g,   H9q,  H10l, ...
L,  1, g,  0.00,  3.83,  5.46,  8.00,  9.36, 13.13, 14.89, 17.60, 15.61, 14.95, ...
L,  2, e,  3.83,  0.00,  3.80,  5.51,  6.24,  9.91, 11.21, 14.12, 12.43, 11.49, ...
L,  3, e,  5.46,  3.80,  0.00,  3.80,  6.73, 10.33, 11.75, 15.07, 14.23, 13.54, ...
E,  4, k,  8.00,  5.51,  3.80,  0.00,  3.80,  7.00,  9.03, 12.12, 11.68, 11.75, ...
E,  5, m,  9.36,  6.24,  6.73,  3.80,  0.00,  3.82,  6.25,  8.82,  7.93,  8.41, ...``````

All 3 of these charts can be delimited by commas (as above) or spaces,
both of which can be pasted into spreadsheet programs like Excel.
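
As an illustration of the 'all-by-all-distances' layout, here is a minimal Python sketch that writes such a grid with either delimiter. The function name and the toy input are assumptions, and the sketch does not pad the columns to a fixed width the way the sample above does.

```python
import csv
import io


def grid_csv(dist, ss, aa, delimiter=","):
    """Return the all-by-all grid as delimited text, one row per residue.

    `ss` and `aa` hold one secondary-structure letter and one amino-acid
    letter per residue, in residue order.
    """
    n = len(dist)
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter=delimiter, lineterminator="\n")
    labels = [f"{ss[k]}{k + 1}{aa[k]}" for k in range(n)]
    writer.writerow(["SS", "AA#", "AA"] + labels)
    for i in range(n):
        writer.writerow([ss[i], i + 1, aa[i]] + [f"{d:.2f}" for d in dist[i]])
    return buf.getvalue()


# First three residues of the sample above.
toy = [[0.00, 3.83, 5.46],
       [3.83, 0.00, 3.80],
       [5.46, 3.80, 0.00]]
print(grid_csv(toy, "LLL", "gee"))
print(grid_csv(toy, "LLL", "gee", delimiter=" "))
```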

Why use Distance Maps?

Distance and Contact Maps show the protein in a way that does not depend on the protein's orientation in 3D space.

If two structures for the same protein look different in 3D, it could be (1) because they are identical structures viewed from different directions or (2) because they are truly different structures. Looking at their Distance or Contact Maps can tell if the two structures are actually the same or truly different.
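
This orientation independence is easy to check numerically. The Python sketch below rotates and shifts a made-up set of points and confirms that the distance map is unchanged, while moving a single point changes it. The function names and coordinates are assumptions for illustration only.

```python
import math


def distance_matrix(coords):
    """Full matrix of Euclidean distances between 3D points."""
    return [[math.dist(a, b) for b in coords] for a in coords]


def rotate_z_and_shift(coords, angle, shift):
    """Rotate points about the z axis by `angle` radians, then translate them."""
    c, s = math.cos(angle), math.sin(angle)
    return [
        (c * x - s * y + shift[0], s * x + c * y + shift[1], z + shift[2])
        for x, y, z in coords
    ]


def same_map(a, b, tol=1e-9):
    """True if two distance matrices agree element by element within `tol`."""
    return all(
        abs(x - y) <= tol for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b)
    )


coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (5.0, 3.6, 0.0), (5.0, 3.6, 3.8)]
moved = rotate_z_and_shift(coords, math.radians(70), (10.0, -4.0, 2.5))
changed = coords[:3] + [(9.0, 3.6, 3.8)]  # same chain with one point displaced

print(same_map(distance_matrix(coords), distance_matrix(moved)))
print(same_map(distance_matrix(coords), distance_matrix(changed)))
```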

Homologous proteins give similar-looking Contact Maps:

p.247 of the book "Proteins: Structures and Molecular Properties"
by Thomas E Creighton, Macmillan, 1993, ISBN 071677030X or 9780716770305
shows Contact Maps for the homologous proteins alpha-chymotrypsin and elastase
that look very similar to each other.

Where to get GSview?

My Windows machine is using GSview 4.9 2007-11-18 and GPL Ghostscript 8.71 2010-02-10
to view and convert *.eps image files to other formats like *.png.
The executables for these seem to be gsv49w32.exe and gs871w32.exe.
http://pages.cs.wisc.edu/~ghost/gsview/ has links for GSview 4.9 & 5.0
(gsv49w32.exe & gsv50w32.exe) and GPL Ghostscript 8.64 & 9.01 (gs864w32.exe at least).

Tip: How to export?

Immediately after using the recipe, you can find the log of the script here:

On your computer: C:/Foldit/scriptlog.track (XML file)

For instance, if my track is called evo, I'll find the log in the xml file
C:/Foldit/scriptlog.evo

From there, I can copy-paste the whole log (or view it in a full-size window), or copy-paste only the section I need in the right format (for Excel, etc.).
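
Instead of copy-pasting by hand, a short Python sketch can pull one chart out of the log text. The internal layout of the scriptlog file is not described here, so this sketch simply treats it as plain text and assumes the chart lines appear unchanged, one per line. The function name and the header prefix argument are assumptions.

```python
def extract_csv_section(log_text, header_prefix="SS,AA#,AA,"):
    """Return the header line that starts with `header_prefix` and the
    comma-separated lines that follow it, stopping at the first other line."""
    rows = []
    capturing = False
    for line in log_text.splitlines():
        line = line.strip()
        if not capturing:
            if line.startswith(header_prefix):
                capturing = True
                rows.append(line)
        elif "," in line:
            rows.append(line)
        else:
            break
    return rows


# Sample text copied from the 'peel-the-onion' chart in the recipe description.
sample = """Gathering data and doing calculations...
SS,AA#,AA,SS,AA#,AA,  dist
H, 58, s, L,121, t, 44.71
L, 55, d, L,119, d, 42.85

some later output
"""
for row in extract_csv_section(sample):
    print(row)

# To try it on a real log (path taken from the example above):
# with open("C:/Foldit/scriptlog.evo", encoding="utf-8", errors="replace") as f:
#     rows = extract_csv_section(f.read())
```

The extracted lines can then be saved to a .csv file and opened in a spreadsheet program.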

LuaRandom2 might make *.png images from DistMap1 output:

It may be possible to use part of the scriptlog file output by DistMap1 as input to the standalone Lua script LuaRandom2 (http://fold.it/portal/recipe/101669) to generate a Distance Map in *.png format, but I have not tried this myself. For this, I would try chway=1, 9, or 10, chart spacing=0 or 1, chart labels=0 or 1, and check the box for "Include a full-size chart" in DistMap1.

output for 1189

Hi all,

Just getting caught up on this thread. 1189 output: http://fold.it/portal/files/chatimg/irc_476462_1455386022.png

I'll look into a script that would create png files from the scriptlog. It would be a standalone script, as Jeff mentions.