## "Align Protein to Density" button is confusingly named

Case number: 699969-993959

Topic: General

Opened by: alwen

Status: Open

Type: Suggestion

Opened on: Friday, November 23, 2012 - 22:29

Last modified: Tuesday, October 23, 2018 - 17:57

The "Align Protein to Density" button in the Electron Density menu actually flips the protein over inside the density.

The text on the button is confusing, especially since the Alignment Tool aligns the protein to the model, and the word "align" does not seem to be used in that sense at all on this button.

Yes, I found that the button ALWAYS flips the protein over. Also, the score ALWAYS drops like hell.

If this is meant to help us find a better placement, it is a miss.

I agree that it is confusing. The tool is designed to actually align; however, this is computationally very hard to do. I am considering removing it or renaming it to something like "Center Protein on Density".

Given that it seems to flip the protein in space, I'm wondering if there is an extra or missing minus sign in the spatial calculations somewhere. If so, and that were fixed, the tool might be fine.

It did work on the first two or so ED puzzles, when I was already close.

PCA/SVD gives an ambiguous result: it gives the optimal axes, which are preserved on reflection. I bet susume's right. It's not a missing sign, it's a chance (50-50? Maybe, maybe not; I'd have to do some Monte Carlo :-) of getting a negative determinant. That is easily fixed in 3-D. Maybe they just forgot that last step, and it sometimes works and sometimes doesn't for that reason?
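To illustrate the "last step" mentioned above, here is a minimal sketch of a determinant check on a set of axes. It assumes the axes arrive as three orthonormal row vectors; the function names are made up for this example and are not Foldit's code.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as three rows."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def make_proper_rotation(axes):
    """Negate the third axis if the axes describe a reflection (det < 0).

    `axes` is assumed to hold three orthonormal axis vectors as rows,
    the kind of result a PCA/SVD routine might return.
    """
    if det3(axes) < 0:
        axes = [axes[0], axes[1], [-v for v in axes[2]]]
    return axes


# A mirror image: the z axis points the wrong way, so the determinant is -1.
reflected = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, -1.0]]
fixed = make_proper_rotation(reflected)
print(det3(reflected), det3(fixed))
```

This only rules out mirror images. Each PCA axis is still defined only up to sign, so 180-degree flips about an axis remain possible even when the determinant is positive.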

I can check the math again, but I think it has more to do with the nature of the things we are trying to align. Using PCA to align two point clouds has issues when the clouds are roughly rotationally symmetric around one of the basis axes. There aren't enough defining features for it to correctly determine its rotation about this axis.

Could it be that we align the bare loop in reverse to the ED, and that the button is telling us we should have started at the other end in our alignments? That is a situation I am wondering about now. I have it aligned pretty well, yet the score is half that of the top scorers, and my model gets flipped when I press that button.

Is there a way to help us start at the right end of that loop and save days of work?

I wonder if the "Align Protein to Density" or "Center Protein on Density" button would work better if it operated as follows:

(1) Find the average of the protein's alpha-carbon xyz coordinates, call this the center of the protein, and subtract the center's xyz coordinates from all of the protein's xyz coordinates so that the center lies at the origin.

(2) Determine a set of xyz coordinates that tell how far the center of the protein is translated from a different origin fixed within the electron density cloud.

(3) Determine a set of Euler angles that tell how much the protein is rotated about its center in an xyz coordinate system fixed with respect to the electron density cloud.

(4) Use the above angles and coordinates to position the protein in the electron density cloud and note the score.

(5) Use an optimization algorithm like the Nelder-Mead Simplex Direct Search Method (implemented in Matlab as the fminsearch command) to vary the 3 xyz coordinates in (2) and the 3 Euler angles in (3) (6 variables in all) to find the best score it can.
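Steps (1) to (4) above could be sketched roughly as follows. This is only an illustration: the Euler convention (rotate about x, then y, then z), the function names, and the two-atom example are assumptions, and the scoring in step (4) is left to Foldit's own density score, which is not shown.

```python
import math


def center_on_origin(coords):
    """Step (1): subtract the mean position so the center sits at the origin."""
    n = len(coords)
    cx, cy, cz = (sum(p[k] for p in coords) / n for k in range(3))
    return [(x - cx, y - cy, z - cz) for x, y, z in coords]


def euler_matrix(a, b, c):
    """Rotation matrix Rz(c) * Ry(b) * Rx(a), angles in degrees (assumed convention)."""
    a, b, c = (math.radians(t) for t in (a, b, c))
    ca, sa = math.cos(a), math.sin(a)
    cb, sb = math.cos(b), math.sin(b)
    cc, sc = math.cos(c), math.sin(c)
    return [
        [cc * cb, cc * sb * sa - sc * ca, cc * sb * ca + sc * sa],
        [sc * cb, sc * sb * sa + cc * ca, sc * sb * ca - cc * sa],
        [-sb, cb * sa, cb * ca],
    ]


def place(centered, angles, shift):
    """Steps (2)-(4): rotate about the center, then translate into the density frame."""
    r = euler_matrix(*angles)
    return [
        tuple(sum(r[i][k] * p[k] for k in range(3)) + shift[i] for i in range(3))
        for p in centered
    ]


# Two made-up alpha-carbon positions, rotated 90 degrees about z and moved to (40, 60, 20).
atoms = [(1.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
centered = center_on_origin(atoms)
placed = place(centered, (0.0, 0.0, 90.0), (40.0, 60.0, 20.0))
print(placed)
```

An optimizer for step (5) would then treat the 3 angles and 3 shift values as its 6 variables and call `place` before each score evaluation.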

While fminsearch is not guaranteed to give the global maximum score, I have seen fminsearch give good solutions when optimizing 6 variables, so I think it would work here, at least as well as the algorithm Foldit has been using so far. Exploiting the periodicity of both the Euler angles and the electron density cloud, as well as varying the size of the initial simplex, can let multiple uses of fminsearch give increasingly better scores.

For more information, please see

http://www.mathworks.com/help/matlab/ref/fminsearch.html?requestedDomain=www.mathworks.com#moreabout
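For readers without Matlab, here is a simplified Nelder-Mead sketch in Python. It is not fminsearch itself: it uses only reflection, expansion, inside contraction, and shrink, and it borrows the 5% initial step. Since the method minimizes, the score is maximized by minimizing its negative. `toy_score` is a made-up stand-in for the density score, and only 3 variables are used to keep the example short.

```python
def nelder_mead(f, x0, iters=200):
    """Minimize f from x0 with a basic (simplified) Nelder-Mead simplex search."""
    n = len(x0)
    # Initial simplex: x0 plus one vertex per variable, stepped by 5%
    # (or a small fixed step when the starting value is zero).
    simplex = [list(x0)]
    for i in range(n):
        p = list(x0)
        p[i] = p[i] * 1.05 if p[i] != 0 else 0.00025
        simplex.append(p)
    for _ in range(iters):
        simplex.sort(key=f)
        best, worst = simplex[0], simplex[-1]
        centroid = [sum(p[k] for p in simplex[:-1]) / n for k in range(n)]
        reflect = [c + (c - w) for c, w in zip(centroid, worst)]
        if f(reflect) < f(best):
            expand = [c + 2 * (c - w) for c, w in zip(centroid, worst)]
            simplex[-1] = expand if f(expand) < f(reflect) else reflect
        elif f(reflect) < f(simplex[-2]):
            simplex[-1] = reflect
        else:
            contract = [c + 0.5 * (w - c) for c, w in zip(centroid, worst)]
            if f(contract) < f(worst):
                simplex[-1] = contract
            else:  # shrink every vertex halfway toward the best one
                simplex = [best] + [
                    [b + 0.5 * (x - b) for b, x in zip(best, p)] for p in simplex[1:]
                ]
    return min(simplex, key=f)


def toy_score(pose):
    """Hypothetical stand-in for the density score: peaks at (45, 55, 25)."""
    x, y, z = pose
    return -((x - 45) ** 2 + (y - 55) ** 2 + (z - 25) ** 2)


# Maximize the score by minimizing its negative.
best_pose = nelder_mead(lambda p: -toy_score(p), [40.0, 60.0, 20.0], iters=500)
print(best_pose)
```

A real density score has many local peaks, so restarting from several initial simplexes, as suggested above, would matter much more than it does for this smooth toy function.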

It might also be good to break the optimization into 2 layers, like nesting fminsearch for the xyz coordinates inside fminsearch for the Euler angles. The inner layer varies the xyz translation coordinates and is done more often. The outer layer varies the Euler angles and is done less often. Varying the translations more often makes sense because the translations are just addition operations (a faster calculation) while the rotations involve multiplying 3x3 rotation matrices by many sets of xyz coordinates (a slower calculation).
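The two-layer idea could look something like the sketch below. To keep it short, a tiny compass search stands in for fminsearch (it is a different, simpler method), and `toy_cost` is a made-up negative score whose optimum is known in advance; both are assumptions for illustration only.

```python
def compass_search(f, x0, step, iters=60):
    """Tiny derivative-free minimizer, used here as a stand-in for fminsearch."""
    x, fx = list(x0), f(x0)
    for _ in range(iters):
        improved = False
        for i in range(len(x)):
            for d in (step, -step):
                trial = x[:i] + [x[i] + d] + x[i + 1:]
                ft = f(trial)
                if ft < fx:
                    x, fx, improved = trial, ft, True
        if not improved:
            step /= 2  # nothing better nearby: search on a finer grid
    return x, fx


def toy_cost(angles, shift):
    """Hypothetical cost (negative score); lowest at angles (30, 0, 0), shift (1, 2, 3)."""
    target_angles, target_shift = (30.0, 0.0, 0.0), (1.0, 2.0, 3.0)
    return sum((a - t) ** 2 for a, t in zip(angles, target_angles)) + sum(
        (s - t) ** 2 for s, t in zip(shift, target_shift)
    )


def best_cost_for_angles(angles):
    """Inner layer: with the rotation fixed, optimize the cheap translation."""
    _, cost = compass_search(lambda s: toy_cost(angles, s), [0.0, 0.0, 0.0], step=1.0)
    return cost


# Outer layer: vary the Euler angles; each evaluation runs a full inner search.
best_angles, best_cost = compass_search(best_cost_for_angles, [0.0, 0.0, 0.0], step=16.0)
print(best_angles, best_cost)
```

In a real implementation the rotated coordinates would be computed once per outer evaluation and reused for every inner translation step, which is where the speed advantage described above would come from.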

I would not recommend graphing the intermediate results. Just show the protein's position for the best score at the end of the optimization.

Say the protein starts with all 3 Euler angles equal to 0 degrees. Since the Euler angles give the same results every 360 degrees, you can say the starting Euler angles are p360,q360,r360 (that is, p times 360, q times 360, r times 360), where p,q,r are all nonzero integers (1,2,3,4,etc.). For 3 variables, fminsearch will make its first simplex of 4 points by increasing one starting value at a time by 5%, as follows:

```
(1) p360,q360,r360 & its score
(2) p378,q360,r360 & its score
(3) p360,q378,r360 & its score
(4) p360,q360,r378 & its score
```

This initial simplex sets the range that fminsearch will explore for each variable. Using different values for p,q,r will explore different ranges of angles. Small p,q,r gives a more local search. Large p,q,r gives a more global search. For example, if p=1, the initial simplex uses the angles 360 and 378 degrees (equivalent to 0 and 18 degrees). Also, if p=10, the initial simplex uses the angles 3600 and 3780 degrees (equivalent to 0 and 180 degrees).

For general p, the angles are p360 and p378 degrees. p360 is equivalent to 0 degrees. p378 is equivalent to p360+p18 or p18 degrees. This gives the effective angles below, which repeat for p=21-40,41-60,61-80,etc.

```
p p378 (equivalent angle in degrees, 0-360)
1 18
2 36
3 54
4 72
5 90
6 108
7 126
8 144
9 162
10 180
11 198
12 216
13 234
14 252
15 270
16 288
17 306
18 324
19 342
20 0
21 18
```
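The table above can be reproduced with a few lines of integer arithmetic. This sketch assumes the 5% step implied by 360 becoming 378; the function name is made up for the example.

```python
def second_vertex_angle(p):
    """Equivalent angle (0-359 degrees) of the stepped vertex for a start of p*360."""
    start = 360 * p
    stepped = start * 105 // 100  # 5% step: 360p becomes 378p, exact in integers
    return stepped % 360


for p in (1, 5, 10, 20, 21):
    print(p, second_vertex_angle(p))
```

Because 378p and 18p differ by a multiple of 360, the result cycles every 20 values of p, matching the repeat noted above.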

Say the unit cell for the electron density cloud has x=0-60, y=0-80, and z=0-100 and the protein's center starts at (x,y,z)=(40,60,20) in the same coordinate system. Since the unit cell repeats every 60 units in the x-direction, 80 units in the y-direction, and 100 units in the z-direction, the initial position of the protein's center is equivalent to (x,y,z)=(40+60p,60+80q,20+100r); that is, the initial position of the protein's center will give the same electron density score for any set of integers p,q,r.

If we keep p,q,r as nonzero positive integers (1,2,3,4,etc.), the initial simplex for the xyz coordinates for the protein's center will be as follows:

```
(1) 40+60p,60+80q,20+100r & its score
(2) 42+63p,60+80q,20+100r & its score
(3) 40+60p,63+84q,20+100r & its score
(4) 40+60p,60+80q,21+105r & its score
```

If p=1, the initial simplex uses the x values 100 and 105 (equivalent to 40 and 45). Also, if p=10, the initial simplex uses the x values 640 and 672 (equivalent to 40 and 12). For general p, the x values are 40+60p and 42+63p. 40+60p is equivalent to 40, and 42+63p is equivalent to 42+60p+3p or 42+3p. This and similar logic gives the effective x,y,z values below, where n stands for p, q, or r respectively. The values repeat for n=21-40,41-60,61-80,etc.

```
stepped:     x=42+63n   y=63+84n   z=21+105n
equivalent:  x=42+3n    y=63+4n    z=21+5n
n            x (0-60)   y (0-80)   z (0-100)
1            45         67         26
2            48         71         31
3            51         75         36
4            54         79         41
5            57         3          46
6            0          7          51
7            3          11         56
8            6          15         61
9            9          19         66
10           12         23         71
11           15         27         76
12           18         31         81
13           21         35         86
14           24         39         91
15           27         43         96
16           30         47         1
17           33         51         6
18           36         55         11
19           39         59         16
20           42         63         21
21           45         67         26
```
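The same table can be generated by stepping each starting coordinate by 5% and wrapping it back into the unit cell. The cell size and starting center are the example values from this post; the function name is made up.

```python
CELL = (60, 80, 100)   # unit cell lengths in x, y, z from the example
START = (40, 60, 20)   # starting center of the protein


def effective_vertex(n):
    """Stepped coordinate on each axis for multiplier n, wrapped into the unit cell."""
    # (start + cell*n) * 1.05 is exact in integers here, e.g. (40+60n) -> 42+63n.
    return tuple(((s + c * n) * 105 // 100) % c for s, c in zip(START, CELL))


for n in (1, 5, 16, 20):
    print(n, effective_vertex(n))
```

Each axis wraps at its own cell length, which is why y drops back to 3 at n=5 while z does not wrap until n=16.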

If the electron density (ED) were not periodic, as might occur if the ED were only nonzero within a box covering x=0-60, y=0-80, and z=0-100, for example, there is another trick you can do with fminsearch to control the size of the initial simplex, as below:

Say the protein's center starts at (x,y,z)=(40,60,20) in the same coordinate system as the electron density. If you feed these coordinates directly into fminsearch, the initial simplex will be:

```
actual
xyz
(1) 40,60,20 & its score
(2) 42,60,20 & its score
(3) 40,63,20 & its score
(4) 40,60,21 & its score
```

If you instead shifted all coordinates by +100 units outside of fminsearch and then shifted them back by -100 units before evaluating their score, you'd get for the initial simplex:

```
shifted actual
xyz xyz
(1) 140,160,120 40,60,20 & its score
(2) 147,160,120 47,60,20 & its score
(3) 140,168,120 40,68,20 & its score
(4) 140,160,126 40,60,26 & its score
```

Next, if you shifted all coordinates by +400 units outside of fminsearch and then shifted them back by -400 units before evaluating their score, you'd get for the initial simplex:

```
shifted actual
xyz xyz
(1) 440,460,420 40,60,20 & its score
(2) 462,460,420 62,60,20 & its score (x goes outside 0-60 here, where ED is zero)
(3) 440,483,420 40,83,20 & its score (y goes outside 0-80 here, where ED is zero)
(4) 440,460,441 40,60,41 & its score
```

Finally, if you shifted all coordinates by -80 units outside of fminsearch and then shifted them back by +80 units before evaluating their score, you'd get for the initial simplex:

```
shifted actual
xyz xyz
(1) -40,-20,-60 40,60,20 & its score
(2) -42,-20,-60 38,60,20 & its score
(3) -40,-21,-60 40,59,20 & its score
(4) -40,-20,-63 40,60,17 & its score
```

As you can see, the size of the shift controls the size of the initial simplex, and the sign of the shift controls which direction the simplex will explore.
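The shift trick can be checked with a short sketch that builds the 5% initial simplex in shifted coordinates and maps it back. Exact fractions are used so the numbers match the tables; the function name is made up, and the sketch does not handle a shifted coordinate of exactly zero, where fminsearch uses a small fixed step instead.

```python
from fractions import Fraction


def actual_simplex(start, shift):
    """Initial simplex in actual coordinates when the optimizer sees start + shift."""
    shifted = [Fraction(v) + shift for v in start]
    vertices = [shifted]
    for i in range(len(shifted)):
        v = list(shifted)
        v[i] = v[i] * Fraction(21, 20)  # 5% step on one coordinate at a time
        vertices.append(v)
    # Undo the shift before the score would be evaluated.
    return [[float(c - shift) for c in v] for v in vertices]


for shift in (0, 100, 400, -80):
    print(shift, actual_simplex((40, 60, 20), shift))
```

With a shift of +400 the second and third vertices land at x=62 and y=83, outside the box, so in practice the shift would need to be capped by the box size.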

If you can get the "Center Protein on Density" button to work better, perhaps as detailed above, it would be nice if a player could select certain segments first and then have the "Center Protein on Density" button optimize the protein's position & orientation as if the selected segments were the only scoring segments for the entire protein. This way, if a player was sure about the structure of a certain section of the protein, say segments 1-20 and 45-90, he/she could select just those segments, then press the "Center Protein on Density" button to find the position & orientation that gives the best score for segments 1-20 and 45-90 only.
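As a rough illustration of scoring only the selected segments, the sketch below sums per-segment scores over the chosen ranges. It assumes a density score is available per segment as a plain list, which is an assumption about the data rather than a description of Foldit's internals; the function name and the numbers are made up.

```python
def selected_score(per_segment_scores, ranges):
    """Sum the scores of segments inside the inclusive, 1-based ranges."""
    selected = {i for lo, hi in ranges for i in range(lo, hi + 1)}
    return sum(
        s for i, s in enumerate(per_segment_scores, start=1) if i in selected
    )


# 100 segments, each contributing a made-up score of 1.0.
scores = [1.0] * 100
print(selected_score(scores, [(1, 20), (45, 90)]))
```

The optimizer would then maximize this restricted total instead of the whole-protein score, while still moving the entire protein as one rigid body.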

"Align Protein to Density" has always been a waste of time. Best it is removed, as Flat was gonna do years ago. Best to ignore this option.

I had my protein arranged outside of the ED, hit the Align button, and it flipped my protein so it was 180 degrees off from where I had it. It also translated it into the ED, which I expected, but I had to rotate every piece 180 degrees and move it to the other side of the cloud to correct it.