Recipe: Atom Tables 1.21
Created by brow42 45 1456


Name: Atom Tables 1.21
ID: 41664
Created on: Mon, 05/14/2012 - 23:21
Updated on: Sat, 10/24/2015 - 16:33

Collection of functions to identify donors, acceptors, and polar hydrogens by atom number. 1.21 fixed error in hist table.



brow42
Joined: 09/19/2011
Groups: None
New and Improved

This recipe is a child of an earlier recipe, which has since been unshared.

This is not a recipe that does anything on its own; it is a table of data and functions for accessing it. Using the table and functions, you can identify which atom is the first in the side chain; which atoms are donors, acceptors, or polar hydrogens; whether a residue is the first or last residue in a peptide chain; and whether a cysteine has a disulfide bond.

Changelog and API:

* Version 1.1 Brow42 May 14, 2012
* CountAtoms renamed to _CountAtoms
* CountAtoms is now a wrapper for structure.GetAtomCount
* GetCount renamed to GetExpectedCount and does not auto-lookup flags
* IsTerminal is now called by the various Get* functions if isFirst = nil
* A small table of nominal atom counts has been added to the top of the file, and the functions
* needed to determine if the AA is bonded also moved to the front and only require this small
* table. This is so if all you need is IsTerminal, you don't need the entire library.

* API (all in the fsl.atom namespace)

* Many functions accept 2 or 3 optional boolean arguments, which indicate
* if the segment is the first and/or last segment in a polypeptide.
* The third argument boolean is true if the segment is disulfide bonded.
* These arguments can be replaced by a call to IsTerminal(). If omitted,
* IsTerminal will be called automatically, except for GetExpectedCount,
* which is used to modify the db value.
* Most functions accept either a segment number or AA code as the first argument.

* CountAtoms() and _CountAtoms() are the only useful functions for ligands
* ('M' structure). Other functions will return nil if passed a ligand segment.

* _CountAtoms(number iSeg) -- SLOW manual count of all the atoms in the segment. You should use structure.GetAtomCount instead.
* GetExpectedCount(number or string iSeg, isFirst, isLast, isDisulfideBonded) -- total EXPECTED atoms in segment or AA, given flags, default false
* IsTerminal(number iSeg) -- returns bool,bool if start or end of a polypeptide
* GetBackboneHeavyAtoms(number or string iSeg, isFirst, isLast) -- range of atoms on the backbone
* GetSidechainHeavyAtoms(number or string iSeg, isFirst, isLast) -- range of atoms on the sidechain (nil for glycine)
* GetDonorAtoms(number or string iSeg, isFirst, isLast) -- list of donor atoms
* GetAcceptorAtoms(number or string iSeg, isFirst, isLast) -- list of acceptor atoms
* GetPolarHydrogens(number or string iSeg, isFirst, isLast) -- list of polar hydrogen atoms
* _IsDisulfideBonded(number iSeg) -- true if bonded to another cysteine
* _IsTerminalTest(aa,count,disulfide) -- If you have already called CountAtoms(), don't call IsTerminal(); call this instead.
* _NumberOrCode(number or string iSeg) -- performs a table look up given segment number or AA code.
* Test1() -- Count all atoms, compare with table, perform terminal segment check, for all segments
* Test2(mode) -- Band all atoms of a particular class. mode = sc, bb, donor, polar, acceptor
* db -- Reference table of atom numbers for each class. Should not be accessed directly since bonding changes atom counts.
* atomcount -- table of just the expected total atom counts

brow42
Version 1.2

* Version 1.2 Brow42 April 22, 2014
* Added Functions:
* GetAtom(number seg, number atom, isFirst, isLast, isDisulfideBonded) -- returns string,string
* first is O,C,H,S or nil (if not a standard amino acid, a ligand, or proline atom 8)
* second is a,d,b,p,n,v = acceptor, donor, both, polar h, non-polar, virtual atom
* GetHydrogenAtoms(number or string iSeg) -- list of hydrogens (polar and non-polar)
* Test3() -- mutate to all 20 AA and call GetAtom on all the atoms.
* Corrected first hydrogen of proline to be 9.
* Note that atom 8 of proline is not a real atom (although it is very close to atom 1)

This update adds the GetAtom function, which tells you the element of any atom, and if it's a donor or acceptor, all in one function call.

Some usage notes:

A reminder: The three arguments isFirst, isLast, isDisulfide are optional. If you pass in a segment number, the script will automatically determine these flags from the protein if you leave them all blank. If you pass in an amino acid letter, it will assume false if you leave them all blank.

To just find out the element:

ele = fsl.atom.GetAtom(seg,1) -- atom 1 is always a nitrogen

To find out if it's a donor or acceptor (or polar hydrogen):

_,role = fsl.atom.GetAtom(seg,1) -- _ is the customary placeholder for un-needed values
donor = role == 'd' or role == 'b'
acceptor = role == 'a' or role == 'b'

Another reminder: Use GetDonorAtoms/GetAcceptorAtoms if you want to find donors and acceptors instead of looping over the atoms and calling GetAtom!

Special proline shenanigans:

Atom 8 in proline is an imaginary atom that sits near, but not exactly on top of, atom 1. If you call GetAtom on this atom, it will return nil,'v' (no element, v for virtual). Don't let that nil crash your script!
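One way to guard against that nil is sketched below. DescribeAtom is a hypothetical helper, not part of Atom Tables; it only assumes the two return values described above (element or nil, then a role letter).

```lua
-- Hypothetical helper (not part of Atom Tables): turn the two values
-- returned by fsl.atom.GetAtom into a printable string without
-- concatenating a nil element.
function DescribeAtom(element, role)
    if role == 'v' then
        return "virtual atom (no element)"
    end
    if element == nil then
        -- ligand or non-standard amino acid: no element available
        return "unknown atom"
    end
    return element .. " (" .. tostring(role) .. ")"
end

-- Stand-alone demonstration with literal values:
print(DescribeAtom(nil, 'v'))  -- virtual atom (no element)
print(DescribeAtom('O', 'a'))  -- O (a)
```

In a recipe with Atom Tables loaded, the call would look like DescribeAtom(fsl.atom.GetAtom(seg, 8)), since both return values are passed straight through.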

A final reminder:

If you pick a residue that is the last residue in the peptide, all the sidechain and hydrogen atoms shift up by one. If you pick a residue that is the first residue in the peptide, all the hydrogens shift up by 2. So, for example, if a proline is the last residue, then everything I said about atom 8 actually applies to atom 9 for that residue! If you are looking for a specific atom, use IsTerminal() to find these cases and adjust your atom number accordingly!

Joined: 09/24/2012
Groups: Go Science
Impressed !

I'm really impressed by all the science behind this recipe! For me, a simple player, it's quite difficult to understand. Could you give some examples of possible further uses of this recipe (or of some of the functions it contains)?

Is the idea to script other recipes using this info, for example to find opportunities for selective banding?

brow42
Example Uses

Yes, this is a library to help other scripts do bonding. Ordinary scripted bands connect the center carbons, so they can pull segments together but won't twist them to make the bond. For that, you need to band the actual bonding atoms.

The donors and acceptors on the backbone have well defined atom numbers, but everything else depends on the sidechain. The library tells you which atoms are what type.

I usually use this information to bond crooked sheets without straightening them, or to bond sidechains to ligands. I usually do this in atom view (ctrl-shift-V), but that is difficult, confusing, and time consuming for me.

Sometimes I make a script band all possible acceptor-donor pairs. Then I delete the ones I don't want. It's much faster than making the bonds by hand.

You can try to use the distance between the donor and acceptor to guess if they are bonded or not.

A complicated, shared script that does this is Sepsis-2 Bonding. Like Contact Enforcer, it uses bands to pull sidechain and backbone atoms to bondable atoms on the sepsis sugar molecule and adjusts them until they seem to be bonded.

A simpler shared example is H Bonds 532 (Beginner Flu Puzzle) (uses Atom Tables 1.0). This script bands bondable atoms in selected_segments (the "ligand") to all matching atoms on nearby segments. Then it optionally deletes them all except for the shortest band, for each ligand atom. (There's no way to get the distance without making a band.) I usually band hydrogens to acceptors instead of donors to acceptors. It's just a personal preference.
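The "keep only the shortest band per ligand atom" step can be sketched as below. This is not the code from H Bonds 532; PickShortest and the candidate record fields (ligandAtom, band, length) are made up for illustration. In a real recipe the lengths would presumably come from band.GetLength and the deletions would go through band.Delete.

```lua
-- candidates: array of { ligandAtom = n, band = bandIndex, length = number }
-- Returns: best (ligandAtom -> kept candidate) and toDelete (band indices).
function PickShortest(candidates)
    local best = {}  -- shortest candidate seen so far for each ligand atom
    for _, c in ipairs(candidates) do
        local b = best[c.ligandAtom]
        if b == nil or c.length < b.length then
            best[c.ligandAtom] = c
        end
    end
    local toDelete = {}
    for _, c in ipairs(candidates) do
        if best[c.ligandAtom] ~= c then
            toDelete[#toDelete + 1] = c.band
        end
    end
    -- Highest index first, in case deleting a band renumbers the ones after it.
    table.sort(toDelete, function(a, b) return a > b end)
    return best, toDelete
end

-- Stand-alone demonstration with made-up lengths:
local demo = {
    { ligandAtom = 1, band = 1, length = 5.0 },
    { ligandAtom = 1, band = 2, length = 3.0 },
    { ligandAtom = 2, band = 3, length = 4.0 },
}
local kept, doomed = PickShortest(demo)
print(kept[1].band, kept[2].band, #doomed)  -- 2  3  1
```

On a tie the first band encountered is kept, because the comparison is a strict less-than.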

Joined: 09/24/2012
Groups: Go Science
Great !

I know and have used the impressive Sepsis-2 Bonding.

brow42
How it works

The script works by assuming a default numbering and then looking up special cases, like donor atoms or terminal residues. This is how Foldit numbers atoms:

First, the backbone heavy atoms starting at 1: N, C-alpha, C-prime, O. Then, all of the sidechain heavy atoms starting at 5. If there's a branch, do both atoms of the branch before going farther. Finally, number all the hydrogens attached to the heavies, starting with the hydrogens on the backbone N, in the same order as before.
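As a rough sketch of that default numbering for a non-terminal residue: ClassifyAtom is a hypothetical function, not part of Atom Tables, and nSidechainHeavy is supplied by the caller (for example 1 for alanine, 0 for glycine) rather than looked up in the library's table. It ignores the exceptions described next, such as proline's virtual atom 8.

```lua
-- Hypothetical sketch of the default numbering for a NON-terminal residue.
-- atom            : atom number within the residue (1-based)
-- nSidechainHeavy : number of heavy atoms in the sidechain (caller-supplied)
function ClassifyAtom(atom, nSidechainHeavy)
    local backbone = { "N", "C-alpha", "C-prime", "O" }
    local lastHeavy = 4 + nSidechainHeavy
    if atom < 1 then
        return nil
    elseif atom <= 4 then
        return "backbone heavy", backbone[atom]
    elseif atom <= lastHeavy then
        return "sidechain heavy"
    else
        -- No upper bound check: anything past the heavy atoms counts as hydrogen.
        return "hydrogen"
    end
end

-- Alanine has one sidechain heavy atom, so atom 5 is sidechain
-- and hydrogens start at 6.
print(ClassifyAtom(2, 1))  -- backbone heavy   C-alpha
print(ClassifyAtom(5, 1))  -- sidechain heavy
print(ClassifyAtom(6, 1))  -- hydrogen
```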

Now the exceptions:

Some sidechain atoms are not carbons, so record those numbers and elements.

Some sidechain atoms are acceptors or donors or both, so record those numbers. Same with the polar hydrogens. In general, acceptor/donors are those non-carbon atoms and polar hydrogens are attached to donors.

When amino acids bond, an O and 2 H break off to make a water, so the ends of the chain have more atoms. At physiological pH, the backbone N has 3 polar hydrogens and the backbone C-prime has an O (double bond) and an O- ion. The O- ion is lost when the C-prime bonds to the N of the next residue.

At other pH, the O- might acquire an H and the N might lose an H (not necessarily at the same time). The same thing might happen to the donor and acceptor atoms in the sidechains (this is why some are charged). Atom Tables doesn't handle pH changes, only the default hydrogens.

Cysteines lose the hydrogen attached to the sulfur when they form a disulfide bond.

Glycines don't have a sidechain, so the polar hydrogens attached to the N follow immediately after the backbone heavies.

Prolines are strange; they loop around and bond to the backbone nitrogen. But Rosetta can't support this bonding pattern. Instead, the last sidechain atom, 7, is bonded to a virtual atom 8, which is forced to be in the same spot as atom 1, the N. (However, strong bands can overcome this constraint, breaking the proline.) Also, the sidechain bond uses one of the bonds that would normally hold a polar hydrogen. Since there's usually only one polar hydrogen attached to the backbone N, proline's N is NOT a donor and cannot form an H-bond. The first hydrogen is 9, which is attached to C-alpha. There are no polar hydrogens.

Atom Tables lists all the acceptors, donors, polar hydrogens, and non-carbons for non-terminal residues. Everything else is either a carbon (if <= the last sidechain atom), or a non-polar hydrogen (if after the last sidechain atom). If the residue is a terminal, then there are extra atoms and the atom numbers have to be shifted up. This is why you should use the access functions instead of the table directly.

To find terminal residues, compare the total number of atoms to the expected number. If there are 1 or 3 extra atoms, then it has an extra O and it is an O-terminal (the C-terminus, i.e. the last residue). If there are 2 or 3 extra atoms, then there are 2 extra H and it is an N-terminal (the first residue). (Isn't it lucky that 1 and 2 are different numbers!) Cysteine normally has 10 or 11 atoms, so for cysteine you have to check the disulfide score (converted to a string) and see whether it's '-0' or not. If it's '-0', then there are supposed to be 11 atoms.
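The counting rule above can be sketched as a small pure function. TerminalFlags is a hypothetical name, not the library's IsTerminal or _IsTerminalTest, and it assumes the caller has already picked the right expected count (including the 10-versus-11 choice for cysteine).

```lua
-- Hypothetical sketch of the extra-atom test.
-- actualCount   : atoms counted in the segment
-- expectedCount : nominal count for a non-terminal residue of that type
-- Returns isFirst (N-terminal), isLast (O-terminal).
function TerminalFlags(actualCount, expectedCount)
    local extra = actualCount - expectedCount
    local isFirst = (extra == 2 or extra == 3)  -- two extra H on the backbone N
    local isLast  = (extra == 1 or extra == 3)  -- one extra O on C-prime
    return isFirst, isLast
end

-- Using 10 as the nominal count for a mid-chain alanine:
print(TerminalFlags(10, 10))  -- false  false
print(TerminalFlags(11, 10))  -- false  true
print(TerminalFlags(12, 10))  -- true   false
print(TerminalFlags(13, 10))  -- true   true
```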

If the residue is O-terminal, then atom 5 is an O, and the sidechain atoms and all the hydrogens are shifted up by 1. If the residue is N-terminal, then all the hydrogens are shifted up by 2, plus 2 polar hydrogens are inserted at the start of the list.
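A minimal sketch of applying those offsets to a table (non-terminal) atom number follows. ShiftAtom is a hypothetical helper, not the library's own code; lastHeavy is the table number of the last sidechain heavy atom (4 for glycine) and must be supplied by the caller. It does not generate the extra O or the two inserted polar hydrogens, and it makes no attempt to handle proline's virtual atom on an N-terminal residue.

```lua
-- Hypothetical sketch: map a non-terminal atom number to its actual
-- number on a terminal residue.
function ShiftAtom(atom, lastHeavy, isFirst, isLast)
    local shifted = atom
    if isLast and atom >= 5 then
        -- extra O becomes atom 5; sidechain atoms and hydrogens move up by 1
        shifted = shifted + 1
    end
    if isFirst and atom > lastHeavy then
        -- two extra polar hydrogens precede the existing hydrogens
        shifted = shifted + 2
    end
    return shifted
end

-- Alanine: last sidechain heavy atom is 5, first hydrogen is 6 in the table.
print(ShiftAtom(6, 5, false, true))  -- 7
print(ShiftAtom(6, 5, true, false))  -- 8
print(ShiftAtom(6, 5, true, true))   -- 9
```

Backbone heavy atoms 1 to 4 are never shifted, so they pass through unchanged whatever the flags are.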

Since it's a lot of work to do all this, you can pass in previously computed flags, or just set them all to false. You should also use the functions to get lists of atoms which have the necessary offsets applied, rather than looping over the atoms. Also, once you find the terminal flags, you can access the table directly and add in the offsets yourself if you really want to (for atom elements, for example).

I worked this out by making bands to atoms in design puzzles in atom view and CPK coloring. But, here are resources for how things are officially numbered and labeled:
Foldit data dir/cmp-database*/database/chemical/residue_type_sets/fa_standard/residue_types
"Recommendations for the Presentation of NMR Structures of Proteins and Nucleic Acids"



