TOPS diagram (thing) by The Alchemist

A more rigorous representation of protein topology than a TOPS cartoon. Instead of a fully planar graph, the diagram is linear:


                       _______H:A_______
           __H:P__    /                 \
          /       \  /                   \
         /         \/         ____      __\___       ____
        /\         /\        /    \    \      /     /    \       __
|\ | __/  \_______/  \______/      \____\    /_____/      \____ /
| \|  /    \     /    \     \      /     \  /      \      /     \__
     /______\   /______\     \____/       \/        \____/
                                \                     / 
                                 \_______C:R_________/

...and the labelled edges are hydrogen bonds (H:P or H:A for parallel or anti-parallel) and chiralities (C:R and C:L for right and left). The backbone is the straight line through the center - obviously, some information has been lost.

Indeed, diagrams are not much use as a visualisation tool (which is what the cartoons are for). Instead, they are a type of graph (an ordered, directed one). This makes them amenable to subgraph isomorphism matching, and therefore to machine learning techniques. A generalisation of the TOPS diagram is the pattern.
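To make the matching idea concrete, here is a minimal brute-force sketch in Python. It is not the algorithm any TOPS software actually uses. It simply tries every order-preserving placement of a small pattern onto a diagram and keeps those where the vertex types and labelled edges agree (a non-induced subgraph match). The function name, the vertex names such as "strand-up", and the two-strand pattern are all made up for illustration; the target is the diagram drawn above with the termini left out.

```python
from itertools import combinations

def find_matches(pattern_nodes, pattern_edges, target_nodes, target_edges):
    """Yield tuples of target indices that the pattern maps onto, in backbone order."""
    k = len(pattern_nodes)
    # combinations() yields strictly increasing index tuples, so the
    # left-to-right order along the backbone is preserved automatically.
    for chosen in combinations(range(len(target_nodes)), k):
        # Every pattern vertex must land on a target vertex of the same type.
        if any(target_nodes[t] != p for t, p in zip(chosen, pattern_nodes)):
            continue
        # Every pattern edge must exist in the target with the same label.
        if all(target_edges.get((chosen[i], chosen[j])) == label
               for (i, j), label in pattern_edges.items()):
            yield chosen

# The diagram drawn above, left to right, termini omitted.
# Helix orientations cannot be read off the ASCII art; they follow
# the writeup's string form for the same example.
target_nodes = ["strand-up", "strand-up", "helix-up", "strand-down", "helix-down"]
target_edges = {(0, 1): "H:P", (1, 3): "H:A", (2, 4): "C:R"}

# Hypothetical pattern: an up strand anti-parallel to a later down strand.
pattern_nodes = ["strand-up", "strand-down"]
pattern_edges = {(0, 1): "H:A"}

matches = list(find_matches(pattern_nodes, pattern_edges, target_nodes, target_edges))
print(matches)  # [(1, 3)]
```

Only the second up strand is bonded to the down strand, so one placement survives. Brute force is exponential in general; the ordering along the backbone is what lets real matchers prune the search.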

An even more simplified representation is the string form, where strands are 'E' or 'e' (up and down) and helices are 'H' and 'h'. Edges in the graph are denoted by integer pairs, separated with a ':' and terminated by a label. For the example graph shown above, this is:

Example NEEHehC 1:2P2:4A3:5R

While this representation is even less readable than the linear graph, it is compact, so tens of thousands of protein secondary structures can be stored in one file.
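Reading a string form back into a graph takes only a few lines. The sketch below is one possible parser, not a reference implementation, and it rests on a few assumptions: the first token is a name, the second is the string of secondary structure elements, the edge labels are single letters, and edge indices are positions in the element string with the N terminus counted as 0. The input uses the edge list read directly off the diagram above (strands 1 and 2 parallel, strand 2 and strand 4 anti-parallel, helices 3 and 5 with right-handed chirality). The name parse_tops_string is invented here.

```python
import re

def parse_tops_string(line):
    """Split 'name SSEs edges' into (name, sses, edges).

    edges is a list of (i, j, label) triples. Assumes single-letter
    labels and integer pairs written as i:j, as in the example above.
    """
    parts = line.split(maxsplit=2)
    name, sses = parts[0], parts[1]
    edge_text = parts[2] if len(parts) > 2 else ""
    # Each edge is "<int>:<int><letter>"; the letter also ends the edge,
    # so edges can be run together without separators.
    edges = [(int(i), int(j), label)
             for i, j, label in re.findall(r"(\d+):(\d+)([A-Za-z])", edge_text)]
    return name, sses, edges

name, sses, edges = parse_tops_string("Example NEEHehC 1:2P2:4A3:5R")
for i, j, label in edges:
    # Assumption: indices are 0-based positions in the SSE string (N is 0).
    print(sses[i], sses[j], label)
# E E P
# E e A
# H h R
```

Because each record fits on one line, a file of many structures can be processed by calling the parser once per line.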
