A more rigorous representation of protein topology than a TOPS cartoon. Instead of a fully planar graph, the diagram is linear:

_______H:A_______
__H:P__ / \
/ \ / \
/ \/ ____ __\___ ____
/\ /\ / \ \ / / \ __
|\ | __/ \_______/ \______/ \____\ /_____/ \____ /
| \| / \ / \ \ / \ / \ / \__
/______\ /______\ \____/ \/ \____/
\ /
\_______C:R_________/

...and the labelled edges are hydrogen bonds (H:P or H:A for parallel or anti-parallel) and chiralities (C:R or C:L for right- or left-handed). The backbone is the straight line through the center; obviously, some information has been lost.

Indeed, diagrams are not much use as a visualisation tool (which is what the cartoons are for). Instead, they are a type of graph (an ordered, directed one). This means they are amenable to subgraph isomorphism matching and therefore machine learning techniques. A generalisation of the TOPS diagram is the *pattern*.
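To make the matching idea concrete, here is a minimal brute-force sketch of order-preserving subgraph matching. It is not the TOPS implementation: the function name `find_matches`, the single-letter vertex types, and the `(left, right, label)` edge tuples are all assumptions made for illustration, and a real matcher would prune the search rather than enumerate every combination.

```python
from itertools import combinations


def find_matches(pattern_vertices, pattern_edges, target_vertices, target_edges):
    """Yield tuples of target indices onto which the pattern maps.

    Vertices are sequences of type symbols in backbone order; edges are
    (left_index, right_index, label) tuples. Both conventions are
    assumptions for this sketch.
    """
    target_edge_set = set(target_edges)
    size = len(pattern_vertices)
    # combinations() emits indices in increasing order, so the backbone
    # order of the pattern is preserved by construction.
    for chosen in combinations(range(len(target_vertices)), size):
        # Every pattern vertex must land on a target vertex of the same type.
        if any(target_vertices[t] != p for t, p in zip(chosen, pattern_vertices)):
            continue
        # Every pattern edge must exist, with the same label, in the target.
        if all((chosen[i], chosen[j], label) in target_edge_set
               for i, j, label in pattern_edges):
            yield chosen


# Hypothetical target: four alternating strands followed by a helix.
target_vertices = ["E", "e", "E", "e", "H"]
target_edges = [(0, 1, "A"), (1, 2, "A"), (2, 3, "A")]

# Hypothetical pattern: an anti-parallel pair, first strand up, second down.
pattern_vertices = ["E", "e"]
pattern_edges = [(0, 1, "A")]

matches = list(find_matches(pattern_vertices, pattern_edges,
                            target_vertices, target_edges))
print(matches)
```

Because the vertices are ordered, only increasing index assignments need to be tried, which is what makes matching on these graphs simpler than general subgraph isomorphism.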

An even more simplified representation is the string form, where strands are 'E' or 'e' (up and down) and helices are 'H' and 'h'. Edges in the graph are denoted by integer pairs, separated with a ':' and terminated by a label. For the example graph shown above, this is:

Example NEEHehC 1:2P2:5A3:4R

While this representation has the disadvantage of being even less readable than the linear graph, it is compact, so tens of thousands of protein secondary structures can be stored in one file.
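As an illustration of the string form, the following sketch parses one line into a name, a vertex string, and a list of labelled edges. The names `parse_tops_string`, `TopsString`, and `Edge` are hypothetical. The sketch assumes that fields are separated by whitespace and that edge labels are alphabetic; it does not interpret the vertex symbols or the indexing convention of the edge numbers.

```python
import re
from typing import List, NamedTuple


class Edge(NamedTuple):
    left: int
    right: int
    label: str


class TopsString(NamedTuple):
    name: str
    vertices: str
    edges: List[Edge]


# An integer pair separated by ':' and terminated by an alphabetic label.
EDGE_RE = re.compile(r"(\d+):(\d+)([A-Za-z]+)")


def parse_tops_string(line: str) -> TopsString:
    """Parse 'NAME VERTICES EDGES' (assumed whitespace-separated fields)."""
    fields = line.split()
    name, vertices = fields[0], fields[1]
    # Join any remaining fields so edges split by stray spaces still parse.
    edge_text = "".join(fields[2:])
    edges = [Edge(int(left), int(right), label)
             for left, right, label in EDGE_RE.findall(edge_text)]
    return TopsString(name, vertices, edges)


parsed = parse_tops_string("Example NEEHehC 1:2P2:5A3:4R")
print(parsed.name, parsed.vertices)
for edge in parsed.edges:
    print(edge.left, edge.right, edge.label)
```

Since each record fits on one line, a file of many structures can be read by calling the parser once per line.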