The January issue of Communications of the ACM has an
article, written by a team of researchers at the Pacific
Northwest National Laboratory, about storing information within the DNA
sequences of... living organisms! The main purpose of the
research was to find a way to protect vital information in case of a
major nuclear catastrophe. Some bacteria, like the very common
Escherichia coli and Deinococcus, can endure
something like 1000x more radiation than humans can, and can also
survive extreme environmental conditions (ultraviolet, desiccation,
partial vacuum). That makes them very good candidates for
information storage and retrieval in case of a large-scale nuclear
accident or war or catastrophe or alien attack or whatever.
Any information, in order to be represented and then saved, must be
somehow encoded. Even when we leave a note on the door, "I'll be back in
4 minutes", we use the 'English language' representation (which uses the
Latin alphabet). DNA is built from four basic units, the nucleotides,
which differ in their bases: A (Adenine), C (Cytosine), G (Guanine),
and T (Thymine). These must be the "bits" of our information
representation. The four bases form pairs and, more specifically,
Adenine pairs with Thymine (AT) and Cytosine with Guanine (CG). The
researchers developed a simple encoding scheme to represent the
Latin alphabet plus some other basic symbols using sequences of three
bases (triplets). Below is the encoding scheme:
AAA: 0 | AAC: 1 | AAG: 2 | AAT: 3 | ACA: 4 | ACC: 5 | ACG: 6 | ACT: 7
AGA: 8 | AGC: 9 | AGG: A | AGT: B | ATA: C | ATC: D | ATG: E | ATT: F
CAA: G | CAC: H | CAG: I | CAT: J | CCA: K | CCC: L | CCG: M | CCT: N
CGA: O | CGC: P | CGG: Q | CGT: R | CTA: S | CTC: T | CTG: U | CTT: V
GAA: W | GAC: X | GAG: Y | GAT: Z | GCA: SP| GCC: : | GCG: , | GCT: -
GGA: . | GGC: ! | GGG: ( | GGT: ) | GTA: ` | GTC: ‘ | GTG: “ | GTT: "
TAA: ? | TAC: ; | TAG: / | TAT: [ | TCA: ] | TCC: | TCG: | TCT:
TGA: | TGC: | TGG: | TGT: | TTA: | TTC: | TTG: | TTT:
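The table follows a regular pattern: the triplets are listed in alphabetical order over A, C, G, T, and the symbols are assigned to them one after another. Here is a minimal Python sketch of that idea. It is my own illustration, not the researchers' code, and it only covers the first 44 symbols of the table (digits, letters, space and the punctuation up to the closing parenthesis); the names SYMBOLS, encode_text and decode_text are made up for this example.

```python
from itertools import product

BASES = "ACGT"

# First 44 symbols of the table, in table order ("SP" is the space character).
# The quote marks and the later symbols are left out of this sketch.
SYMBOLS = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ :,-.!()"

# product() yields AAA, AAC, AAG, AAT, ACA, ... which is the table's order.
TRIPLETS = ["".join(t) for t in product(BASES, repeat=3)]

# zip() stops at the shorter sequence, so only 44 of the 64 triplets are used.
ENCODE = dict(zip(SYMBOLS, TRIPLETS))
DECODE = {triplet: symbol for symbol, triplet in ENCODE.items()}


def encode_text(text):
    """Turn text into a string of bases, three bases per character."""
    return "".join(ENCODE[ch] for ch in text.upper())


def decode_text(dna):
    """Turn a string of bases back into text, reading three bases at a time."""
    return "".join(DECODE[dna[i:i + 3]] for i in range(0, len(dna), 3))


message = "I'LL BE BACK".replace("'", "")  # the apostrophe is not in SYMBOLS
dna = encode_text(message)
print(dna)
print(decode_text(dna))
```

Characters outside SYMBOLS raise a KeyError here, which is the simplest way to notice that a message cannot be represented.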
These triplets can be used to encode any English text, in much the same way that computers use the binary digits 0 and 1. Of course, three bases give only 4 x 4 x 4 = 64 different triplets, so I guess that we could use a slightly more complex encoding scheme based on tetraplets (four bases, 4 x 4 x 4 x 4 = 256 combinations) in order to represent all byte values, with obvious advantages.
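To make the byte idea concrete, here is a small Python sketch of a tetraplet scheme. It is purely my assumption of how such a scheme could look, not something from the article: each base stands for two bits (A=00, C=01, G=10, T=11), so four bases cover one byte. The function names bytes_to_dna and dna_to_bytes are hypothetical.

```python
BASES = "ACGT"  # assumed mapping: A=0, C=1, G=2, T=3 (two bits per base)


def bytes_to_dna(data):
    """Encode each byte as four bases, most significant bit pair first."""
    out = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            out.append(BASES[(byte >> shift) & 0b11])
    return "".join(out)


def dna_to_bytes(dna):
    """Decode a base string whose length is a multiple of four."""
    if len(dna) % 4 != 0:
        raise ValueError("length must be a multiple of 4 bases")
    values = []
    for i in range(0, len(dna), 4):
        byte = 0
        for base in dna[i:i + 4]:
            # Shift in two bits per base; index() fails on non-ACGT input.
            byte = (byte << 2) | BASES.index(base)
        values.append(byte)
    return bytes(values)


encoded = bytes_to_dna(b"Hello")
print(encoded)
print(dna_to_bytes(encoded))
```

Any file could go through this unchanged, since every possible byte value has its own tetraplet, at the cost of one extra base per character compared with the triplet table.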
One of the best parts of this new storage
technology is that the information inserted in the DNA sequences of
hosts like the bacteria remains intact for about a hundred generations
and, possibly, more. The technological background to achieve this was
developed by God Laboratories, during the last million years, in a
trial-and-error study that produced efficient mechanisms to detect and
correct errors caused by random mutations in the DNA of living
organisms. The researchers of PNNL (Pacific Northwest blah blah) said:
"With the extremely efficient DNA repair
mechanisms associated with Deinococcus, we did not detect any mutations
in our experiment in which we retrieved the DNA after the bacteria that
carried the message was allowed to propagate for about a hundred
generations."
The storage potential of such technology
is also awesome: if we consider, the scientists say, that a litre of
liquid can contain up to 10^12 bacteria, it is clear that the
storage capacity is enormous. They do not say, though, how much
information each bacterium can hold, and not being a biologist I cannot
know the details, but even if each bacterium could hold just one single
bit, a litre of water could store 1 terabit, or about 125 gigabytes...
Who needs those damn 720 KB floppies anymore!
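For the curious, the back-of-the-envelope arithmetic can be checked with a few lines of Python. The one-bit-per-bacterium figure is only my pessimistic guess from above, not a number from the article, and I take a floppy as 720 x 1024 bytes.

```python
BACTERIA_PER_LITRE = 10**12   # figure quoted from the article
BITS_PER_BACTERIUM = 1        # pessimistic guess, not from the article
FLOPPY_BYTES = 720 * 1024     # one 720 KB floppy disk

total_bits = BACTERIA_PER_LITRE * BITS_PER_BACTERIUM
total_bytes = total_bits // 8
gigabytes = total_bytes / 10**9
floppies = total_bytes // FLOPPY_BYTES

print(f"{total_bits} bits = {total_bytes} bytes = {gigabytes} GB")
print(f"equivalent to {floppies} floppies")
```

So one litre at one bit per bacterium comes to 10^12 bits, which is one terabit or 125 gigabytes, not a full terabyte; a terabyte would need eight bits per bacterium.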
Even better is the fact that information stored within
the DNA sequences of living organisms does not need backups! Since the
organisms reproduce, they actually create backups themselves and spread
them around!
Potential Applications
Bibliography
"Organic Data Memory Using the DNA Approach", Communications of the ACM, Vol. 46, No. 1, January 2003