The ability to distinguish between individuals via their DNA and decode the human genome has revolutionized biology and medical science. Applications such as DNA evidence in court and testing for genes which increase the risk of cancer have already changed the face of their fields. With the knowledge of our genome sequence comes the potential for identifying the proteins which compose our very beings and shape the course of our lives. Only in the last 30 years has the technology become available for these sorts of advances to be possible. It was the Human Genome Project which spurred the refinement of the technology to meet the public project's goal of sequencing the genome within a decade, but it was the combination of advances in a number of fields which contributed to the technology's ultimate success. Of all of the advances biology has made in the past century, the ability to sequence DNA has been one of the most influential.
Although a DNA molecule has two strands, they are complementary, so only one strand needs to be decoded in order to know the sequence of bases on both. Likewise, each of the twenty-two autosomes has what should be an identical partner, and only one of the two needs to be examined. With about 3.1 billion bases in the 22 autosomes, two sex chromosomes and the mitochondrial DNA that are examined, the pairing saves a great deal of time and processing power. What is most surprising about this figure is that about 98.5% of it does not code for any genes and is left over from millions of years of evolution. While it seems that most of these sequences could be ignored, and there were numerous calls for sequencing only the "useful" sections of the genome, ultimately all 3.1 billion bases were decoded. (Micklos and Freyer 184)
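The reason one strand is enough can be shown in a few lines of Python. This is only a minimal sketch: the function name reverse_complement and the six-base example sequence are illustrative assumptions, not something taken from the cited text.

```python
# Watson-Crick base pairing: A pairs with T, and G pairs with C.
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}

def reverse_complement(strand: str) -> str:
    """Return the partner strand, written 5' to 3'."""
    # The partner strand runs in the opposite direction, so the bases are
    # complemented one by one and the order is reversed.
    return "".join(PAIRS[base] for base in reversed(strand.upper()))

known = "ATGCCG"  # hypothetical sequenced strand
partner = reverse_complement(known)
print(known, "->", partner)
```

Applying the function twice returns the original sequence, which is the sense in which the second strand carries no additional information.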
The first major innovation in DNA technology was the development of the Southern blot (Southern hybridization) by Edward Southern in 1975. With a Southern blot, a person's DNA can be "fingerprinted" and compared to the DNA of their parents or of a suspect in a crime for analysis. While this is not a DNA sequencing technique per se, it did allow different people's DNA sequences to be compared. In order to perform a Southern blot, a sample of DNA is taken, or a small sample is amplified with PCR. After the DNA is cut with one or more restriction enzymes, the fragments are separated by electrophoresis on a gel. This gel is treated with hydrochloric acid to make non-specific cuts in the DNA fragments and then with NaOH, which disrupts the hydrogen bonds to give single strands of DNA from the original double-stranded fragments. The DNA is then transferred from the gel to a nylon membrane, which is far sturdier than the gel and easier to work with, and probed using single-stranded DNA carrying a radioactive marker such as 32P or, in more recent times, a chemiluminescent marker. These probes are complementary to sequences that correspond to a polymorphic locus on one of the chromosomes. Every sequence in the sample DNA that corresponds to the probe will show up on the final "fingerprint", which is recorded on X-ray film, and will reveal the location and number of occurrences of the locus the probe corresponded to. The position of the band on the film is the same as its position on the original gel, enabling comparison between the different samples run in different wells. Generally, the blot is probed a number of times with different probes which correspond to polymorphic loci on other chromosomes to give as many points of comparison as possible. (Micklos and Freyer 163)
The technology for decoding DNA is a fairly recent development. The first two methods for deciphering the genetic code were developed in 1977 by two different groups using different approaches. Allan Maxam and Walter Gilbert at Harvard took the chemical cleavage approach, also known as chemical degradation. It can reportedly sequence DNA fragments up to about 500 nucleotides in length. Using a polynucleotide kinase, the DNA strand is labeled at the 5' end with 32P or some other radioactive marker. This labeled DNA is then divided into four identical portions, and each portion is exposed to a different reaction which cuts the DNA at G, at G and A, at C and T, or at C, under conditions in which each strand is cut only once. By running the resulting fragments on an electrophoresis gel, the locations of each of the bases in the strand can be determined. C and G can be read directly, while a band that appears in the G and A lane but not in the G lane indicates A, and a band that appears in the C and T lane but not in the C lane indicates T. The shorter fragments run farther, so the bases at the beginning of the sequence are at the bottom of the gel. (“Sequencing of DNA”)
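The lane-reading rule described above can be written out as a short Python sketch. It is a simplification under stated assumptions: band positions are given as whole numbers counted from the labeled 5' end, and the function name read_maxam_gilbert, the lane labels and the example bands are all hypothetical.

```python
def read_maxam_gilbert(lanes: dict) -> str:
    """Infer a sequence from band positions in the four cleavage lanes.

    `lanes` maps the lane labels "G", "A+G", "C+T" and "C" to the set of
    positions (1 = nearest the labeled 5' end) where a band appears.
    """
    longest = max(pos for bands in lanes.values() for pos in bands)
    sequence = []
    for pos in range(1, longest + 1):
        if pos in lanes["G"]:
            sequence.append("G")      # band in the G lane: G
        elif pos in lanes["A+G"]:
            sequence.append("A")      # in A+G but not in G: A
        elif pos in lanes["C"]:
            sequence.append("C")      # band in the C lane: C
        elif pos in lanes["C+T"]:
            sequence.append("T")      # in C+T but not in C: T
        else:
            sequence.append("N")      # no band seen: base unknown
    return "".join(sequence)

# Hypothetical gel for a six-base fragment.
example_lanes = {
    "G": {1},
    "A+G": {1, 2, 6},
    "C+T": {3, 4, 5},
    "C": {4, 5},
}
print(read_maxam_gilbert(example_lanes))
```

Checking the G and C lanes before the mixed lanes is what lets A and T be identified by elimination.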
Fred Sanger concurrently developed the enzymatic method (dideoxy method), which is the precursor to all of the methods used today. When a sample of DNA is exposed to the four deoxynucleotides (dNTPs), dATP, dTTP, dGTP and dCTP, dideoxynucleotides (didNTPs) and a primer in the presence of DNA polymerase and a cofactor such as Mg2+, a new DNA strand will be synthesized using the sample as a template. Four reaction mixtures are prepared, one for each nucleotide. Each contains the sample DNA, a primer whose sequence is complementary to the 3’ end of the target DNA, DNA polymerase, the four deoxynucleotides (with the nucleotide being sequenced in that tube radioactively marked) and the corresponding didNTP. In each mixture, the polymerase synthesizes DNA strands which are complementary to the sample DNA template. When a didNTP is added to the copied strand, synthesis stops. The result is that millions of copies of the original sample are made, all ending at different points in the sequence, and because of the sheer number of copies there is almost certain to be a strand which ends at each of the original nucleotides. Note that because only the didNTP for the nucleotide of interest is in each reaction tube, the fragments in the tube will only terminate at points in the structure where the complementary nucleotide is the one of interest. (“DNA Sequencing by the Enzymatic Method”)
Adding formamide to the reaction mixtures denatures the copied DNA fragments, and each reaction mixture is then run in a different lane of a polyacrylamide gel. As in the chemical cleavage method, the bands run varying distances depending on fragment size, and their placement on the final gel reflects the placement of that nucleotide in the complementary strand. Once again these gels can be read from bottom to top, originally by a person and later by optical scanners designed for the job. Improvements to this method of gel reading helped make the Human Genome Project possible, as human and even optical scanning of gels is time-consuming and, in the case of humans, rather error-prone. (Micklos and Freyer 197)
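The logic of the dideoxy method and its four-lane gel can be sketched in Python. This is an idealized model rather than a description of any real protocol: it assumes a fragment ends at every possible position, it measures fragment length from the end of the primer, and the names termination_lanes and read_gel and the example template are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def termination_lanes(template_3_to_5: str) -> dict:
    """For each base, list the fragment lengths that end with that base."""
    lanes = {"A": [], "C": [], "G": [], "T": []}
    # The polymerase reads the template 3' to 5' and builds the new strand
    # 5' to 3', adding the complement of each template base in turn.
    for length, template_base in enumerate(template_3_to_5, start=1):
        new_base = COMPLEMENT[template_base]
        lanes[new_base].append(length)
    return lanes

def read_gel(lanes: dict) -> str:
    """Read the new strand from the bottom of the gel (shortest band) upward."""
    bands = sorted(
        (length, base) for base, lengths in lanes.items() for length in lengths
    )
    return "".join(base for _, base in bands)

template = "CTAGGT"  # hypothetical template, written 3' to 5'
lanes = termination_lanes(template)
print(lanes)
print(read_gel(lanes))
```

The sequence that comes off the gel is that of the newly synthesized strand, which is the complement of the template rather than the template itself.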
The technique that made automated DNA sequencing a reality was the polymerase chain reaction (PCR), which was developed in 1985 by Kary Mullis. Originally, each fragment of DNA that was to be sequenced had to be transformed into a culture of bacteria or yeast in order to be purified and amplified. PCR provided a simple and time-efficient means of purifying and amplifying a DNA fragment up to 6000 base pairs in length using DNA polymerase. A pair of oligonucleotide DNA primers whose sequences correspond to the two 5’ ends of the target DNA sequence is synthesized first. These primers are added to a reaction tube containing the target DNA fragment, a DNA polymerase which can withstand high temperatures (such as Taq DNA polymerase), dNTPs, and Mg2+ (a cofactor necessary for efficient enzyme activity). The reaction mixture is then taken through a number of cycles, each consisting of three steps and taking only two minutes in total. The first of these steps is denaturation, where the sample is heated to 94°C in order to break the hydrogen bonds between the DNA strands and give single-stranded DNA. Next comes annealing at 65°C, where the primers anneal to their complementary single-stranded DNA fragments. Last is the extension step at 72°C, where DNA polymerase works optimally to extend the primers into copies of the target DNA sequence. In as little as an hour, the DNA sample can be amplified 1,000,000-fold. Without this technique, every DNA fragment to be sequenced would still be grown out in a bacterial or yeast culture, and the Human Genome Project would likely still be years from completion. (Micklos and Freyer 192-194)
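The amplification figure follows from repeated doubling, which a short Python sketch makes concrete. It assumes every cycle doubles the target perfectly, which real reactions do not achieve, and the function names ideal_copies and cycles_needed are hypothetical.

```python
import math

MINUTES_PER_CYCLE = 2  # cycle time quoted in the essay

def ideal_copies(cycles: int, start_copies: int = 1) -> int:
    """Copies after `cycles` rounds if each round doubles the target exactly."""
    return start_copies * 2 ** cycles

def cycles_needed(fold: float) -> int:
    """Smallest number of perfect doubling cycles that reaches `fold`."""
    return math.ceil(math.log2(fold))

cycles = cycles_needed(1_000_000)
print("cycles:", cycles)
print("copies per starting molecule:", ideal_copies(cycles))
print("minutes:", cycles * MINUTES_PER_CYCLE)
```

Under this ideal assumption a million-fold amplification takes fewer cycles than fit into an hour, which leaves a margin for the imperfect doubling seen in practice.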
It was Lee Hood and Lloyd Smith at the California Institute of Technology who came up with the brilliant idea of associating a dye color with each nucleotide in 1986. When a different fluorescent dye is added to the sequencing reaction for each base, it colors the fragments which end in the base in question. All four reactions can then be run in the same lane, with the color indicating which nucleotide is at that point in the DNA fragment. These gels can be read automatically with an argon laser, which causes each fragment to give off light of the characteristic frequency that corresponds to the color of its dye as the laser passes over it. (Micklos and Freyer 197) By running all four nucleotides in one lane of a gel, the other three lanes are freed up for three other DNA fragments, so the genome could be sequenced at four times the previous rate.
Today, a modified version of Sanger sequencing known as cycle sequencing is used in automated DNA sequencing. In cycle sequencing, only a small amount of the target DNA needs to be used. First, using PCR, the target DNA is amplified in four different tubes. Each tube contains the modified reaction mixture with its own fluorescently tagged didNTP, so that the necessary fragments of multiple sizes will be generated. These samples can then be combined, run on a gel and read by a laser, enabling the entire process to be automated. (Knight and Monroe) This automation enables current machines to sequence up to 400,000 nucleotides a day.
Cycle sequencing is still a time-consuming process. Even at 400,000 bases a day, one machine working 365 days a year would take more than 21 years to make a single pass over the roughly 3.1 billion bases of one person's genome, which leaves no room for error or machine maintenance. The solution to the speed problem is to have multiple automated machines and multiple teams work concurrently, decoding the parts of the genome which are identical in every human being, to ensure the fastest speed with the least error.
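The scale of the problem can be checked with simple division. The Python sketch below uses the genome size and daily output quoted in this essay; the function name years_to_sequence and its machines and passes parameters are hypothetical, and it assumes machines never stop and share the work evenly.

```python
GENOME_BASES = 3_100_000_000  # bases in one genome, as quoted in the essay
BASES_PER_DAY = 400_000       # output of one machine, as quoted in the essay

def years_to_sequence(machines: int = 1, passes: int = 1) -> float:
    """Years needed to read the genome `passes` times with `machines` machines."""
    total_bases = GENOME_BASES * passes
    days = total_bases / (BASES_PER_DAY * machines)
    return days / 365

for machines in (1, 10, 100):
    years = years_to_sequence(machines)
    print(machines, "machine(s):", round(years, 2), "years")
```

Reading each region more than once to catch errors multiplies the total in direct proportion, which is why adding machines matters so much.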
Recent machines which do DNA sequencing use gels which can hold up to 36 different samples at a time. According to Joe Balch of Lawrence Livermore National Laboratory of the University of California in 1997, "When cleaning, loading, and running times are all taken into account, it takes between five to seven hours to complete a run. Each sample contains about 500 bases, which means each run of 36 samples yields no more than 18,000 bases." Using current technology, there are a number of ways to increase the efficiency of DNA sequencing. One option is simply larger gels and machines, while another is raising the voltage applied to the gels so they run faster. (Balch) These methods have been applied to current technology and have successfully increased the machines' capacity, but ultimately new technology will be needed to make significant changes in the speed of DNA sequencing. Gels and machines can only be so large, and the time to clean up and change the samples also has a lower limit.
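The figures in the Balch quotation translate into a daily throughput with a little arithmetic. The Python sketch below assumes that runs follow one another around the clock with no gaps, which is an idealization; the function names bases_per_run and bases_per_day are hypothetical.

```python
SAMPLES_PER_RUN = 36    # samples per gel, from the quotation
BASES_PER_SAMPLE = 500  # approximate bases per sample, from the quotation

def bases_per_run() -> int:
    """Upper limit on the bases read in one gel run."""
    return SAMPLES_PER_RUN * BASES_PER_SAMPLE

def bases_per_day(hours_per_run: float) -> float:
    """Bases read in 24 hours if runs are back to back (an assumption)."""
    return bases_per_run() * 24 / hours_per_run

print("per run:", bases_per_run())
for hours in (5, 7):
    print(hours, "hour runs:", round(bases_per_day(hours)), "bases per day")
```

Because the run time sits in the denominator, shortening a run and enlarging a gel both raise throughput, which matches the two options described above.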
Ultimately, the goal of genetic sequencing is to increase the speed while decreasing the cost per nucleotide. While the human genome has been finished, as well as those of a few frequently used research animals, the rest of Earth's life remains, and the secrets contained within its genetic structure may be the key to curing diseases or preventing them before conception. Without the genome project to drive the development of technology, it seems that major refinements in the sequencing process will be less likely to occur in the foreseeable future. Even so, available technology has made it possible to test for a number of diseases and conditions with a genetic basis, and future applications may spur the development of sequencing technologies yet again. Thanks to the brilliant work of a few pioneers, we as a species now have the chance to try to decipher the proteins which can make or break our lives.
Balch, Joe. DNA Sequencing Machine. Lawrence Livermore National Laboratory of the University of California. 6 Dec. 2004 <http://www.llnl.gov/str/Balch.html>.
Current Sequencing Technologies. GeneticEngineering.org. 6 Dec. 2004 <http://www.geneticengineering.org/dna1/23.html>.
DNA Sequencing by the Enzymatic Method. Royal Veterinary College, University of London. 13 Nov. 2000. 6 Dec. 2004 <http://www.rvc.ac.uk/Extranet/DNA_1/6_Sequencing.htm>.
Knight, Ivor and Jonathan Monroe. Cycle Sequencing. James Madison University. 18 Aug. 1998. 12 Dec. 2004 <http://csm.jmu.edu/biology/courses/bio480_580/mblab/cycle.html>.
Micklos, David A. and Greg Freyer. DNA Science: A First Course. Cold Spring Harbor, New York: Cold Spring Harbor Laboratory Press, 2003.
Sequencing of DNA. German Human Genome Project. 6 Dec. 2004 <http://www.dhgp.de/intro/strategies/methoden01.html>.
Southern Blotting: Gel Transfer. AccessExcellence.org. 6 Dec. 2004 <http://www.accessexcellence.org/RC/VL/GG/southBlotg.html>.