Shotgun Sequencing


Shotgun sequencing is a method of whole-genome sequencing that is particularly suited to high-throughput, assembly-line style methodology, allowing the entire genomes of organisms to be sequenced relatively rapidly. It is much quicker than map-based techniques such as chromosome walking (the public International Human Genome Sequencing Consortium relied on a map-based, clone-by-clone strategy). The method made the news when Celera Genomics used shotgun sequencing to complete its draft sequence of the three billion bases of the human genome in only three years, although there were claims that Celera had used publicly available genome data to construct its pay-to-view, annotated sequence.


How Shotgun Sequencing Works


In a shotgun sequencing experiment, the first step is to create several genomic libraries. A genomic library is a collection of clones, usually bacterial colonies carrying plasmids or cosmids (or yeast carrying YACs), each of which contains a small piece of the genome being sequenced. Together, the clones contain the entire genome. This step is necessary because automated Sanger sequencing, the preferred method, can read only several hundred base pairs (around a thousand at best) in a single run, after which accuracy drops off dramatically. It would be nice if we could start at the beginning of the genome and read until the end, like a book, but for now we must rely on these techniques.
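Because each read covers only a few hundred bases, the genome has to be sampled many times over. The back-of-the-envelope sketch below assumes a hypothetical 500-base read length and the idealized Lander-Waterman model, in which reads start at uniformly random positions, so that at c-fold coverage the fraction of bases never read is about e^(-c). The function names are illustrative only.

```python
import math

def reads_needed(genome_len: int, read_len: int, coverage: float) -> int:
    """Reads required so that total bases read = coverage x genome length."""
    return math.ceil(coverage * genome_len / read_len)

def fraction_unread(coverage: float) -> float:
    """Idealized Lander-Waterman (Poisson) estimate of the fraction of bases never read."""
    return math.exp(-coverage)

GENOME_LEN = 3_000_000_000   # human genome, roughly three billion bases
READ_LEN = 500               # hypothetical typical Sanger read length

for c in (1, 4, 8):
    print(f"{c}x: {reads_needed(GENOME_LEN, READ_LEN, c):,} reads, "
          f"{fraction_unread(c):.4%} of bases unread")
```

Under these assumptions, even 8-fold coverage (48 million reads) would leave roughly 0.03% of a three-billion-base genome, about a million bases, unread in scattered gaps.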

To create these libraries, the genomic DNA of the desired organism is cut into thousands (for a large genome, millions) of smaller pieces, either with restriction enzymes or by random mechanical shearing. The pieces are then cloned into bacterial plasmids and transformed into bacteria (typically E. coli). The experimenter now possesses thousands of distinct bacterial colonies, each carrying a small piece of the genome. Several such libraries are made, for example with different restriction enzymes, so that fragments from one library span the break points of another and together they give overlapping coverage of the genome.
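A toy sketch of why libraries cut with different enzymes overlap: it digests one made-up 18-base sequence with EcoRI (G^AATTC) and BamHI (G^GATCC) and checks that each library's fragments span the other library's cut sites. It looks only at the top strand and ignores sticky ends and the complementary strand; the helper names `digest`, `spans`, `cut_points`, and `bridges_all` are hypothetical.

```python
# (recognition site, cut offset on the top strand)
ECORI = ("GAATTC", 1)   # EcoRI cuts G^AATTC
BAMHI = ("GGATCC", 1)   # BamHI cuts G^GATCC

def digest(dna: str, enzyme: tuple[str, int]) -> list[str]:
    """Cut dna at every occurrence of the enzyme's site (top strand only)."""
    site, offset = enzyme
    fragments, start = [], 0
    pos = dna.find(site)
    while pos != -1:
        cut = pos + offset
        fragments.append(dna[start:cut])
        start = cut
        pos = dna.find(site, pos + 1)
    fragments.append(dna[start:])
    return fragments

def spans(fragments: list[str]) -> list[tuple[int, int]]:
    """(start, end) coordinates of each fragment in the original sequence."""
    out, start = [], 0
    for frag in fragments:
        out.append((start, start + len(frag)))
        start += len(frag)
    return out

def cut_points(fragments: list[str]) -> list[int]:
    """Positions where the sequence was cut."""
    return [end for _, end in spans(fragments)[:-1]]

def bridges_all(cuts: list[int], fragments: list[str]) -> bool:
    """True if every cut falls strictly inside some fragment of the other library."""
    return all(any(s < c < e for s, e in spans(fragments)) for c in cuts)

dna = "AAGAATTCTTGGATCCAA"        # made-up toy sequence
library_a = digest(dna, ECORI)    # ['AAG', 'AATTCTTGGATCCAA']
library_b = digest(dna, BAMHI)    # ['AAGAATTCTTG', 'GATCCAA']
print(bridges_all(cut_points(library_a), library_b))  # True
```

The EcoRI break at position 3 lies inside the first BamHI fragment, and the BamHI break at position 11 lies inside the second EcoRI fragment, so neither break leaves an unbridged gap.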


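Next, the clones in each library are sequenced individually and catalogued, usually by robotic equipment; each sequencing run reads a few hundred bases from one end of a cloned fragment. When all the sequencing is finished, the result is a database containing thousands (for a large genome, millions) of sequence reads. Assembly software running on high-powered computers compares the reads, finds overlaps between them, and merges overlapping reads into longer stretches of contiguous sequence, each called a contig, which may span thousands of base pairs. The contigs are then ordered and oriented relative to one another by various methods, such as pairing reads taken from the two ends of the same clone or comparing against genetic and physical maps, to produce a draft sequence.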
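The overlap-and-merge step can be illustrated with greedy assembly, a simple textbook strategy that is not necessarily what Celera or any production assembler uses: repeatedly merge the pair of reads with the longest suffix/prefix overlap until no overlaps remain. This sketch assumes error-free reads, no read contained inside another, and a hypothetical minimum overlap of 3 bases; the reads are made up.

```python
from itertools import permutations

def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of a that equals a prefix of b, if >= min_len; else 0."""
    start = 0
    while True:
        start = a.find(b[:min_len], start)   # candidate start of the overlap in a
        if start == -1:
            return 0
        if b.startswith(a[start:]):          # the rest of a matches b's beginning
            return len(a) - start
        start += 1

def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the two reads with the longest overlap; return the contigs."""
    contigs = list(dict.fromkeys(reads))     # drop exact duplicate reads
    while True:
        best_len, best_pair = 0, None
        for a, b in permutations(contigs, 2):
            olen = overlap(a, b, min_len)
            if olen > best_len:
                best_len, best_pair = olen, (a, b)
        if best_pair is None:                # no overlaps left: each string is a contig
            return contigs
        a, b = best_pair
        contigs.remove(a)
        contigs.remove(b)
        contigs.append(a + b[best_len:])     # a followed by b's non-overlapping tail

reads = ["ACCGTA", "GTAGTT", "GTTCGG", "CGGA"]   # made-up error-free reads
print(greedy_assemble(reads))  # ['ACCGTAGTTCGGA']
```

Greedy merging succeeds here because every 3-base word in the toy sequence occurs only once; when the same sequence occurs in several places, it can join reads in the wrong order.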

The name comes from this approach of breaking the genome into thousands of random fragments and sequencing each one separately, which is evocative of the scatter of pellets from a shotgun shell.


Problems with Shotgun Sequencing


A serious problem with shotgun sequencing is the prevalence of repeat DNA. A significant proportion of eukaryotic genomes (nearly half of the human genome, for example) is composed of repeated ‘junk’ DNA. These repeats range from tens to thousands of base pairs in length, and the number of copies may also vary widely. When a stretch of repeat DNA made of identical subunits is longer than a single read, the overlaps between reads cannot reveal how many copies of the subunit it contains. In addition, many repeats are widely distributed throughout the genome, so a read that falls entirely within a repeat could have come from any of its copies, further complicating attempts to nail down their location and extent. These problems can be partly worked around by more traditional genetic methods, such as recombinant analysis.
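A toy sketch of the copy-number problem, assuming error-free reads and comparing only the set of distinct reads (which reads exist and overlap, not how often each was sampled). Two made-up genomes differ only in carrying three versus five copies of a 10-base repeat unit; with 12-base reads, much shorter than the repeat region, every possible read from one genome also occurs in the other. The sequences and the `distinct_reads` helper are hypothetical.

```python
def distinct_reads(genome: str, read_len: int) -> set[str]:
    """Every error-free read of length read_len that could be sampled from genome."""
    return {genome[i:i + read_len] for i in range(len(genome) - read_len + 1)}

LEFT, RIGHT = "TTGCCAGTCA", "CCTAGGAATC"   # made-up unique flanking sequence
UNIT = "GATTACAGGC"                         # made-up 10-base repeat unit

genome_three = LEFT + UNIT * 3 + RIGHT      # 3 copies of the repeat unit
genome_five = LEFT + UNIT * 5 + RIGHT       # 5 copies of the repeat unit

same = distinct_reads(genome_three, 12) == distinct_reads(genome_five, 12)
print(same)  # True: the read sets cannot tell 3 copies from 5
```

In this toy model, only reads long enough to span the whole repeat plus unique flanking sequence on both sides could tell the two genomes apart.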
