For fifteen years, gene expression analysis has been speeding up at a rate almost as steep as Moore's Law. When Affymetrix's first publicly available microarray went on sale in mid-1997, this method of research took another leap skyward. That first GeneChip, and all of its successors, combine microphotography/microlithography techniques and biological chemistry into a few centimeters of quartz that can analyze up to an entire genome. Combined with the company's chemistry equipment and scanner, a sample can go from biopsied material to bioinformatic data in roughly 24 hours. Just so this won't sound like a press release, the GeneChips themselves start at a grant-breaking US $300 apiece (with custom arrays starting at $4000 a pop, plus research costs) and a scanner to read them costs in the neighborhood of 150 grand.

Overview of gene expression analysis:

Gene expression analysis aims to quantify the level of expression of some genes given the introduction of a stimulus, drug, hormone, or whatever. In a usual experiment, both a control group and a group exposed to the drug have some of their biological material analyzed side-by-side. This way, a comparison can be made to discern what effects that drug has had on the experimental group's gene expression. With this knowledge, the drug can be reformulated, or a new drug created entirely, to hopefully avoid activation of genes which cause nasty side-effects.
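
To make the control-versus-treated comparison concrete, here is a minimal Python sketch that reports each gene's change as a log2 ratio, a common way of expressing this kind of result. The gene names, the intensity values and the `floor` parameter are all made up for illustration; none of them come from Affymetrix's software.

```python
import math

def log2_fold_change(control, treated, floor=1.0):
    """Return log2(treated / control) for every gene present in both samples.

    Intensities are clamped to `floor` (a hypothetical noise floor) so that
    a zero reading cannot cause a division by zero or a log of zero.
    """
    changes = {}
    for gene in control.keys() & treated.keys():
        c = max(control[gene], floor)
        t = max(treated[gene], floor)
        changes[gene] = math.log2(t / c)
    return changes

# Invented intensities for three invented genes.
control = {"geneA": 200.0, "geneB": 1000.0, "geneC": 50.0}
treated = {"geneA": 800.0, "geneB": 1000.0, "geneC": 25.0}

changes = log2_fold_change(control, treated)
for gene in sorted(changes):
    print(gene, round(changes[gene], 2))
```

A positive value means the gene is more active in the treated group, a negative value means it is less active, and zero means no change.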

One way of measuring a gene's expression, the one which the GeneChip and other array methods use, is to convert the sample's mRNA, which has been made directly from various genes (after splicing and polyadenylation), back into DNA. These methods involve the reverse transcriptase enzyme, which reads an mRNA strand and generates its complementary DNA, or cDNA. From there, most methods go on to build cRNA from the cDNA in vitro, using nucleotides that have been tagged with a fluorescent or radioactive marker. After that step, it's mostly a question of measuring the amount of this tagged cRNA, and that's where the GeneChip comes in.
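
The base-pairing rule that reverse transcriptase follows is simple enough to sketch in a few lines of Python. This is only an illustration of complementarity (A pairs with T, U with A, G with C, C with G), not a model of the enzyme; the function name and the example sequence are invented.

```python
# Which DNA base pairs with each RNA base.
RNA_TO_DNA = {"A": "T", "U": "A", "G": "C", "C": "G"}

def first_strand_cdna(mrna):
    """Return the DNA strand complementary to `mrna`, written 5' to 3'.

    The two strands of a duplex run in opposite directions, so the
    complement is built from the reversed mRNA sequence.
    """
    return "".join(RNA_TO_DNA[base] for base in reversed(mrna.upper()))

print(first_strand_cdna("AUGGCC"))  # GGCCAT
```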

How the GeneChip works:

On each 1.28 cm² chip, there are roughly 500,000 probes, each of which has millions of identical captive DNA strands. Each strand is an oligonucleotide 25 nucleotides long, and is selected to complement a stretch of a given gene. For each gene that needs to be measured, between 11 and 16 probes are chosen from all of the possible 25-mers, based on how well they will combine with cRNA strands and on lack of potential cross-matching between them. As a side note, the computational complexity of choosing these probes for, say, a 25,000-gene rat genome is pretty shocking -- there's little doubt that Affymetrix must have some serious number crunching machinery.
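
To get a feel for the probe selection problem, here is a toy Python sketch. It lists every 25-mer in a target sequence and keeps the first few that do not appear verbatim in any other sequence. This is emphatically not Affymetrix's algorithm, which is not described here and presumably also weighs binding strength and near-matches; the function names and the exact-match criterion are assumptions made for illustration.

```python
def kmers(seq, k=25):
    """Return every length-k window of seq, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

def unique_probes(target, others, k=25, max_probes=16):
    """Pick up to max_probes k-mers of `target` found in none of `others`.

    Only exact duplicates are rejected; a real selection scheme would
    also have to worry about near-matches and hybridization quality.
    """
    seen_elsewhere = set()
    for seq in others:
        seen_elsewhere.update(kmers(seq, k))
    picked = []
    for candidate in kmers(target, k):
        if candidate not in seen_elsewhere and candidate not in picked:
            picked.append(candidate)
        if len(picked) == max_probes:
            break
    return picked

# Small k so the example is readable; the chip uses k = 25.
print(unique_probes("ACGTACGGTT", ["ACGTAC"], k=4, max_probes=3))
```

Even this naive version hints at the scale: a 1,000-base transcript has 976 candidate 25-mers, and each one has to be checked against every other gene on the chip.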

In a process known as hybridization, fluorescent-tagged cRNA is "cooked" with the GeneChip's complementary DNA in an oven sold as part of the processing package. Hybridization happens overnight, I'm guessing about twelve to fourteen hours of time at 80°C. During the process, each probe picks up much of the ambient cRNA that matches it and bonds into a new double helix, one strand of which happens to be fluorescent. After this step, and a chemical fixing step not conceptually unlike that used to process film, the GeneChip is ready to be scanned and read from. I couldn't find any specific data, but something a professor of mine mentioned (about losing some of his data to a power outage every three or four months) leads me to believe that, once hybridized and fixed, a GeneChip must be kept below a certain temperature so it will not denature and lose its contents.

After this is done, the GeneChip must be scanned to get the experimental data off of it and into a usable form. As of this writing, two scanners are available, one built and branded by HP and the other built by HP and branded Affymetrix. The former is the older of the two, has lower resolution and needs an external laser feed which must be cooled and hand-tuned every few scans. Both use green (570 nm) laser light to fluoresce the markers, though in the latter scanner the light is generated by an internal laser diode. A scan takes between three and five minutes, and has 16 bits of dynamic range.

Affymetrix software takes the data file generated by the scanner, which is essentially an enormous high-res image of half a million little green dots, and generates data in a flat file from it. For each gene, statistical analysis is done and its relative amount of activation is uncovered. The system also supports running experimental and control cRNA on one chip, where each sample has markers that fluoresce at a different wavelength. In this way any differences between experimental and control subjects can be immediately identified and grouped together, without a Perl script acting as intermediary. Plus, running this way cuts grant expenditure firmly in half :-)

Notably, each probe on the GeneChip also has a near-identical partner probe whose strands have the middle nucleotide, the thirteenth on the strand, changed. This is referred to as the Perfect Match / Mismatch probe strategy, and accounts for any cross-hybridization problems not taken care of in probe selection. In cases where it's not known whether the mRNA is from sense or anti-sense DNA, both probes are generated and used -- with 500,000 probes available and most genomes having well under 40,000 genes, there's some room for error.
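
Here is a small Python sketch of the Perfect Match / Mismatch idea. It builds a mismatch probe by altering the thirteenth base, then averages the PM minus MM intensity differences across a gene's probe pairs. Two assumptions to flag: swapping the middle base for its complement is one plausible reading of "changed", and the simple average difference is a stand-in for whatever statistics the real software runs. The sequence and intensities are invented.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def mismatch_probe(perfect_match):
    """Return the 25-mer with its 13th base (index 12) swapped.

    Assumption: the swap is to the complementary base.
    """
    if len(perfect_match) != 25:
        raise ValueError("expected a 25-mer")
    middle = 12
    return (perfect_match[:middle]
            + COMPLEMENT[perfect_match[middle]]
            + perfect_match[middle + 1:])

def average_difference(pm_intensities, mm_intensities):
    """Mean of (PM - MM) over a gene's probe pairs."""
    diffs = [pm - mm for pm, mm in zip(pm_intensities, mm_intensities)]
    return sum(diffs) / len(diffs)

pm = "ACGTACGTACGTACGTACGTACGTA"
mm = mismatch_probe(pm)
print(pm)
print(mm)
print(average_difference([100.0, 200.0], [40.0, 60.0]))  # 100.0
```

The idea is that signal showing up on the mismatch probe is mostly stray cross-hybridization, so subtracting it leaves a cleaner estimate of the real thing.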

Manufacturing and design:

(This is probably boredom central to the biology heads out there, but is pretty damn interesting when compared with the remarkably similar methods used to make the silicon in processors and whatnot.)

After a probe set has been designed in software, it needs to get on to the GeneChip to be of any use. Traditional gene expression analysis brews up the cDNA and spots it onto a membrane by hand, or onto a glass slide mechanically. To fit an entire genome, with redundancy, in such a small space, Affymetrix chose to use photolithographic techniques borrowed from processor manufacture. That is, instead of dripping anything anywhere, they build the cDNA layer by chemically doped layer, much the same way pathways and semiconductors are etched into a slab of silicon valley art glass.

GeneChip probe arrays begin life as a thin wafer of quartz, notably made of SiO2 rather than the unoxidized Si used in computer chips. This is washed in silane, which reacts with it to form the base layer of what will become billions of individual strands. Also, light-sensitive linker molecules are attached just after this step, such that when exposed to light they will allow another linker molecule to bond to them. To begin the actual cDNA synthesis, a mask is placed over the wafer with holes cut where the arbitrary first nucleotide (say, adenine) should attach to a probe. Ultraviolet light is shone through the mask, and then the wafer is submerged in a solution containing adenine nucleotides with appropriate linker molecules bonded to both sides. These attach to the exposed linker molecules, and bingo, a single nucleotide strand of cDNA. The process continues with all of the other nucleotides, and is repeated with different masks until all 25-mer strands have been generated.

This would seem to need 4 nucleotide types * 25 nucleotides long = 100 steps of exposure to complete, but Affymetrix has crunched some more data and lowered that number. Algorithms are used during probe selection to keep probe growth rates roughly matched during this step, and to figure out when masks can be used twice for the same or different nucleotides (!).
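
The 100-step ceiling, and the reason a real probe set can come in under it, can be seen with a toy simulation in Python. It cycles through the four nucleotides in a fixed order, extends every probe that needs the current nucleotide next, and counts only the exposures where at least one probe actually grew. The fixed A, C, G, T order and the function name are assumptions for illustration; this is not Affymetrix's scheduling algorithm, which evidently does smarter things such as reusing masks.

```python
def synthesis_steps(probes, order="ACGT"):
    """Count mask exposures needed to build all probes, one base per step.

    Nucleotides are offered in a repeating fixed `order`. An exposure is
    counted only if at least one probe is extended by it.
    """
    if any(set(p) - set(order) for p in probes):
        raise ValueError("probes may only contain bases from `order`")
    progress = [0] * len(probes)   # bases already attached per probe
    exposures = 0
    position = 0                   # where we are in the repeating cycle
    while any(done < len(p) for done, p in zip(progress, probes)):
        base = order[position % len(order)]
        position += 1
        extended = False
        for i, p in enumerate(probes):
            if progress[i] < len(p) and p[progress[i]] == base:
                progress[i] += 1
                extended = True
        if extended:
            exposures += 1
    return exposures

print(synthesis_steps(["ACGT", "AACC"]))  # 7, against a ceiling of 4 * 4 = 16
```

Every full cycle extends each unfinished probe by at least one base, so the count can never exceed four times the probe length, which is where the 100 comes from for 25-mers. Whenever a nucleotide is needed by no probe at some point in the cycle, that exposure can simply be skipped.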