"PubGene" is the result of a study into whether data on gene-gene interactions could be "mined" from gene names found in journal article abstracts in the Medline literature database. The basic idea is that if two or more genes are referred to in the same article (or actually in the abstract/keywords), there is a high probability they have a meaningful biological interaction. The database of connections produced was tested against several experimental data sets: it correctly predicted 51% of the interactions contained in the Database of Interacting Proteins and 45% of the interactions contained in Online Mendelian Inheritance in Man. While this may not seem very reliable in practical terms, statistically it's very noteworthy. The PubGene data was also compared to gene expression data from DNA microarray experiments, and correctly predicted a number of gene interactions from different human cell types.

The authors of the study openly admit that PubGene currently has limited practical use. Among the current stumbling blocks that produce errors are inconsistent gene nomenclature (the same name can refer to different genes, and different names can refer to the same gene) and the fact that not all Medline abstracts and keyword lists contain gene names.
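
The nomenclature problem can be made concrete with a small Python sketch. It is an assumption-laden illustration, not the authors' method: the helpers build_alias_index and resolve, the placeholder symbols and the alias table are all hypothetical. Different names for the same gene are mapped to one symbol, while a name shared by several genes is flagged as ambiguous rather than guessed.

```python
def build_alias_index(synonyms):
    """Map each lower-cased name to the set of gene symbols it may denote.

    `synonyms` is a hypothetical table: official symbol -> list of aliases.
    """
    index = {}
    for symbol, aliases in synonyms.items():
        for name in {symbol, *aliases}:
            index.setdefault(name.lower(), set()).add(symbol)
    return index


def resolve(name, index):
    """Return the single matching symbol, or None if unknown or ambiguous."""
    candidates = index.get(name.lower(), set())
    if len(candidates) == 1:
        return next(iter(candidates))
    return None


# Invented alias table: "p99" is deliberately shared by two genes.
synonyms = {
    "AAA1": ["alpha1", "p99"],
    "BBB2": ["beta2", "p99"],
}
index = build_alias_index(synonyms)

print(resolve("alpha1", index))  # AAA1
print(resolve("p99", index))     # None
```

Dropping ambiguous names, as this sketch does, trades missed connections for fewer false ones; keeping them would do the opposite.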

Sources:
Masys, D.R. (2001) Linking microarray data to the literature. Nature Genetics 28(1): 9-10.
Jenssen, T.-K. et al. (2001) A literature network of human genes for high-throughput analysis of gene expression. Nature Genetics 28(1): 21-28.