of a short segment
(or its reverse complement), usually towards its 3' end.
ESTs can be produced rapidly and in bulk, because each one is read (sequenced) only once.
An EST is around 400 bp long, and in the absence of alternative splicing
it is supposed to contain enough information to identify
the RNA molecule from which it originated.
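To make "identify the RNA molecule from which it originated" a bit more concrete, here is a minimal Python sketch. It assumes we already have a set of candidate transcript sequences (the function names, the dictionary of transcripts, and k = 12 are all hypothetical choices for illustration). It counts shared k-mers on both strands, because the EST may be the reverse complement of the RNA. Real identification would use proper alignment rather than this crude counting.

```python
from typing import Optional


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string (A<->T, C<->G; N stays N)."""
    table = str.maketrans("ACGTNacgtn", "TGCANtgcan")
    return seq.translate(table)[::-1]


def kmers(seq: str, k: int) -> set[str]:
    """All overlapping length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def best_match(est: str, transcripts: dict[str, str],
               k: int = 12) -> tuple[Optional[str], str, int]:
    """Transcript sharing the most k-mers with the EST, trying both strands.

    Returns (name or None, strand "+" or "-", number of shared k-mers).
    The "-" strand means the EST matches the transcript's reverse complement.
    """
    est = est.upper()
    strands = (("+", kmers(est, k)), ("-", kmers(reverse_complement(est), k)))
    best: tuple[Optional[str], str, int] = (None, "+", 0)
    for name, seq in transcripts.items():
        target = kmers(seq.upper(), k)
        for strand, query in strands:
            shared = len(query & target)
            if shared > best[2]:
                best = (name, strand, shared)
    return best
```

For example, taking a 30-base slice of a made-up transcript, reverse-complementing it, and passing it to `best_match` should report that transcript on the "-" strand. Note that this only works when splicing does not get in the way, which is exactly the caveat above.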
A public database of ESTs, dbEST, contains more than 2 × 10^6 ESTs.
Problems with this naïve approach:
- Alternative splicing is more common than was thought.
- Sequence quality deteriorates along the EST; sometimes only a few of its bases are reliable.
- It is not known how many types of RNA exist.
- ESTs may also come from hnRNA (instead of the intended mRNA), and may well be from a non-coding region.
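Because quality drops along a single-pass read, a common first step is to cut off the unreliable tail. Below is a minimal Python sketch, assuming Phred-style per-base quality scores and a hypothetical cutoff of 20. It uses a simple prefix-scoring heuristic: each base scores (quality - threshold), and we keep the prefix with the highest cumulative score. This is one reasonable heuristic, not necessarily what any particular EST pipeline does.

```python
def trim_3prime(seq: str, quals: list[int], threshold: int = 20) -> str:
    """Trim the low-quality tail of a single-pass read.

    Each base contributes (quality - threshold) to a running score; the
    prefix with the highest running score is kept. A short dip in quality
    inside a good region is kept if good bases follow it, while isolated
    good bases inside a bad tail usually do not rescue that tail.
    """
    if len(seq) != len(quals):
        raise ValueError("sequence and quality lists must have equal length")
    best_len, best_score, running = 0, 0, 0
    for i, q in enumerate(quals):
        running += q - threshold
        if running > best_score:
            best_score, best_len = running, i + 1
    return seq[:best_len]


print(trim_3prime("ACGTACGTAC", [30, 30, 30, 30, 30, 30, 10, 5, 5, 2]))  # ACGTAC
```

If every base is below the threshold, the function returns an empty string, which matches the "sometimes the dependable bases are few" case taken to its extreme.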
At work, I am on Compugen's LEADS project, which aims (among other things) to reconstruct the original RNA from given ESTs. This is nontrivial.
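To see why reconstruction is nontrivial, it helps to look at the textbook toy version: greedy overlap merging. The Python sketch below is *not* how LEADS works. It is a generic illustration that assumes error-free reads, all on the same strand, with no alternative splicing, and it uses a hypothetical minimum overlap of 5 bases. It repeatedly merges the pair of reads with the longest exact suffix/prefix overlap.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of a equal to a prefix of b (>= min_len), else 0."""
    for length in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:length]):
            return length
    return 0


def greedy_assemble(reads: list[str], min_len: int = 5) -> list[str]:
    """Greedily merge reads by largest exact overlap; return the resulting contigs."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        # Drop any contig wholly contained in another one.
        contigs = [r for r in contigs if not any(r != o and r in o for o in contigs)]
        best = (0, -1, -1)  # (overlap length, left index, right index)
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best[0]:
                        best = (olen, i, j)
        olen, i, j = best
        if olen == 0:
            return contigs
        merged = contigs[i] + contigs[j][olen:]
        contigs = [c for k, c in enumerate(contigs) if k not in (i, j)] + [merged]


reads = ["ATGGCGTACGTT", "CGTACGTTAGCCTAG", "TTAGCCTAGGCTAACG"]
print(greedy_assemble(reads))  # ['ATGGCGTACGTTAGCCTAGGCTAACG']
```

Each of the problems listed above breaks an assumption of this sketch. Sequencing errors destroy exact overlaps, reverse-complement reads need both strands considered, and alternative splicing means two ESTs can share a segment while coming from different RNAs, so naive merging would produce chimeras.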