A category of computer-aided human translation (CAHT or CAT) software. In its most basic form, a database application which stores previously translated strings in the source and target languages, parses new source documents, and retrieves the parts which have been translated previously.
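The basic store-and-retrieve idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how any particular product works: the class name `TranslationMemory` and its methods are hypothetical, matching is exact apart from whitespace, and the "database" is a plain dictionary.

```python
class TranslationMemory:
    """Toy in-memory TM: maps source segments to stored translations."""

    def __init__(self):
        self._store = {}

    @staticmethod
    def _normalise(text):
        # Collapse runs of whitespace so trivial layout differences
        # do not defeat an otherwise exact match.
        return " ".join(text.split())

    def add(self, source, target):
        """Store a translated segment pair."""
        self._store[self._normalise(source)] = target

    def lookup(self, source):
        """Return the stored translation, or None if the segment is new."""
        return self._store.get(self._normalise(source))

    def pretranslate(self, segments):
        """Pair each segment of a new document with its stored translation.

        Segments with no match are paired with None and are left
        for the translator.
        """
        return [(seg, self.lookup(seg)) for seg in segments]
```

A real program would persist the pairs on disk and record metadata (languages, dates, author) alongside each one.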
Most fully-fledged TM programs segment the source text into individual sentences (identifying the delimiters is obviously important, and varies between languages) and offer glossary management and concordancing facilities. They also provide a working environment for translating the remaining text, with automatic propagation of internally repeated text, and facilities to align existing translations to generate memory databases. For obvious commercial reasons, most work with at least Microsoft Word and HTML files as well as plain text, allowing source document formatting to be replicated painlessly, at least as long as it was done competently in the first place.
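Segmentation and the propagation of internal repetitions might look roughly like this in Python. It is a rough sketch: the delimiter rule (a full stop, exclamation mark or question mark followed by whitespace) is a simplifying assumption which will mis-split abbreviations such as "e.g." and does not suit every language, and the function names are invented for the example.

```python
import re

# Assumption: a sentence ends at '.', '!' or '?' followed by whitespace.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def segment(text):
    """Split a text into sentence-like segments."""
    return [s for s in _SENTENCE_END.split(text.strip()) if s]


def translate_with_propagation(segments, translate):
    """Translate each distinct segment once and reuse the result.

    `translate` stands in for the human translator (or any callable);
    a segment repeated within the document is filled in automatically.
    """
    done = {}
    output = []
    for seg in segments:
        if seg not in done:
            done[seg] = translate(seg)
        output.append(done[seg])
    return output
```

With the input "Open the lid. Close the lid! Open the lid." the translator is asked about only two segments; the third is propagated from the first.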
Clearly the suitability of these tools varies with different types of translation work. They are best for documents such as technical specifications and manuals where many similar texts exist, but are also dead handy for bureaucratic, legal and paralegal documents where boilerplate text is used, and other documents which may be subject to multiple revisions.
Their main drawbacks are a tendency to push the translator into mimicking the form (particularly the sentence structure) of the source text, and the facility with which they cause errors to proliferate, particularly if they are let loose in unskilled hands. And, fundamentally, the best translation of a given sentence is not the same in all circumstances.
Leading programs (almost exclusively for Windoze; the translation business is generally tied to the customer's choice of file formats, unfortunately):
- Trados: heavy Microsoft investment; uses Word as an interface mechanism. De gustibus non est disputandum.
- Déjà Vu: plain database grid interface, best choice of source file formats, best tech support, somewhat quirky; an eternally forthcoming release, DVX, promises substantial improvements, including the ability to run under WINE.
- IBM Translation Manager/2
- Star Deluxe
- Cypresoft TransSuite 2000
- Wordfast: a Word-based macro package, also usable on a Mac.
- OmegaT: an open source Java program, probably the best bet at the time of this update (December 2002) for Linux, although it does not yet handle the ubiquitous Word format directly.
An open standard - TMX, Translation Memory eXchange - has been developed for exchanging the valuable memory databases between several of the above programs' different formats.
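TMX is an XML format in which each translation unit (`tu`) holds one variant (`tuv`) per language, with the text in a `seg` element. The Python sketch below reads source/target pairs from such a file using only the standard library. The sample document is hand-written for illustration and has not been validated against the official DTD, and the function name `read_tmx` is invented for the example.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under this qualified name.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hand-written sample for illustration only.
SAMPLE_TMX = """<?xml version="1.0"?>
<tmx version="1.4">
  <header creationtool="example" creationtoolversion="0" segtype="sentence"
          o-tmf="none" adminlang="en" srclang="en" datatype="plaintext"/>
  <body>
    <tu>
      <tuv xml:lang="en"><seg>Press the red button.</seg></tuv>
      <tuv xml:lang="fr"><seg>Appuyez sur le bouton rouge.</seg></tuv>
    </tu>
  </body>
</tmx>"""


def read_tmx(xml_text, src, tgt):
    """Return (source, target) pairs for the two requested languages."""
    root = ET.fromstring(xml_text)
    pairs = []
    for tu in root.iter("tu"):
        by_lang = {}
        for tuv in tu.findall("tuv"):
            # Older files may use a plain 'lang' attribute instead.
            lang = tuv.get(XML_LANG) or tuv.get("lang")
            seg = tuv.find("seg")
            if lang and seg is not None:
                # itertext() also collects text inside inline markup.
                by_lang[lang.lower()] = "".join(seg.itertext())
        if src in by_lang and tgt in by_lang:
            pairs.append((by_lang[src], by_lang[tgt]))
    return pairs
```

Calling `read_tmx(SAMPLE_TMX, "en", "fr")` yields the single English/French pair in the sample. Inline formatting tags inside `seg` are simply flattened to text here, which a real importer would not want to do.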
Although these programs are very distinct from Machine Translation, there are trends in corpus-based MT (example-based MT) which seem likely to cross over to some extent, while TM programs increasingly offer MT-like use of smaller text units and fuzzy matches.
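A fuzzy match can be approximated with nothing more than Python's standard `difflib` module, as in the sketch below. The 0.75 threshold and the character-level similarity ratio are assumptions made for the example; commercial programs use their own scoring methods, which are not described here.

```python
from difflib import SequenceMatcher


def best_fuzzy_match(segment, memory, threshold=0.75):
    """Return (source, target, score) for the closest stored segment.

    `memory` maps stored source segments to their translations.
    Returns None if nothing reaches the threshold (an assumed cut-off).
    """
    best = None
    best_score = threshold
    for source, target in memory.items():
        # ratio() is 1.0 for identical strings and falls towards 0.0
        # as the strings diverge.
        score = SequenceMatcher(None, segment.lower(), source.lower()).ratio()
        if score >= best_score:
            best, best_score = (source, target, score), score
    return best
```

Given a memory containing "Press the red button.", the new segment "Press the green button." would be offered the stored translation as a starting point, leaving the translator to change only the colour.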