A controlled vocabulary is a system which organizes words into synonym sets and picks one member of each set as an exemplar, or "preferred term". The other, non-preferred strings are often called "use-fors". Thesauri are controlled vocabularies.
Controlled vocabularies are used for document indexing and retrieval (important components of Document Management). When new documents are added to a repository, they are tagged (usually by humans, but sometimes automatically or semi-automatically) with the preferred terms for the topics each document is about. This tagging step is also called indexing. At retrieval time, searchers can pose their query using whatever words they wish, and the controlled vocabulary is used to translate it into one that uses only preferred terms. Because documents and queries are then expressed in the same preferred terms, matching them is faster.
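The indexing and retrieval steps described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the vocabulary contents, the document ids, and the function names (`build_lookup`, `normalize`, `index_document`, `retrieve`) are all hypothetical, and a real system would likely handle multi-word terms and unknown words more carefully.

```python
# Hypothetical vocabulary: each preferred term maps to its use-fors.
VOCABULARY = {
    "automobile": ["car", "motorcar", "auto"],
    "physician": ["doctor", "medic"],
}


def build_lookup(vocabulary):
    """Map every known string (preferred term or use-for) to its preferred term."""
    lookup = {}
    for preferred, use_fors in vocabulary.items():
        lookup[preferred.lower()] = preferred
        for term in use_fors:
            lookup[term.lower()] = preferred
    return lookup


def normalize(words, lookup):
    """Translate words into preferred terms; words not in the vocabulary are dropped."""
    return {lookup[w.lower()] for w in words if w.lower() in lookup}


def index_document(index, doc_id, topic_words, lookup):
    """Indexing: tag a document with the preferred terms for its topics."""
    for term in normalize(topic_words, lookup):
        index.setdefault(term, set()).add(doc_id)


def retrieve(index, query_words, lookup):
    """Retrieval: rewrite the query in preferred terms, then match every term."""
    terms = normalize(query_words, lookup)
    if not terms:
        return set()
    return set.intersection(*(index.get(t, set()) for t in terms))


lookup = build_lookup(VOCABULARY)
index = {}
index_document(index, "doc1", ["car"], lookup)
index_document(index, "doc2", ["doctor", "auto"], lookup)

by_motorcar = retrieve(index, ["motorcar"], lookup)
by_medic_and_automobile = retrieve(index, ["medic", "automobile"], lookup)
```

Here the query "motorcar" finds both documents even though neither was tagged with that word, because the tags and the query are all reduced to the preferred term "automobile" before matching.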
Because of the way controlled vocabularies are used, most developers impose certain restrictions on them.
- No two synonym sets can have the same preferred term.
- A single "use-for" (alternate term) may not be in two different synonym sets.
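As a rough sketch, the two restrictions above could be checked mechanically. The representation below (a list of `(preferred, use_fors)` pairs) and the function name `check_vocabulary` are assumptions made for illustration; the pairs are kept in a list rather than a dictionary so that a duplicated preferred term can actually be detected.

```python
from collections import Counter


def check_vocabulary(synonym_sets):
    """Return a list of rule violations for a list of (preferred, use_fors) pairs."""
    problems = []

    # Rule 1: no two synonym sets may share a preferred term.
    preferred_counts = Counter(preferred.lower() for preferred, _ in synonym_sets)
    for term, count in preferred_counts.items():
        if count > 1:
            problems.append(f"preferred term '{term}' heads {count} synonym sets")

    # Rule 2: a use-for may not appear in two different synonym sets.
    owners = {}
    for preferred, use_fors in synonym_sets:
        for term in use_fors:
            owners.setdefault(term.lower(), set()).add(preferred.lower())
    for term, preferred_terms in owners.items():
        if len(preferred_terms) > 1:
            problems.append(
                f"use-for '{term}' appears under: {', '.join(sorted(preferred_terms))}"
            )

    return problems


# Hypothetical example data: "bank" is listed as a use-for in two sets.
ambiguous = [
    ("financial institution", ["bank", "lender"]),
    ("riverbank", ["bank", "shore"]),
]
clean = [
    ("automobile", ["car", "motorcar"]),
    ("physician", ["doctor"]),
]

ambiguous_problems = check_vocabulary(ambiguous)
clean_problems = check_vocabulary(clean)
```

In this sketch the `ambiguous` vocabulary yields one violation of the second rule, while the `clean` vocabulary yields none.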
The first rule prevents imprecision at retrieval time. The second prevents imprecision at indexing time. The second rule is more controversial, though, because words are clearly polysemous (have more than one meaning