Can classification systems such as the Library of Congress Classification System (or other ontologies) possibly classify all documents? Would it be useful to classify? I used to think so, but now I'm not so sure:
Consider a classification system that denoted whether a document used humor, and further, whether or not the humor was funny. Consider an author writing a piece of humor which relied entirely for its humor on being classified as not funny. If classified as funny, the humor fails and the document is mis-classified; if classified as not funny, the humor succeeds and the document is mis-classified. Either way, the document is mis-classified.
Such classification systems exist and are useful in the real world---consider for example the newsgroup rec.humour.funny, a moderated newsgroup which tries to carry only funny humour. Jokes of this kind have been attempted (by myself) and submitted, but without response from the moderators (who must judge the humour of the joke).
Cathy suggested that this apparent paradox can be resolved because the joke is impossible to construct, as it contains an internal paradox (i.e. it's only true when it's false). The problem with this argument is that jokes are a literary form which has no requirement of internal consistency; indeed, many famous examples (much of Lewis Carroll's work, for example) contain many internal contradictions.
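The humour paradox above can be restated as a search for a label that agrees with itself. The following is only a minimal sketch: the two-label scheme and the names `joke_succeeds`, `deserved_label` and `consistent_labels` are hypothetical, introduced purely for illustration, and the model assumes the piece is funny exactly when it is filed as not funny.

```python
from typing import Callable, List

# Hypothetical two-label scheme, assumed only for this illustration.
LABELS = ["funny", "not funny"]


def joke_succeeds(assigned_label: str) -> bool:
    """The piece works only if the classifier files it as 'not funny'."""
    return assigned_label == "not funny"


def deserved_label(assigned_label: str) -> str:
    """Label the piece actually deserves, given the label it was assigned."""
    return "funny" if joke_succeeds(assigned_label) else "not funny"


def consistent_labels(labels: List[str],
                      deserved: Callable[[str], str]) -> List[str]:
    """Return every label that matches what the document deserves under it."""
    return [label for label in labels if deserved(label) == label]


# The self-referential piece has no self-consistent label.
print(consistent_labels(LABELS, deserved_label))    # prints []

# An ordinary joke, whose quality ignores its label, has exactly one.
print(consistent_labels(LABELS, lambda _: "funny"))  # prints ['funny']
```

The empty result is the "only true when it's false" structure Cathy describes: the mapping from assigned label to deserved label has no fixed point. The sketch says nothing about whether such a piece can actually be written.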
Answering the first question involves trying to prove either:
- That every document can be classified using the Library of Congress Classification System, or
- That not every document can be classified using the Library of Congress Classification System.
There are a number of approaches to this:
Consider a document that described completely a classification scheme that is apparently identical to the Library of Congress Classification System (but without any direct or indirect reference to the Library of Congress Classification System) but also asserted that a referenced document had proved the classification scheme was incomplete. This document is then submitted to the Library of Congress (LoC) for classification. If the Library of Congress Classification System is incomplete, then the classification scheme described is the Library of Congress Classification System and the document is classified with the Library of Congress Classification System materials (``Classification, Library of Congress''---Z696). If the Library of Congress Classification System is complete, then the classification scheme described is not the Library of Congress Classification System and the document is classified elsewhere (probably under ``Subject cataloging''---Z695).
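The case analysis for this document can be written out as a small function. This is a sketch under stated assumptions: the function name `correct_class` and its boolean parameter are hypothetical, and the class marks Z696 and Z695 are simply those quoted above (the second of which is only described as probable).

```python
def correct_class(lcc_is_complete: bool) -> str:
    """Class mark the document deserves, given whether the LCC is complete.

    Hypothetical helper: it only encodes the two cases described in the text.
    """
    # The document claims its scheme is incomplete, so the scheme it
    # describes is the LCC exactly when the LCC really is incomplete.
    describes_lcc = not lcc_is_complete
    if describes_lcc:
        return "Z696"  # "Classification, Library of Congress"
    return "Z695"      # "Subject cataloging" (the probable alternative)


print(correct_class(lcc_is_complete=False))  # prints Z696
print(correct_class(lcc_is_complete=True))   # prints Z695
```

The correct shelf mark is a function of the very property in question, so a classifier who always classified correctly would, by choosing a class mark, be deciding whether the system is complete.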
Unfortunately this approach is flawed in that it assumes that the LoC (or anyone else) always classifies documents correctly, which is known to be untrue.
Consider a document whose subject was the fact that the document was mis-classified in the Library of Congress Classification System.
The Library of Congress Classification System has no difficulty classifying this document, because it is not making judgments about the relative truth of the contents of a document and the document is clearly about the Library of Congress Classification System, so it is classified with the Library of Congress Classification System materials.
Should all documents be classified?
Consider a new document that is sufficiently metaphorical and allusive that it could be about anything. Any assignment of a subject classification to the document by a classifier instantly places that subject at the forefront of a reader's mind when interpreting the document; thus the classifier biases all subsequent readers of the document.
The correct classification is not under ``Metaphor'' or ``Allusions'' (both valid Library of Congress Classification System classes) because these classifications are for documents that are about metaphor and allusion, not documents that use metaphor and allusion. The document could be about metaphor and/or allusion as well as using metaphor and allusion, but as previously stated it could equally well be about anything.
If the document remains unclassified, then it is largely inaccessible to library users, since much searching and browsing is performed by subject---this is certainly true of new, recently published works by unknown authors.
A document with an associated classification is a different document to one without, and this classification can have a profound influence on the document's interpretation---this is made concretely true by the inclusion of cataloging-in-publication data in many modern books.
The work describing how to catalog and classify using the Library of Congress Classification System is ``Subject Cataloging Manual: Classification'', which includes a section ``General Principles of Classification'' listing 8 principles. Of the 8 principles, all require interpretative evaluation (they are not clearly, simply and directly implementable using computers as we know them), 6 refer to large external schedules, and several use terminology such as ``intent of the author,'' ``influence'' and ``appropriate'' without clear definition.
A more significant problem in attempting to prove the Library of Congress Classification System complete or incomplete is that ``Subject Cataloging Manual: Classification'' F10, page 2, gives the ``General Principles of Classification'' and states:
7: Unless instructions in the schedules or past practice dictate otherwise, class works on the influence of one subject on another with the subject influenced.
Any deliberately written pathological document (a document written to cause problems) which couldn't be classified using the normal rules could be classified with the Library of Congress Classification System materials using this rule. Undoubtedly human classifiers have the capability to detect pathological documents (try giving a self-referential text about classification to a classifier sometime). It is not clear, however, whether a computer can be programmed to be a complete detector of deliberately written pathological documents.