The curse of dimensionality
arises in problems that map an input to an output, for instance, classification problems in supervised learning
An input has a dimension, which is the number of attributes used to define that input. For example, if we are classifying animals, we might use the attributes:
- Number of legs
Thus, instances of animals can be represented as points in a 4-dimensional space. Points which are close together in this space will represent animals which are similar to each other.
If we were to add more attributes, we add more dimensions to the input space, which increases the 'size' of this space. If we were to introduce more attributes to the animal classification, such as:
- Number of apples eaten last Tuesday
which are not particularly relevant to the classification of the species of animal, we are increasing the size of the input space. This could make the 'distance' between two similar instances greater.
If the classification was being learnt by a 'dumb' machine, for instance, a neural network, it would not be able to tell which attributes were irrelevant, and so it will try to use them all when performing the classification. This is not a problem if the learner will only ever see the examples it was shown during its training because they will correspond to things that it already knows, but if the learner is required to generalise to new instance, it may have a problem. For example, if we trained the learner to classify horses by showing it an instance of a horse called Bob, it should be able to classify all horses called Bob. However, if we then showed it a horse that was identical to Bob, except it was called Seigfreid McFadden III (or whatever), this new horse might be a long way from Bob in the instance space and therefore might not be classified as a horse.
This is known as overfitting, where the learner tries to closely represent all of the data presented to it during training, including incorrect or noisy data. Using too many irrelevant attributes (and therefore dimensions) can result in overfitting, hence the curse of dimensionality.
One solution is to add a weight to the attributes, so less relevant attributes are less relevant to the classification. This corresponds to stretching (or shrinking) the axes of the instance space. It is even possible to use non-constant stretching factors so that the distribution of attributes can be controlled. However, it is important to note that by giving the learner more degrees of freedom, such as non-constant stretching factors, we risk increasing the possibility of overfitting.
Another solution is to eliminate the irrelevant attributes (essentially setting the stretching factor to zero) by using a cross-validation approach to determine which attributes are unnecessary to the classifaction.