A form of

instance-based learning used to learn a

classification. For example, assume you have some articles that need to be classified into groups by a machine. You should 'seed' the instance space (an

*n*-dimensional space that contains the articles) with samples that you know the classification of. When you encounter a new instance, you need to assign it a classification based on what you know from the existing samples.

In order to classfiy a new instance, we must define a measure of 'distance' between the new instance and another. Possible metrics are the Euclidean distance or the Manhattan distance.

Given that we have a training set of instances, with known classifications

```
s = ((x
```_{1},y_{1}),(x_{2},y_{2}),...,(x_{m},y_{m}))

a new instance *x* is classified by looking at the *k* nearest examples in *s*.

```
N
```_{k}(x,s) = {the k examples (x',y') in s with x' closest to x}

For classification problems with classes W = {w_{1}, ... , w_{c}}, we take the most common class among the *k* nearest examples.

If *k*=1, we have 1-nearest neighbour. This effectively divides the instance space into convex polyhedra surrounding each instance encountered so far. A new instance takes the classification of the instance whose polyhedra it falls into.

For regression problems where W is the real numbers, we use the average of the *k* nearest neighbours. This is a more general form of the algorithm that can be used to locally approximate continuous functions.

One way to refine the *k*-nearest neighbour learning algorithm is to distance weighting on the neighbouring instances. When we add up the neighbouring instances (either in the discrete classification case, or the contiuous function case), we scale the values by it's inverse squared distance from the new instance. This makes closer instance more relevant to the classification than distant ones. One upshot of this is that we can now use every known instance to classify a new instance, but this will obviously affect performance.

Distance weighting offers some protection against noisy data by smoothing out the impact of noise, providing a sufficiently large *k* is used.