k-nearest neighbour learning

A form of instance-based learning used to learn a classification. For example, assume you have some articles that need to be classified into groups by a machine. You should 'seed' the instance space (an n-dimensional space that contains the articles) with samples that you know the classification of. When you encounter a new instance, you need to assign it a classification based on what you know from the existing samples.

In order to classify a new instance, we must define a measure of 'distance' between the new instance and an existing one. Possible metrics include the Euclidean distance and the Manhattan distance.
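As a rough illustration, the two metrics can be written in Python as below. This is a sketch that assumes each instance has already been turned into an equal-length tuple of numbers (a feature vector); how articles would be converted into such vectors is not covered here.

```python
import math
from typing import Sequence

def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Straight-line distance: square root of the sum of squared coordinate differences."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def manhattan(a: Sequence[float], b: Sequence[float]) -> float:
    """City-block distance: sum of absolute coordinate differences."""
    return sum(abs(ai - bi) for ai, bi in zip(a, b))

print(euclidean((0, 0), (3, 4)))  # 5.0
print(manhattan((0, 0), (3, 4)))  # 7
```

Note that `zip` silently stops at the shorter vector, so the sketch relies on both instances having the same number of dimensions.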

Suppose we have a training set of m instances with known classifications:

s = ((x₁,y₁),(x₂,y₂),...,(xₘ,yₘ))

A new instance x is classified by looking at its k nearest examples in s:

N_k(x,s) = {the k examples (x',y') in s with x' closest to x}

For classification problems with classes W = {w₁, ... , w_c}, we take the most common class among the k nearest examples.
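A minimal Python sketch of N_k(x,s) and the majority vote, assuming Euclidean distance (computed with the standard library's `math.dist`, Python 3.8+) and instances stored as numeric tuples. The function names `nearest` and `knn_classify` and the toy "sport"/"politics" training set are hypothetical, chosen only for illustration.

```python
import math
from collections import Counter

def nearest(x, s, k):
    """N_k(x, s): the k examples (x', y') in s whose x' is closest to x."""
    return sorted(s, key=lambda example: math.dist(example[0], x))[:k]

def knn_classify(x, s, k):
    """Return the most common class among the k nearest examples."""
    votes = Counter(y for _, y in nearest(x, s, k))
    return votes.most_common(1)[0][0]

# Hypothetical training set: 2-D feature vectors labelled with a topic.
s = [
    ((1.0, 1.0), "sport"),
    ((1.2, 0.8), "sport"),
    ((0.9, 1.3), "sport"),
    ((5.0, 5.0), "politics"),
    ((5.5, 4.5), "politics"),
]

print(knn_classify((1.1, 1.0), s, k=3))  # sport
print(knn_classify((5.1, 4.9), s, k=3))  # politics
```

Ties in the vote are broken in favour of the tied class whose member is nearest, because `Counter.most_common` keeps first-encountered order for equal counts and the neighbours are sorted by distance. That tie-breaking rule is a choice made by this sketch, not part of the method as described.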

If k=1, we have 1-nearest neighbour. With the Euclidean distance, this effectively divides the instance space into convex polyhedra (the cells of a Voronoi diagram) surrounding each instance encountered so far. A new instance takes the classification of the instance whose polyhedron it falls into.

For regression problems where W is the real numbers, we use the average of the target values y of the k nearest neighbours. This more general form of the algorithm can be used to locally approximate continuous functions.
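A short Python sketch of the regression case, again assuming Euclidean distance via `math.dist` and instances stored as numeric tuples. The function name `knn_regress` and the sample points (taken from y = x²) are hypothetical.

```python
import math

def knn_regress(x, s, k):
    """Predict a real value as the mean y of the k nearest examples in s."""
    neighbours = sorted(s, key=lambda example: math.dist(example[0], x))[:k]
    return sum(y for _, y in neighbours) / len(neighbours)

# Hypothetical 1-D samples of an unknown continuous function (here y = x^2).
s = [((0.0,), 0.0), ((1.0,), 1.0), ((2.0,), 4.0), ((3.0,), 9.0), ((4.0,), 16.0)]

print(knn_regress((2.1,), s, k=2))  # 6.5, the mean of 4.0 and 9.0
```

The prediction is piecewise constant: it only changes when the set of k nearest samples changes, which is why this is a local approximation rather than a fitted global function.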

One way to refine the k-nearest neighbour learning algorithm is to apply distance weighting to the neighbouring instances. When we add up the neighbouring instances (either in the discrete classification case or the continuous function case), we scale each one's contribution by its inverse squared distance from the new instance. This makes closer instances more relevant to the classification than distant ones. One upshot of this is that we can now use every known instance to classify a new instance, but this slows classification, since every training instance must be examined for each query.
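A sketch of distance weighting with weights 1/d², assuming Euclidean distance via `math.dist`. Two behaviours here are assumptions of the sketch rather than anything stated above: if the new instance coincides exactly with a training instance (d = 0, where 1/d² is undefined), it simply returns that instance's value, and passing `k=None` uses every training instance. All function names and data are hypothetical.

```python
import math
from collections import defaultdict

def _neighbours(x, s, k):
    """Training examples sorted by distance to x; all of them if k is None."""
    ordered = sorted(s, key=lambda example: math.dist(example[0], x))
    return ordered if k is None else ordered[:k]

def weighted_classify(x, s, k=None):
    """Each neighbour votes for its class with weight 1/d^2."""
    scores = defaultdict(float)
    for xi, yi in _neighbours(x, s, k):
        d = math.dist(xi, x)
        if d == 0:
            return yi  # exact match: 1/d^2 is undefined, so copy its class
        scores[yi] += 1.0 / d ** 2
    return max(scores, key=scores.get)

def weighted_regress(x, s, k=None):
    """Weighted mean of neighbour values, using weights 1/d^2."""
    total = weight_sum = 0.0
    for xi, yi in _neighbours(x, s, k):
        d = math.dist(xi, x)
        if d == 0:
            return yi  # exact match: return its value directly
        w = 1.0 / d ** 2
        total += w * yi
        weight_sum += w
    return total / weight_sum

# Hypothetical toy data.
labelled = [((0.0, 0.0), "a"), ((2.0, 0.0), "b"), ((2.0, 0.5), "b")]
print(weighted_classify((0.5, 0.0), labelled))  # a

samples = [((0.0,), 0.0), ((2.0,), 4.0)]
print(weighted_regress((0.5,), samples))
```

With the toy labelled set, an unweighted 3-nearest-neighbour vote would pick "b" (two votes to one), whereas the weighted vote picks "a": its single instance is at distance 0.5 (weight 4), which outweighs the two "b" instances (weights of roughly 0.44 and 0.4).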

Distance weighting offers some protection against noisy data by smoothing out the impact of noise, provided a sufficiently large k is used.
