A common problem in machine learning is over-fitting. When a learning algorithm is trained on existing, labeled training data, there is a danger that it will tune itself too finely to that data and will then be unable to generalize to new, unseen data points.

Cross validation helps guard against over-fitting by estimating how well a learning method will predict future data, assuming that the new data is drawn from the same distribution as the training data. Cross validation can also be used to choose between several learning algorithms: the algorithm with the lowest cross validation error on the given data is selected.
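The idea can be sketched in a few lines of standard-library Python. This is a minimal illustration, not a reference implementation: it assumes k-fold splitting (one common variant, chosen here only for illustration), squared error as the loss, synthetic data from a noisy line, and three toy learners. All names (`k_fold_indices`, `fit_mean`, `fit_line`, `fit_1nn`, `cross_val_error`) are hypothetical helpers written for this example.

```python
import random
from statistics import mean

def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and split them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def fit_mean(xs, ys):
    """Baseline learner: always predict the mean of the training targets."""
    m = mean(ys)
    return lambda x: m

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a*x + b."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = my - a * mx
    return lambda x: a * x + b

def fit_1nn(xs, ys):
    """1-nearest-neighbour: memorises the training set, so training error is zero."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]

def cross_val_error(fit, xs, ys, k=5, seed=0):
    """Train on k-1 folds, measure squared error on the held-out fold, average over folds."""
    fold_errors = []
    for fold in k_fold_indices(len(xs), k, seed):
        held_out = set(fold)
        train = [i for i in range(len(xs)) if i not in held_out]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        fold_errors.append(mean((model(xs[i]) - ys[i]) ** 2 for i in fold))
    return mean(fold_errors)

# Synthetic data (assumption): y = 2x + 1 plus Gaussian noise.
rng = random.Random(42)
xs = [i / 40 for i in range(200)]
ys = [2 * x + 1 + rng.gauss(0, 1.0) for x in xs]

learners = {"mean": fit_mean, "line": fit_line, "1-nn": fit_1nn}
scores = {name: cross_val_error(fit, xs, ys) for name, fit in learners.items()}
best = min(scores, key=scores.get)  # pick the learner with the lowest CV error
print(scores, "->", best)
```

The 1-nearest-neighbour learner reproduces every training label exactly, so judged on training error alone it would look perfect; on held-out folds it also reproduces the noise of its neighbours, and its cross validation error should come out higher than the straight line's. Selecting by the lowest cross validation error is what steers the choice toward the learner that generalizes.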

There are several methods of cross validation, each with its own benefits and drawbacks: