k-fold cross validation - Everything2.com

K-fold cross validation is used in the field of machine learning to estimate how accurately a learning algorithm will predict data that it was not trained on. In the k-fold method, the training dataset is randomly partitioned into k groups of roughly equal size. The learning algorithm is then trained k times, each time using all of the training data points except those in one group, which is held out as the test set. The form of the algorithm is as follows:

1. Divide the training set into k partitions.

2. For each partition i, from 1 to k:

   a. Make T the dataset that contains all training data points except those in the ith partition.

   b. Train the algorithm using T as the training set.

   c. Test the trained algorithm, using the ith partition as the test set. Record the error on that partition.

3. Report the mean error over all k test sets.
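
The steps above can be sketched in Python roughly as follows. This is only an illustration under stated assumptions: the function names (k_fold_indices, k_fold_error) and the majority-label learner are hypothetical stand-ins, the error is taken to be the fraction of misclassified points, and k is assumed to be no larger than the number of data points.

```python
import random


def k_fold_indices(n, k, seed=0):
    """Randomly partition the indices 0..n-1 into k groups of near-equal size."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    # Striding through the shuffled list deals the indices out like cards.
    return [indices[i::k] for i in range(k)]


def k_fold_error(xs, ys, k, train, predict, seed=0):
    """Return the mean error rate over the k held-out groups."""
    folds = k_fold_indices(len(xs), k, seed)
    error_rates = []
    for i, test_idx in enumerate(folds):
        # T holds every data point except those in the i-th group.
        train_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
        model = train([xs[j] for j in train_idx], [ys[j] for j in train_idx])
        # Count misclassified points in the held-out group.
        errors = sum(predict(model, xs[j]) != ys[j] for j in test_idx)
        error_rates.append(errors / len(test_idx))
    return sum(error_rates) / k


# Hypothetical stand-in learner: always predict the most common training label.
def train_majority(xs, ys):
    return max(set(ys), key=ys.count)


def predict_majority(model, x):
    return model


if __name__ == "__main__":
    xs = list(range(20))
    ys = [0] * 14 + [1] * 6
    print(k_fold_error(xs, ys, k=5, train=train_majority, predict=predict_majority))
```

Any learner can be substituted by passing different train and predict functions. With the toy labels above, the majority learner always predicts 0, so the reported error is simply the share of points labelled 1.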

K-fold cross validation is extremely useful if a suitable value of k is chosen. It is less 'wasteful' of data than test set cross validation, because every data point is used for both training and testing, and less 'expensive' than leave-one-out cross validation, because only k models are trained rather than one per data point. In general, if a suitable value of k is used, k-fold cross validation tends to provide the best of these estimates of the error on unseen data.
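
To make the 'wasteful' versus 'expensive' trade-off concrete, the following sketch tallies how many models are trained, how many points each model is trained on, and how many points are ever used for testing. The helper names and the example sizes (100 points, a 30-point test set) are assumptions chosen for illustration, and k is assumed to divide the dataset size evenly.

```python
def holdout_cost(n, test_size):
    """Summary for a single train/test split (test set cross validation).

    Returns (models trained, training-set size, points ever tested).
    """
    return 1, n - test_size, test_size


def k_fold_cost(n, k):
    """Same summary for k-fold cross validation, assuming k divides n evenly."""
    fold_size = n // k
    return k, n - fold_size, n


if __name__ == "__main__":
    n = 100
    print("test set, 30 held out:", holdout_cost(n, 30))
    print("10-fold:              ", k_fold_cost(n, 10))
    print("leave-one-out (k = n):", k_fold_cost(n, n))
```

Under these assumptions, the single split trains one model but tests on only 30 points, 10-fold trains ten models and tests on all 100 points, and leave-one-out tests on all 100 points at the price of training 100 models.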

Unfortunately, there is no theoretically 'perfect' way of determining the appropriate k value. Using the value k = 10 seems to be a good rule of thumb, although the true best value differs for each algorithm and each dataset. It is interesting to note that when k is increased until it equals the number of data points, each group contains a single point, and k-fold cross validation behaves identically to leave-one-out cross validation.
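
The equivalence with leave-one-out cross validation can be checked with a small sketch. The partition helper below is a hypothetical illustration, not taken from any library: it shuffles the indices and deals them into k groups, so that setting k to the dataset size leaves exactly one point in each group.

```python
import random


def partition(n, k, seed=0):
    """Randomly split the indices 0..n-1 into k groups of near-equal size."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


if __name__ == "__main__":
    n = 8
    folds = partition(n, k=n)
    # With k equal to n, every test group is a single data point,
    # so each model is trained on the other n - 1 points.
    for fold in folds:
        print("test on", fold, "train on the remaining", n - len(fold), "points")
```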
