aka bin sort or histogram sort

To sort an array of numbers with n possible values:

  1. allocate n "buckets," each corresponding to a different element value and holding a count
  2. for each element in the array, increment the count in the bucket corresponding to that element's value
  3. when you've finished, go through each successive bucket in order, writing the value assigned to that bucket as many times as indicated by the count in that bucket

Limitations:

  • You need to know the maximum numer of times each value may be repeated, and each bucket needs to be able to hold that count.
  • You need to know the range of possible values and have enough memory to store that many buckets.

Complexity is O(n) if the range of element values is fairly small, otherwise O(n+m), where m is the number of element values.

This algorithm is probably at its best when you have a small range of possible values which are about equally frequent in your array.

sorting algorithms

The algorithm that rabidcow has described in the previous writeup on this node is more properly identified as counting sort. Counting sort is a special case of bucket sort.

Bucket sort does not sort elements into homogenous piles by key, as does counting sort. Counting sort only works when your input is discrete and integer-like; inputs that have clearly defined ranges, but not clear values, or an infinite range of values (such as floating-point numbers) are not candidates for this algorithm.

A bucket sort instead divides the input into ranges of values, approximately evenly divided along the total input range. These ranges are, of course, known as "buckets". The efficiency of a bucket sort depends on how well the number of buckets and their dividing points are selected.

Buckets are usually represented as linked lists, which may be empty. To perform the sort, move each item of the input array to the bucket-list that covers it. If that list is empty, start it. If the list is not empty, insert the new element in sorted order (at potential O(n) cost).

For many varieties of input, the buckets are values that can be computed rather than just arbitrary limits that must be tested, so finding the right bucket for an element can be an O(1) operation. Assuming this, then for each element, finding the right bucket is O(1), but putting it into the bucket is proportional to the number of elements already in the bucket. The cost of a bucket sort is therefore O(NB) where B is the size of the largest bucket. If the Bucket Sort works perfectly, which it never does, then B is approximately equal to N/k, where k is the number of buckets. If k is a function of N, however, as it should be in a well-designed bucket sort, N/k will be a constant and will disappear in the time complexity, leaving a perfect bucket sort with perfect bucket count at O(N). If the input hates you, however, everything will be in the same bucket and all those sorted linked-list insertions will require O(N) time each, making the bucket sort perform identically to an insertion sort, O(N2), because it will be an insertion sort.

Bucket sort is a classic example of an unreliable algorithm. It will perform extremely well in a well-defined but restricted set of situations. In other situations, however, it will perform extremely poorly. If you can prove that your input will be in one of the good situations, or will at least not hit the pathological case too often, bucket sort may be an extremely good choice of algorithm. If you cannot prove anything about your input, however, quick sort or merge sort is very likely to be a better, more reliable choice.

Log in or register to write something here or to contact authors.