"... it is sometimes said to run in time O(n log n), but this is false"

Actually, this is true: quicksort can be made to run in O(n log n) time even in the worst case.

As joeytsai points out, quicksort "chooses an element (called the pivot), and divides the remaining elements in two groups - elements smaller and greater than the pivot." Quicksort achieves its optimal O(n log n) running time only when the chosen pivot splits the input into two pieces of roughly equal size each time. If we cut the input nearly exactly in half at every level of recursion, the running time of quicksort is given by the recurrence:

T(n) = 2T(n/2) + O(n).

In other words, the cost of solving a problem of size n equals the cost of solving 2 subproblems of size n/2, plus work proportional to n (the partitioning step). The easiest way to solve this recurrence is with the Master Theorem: case 2 applies here, and the solution is therefore O(n lg n).
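For concreteness, here is the bookkeeping behind that application of the Master Theorem (a worked sketch; the case numbering follows the CLRS statement of the theorem):

    % Master Theorem applied to T(n) = 2T(n/2) + O(n):
    % a = 2 subproblems, each of size n/b with b = 2, and
    % f(n) = Theta(n) work per call (the partitioning step).
    \[
      a = 2, \quad b = 2, \quad f(n) = \Theta(n),
      \qquad n^{\log_b a} = n^{\log_2 2} = n.
    \]
    % Since f(n) = \Theta(n^{\log_b a}), case 2 applies:
    \[
      T(n) = \Theta\bigl(n^{\log_b a} \log n\bigr) = \Theta(n \log n).
    \]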

The problem is that the "naive" method of choosing a pivot doesn't produce evenly-sized subproblems every time: a pivot that happens to be the smallest or largest element yields subproblems of size 0 and n-1, and a run of such choices is exactly what drives quicksort to O(n^2). With a better method of choosing the pivot, however, we are guaranteed the optimal worst-case runtime. Note that we can split the array exactly in half if we always choose its median as the pivot. The median is also known as the n/2th order statistic, and any order statistic can be found in worst-case O(n) time with the SELECT (median-of-medians) algorithm. Doing this at each step of quicksort does not increase the asymptotic worst-case running time; the recurrence above simply becomes

T(n) = 2T(n/2) + O(n) + O(n).

... but O(n) + O(n) = O(n), since O(n) means "at most some positive constant times n", and the sum of two such terms is still at most a (larger) constant times n. Adding the median-selection step raises that constant, but the recurrence still solves to O(n log n), so the asymptotic running time of quicksort is unchanged.
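To make this concrete, here is a minimal Python sketch of the idea (the function names are mine, and it sorts into new lists rather than in place, trading efficiency for clarity):

    def select(items, k):
        """Return the k-th smallest element (0-indexed) of items in
        worst-case O(n) time, via the median-of-medians SELECT algorithm."""
        if len(items) <= 5:
            return sorted(items)[k]
        # The median of the group-of-5 medians is guaranteed to fall in
        # the middle 40% of the data, so it is always a "good" pivot.
        medians = [sorted(items[i:i + 5])[len(items[i:i + 5]) // 2]
                   for i in range(0, len(items), 5)]
        pivot = select(medians, len(medians) // 2)
        lows = [x for x in items if x < pivot]
        pivots = [x for x in items if x == pivot]
        highs = [x for x in items if x > pivot]
        if k < len(lows):
            return select(lows, k)
        elif k < len(lows) + len(pivots):
            return pivot
        else:
            return select(highs, k - len(lows) - len(pivots))

    def median_pivot_quicksort(items):
        """Quicksort that always pivots on the true median, giving a
        worst-case O(n log n) running time."""
        if len(items) <= 1:
            return items
        pivot = select(items, len(items) // 2)   # the n/2th order statistic
        lows = [x for x in items if x < pivot]
        pivots = [x for x in items if x == pivot]
        highs = [x for x in items if x > pivot]
        return median_pivot_quicksort(lows) + pivots + median_pivot_quicksort(highs)

For example, median_pivot_quicksort([3, 1, 4, 1, 5, 9, 2, 6]) returns [1, 1, 2, 3, 4, 5, 6, 9], and no input ordering can make the recursion go deeper than about lg n levels.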

On the other hand, the SELECT algorithm for finding the median is relatively expensive for an O(n) algorithm; i.e., the constant is high. For the best wall-clock running time, you'd want to determine empirically at what input size the O(n lg n) quicksort given above actually beats the naive or randomized-pivot quicksort (O(n^2) in the worst case, but with a much smaller constant), and only call the O(n lg n) version when the input is sufficiently large.
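The dispatch itself is trivial. A sketch, assuming the median_pivot_quicksort above and a hypothetical, empirically-tuned CUTOFF:

    import random

    def randomized_quicksort(items):
        """Plain quicksort with a random pivot: O(n log n) expected time,
        O(n^2) worst case, but a small constant factor per call."""
        if len(items) <= 1:
            return items
        pivot = random.choice(items)
        lows = [x for x in items if x < pivot]
        pivots = [x for x in items if x == pivot]
        highs = [x for x in items if x > pivot]
        return randomized_quicksort(lows) + pivots + randomized_quicksort(highs)

    CUTOFF = 10000   # hypothetical; measure on your own machine and data

    def hybrid_sort(items):
        """Pay SELECT's high constant only when the input is large enough
        for the worst-case O(n log n) guarantee to matter."""
        if len(items) >= CUTOFF:
            return median_pivot_quicksort(items)   # sketch above
        return randomized_quicksort(items)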


yerricde pointed out that heapsort's O(n lg n) performance may be better in this case. I tend to agree; there is a practical difference between the running times of, say, heapsort and quicksort, even if the "theoretical" running times are the same. In fact, NIST's directory of algorithms and data structures (http://www.nist.gov/dads/HTML/introspectiveSort.html) lists a type of sort called "introspection sort" and defines it as: "A variant of quicksort which switches to heapsort for pathological inputs, that is, when execution time is becoming quadratic."
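Here is a rough Python sketch of that strategy, following the NIST definition (the depth budget of about 2 lg n is the usual choice, as in Musser's original introsort paper; the naive middle-element pivot is just for illustration):

    import heapq
    import math

    def introsort(items):
        """Quicksort with a recursion-depth budget of about 2*lg(n); if
        the budget runs out (a sign the splits are going quadratic), fall
        back to heapsort, which is O(n log n) unconditionally."""
        def heapsort(xs):
            heapq.heapify(xs)                                   # O(n)
            return [heapq.heappop(xs) for _ in range(len(xs))]  # O(n log n)
        def sort(xs, depth):
            if len(xs) <= 1:
                return xs
            if depth == 0:
                return heapsort(xs)
            pivot = xs[len(xs) // 2]    # naive pivot; the depth guard
            lows = [x for x in xs if x < pivot]    # makes it safe anyway
            pivots = [x for x in xs if x == pivot]
            highs = [x for x in xs if x > pivot]
            return sort(lows, depth - 1) + pivots + sort(highs, depth - 1)
        return sort(list(items), 2 * int(math.log2(len(items))) if items else 0)

This guarded-quicksort strategy is essentially what several C++ standard library implementations of std::sort use today: quicksort's small constants on typical inputs, with heapsort's O(n lg n) bound as a safety net.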