The name given to a programming idiom used when a list of records
is to be re-ordered according to a function of the record, rather
than the value of the record itself. It is named[1] after
Randal Schwartz, who popularized its use in Perl, where it is
often implemented as a one-liner, though the concept is language-independent
and the first use of it is lost in the shrouded mists of time.
The idea is very simple. Rather than sorting the list of records
using a comparison function that computes the function of the two
given records and returns the comparison value of the two function
values, a new list is created, each element of which is a record
consisting of the function's value for the corresponding record in
the original list, and that record itself. Thus, an original list
R1, R2, R3, ...
is converted to
(f(R1), R1), (f(R2), R2), (f(R3), R3), ...
That list is sorted, and then converted again, removing the first item from
each element, which produces the desired ordered list.
Note that the point of this exercise is to compute the function's value for
each record only once; the simplistic approach of computing the value in the
comparison function can compute the same function for the same record many
times, depending on the size of the list, the sorting algorithm used, and
the implementation thereof. While this is done in order to speed up the
entire sorting process, and it generally will, it is possible for this
sorting to take longer because of the fact that a bigger list is
being sorted than in the simplistic case, which may raise swapping issues,
though that would be a pathological case.
Also, when used in interpreted languages such as Perl or Python,
another source of speedup is that the supplied sort function is generally
faster (much faster in the case of Python) if it is called
without a user-defined comparison function.
Here is a simple Python function which sorts a list using a Schwartzian
Transform.
def STsort(list, f):
"""
STsort(list, f) --> list
Sorts the given list using a Schwartzian Transform with function f
"""
list = [ (f(item), item) for item in list ]
list.sort()
return [ item[1] for item in list ]
This could be used, for example, to sort a list of names of people according
to how old they are, by looking up their age in a database (a relatively
slow operation that you wouldn't want to repeat).
People = ['Nolan', 'Clarence', 'Rebecca']
def Age(Person):
return DBLookup(Person).age
ByAge = STsort(People, Age)
which would return ['Rebecca', 'Nolan', 'Clarence'].
It's just as easy
to sort according to the number of vowels in the name:
def nVowels(Name):
i = 0
for c in Name: i += c in 'aeiou'
return i
ByVowels = STsort(People, nVowels)
which would return ['Nolan', 'Clarence', 'Rebecca'].
Update, 1/16/2005
With the advent of Python version 2.4, released November, 2004,
the machinery of the Schwartzian Transform is built in to the
sort method of list objects. It can, of course, be used just as
before by either specifying no argument for a sort using native
comparisons of the list elements, or by specifying a comparison
function to use instead. That argument is now formally called
cmp, because other optional arguments have been
added.
The one that concerns us here is key. This argument
receives a function that takes one argument, a value from the
list, and returns the sort key to be associated with it. Before
sorting the list, key is called for each element, and
the list of those keys is sorted instead; then the list is
re-ordered according to the sorted keys.
Thus, to use the above examples, one can get the ST effect
simply with
People.sort(key=Age).
There is one difference to note however: the standard list sort
method, since version 2.3, guarantees a stable sort (which means that
two elements of the list having the same key value will appear
in the same order relative to each other after sorting as before). In the STsort function above, we see that
the key and the element are treated together as one element of
the list that was actually sorted, which meant (according to
Python's standard tuple comparison rules) the original value
acted as a tie-breaker among elements with the same key value.
The difference can be seen here:
People = ['Rebecca', 'Clarence', 'Nolan']
# The nVowels values are 3, 3, and 2 respectively
print STsort(People, nVowels)
# (prints ['Nolan', 'Clarence', 'Rebecca'])
People.sort(key=nVowels)
print People
# (prints ['Nolan', 'Rebecca', 'Clarence'])
[1] According to m_turner, this construct was christened by Tom Christiansen, in a Usenet conversation in which he derided it.