Introduction

Generic programming is a philosophy of programming that emphasizes the algorithm, invented/discovered by Alexander Stepanov. Its most famous incarnation is in the STL of C++; however it is quite distinct. Best to say that C++, as a multi-paradigmatic language, also supports generic programming.

At work we have a fairly complex Message class. There are at least two different possible views of the contents of Message, neither of which exists in contiguous memory. Just last week I got asked...

"Do we have a substring find algorithm on Messages? And what about regular expressions? Do we have algorithms for those? I looked and couldn't find any relevant method in class Message."

Algorithms

Algorithms are a sequence of steps. For example, an algorithm for finding a minimal element is:

  1. Initialise the current minimal element to the first element
  2. For every succeeding element, if it is smaller than the current minimal element, set the current minimal element to that element
  3. At the end, the current minimal element is a minimal element.

This algorithm is horribly unspecified! Typically, you'll find it completely specified: the elements are assumed to be stored in an array, they are assumed to be of type (say) float, the size of the array is assumed to be known, etc. Given these properties, it is possible to prove that the algorithm works. But all of these details are completely irrelevant to the algorithm.

C++ comes with a standard library. For historical reasons, the std::string class has a "find substring" algorithm; for equally historical reasons, it doesn't have a "find regexp" algorithm. std::string is not a good example of generic programming.

The header file <algorithm> contains the find_end algorithm; if your read it carefully you'll see it's just a "find substring" algorithm, and that it seems to work on std::strings.

Generic programming suggests we look at the algorithm as the important part, then see what is required for it to work.

Concepts

What does an algorithm need in order to work? Types for the arguments (and return value) that satisfy certain requirements.

The requirements a type needs to fulfill in order for an algorithm to work on it are part of the description of the algorithm. In the above "minimal element" example, we need these requirements:

  • The elements must be ordered
  • A non-empty Range of elements must be given as input
  • Given an element, it must be possible to access the next element
  • It must be possible to determine when we have reached the end of the range
Note that the algorithm works even if the elements are only partially ordered! Of course, in that case we shall only find a (rather than the) minimal element -- but the same algorithm delivers.

It turns out that some collections of properties get used by many algorithms. These are called concepts. The "minimal element" algorithm uses a (nonempty) Range specified by Forward Iterators that can be dereferenced to a Comparable value type. In English:

  • A pair of iterators b,e, guaranteeing that if b is advanced enough times it will equal e;
  • b ≠ e;
  • The type of the iterator b supports advancing to the next element;
  • The type of the iterator b can be saved and used later;
  • The value type *b can be compared.

That's it! In one stroke, our simple algorithm will work for... anything for which it can work!

"Both views of Message are defined in terms of methods returning start and end iterators. These are bidirectional iterators. The value types are some unsigned type, so:

  1. For finding substrings, the std::find_end algorithm (which we didn't write) works unchanged;
  2. The boost::regex::match algorithm (which we didn't write, either) works if you add a small definition to adapt it to our value type.
And almost every other algorithm that doesn't require Random Access Iterators also works for Message."

Satisfying a concept sounds like inheritance, but isn't (necessarily) that. It is more like satisfying an interface -- but not in the object oriented sense. Algorithms are instantiated for particular types. Indeed, something like "forward iterator" is not an OO interface -- that would require specification of the value type!

C++ concepts specifically allow for non-objects -- the typical random access iterator is a naked pointer.

Libraries

The upshot of generic programming is that you get supremely capable libraries. Libraries are composed of algorithms -- bits of code -- and objects -- things you can give the algorithms to use. Algorithms are specified in terms of the concepts they require in order to work correctly. Objects are specified in terms of the concepts they implement.

The libraries are extensible in two ways:

  • New algorithms can be constructed, their required concepts elucidated, and take as inputs existing objects;
  • New objects can be created, the concepts they satisfy specified, and passed as inputs to existing algorithms.
Typically, of course, libraries will also contain various adaptors, to create new objects that bend existing objects to fit new concepts.

Implementations

Stepanov originally hoped to create a generic library for Ada. Compilers were unready, and template programming not understood well enough, to allow this to happen. Eventually he created the Standard Template Library for C++, which got standardized.

The STL supplies mostly algorithms, but it is more famous for its containers: a set of models of many of the concepts needed for the algorithms. Ironically, the containers are so well-known because of their genericity: they fit so many different situations that it often makes little sense to create a new container.

STL algorithms are pretty basic (and the containers, while good, are rather boring). The brilliance is, again, in their genericity. Even better, it turns out that STL algorithms are every bit as efficient as hand-written code!

std::find_end works with both views of our Messages, despite the fact the Stepanov didn't know of our Message types.

For further concepts, algorithms, and objects, the place to turn is BOOST -- the staging ground for new C++ libraries.

Java 5 (aka Java 1.5, don't ask me what happened to "1.") salvages the mess that are Java (object-oriented, not generic) Collections by adding some measure of genericity. It also tries to formalize concepts in the language. It does not try to match C++ STL efficiency, that being left to the Java compiler and JITs.

The upcoming version of C++, ISO C++ 0x, may support concept checking. Thanks to the magic/horror of C++ template metaprogramming, this requires rather fewer changes to the language than might be expected. Run-time efficiency will not be compromised, of course -- this is C++.