This is a C++ writeup, and a somewhat arcane one at that.


The C++ Standard Template Library is a vast candy-box full of good things, most of which don't show on the surface. Two good things that do show on the surface are the string class (a specialization of the basic_string template), and the map template, which gives you an associative array. We'll be discussing both of those, and we'll also venture a tentative inch or two into the great darkness that lies below them.


We have an associative array: std::map<string, foo *>. Our problem today is that we happen to need our lookups to be case blind. std::map::find() is efficient: That's why we're using std::map in the first place. We don't want to reinvent the wheel here. We also need to preserve the case of the keys. Therefore, we want to alter the behavior of string comparisons taking place deep in the guts of STL code that we didn't write and don't care to modify. We don't want to touch anything else, because everything else is hunky dory. Fortunately, the STL is designed to allow just that.

The STL std::tree template is used for the guts of the std::map, and it uses the operator< member function of the key type to do comparisons. If the key type is a primitive type like int or char, == isn't even a member function, but a C++ template doesn't care about the distinction either way. But our key type isn't primitive at all: It's a string class.

Clearly, there are (in principle) two ways to approach the problem: We can alter the behavior of string::operator<, or we can alter the behavior of map.

So. We'll start with altering std::string, because it's the solution I tried first, it looks like the wrong solution, and it's the solution I decided to use. It works, but it's mighty verbose.

The std::basic_string template has three type parameters:

  1. The character type (char, wchar_t, or whatever turns you on);
  2. A "character traits" class, usually std::char_traits or a variant thereof;
  3. The obligatory allocator class.

The character traits class defines the behavior of the characters in the string. It's got a bool eq(T, T) member to test two characters for equality, and an bool lt(T, T) member to test if one character is "less than" another. It's also got compare (T *, T *, size_t n) which compares sequences of characters, and there are a few other goodies besides. Well, now. All we've got to do is subclass std::char_traits<char> and override those three member functions and a couple others:


template<class _E>
struct blind_traits : public char_traits<_E>
{
    static bool eq(const _E& x, const _E& y)
        { return tolower( x ) == tolower( y ); }
    static bool lt(const _E& x, const _E& y)
        { return tolower( x ) < tolower( y ); }

    static int compare(const _E *x, const _E *y, size_t n)
        { return strnicmp( x, y, n ); }

    //  There's no memichr(), so we roll our own.  It ain't rocket science.
    static const _E * __cdecl find(const _E *buf, size_t len, const _E& ch)
        {
            //  Jerry says that x86s have special mojo for memchr(), so the 
            //  memchr() calls end up being reasonably efficient in practice.
            const _E *pu = (const _E *)memchr(buf, ch, len);
            const _E *pl = (const _E *)memchr(buf, tolower( ch ), len);
            if ( ! pu )
                return pl;  //  Might be NULL; if so, NULL's the word.
            else if ( ! pl )
                return pu;
            else
                //  If either one was NULL, we return the other; if neither is 
                //  NULL, we return the lesser of the two.
                return ( pu < pl ) ? pu : pl;
        }

    //  I'm reasonably sure that this is eq() for wide characters.  Maybe.
    static bool eq_int_type(const int_type& ch1, const int_type& ch2)
        { return char_traits<_E>::eq_int_type( tolower( ch1 ), 
                                                     tolower( ch2 ) ); }
};

//  And here's our case-blind string class.
typedef basic_string<char, blind_traits<char>, allocator<char> 
>  blindstr;

//  . . . and our case-blind map:
typedef std::map<blindstr, foo *>  foomap;

But look at all that code! What a mess. Let's take a look at altering the behavior of the map instead. First, we'll RTFM on std::map. What does it give us that we might find useful?

The std::map template has three type parameters:

  1. The key type;
  2. The "value" type;
  3. A "predicate" class: This is a class with one relevant member function, operator():

    bool operator()(const T& x, const T& y) const;

    That predicate is used to compare instances of the key type. By default, std::map uses the generic predicate std::less, which uses the operator< members of the two arguments

    Bingo. He's our boy.

  4. . . . and several butcher's aprons: The obligatory allocator class.

Alright, then. Let's make a predicate. We RTFM some more, write a little code, and here's what it looks like:


struct case_blind_string_compare 
    : public binary_function<string, string, bool>
{
    bool operator() (const string& x, const string& y) const
        { return stricmp( x.c_str(), y.c_str() ) < 0; }
};

To use it, we modify very slightly the typedef where we define our map:

    typedef std::map<string, foo *, case_blind_string_compare>  foomap;

Now get this: I used the former method, and left the much cleaner predicate stunt in a comment. I'm not wholly convinced that I did the Right Thing, but I'm reasonably confident. Here's why: The key type is a path in an operating environment where the filesystem is case blind. In that environment, paths are strings of which case is not a meaningful property. It makes sense to represent them that way all the time, rather than just remembering to store them in case-blind containers.

If we wanted to port the code, we could put an #ifdef block around the definition of blind_traits, and switch in a case sensitive version when the thing is compiled for a case sensitive environment. We could do the same with the predicate, of course, but that handles only one special case of a more general issue.

My boss suggests that we might consider the filesystem as the thing which has the property of case blindness, in which case the latter method would make more sense. Or equal sense. I'm not sure I buy that, but then again I'm not sure I don't, either. He didn't object when I decided to stick with my view. It's a fun question, isn't it?

If I'm wrong, I suppose they'll put me in the dunk tank at the company picnic (again), but I think I'm right.

Thus do I earn my daily bread.

Log in or register to write something here or to contact authors.