When writing code such as the above, there is a very real distinction between language extension and program implementation. The above code implements swap, a really basic operation, using templates. Although hiding implementation is Good (tm), in this case it's not really important: you already know how swap will work. The header isn't giving out any implementation secrets.

In my experience (which is by no means definitive - msg me!), this is the general case. Templates are solely a tool for adding generality on top of something that already works. They should only be used to wrap a flexible implementation with variable types, not to be the implementation itself. Part of the beauty of templates is that they encapsulate metaprogramming. In other words, underneath the templates you should have some kind of implementation that works without them.

To the casual observer, my claim is patently false. The only templates the average newbie sees are those in the STL. The STL has no ".cpp" implementation files. But is the STL implementing a program, or merely extending the language?

Templates bridge the gaps where the C++ language is insufficient to express some generalization. Generalization is what templates do, and that is a separate thing from implementation. The STL is a bunch of really useful generalizations, but not complete implementations. To sort a vector, you first have to say what "less than" means for its elements, through operator< or a comparator of your own. If you talk about how to sort without knowing what you are sorting, you should not really have any implementation details to hide - it's too basic. The underlying reason for hiding encapsulated implementation is that programmers shouldn't rely on side effects of internal details. Something as simple as sorting should not have side effects.
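To make that division of labor concrete, here is a minimal sketch. Employee, younger_than and sorted_by_age are names I made up for the illustration; only std::sort and std::is_sorted come from the standard library. The library supplies the generic "how", and the caller supplies the "what".

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical record type, invented for this illustration.
struct Employee {
    std::string name;
    int age;
};

// The caller's half of the job: what "less than" means for an Employee.
bool younger_than(const Employee& a, const Employee& b) {
    return a.age < b.age;
}

// std::sort knows how to sort; younger_than tells it what order we want.
// Takes the vector by value, so the caller's copy is left untouched.
std::vector<Employee> sorted_by_age(std::vector<Employee> staff) {
    std::sort(staff.begin(), staff.end(), younger_than);
    assert(std::is_sorted(staff.begin(), staff.end(), younger_than));
    return staff;
}
```

Nothing here depends on which algorithm std::sort uses internally, which is the point: the only contract is the ordering you hand it.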

My argument thus far is that generic-ness removes the need for privacy. That's unfortunately pretty idealistic. The edges are fuzzy. For example, how does the sort behave when the comparison is only a partial ordering, so that distinct items compare as equivalent? There is an orthogonal fuzzy edge, though, which allows the programmer to (attempt to) balance how much of the implementation is exposed.

In most cases, including many in the STL, C++ supports enough high-level programming to encapsulate the engine of even a generic algorithm in the implementation file. This requires writing the header as a discrete type-translation layer. Function pointers can package up code which needs to be called; pointers to members and regular pointers can package data. More abstract values can be produced with sizeof or offsetof. With elbow grease, the actual runtime algorithms can move into .cpp files, compiled once and nicely hidden.
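For what it's worth, here is a rough sketch of the shape this can take, in the spirit of C's qsort. It is a toy, not a worked-out design: sort_engine, less_fn, less_thunk and sort_array are names I invented, the engine is a plain insertion sort, and it only uses function pointers and sizeof (no pointers to members or offsetof). Because the engine shuffles raw bytes, the sketch assumes trivially copyable element types. Both halves are shown in one block; in real code the first half would sit in a .cpp file.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// ---- would live in the .cpp file: compiled once, knows no element types ----

// Type-erased comparison: returns true if *a should come before *b.
typedef bool (*less_fn)(const void* a, const void* b);

// Swap two elements byte by byte.
static void swap_bytes(unsigned char* a, unsigned char* b, std::size_t n) {
    for (std::size_t k = 0; k < n; ++k) {
        unsigned char t = a[k];
        a[k] = b[k];
        b[k] = t;
    }
}

// Insertion sort over an array of opaque, fixed-size elements.
void sort_engine(void* base, std::size_t count, std::size_t elem_size,
                 less_fn less) {
    assert(elem_size > 0);
    unsigned char* bytes = static_cast<unsigned char*>(base);
    for (std::size_t i = 1; i < count; ++i) {
        // Walk element i leftwards until it is no longer out of order.
        for (std::size_t j = i; j > 0; --j) {
            unsigned char* cur = bytes + j * elem_size;
            unsigned char* prev = cur - elem_size;
            if (!less(cur, prev)) break;
            swap_bytes(cur, prev, elem_size);
        }
    }
}

// ---- would live in the header: a thin type-translation layer ----

// Packages T's operator< behind the type-erased signature.
template <typename T>
bool less_thunk(const void* a, const void* b) {
    return *static_cast<const T*>(a) < *static_cast<const T*>(b);
}

// The only template the user sees; it translates types and delegates.
template <typename T>
void sort_array(T* items, std::size_t count) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "this toy engine moves elements as raw bytes");
    sort_engine(items, count, sizeof(T), &less_thunk<T>);
}
```

Calling sort_array on an int array or a double array instantiates only the two small templates; the loop itself is compiled exactly once. The cost is an indirect call per comparison, which is the balance the programmer gets to strike.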

This is still not a particularly strong argument. To prove that templates do not force the exposure of any particular kind of implementation detail, I would have to define some kind of multidimensional space for all implementation details. I'm way too lazy to do that. I'm too lazy even to work through a full-scale example of proper template factoring as I've defined it. All I can state is that my ideal template methodology differs from the way STL implementations are universally written, and that my methodology has been practicable for me so far.

References: my own experience writing a few STL-style templates, such as an allocator, and systematically parameterizing an existing system (see why C++ doesn't suck).