Come on, rp, that's a bit of a violent objection. Admittedly, the above proof isn't very illuminating, but what you need to do when looking at reasoning presented in this manner is to spend a moment to work out what it's actually saying, and then rebuild your concepts and understanding from this.

Firstly, the lemma isn't very important (although symbolically it accounts for most of the proof) - it just states a property of orthonormal bases that we (mathematicians) wanted to be true anyway (otherwise they'd not be of much use): you can expand any vector over an orthonormal basis in such a way that the coefficients come out to be about the simplest possible thing they could be. In fact, the lemma, when extended to Hilbert spaces, makes good intuitive sense in the Dirac formalism of quantum mechanics.
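
To make that concrete, here is a small sketch in Python. I'm assuming that "the simplest possible thing" means the coefficient on each basis vector is just the inner product of the vector with that basis vector, and I'm using the ordinary dot product to stand in for I; the names inner, e1, e2 and x are mine, purely for illustration.

```python
def inner(x, y):
    """Ordinary dot product, standing in for the inner product I(x, y)."""
    return sum(xi * yi for xi, yi in zip(x, y))

# An orthonormal basis of the plane (a rotated copy of the standard one).
e1 = [0.6, 0.8]
e2 = [0.8, -0.6]

x = [2.0, 1.0]

# The coefficients of x over the basis are just inner products.
c1 = inner(x, e1)
c2 = inner(x, e2)

# Rebuild x from its coefficients: x = c1 * e1 + c2 * e2.
rebuilt = [c1 * p + c2 * q for p, q in zip(e1, e2)]
```

No equations to solve, no matrix to invert: that is the sense in which the coefficients are simple.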

Now, as for the theorem (which, incidentally, some people prefer to call an algorithm): it says that if you take a bunch of linearly independent vectors that span a space (a basis), you can convert it into one where they're all at right angles. To start with, imagine you're just doing this with a pair of vectors (a and b) to form a pair of vectors at right angles (u and v). You can take one of them as it is (u = a, divided by its length since we wanted an orthonormal basis), and construct the other by starting with b and taking away its component in the direction of u (I(b,u) u in thax' notation): i.e., v = b - I(b,u) u (again, now divide v by its length). Also, because the original vectors were LI, b is not a multiple of a, and this guarantees that v is not zero. Makes sense, yeah?
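
If it helps, the two-vector case can be sketched in a few lines of Python. This is only an illustration: I'm assuming the ordinary dot product for I, and the particular vectors a and b are made up.

```python
import math

def inner(x, y):
    """Ordinary dot product, standing in for the inner product I(x, y)."""
    return sum(xi * yi for xi, yi in zip(x, y))

def normalise(x):
    """Divide a non-zero vector by its length."""
    length = math.sqrt(inner(x, x))
    return [xi / length for xi in x]

a = [3.0, 4.0]
b = [1.0, 0.0]

# Take a as it is, divided by its length.
u = normalise(a)

# Start with b, take away its component in the direction of u,
# then divide what is left by its length.
coeff = inner(b, u)
v = normalise([bi - coeff * ui for bi, ui in zip(b, u)])
```

Here u comes out along a, and v is whatever remains of b once the part along u has been removed, rescaled to unit length.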

But we were doing this in n dimensions, of course. So we start with vectors (a,b,c...) and want to end up with (u,v,w...). (I know I will run out of alphabet, but I don't know how to do subscripts.) We find u and v in exactly the same way as before. As for w, we have to take away the components in the directions of both u and v, and so now w = c - I(c,v) v - I(c,u) u (...divided by its length). I think you'll be able to guess the general formula by now, so I'll say et cetera at this point. Note again that in every case the linear independence of the original vectors ensures that none of the new vectors is zero, and indeed they are LI themselves.
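
For the record, the "et cetera" can be spelt out as a loop. What follows is a minimal sketch rather than anything taken from the proof above: it assumes the ordinary dot product for I, and the function name gram_schmidt and the tolerance tol are my own choices.

```python
import math

def inner(x, y):
    """Ordinary dot product, standing in for the inner product I(x, y)."""
    return sum(xi * yi for xi, yi in zip(x, y))

def gram_schmidt(vectors, tol=1e-12):
    """Turn linearly independent vectors into an orthonormal list."""
    basis = []
    for a in vectors:
        # Start with the original vector...
        remainder = list(a)
        # ...and take away its component along each new vector found so far.
        for u in basis:
            coeff = inner(a, u)
            remainder = [r - coeff * ui for r, ui in zip(remainder, u)]
        length = math.sqrt(inner(remainder, remainder))
        # Linear independence is exactly what stops this length being zero.
        if length < tol:
            raise ValueError("input vectors are not linearly independent")
        basis.append([r / length for r in remainder])
    return basis

ortho = gram_schmidt([[1.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 1.0]])
```

The length check is where linear independence earns its keep: feed in a vector that is a combination of the earlier ones and the remainder collapses to zero, so there is nothing to divide by.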

Now have another look back at the above proof: it says exactly the same thing!

After going through reasoning like this in such a carefree manner, it's a good idea to have another look over it and see how each of the assumptions in the statement of the Theorem was used:

  • The non-zero space part is a bit of a formality really.
  • Euclidean vector space: we wanted a space in which the concept of length was meaningful.
  • Inner product: we needed it to compute the lengths. Technically, we also need it to be non-degenerate, to prevent (say) distances in 3-space being measured by the distances between projections onto some plane, so that length means what we think it means.
  • Non-degenerate: we divided by lengths, so didn't want any of them to be zero when they ought not to be.

...and then note that each of these appeared in the reasoning I gave. All is well.

I personally find it fascinating how very advanced and abstract mathematical arguments tie in with the mental ideas that are used to produce and subsequently understand them. The way I see it, the frustration you are expressing above is with the way it is taught - all rigour and formalism but no conceptual meaning (the lecturers on my degree course are particularly guilty of this).

In saying all this I'm not in favour of arm-waving proofs; I'm saying that understanding and intellectual rigour complement each other, and should be taught in this way.

Post script: much though I try, I can't come up with a way of doing this for every theorem I encounter!