The mean value theorem is the most important theorem of differential calculus; it is a crucial tool in the proof of such basic results as the inverse function theorem, Taylor's theorem and the equality of mixed partial derivatives.
The form of the mean value theorem discussed in the other writeups in this node is standard in first-year calculus books, but it does not generalize to higher dimensions. To see this, think about paths in the plane, that is, functions γ: [0, 1] → R^2. If the mean value theorem were true in the same form for functions into R^2, it would follow that whenever γ(0) = γ(1), there is a point 0 ≤ t ≤ 1 with γ′(t) = 0. That is, any differentiable closed path has a critical point. Of course, this is not true. Just think of a circle traversed at constant speed. The problem is that when you have more than one range dimension, you can come back to your starting point without your velocity ever passing through zero: the intermediate value theorem is no longer true.
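The circle counterexample is easy to check numerically. The sketch below is only an illustration, not part of the argument: the names gamma, gamma_prime and norm are chosen here, and the path is assumed to be the unit circle traversed once as t runs over [0, 1].

```python
import math

def gamma(t):
    """Unit circle traversed once at constant speed as t runs over [0, 1]."""
    return (math.cos(2 * math.pi * t), math.sin(2 * math.pi * t))

def gamma_prime(t):
    """Velocity vector of gamma at time t."""
    return (-2 * math.pi * math.sin(2 * math.pi * t),
            2 * math.pi * math.cos(2 * math.pi * t))

def norm(v):
    """Euclidean length of a vector in the plane."""
    return math.hypot(v[0], v[1])

# The path is closed: gamma(0) and gamma(1) agree up to rounding error.
start, end = gamma(0.0), gamma(1.0)
closing_gap = norm((end[0] - start[0], end[1] - start[1]))

# Sample the speed on a grid of 1001 points; it stays near 2*pi throughout.
min_speed = min(norm(gamma_prime(k / 1000)) for k in range(1001))

print("distance between endpoints:", closing_gap)
print("smallest sampled speed:", min_speed)
```

The endpoints coincide, yet the sampled speed never drops below roughly 2π, so there is no time at which the velocity vanishes.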
What is true is that you cannot travel more distance than your maximum derivative permits; Jurph's tollbooth speeding ticket example still works, with a caveat. The toll collector can say "you were going at least 78 miles per hour", but not specify the exact value of your derivative, that is, she can't say "at some point you were going exactly 78 miles per hour in the direction between your endpoints". For remember, in two dimensions, a derivative is a velocity vector.
Thus, to establish the general mean value theorem, we just have to remove the appeal to the intermediate value theorem at the end, and prove an inequality rather than an existence theorem:
Mean value theorem. Suppose V is a (not necessarily finite dimensional) Banach space, and f: [0, 1] → V is a differentiable function. For any points x ≠ y ∈ [0, 1], put x_t = (1 − t) x + t y. Then
||f(y) − f(x)||_V ≤ |y − x| sup {||f′(x_t)||_V : 0 ≤ t ≤ 1}.
The derivative f′ here isn't the "linear transformation" derivative, which lives in L(R; V), but the "first-year calculus" derivative, lim_{h→0} (f(x+h) − f(x)) / h, which is a point of V. It turns out not to matter, because L(R; V) and V are identified by sending a linear map to its value at 1. If you don't understand this remark, ignore it and use your calculus intuition.
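For readers who do want the details of that remark, here is a short sketch of why the two notions of derivative carry the same information. The symbols Φ (for the evaluation map) and Df(x) (for the linear-transformation derivative) are notation introduced here, not taken from the text above.

```latex
\Phi : L(\mathbf{R}; V) \to V, \qquad \Phi(T) = T(1),
\qquad
\|T\|_{L(\mathbf{R}; V)}
  = \sup_{|h| \le 1} \|T(h)\|_V
  = \sup_{|h| \le 1} |h| \, \|T(1)\|_V
  = \|T(1)\|_V .
% The linear-transformation derivative acts by Df(x)(h) = h f'(x), hence
\Phi\bigl(Df(x)\bigr) = f'(x),
\qquad
\|Df(x)\|_{L(\mathbf{R}; V)} = \|f'(x)\|_V .
```

So Φ is a linear isometry with inverse v ↦ (h ↦ h v), and the supremum in the theorem is the same whichever derivative is used.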
The words are bigger, but all this says is that if f is differentiable, its change between two points (the left side) is limited by the maximum speed of that change between the points (the right side).
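A numerical sanity check of the inequality may help fix ideas. The sketch below assumes V = R^3 with the Euclidean norm and uses a made-up curve f(u) = (cos 3u, sin 3u, u^2); the names f, f_prime, norm and mvt_sides are all hypothetical. The supremum is only approximated by sampling, which in general gives a lower estimate; for this particular curve the speed increases with u, so the sampled maximum is attained at an endpoint that lies on the grid.

```python
import math

def f(u):
    """A sample differentiable curve [0, 1] -> R^3 (chosen for illustration)."""
    return (math.cos(3 * u), math.sin(3 * u), u * u)

def f_prime(u):
    """Componentwise derivative of f."""
    return (-3 * math.sin(3 * u), 3 * math.cos(3 * u), 2 * u)

def norm(v):
    """Euclidean norm on R^3."""
    return math.sqrt(sum(c * c for c in v))

def mvt_sides(x, y, samples=10_000):
    """Return (left side, right side) of the mean value inequality.

    The supremum of ||f'(x_t)|| over 0 <= t <= 1 is approximated by the
    maximum over an evenly spaced grid of t values.
    """
    lhs = norm(tuple(a - b for a, b in zip(f(y), f(x))))
    sup_speed = max(
        norm(f_prime((1 - k / samples) * x + (k / samples) * y))
        for k in range(samples + 1)
    )
    return lhs, abs(y - x) * sup_speed

lhs, rhs = mvt_sides(0.1, 0.9)
print("change in f:", lhs)
print("|y - x| times max speed:", rhs)
```

The first number printed is the length of the chord from f(x) to f(y), and the second is the bound supplied by the theorem; the chord is the shorter of the two.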
Proof of the theorem. The mean value theorem is a fundamental theorem of analysis which depends crucially on the completeness of the real numbers, and the proof follows a common pattern for such theorems. A one-sentence summary is, "Suppose it were false; then by the least upper bound axiom there must be a highest number for which it is still true, but then it must extend a bit past this point." Formally, choose any
M > sup {||f′(x_t)||_V : 0 ≤ t ≤ 1};
that is, M exceeds our maximum speed. Let E be the set of all t for which f doesn't break the higher speed limit given by M:
E = {t ∈ [0, 1] : ||f(x_t) − f(x)||_V ≤ M t |y − x|}.
We want to show that 1 ∈ E, because this implies that
||f(y) − f(x)||_V ≤ M |y − x|
and since this is true independent of M, ||f(y) − f(x)||_V is actually bounded by the infimum of all the M we could have chosen; that is
||f(y) − f(x)||_V ≤ |y − x| sup {||f′(x_t)||_V : 0 ≤ t ≤ 1},
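which is the desired conclusion. Now E is nonempty, since 0 ∈ E, and it is closed, since f is continuous and the inequality defining E is not strict; so the least upper bound of E belongs to E. Hence let s be the largest element of E, and let us show that s = 1. Suppose not, and choose t > s, t ∈ [0, 1]. By the triangle inequality,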
||f(x_t) − f(x)||_V ≤ ||f(x_t) − f(x_s)||_V + ||f(x_s) − f(x)||_V.
The second term here is no problem: since we supposed s ∈ E, it must be that
||f(x_s) − f(x)||_V ≤ M s |y − x|.
As for the first, remember that M exceeds the norm of the derivative of f at every point of the segment between x and y, in particular at x_s: M > ||f′(x_s)||_V. Thus, if t is sufficiently close to s,
||f(x_t) − f(x_s)||_V < M |x_t − x_s| = M (t − s) |y − x|,
since ||f(x_t) − f(x_s) − f′(x_s)(x_t − x_s)||_V = o(|x_t − x_s|) = o(t − s) as t approaches s. (This is the definition of the derivative.) Hence
||f(x_t) − f(x)||_V ≤ M (t − s) |y − x| + M s |y − x| = M t |y − x|,
that is, t ∈ E, which contradicts the choice of s as the largest element of E. Hence s = 1 and we are done. ///
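The phrase "if t is sufficiently close to s" in the proof can be made quantitative. The following is one way to spell it out; the names ε and δ are introduced here and do not appear in the proof above.

```latex
% Put \varepsilon = M - \|f'(x_s)\|_V > 0. By the definition of the derivative
% there is \delta > 0 such that, for s < t < s + \delta with t \le 1,
\|f(x_t) - f(x_s) - f'(x_s)(x_t - x_s)\|_V \le \tfrac{\varepsilon}{2}\,|x_t - x_s| .
% The triangle inequality then gives
\|f(x_t) - f(x_s)\|_V
  \le \bigl(\|f'(x_s)\|_V + \tfrac{\varepsilon}{2}\bigr)\,|x_t - x_s|
  = \bigl(M - \tfrac{\varepsilon}{2}\bigr)\,|x_t - x_s|
  < M\,|x_t - x_s|
  = M\,(t - s)\,|y - x| .
```

The strict inequality uses |x_t − x_s| = (t − s) |y − x| > 0, which holds because t > s and x ≠ y.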
This proof comes in essence from volume 1 of Lars Hörmander's masterpiece The analysis of linear partial differential operators. A more accessible reference for general differential calculus is Foundations of modern analysis by Jean Dieudonné.