The inverse function theorem is the foundation stone of calculus on manifolds, that is, of multivariable calculus done properly. It says that if f: **R**^{n} → **R**^{n} is continuously differentiable, and the derivative Df(x) at a point x is an invertible matrix, then f itself is actually invertible near x, and the inverse is also continuously differentiable. Succinctly put, when a function is smooth enough, *infinitesimal invertibility implies local invertibility*. The chain rule then forces the derivative of f^{-1} to be the right thing, that is, D(f^{-1})(f(x)) = Df(x)^{-1}. You may remember from one-variable calculus a rule of the form (f^{-1})′(y) = 1 / f′(f^{-1}(y)). The inverse function theorem is the correct generalization of that rule to several variables.
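The one-variable rule is easy to check numerically. The following Python sketch is illustrative only. The function f(x) = x^{3} + x is strictly increasing, so it has a global inverse. Both this choice and the bisection routine used to evaluate f^{-1} are assumptions made for the example, not part of the text. The sketch compares a finite-difference estimate of (f^{-1})′(10) with 1 / f′(f^{-1}(10)) = 1 / f′(2) = 1/13.

```python
def f(x: float) -> float:
    # Example function (an assumption for this sketch): strictly increasing on R.
    return x ** 3 + x


def f_prime(x: float) -> float:
    return 3 * x ** 2 + 1


def f_inv(y: float, lo: float = -10.0, hi: float = 10.0) -> float:
    """Evaluate f^{-1}(y) by bisection; assumes f(lo) <= y <= f(hi)."""
    for _ in range(200):
        mid = (lo + hi) / 2
        if f(mid) < y:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def inverse_derivative_fd(y: float, eps: float = 1e-6) -> float:
    """Central finite-difference estimate of (f^{-1})'(y)."""
    return (f_inv(y + eps) - f_inv(y - eps)) / (2 * eps)


y = 10.0                       # f(2) = 10, so f^{-1}(10) = 2
x = f_inv(y)
predicted = 1 / f_prime(x)     # the rule (f^{-1})'(y) = 1 / f'(f^{-1}(y))
estimated = inverse_derivative_fd(y)
print(x, predicted, estimated)
```

Bisection is used only because it is simple and certainly converges for a monotone f. Any reliable root finder would serve.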

Here is a setup for a formal statement and proof of the inverse function theorem. You don't need to understand every word to get the proof, but (except for Banach spaces) the following notions should at least be familiar.

- X and Y are Banach spaces. (If you don't know about Banach spaces just read both X and Y as
**R**^{n}. The inverse function theorem holds for maps of Banach spaces using *exactly* the same proof as for **R**^{n}, so we might as well use that generality.)
- U is an open neighborhood of x_{0} ∈ X; f: U → Y is a function, and y_{0} = f(x_{0}).
- The derivative Df(x_{0}) of f at x_{0}, if it exists, is a member of the Banach space L(X, Y) of continuous linear operators from X to Y. If there exists T ∈ L(X, Y) such that
||f(x) - f(x_{0}) - T(x - x_{0})||_{Y} = o(||x - x_{0}||_{X}) as x → x_{0}, that is, ||f(x) - f(x_{0}) - T(x - x_{0})||_{Y} / ||x - x_{0}||_{X} → 0 as x → x_{0},
then we say that f is *differentiable* at x_{0} and T is the *derivative* Df(x_{0}). (In the case X = Y = **R**^{n}, every linear map from X to Y is continuous, and L(X, Y) is just the space of all n-by-n matrices.)

- Since the derivative Df takes values in a Banach space, we can ask whether it is continuous. If Df: U → L(X, Y) is continuous, we say that f is
*continuously differentiable*, or C^{1} for short.
- Of course, Df may also be differentiable, and we may get a continuous D^{2}f: U → L(X, L(X, Y)), in which case f is said to be C^{2}. (Actually, D^{2}f always lies in the subspace S of L(X, X; Y) := L(X, L(X, Y)) consisting of bilinear maps which are symmetric in their two arguments; you may know this fact as the equality of mixed partial derivatives.) In general the kth derivative of f at a point is a symmetric k-multilinear map on X with values in Y. We say that f is C^{∞} or *smooth* if it is C^{k} for every natural number k. (There are a few contexts in which "smooth" means only C^{1} rather than C^{∞}.)
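The definition of the derivative can be tested on a concrete map. The Python sketch below uses a hypothetical example, not taken from the text:

- X = Y = **R**^{2} with the Euclidean norm;
- f(u, v) = (u^{2} - v, uv);
- x_{0} = (1, 2);
- T is the Jacobian matrix of f at x_{0}.

Writing h = (h_{1}, h_{2}), the remainder f(x_{0} + h) - f(x_{0}) - Th works out to (h_{1}^{2}, h_{1}h_{2}). So the ratio in the definition is exactly |h_{1}|, and it visibly tends to 0.

```python
import math


def f(p):
    # Hypothetical example map R^2 -> R^2: f(u, v) = (u^2 - v, u*v).
    u, v = p
    return (u ** 2 - v, u * v)


def T(h):
    # Candidate derivative at x0 = (1, 2): the Jacobian [[2u, -1], [v, u]] = [[2, -1], [2, 1]].
    h1, h2 = h
    return (2 * h1 - h2, 2 * h1 + h2)


def remainder_ratio(x0, h):
    """||f(x0 + h) - f(x0) - T h|| / ||h|| in the Euclidean norm; should tend to 0 with h."""
    x = (x0[0] + h[0], x0[1] + h[1])
    fx, fx0, Th = f(x), f(x0), T(h)
    r = (fx[0] - fx0[0] - Th[0], fx[1] - fx0[1] - Th[1])
    return math.hypot(*r) / math.hypot(*h)


x0 = (1.0, 2.0)
direction = (0.6, 0.8)          # a unit vector
scales = (1e-1, 1e-2, 1e-3, 1e-4)
ratios = [remainder_ratio(x0, (t * direction[0], t * direction[1])) for t in scales]
print(ratios)                   # each ratio equals |h_1| = 0.6 t up to rounding
```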

Now we can give the statement:

**Inverse function theorem.** Suppose f: U → Y is C^{1}. Say that g: V → X is a *local inverse* for f at x_{0} if

- V is an open neighborhood of y_{0} = f(x_{0}), and g is C^{1} on V;
- there is a smaller neighborhood U' of x_{0}, with U' ⊂ U, so that f(U') ⊂ V and (g o f)|_{U'} is the identity map **1**_{U'} (g is a left inverse of f near x_{0});
- there is a smaller neighborhood V' of y_{0}, with V' ⊂ V, so that g(V') ⊂ U and (f o g)|_{V'} is the identity map **1**_{V'} (g is a right inverse of f near y_{0}).

Then for such a local inverse g to exist, it is necessary and sufficient that the derivative Df(x_{0}) ∈ L(X, Y) be bijective (a linear homeomorphism); and in this case g is unique.
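The derivative condition is genuinely needed for a C^{1} inverse. The standard example is f(x) = x^{3} on **R**. It is a bijection, but f′(0) = 0 is not invertible. Its inverse, the real cube root, has difference quotients at 0 that blow up like h^{-2/3}. The short Python check below uses this example purely for illustration.

```python
import math


def f(x: float) -> float:
    # Standard example: a bijection of R whose derivative vanishes at 0.
    return x ** 3


def g(y: float) -> float:
    # Real cube root, the global inverse of f.
    return math.copysign(abs(y) ** (1 / 3), y)


# Difference quotients of g at 0; they grow like h^(-2/3), so g'(0) does not exist.
steps = (1e-2, 1e-4, 1e-6, 1e-8)
quotients = [(g(h) - g(0.0)) / h for h in steps]
print(quotients)
```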

A pedant might insist that g is only unique in the "sheaf-theoretic" sense that any two choices g_{1} and g_{2} coincide when restricted to the intersection of their domains --- since f winds up having a local inverse over *any* sufficiently small neighborhood of x_{0} and pedantically speaking two functions with different domains are unequal. This is strictly true but it's morally not the point. If you don't understand the significance of this remark, ignore it.

In fact the two conditions can be separated: f has a local left inverse at x_{0} iff Df(x_{0}) has a left inverse A ∈ L(Y, X) (that is, A Df(x_{0}) = **1**_{X}), and f has a local right inverse at x_{0} iff Df(x_{0}) has a right inverse B ∈ L(Y, X) (Df(x_{0}) B = **1**_{Y}). (When X is finite dimensional, A exists iff Df(x_{0}) is injective, and B exists iff Df(x_{0}) is surjective; in infinite dimensions, injectivity or surjectivity alone is not enough in general.) However, uniqueness no longer holds in the one-sided case.
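For matrices this is easy to see concretely, including the failure of uniqueness. The Python sketch below uses hypothetical example matrices:

- T is an injective map **R** → **R**^{2}, with left inverses A1 = (T^{T}T)^{-1}T^{T} and A2;
- S is a surjective map **R**^{2} → **R**, with right inverses B1 = S^{T}(SS^{T})^{-1} and B2.

A1 and B1 are the usual pseudoinverse choices, and A2 and B2 are other valid choices. Note also that T A1 is only a projection, not the identity on **R**^{2}.

```python
def matmul(P, Q):
    """Product of matrices stored as lists of rows."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q))) for j in range(len(Q[0]))]
            for i in range(len(P))]


# Injective T: R -> R^2, stored as a 2x1 matrix.
T = [[1.0], [2.0]]
gram_t = sum(row[0] ** 2 for row in T)       # T^T T = 5
A1 = [[row[0] / gram_t for row in T]]        # (T^T T)^{-1} T^T = [[0.2, 0.4]]
A2 = [[1.0, 0.0]]                            # a different left inverse

# Surjective S: R^2 -> R, stored as a 1x2 matrix.
S = [[1.0, 2.0]]
gram_s = sum(v ** 2 for v in S[0])           # S S^T = 5
B1 = [[v / gram_s] for v in S[0]]            # S^T (S S^T)^{-1} = [[0.2], [0.4]]
B2 = [[1.0], [0.0]]                          # a different right inverse

for A in (A1, A2):
    print("A T =", matmul(A, T))             # the 1x1 identity, up to rounding
for B in (B1, B2):
    print("S B =", matmul(S, B))
print("T A1 =", matmul(T, A1))               # a 2x2 projection, not the identity
```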

*Proof of the theorem.*

There are an awful lot of words in this proof because I'm trying to explain the motivation for what we do. If you want the concise and elegant version, read the reference I'm expanding on for this writeup, that is Theorem 1.1.7 of *The analysis of linear partial differential operators* by Lars Hörmander.

**1.** Necessity is obvious from the chain rule: If g is a local inverse for f at x_{0}, then the equations

(g o f)|_{U'} = **1**_{U'} and (f o g)|_{V'} = **1**_{V'}

imply (taking derivatives) that

Dg(y_{0}) Df(x_{0}) = **1**_{X} and Df(x_{0}) Dg(y_{0}) = **1**_{Y}

and this says exactly that Df(x_{0}) is invertible in L(X, Y), with inverse Dg(y_{0}). --- The other direction is the meat of the theorem:
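The two identities can be checked numerically for a map with an explicit inverse. This Python sketch uses a hypothetical example: f(s, t) = (e^{s}, s + t), with inverse g(u, v) = (log u, v - log u). The Jacobians are central finite-difference approximations, so the products equal the identity only up to a small error.

```python
import math


def f(p):
    # Hypothetical example: f(s, t) = (e^s, s + t), a bijection of R^2 onto (0, inf) x R.
    s, t = p
    return (math.exp(s), s + t)


def g(q):
    # Its explicit inverse g(u, v) = (log u, v - log u), valid for u > 0.
    u, v = q
    return (math.log(u), v - math.log(u))


def jacobian(F, p, eps=1e-6):
    """Central-difference Jacobian of F at p, returned as a list of rows."""
    n = len(p)
    columns = []
    for j in range(n):
        plus = F(tuple(p[i] + (eps if i == j else 0.0) for i in range(n)))
        minus = F(tuple(p[i] - (eps if i == j else 0.0) for i in range(n)))
        columns.append([(a - b) / (2 * eps) for a, b in zip(plus, minus)])
    return [[col[i] for col in columns] for i in range(len(columns[0]))]


def matmul(P, Q):
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q))) for j in range(len(Q[0]))]
            for i in range(len(P))]


x0 = (0.5, 1.0)
y0 = f(x0)
left = matmul(jacobian(g, y0), jacobian(f, x0))    # Dg(y0) Df(x0), expected near 1_X
right = matmul(jacobian(f, x0), jacobian(g, y0))   # Df(x0) Dg(y0), expected near 1_Y
print(left)
print(right)
```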

**2.** (If you get lost skip to **3** below.) First let's simplify the problem a bit. Notice that if g_{L} is a local left inverse and g_{R} is a local right inverse for f at x_{0}, then for y in the intersection of their domains,

g_{L}(y) = (g_{L} o f o g_{R})(y) = g_{R}(y);

hence g_{L} = g_{R} on a smaller neighborhood of y_{0}, and this function is a local two-sided inverse for f. Thus it's enough to prove separately that each local one-sided inverse exists.

Next, observe that if A is a left inverse for Df(x_{0}) and we set F = A o f, then by the chain rule

DF(x_{0}) = DA(f(x_{0})) Df(x_{0}) = A Df(x_{0}) = **1**_{X}

since the derivative of a continuous linear map is itself. Now if F has a local left inverse G near x_{0} (F and G are both maps X → X) then G o F = G o A o f = **1** in a neighborhood of x_{0}; thus defining g = G o A gives a local left inverse for f itself. Similarly, if B is a right inverse for Df(x_{0}), put

F(w) = f(x_{0} + Bw) for w in a neighborhood of 0 ∈ Y; then F(0) = y_{0} and DF(0) = Df(x_{0}) B = **1**_{Y}.

(We shift by x_{0} because B maps Y into X, so F has to be based at the origin of Y rather than at x_{0}.) If G is a local right inverse for F near y_{0} (now F and G are maps Y → Y), then F o G = **1** says exactly that f(x_{0} + BG(y)) = y, so g(y) = x_{0} + BG(y) is a local right inverse for f. What we have done is reduce the problem of constructing a local left or right inverse for f to that of constructing a local left or right inverse for a map F whose derivative is known to be the identity (on either X or Y, it works the same).

**3.** Now let's adjust our notation a little bit to the simplified situation: we have a C^{1} function F: Z → Z, defined near x_{1}, with F(x_{1}) = y_{1} ∈ Z and DF(x_{1}) = **1**_{Z}. Here Z is either X or Y, x_{1} is either x_{0} or the origin 0 ∈ Y, and y_{1} is either Ay_{0} or y_{0}, according as we chose F = A o f to get a local left inverse for f, or F(w) = f(x_{0} + Bw) to get a local right inverse for f. By the last paragraph, we are reduced to proving that in this case F has a local two-sided inverse at x_{1}. To keep the notation light, from here on we drop the capital letter and write f for F (so f(x_{1}) = y_{1} and Df(x_{1}) = **1**_{Z}). Any norm || || without a subscript is the norm on Z, || ||_{Z}.
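In finite dimensions the normalization of step **2** is just a linear change of coordinates. The Python sketch below uses a hypothetical map f(s, t) = (2s + t^{2}, s - t + s^{2}) and base point x_{0} = (0.3, -0.2). It takes A = Df(x_{0})^{-1}, a two-sided (hence in particular left) inverse. It then checks by central finite differences that F = A o f has DF(x_{0}) close to the identity matrix.

```python
def f(p):
    # Hypothetical map R^2 -> R^2 whose derivative is invertible but not the identity.
    s, t = p
    return (2 * s + t ** 2, s - t + s ** 2)


def Df(p):
    # Exact Jacobian of f, as a list of rows.
    s, t = p
    return [[2.0, 2 * t], [1 + 2 * s, -1.0]]


def inverse_2x2(M):
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def apply(M, v):
    return tuple(sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M)))


def fd_jacobian(G, p, eps=1e-6):
    """Central-difference Jacobian of a map R^2 -> R^2 at p."""
    rows = [[0.0, 0.0], [0.0, 0.0]]
    for j in range(2):
        step = (eps if j == 0 else 0.0, eps if j == 1 else 0.0)
        plus = G((p[0] + step[0], p[1] + step[1]))
        minus = G((p[0] - step[0], p[1] - step[1]))
        for i in range(2):
            rows[i][j] = (plus[i] - minus[i]) / (2 * eps)
    return rows


x0 = (0.3, -0.2)
A = inverse_2x2(Df(x0))   # here Df(x0) is square and invertible, so A is a two-sided inverse


def F(p):
    """The normalized map F = A o f."""
    return apply(A, f(p))


DF_x0 = fd_jacobian(F, x0)   # expected to be close to the identity
print(DF_x0)
```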

To get a local inverse we first need f to be locally injective near x_{1}. Because Df(x_{1}) = **1**_{Z}, and Df is continuous (f is C^{1}), there must be a small neighborhood of x_{1} where Df is almost **1**_{Z}: choose δ > 0 such that

||Df(x) - **1**_{Z}||_{L(Z; Z)} < 1/2 when ||x - x_{1}|| ≤ δ.

Suppose x and x' are two points in this ball B(x_{1}; δ). Apply the mean value theorem, in its inequality form, which holds in any Banach space, to the auxiliary function φ(x) = f(x) - x. This gives

||f(x') - f(x) - (x' - x)|| ≤ ||x' - x|| sup_{0<t<1} ||Dφ(x + t(x' - x))||_{L(Z; Z)}.

Since Dφ(x) = Df(x) - **1**_{Z}, and we just said that ||Df(x) - **1**_{Z}||_{L(Z; Z)} < 1/2 at every point of B(x_{1}; δ) (a convex set, so it contains the whole segment from x to x'), what this says is that

||f(x') - f(x) - (x' - x)|| ≤ ||x' - x|| / 2, and hence, by the triangle inequality, ||f(x') - f(x)|| ≥ ||x' - x|| / 2.

*In particular,* for x, x' ∈ B(x_{1}; δ), if x ≠ x' then f(x) ≠ f(x'). That is, f is locally injective near x_{1}. This pattern of argument may seem complicated but is quite fundamental.
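The two inequalities can be spot-checked on random pairs of points. The Python sketch below assumes a hypothetical map f(u, v) = (u + v^{2}/10, v + u^{2}/10) on **R**^{2} with the Euclidean norm, for which f(0) = 0 and Df(0) = **1**. Since Df(u, v) - **1** has operator norm max(|u|, |v|)/5, the choice δ = 2 keeps that norm at most 0.4 < 1/2. Random sampling is evidence, not proof.

```python
import math
import random


def f(p):
    # Hypothetical example: f(0) = 0 and Df(0) is the identity.
    u, v = p
    return (u + 0.1 * v ** 2, v + 0.1 * u ** 2)


def dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])


def random_point_in_ball(radius, rng):
    """Rejection-sample a point of the closed Euclidean disk of this radius about 0."""
    while True:
        p = (rng.uniform(-radius, radius), rng.uniform(-radius, radius))
        if math.hypot(*p) <= radius:
            return p


# Df(u, v) - 1 = [[0, v/5], [u/5, 0]] has operator norm max(|u|, |v|)/5,
# which is at most 0.4 < 1/2 when ||(u, v)|| <= 2; so delta = 2 works.
DELTA = 2.0
rng = random.Random(0)
lower = math.inf   # smallest observed ||f(x') - f(x)|| / ||x' - x||
upper = 0.0        # largest observed ||f(x') - f(x) - (x' - x)|| / ||x' - x||
for _ in range(10_000):
    x, xp = random_point_in_ball(DELTA, rng), random_point_in_ball(DELTA, rng)
    d = dist(xp, x)
    if d == 0.0:
        continue
    fx, fxp = f(x), f(xp)
    lower = min(lower, dist(fxp, fx) / d)
    phi_x = (fx[0] - x[0], fx[1] - x[1])      # phi(x) = f(x) - x
    phi_xp = (fxp[0] - xp[0], fxp[1] - xp[1])
    upper = max(upper, dist(phi_xp, phi_x) / d)
print(lower, upper)
```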

Now we can attempt to solve the equation f(x) = y for x, given y near y_{1}; the local injectivity of f tells us that if we find one solution for x it's the *only* solution. We do this by iterative approximation. Fix y ∈ B(y_{1}; δ/2), and define x_{2}, x_{3}, ... ∈ B(x_{1}; δ) by

x_{k+1} = x_{k} + y - f(x_{k}).

We show by induction that ||x_{k+1} - x_{k}|| < 2^{-k} δ, and consequently (by the triangle inequality) x_{k+1} ∈ B(x_{1}; δ) for each k. First of all

||x_{2} - x_{1}|| = ||y - f(x_{1})|| = ||y - y_{1}|| < 2^{-1} δ,

and then by the mean value theorem inequality above,

||x_{k+1} - x_{k}|| = ||x_{k} - f(x_{k}) - (x_{k-1} - f(x_{k-1}))|| ≤ ||x_{k} - x_{k-1}|| / 2 < 2^{-k} δ.

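But this tells us that {x_{k}} is a Cauchy sequence, and since Z is complete there is a limit x_{∞}. The limit lies in B(x_{1}; δ): the same estimate gives ||x_{k+1} - x_{k}|| ≤ 2^{1-k} ||y - y_{1}||, so ||x_{∞} - x_{1}|| ≤ Σ_{k≥1} ||x_{k+1} - x_{k}|| ≤ 2||y - y_{1}|| < δ. By continuity of f, x_{∞} is a fixed point of our iteration: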

x_{∞} = x_{∞} + y - f(x_{∞}), i.e., f(x_{∞}) = y.

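So we have constructed a function g(y) = x_{∞}, defined for ||y - y_{1}|| < δ/2, with f(g(y)) = y. It is also a left inverse near x_{1}. The set of x ∈ B(x_{1}; δ) with ||f(x) - y_{1}|| < δ/2 is an open neighborhood of x_{1}. For such x, the points x and g(f(x)) both lie in B(x_{1}; δ) and have the same image under f, so g(f(x)) = x by local injectivity. Thus g is a local inverse for f, apart from the smoothness still to be checked.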

If you have recently studied metric spaces you may recognize that I have essentially repeated the proof of the contraction mapping theorem. The construction, not the theorem *per se*, is what's important.
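The construction itself is easy to run. Here is a minimal Python sketch of the iteration x_{k+1} = x_{k} + y - f(x_{k}). It again uses the hypothetical map f(u, v) = (u + v^{2}/10, v + u^{2}/10) with x_{1} = y_{1} = 0 and δ = 2, so any y with ||y|| < 1 is allowed. The function name `local_inverse`, the tolerance and the iteration cap are choices made for this example.

```python
import math


def f(p):
    # Hypothetical example: f(0) = 0, Df(0) = 1, and ||Df(x) - 1|| <= 0.4 on ||x|| <= 2.
    u, v = p
    return (u + 0.1 * v ** 2, v + 0.1 * u ** 2)


def local_inverse(y, x1=(0.0, 0.0), tol=1e-14, max_iter=200):
    """Solve f(x) = y by x_{k+1} = x_k + y - f(x_k), starting from x_1.

    Returns the final iterate and the step sizes ||x_{k+1} - x_k||.
    """
    x = x1
    steps = []
    for _ in range(max_iter):
        fx = f(x)
        x_next = (x[0] + y[0] - fx[0], x[1] + y[1] - fx[1])
        steps.append(math.hypot(x_next[0] - x[0], x_next[1] - x[1]))
        x = x_next
        if steps[-1] < tol:
            break
    return x, steps


y = (0.5, -0.3)                 # ||y - y_1|| < delta/2 = 1, where y_1 = f(0) = 0
x_inf, steps = local_inverse(y)
fx = f(x_inf)
residual = math.hypot(fx[0] - y[0], fx[1] - y[1])
print(x_inf, residual, len(steps))
```

The recorded step sizes let one watch the halving estimate ||x_{k+1} - x_{k}|| ≤ ||x_{k} - x_{k-1}|| / 2 from the proof. For this particular map the steps actually shrink much faster, because Df is very close to **1** along the path.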

**4.** It remains to prove that g is actually C^{1} near y = y_{1} (it would be no good if our smooth function had a rough inverse). Choose two points y, y + k ∈ B(y_{1}; δ/2), and write g(y) = x, g(y + k) = x + h. We know that f is differentiable at x, so that

k = f(x + h) - f(x) = Df(x)h + o(||h||).

What we really want is the reverse, where Dg(y) ought to be Df(x)^{-1}:

h = g(y + k) - g(y) = Df(x)^{-1}k + o(||k||).

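Applying Df(x)^{-1} to the first equation gives

h = Df(x)^{-1}k - Df(x)^{-1}(o(||h||)), where ||Df(x)^{-1}(o(||h||))|| ≤ ||Df(x)^{-1}||_{L(Z; Z)} o(||h||).

We know ||Df(x)^{-1}||_{L(Z; Z)} < 2 for every x ∈ B(x_{1}; δ). Indeed, write Df(x) = **1**_{Z} - E with ||E||_{L(Z; Z)} < 1/2; then the Neumann series Σ_{n≥0} E^{n} converges to Df(x)^{-1} and has norm at most 1/(1 - ||E||) < 2. So h = Df(x)^{-1}k + o(||h||), and it suffices to prove that a function which is o(||h||) is also o(||k||). But again our mean value theorem relation gives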

||k - h|| ≤ ||h||/2, hence ||h||/2 ≤ ||k|| ≤ 3||h||/2

which shows that h and k have the same asymptotic order near zero (in particular h → 0 as k → 0, so g is continuous), thus that

g(y + k) - g(y) = Df(x)^{-1}k + o(||k||), i.e., Dg(y) = Df(g(y))^{-1}.

Since Df(g(y))^{-1} is continuous in y (f is C^{1}, g is continuous, and the inversion map is smooth), this shows that g is C^{1} near y_{1}, which completes the proof. **///**
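Both conclusions of step **4** can be checked numerically for the same hypothetical map f(u, v) = (u + v^{2}/10, v + u^{2}/10). In the Python sketch below:

- g is evaluated with the fixed-point iteration of step **3**;
- Dg(y) is estimated by central finite differences;
- Df(g(y))^{-1} is computed from the Neumann series Σ_{n≥0} E^{n}, with E = **1** - Df(g(y)).

The Frobenius norm of the inverse is reported because it bounds the operator norm from above. A value below 2 is therefore consistent with the bound ||Df(x)^{-1}|| < 2.

```python
import math


def f(p):
    # Hypothetical example: f(0) = 0 and Df(0) is the identity.
    u, v = p
    return (u + 0.1 * v ** 2, v + 0.1 * u ** 2)


def Df(p):
    # Exact Jacobian of f, as a list of rows.
    u, v = p
    return [[1.0, 0.2 * v], [0.2 * u, 1.0]]


def g(y, tol=1e-15, max_iter=500):
    """Local inverse of f near 0 via the iteration x <- x + y - f(x), starting at x_1 = 0."""
    x = (0.0, 0.0)
    for _ in range(max_iter):
        fx = f(x)
        x_next = (x[0] + y[0] - fx[0], x[1] + y[1] - fx[1])
        done = math.hypot(x_next[0] - x[0], x_next[1] - x[1]) < tol
        x = x_next
        if done:
            break
    return x


def matmul(P, Q):
    return [[sum(P[i][k] * Q[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def neumann_inverse(M, terms=80):
    """Invert M = 1 - E by summing the Neumann series E^0 + E^1 + ...; assumes ||E|| < 1."""
    E = [[(1.0 if i == j else 0.0) - M[i][j] for j in range(2)] for i in range(2)]
    total = [[1.0, 0.0], [0.0, 1.0]]
    power = [[1.0, 0.0], [0.0, 1.0]]
    for _ in range(terms):
        power = matmul(power, E)
        total = [[total[i][j] + power[i][j] for j in range(2)] for i in range(2)]
    return total


def fd_jacobian(G, p, eps=1e-6):
    """Central-difference Jacobian of a map R^2 -> R^2 at p."""
    rows = [[0.0, 0.0], [0.0, 0.0]]
    for j in range(2):
        step = (eps if j == 0 else 0.0, eps if j == 1 else 0.0)
        plus = G((p[0] + step[0], p[1] + step[1]))
        minus = G((p[0] - step[0], p[1] - step[1]))
        for i in range(2):
            rows[i][j] = (plus[i] - minus[i]) / (2 * eps)
    return rows


y = (0.5, -0.3)
Dg_numeric = fd_jacobian(g, y)               # finite-difference estimate of Dg(y)
Dg_predicted = neumann_inverse(Df(g(y)))     # Df(g(y))^{-1} via the Neumann series
inv_frobenius = math.sqrt(sum(v ** 2 for row in Dg_predicted for v in row))
print(Dg_numeric)
print(Dg_predicted, inv_frobenius)
```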

*Reference:* Lars Hörmander, *The analysis of linear partial differential operators*, volume 1, theorem 1.1.7. Springer-Verlag 1983, 1990.