Notation: We will choose all vectors to be column vectors, and symbols denoting vectors or matrices will be written in **bold**.

The Jacobian is defined for a vector function of multiple
variables. Consider a function
**F(X)**, where **X** is an *n*-element vector and **F(X)**
is an *m*-element vector. We can write it as

$$
\mathbf{F}\left(\begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}\right)
=
\begin{pmatrix}
f_1(x_1, \ldots, x_n) \\
f_2(x_1, \ldots, x_n) \\
\vdots \\
f_m(x_1, \ldots, x_n)
\end{pmatrix}
$$

Here we have explicitly written the *m*-dimensional vector function **F** as a vector of *m* real-valued functions, one to provide each component of the vector. We have also split the input vector **X** into its *n* components.

Formally, the Jacobian is defined as an *m* by *n* matrix such that

$$
J_{ij} = \frac{\partial f_i}{\partial x_j}
$$
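To make the definition concrete, here is a minimal sketch that approximates each entry of the Jacobian by a forward difference. The names `jacobian_fd` and `example_F`, the step size `h`, and the example function are illustrative assumptions, not part of the text above.

```python
def jacobian_fd(F, X, h=1e-6):
    """Approximate the m-by-n Jacobian of F at X by forward differences.

    F takes a list of n floats and returns a list of m floats.
    J[i][j] approximates the partial derivative of f_i with respect to x_j.
    """
    F0 = F(X)
    m, n = len(F0), len(X)
    J = [[0.0] * n for _ in range(m)]
    for j in range(n):
        X_step = list(X)
        X_step[j] += h              # perturb only the j-th input
        F1 = F(X_step)
        for i in range(m):
            J[i][j] = (F1[i] - F0[i]) / h
    return J


# Hypothetical example with m = 3 outputs and n = 2 inputs.
def example_F(X):
    x1, x2 = X
    return [x1 * x2, x1 + x2, x1 ** 2]


J = jacobian_fd(example_F, [2.0, 3.0])
# Analytically, J = [[x2, x1], [1, 1], [2*x1, 0]], which at (2, 3)
# is [[3, 2], [1, 1], [4, 0]]; the numerical result should be close to it.
```

Row *i* of the result belongs to output *f_i* and column *j* to input *x_j*, so the matrix has *m* rows and *n* columns, as in the definition.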

The Jacobian can be thought of as the extension of the derivative of a real-valued function of a single variable to vector-valued functions of several variables.
For example, given an arbitrary real-valued function *f(x)*, we might know the value
of the function at one point *x = a* and want to determine the value of
*f* at other points very close to *a* (*a* is constant):

f(a + Δx) = f(a) + Δf

Single-variable calculus tells us that

$$
\frac{\Delta f}{\Delta x} \approx \frac{df}{dx}
$$

for a small change Δ*x*, which means that

$$
\Delta f \approx \Delta x \, \frac{df}{dx}
$$

Substituting this into the expression for *f(a + Δx)* gives

$$
f(a + \Delta x) \approx f(a) + \Delta x \left. \frac{df}{dx} \right|_{x = a}
$$
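As a quick numerical check of this approximation, the sketch below uses *f(x) = sin(x)*, whose derivative is *cos(x)*. The choice of function, the point `a`, the step `dx`, and the helper name `linear_estimate` are all assumptions made for illustration.

```python
import math


def linear_estimate(f, dfdx, a, dx):
    """Estimate f(a + dx) from the value and the derivative of f at a."""
    return f(a) + dx * dfdx(a)


a, dx = 1.0, 0.01
estimate = linear_estimate(math.sin, math.cos, a, dx)
exact = math.sin(a + dx)
error = abs(estimate - exact)   # second order in dx, so small for small dx
```

Halving `dx` should reduce `error` by roughly a factor of four, which is what "very close to *a*" buys you.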

Now imagine a different function *f(x, y)*; this is a function of two variables. We are interested in the value of *f* near *(x, y) = (a, b)*. If *y* is held constant while *x* varies, then *f* is really just a function of a single variable *x*, so that as before

$$
f(a + \Delta x, b) \approx f(a, b) + \Delta x \left. \frac{\partial f}{\partial x} \right|_{(x, y) = (a, b)}
$$

and

$$
\left. \Delta f \right|_{\Delta y = 0} \approx \Delta x \left. \frac{\partial f}{\partial x} \right|_{(x, y) = (a, b)}
$$

exactly as before. (I wrote ∂f instead of df to indicate that all derivatives are now partial derivatives, which are what you need when you deal with a function of multiple variables. A partial derivative is obtained by differentiating a function of multiple variables with respect to a single variable, while treating all the other variables as constants.)

Similarly, if *x* is held constant while *y* varies then

$$
\left. \Delta f \right|_{\Delta x = 0} \approx \Delta y \left. \frac{\partial f}{\partial y} \right|_{(x, y) = (a, b)}
$$

So if we change *x*, holding *y* constant, and then we change *y*, holding *x* constant (or if we do it the other way around), then as long as the changes are small enough that the partial derivatives stay roughly constant, we can write the total change in *f* as

$$
\Delta f \approx \left. \left( \Delta x \, \frac{\partial f}{\partial x} + \Delta y \, \frac{\partial f}{\partial y} \right) \right|_{(x, y) = (a, b)}
$$
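The two-variable estimate can be checked numerically. This is a small sketch assuming the hypothetical function *f(x, y) = x²y*, with its partial derivatives written out by hand; the point `(a, b)` and the steps `dx`, `dy` are arbitrary choices.

```python
def f(x, y):
    return x ** 2 * y


def df_dx(x, y):
    return 2 * x * y        # differentiate with respect to x, y held constant


def df_dy(x, y):
    return x ** 2           # differentiate with respect to y, x held constant


a, b = 1.0, 2.0
dx, dy = 0.01, -0.02

# Both partial derivatives are evaluated at (a, b), as in the formula above.
delta_f_estimate = dx * df_dx(a, b) + dy * df_dy(a, b)
delta_f_exact = f(a + dx, b + dy) - f(a, b)
gap = abs(delta_f_estimate - delta_f_exact)   # second order in dx and dy
```

The estimate and the exact change agree to within terms that are quadratic in the step sizes, which is the sense in which the changes must be "small enough".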

This can be extended to a function of arbitrarily many variables. But what does any of this have to do with the Jacobian? Consider our function *f*. From the definition above, the Jacobian will be a 1x2 matrix, which is almost degenerate but will do for the sake of example. Then

$$
\mathbf{J} = \begin{pmatrix} \dfrac{\partial f}{\partial x} & \dfrac{\partial f}{\partial y} \end{pmatrix}
$$

So that we can use vector notation for everything, let

$$
\Delta\mathbf{X} = \begin{pmatrix} \Delta x \\ \Delta y \end{pmatrix}
$$

which is 2x1, and let Δ**F** = [ Δf ], which is 1x1 (like I said, almost degenerate). Then

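$$
\mathbf{J} \, \Delta\mathbf{X}
= \begin{pmatrix} \dfrac{\partial f}{\partial x} & \dfrac{\partial f}{\partial y} \end{pmatrix}
\begin{pmatrix} \Delta x \\ \Delta y \end{pmatrix}
= \Delta x \, \frac{\partial f}{\partial x} + \Delta y \, \frac{\partial f}{\partial y}
\approx \Delta\mathbf{F}
$$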

So that means that

Δ**F** ≈ **J** Δ**X**

which is the equivalent of the single-variable relation

$$
\Delta f \approx \frac{df}{dx} \, \Delta x
$$

with the Jacobian standing in for the single-variable derivative. The simplest possible multi-variable case was shown here, but this generalizes to a vector of functions $f_1, f_2, \ldots, f_m$ of a vector of variables $x_1, x_2, \ldots, x_n$.
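For a case that is not degenerate, the sketch below checks Δ**F** ≈ **J** Δ**X** with *m = n = 2*. It assumes, purely for illustration, the polar-to-Cartesian map **F**(*r*, *θ*) = (*r* cos *θ*, *r* sin *θ*) with its Jacobian written out by hand; `mat_vec` is a hypothetical helper for the matrix-vector product.

```python
import math


def F(X):
    r, theta = X
    return [r * math.cos(theta), r * math.sin(theta)]


def jacobian(X):
    """Analytic 2-by-2 Jacobian of F: row i is f_i, column j is x_j."""
    r, theta = X
    return [[math.cos(theta), -r * math.sin(theta)],
            [math.sin(theta),  r * math.cos(theta)]]


def mat_vec(J, v):
    """Multiply an m-by-n matrix (list of rows) by an n-element vector."""
    return [sum(J_ij * v_j for J_ij, v_j in zip(row, v)) for row in J]


X = [2.0, 0.5]
dX = [0.01, -0.02]

dF_linear = mat_vec(jacobian(X), dX)                  # J times delta X
X_new = [x + dx for x, dx in zip(X, dX)]
dF_exact = [f1 - f0 for f1, f0 in zip(F(X_new), F(X))]  # actual change in F
```

Each component of `dF_linear` should match the corresponding component of `dF_exact` up to terms that are second order in the entries of `dX`.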

Many single-variable results generalize to the multi-variable case simply by replacing the derivative with the Jacobian. For example, Newton's method for solving a non-linear equation generalizes to the multi-variable Newton-Raphson method for systems of non-linear equations, and the chain rule becomes a product of Jacobians, which is pretty much exactly what you'd guess.
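As an illustration of the Newton generalization, here is a minimal sketch for two equations in two unknowns. Where the single-variable method divides by the derivative, this version solves a linear system with the Jacobian (done here by Cramer's rule, which only suits the 2x2 case). The function name `newton_2d`, the example system, and the starting point are assumptions chosen for the example.

```python
def newton_2d(F, J, X, steps=20, tol=1e-12):
    """Solve F(x, y) = (0, 0) by Newton iteration from the starting guess X.

    F returns a pair (f1, f2); J returns the 2-by-2 Jacobian as two rows.
    Assumes the Jacobian stays invertible along the way (det != 0).
    """
    x, y = X
    for _ in range(steps):
        f1, f2 = F(x, y)
        if abs(f1) + abs(f2) < tol:
            break
        (a, b), (c, d) = J(x, y)
        det = a * d - b * c
        # Solve J * (dx, dy) = -(f1, f2) by Cramer's rule.
        dx = (-f1 * d + b * f2) / det
        dy = (-a * f2 + c * f1) / det
        x, y = x + dx, y + dy
    return x, y


# Hypothetical system: x^2 + y^2 = 4 and x*y = 1.
def system(x, y):
    return (x * x + y * y - 4.0, x * y - 1.0)


def system_jacobian(x, y):
    return ((2.0 * x, 2.0 * y),
            (y, x))


root_x, root_y = newton_2d(system, system_jacobian, (2.0, 0.5))
```

Like its single-variable counterpart, this iteration is only expected to converge from a starting guess reasonably close to a solution.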

None of this is rigorous.