In calculus class you learn a rule, called the product rule, for finding the derivative of the product of two functions:

(fg)′ = f′ g + f g′.

Mathematicians like to generalize things, and there are several generalizations of the product rule, each in its own context. But we also like to associate theorems with a famous name, and Leibniz has gotten his name stuck to this one; so whenever we construct a calculus with an idea of differentiation in it, the formula that tells what the derivative of the product of two things is, is called a *Leibniz rule*. In this node I present two commonly encountered cases.

# 1. Bilinear maps on Banach spaces

The first one generalizes the idea of "product". If you know vector calculus, you may know that you differentiate the dot product as if it were an ordinary product: if **x**(t), **y**(t) are two vectors varying with time,

(**x** ⋅ **y**)′ = **x**′ ⋅ **y** + **x** ⋅ **y**′.

You can prove this by calculation, reducing to the ordinary product rule, but it is more interesting to notice the general phenomenon. If you're not familiar with the derivative as a linear transformation instead of a number, this theorem will be hard to understand; read up on the derivative first.

**Theorem 1.** Let E, F, G be Banach spaces, B a continuous bilinear map E × F → G. Then B is differentiable everywhere on E × F, and its derivative is DB(x, y)(u, v) = B(u, y) + B(x, v).

*Examples*

- The ordinary product rule corresponds to the case E = F = G = **R** (or **C**), B(x, y) = μ(x, y) = x y. Numerical multiplication is the prototypical example of a bilinear map: it is linear in each slot for fixed values of the other slot:
(α x_{1} + β x_{2}) y = α (x_{1} y) + β (x_{2} y), x (α y_{1} + β y_{2}) = α (x y_{1}) + β (x y_{2}).

In this case the derivative is DB(x, y)(u, v) = u y + x v. Note that this tells you the derivative of *multiplication*. To get the first-year calculus product rule from this, you have to remember that if f: **R** → **R** is a numerical function, its "first-year calculus derivative" is f′(t) = Df(t) **1**, where **1** is 1 thought of as a tangent vector to **R**. Now apply the chain rule:

f(t) g(t) = (μ ∘ (f, g))(t), so (f g)′(t) = D(μ ∘ (f, g))(t) **1** = Dμ(f(t), g(t)) (Df(t) **1**, Dg(t) **1**) = f′(t) g(t) + f(t) g′(t).

- Dot product is the same deal, with E = F = **R**^{n}, G = **R**, B(x, y) = δ(x, y) = x ⋅ y. If you have two vector functions of time **x**(t), **y**(t), it is still true that **x**′(t) = D**x**(t) **1**, so the same application of the chain rule gives
(**x** ⋅ **y**)′(t) = D(δ ∘ (**x**, **y**))(t) **1** = **x**′(t) ⋅ **y**(t) + **x**(t) ⋅ **y**′(t).

- But there are many more interesting examples which justify the generalization. For instance, in the space L(E; E) of continuous linear operators on a Banach space E, *composition* is a bilinear map:
T (α S_{1} + β S_{2}) = α T S_{1} + β T S_{2}, etc.

So if you have two operator-valued functions T, S: F → L(E; E) (where F is some other Banach space), you can differentiate the composition:

D(T S)(x) U = T(x) (DS(x) U) + (DT(x) U) S(x).

Note that these derivatives live in L(F; L(E; E)), so they take an argument U ∈ F.
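As a quick sanity check on Theorem 1 (an illustrative sketch of my own, not part of the original text; it assumes NumPy), one can compare the predicted derivative of the composition map B(S, T) = S T on 3 × 3 matrices against a finite-difference quotient:

```python
import numpy as np

rng = np.random.default_rng(0)

# B(S, T) = S T: composition (matrix multiplication) is bilinear.
def B(S, T):
    return S @ T

# Theorem 1 predicts DB(S, T)(U, V) = B(U, T) + B(S, V).
S, T = rng.standard_normal((3, 3)), rng.standard_normal((3, 3))
U, V = rng.standard_normal((3, 3)), rng.standard_normal((3, 3))

h = 1e-6
numeric = (B(S + h * U, T + h * V) - B(S, T)) / h
predicted = B(U, T) + B(S, V)

# The discrepancy is exactly h * B(U, V), hence O(h).
print(np.max(np.abs(numeric - predicted)))
```

The leftover term h·B(U, V) is precisely the remainder that the proof of the theorem shows to be negligible.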

*Proof of Theorem 1.* Like most simple theorems about derivatives, this one is proved by a direct computation: given a point (x, y) and an increment (u, v) in E × F, bilinearity gives

B(x + u, y + v) − B(x, y) = B(u, y) + B(x, v) + B(u, v).

To show that DB(x, y)(u, v) = B(u, y) + B(x, v) it therefore suffices to prove that B(u, v) = o(||(u, v)||_{E×F}), that is, B(u, v) vanishes to second order at zero. Here the norm on E × F is the max-norm ||(u, v)||_{E×F} = max {||u||_{E}, ||v||_{F}}; it is uniformly equivalent to the Euclidean product norm, and easier to compute with. But since B is a continuous bilinear map, we know

||B(u, v)||_{G} ≤ ||B||_{L(E, F; G)} ||u||_{E} ||v||_{F} ≤ ||B||_{L(E, F; G)} ||(u, v)||_{E×F}^{2} = o(||(u, v)||_{E×F}). **///**
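The key estimate, that the remainder B(u, v) vanishes to second order, can also be seen numerically. In this small sketch (my own illustration, assuming NumPy, with the dot product on **R**^{3}) scaling the increment by t scales the remainder by t²:

```python
import numpy as np

rng = np.random.default_rng(1)
x, y, u, v = (rng.standard_normal(3) for _ in range(4))

# B(x, y) = x . y, the dot product on R^3.
def B(a, b):
    return np.dot(a, b)

# Remainder after subtracting the candidate derivative:
# B(x + tu, y + tv) - B(x, y) - t B(u, y) - t B(x, v) = t^2 B(u, v).
for t in (1.0, 0.1, 0.01):
    rem = B(x + t * u, y + t * v) - B(x, y) - t * B(u, y) - t * B(x, v)
    print(t, rem / t**2)  # constant across t: the remainder is quadratic
```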

# 2. Higher derivatives, but finite dimensions

What if you want to take higher derivatives of the product of two functions? Applying the product rule twice gives

(f g)″ = f″ g + 2 f′ g′ + f g″

and iterating yields the binomial coefficients. This result extends to functions of several variables and partial derivatives, becoming the theorem that students of partial differential equations know as the Leibniz rule.
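The iterated rule with binomial coefficients is easy to check symbolically; here is a quick sketch with SymPy (the library and the particular test functions are my choice, not the author's):

```python
import sympy as sp

t = sp.symbols('t')
f = sp.sin(t) * sp.exp(t)   # arbitrary smooth test functions
g = sp.cos(t) + t**3

m = 4
lhs = sp.diff(f * g, t, m)                 # (fg)^{(m)}
rhs = sum(sp.binomial(m, j) * sp.diff(f, t, m - j) * sp.diff(g, t, j)
          for j in range(m + 1))           # sum_j C(m, j) f^{(m-j)} g^{(j)}

print(sp.simplify(lhs - rhs))  # 0
```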

**Theorem 2.** Fix an open set U ⊂ **R**^{n}, and let P(x, ∂) = ∑_{|α|≤m} a_{α}(x) ∂^{α} be any linear partial differential operator of order m, with smooth coefficients a_{α} ∈ C^{∞}(U). (α denotes a multi-index.) If u, v are two C^{k} functions (k ≥ m) on U, then

P(x, ∂)(u v) = ∑_{|α|≤m} P^{(α)}(x, ∂)u ⋅ ∂^{α}v / α!

where P^{(α)}(x, ∂) denotes the operator whose symbol is the α derivative in the ξ variables of the symbol of P; that is,

P^{(α)}(x, ∂) = Q(x, ∂) where Q(x, ξ) = ∂_{ξ}^{α}P(x, ξ), the derivative being applied to the ξ slots of P.

*Example*

The simple case is n = 1, so that U is just an open set in **R**. In this case P(x, ξ) = ∑_{0≤i≤m} a_{i}(x) ξ^{i} is the symbol of an ordinary differential operator of order m (∂ is just d/dx), and the ξ derivatives of P are

P^{(j)}(x, ξ) = ∑_{j≤i≤m} i (i − 1) ... (i − j + 1) a_{i}(x) ξ^{i−j} = ∑_{j≤i≤m} j! C(i, j) a_{i}(x) ξ^{i−j}.

(C(i, j) is the binomial coefficient i! / (j! (i − j)!).) Hence in this case our formula reads

P(x, ∂)(u v) = ∑_{0≤j≤m} P^{(j)}(x, ∂)u ⋅ ∂^{j}v / j! = ∑_{0≤j≤m} ∑_{j≤i≤m} C(i, j) a_{i}(x) ∂^{i−j}u ∂^{j}v.

If P(x, ∂) is just (d/dx)^{m}, so that a_{m}(x) = 1 and a_{i}(x) = 0 if i ≠ m, this reduces to

P(x, ∂)(u v) = ∑_{0≤j≤m} C(m, j) ∂^{m−j}u ∂^{j}v

which agrees with the result of iterating the familiar product rule.
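Theorem 2 in this one-variable form can likewise be checked symbolically. The following SymPy sketch (coefficients and test functions chosen arbitrarily by me) builds a second-order operator with variable coefficients and compares both sides of the formula:

```python
import sympy as sp

x = sp.symbols('x')
u = sp.exp(x) * sp.sin(x)
v = sp.cos(x) + x**2

# P(x, d/dx) = a_0 + a_1 d/dx + a_2 d^2/dx^2, variable coefficients.
a = [x, sp.sin(x), x**2]     # a_0, a_1, a_2
m = len(a) - 1

def P(w):
    return sum(a[i] * sp.diff(w, x, i) for i in range(m + 1))

# P^{(j)} has coefficient j! C(i, j) a_i(x) on d^{i-j}/dx^{i-j}.
def P_deriv(w, j):
    return sum(sp.factorial(j) * sp.binomial(i, j) * a[i] * sp.diff(w, x, i - j)
               for i in range(j, m + 1))

lhs = P(u * v)
rhs = sum(P_deriv(u, j) * sp.diff(v, x, j) / sp.factorial(j)
          for j in range(m + 1))
print(sp.simplify(lhs - rhs))  # 0
```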

*Proof of Theorem 2.* Observe that both sides of the desired equation are linear in the terms of the operator P and the C^{∞} coefficients a_{α} (the right side since the derivatives P^{(α)} are taken in the ξ slots, not the x slots, of P(x, ξ)). Thus it is enough to prove the formula in the case where P(x, ∂) = ∂^{β} is a single (monomial) differential operator, with constant coefficient 1, of order |β| = m. In this case the desired formula reduces to

∂^{β}(u v) = ∑_{α≤β} P^{(α)}(x, ∂)u ⋅ ∂_{x}^{α}v / α! where P^{(α)}(x, ξ) = ∂_{ξ}^{α}ξ^{β}.

Here the subscripts ∂_{ξ} and ∂_{x} denote differentiations taken on the ξ and x slots respectively. (The sum restricts from all multi-indices of weight |α| ≤ m = |β|, to only those with α ≤ β (that is α_{j} ≤ β_{j} in every coordinate), since ∂_{ξ}^{α}ξ^{β} = 0 unless α ≤ β.) Now

∂_{ξ}^{α}ξ^{β} = ∏_{1≤j≤n} β_{j}(β_{j} − 1) ... (β_{j} − α_{j} + 1) ξ_{j}^{βj−αj} = ξ^{β−α} β! / (β − α)!,

so the formula we want to prove is now

∂^{β}(u v) = ∑_{α≤β} C(β, α) ∂^{β−α}u ∂^{α}v

where C(β, α) is the multi-index binomial coefficient β! / (α! (β − α)!) = ∏_{1≤j≤n} C(β_{j}, α_{j}). This last formula may be proved by induction on the weight |β| = m. When m = 0 it is trivially true. Suppose, then, that the result is known for β of weight up to m, and let γ be a multi-index of weight 1, with γ_{i} = 1 and γ_{j} = 0 for j ≠ i. Compute

∂^{β+γ}(u v) = ∂_{i} ∑_{α≤β} C(β, α) ∂^{β−α}u ∂^{α}v

= ∑_{α≤β} C(β, α) (∂^{β−α+γ}u ∂^{α}v + ∂^{β−α}u ∂^{α+γ}v)

= ∑_{−γ≤α≤β} (C(β, α) + C(β, α+γ)) ∂^{β−α}u ∂^{α+γ}v (with the convention that C(β, α) = 0 whenever α has a negative entry or α ≰ β, so the edge terms vanish)

= ∑_{0≤α+γ≤β+γ} C(β+γ, α+γ) ∂^{(β+γ)−(α+γ)}u ∂^{α+γ}v

which is the desired formula with α+γ substituted for α. Here C(β, α) + C(β, α+γ) = C(β+γ, α+γ) is the multi-index analogue of the familiar addition identity C(n, k) + C(n, k+1) = C(n+1, k+1); it is not true for general γ, but works in this case because γ has weight 1, so only one of the n factors in the product ∏ C(β_{j}, α_{j}) changes. The proof is complete. **///**
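The final formula of the proof can be spot-checked for a particular multi-index. A SymPy sketch in two variables, with β = (2, 1) and test functions of my own choosing:

```python
import sympy as sp

x, y = sp.symbols('x y')
u = sp.exp(x * y)
v = sp.sin(x) * y**2

beta = (2, 1)   # the multi-index beta: d^2/dx^2 d/dy

def d(w, alpha):
    # apply d^{alpha}: alpha[0] x-derivatives, then alpha[1] y-derivatives
    for var, k in zip((x, y), alpha):
        w = sp.diff(w, var, k)
    return w

lhs = d(u * v, beta)
rhs = sum(sp.binomial(beta[0], a0) * sp.binomial(beta[1], a1)
          * d(u, (beta[0] - a0, beta[1] - a1)) * d(v, (a0, a1))
          for a0 in range(beta[0] + 1) for a1 in range(beta[1] + 1))

print(sp.simplify(lhs - rhs))  # 0
```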

*References:*

Section 1. Jean Dieudonné, *Foundations of modern analysis*, (8.1.4). Academic Press 1960, 1969 (out of print).

Section 2. Lars Hörmander, *The analysis of linear partial differential operators*, Volume 1, (1.1.10). Springer-Verlag 1983, 1990.