Let X, Y be normed spaces with norms ||·||_X and ||·||_Y.
A function f: X -> Y is called differentiable at the point x ∈ X iff there is a continuous linear map A: X -> Y such that

$$\lim_{\|h\|_X \to 0} \frac{\|f(x+h) - f(x) - A(h)\|_Y}{\|h\|_X} = 0 \qquad (h \in X,\ h \neq 0).$$
A is called the derivative of f at x and is usually written Df(x). (Df(x) is itself a linear map; to get a value you have to apply it to some z ∈ X and write Df(x)(z).)
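The limit condition can be sanity-checked numerically (a check along one direction, not a proof). The sketch below is a hypothetical example, assuming X = Y = R^2 with the Euclidean norm and the illustrative map f(x1, x2) = (x1^2·x2, sin x1 + x2); its candidate derivative A = Df(x) is multiplication by the Jacobian matrix of f at x. The helper names (`remainder_ratio`, `direction`) are made up for this example.

```python
import math

def f(x):
    # Illustrative map f: R^2 -> R^2
    x1, x2 = x
    return (x1 * x1 * x2, math.sin(x1) + x2)

def Df(x):
    # Candidate derivative Df(x): the linear map h |-> J h, J = Jacobian of f at x
    x1, x2 = x
    J = ((2 * x1 * x2, x1 * x1),
         (math.cos(x1), 1.0))
    return lambda h: tuple(row[0] * h[0] + row[1] * h[1] for row in J)

def norm(v):
    # Euclidean norm, used here for both ||.||_X and ||.||_Y
    return math.sqrt(sum(c * c for c in v))

def remainder_ratio(x, h):
    # ||f(x+h) - f(x) - Df(x)(h)||_Y / ||h||_X
    fxh = f((x[0] + h[0], x[1] + h[1]))
    fx = f(x)
    Ah = Df(x)(h)
    r = tuple(a - b - c for a, b, c in zip(fxh, fx, Ah))
    return norm(r) / norm(h)

x = (1.0, 2.0)
direction = (0.6, -0.8)  # unit vector
for k in range(1, 6):
    t = 10.0 ** (-k)
    h = (t * direction[0], t * direction[1])
    print(f"||h|| = {t:.0e}  ratio = {remainder_ratio(x, h):.3e}")
```

Since f is smooth, the ratio should shrink roughly in proportion to ||h||; if Df(x) were replaced by a wrong linear map, it would level off at a nonzero value instead.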

Let L(X,Y) be the normed space of continuous linear maps from X to Y (with the operator norm).
If f is differentiable at every point of an open neighborhood U of x, then the map Df: U -> L(X,Y), x |-> Df(x), may itself be differentiable at x. Its derivative is called the second derivative of f at x, written D^2 f(x) ∈ L(X,L(X,Y)).
If the second derivative exists on an open neighborhood U of x, then the map D^2 f: U -> L(X,L(X,Y)) may be differentiable at x; its derivative is called the third derivative, D^3 f(x).
In general, the n-th derivative D^n f(x) is the derivative at x of the map D^(n-1) f: U -> L(X,L(X,...L(X,Y)...)) (n-1 L's stacked), x |-> D^(n-1) f(x), so D^n f(x) lies in L(X,L(X,...L(X,Y)...)) with n L's stacked.
However, these "stacked" spaces L(X,L(X,...L(X,Y)...)) are awkward to work with. Therefore one uses the fact that L(X,L(X,...L(X,Y)...)) with n L's stacked is isometrically isomorphic to the space B(X,Y,n) of continuous n-linear maps X × ... × X -> Y (n factors). (Note: B(X,Y,n) is not standard notation; I just made it up.)
The isomorphism sends h ∈ L(X,L(X,...L(X,Y)...)) to the g ∈ B(X,Y,n) defined by g(x1,...,xn) := h(x1)(x2)...(xn).
So one regards the n-th derivative D^n f(x) as the n-linear map (x1,...,xn) |-> D^n f(x)(x1)...(xn) in B(X,Y,n), instead of as an element of L(X,L(X,...L(X,Y)...)) with n L's stacked.
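The isomorphism between the stacked space and B(X,Y,n) is exactly "uncurrying". A minimal sketch, assuming the illustrative map f: R^2 -> R, f(x1, x2) = x1^2·x2, whose second derivative at x is given by its Hessian matrix H, so D^2 f(x)(v)(u) = v^T H u. The function names `D2f_curried` and `uncurry` are hypothetical. The last line checks that D^2 f(x)(v) really is the rate of change of Df at x in direction v.

```python
from functools import reduce

def Df(x):
    # Df(x) in L(R^2, R) for f(x1, x2) = x1^2 * x2: u |-> 2*x1*x2*u1 + x1^2*u2
    x1, x2 = x
    return lambda u: 2 * x1 * x2 * u[0] + x1 * x1 * u[1]

def D2f_curried(x):
    # D^2 f(x) as an element of L(X, L(X, Y)): v |-> (u |-> v^T H u),
    # where H is the Hessian of f at x and v is the direction in which Df is differentiated
    x1, x2 = x
    H = ((2 * x2, 2 * x1), (2 * x1, 0.0))
    return lambda v: (lambda u: sum(v[i] * H[i][j] * u[j]
                                    for i in range(2) for j in range(2)))

def uncurry(h, n):
    # L(X, L(X, ... L(X, Y)...)) with n L's  ->  B(X, Y, n):
    # g(x1, ..., xn) := h(x1)(x2)...(xn)
    def g(*xs):
        if len(xs) != n:
            raise TypeError(f"expected {n} arguments, got {len(xs)}")
        return reduce(lambda acc, xi: acc(xi), xs, h)
    return g

x = (1.0, 2.0)
u, v = (0.3, -1.2), (2.0, 0.5)
g = uncurry(D2f_curried(x), 2)

# The same number, read as a "stacked" linear map and as a bilinear map
print(D2f_curried(x)(v)(u), g(v, u))

# Finite-difference check: (Df(x + t v)(u) - Df(x)(u)) / t  ~  D^2 f(x)(v)(u)
t = 1e-6
x_shift = (x[0] + t * v[0], x[1] + t * v[1])
print((Df(x_shift)(u) - Df(x)(u)) / t)
```

All three printed numbers should be close to -2.1. Because `uncurry` only threads its arguments through repeated application, the same function works for any n.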

The function f is called n times continuously differentiable on an open set U ⊆ X iff D^n f(x) exists for every x ∈ U and the map D^n f: U -> B(X,Y,n), x |-> D^n f(x), is continuous. (Note: this map is in general not linear!)

Now comes the question: "What does this have to do with the usual derivative of functions R -> R?"
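For f: R -> R the usual derivative f'(x) is a scalar, and every linear map from R to R is multiplication by a scalar. Setting Df(x)(y) := f'(x)·y gives exactly the form above: with A(h) = f'(x)·h the limit condition becomes |f(x+h) - f(x) - f'(x)·h| / |h| -> 0, which is equivalent to the usual difference-quotient definition of f'(x).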
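A small sketch of the scalar case, assuming f = sin so that f' = cos (an illustrative choice; `derivative_as_linear_map` is a hypothetical helper). The derivative is stored as the linear map y |-> f'(x)·y, evaluating that map at 1 gives back the number f'(x), and the remainder ratio from the general definition goes to 0.

```python
import math

def derivative_as_linear_map(fprime, x):
    # Df(x) := (y |-> f'(x) * y), a linear map R -> R
    c = fprime(x)
    return lambda y: c * y

x = 0.5
Df_x = derivative_as_linear_map(math.cos, x)  # f = sin, so f' = cos

# Evaluating the linear map at 1 recovers the scalar derivative f'(x)
print(Df_x(1.0) == math.cos(x))  # True

# The limit from the general definition: |f(x+h) - f(x) - Df(x)(h)| / |h| -> 0
for h in (1e-1, 1e-2, 1e-3):
    print(h, abs(math.sin(x + h) - math.sin(x) - Df_x(h)) / abs(h))
```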

This definition lets you differentiate maps between rather exotic spaces, such as function spaces, spaces of matrices, etc.
The derivatives can be quite hard to compute in such spaces, but some simple rules still hold:

  • The derivative of a continuous linear map A: X -> Y is the map itself at every point: DA(x) = A for all x ∈ X.
  • The chain rule holds whenever g: X -> Y is differentiable at x and f: Y -> Z is differentiable at g(x): D(f∘g)(x)(y) = Df(g(x))(Dg(x)(y)), where Df(g(x)) is the derivative of f at the point g(x) and f∘g is the composition x |-> f(g(x)). In words: to get the image of y under D(f∘g)(x), first apply Dg(x) to y and then apply Df(g(x)) to the result.
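Both rules above can be tried out on a space of matrices. A hypothetical sketch on 2×2 real matrices (stored as lists of rows; `matmul`, `add` and `D_fg` are made-up helper names): g(X) = X·X has derivative Dg(X)(H) = XH + HX, because (X+H)^2 = X^2 + XH + HX + H^2, and f = trace is continuous and linear, so by the first rule Df(Y) = trace at every Y. The chain rule then predicts D(f∘g)(X)(H) = trace(XH + HX), which the code compares with a finite difference.

```python
def matmul(A, B):
    # Product of two n x n matrices given as lists of rows
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def add(A, B, s=1.0):
    # Entrywise A + s*B
    return [[a + s * b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def trace(A):
    return sum(A[i][i] for i in range(len(A)))

# g(X) = X X, with derivative Dg(X)(H) = X H + H X
def g(X):
    return matmul(X, X)

def Dg(X):
    return lambda H: add(matmul(X, H), matmul(H, X))

# f = trace is linear, so Df(Y) = trace at every point Y
def f(Y):
    return trace(Y)

def Df(Y):
    return trace

# Chain rule: D(f o g)(X)(H) = Df(g(X))(Dg(X)(H))
def D_fg(X):
    return lambda H: Df(g(X))(Dg(X)(H))

X = [[1.0, 2.0], [0.5, -1.0]]
H = [[0.3, -0.2], [1.0, 0.4]]

t = 1e-6
finite_diff = (f(g(add(X, H, t))) - f(g(X))) / t
print(D_fg(X)(H), finite_diff)  # both close to 3.6
```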