Definition and Some Common Properties

A transformation T: V->W is simply a rule that assigns to each element of V exactly one element of W (though several elements of V may be sent to the same element of W). Here V and W are both vector spaces over some field F. A linear transformation is a transformation with some additional properties; namely that

For all x, y ∈ V, c ∈ F
T(x+y) = T(x) + T(y)
T(c*x) = c*T(x)

These are the only requirements a transformation has to satisfy to be linear. Linear transformations are very important in linear algebra and are used extensively throughout the subject. From these requirements a number of properties can be derived, such as

  1. T(0)=0
    Proof: T(x) = T(x+0) = T(x) + T(0)
    So by cancellation T(0) = 0
  2. T(c*x+y) = c*T(x)+T(y)
    In fact this single condition is equivalent to the two requirements above: if it holds for all x, y, and c, then T must be linear.
    Proof: T(c*x + y) = T(c*x) + T(y) = c*T(x) + T(y)
  3. T(x-y) = T(x) - T(y)
    Proof: T(x-y) = T(x + -y) = T(x) + T(-1*y) = T(x) + -1*T(y) = T(x) - T(y)
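
If you would like to see these facts in action, here is a minimal numerical sketch in Python with NumPy. The matrix A, the vectors x and y, and the scalar c below are arbitrary illustrative choices; the check works for any linear map given by matrix multiplication.

import numpy as np

# A hypothetical linear map T: R^2 -> R^2 given by an arbitrary matrix.
A = np.array([[2.0, -1.0],
              [0.0,  3.0]])

def T(v):
    return A @ v

x = np.array([1.0, 2.0])
y = np.array([-3.0, 0.5])
c = 4.0

print(np.allclose(T(x + y), T(x) + T(y)))           # first requirement
print(np.allclose(T(c * x), c * T(x)))              # second requirement
print(np.allclose(T(np.zeros(2)), np.zeros(2)))     # property 1: T(0) = 0
print(np.allclose(T(c * x + y), c * T(x) + T(y)))   # property 2
print(np.allclose(T(x - y), T(x) - T(y)))           # property 3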

There are many more interesting properties to be discovered, but to find them we'll have to look at some more definitions. Before we do that, however, we would do well to look at some interesting examples of linear transformations.

One linear transformation of note is differentiation. Because we are working with vector spaces we will use polynomials as the example. Consider the transformation

T: Pn(R) -> Pn-1(R), T(f(x)) = f'(x)

Where Pn(R) means all polynomials of degree less than or equal to n over the reals, and f'(x) means the derivative of f(x). To show that T is a linear transformation we simply verify property 2 above.

g(x), h(x) ∈ Pn(R), c ∈ R
T(c*g(x) + h(x)) = (c*g(x) + h(x))' = c*g'(x) + h'(x) = c*T(g(x)) + T(h(x))

Therefore taking the derivative is a linear transformation. Some other linear transformations are reflection, rotation, and projection (usually applied to R2 or R3), as well as integration of continuous functions. There are also two special transformations that have their own special notation. These are

I: V->V: I(x) = x
T0: V->W: T0(x) = 0

I is known as the identity transformation and it simply transforms each element into itself. T0 is known as the zero transformation and it transforms any element in V into the 0 of W.
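
To make the derivative example above concrete, here is a small sketch in Python with NumPy. The coefficient-array representation and the sample polynomials are illustrative choices of mine; the code checks property 2 for the derivative map and also writes down the identity and zero transformations.

import numpy as np

# Represent p(x) = a0 + a1*x + ... + an*x^n by the coefficient array [a0, ..., an].
def deriv(p):
    # T: Pn(R) -> Pn-1(R), sending p to p'
    return np.array([k * p[k] for k in range(1, len(p))])

g = np.array([1.0, -2.0, 0.0, 5.0])   # 1 - 2x + 5x^3
h = np.array([4.0, 1.0, 3.0, 0.0])    # 4 + x + 3x^2
c = 2.5

# Property 2: T(c*g + h) = c*T(g) + T(h)
print(np.allclose(deriv(c * g + h), c * deriv(g) + deriv(h)))   # True

# The identity and zero transformations on the same space.
identity = lambda p: p
zero = lambda p: np.zeros(len(p) - 1)   # taking W = Pn-1(R) here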

The Range and Null Space

There are two sets associated with a linear transformation that are special enough to have their own names. These are the range and the null space of the transformation. They are defined as

T: V->W
null space = N(T) = {x ∈ V: T(x) = 0}
range = R(T) = {T(x): x ∈ V}

In words, N(T) is the set of all elements x in V such that T(x) is 0, and R(T) is the set of all elements of W that are the result of applying T to some element of V. Note that some people call the null space the kernel of the transformation and the range the image of the transformation; the terms are interchangeable. These sets have many important properties. One is that N(T) and R(T) are both vector spaces themselves: in fact N(T) is a subspace of V, and R(T) is a subspace of W. Look at the node on subspaces for the proof of this.
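
Here is a hedged sketch of how you might compute both sets in practice for a map T(x) = Ax (Python with NumPy; the rank-deficient matrix is an arbitrary illustrative choice). A basis for the null space and a basis for the range can both be read off a singular value decomposition.

import numpy as np

# A deliberately rank-deficient matrix, so N(T) is nontrivial (illustrative choice).
A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0]])         # T: R^3 -> R^2

U, s, Vt = np.linalg.svd(A)
r = int(np.sum(s > 1e-10))              # numerical rank

range_basis = U[:, :r]                  # basis for R(T), a subspace of R^2
null_basis = Vt[r:].T                   # basis for N(T), a subspace of R^3

print(np.allclose(A @ null_basis, 0))                # True: N(T) really maps to 0
print(range_basis.shape[1], null_basis.shape[1])     # rank 1, nullity 2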

Because both the null space and the range are vector spaces we can consider their dimensions. These are known as the nullity and rank, respectively, of the transformation and are defined simply as

nullity(T) = dim(N(T)) ≤ dim(V)
rank(T) = dim(R(T)) ≤ dim(W)

If you pause to consider these two spaces, you may realize that they are related to each other in size. The more elements of V that map to 0, the smaller the range can be; conversely, the more distinct elements of W the transformation reaches, the fewer elements of V it can map to 0. There is in fact a precise relationship between the size of the null space, the nullity, and the size of the range, the rank. This relation is known as the dimension theorem, and is written as

nullity(T) + rank(T) = dim(V)
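
A quick spot check of the dimension theorem (Python with NumPy; the matrix sizes and random seed are arbitrary illustrative choices): for a matrix map, dim(V) is the number of columns, and the rank and nullity should always add up to it.

import numpy as np

rng = np.random.default_rng(0)
for m, n in [(3, 5), (4, 4), (6, 2)]:
    A = rng.integers(-3, 4, size=(m, n)).astype(float)    # T: R^n -> R^m
    rank = np.linalg.matrix_rank(A)
    _, s, Vt = np.linalg.svd(A)
    nullity = Vt[int(np.sum(s > 1e-10)):].shape[0]         # dimension of N(T)
    print(rank + nullity == n)                             # True every time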

It is important to note that the dimension of V must be finite; otherwise the theorem is meaningless. There are two more terms, generic for mappings, that you should understand. A mapping is "one-to-one" if no two distinct elements of V map to the same element of W; in other words, each element of the range comes from exactly one element of V. A mapping is "onto" if every element in W is the result of T(x) for some x in V. These terms are very closely related to the rank and nullity of the transformation. Specifically,

  1. T is one-to-one iff N(T) = {0}, or equivalently nullity(T) = 0
    Proof: Suppose N(T) = {0} and T(x) = T(y). Then T(x-y) = T(x) - T(y) = 0, so x-y ∈ N(T) = {0}, which forces x = y, and T is one-to-one. Conversely, if T is one-to-one, then T(x) = 0 = T(0) implies x = 0, so N(T) = {0}.
  2. T is onto iff R(T) = W, or equivalently rank(T) = dim(W)
    Proof: T is onto exactly when R(T) = W, by the definition of onto. If rank(T) = dim(W), with W finite-dimensional, then R(T) is a subspace of W with the same dimension as W, so R(T) = W and T is onto.
  3. If dim(V) = dim(W), then the statements "T is onto", "T is one-to-one", nullity(T) = 0, and rank(T) = dim(V) are all equivalent.
    Proof: If nullity(T) = 0, then rank(T) = dim(V) by the dimension theorem. Because dim(V) = dim(W), rank(T) = dim(W) and by the previous two theorems, the other two results follow.
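
The sketch below (Python with NumPy; both matrices are arbitrary illustrative choices) checks facts 1 and 2 for two matrix maps: one that is one-to-one but not onto, and one that is onto but not one-to-one.

import numpy as np

A1 = np.array([[1.0, 0.0],
               [0.0, 1.0],
               [1.0, 1.0]])             # T1: R^2 -> R^3
A2 = np.array([[1.0, 0.0, 2.0],
               [0.0, 1.0, -1.0]])       # T2: R^3 -> R^2

def rank(A):
    return np.linalg.matrix_rank(A)

def nullity(A):
    return A.shape[1] - rank(A)          # using the dimension theorem

print(nullity(A1) == 0, rank(A1) == A1.shape[0])   # True False: one-to-one, not onto
print(nullity(A2) == 0, rank(A2) == A2.shape[0])   # False True: onto, not one-to-one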

These facts are most useful in situations where only one of the range or the null space is easy to find, because they allow properties of one to be derived from the other. It is also generally easiest to determine whether a transformation is one-to-one and/or onto by examining the rank and nullity.

Linear Transformations and Matrices

A final fact that shows how useful linear transformations are is how closely they are related to matrices. If you have ever multiplied by a matrix you have used a linear transformation. Specifically, multiplying by an m x n matrix is equivalent to applying a linear transformation from an n-dimensional vector space to an m-dimensional one. However, an arbitrary vector can't necessarily be multiplied by a matrix directly (think of the polynomials), so we must first convert the vector into a suitable form. This is done with the use of an ordered basis. An ordered basis is like a regular basis except that its vectors are listed in a fixed order, so it is a sequence rather than a set. To multiply the vector by a matrix we must first express it as an n-tuple. This is done as follows

x ∈ V
β = {u1, u2, ..., un} is an ordered basis for V
x = Σ ai*ui from i = 1 to n
[x]β = [a1, a2, ..., an]

[x]β is known as the coordinate vector of x relative to the basis β. This vector can then be multiplied by a matrix in order to transform it. The resulting coordinate vector is converted back into a normal vector using the "x = ..." formula above.
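
For V = R^n the coordinate vector can be found by solving a linear system whose columns are the basis vectors. A minimal sketch (Python with NumPy; the basis β and the vector x are arbitrary illustrative choices):

import numpy as np

# Columns are the ordered basis vectors u1, u2, u3 (an illustrative, non-standard basis).
B = np.column_stack([[1.0, 1.0, 0.0],
                     [0.0, 1.0, 1.0],
                     [1.0, 0.0, 1.0]])
x = np.array([2.0, 3.0, 5.0])

coords = np.linalg.solve(B, x)           # the coefficients a1, ..., an, i.e. [x]_beta
print(coords)                            # [0. 3. 2.]
print(np.allclose(B @ coords, x))        # True: summing ai*ui recovers x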

The only step remaining is to figure out what matrix to use. Consider the following.

T: V->W
β = {u1, u2, ..., un} is an ordered basis of V
γ is an ordered basis of W
A = [T]βγ is the matrix representation of the transformation

[T]βγ is properly written as T in brackets with β as a subscript and γ as a superscript. To find A, transform each element of β using T and then convert each result into a coordinate vector using γ. The resulting coordinate vectors are arranged in the matrix as its columns, in the same order as the basis vectors they came from.

An example will help to illustrate the process. Consider the transformation and bases

T(a, b) = (a+2b, 3a-b, 2a+2b)
β = {(1, 0), (0, 1)}
γ = {(1, 0, 0), (0, 1, 0), (0, 0, 1)}

Then find the image of each vector in β, express it in γ coordinates, and place the results into A as columns

T(1, 0) = (1, 3, 2)
T(0, 1) = (2, -1, 2)
    / 1   2 \
A = | 3  -1 |
    \ 2   2 /
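
The same construction can be carried out programmatically. The sketch below (Python with NumPy) rebuilds A from the transformation and bases given above and then checks that multiplying by A agrees with applying T directly; the helper names and the test vector are my own illustrative choices.

import numpy as np

def T(v):
    a, b = v
    return np.array([a + 2 * b, 3 * a - b, 2 * a + 2 * b])

beta = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
gamma = np.column_stack([[1.0, 0.0, 0.0],
                         [0.0, 1.0, 0.0],
                         [0.0, 0.0, 1.0]])   # gamma's vectors as columns

# Each column of A is the gamma-coordinate vector of T(u) for u in beta.
A = np.column_stack([np.linalg.solve(gamma, T(u)) for u in beta])
print(A)                                  # [[1. 2.] [3. -1.] [2. 2.]]

x = np.array([2.0, -1.0])                 # beta is the standard basis, so [x]_beta = x
print(np.allclose(A @ x, T(x)))           # True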

This method will work no matter how complicated the (finite-dimensional) vector spaces or the transformation. Once your transformation is a matrix there are a number of things you can do with it. Computers handle matrix multiplication far better than re-deriving the transformation by hand each time, so it can speed up your program. In the case of a transformation from a vector space to itself (using the same ordered basis on both sides), the matrix can be raised to a power to represent applying the same transformation several times. There are also many optimizations, such as diagonalization, that can be done using matrices.
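
As a hedged illustration of the point about powers, here is rotation of R^2 by 90 degrees (an arbitrary example of a transformation from a space to itself): squaring its matrix gives rotation by 180 degrees, and the fourth power gives the identity.

import numpy as np

R = np.array([[0.0, -1.0],
              [1.0,  0.0]])              # rotation by 90 degrees

print(np.linalg.matrix_power(R, 2))                           # rotation by 180 degrees
print(np.allclose(np.linalg.matrix_power(R, 4), np.eye(2)))   # True: back to the identity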

Even all of this is still not the whole story of linear transformations, but it should be enough to show you how they work and what they can do. For further knowledge I recommend a linear algebra book or class. For constructing this w/u I used Linear Algebra by Friedberg as a reference, which is the textbook used at Cal.