Jordan forms — grow your own!
I remember how infuriating it was to read about the Jordan form when I was first learning linear algebra. Nobody ever wrote down what the algorithm was! They all blabbed on about generalised eigenvectors, generalised eigenspaces, the minimal polynomial, algebraic multiplicity, and geometric multiplicity, and kept beating around the bush with all the abstraction without ever writing down a complete and concise procedure for how to put a matrix in Jordan form. Eventually, after reading at least three linear algebra textbooks (that actually went deep enough into the subject to talk about the Jordan form), I managed to decipher what the algorithm should be. I knew that it should be almost the same as the diagonalisation algorithm for a matrix, but with a little twist and a few more steps. It took a little effort. Eventually, I got it.
My purpose here is to save you the deciphering effort and to just tell you what the damn algorithm is. Plain language, or as plain as linear algebra ever gets. Warning: the algorithm that I present is the stupid algorithm. It's only meant to be used by hard-working adult humans and is not particularly recommended for computers or small children. I'm sure there are smarter numerical methods for computing Jordan forms if this is the sort of thing you need to do for a living (although matrices that cannot be diagonalised and require Jordan forms instead are relatively rare in Nature). In fact, one such numerical method can be found here:
http://portal.acm.org/citation.cfm?id=355912
The stupid algorithm I will be describing instead plays to this tune:
1. Compute the characteristic polynomial.
   a. Subtract a variable down the diagonal of your matrix.
   b. Compute the determinant of the resulting matrix.
   c. Simplify the resulting polynomial if necessary. This is the characteristic polynomial.
2. Factor the characteristic polynomial.
   a. A difficult problem in general. Consult polynomial factoring algorithms.
   b. The roots of the characteristic polynomial are the eigenvalues of the matrix.
3. Find the eigenvectors for each eigenvalue.
   a. Subtract each of the eigenvalues from the previous step, one at a time, down the diagonal of the matrix.
   b. Row-reduce the resulting matrices.
   c. Interpret the row-reduced matrices as homogeneous systems of linear equations where all the constants on the right are zero, and write down as many linearly independent solutions as possible. These are the eigenvectors.
4. Find generalised eigenvectors.
   a. If there aren't as many eigenvectors as the size of the matrix, proceed with this step.
   b. Again subtract each eigenvalue down the diagonal of the matrix, and augment the resulting matrix on the right with the corresponding eigenvector. Row-reduce again.
   c. Interpret the row-reduced matrix as a system of inhomogeneous linear equations and solve. If there is more than one free variable, cut down the degrees of freedom by specifying that the solution must be orthogonal to one or more eigenvectors belonging to this eigenvalue (use enough orthogonality conditions until there is only one free variable). A solution to this problem is a generalised eigenvector.
   d. Repeat step 4.c if necessary, augmenting the same matrix instead with the solution just found. Stop when there are as many solution vectors as the size of the matrix, or when no solution to the inhomogeneous system exists. In the latter case, pick another eigenvector and find its generalised eigenvectors.
5. Form the transition matrix and the Jordan form.
   a. Put each eigenvector followed by its generalised eigenvectors, in the order they were found, as column vectors in a matrix. This is the transition matrix.
   b. Conjugate the original matrix by the transition matrix (meaning multiply the original matrix on the left by the inverse of the transition matrix and on the right by the transition matrix). This is the Jordan form, which might just be the diagonalisation of the matrix in case Step 4 was superfluous.
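If you would rather have a computer hum along with that tune, here is a minimal sketch of the same steps in Python with SymPy. This is an assumption on my part — that you have Python and sympy available — and it is only a cross-check; nothing below is needed for the hand computations.

    from sympy import Matrix, symbols, factor, roots, eye

    lam = symbols('lambda')
    M = Matrix([[-1, -6],
                [ 2,  6]])                 # matrix C from the examples below

    # Step 1: the characteristic polynomial det(M - lambda*I)
    p = (M - lam * eye(M.rows)).det()

    # Step 2: factor it; the roots are the eigenvalues (with algebraic multiplicities)
    eigenvalues = roots(factor(p), lam)

    # Step 3: the eigenvectors are the null space of (M - lambda*I) for each eigenvalue
    eigenvectors = {ev: (M - ev * eye(M.rows)).nullspace() for ev in eigenvalues}

    # Steps 4 and 5: SymPy does the generalised-eigenvector bookkeeping in one call
    P, J = M.jordan_form()                 # M == P * J * P**-1
    print(J)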
I am going to demonstrate this algorithm below. We will be working throughout with the following three nontrivial (yikes!) examples of sizes 4 × 4, 3 × 3, and 2 × 2 because we are hard-core:
A B C
[ -1 -3 3 0 ] [ -2 -2 2 ] [ -1 -6 ]
[ -2 -1 0 -3 ], [ -2 1 1 ], and [ 2 6 ].
[ -2 0 -1 -3 ] [ -9 -7 7 ]
[ 0 2 -2 -1 ]
Let's get to work. *cracks knuckles*
Step the First: Compute the characteristic polynomial
An ominous melody slowly builds up.
To compute the characteristic polynomial, take your matrix, subtract a variable down the diagonal (traditionally λ or x; I'll use λ) and compute the determinant of the resulting matrix. In this case we are computing the determinants of
[ -1 - λ -3 3 0 ]
A - λI = [ -2 -1 - λ 0 -3 ]
[ -2 0 -1 - λ -3 ],
[ 0 2 -2 -1 - λ ]
[ -2 - λ -2 2 ] [ -1 - λ -6 ]
B - λI = [ -2 1 - λ 1 ], and C - λI = [ 2 6 - λ ].
[ -9 -7 7 - λ ]
"Determinant of a 4 × 4?" I hear you whine, maggot? Oh, boo-hoo! Real Mathematicians™ do this three times before breakfast in the morning. As I once remarked to a colleague, "I am not a pussy" (and you should say it too before going on), so we shall proceed with the computations. Now get down and give me 4!
For matrix A we begin with a Laplace expansion along the first row to obtain
|-1 - λ 0 -3 | |-2 0 -3 | |-2 -1 - λ -3 |
det(A - λI) = (-1 - λ)| 0 -1 - λ -3 | + 3|-2 -1 - λ -3 | +3|-2 0 -3 |.
| 2 -2 -1 - λ| | 0 -2 -1 - λ| | 0 2 -1 - λ|
Now we compute the three minor 3 × 3 determinants that arose.
det(A - λI) = (-1 - λ)[(-1 - λ)³ + 6(-1 - λ) - 6(-1 - λ)] + 3[-2(-1 - λ)² - 12 + 12] + 3[12 + 2(-1 - λ)² - 12]
Most of this cancels to yield, in a pretty factorised form,
det(A - λI) = (-1 - λ)⁴.
With similar efforts but slightly fewer computations, we find that the other two characteristic polynomials are
det(B - λI) = 8 - 12λ + 6λ² - λ³,
det(C - λI) = 6 - 5λ + λ².
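If you do not trust my determinant pushing (fair enough), here is a quick cross-check of all three characteristic polynomials in Python with SymPy — assuming, of course, that you have sympy installed:

    from sympy import Matrix, symbols, eye, expand

    lam = symbols('lambda')
    A = Matrix([[-1, -3,  3,  0],
                [-2, -1,  0, -3],
                [-2,  0, -1, -3],
                [ 0,  2, -2, -1]])
    B = Matrix([[-2, -2, 2],
                [-2,  1, 1],
                [-9, -7, 7]])
    C = Matrix([[-1, -6],
                [ 2,  6]])

    print(expand((A - lam * eye(4)).det()))   # lambda**4 + 4*lambda**3 + 6*lambda**2 + 4*lambda + 1
    print(expand((B - lam * eye(3)).det()))   # -lambda**3 + 6*lambda**2 - 12*lambda + 8
    print(expand((C - lam * eye(2)).det()))   # lambda**2 - 5*lambda + 6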
Le Step Deux: Factor the characteristic polynomial
A lively French interlude plays.
That is, take the characteristic polynomial and write it as a product of linear factors (which can be done in this case by the fundamental theorem of algebra, right?) so that we may find its roots. The roots of the characteristic polynomial are the eigenvalues of the corresponding matrix. We need these eigenvalues before we can proceed with the Jordan form.
Factoring polynomials is in itself a difficult problem in general (and actually one of the more stupid ways to find eigenvalues). In this case, however, it is actually rather easy. The characteristic polynomial for the A matrix came pre-factored during its computation, and the other two aren't hard to factor. This is because I deliberately chose computationally easy examples to work with so as to highlight the steps in the algorithm and not get bogged down with too many computations. It is not hard to see that
det(A - λI) = (-1 - λ)⁴,
det(B - λI) = (2 - λ)³,
det(C - λI) = (2 - λ)(3 - λ),
so that the eigenvalues are -1 and 2 for matrices A and B respectively, while matrix C has the two eigenvalues 2 and 3.
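For the record, the factoring step is also a one-liner by machine. A small SymPy cross-check of the factorisations above, working directly on the polynomials:

    from sympy import symbols, factor, roots

    lam = symbols('lambda')
    print(factor(8 - 12*lam + 6*lam**2 - lam**3))   # -(lambda - 2)**3, i.e. (2 - lambda)**3
    print(factor(6 - 5*lam + lam**2))               # (lambda - 2)*(lambda - 3)
    print(roots(6 - 5*lam + lam**2, lam))           # eigenvalues 2 and 3, each with multiplicity 1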
Step The Third: Find the eigenvectors for each eigenvalue
A recognisable and familiar hymn leads us on.
This means, take each eigenvalue, subtract it down the diagonal of the corresponding matrix, and row reduce the resulting matrix using the Gaussian algorithm. Then interpret this row reduced matrix as a homogeneous linear system and find as many linearly independent solution vectors as possible. These vectors are the eigenvectors corresponding to the eigenvalue you subtracted down the diagonal.
For matrix A this is done thusly:
        [  0 -3  3  0 ]               [  0 -3  3  0 ]
        [ -2  0  0 -3 ]     R3-R2     [ -2  0  0 -3 ]
A + I = [ -2  0  0 -3 ] ------------> [  0  0  0  0 ]
        [  0  2 -2  0 ]  R4+(2/3)R1   [  0  0  0  0 ]

   -R1      [  2  0  0  3 ]             [  2  0  0  3 ]
--------->  [  0  3 -3  0 ] ----------> [  0  1 -1  0 ]
   -R2      [  0  0  0  0 ]   (1/3)R2   [  0  0  0  0 ]
 R1<->R2    [  0  0  0  0 ]             [  0  0  0  0 ]
Ok, so I did several row operations in one step, and I didn't do them in the strict order that the Gaussian row-reduction algorithm requires. Also, because I like integers, I didn't divide the first row by its leading term in order to get a leading one. These are merely cosmetic enhancements for human eyes. It should nevertheless be clear that except for dividing each row by its leading term, I have already put the matrix into reduced row-echelon form, and that we are not going to be getting any more rows of zeroes.
Now we reinterpret the row reduced matrix as a homogeneous system of linear equations where all the constant terms on the right are zero (we could have done this from the beginning), and see that there should be two linearly independent solutions and that
u1 u2
[ 3 ] [ 0 ]
[ 0 ] [ 1 ]
[ 0 ] [ 1 ]
[-2 ] [ 0 ]
are two such solutions (that is, matrix A + I sends those two vectors into zero when it multiplies them from the left).
In a similar manner, for matrix B we subtract 2 down the diagonal to obtain
[ -4 -2 2 ]
B - 2I = [ -2 -1 1 ]
[ -9 -7 5 ]
and row reduce it to the form
[ 5 0 -2 ]
[ 0 5 -1 ]
[ 0 0 0 ]
where I have again opted for the cosmetic enhancement to not divide each row by its leading term. This homogeneous system only has one linearly independent solution, such as
v1
[ 2 ]
[ 1 ]
[ 5 ]
For matrix C, we first subtract eigenvalues 2 and 3 down the diagonal to obtain
[ -3 -6 ] [ -4 -6]
C - 2I = [ 2 4 ] C - 3I = [ 2 3],
which row-reduce to
[ 1 2 ] [ 2 3 ]
[ 0 0 ] [ 0 0 ]
yielding the eigenvectors
w1 w2
[ 2 ] [ 3 ]
[-1 ] [ -2 ].
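If you would like the machine to do the row reductions of this step for you, SymPy's nullspace method performs exactly this computation. This is only a cross-check; the basis vectors it returns may be scalar multiples or different combinations of the ones I wrote down, which is perfectly fine:

    from sympy import Matrix, eye

    A = Matrix([[-1, -3,  3,  0],
                [-2, -1,  0, -3],
                [-2,  0, -1, -3],
                [ 0,  2, -2, -1]])
    B = Matrix([[-2, -2, 2],
                [-2,  1, 1],
                [-9, -7, 7]])
    C = Matrix([[-1, -6],
                [ 2,  6]])

    print((A + eye(4)).rref())          # (rref of A + I with leading ones, pivot columns)
    print((A + eye(4)).nullspace())     # two independent eigenvectors for eigenvalue -1
    print((B - 2*eye(3)).nullspace())   # only one eigenvector for eigenvalue 2
    print((C - 2*eye(2)).nullspace())   # eigenvector of C for eigenvalue 2
    print((C - 3*eye(2)).nullspace())   # eigenvector of C for eigenvalue 3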
Jordanian Step: Find generalised eigenvectors
An apocryphal extended piece fills the room.
For this step, if we still don't have enough eigenvectors (as many eigenvectors as the size of the matrix), then for each eigenvector vk we solve the inhomogeneous linear systems (M - λI)vk+1 = vk successively, in order. Yes, more Gaussian algorithm, I'm afraid. If the system has a solution, its solution set should be one-dimensional, meaning one free variable after we row-reduce, or one fewer leading 1 than the size of the matrix. It might have more free variables, which will be the case only if there is more than one Jordan block with the same eigenvalue. In that case, we impose the further restriction that the solution vk+1 must be orthogonal to one or more of the eigenvectors belonging to this eigenvalue.
The system might also have no solution, which means that this particular eigenvector doesn't have any more generalised eigenvectors. Basically, what happens is that each eigenvector will have a chain of generalised eigenvectors, and all the chains taken together will give us as many (generalised) eigenvectors as the size of the matrix. If a particular chain has stopped yielding generalised eigenvectors, it is time to look at another eigenvector and see how many generalised eigenvectors it yields. We will do all of this with our ongoing examples.
All of this step and its various substeps may be superfluous if the matrix is already diagonalisable (has as many eigenvectors as its size).
We begin with matrix A. We have found in the previous step two eigenvectors, dubbed u1 and u2. Let us begin by looking for generalised eigenvectors for u1.
This amounts to solving the following inhomogeneous linear system. Observe that I augmented the matrix A + I on the right by the eigenvector u1 and added another row which says that the solution has to be orthogonal to the eigenvector u1 (I could have also chosen the solution to be orthogonal to u2 instead).
[ 0 -3 3 0 | 3 ]
[ -2 0 0 -3 | 0 ]
[ -2 0 0 -3 | 0 ]
[ 0 2 -2 0 | -2 ]
[ 3 0 0 -2 | 0 ]
After computations that look awfully familiar, this row-reduces to
[ 1 0 0 0 | 0 ]
[ 0 1 -1 0 | -1 ]
[ 0 0 0 1 | 0 ] .
[ 0 0 0 0 | 0 ]
[ 0 0 0 0 | 0 ]
Observe that there is only one free variable, the third one from left to right. From this we can read that
u3
[ 0 ]
[ 1 ]
[ 2 ]
[ 0 ]
is a generalised eigenvector.
We keep looking for generalised eigenvectors, this time augmenting the matrix with u3. But the resulting matrix
[ 0 -3 3 0 | 0 ]
[ -2 0 0 -3 | 1 ]
[ -2 0 0 -3 | 2 ]
[ 0 2 -2 0 | 0 ]
[ 3 0 0 -2 | 0 ]
row reduces to
[ 1 0 0 0 | 0 ]
[ 0 1 -1 0 | 0 ]
[ 0 0 0 1 | 0 ] ,
[ 0 0 0 0 | 1 ]
[ 0 0 0 0 | 0 ]
which is an inconsistent set of equations with no solution (the fourth row says that 0 = 1). Therefore, u1 only has the generalised eigenvector u3, and it follows that the last generalised eigenvector must come from u2.
We augment the same matrix with u2, and for variety's sake, we now require the solution to be orthogonal to u2 instead of u1:
[ 0 -3 3 0 | 0 ]
[ -2 0 0 -3 | 1 ]
[ -2 0 0 -3 | 1 ].
[ 0 2 -2 0 | 0 ]
[ 0 1 1 0 | 0 ]
The row-reduced matrix this time is
[ 2 0 0 3 | -1 ]
[ 0 1 0 0 | 0 ]
[ 0 0 1 0 | 0 ] ,
[ 0 0 0 0 | 0 ]
[ 0 0 0 0 | 0 ]
yielding the solution
u4
[ 1 ]
[ 0 ]
[ 0 ].
[ -1 ]
Since that already gives us four (generalised) eigenvectors in total, there is no need to check whether u4 will give us another generalised eigenvector, because we know it won't. This concludes step 4 for matrix A.
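If you would rather hand the Gaussian grinding for matrix A over to a machine, the same augmented systems can be set up and the whole chain verified in a few lines of SymPy. Again, this is only a cross-check, with u1 through u4 exactly as found above:

    from sympy import Matrix, eye, zeros

    A = Matrix([[-1, -3,  3,  0],
                [-2, -1,  0, -3],
                [-2,  0, -1, -3],
                [ 0,  2, -2, -1]])
    u1 = Matrix([3, 0, 0, -2])
    u2 = Matrix([0, 1, 1, 0])
    u3 = Matrix([0, 1, 2, 0])
    u4 = Matrix([1, 0, 0, -1])

    N = A + eye(4)

    # The augmented system for u3: (A + I)x = u1, plus the orthogonality row u1 . x = 0.
    aug = Matrix.vstack(Matrix.hstack(N, u1), Matrix([[3, 0, 0, -2, 0]]))
    print(aug.rref())   # (rref, pivots); the matrix part matches the row-reduced matrix above

    # The chain relations: u1 and u2 are genuine eigenvectors, u3 and u4 sit above them.
    print(N * u1 == zeros(4, 1), N * u2 == zeros(4, 1))   # True True
    print(N * u3 == u1, N * u4 == u2)                     # True True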
Matrix B only has one eigenvector, so we know that there must be two generalised eigenvectors for this one eigenvector. Also, because there is only one eigenvector for this eigenvalue, there is no need to impose orthogonality conditions.
Proceeding as before, we augment the matrix B - 2I by the vector v1 on the right (look back to Step the Third), yielding
[ -4 -2 2 | 2 ]
[ -2 -1 1 | 1 ],
[ -9 -7 5 | 5 ]
and the corresponding row reduction is
[ 5 0 -2 | -2 ]
[ 0 5 -1 | -1 ].
[ 0 0 0 | 0 ]
A possible solution is
v2
[ 2 ]
[ 1 ] ,
[ 6 ]
which we use again to augment the matrix B - 2I like so:
[ -4 -2 2 | 2 ]
[ -2 -1 1 | 1 ],
[ -9 -7 5 | 6 ]
reducing to
[ 5 0 -2 | -1 ]
[ 0 5 -1 | -3 ]
[ 0 0 0 | 0 ]
and suggesting the solution
v3
[ 3 ]
[ 1 ] .
[ 8 ]
Since that gives us three (generalised) eigenvectors for B, which is of size three, we are done with the Jordanian step for the matrix B.
Since matrix C is of size two and has two ordinary eigenvectors, there is no need to perform this step on this matrix.
Grand Finale: Form transition matrix and the Jordan form
Cue dramatic music!
The hard work is done. How much of that did you follow? We can enjoy our dessert now, which simply consists of putting the (generalised) eigenvectors we found above into a pretty matrix as columns, taking care to put the generalised eigenvectors in the order that we found them following their ordinary eigenvector. This is the transition matrix. Then we conjugate the original matrix by this transition matrix (multiply the original matrix by the transition matrix on the right and by the inverse of the transition matrix on the left), and this is the Jordan form we sought.
The transition matrix for A is
u1 u3 u2 u4
[ 3 0 0 1 ]
U = [ 0 1 1 0 ]
[ 0 2 1 0 ]
[-2 0 0 -1 ]
where u1 is followed by u3 and u2 by u4, because u3 is a generalised eigenvector belonging to u1 and u4 belongs to u2. The Jordan form is therefore
JA = U⁻¹AU
or
[ -1 1 0 0 ]
[ 0 -1 0 0 ]
JA = [ 0 0 -1 1 ].
[ 0 0 0 -1 ]
Observe that there are two Jordan blocks, one for each eigenvector, and the lone eigenvalue -1 runs down the diagonal.
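The inversion and the conjugation are tedious but entirely routine, so here is a short SymPy sanity check of this particular conjugation — a check of the arithmetic, nothing deeper:

    from sympy import Matrix

    A = Matrix([[-1, -3,  3,  0],
                [-2, -1,  0, -3],
                [-2,  0, -1, -3],
                [ 0,  2, -2, -1]])
    U = Matrix([[ 3, 0, 0,  1],
                [ 0, 1, 1,  0],
                [ 0, 2, 1,  0],
                [-2, 0, 0, -1]])

    print(U.inv() * A * U)   # the Jordan form JA displayed above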
For B we find the transition matrix to be
v1 v2 v3
[ 2 2 3 ]
V = [ 1 1 1 ]
[ 5 6 8 ]
where again observe that we put the two generalised eigenvectors following their ordinary eigenvector in the order that we found them. The Jordan form is JB = V⁻¹BV or
[ 2 1 0 ]
JB = [ 0 2 1 ] .
[ 0 0 2 ]
One Jordan block because B only had one eigenvector.
The last transition matrix is
w1 w2
[ 2 3 ]
W = [-1 -2 ],
and the Jordan form JC = W⁻¹CW is
[ 2 0 ]
JC = [ 0 3 ].
Two trivial Jordan blocks because there were two eigenvectors, and the matrix is in fact diagonal because we didn't have to worry about generalised eigenvectors for this one.
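Finally, if all you want is the answer, SymPy will grow the whole Jordan form for you in one call. It may pick a different transition matrix and order the blocks differently from what we did by hand, but the block structure will agree. Again, this assumes you have sympy installed and is only a cross-check of the hand work:

    from sympy import Matrix

    A = Matrix([[-1, -3,  3,  0],
                [-2, -1,  0, -3],
                [-2,  0, -1, -3],
                [ 0,  2, -2, -1]])
    B = Matrix([[-2, -2, 2],
                [-2,  1, 1],
                [-9, -7, 7]])
    C = Matrix([[-1, -6],
                [ 2,  6]])

    for M in (A, B, C):
        P, J = M.jordan_form()        # M == P * J * P**-1
        print(J)
        print(M == P * J * P.inv())   # True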
Who actually needs to do all this, though? Not many people, to be frank. Numerical analysts have little use for the algorithm that I presented above, because as I said, they have better methods. In fact, the Jordan form is a bit of an artificial theoretical gadget, because practically all matrices can be diagonalised and don't need the Jordanian step above of finding generalised eigenvectors. Nothing wrong with theoretical gadgets, except that fewer loons are interested in them.
The Jordan form, although useful, is of more limited scope than the ordinary diagonalised form that most matrices have. If there is indeed a naturally-occurring matrix arising from a practical problem out there that needs a Jordan form (for example, some difference equations, discrete cousins of differential equations, may give rise to matrices that are not diagonalisable), then there is something very special and particular about this situation that merits very careful analysis.
So why did we go through all the above work with three different matrices, if it's all mostly useless? Because people have asked me how to compute Jordan forms by hand, and this writeup is my answer. Because theory is pretty. And because we can, and therefore we should.