Diagonalization is the process or act of turning your common garden matrix into a diagonal matrix. Trust me you, want to do this: diagonal matrices are groovy. But first, a little bit on how you go about this wonderous act.
What we are looking for
Let's consider our n x n matrix A as the matrix of a linear map α with respect to a basis of vectors e1,e2,...,en.
A is diagonal, means that the ith column is all zeroes, apart from the ith row. Since A represents α, the ith column of A is the image of the ith basis vector. The rows of the column are the components with respect to the basis we are using of α(ei). So what we really want is to find non-zero vectors x such that α(x) = λx , where λ is a constant in the field over which our vector space is defined. If we are lucky then we will be able to find a basis of such vectors, the matrix of α with respect to that basis will be diagonal, its elements will be the λ we will have found.
If we have a λ we are trying to find x such that α(x)=λx
In matrix terms we write this as (A-λI)x=0, where I is the nxn identity matrix.
This is a set of simultaneous equations we can solve, using Gaussian elimination or other methods.
The only question that remains is how to find the λs. If (A-λI) is invertible then the only solution is x=0, but we are only interested in non zero vectors, so we need (A-λI) to be singular. A bit of linear algebra will tell you this happens when the determinant of (A-λI) is 0.
The first step of diagonalization is to find all the λ satisfying det(A-λI)=0. If A is an n x n matrix, this means finding the roots of a polynomial in λ of degree n, which isn't particularly easy in itself for large n.
This polynomial is the characteristic polynomial of the matrix, its roots are the eigenvalues of the matrix, the vectors we find by solving (A-λI)x=0 are the eigenvectors. This is where things may go pear-shaped. If we have n distinct eigenvalues then we're ok as each distinct eigenvalues will give us linearly independant eigenvectors.
Proof : suppose e1 and e2 eigenvectors for distinct eigenvalues λ1,λ2 respectively. e1 and e2 are linearly dependant if one of them is zero or if there is a scalar μ such that e1 = μ e2. The first possibility does not occur, as we require our eigenvectors to be non null.
e1 = μ e2 => Ae1= μλ2e2=λ2e1
But e1 is eigenvector for eigenvalue λ1 so Ae1=λ1e1 ! Hooray, a contradiction. Eigenvectors for distinct eigenvalues are therefore linearly independant
If we don't have n disctinct eigenvalues there could be 2 reasons for this:
- Some of the eigenvalues have multiplicities greater than 1. We may be alright in this case, as if (A-λI)x=0 has a solution space of the same dimension as the multiplicity of the eigenvalue, we will have enough eigenvectors. To sum up, if the dimension of the eigenspaces are equal to the multiplicities of their eigenvalues we will still be able to diagonalize
- The other case is when our field does not have algebraic closure: for example, working in the reals λ2=-1 has no solution. When this happens we have no hope of diagonalizing : we are missing out eigenvalues. On good days though we work with the complexes as the fundamental theorem of algebra holds for them.
When you have got this far the beast reveals its true nature : Either you don't have enough linearly independant eigenvectors to form a basis, in which case you try and drown your tears in the Jordan Normal Form or you have, in which case you form a basis and have a huge party. The elements of the diagonalized form of your matrix are the eigenvalues found earlier.
But why bother?
Part of the point of diagonalization is that diagonal matrices are nice to work with : multiplying, inverting (when possible) etc are all nice and easy with diagonal matrices. In physics the eigenvalues of an operator will usually have some physical significance. Lastly I'd like to show another application of diagonalization :
solving coupled linear ordinary differential equations. Consider 2 functions x and y of t such that:
x'(t) = ax +by +f(t)
y'(t) = cx +dy +g(t)
a,b,c,d are constants, f and g are functions of t.
In some cases we may be able to spot a clever linear combination of the 2 equations that gives us decoupled equations (ie. one with only x and one with only y). If we can't then we write the equation in this form:
/x'\= /a b\ /x\ +/f\
\y'/ \c d/ \y/ \g/
and let A = /a b\
Once you see a matrix of course, the first thing you do is try and diagonalize it. Let's assume you can, and we have A=SDS-1
, for some invertible matrix S, and diagonal matrix D. If we let
then multiplying the whole equation by S gives
/u'\ = D /u\ + S /f\
\v'/ \v/ \g/
But since D is diagonal, then what we really have is
u'(t) = λ1
u(t) + r(t)
v'(t) = λ2
v(t) + s(t)
After we have marvelled at the beauty of these equations we can solve them in the usual way and use the transformation again to obtain x and y. You can obtain local approximations to non linear problems by linearizing
them, usually at critical points
because that is where all the fun stufff happens, and applying the method used above.