05 Aug 2021

Proof of Cayley-Hamilton using the Zariski Topology

Cayley-Hamilton is a well-known theorem, typically introduced in first-year linear algebra classes. The proof given in these classes is usually some variant of the argument via the Jordan normal form, whose construction is a bit disturbing and really not that interesting. Even historically it is a bit messed up: Hamilton initially proved it only for linear functions on his quaternions, which is the 4 by 4 case of Cayley-Hamilton over \(\mathbb{R}\). Later Cayley proved the 2 by 2 and 3 by 3 cases by literally computing by hand (he really only published the 2 by 2 case and asked the reader to believe he had verified the 3 by 3 case). However, it was Frobenius who actually proved the theorem in general, in a 63 page article. Unfortunately the theorem is not named after Frobenius, who really deserves the credit. This post presents a more interesting and less tedious approach to proving Cayley-Hamilton.

The Cayley-Hamilton theorem says that a matrix satisfies its own characteristic polynomial, i.e. for each \(A \in M_n (K)\) (the n by n matrices over a field K) with characteristic polynomial \(\phi_A (t) = \text{det}(A-t1)\), where \(1\) denotes the identity matrix, we have \(\phi_A (A)=0\) . Note that we cannot simply plug \(A\) into the determinant: \(\phi_A (A) \in M_n (K)\) but \(\text{det}(A-A1) \in K\), so this doesn't even make sense.

First we shall prove it over \(\mathbb{C}\) using the Euclidean topology. I first saw this in Artin's Algebra, where a version of it was a guided exercise. The idea is to identify the matrix space \(M_n (\mathbb{C})\) with the Euclidean space \(\mathbb{C}^{n\times n} \) and give it the Euclidean topology. We then observe that Cayley-Hamilton is true for all diagonal matrices: if \(D\) is diagonal with entries \(d_i\), then \(\phi_D(D)\) is diagonal with entries \(\phi_D(d_i)\), and each \(d_i\) is a root of \(\phi_D\). As a result it is true for all diagonalizable matrices: the characteristic polynomial is unchanged by conjugation, since the determinant is, and a polynomial of a conjugate is the conjugate of the polynomial, i.e. \(\phi(PDP^{-1}) = P\phi(D)P^{-1}\). The final part, detailed below, shows that the diagonalizable matrices are dense in the space, and the extension to the whole space follows from that.
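As a quick sanity check of the statement \(\phi_A(A)=0\), here is a minimal Python sketch that verifies it for one non-diagonalizable 3 by 3 matrix using exact rational arithmetic. It is not part of the proof. The helper names are hypothetical, and the coefficients are computed with the Faddeev-LeVerrier recurrence, which is just one convenient choice (it divides by \(1,\dots,n\), so it assumes characteristic 0). It works with \(\text{det}(t1-A)\), which differs from \(\phi_A\) only by the sign \((-1)^n\).

```python
from fractions import Fraction


def identity(n):
    """Return the n by n identity matrix with Fraction entries."""
    return [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]


def mat_mul(X, Y):
    """Multiply two n by n matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def char_poly_coeffs(A):
    """Return [c_0, ..., c_n] with det(t*1 - A) = sum_k c_k t^k.

    Uses the Faddeev-LeVerrier recurrence:
    M_k = A M_{k-1} + c_{n-k+1} * 1,  c_{n-k} = -trace(A M_k) / k.
    """
    n = len(A)
    A = [[Fraction(x) for x in row] for row in A]
    one = identity(n)
    c = [Fraction(0)] * n + [Fraction(1)]      # c_n = 1 (monic)
    M = [[Fraction(0)] * n for _ in range(n)]  # M_0 = 0
    for k in range(1, n + 1):
        AM = mat_mul(A, M)
        M = [[AM[i][j] + c[n - k + 1] * one[i][j] for j in range(n)]
             for i in range(n)]
        AMk = mat_mul(A, M)
        c[n - k] = -sum(AMk[i][i] for i in range(n)) / k
    return c


def eval_poly_at_matrix(c, A):
    """Evaluate sum_k c_k A^k by Horner's rule, reading A^0 as the identity."""
    n = len(A)
    A = [[Fraction(x) for x in row] for row in A]
    one = identity(n)
    result = [[Fraction(0)] * n for _ in range(n)]
    for coeff in reversed(c):
        prod = mat_mul(result, A)
        result = [[prod[i][j] + coeff * one[i][j] for j in range(n)]
                  for i in range(n)]
    return result


# A non-diagonalizable example: det(t*1 - A) = (t - 5)(t - 2)^2.
A = [[2, 1, 0], [0, 2, 0], [1, 3, 5]]
c = char_poly_coeffs(A)
value = eval_poly_at_matrix(c, A)
print(all(entry == 0 for row in value for entry in row))
```

The example matrix has a nontrivial Jordan block at the eigenvalue 2, so it is not covered by the easy diagonal case, yet the polynomial still evaluates to the zero matrix.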

Cayley-Hamilton with Euclidean topology

The map \(A \mapsto \phi_A(A)\) is clearly continuous, as each of its entries is a polynomial in the entries of \(A\) . Now if we have \(f,g:X \longrightarrow Y\) with \(f\) and \(g\) continuous and \(Y\) Hausdorff (which all metric spaces, including Euclidean spaces, are), and \(f\) and \(g\) agree on a dense subset, then they agree everywhere. So if the diagonalizable matrices are dense in the space, we will be done, taking \(g\) to be the zero map. The trick to showing density is to consider a subset of the diagonalizable matrices, namely the matrices with \(n\) distinct eigenvalues. It's easy to see visually why this set has to be dense: we only need to "slightly" change a matrix to make its eigenvalues distinct. To make this rigorous, we upper-triangularize an arbitrary matrix, writing \(A = PTP^{-1}\) with \(T\) upper triangular, and observe that the diagonal entries of \(T\) are the eigenvalues of \(A\). Adding a small diagonal matrix \(D\) to \(T\) makes the diagonal entries distinct, and since conjugation by the fixed matrix \(P\) is continuous, \(P(T+D)P^{-1}\) lies in any given open ball around \(A\) once \(D\) is small enough. So every nonempty open set contains a diagonalizable matrix, and we have shown the set is dense. Since \(\phi_A (A)=0\) on the dense subset of diagonalizable matrices, it is \(0\) on the whole space and we are done. Note that this also implies Cayley-Hamilton for real matrices, as they are a subset of the complex matrices.
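To see the perturbation step concretely, consider the 2 by 2 Jordan block, the standard example of a non-diagonalizable matrix (the choice of example and the symbol \(\varepsilon\) are only for illustration):

\[ J = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}, \qquad J_\varepsilon = \begin{pmatrix} 1 & 1 \\ 0 & 1+\varepsilon \end{pmatrix}. \]

For every \(\varepsilon \neq 0\) the matrix \(J_\varepsilon\) has the two distinct eigenvalues \(1\) and \(1+\varepsilon\), so it is diagonalizable, and it differs from \(J\) in a single entry by \(\varepsilon\), so \(J_\varepsilon \to J\) as \(\varepsilon \to 0\). One can also check the theorem for \(J\) directly: \(\phi_J(t) = (1-t)^2\), and

\[ \phi_J(J) = (1 - J)^2 = \begin{pmatrix} 0 & -1 \\ 0 & 0 \end{pmatrix}^2 = \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix}. \]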

Cayley-Hamilton with Zariski topology

This is a very neat argument, but one immediate weakness is that it uses the analytic properties of the complex numbers, so we cannot use it as is over arbitrary fields. However, it turns out that we can use something along these lines. As the title hints, we are going to use the Zariski topology on \(M_n(K)\) . We assume throughout that \(K\) is algebraically closed: every field is contained in its algebraic closure, and neither \(\phi_A\) nor \(\phi_A(A)\) changes when the entries of \(A\) are viewed in a larger field, so proving the theorem in this case is enough.

Let's define this topology on the affine space \(K^n\) . Consider the ring \(K[x_1, \dots, x_{n}]\) of polynomials in \(n\) variables, and view the points of \(K^{n}\) as the inputs of these polynomials. For a subset \(E\subset K[x_1, \dots, x_{n}]\) we define its zero locus to be \(V(E) = \{x\in K^{n}: f(x)=0 \ \forall f\in E\}\) . These are the algebraic sets of \(K^{n} \), and we make them the closed sets of our Zariski topology. Note that \(V(E) = V(I)\) where \(I\) is the ideal generated by \(E\), so we really only need to look at zero loci of ideals. These closed sets satisfy the topology axioms; I leave the verification to the reader. As an example, consider the Zariski topology on \(K^1\): the closed sets other than \(K^1\) itself are precisely the finite subsets, since a nonzero polynomial in one variable has only finitely many roots.

The reason we like this topology is that it makes polynomial functions continuous. To prove this, consider a polynomial map \(g: K^n\longrightarrow K^m\) and the preimage of a closed set \(V(I)=\{x\in K^m: f(x) = 0 \ \forall f\in I\}\) . This is \(g^{-1}(V(I)) = \{y\in K^n: f(g(y)) = 0 \ \forall f\in I\}\), the zero locus of the polynomials \(f\circ g\), which is again a closed set, so \(g\) is continuous.

One final fact about this topology: any two nonempty open sets intersect. Indeed \(V(I_1)^c \cap V(I_2)^c = (V(I_1)\cup V(I_2))^c = V(I_1 I_2)^c\). If both open sets are nonempty, we can pick \(f_1\in I_1\) and \(f_2\in I_2\) that are nonzero as polynomials. Since the polynomial ring is an integral domain, \(f_1 f_2\) is a nonzero polynomial, and since \(K\) is infinite (being algebraically closed), a nonzero polynomial cannot vanish on all of \(K^n\). Hence \(V(I_1 I_2)^c\) is nonempty. In particular, every nonempty open set is dense in the Zariski topology.
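A small example in \(K^2\) may help make these facts concrete (the specific polynomials are chosen only for illustration). The coordinate axes are the closed sets \(V(x)\) and \(V(y)\), and their union is again closed:

\[ V(x) \cup V(y) = \{(a,b) \in K^2 : ab = 0\} = V(xy). \]

Taking complements, the open sets \(\{a \neq 0\}\) and \(\{b \neq 0\}\) intersect in \(\{ab \neq 0\}\), which contains \((1,1)\). For continuity, take the polynomial map \(g: K \longrightarrow K^2\), \(g(t) = (t, t^2)\), and the closed set \(V(y-1)\):

\[ g^{-1}(V(y-1)) = \{t \in K : t^2 - 1 = 0\} = \{1, -1\}, \]

a finite set, hence closed in \(K^1\).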

Now that we have defined the Zariski topology, we can proceed in almost the same way as before. Unfortunately the Zariski topology is not Hausdorff, so an exact copy of the previous argument won't work; however, in this case it is very easy to work around that. First consider \(\{A\in M_n (K): \phi_A(A) = 0\}\) . This set is closed, since the entries of \(\phi_A(A)\) are polynomials in the entries of \(A\). It clearly contains the diagonalizable matrices, so if the diagonalizable matrices are dense, then their closure is the whole space, and hence any closed set containing them is also the whole space. So once again we just have to prove that the diagonalizable matrices are dense, and we again do this by considering the matrices with distinct eigenvalues. These are characterized by the characteristic polynomial having no repeated root, that is to say, by its discriminant being nonzero. So this subset is precisely \(\{A \in M_n (K): \Delta \phi_A \neq 0\}\), and since the discriminant is a polynomial in the coefficients of \(\phi_A\), which are themselves polynomials in the entries of \(A\), this is by definition an open set. It is nonempty, as it contains any diagonal matrix with distinct diagonal entries (which exist because \(K\) is infinite). But we showed that nonempty open sets are dense, so the diagonalizable matrices are dense and we are done!
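For \(n = 2\) the discriminant can be written out by hand, which makes the open set explicit. Writing the entries of \(A\) as \(a, b, c, d\) (notation introduced here only for illustration):

\[ A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}, \qquad \phi_A(t) = t^2 - (a+d)t + (ad - bc), \]

\[ \Delta \phi_A = (a+d)^2 - 4(ad - bc) = (a-d)^2 + 4bc. \]

So the matrices with distinct eigenvalues form the complement of \(V\big((a-d)^2 + 4bc\big)\) in \(K^4\). The diagonal matrix with \(a=1\), \(b=c=d=0\) has \(\Delta\phi_A = 1\), so this open set is nonempty, while the Jordan block with \(a=b=d=1\), \(c=0\) has \(\Delta\phi_A = 0\) and lies in the closed complement.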

As a final note, the Euclidean case is a special case of this: the Zariski topology on \(\mathbb{C}^{n}\) is coarser than the Euclidean topology. That is to say, every closed set in the Zariski topology is also closed in the Euclidean topology, so subsets that are dense in the Euclidean topology are also dense in the Zariski topology.