In this post, we shall make a few experimental observations about the -spectral radius and dimensionality reduction that give evidence that the -spectral radius and dimensionality reduction will eventually have a deep mathematical theory behind them and will become more applicable to machine learning. I have not been able to prove these empirical observations, so one should consider this post to be about experimental mathematics. Hopefully these experimental results will eventually be backed up by formal proofs. The reader is referred to this post for definitions.

This post is currently a work in progress, so I will be updating it with more information. Let be the collection of all tuples where is not nilpotent. Given matrices , define a function by letting

Given matrices , we say that is an -spectral radius dimensionality reduction of if locally maximizes . We say that an -spectral radius dimensionality reduction is optimal if globally maximizes .
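The displayed formulas in the definition above did not survive, so the sketch below rests on an assumption: that the fitness of a tuple against the fixed tuple of matrices is the normalized spectral radius of a sum of Kronecker products, as in related write-ups on spectral radius dimensionality reduction. The function names here are mine, not the post's.

```python
# Sketch under an ASSUMED objective (the post's formulas were lost):
# fitness(X_1,...,X_r) =
#   rho(sum_k A_k (x) conj(X_k)) /
#     (rho(sum_k A_k (x) conj(A_k))**0.5 * rho(sum_k X_k (x) conj(X_k))**0.5)
# where rho is the spectral radius and (x) is the Kronecker product.
import numpy as np

def spectral_radius(M):
    return max(abs(np.linalg.eigvals(M)))

def fitness(As, Xs):
    num = spectral_radius(sum(np.kron(A, X.conj()) for A, X in zip(As, Xs)))
    den_A = spectral_radius(sum(np.kron(A, A.conj()) for A in As))
    den_X = spectral_radius(sum(np.kron(X, X.conj()) for X in Xs))
    return num / np.sqrt(den_A * den_X)
```

With this normalization, `fitness(As, As)` equals 1, and a dimensionality reduction is a tuple of smaller matrices that locally maximizes this quantity.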

We say that is projectively similar to if there is some and some where for . Let denote the equivalence relation where we set precisely when and are projectively similar. Let denote the equivalence relation with respect to

Observation: (apparent unique local optimum; global attractor) Suppose that are -SRDRs of obtained by gradient ascent on the domain . Then are usually projectively similar.
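Reproducing this observation requires some form of gradient ascent. The sketch below is not the author's implementation: it uses a crude finite-difference gradient with backtracking, on an assumed objective (the post's formulas were lost; the normalizing factor depending only on the fixed matrices is dropped since it does not affect the maximizers), restricted to real matrices for simplicity.

```python
import numpy as np

def spectral_radius(M):
    return max(abs(np.linalg.eigvals(M)))

def make_fitness(As):
    # ASSUMED objective, up to a constant factor depending only on As:
    # rho(sum_k A_k (x) conj(X_k)) / rho(sum_k X_k (x) conj(X_k))**0.5
    def f(Xs):
        num = spectral_radius(sum(np.kron(A, X.conj()) for A, X in zip(As, Xs)))
        den = spectral_radius(sum(np.kron(X, X.conj()) for X in Xs))
        return num / np.sqrt(den)
    return f

def ascend(As, d, steps=50, h=1e-6, lr=0.5, seed=0):
    rng = np.random.default_rng(seed)
    f = make_fitness(As)
    Xs = [rng.standard_normal((d, d)) for _ in As]
    best = f(Xs)
    for _ in range(steps):
        # Finite-difference gradient with respect to every entry of every X_k.
        grads = []
        for k in range(len(Xs)):
            G = np.zeros((d, d))
            for i in range(d):
                for j in range(d):
                    trial = [X.copy() for X in Xs]
                    trial[k][i, j] += h
                    G[i, j] = (f(trial) - best) / h
            grads.append(G)
        # Backtracking line search: only accept steps that increase the fitness.
        step = lr
        while step > 1e-12:
            trial = [X + step * G for X, G in zip(Xs, grads)]
            val = f(trial)
            if val > best:
                Xs, best = trial, val
                break
            step /= 2
    return Xs, best
```

Running `ascend` from several random seeds and comparing the resulting tuples up to projective similarity is the shape of the experiment behind the observation above.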

Example: The above observation does not always hold, as may have multiple non-projectively similar local maxima. For example, suppose that for where each and each is a -matrix. Then . Suppose now that and . Then are usually non-projectively similar local maxima for the function . Of course, does not generate the algebra , but there is some neighborhood of where if , then will have at least two non-projectively similar local maxima. This sort of scenario is not as contrived as one may think, since could represent two clusters where there are few or weak connections between these clusters (for example, in natural language processing, each language could represent its own cluster).

Observation: (preservation of symmetry) Suppose that are Hermitian (resp. real, real symmetric, complex symmetric, positive semidefinite) and is an -SRDR of . Then is usually projectively similar to some tuple where each is Hermitian (resp. real, real symmetric, complex symmetric, positive semidefinite).

Observation: (near functoriality) Suppose that . Suppose that generates . Let be an -SRDR of . Let be an -SRDR of . Let be an -SRDR of . Then is usually not projectively similar to , but is usually close to

Observation: (complex supremacy) Suppose that . Let be the restriction of to the domain . Then gradient ascent for the function usually produces many local maxima which are not global maxima. Therefore, in order to maximize , it is best to first extend to the complex valued function and obtain a local maximum for . One should then be able to find some which is projectively similar to ; in this case will be a local maximum for that is optimal or at least higher than a local maximum that one would obtain by gradient ascent on .

If , then define a function by setting

We say that is an -SRDR projector if is a local maximum for . We say that is a strong -SRDR projector if is an -SRDR.

Observation: If is an -SRDR projector, then there is usually a constant with and a projection of rank with . -SRDR projectors are usually strong -SRDR projectors. If , then set precisely when there are where . If and are -SRDR projectors, then we typically have . In fact, for some and if we set , then for some .

We say that a -SRDR projector is constant factor normalized if . If are two constant factor normalized -SRDR projectors of , then and where if we set , then

Observation: If is an -SRDR of , then there is usually a strong -SRDR projector where for

Observation: If are self-adjoint, then whenever is a constant factor normalized -SRDR projector of , we usually have for some -SRDR projector of with and where . Furthermore, if is another -SRDR projector of with , then is a unitary matrix, and , and . If is the inclusion mapping, and is the orthogonal projection, then is an -SRDR projector.

Observation: If are real square matrices, then whenever is a -SRDR projector of , we usually have for some -SRDR projector of where are real matrices. Furthermore, if are real constant factor normalized -SRDR projectors of , then there is a real matrix where .

Observation: If are real symmetric matrices, then whenever is a -SRDR projector of , we usually have for some -SRDR projector of where are real matrices with . In fact, if are real constant factor normalized -SRDR projectors of with , there is a real orthogonal matrix with .

Observation: If are complex symmetric matrices, then whenever is a -SRDR projector of , we usually have for some -SRDR projector of and where . Furthermore, if are constant factor normalized -SRDR projectors of , and , then .

Application: Dimensionality reduction of various manifolds. Let denote the quotient complex manifold where we set precisely when or . One can use our observations about LSRDRs to perform dimensionality reduction of data in , and we can also use dimensionality reduction to map data in and to data in and respectively.

Let be the collection of all positive definite rank-1 real matrices. Let be the collection of all positive definite rank-1 complex matrices. Let be the collection of all symmetric rank-1 complex matrices. The sets can be put into a canonical one-to-one correspondence with respectively. Suppose that and is an -SRDR of . If , then , and every element of can be considered as a dimensionality reduction of . Similarly, if , then , and if , then .

-SRDRs can be used to produce dimensionality reductions that map subsets to where .

Suppose that . Then let for . Then there is some subspace where if are the inclusion and projection mappings, then is a constant factor normalized -SRDR projector for . In this case, is a dimensionality reduction of the tuple .

Suppose that . Then let for . Suppose that is a constant factor normalized -SRDR projector with respect to . Then is a dimensionality reduction of

One may need to normalize the data in the above paragraphs before taking a dimensionality reduction. Suppose that . Let be the mean of . Let . Suppose that is a dimensionality reduction of . Then is a normalized dimensionality reduction of .

Observation: (Random real matrices) Suppose that and are random real matrices; for simplicity, assume that the entries in are independent and normally distributed with mean and variance . Then whenever , we have . In particular, .

Observation: (Random complex matrices) Suppose now that and are random complex matrices; for simplicity assume that the entries in are independent, Gaussian with mean , where the real part of each entry in has variance and the imaginary part of each entry in has variance . Then and .
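A quick empirical check of this setup (my own sketch, not from the post): with i.i.d. complex Gaussian entries of total variance 1/n per entry, the circular law puts the spectrum approximately in the unit disk, so the spectral radius is close to 1.

```python
import numpy as np

def ginibre_spectral_radius(n, seed=0):
    # n x n matrix with i.i.d. complex Gaussian entries, total variance 1/n each
    # (real and imaginary parts each with variance 1/(2n)).
    rng = np.random.default_rng(seed)
    Z = (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))) / np.sqrt(2 * n)
    return max(abs(np.linalg.eigvals(Z)))
```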

Observation: (Random permutation matrices) Suppose that and is the standard irreducible representation. Let be selected uniformly at random. Then and
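For concreteness, here is one way to realize the standard irreducible representation of a permutation (an illustration of the setup, not taken from the post): restrict its permutation matrix to the invariant subspace of vectors whose coordinates sum to zero.

```python
import numpy as np

def standard_rep(perm):
    # Permutation matrix of `perm`, compressed to the (n-1)-dimensional
    # invariant subspace {v : sum(v) = 0}: the standard irreducible
    # representation of the symmetric group.
    n = len(perm)
    P = np.eye(n)[perm]                       # permutation matrix
    C = np.eye(n) - np.ones((n, n)) / n       # projection onto the sum-zero subspace
    U = np.linalg.svd(C)[0][:, :n - 1]        # orthonormal basis of that subspace
    return U.T @ P @ U
```

The trace of the compressed matrix is the number of fixed points of the permutation minus one, since the full permutation matrix also acts trivially on the all-ones vector.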

Observation: (Random orthogonal or unitary matrices) Suppose that , and let be a collection of unitary matrices selected at random according to the Haar measure, or let be a collection of orthogonal matrices selected at random according to the Haar measure. Then and .
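Haar-random unitaries can be sampled with the standard Ginibre-plus-QR recipe (a well-known construction, included here as a sketch; the real-orthogonal case is analogous with real Gaussians and sign fixes):

```python
import numpy as np

def haar_unitary(n, rng):
    # QR-factorize a complex Ginibre matrix, then fix the phases of R's diagonal
    # so that the resulting Q is distributed according to the Haar measure.
    Z = (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))) / np.sqrt(2)
    Q, R = np.linalg.qr(Z)
    d = np.diag(R)
    return Q * (d / np.abs(d))
```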

Suppose that is a complex matrix with eigenvalues with . Then define the spectral gap ratio of as
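Reading the definition as the ratio of the two largest eigenvalue magnitudes (the formula itself was lost), a minimal implementation:

```python
import numpy as np

def spectral_gap_ratio(M):
    # |lambda_1| / |lambda_2| for eigenvalues ordered by decreasing magnitude.
    mags = np.sort(np.abs(np.linalg.eigvals(M)))[::-1]
    return mags[0] / mags[1]
```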

Observation: Let . Let be random matrices according to some distribution. Suppose that are matrices where is positive semidefinite (for example, could be a constant factor normalized LSRDR projector of ). Let . Suppose that the eigenvalues of are with . Then the spectral gap ratio of is much larger than the spectral gap ratio of random matrices that follow the circular law. However, the eigenvalues with the dominant eigenvalue removed still follow the circular law (or the semicircular or elliptical law, depending on the choice of distribution of ). The value will be approximately . The large spectral gap ratio for makes the version of gradient ascent that we use to compute LSRDRs work well; the power iteration algorithm for computing dominant eigenvectors converges more quickly when there is a large spectral gap ratio, and the dominant eigenvectors of a matrix are needed to calculate the gradient of the spectral radius of that matrix.
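The convergence point can be seen directly in power iteration, whose error contracts roughly by the inverse spectral gap ratio per step (a textbook fact; the sketch below is mine, written for real matrices):

```python
import numpy as np

def power_iteration(M, iters=200):
    # Converges to the dominant eigenvector at rate ~ |lambda_2 / lambda_1|
    # per step, so a large spectral gap ratio means fast convergence.
    v = np.ones(M.shape[0]) / np.sqrt(M.shape[0])
    for _ in range(iters):
        w = M @ v
        v = w / np.linalg.norm(w)
    return v, v @ (M @ v)   # (eigenvector estimate, Rayleigh quotient)
```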

Observation: (increased regularity in high dimensions) Suppose that are -complex matrices. Suppose that . Suppose that are -SRDRs of trained by gradient ascent while are -SRDRs of . Then it is more likely for to be projectively similar to than it is for to be projectively similar to . In general, will be better behaved than .

**Quaternions**

We say that a complex matrix is a quaternionic complex matrix if and whenever . Suppose that each is a quaternionic complex matrix for . Suppose that is an even integer with , and suppose that is a -SRDR of . Then there is often some and where, if we set for , then are quaternionic complex matrices. Furthermore, if is an -SRDR projector, then there is some LSRDR projector where but where are quaternionic complex matrices.

We observe that if are quaternionic complex matrices and are rectangular matrices with , then the matrix will still have a single dominant eigenvalue with a spectral gap ratio significantly above 1, in contrast with the fact that each has a spectral gap ratio of since the eigenvalues of come in conjugate pairs. In particular, if are matrices obtained during the process of training , then the matrix will have a spectral gap ratio that is significantly above ; this ensures that gradient ascent will not have any problems in optimizing .

**Gradient ascent on subspaces**

Let be a natural number. Let be the set of all -Hermitian matrices, complex symmetric matrices, and complex quaternionic matrices. Suppose that . Suppose that . Then let be a tuple such that is locally maximized. Suppose that is a tuple where is locally maximized. Then we usually have . In this case, we would say that is an alternate domain -spectral radius dimensionality reduction (ADLSRDR).
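One way to run gradient ascent on such a restricted domain (a sketch of the mechanics, my assumption rather than the post's method) is to project each gradient step back onto the subspace. The orthogonal projections, with respect to the Frobenius inner product, for two of the subspaces mentioned are simple:

```python
import numpy as np

def proj_hermitian(M):
    # Orthogonal projection onto the (real) subspace of Hermitian matrices.
    return (M + M.conj().T) / 2

def proj_complex_symmetric(M):
    # Orthogonal projection onto the subspace of complex symmetric matrices.
    return (M + M.T) / 2
```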

**Other fitness functions**

Let be a finite dimensional real or complex inner product space. Suppose that is a Borel probability measure with compact support on . Now define a fitness function by letting .

If is a Borel probability measure with compact support on , then define a fitness function by letting .

If are real inner product spaces, then will generally be disconnected spaces, so gradient ascent with a random initialization has no hope of finding an or that maximizes respectively. However, whenever is positive definite, and gradient ascent seems to always produce the same local optimum value for the function when is initialized to be a positive semidefinite matrix. If are linear operators that maximize , then we generally have .

If is a complex inner product space, then gradient ascent seems to always produce the same locally optimal value for the fitness function , and if are matrices that maximize the fitness function , then we generally have for some . The locally optimal operator will generally have positive eigenvalues for some , but the operator itself will not be positive semidefinite. If and is supported on , then for each operator that optimizes , there is some where , and is up to sign the same matrix found by real gradient ascent with positive definite initialization.

The problem of optimizing is useful for constructing MPO word embeddings and other machine learning objects similar to LSRDR dimensionality reductions. While tends to have a unique up-to-constant-factor local optimum, the fitness function does not generally have a unique local optimum; if are complex inner product spaces, then has few local optima in the sense that it is not unusual for there to be a constant where such that locally optimize . In this case, the matrix that globally optimizes is most likely also the matrix that has the largest basin of attraction with respect to various types of gradient ascent.

MPO word embeddings: Suppose that is a matrix embedding potential, is a finite set, and . Suppose that are MPO word pre-embeddings with respect to the matrix embedding potential . Then there are often constants for where for . Furthermore, there is often a matrix along with constants for where is a real matrix for .

Even though I have published this post, it is not yet complete, since I would like to add information about functions related to and that satisfy properties similar to those of the functions of the form .