Empirical observations of L_{2,d} spectral radii and other functions

In this post, we shall make a few experimental observations about the L_{2,d}-spectral radius and dimensionality reduction that give evidence that these notions will eventually have a deep mathematical theory behind them and will also become more applicable to machine learning. I have not been able to prove these empirical observations, so one should consider this post to be about experimental mathematics. Hopefully these experimental results will eventually be backed up by formal proofs. The reader is referred to this post for definitions.

This post is currently a work in progress, so I will be updating it with more information. Let M_d(\mathbb{C})^{r++}\subseteq M_d(\mathbb{C})^r be the collection of all tuples (A_1,\dots,A_r) where A_1\otimes\overline{A_1}+\dots+A_r\otimes\overline{A_r} is not nilpotent. Given matrices A_1,\dots,A_r\in M_n(\mathbb{C}), define a function L_{A_1,\dots,A_r;d}:M_d(\mathbb{C})^{r++}\rightarrow\mathbb{R} by letting L_{A_1,\dots,A_r;d}(X_1,\dots,X_r)=\frac{\rho(A_1\otimes\overline{X_1}+\dots+A_r\otimes\overline{X_r})}{\rho(X_1\otimes\overline{X_1}+\dots+X_r\otimes\overline{X_r})^{1/2}}.
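To make the definition concrete, here is a minimal numpy sketch of how one can evaluate L_{A_1,\dots,A_r;d}; the helper names spectral_radius and L_value are my own and are not taken from any existing library.

```python
import numpy as np

def spectral_radius(M):
    # Spectral radius: largest modulus of an eigenvalue of M.
    return np.max(np.abs(np.linalg.eigvals(M)))

def L_value(As, Xs):
    # L_{A_1,...,A_r;d}(X_1,...,X_r)
    #   = rho(sum_j A_j (x) conj(X_j)) / rho(sum_j X_j (x) conj(X_j))^{1/2}.
    numerator = sum(np.kron(A, np.conj(X)) for A, X in zip(As, Xs))
    denominator = sum(np.kron(X, np.conj(X)) for X in Xs)
    return spectral_radius(numerator) / np.sqrt(spectral_radius(denominator))

# Example: two random 5x5 matrices evaluated against two random 2x2 matrices.
rng = np.random.default_rng(0)
As = [rng.standard_normal((5, 5)) for _ in range(2)]
Xs = [rng.standard_normal((2, 2)) for _ in range(2)]
print(L_value(As, Xs))
```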

Given matrices A_1,\dots,A_r\in M_n(\mathbb{C}), we say that (X_1,\dots,X_r)\in M_d(\mathbb{C})^r is an L_{2,d}-spectral radius dimensionality reduction (L_{2,d}-SRDR) of (A_1,\dots,A_r) if (X_1,\dots,X_r) locally maximizes L_{A_1,\dots,A_r;d}(X_1,\dots,X_r). We say that an L_{2,d}-SRDR is optimal if (X_1,\dots,X_r) globally maximizes L_{A_1,\dots,A_r;d}(X_1,\dots,X_r).
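In the observations below I speak of computing L_{2,d}-SRDRs by gradient ascent. As a stand-in for whatever optimizer one prefers, the following sketch simply maximizes \log L_{A_1,\dots,A_r;d} with a quasi-Newton method and a numerical gradient; it reuses L_value from the sketch above, the name find_lsrdr is hypothetical, and a hand-rolled gradient ascent using the dominant eigenvectors would be considerably faster.

```python
from scipy.optimize import minimize

def find_lsrdr(As, d, seed=0):
    # Numerically search for an L_{2,d}-SRDR of (A_1,...,A_r) by maximizing
    # log L_{A_1,...,A_r;d} over tuples of complex d x d matrices, which are
    # parameterized by their real and imaginary parts.
    r = len(As)
    rng = np.random.default_rng(seed)

    def unpack(v):
        v = v.reshape(r, 2, d, d)
        return [v[j, 0] + 1j * v[j, 1] for j in range(r)]

    def objective(v):
        return -np.log(L_value(As, unpack(v)))

    res = minimize(objective, rng.standard_normal(2 * r * d * d), method="BFGS")
    return unpack(res.x), np.exp(-res.fun)   # (the SRDR candidate, its L-value)
```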

We say that (X_1,\dots,X_r) is projectively similar to (Y_1,\dots,Y_r) if there is some \lambda\in\mathbb{C}^\times and some invertible C\in M_d(\mathbb{C}) where Y_j=\lambda CX_jC^{-1} for 1\leq j\leq r. Let \simeq denote the equivalence relation where we set (X_1,\dots,X_r)\simeq(Y_1,\dots,Y_r) precisely when (X_1,\dots,X_r) and (Y_1,\dots,Y_r) are projectively similar. Let [X_1,\dots,X_r] denote the equivalence class of (X_1,\dots,X_r) with respect to \simeq.

Observation: (apparent unique local optimum; global attractor) Suppose that (X_1,\dots,X_r),(Y_1,\dots,Y_r) are L_{2,d}-SRDRs of (A_1,\dots,A_r) obtained by gradient ascent on the domain M_d(\mathbb{C})^{r++}. Then (X_1,\dots,X_r),(Y_1,\dots,Y_r) are usually projectively similar.
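Since L_{A_1,\dots,A_r;d} is invariant under projective similarity, a cheap necessary check for this observation is to run the optimizer from several independent initializations and compare the attained values; if the observation holds, the printed values should agree to several digits. (This uses the hypothetical find_lsrdr sketch above.)

```python
rng = np.random.default_rng(1)
As = [rng.standard_normal((6, 6)) + 1j * rng.standard_normal((6, 6)) for _ in range(2)]
values = [find_lsrdr(As, d=2, seed=s)[1] for s in range(5)]
print(values)
```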

Example: The above observation does not always hold, as L_{A_1,\dots,A_r;d} may have multiple local maxima that are not projectively similar. For example, suppose that A_i= \begin{bmatrix} B_i & 0 \\ 0 & C_i\end{bmatrix} for 1\leq i\leq r where each B_i and each C_i is a d\times d-matrix. Then L_{A_1,\dots,A_r;d}(X_1,\dots,X_r)=\max(L_{B_1,\dots,B_r;d}(X_1,\dots,X_r),L_{C_1,\dots,C_r;d}(X_1,\dots,X_r)). Suppose now that L_{B_1,\dots,B_r;d}(B_1,\dots,B_r)>L_{C_1,\dots,C_r;d}(B_1,\dots,B_r) and L_{C_1,\dots,C_r;d}(C_1,\dots,C_r)>L_{B_1,\dots,B_r;d}(C_1,\dots,C_r). Then (B_1,\dots,B_r),(C_1,\dots,C_r) are usually non-projectively similar local maxima for the function L_{A_1,\dots,A_r;d}. Of course, (A_1,\dots,A_r) does not generate the algebra M_{2d}(\mathbb{C}), but there is some neighborhood U of (A_1,\dots,A_r) where if (D_1,\dots,D_r)\in U, then L_{D_1,\dots,D_r;d} will have at least two non-projectively similar local maxima. This sort of scenario is not as contrived as one may think, since (D_1,\dots,D_r) could represent two clusters with few or weak connections between them (for example, in natural language processing, each language could form its own cluster).
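Here is a quick numerical sketch of this two-cluster construction; it reuses L_value, and the 0.1 scaling of the C_i blocks is an arbitrary choice made only to keep the two clusters distinct.

```python
rng = np.random.default_rng(2)
d, r = 2, 2
Bs = [rng.standard_normal((d, d)) for _ in range(r)]
Cs = [0.1 * rng.standard_normal((d, d)) for _ in range(r)]
# Block-diagonal A_i = diag(B_i, C_i): two non-interacting "clusters".
As = [np.block([[B, np.zeros((d, d))], [np.zeros((d, d)), C]]) for B, C in zip(Bs, Cs)]
# Candidate local maxima of L_{A_1,...,A_r;d}: the tuple of B-blocks and the
# tuple of C-blocks.
print(L_value(As, Bs), L_value(As, Cs))
```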

Observation: (preservation of symmetry) Suppose that A_1,\dots,A_r are Hermitian (resp. real, real symmetric, complex symmetric, positive semidefinite) and (X_1,\dots,X_r) is an L_{2,d}-SRDR of (A_1,\dots,A_r). Then (X_1,\dots,X_r) is usually projectively similar to some tuple (Y_1,\dots,Y_r) where each Y_j is Hermitian (resp. real, real symmetric, complex symmetric, positive semidefinite).

Observation: (near functoriality) Suppose that e<d<n. Suppose that A_1,\dots,A_r generate the algebra M_n(\mathbb{C}). Let (X_1,\dots,X_r) be an L_{2,e}-SRDR of (A_1,\dots,A_r). Let (B_1,\dots,B_r) be an L_{2,d}-SRDR of (A_1,\dots,A_r). Let (C_1,\dots,C_r) be an L_{2,e}-SRDR of (B_1,\dots,B_r). Then (C_1,\dots,C_r) is usually not projectively similar to (X_1,\dots,X_r), but [C_1,\dots,C_r] is usually close to [X_1,\dots,X_r].

Observation: (complex supremacy) Suppose that (A_1,\dots,A_r)\in M_n(\mathbb{R})^r. Let L_{A_1,\dots,A_r;d}^R:M_d(\mathbb{R})^{r++}\rightarrow[0,\infty) be the restriction of L_{A_1,\dots,A_r;d} to the domain M_d(\mathbb{R})^{r++}. Then gradient ascent for the function L_{A_1,\dots,A_r;d}^R usually produces many local maxima which are not global maxima. Therefore, in order to maximize L_{A_1,\dots,A_r;d}^R, it is best to first extend L_{A_1,\dots,A_r;d}^R to the complex-valued function L_{A_1,\dots,A_r;d} and obtain a local maximum (X_1,\dots,X_r) for L_{A_1,\dots,A_r;d}. One should then be able to find some (Y_1,\dots,Y_r)\in M_d(\mathbb{R})^r which is projectively similar to (X_1,\dots,X_r); in this case (Y_1,\dots,Y_r) will be a local maximum for L^R_{A_1,\dots,A_r;d} that is either optimal or at least attains a higher value than a local maximum one would obtain by gradient ascent on L_{A_1,\dots,A_r;d}^R directly.

If (A_1,\dots,A_r)\in M_n(\mathbb{C})^{r}, then define a function T_{A_1,\dots,A_r;d}:\{(R,S)\in M_{d,n}(\mathbb{C})\times M_{n,d}(\mathbb{C})\mid(RA_1S,\dots,RA_rS)\in M_d(\mathbb{C})^{r++}\}\rightarrow[0,\infty) by setting T_{A_1,\dots,A_r;d}(R,S)=L_{A_1,\dots,A_r;d}(RA_1S,\dots,RA_rS).
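In code, T_{A_1,\dots,A_r;d} is a one-line wrapper around the earlier L_value sketch (the name T_value is my own).

```python
def T_value(As, R, S):
    # T_{A_1,...,A_r;d}(R, S) = L_{A_1,...,A_r;d}(R A_1 S, ..., R A_r S),
    # where R is d x n and S is n x d.
    return L_value(As, [R @ A @ S for A in As])
```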

We say that (R,S) is an L_{2,d}-SRDR projector for (A_1,\dots,A_r) if (R,S) is a local maximum of T_{A_1,\dots,A_r;d}. We say that (R,S) is a strong L_{2,d}-SRDR projector if (RA_1S,\dots,RA_rS) is an L_{2,d}-SRDR of (A_1,\dots,A_r).

Observation: If (R,S) is an L_{2,d}-SRDR projector, then there is usually a constant \lambda\in\mathbb{C}^\times with RS=\lambda 1_d and a projection \pi:\mathbb{C}^n\rightarrow\mathbb{C}^n of rank d with SR=\lambda \pi. L_{2,d}-SRDR projectors are usually strong L_{2,d}-SRDR projectors. If R_1,R_2\in M_{d,n}(\mathbb{C}),S_1,S_2\in M_{n,d}(\mathbb{C}), then set [[R_1,S_1]]=[[R_2,S_2]] precisely when there are \mu,\nu\in\mathbb{C}^\times and an invertible Q\in M_d(\mathbb{C}) where R_2=\mu QR_1,S_2=\nu S_1Q^{-1}. If (R_1,S_1) and (R_2,S_2) are L_{2,d}-SRDR projectors, then we typically have [[R_1,S_1]]=[[R_2,S_2]]. In fact, R_1S_2R_2S_1=\theta 1_d for some \theta\in\mathbb{C}^\times, and if we set Q=R_2S_1, then R_2=\mu QR_1,S_2=\nu S_1Q^{-1} for some \mu,\nu\in\mathbb{C}^\times.

We say that an L_{2,d}-SRDR projector (R,S) is constant factor normalized if RS=1_d. If (R_1,S_1),(R_2,S_2) are two constant factor normalized L_{2,d}-SRDR projectors of (A_1,\dots,A_r), then R_1S_2R_2S_1=1_d, and if we set Q=R_2S_1, then R_2=QR_1,S_2=S_1Q^{-1}.

Observation: If (X_1,\dots,X_r) is an L_{2,d}-SRDR of (A_1,\dots,A_r), then there is usually a strong L_{2,d}-SRDR projector (R,S) where X_j=RA_jS for 1\leq j\leq r.

Observation: If A_1,\dots,A_r are self-adjoint, then whenever (R,S) is a constant factor normalized L_{2,d}-SRDR projector of (A_1,\dots,A_r), we usually have [[R,S]]=[[R_1,S_1]] for some L_{2,d}-SRDR projector (R_1,S_1) of (A_1,\dots,A_r) with S_1=R_1^* and R_1R_1^*=S_1^*S_1=1_d. Furthermore, if (R_2,S_2) is another L_{2,d}-SRDR projector of (A_1,\dots,A_r) with S_2=R_2^*,R_2R_2^*=S_2^*S_2=1_d, then Q=R_1S_2 is a unitary matrix, S_2R_2=S_1R_1, and \text{Im}(S_1)=\text{Im}(S_2)=\ker(R_1)^\perp=\ker(R_2)^\perp. If \iota:\text{Im}(S_1)\rightarrow\mathbb{C}^n is the inclusion mapping and j:\mathbb{C}^n\rightarrow\text{Im}(S_1) is the orthogonal projection, then [[j,\iota]] is an L_{2,d}-SRDR projector.

Observation: If A_1,\dots,A_r are real square matrices, then whenever (R,S) is an L_{2,d}-SRDR projector of (A_1,\dots,A_r), we usually have [[R,S]]=[[R_1,S_1]] for some L_{2,d}-SRDR projector (R_1,S_1) of (A_1,\dots,A_r) where R_1,S_1 are real matrices. Furthermore, if (R_1,S_1),(R_2,S_2) are real constant factor normalized L_{2,d}-SRDR projectors of (A_1,\dots,A_r), then there is a real invertible matrix Q where R_2=QR_1,S_2=S_1Q^{-1}.

Observation: If A_1,\dots,A_r are real symmetric matrices, then whenever (R,S) is an L_{2,d}-SRDR projector of (A_1,\dots,A_r), we usually have [[R,S]]=[[R_1,S_1]] for some L_{2,d}-SRDR projector (R_1,S_1) of (A_1,\dots,A_r) where R_1,S_1 are real matrices with R_1=S_1^*. In fact, if (R_1,S_1),(R_2,S_2) are real constant factor normalized L_{2,d}-SRDR projectors of (A_1,\dots,A_r) with R_1=S_1^*,R_2=S_2^*, then there is a real orthogonal matrix Q with R_2=QR_1,S_2=S_1Q^{-1}.

Observation: If A_1,\dots,A_r are complex symmetric matrices, then whenever (R,S) is an L_{2,d}-SRDR projector of (A_1,\dots,A_r), we usually have [[R,S]]=[[R_1,S_1]] for some L_{2,d}-SRDR projector (R_1,S_1) of (A_1,\dots,A_r) where R_1=S_1^T. Furthermore, if (R_1,S_1),(R_2,S_2) are constant factor normalized L_{2,d}-SRDR projectors of (A_1,\dots,A_r) with R_1=S_1^T,R_2=S_2^T, and Q=R_1S_2, then QQ^T=1_d.

Application: Dimensionality reduction of various manifolds. Let \mathbb{CQ}^{n-1} denote the quotient complex manifold (\mathbb{C}^n\setminus\{0\})/\simeq where we set \mathbf{x}\simeq\mathbf{y} precisely when \mathbf{x}=\mathbf{y} or \mathbf{x}=-\mathbf{y}. One can use our observations about LSRDRs to perform dimensionality reduction of data in \mathbb{RP}^{n-1}\times(0,\infty),\mathbb{CP}^{n-1}\times(0,\infty),\mathbb{CQ}^{n-1}, and we can also use dimensionality reduction to map data in \mathbb{C}^n and \mathbb{R}^n to data in \mathbb{C}^d and \mathbb{R}^d respectively.

Let \mathcal{A}_n be the collection of all n\times n rank-1 positive semidefinite real matrices. Let \mathcal{B}_n be the collection of all n\times n rank-1 positive semidefinite complex matrices. Let \mathcal{C}_n be the collection of all n\times n rank-1 complex symmetric matrices. The sets \mathcal{A}_n,\mathcal{B}_n,\mathcal{C}_n can be put into a canonical one-to-one correspondence with \mathbb{RP}^{n-1}\times(0,\infty),\mathbb{CP}^{n-1}\times(0,\infty),\mathbb{CQ}^{n-1} respectively. Suppose that (A_1,\dots,A_r)\in M_n(\mathbb{C})^r and (X_1,\dots,X_r)\in M_d(\mathbb{C})^r is an L_{2,d}-SRDR of (A_1,\dots,A_r). If (A_1,\dots,A_r)\in\mathcal{A}_n^r, then [X_1,\dots,X_r]\cap\mathcal{A}_d^r\neq\emptyset, and every element of [X_1,\dots,X_r]\cap\mathcal{A}_d^r can be considered as a dimensionality reduction of (A_1,\dots,A_r). Similarly, if (A_1,\dots,A_r)\in\mathcal{B}_n^r, then [X_1,\dots,X_r]\cap\mathcal{B}_d^r\neq\emptyset, and if (A_1,\dots,A_r)\in\mathcal{C}_n^r, then [X_1,\dots,X_r]\cap\mathcal{C}_d^r\neq\emptyset.

L_{2,d}-SRDRs can be used to produce dimensionality reductions that map subsets W\subseteq K^n to K^d where K\in\{\mathbb{R},\mathbb{C},\mathbb{H}\}.

Suppose that a_1,\dots,a_r\in\mathbb{C}^n, and let A_k=a_ka_k^* for 1\leq k\leq r. Then there is some d-dimensional subspace V\subseteq\mathbb{C}^n where if \iota:V\rightarrow\mathbb{C}^n and j:\mathbb{C}^n\rightarrow V are the inclusion and orthogonal projection mappings, then (j,\iota) is a constant factor normalized L_{2,d}-SRDR projector for (A_1,\dots,A_r). In this case, (j(a_1),\dots,j(a_r)) is a dimensionality reduction of the tuple (a_1,\dots,a_r).
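The following sketch, assuming the observations above, turns this into a crude dimensionality reduction routine for vectors: it forms the rank-one matrices a_ka_k^*, numerically searches for a projector (R,S), normalizes it so that RS\approx 1_d, and returns the reduced vectors Ra_k. The helper name reduce_vectors is hypothetical, and it reuses T_value and scipy's minimize from the earlier sketches.

```python
def reduce_vectors(a_list, d, seed=0):
    # Reduce vectors a_1,...,a_r in C^n down to C^d using a numerically found
    # constant factor normalized L_{2,d}-SRDR projector (R, S) for the
    # rank-one matrices A_k = a_k a_k^*.
    a_list = [np.asarray(a, dtype=complex) for a in a_list]
    n = a_list[0].shape[0]
    As = [np.outer(a, a.conj()) for a in a_list]
    rng = np.random.default_rng(seed)

    def unpack(v):
        re = v[: 2 * d * n].reshape(2, d, n)
        se = v[2 * d * n:].reshape(2, n, d)
        return re[0] + 1j * re[1], se[0] + 1j * se[1]

    def objective(v):
        R, S = unpack(v)
        return -np.log(T_value(As, R, S))

    res = minimize(objective, rng.standard_normal(4 * d * n), method="BFGS")
    R, S = unpack(res.x)
    # Constant factor normalization: if RS is (approximately) a scalar
    # multiple of the identity, rescale R so that RS is close to 1_d.
    lam = np.trace(R @ S) / d
    R = R / lam
    return [R @ a for a in a_list]
```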

Suppose that a_1,\dots,a_r\in\mathbb{C}^n, and let A_k=a_ka_k^T for 1\leq k\leq r. Suppose that (R,S) is a constant factor normalized L_{2,d}-SRDR projector with respect to (A_1,\dots,A_r). Then (R(a_1),\dots,R(a_r)) is a dimensionality reduction of (a_1,\dots,a_r).

One may need to normalize the data a_1,\dots,a_r\in K^n in the above paragraphs before taking a dimensionality reduction. Suppose that b_1,\dots,b_r\in K^n. Let \mu=(b_1+\dots+b_r)/r be the mean of (b_1,\dots,b_r). Let a_k=b_k-\mu for 1\leq k\leq r. Suppose that (T(a_1),\dots,T(a_r)) is a dimensionality reduction of (a_1,\dots,a_r). Then (T(a_1)+\mu,\dots,T(a_r)+\mu) is a normalized dimensionality reduction of (b_1,\dots,b_r).

Observation: (Random real matrices) Suppose that d\ll n and U_1,\dots,U_r\in M_n(\mathbb{R}) are random real matrices; for simplicity assume that the entries of U_1,\dots,U_r are independent and normally distributed with mean 0 and variance 1. Then whenever (X_1,\dots,X_r)\in M_d(\mathbb{C})^{r++}, we have \frac{\rho(U_1\otimes\overline{X_1}+\dots+U_r\otimes\overline{X_r})}{\rho(X_1\otimes\overline{X_1}+\dots+X_r\otimes\overline{X_r})^{1/2}\sqrt{n}}\approx 1. In particular, \rho_{2,d}(U_1,\dots,U_r)/\sqrt{n}\approx 1.
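A quick numerical check of this observation under the stated assumptions (the sizes n=40, d=3, r=2 are arbitrary choices of mine; L_value is the sketch from earlier):

```python
rng = np.random.default_rng(3)
n, d, r = 40, 3, 2
Us = [rng.standard_normal((n, n)) for _ in range(r)]   # i.i.d. N(0, 1) entries
Xs = [rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d)) for _ in range(r)]
# The observation predicts that this ratio is close to 1 when d << n.
print(L_value(Us, Xs) / np.sqrt(n))
```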

Observation: (Random complex matrices) Suppose now that d\ll n and U_1,\dots,U_r\in M_n(\mathbb{C}) are random complex matrices; for simplicity assume that the entries of U_1,\dots,U_r are independent and Gaussian with mean 0, where the real part of each entry has variance 1/2 and the imaginary part of each entry has variance 1/2. Then \frac{\rho(U_1\otimes\overline{X_1}+\dots+U_r\otimes\overline{X_r})}{\rho(X_1\otimes\overline{X_1}+\dots+X_r\otimes\overline{X_r})^{1/2}\sqrt{n}}\approx 1 and \rho_{2,d}(U_1,\dots,U_r)/\sqrt{n}\approx 1.

Observation: (Random permutation matrices) Suppose that d\ll n and \phi:S_n\rightarrow GL(V) is the standard (n-1)-dimensional irreducible representation of S_n. Let g_1,\dots,g_r\in S_n be selected uniformly at random. Then \frac{\rho(\phi(g_1)\otimes\overline{X_1}+\dots+\phi(g_r)\otimes\overline{X_r})}{\rho(X_1\otimes\overline{X_1}+\dots+X_r\otimes\overline{X_r})^{1/2}}\approx 1 and \rho_{2,d}(\phi(g_1),\dots,\phi(g_r))\approx 1.

Observation: (Random orthogonal or unitary matrices) Suppose that d\ll n, and let (U_1,\dots,U_r) be a collection of unitary matrices selected at random according to the Haar measure, or let (U_1,\dots,U_r) be a collection of orthogonal matrices selected at random according to the Haar measure. Then \frac{\rho(U_1\otimes\overline{X_1}+\dots+U_r\otimes\overline{X_r})}{\rho(X_1\otimes\overline{X_1}+\dots+X_r\otimes \overline{X_r})^{1/2}}\approx 1 and \rho_{2,d}(U_1,\dots,U_r)\approx 1.

Suppose that X is a complex matrix with eigenvalues \lambda_1,\dots,\lambda_n with |\lambda_1|\geq\dots\geq|\lambda_n|. Then define the spectral gap ratio of X as |\lambda_1|/|\lambda_2|.

Observation: Let 1\leq d\leq n. Let A_1,\dots,A_r\in M_n(\mathbb{C}) be random matrices according to some distribution. Suppose that R\in M_{d,n}(\mathbb{C}),S\in M_{n,d}(\mathbb{C}) are matrices where RS is positive semidefinite (for example, (R,S) could be a constant factor normalized LSRDR projector of (A_1,\dots,A_r), in which case RS=1_d). Let X=A_1\otimes\overline{RA_1S}+\dots+A_r\otimes\overline{RA_rS}. Suppose that the eigenvalues of X are \lambda_1,\dots,\lambda_{nd} with |\lambda_1|\geq\dots\geq|\lambda_{nd}|. Then the spectral gap ratio of X is much larger than the spectral gap ratio of random matrices that follow the circular law. However, the eigenvalues (\lambda_2,\dots,\lambda_{nd}) with the dominant eigenvalue \lambda_1 removed still follow the circular law (or the semicircular or elliptical law, depending on the choice of distribution of A_1,\dots,A_r). The value \lambda_1/|\lambda_1| will be approximately 1. The large spectral gap ratio of X makes the version of gradient ascent that we use to compute LSRDRs work well; the power iteration algorithm for computing dominant eigenvectors converges more quickly when there is a large spectral gap ratio, and the dominant eigenvectors of a matrix are needed to calculate the gradient of the spectral radius of that matrix.
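Here is a sketch of this observation for the simplest choice of R,S with RS positive semidefinite (R=Q^*, S=Q for a random isometry Q, so RS=1_d); spectral_gap_ratio is my own helper name. The observation predicts that the first printed ratio is much larger than the second, which comes from a plain circular-law matrix.

```python
def spectral_gap_ratio(M):
    # |lambda_1| / |lambda_2| for the eigenvalues of M ordered by modulus.
    mags = np.sort(np.abs(np.linalg.eigvals(M)))[::-1]
    return mags[0] / mags[1]

rng = np.random.default_rng(4)
n, d, r = 30, 3, 2
As = [rng.standard_normal((n, n)) for _ in range(r)]
Q, _ = np.linalg.qr(rng.standard_normal((n, d)))   # random n x d isometry
R, S = Q.conj().T, Q                               # R S = 1_d is positive semidefinite
X = sum(np.kron(A, np.conj(R @ A @ S)) for A in As)
print(spectral_gap_ratio(X))
print(spectral_gap_ratio(rng.standard_normal((n * d, n * d))))
```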

Observation: (increased regularity in high dimensions) Suppose that A_1,\dots,A_r are n\times n-complex matrices. Suppose that 1\leq d<e\leq n. Suppose that (W_1,\dots,W_r),(X_1,\dots,X_r) are L_{2,d}-SRDRs of (A_1,\dots,A_r) trained by gradient ascent while (Y_1,\dots,Y_r),(Z_1,\dots,Z_r) are L_{2,e}-SRDRs of (A_1,\dots,A_r) trained in the same way. Then it is more likely for (Y_1,\dots,Y_r) to be projectively similar to (Z_1,\dots,Z_r) than it is for (W_1,\dots,W_r) to be projectively similar to (X_1,\dots,X_r). In general, (Y_1,\dots,Y_r) will be better behaved than (W_1,\dots,W_r).

Quaternions

We say that a 2m\times 2m complex matrix (a_{i,j}) is a quaternionic complex matrix if a_{2i,2j}=\overline{a_{2i-1,2j-1}} and a_{2i,2j-1}=-\overline{a_{2i-1,2j}} whenever i,j\in\{1,\dots,m\}. Suppose that each A_j is an n\times n quaternionic complex matrix for 1\leq j\leq r. Suppose that d is an even integer with d\leq n and suppose that (X_1,\dots,X_r) is an L_{2,d}-SRDR of (A_1,\dots,A_r). Then there is often some invertible C\in M_d(\mathbb{C}) and \lambda\in\mathbb{C}^\times where if we set Y_j=\lambda CX_jC^{-1} for 1\leq j\leq r, then Y_1,\dots,Y_r are quaternionic complex matrices. Furthermore, if (R,S) is an L_{2,d}-SRDR projector, then there is some LSRDR projector (R_1,S_1) with [[R,S]]=[[R_1,S_1]] where R_1,S_1 are quaternionic complex matrices.
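For reference, here is a small predicate (my own helper, not from the post) that checks the 2\times 2-block structure defining quaternionic complex matrices.

```python
def is_quaternionic(M, tol=1e-10):
    # A 2m x 2m complex matrix is quaternionic when each 2x2 block has the
    # form [[a, b], [-conj(b), conj(a)]].
    M = np.asarray(M)
    m = M.shape[0] // 2
    for i in range(m):
        for j in range(m):
            blk = M[2 * i:2 * i + 2, 2 * j:2 * j + 2]
            if (abs(blk[1, 1] - np.conj(blk[0, 0])) > tol
                    or abs(blk[1, 0] + np.conj(blk[0, 1])) > tol):
                return False
    return True
```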

We observe that if A_1,\dots,A_r\in M_n(\mathbb{C}) are quaternionic complex matrices and R\in M_{d,n}(\mathbb{C}),S\in M_{n,d}(\mathbb{C}) are rectangular matrices with RS=1_d, then the matrix A_1\otimes\overline{RA_1S}+\dots+A_r\otimes\overline{RA_rS} will still have a single dominant eigenvalue with a spectral gap ratio significantly above 1, in contrast with the fact that A_1,\dots,A_r each typically have a spectral gap ratio of 1 since the eigenvalues of A_1,\dots,A_r come in conjugate pairs. In particular, if X_1,\dots,X_r\in M_d(\mathbb{C}) are matrices obtained during the process of training L_{A_1,\dots,A_r;d}(X_1,\dots,X_r), then the matrix A_1\otimes\overline{X_1}+\dots+A_r\otimes\overline{X_r} will have a spectral gap ratio that is significantly above 1; this ensures that gradient ascent will not have any problems in optimizing L_{A_1,\dots,A_r;d}(X_1,\dots,X_r).

Gradient ascent on subspaces

Let n be a natural number. Let \mathcal{H}_n,\mathcal{S}_n,\mathcal{Q}_n be the sets of all n\times n Hermitian matrices, complex symmetric matrices, and quaternionic complex matrices, respectively (for \mathcal{Q}_n we require n to be even). Suppose that \mathcal{C}\in\{\mathcal{H},\mathcal{S},\mathcal{Q}\} and (A_1,\dots,A_r)\in\mathcal{C}_n^r. Then let (X_1,\dots,X_r)\in\mathcal{C}_d^r be a tuple such that L_{A_1,\dots,A_r;d}|_{\mathcal{C}_d^r}(X_1,\dots,X_r) is locally maximized. Suppose that (Y_1,\dots,Y_r)\in M_d(\mathbb{C})^r is a tuple where L_{A_1,\dots,A_r;d}(Y_1,\dots,Y_r) is locally maximized. Then we usually have L_{A_1,\dots,A_r;d}(X_1,\dots,X_r)=L_{A_1,\dots,A_r;d}(Y_1,\dots,Y_r). In this case, we say that (X_1,\dots,X_r) is an alternate domain L_{2,d}-spectral radius dimensionality reduction (ADLSRDR).
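One simple way to perform the restricted optimization in the Hermitian case, reusing the earlier sketches, is to parameterize each X_j by a general complex matrix and symmetrize it; this is only one of several reasonable parameterizations, and the name find_hermitian_lsrdr is hypothetical. The value it returns can then be compared with the value from find_lsrdr, as in the observation above.

```python
def find_hermitian_lsrdr(As, d, seed=0):
    # Alternate-domain search: maximize L_{A_1,...,A_r;d} over tuples of
    # Hermitian d x d matrices only (the class H_d in the notation above).
    r = len(As)
    rng = np.random.default_rng(seed)

    def unpack(v):
        v = v.reshape(r, 2, d, d)
        Xs = []
        for j in range(r):
            M = v[j, 0] + 1j * v[j, 1]
            Xs.append((M + M.conj().T) / 2)   # project the parameters onto H_d
        return Xs

    def objective(v):
        return -np.log(L_value(As, unpack(v)))

    res = minimize(objective, rng.standard_normal(2 * r * d * d), method="BFGS")
    return unpack(res.x), np.exp(-res.fun)
```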

Other fitness functions

Let X,Y be finite dimensional real or complex inner product spaces. Suppose that \mu is a Borel probability measure with compact support on X\times Y. Now define a fitness function f_\mu:L(X,Y)\rightarrow[-\infty,\infty) by letting f_\mu(A)=\int \log(|\langle Ax,y\rangle|)\,d\mu(x,y)-\log(\|A\|_2).

If \nu is a Borel probability measure with compact support on X, then define a fitness function f_\nu:L(X)\rightarrow[-\infty,\infty) by letting f_\nu(A)=\int \log(|\langle Ax,x\rangle|)\,d\nu(x)-\log(\|A\|_2).
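For an empirical (finite-support) measure \nu, the fitness function f_\nu can be evaluated directly; in the sketch below I read \|A\|_2 as the Frobenius norm, which is an assumption since the post does not pin the norm down, and the helper name f_nu_empirical is my own.

```python
def f_nu_empirical(A, points, weights):
    # f_nu(A) = sum_k w_k * log(|<A x_k, x_k>|) - log(||A||)
    # for nu = sum_k w_k * delta_{x_k}, with the weights summing to 1.
    # NOTE: ||A|| is taken to be the Frobenius norm here (an assumption).
    A = np.asarray(A, dtype=complex)
    logs = [np.log(np.abs(np.vdot(x, A @ x))) for x in points]
    return float(np.dot(weights, logs)) - np.log(np.linalg.norm(A))

# Example: nu uniform on 10 random points in R^4, evaluated at a random
# positive definite A (which keeps f_nu finite).
rng = np.random.default_rng(5)
pts = [rng.standard_normal(4) for _ in range(10)]
w = np.full(10, 0.1)
B = rng.standard_normal((4, 4))
print(f_nu_empirical(B @ B.T + np.eye(4), pts, w))
```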

If X,Y are real inner product spaces, then \{A\in L(X,Y)\mid f_\mu(A)>-\infty\} and \{A\in L(X)\mid f_\nu(A)>-\infty\} will generally be disconnected spaces, so gradient ascent with a random initialization has no hope of finding an A\in L(X,Y) or A\in L(X) that maximizes f_\mu or f_\nu respectively. However, f_\nu(A)>-\infty whenever A is positive definite, and gradient ascent seems to always produce the same locally optimal value f_\nu(A) for the function f_\nu when A is initialized to be a positive semidefinite matrix. If A,B are linear operators that maximize f_\nu, then we generally have A=\pm B.

If X is a complex inner product space, then gradient ascent seems to always produce the same locally optimal value f_\nu(A) for the fitness function f_\nu, and if A,B are operators that maximize f_\nu, then we generally have A=\lambda B for some \lambda\in\mathbb{C}^\times. For some \lambda\in\mathbb{C}^\times, the locally optimal operator \lambda A will generally have positive eigenvalues, but the operator \lambda A itself will not be positive semidefinite. If X=\mathbb{C}^n and \nu is supported on \mathbb{R}^n, then for each operator A that optimizes f_\nu, there is some \lambda\in\mathbb{C}^\times where \lambda A[\mathbb{R}^n]\subseteq\mathbb{R}^n, and \lambda A is, up to sign, the same matrix found by real gradient ascent with positive definite initialization.

The problem of optimizing f_\mu and f_\nu is useful for constructing MPO word embeddings and other machine learning objects that behave similarly to LSRDR dimensionality reductions. While f_\nu tends to have a local optimum that is unique up to a constant factor, the fitness function f_\mu does not generally have a unique local optimum. Still, if X,Y are complex inner product spaces, then f_\mu has few local optima in the sense that it is not unusual for two local optima A,B of f_\mu to satisfy A=\lambda B for some constant \lambda. In this case, the matrix A that globally optimizes f_\mu is most likely also the matrix that has the largest basin of attraction with respect to various types of gradient ascent.

MPO word embeddings: Suppose that N is a matrix embedding potential, A is a finite set, and a_1\dots a_n\in A^*. Suppose that f,g:A\rightarrow M_d(\mathbb{C}) are MPO word pre-embeddings with respect to the matrix embedding potential N. Then there are often constants \lambda_a\in S^1 for a\in A where f(a)=\lambda_ag(a) for a\in A. Furthermore, there is often a matrix R along with constants \mu_a\in S^1 for a\in A where \mu_a Rf(a)R^{-1} is a real matrix for each a\in A.

Even though I have published this post, it is not yet complete; I would like to add more information about functions that are related to, and satisfy similar properties as, the functions of the form L_{A_1,\dots,A_r;d}.
