In this post, we shall make a few experimental observations about the -spectral radius and dimensionality reduction that give evidence that the
-spectral radius and dimensionality reduction will eventually have a deep mathematical theory behind them and also become more applicable to machine learning. I have not been able to prove these empirical observations, so one should consider this post to be about experimental mathematics. Hopefully these experimental results will eventually be backed up by formal theoretical mathematics. The reader is referred to this post for definitions.
This post is currently a work in progress, so I will be updating it with more information. Let be the collection of all tuples
where
is not nilpotent. Given matrices
, define a function
by letting
Given matrices , we say that
is an
-spectral radius dimensionality reduction of
if
locally maximizes
. We say that an
-spectral radius dimensionality reduction is optimal if
globally maximizes
.
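For concreteness, here is a minimal numpy sketch of evaluating a spectral-radius similarity fitness of the kind used above. The exact fitness function is the one defined in the linked definitions post; the formula below (the L_2-spectral radius similarity between the two tuples, i.e. the spectral radius of the sum of the Kronecker products A_j ⊗ conj(X_j) divided by the geometric mean of the corresponding self-similarity terms) is my assumed stand-in for that definition, so treat it as an illustration rather than the post's exact formula.

```python
import numpy as np

def spectral_radius(M):
    """Largest absolute value of an eigenvalue of M."""
    return np.max(np.abs(np.linalg.eigvals(M)))

def similarity_fitness(As, Xs):
    """Assumed L_2-spectral radius similarity between (A_1,...,A_r) and
    (X_1,...,X_r); the post's exact fitness is defined in the linked
    definitions post, so this is only a plausible stand-in."""
    num = spectral_radius(sum(np.kron(A, np.conj(X)) for A, X in zip(As, Xs)))
    den_a = spectral_radius(sum(np.kron(A, np.conj(A)) for A in As))
    den_x = spectral_radius(sum(np.kron(X, np.conj(X)) for X in Xs))
    return num / np.sqrt(den_a * den_x)

# Toy usage: crude random search for 2x2 matrices (X_1, X_2, X_3) that score
# well against a tuple of 6x6 matrices; in practice one would run gradient
# ascent on this fitness instead of random search.
rng = np.random.default_rng(0)
As = [rng.standard_normal((6, 6)) for _ in range(3)]
best_fit = -np.inf
for _ in range(200):
    Xs = [rng.standard_normal((2, 2)) for _ in range(3)]
    best_fit = max(best_fit, similarity_fitness(As, Xs))
print("best fitness found:", best_fit)
```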
We say that is projectively similar to
if there is some
and some
where
for
Let
denote the equivalence relation where we set
precisely when
and
are projectively similar. Let
denote the equivalence relation with respect to
Observation: (apparent unique local optimum; global attractor) Suppose that are
-SRDRs of
obtained by gradient ascent on the domain
. Then
are usually projectively similar. Suppose now that
is the random variable consisting of all equivalence classes
where
is an
-SRDR trained by some variant of gradient ascent. Then our computations indicate that the random variable
typically has low entropy in the case when
is a non-constant random variable.
Example: The above observation does not always hold, as may have multiple non-projectively similar local maxima. For example, suppose that
for
where each
and each
is a
-matrix. Then
Suppose now that
and
. Then
are usually non-projectively similar local maxima for the function
Of course,
does not generate the algebra
, but there is some neighborhood
of
where if
, then
will have at least two non-projectively similar local maxima. This sort of scenario is not as contrived as one may think, since
could represent two clusters where there are few or weak connections between these clusters (for example, in natural language processing, each language could represent its own cluster).
Observation: (height and breadth correlation for local maxima) Suppose that is the probability distribution of the (real, complex, or quaternionic) equivalence classes
of
-SRDRs obtained from
using gradient ascent. Then the random variables
(mapping the outcome to the probability of observing that outcome) and
tend to have a positive linear correlation. The probability
is typically maximized when
is maximized instead of just locally maximized.
Observation: (preservation of symmetry) Suppose that are Hermitian (resp. real, real symmetric, complex symmetric, positive semidefinite) and
is an
-SRDR of
. Then
is usually projectively similar to some tuple
where each
is Hermitian (resp. real, real symmetric, complex symmetric, positive semidefinite).
Observation: (near functoriality) Suppose that
generates
Let
be an
-SRDR of
. Let
be an
-SRDR of
. Let
be an
-SRDR of
. Then
is usually not projectively similar to
, but
is usually close to
Observation: (Suboptimal real LSRDRs) Suppose that . Let
be the restriction of
to the domain
. A local maximum
for
shall be called a real LSRDR. Gradient ascent for the function
often produces more local maxima that are not global maxima than the complex version
does. These suboptimal local optima are tuples
where if
has eigenvalues
with
, then
are complex conjugates. In this case,
, but
is not as large as it can be since the gradient ascent process must make the absolute value of two eigenvalues
large instead of simply making the absolute value of a single eigenvalue large. Fortunately, the most efficient way of computing real LSRDRs also naturally avoids the case when
. To compute LSRDRs, one will need to compute the gradient of the expression
, but the most efficient way to compute the gradient of the spectral radius of a matrix
is to use the dominant left and right eigenvectors of
. The dominant left and right eigenvectors of
can be computed using a power iteration, but the power iteration technique (unless generalized) only effectively computes the dominant left and right eigenvectors of
if
has a single eigenvector
where
whenever
is an eigenvalue different than
Therefore, a variation of gradient ascent that uses the dominant left and right eigenvectors to approximate the gradient of
will be inaccurate whenever
has multiple dominant eigenvectors. In practice, the gradient ascent process will still converge, but it will converge to a value
where
has a single dominant eigenvalue. In this case, the gradient ascent process avoids suboptimal real LSRDRs. With that said, there are still examples (such as with MPO word embeddings) of real matrices
where training a complex LSRDR achieves better results than training a real LSRDR and where if
is a complex LSRDR, then there are
where
is a real matrix for
.
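Here is a small sketch of the power-iteration computation described above, together with the standard first-order perturbation formula for the derivative of a simple dominant eigenvalue (dλ = v*·dM·u / (v*·u) for a right eigenvector u and a left eigenvector v). The connection to the exact gradient-ascent implementation used for LSRDRs is my assumption; the example uses a matrix with positive entries so that a unique dominant eigenvalue is guaranteed.

```python
import numpy as np

def power_iteration(M, iters=2000, seed=0):
    """Dominant right eigenvector of M by repeated multiplication.  This is
    only reliable when M has a single eigenvalue of maximal absolute value."""
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(M.shape[0]) + 1j * rng.standard_normal(M.shape[0])
    for _ in range(iters):
        v = M @ v
        v = v / np.linalg.norm(v)
    return v

def dominant_eigen_data(M):
    """Dominant eigenvalue of M with a right eigenvector u and a left eigenvector v."""
    u = power_iteration(M)
    v = power_iteration(M.conj().T)                 # left eigenvector of M
    lam = (v.conj() @ M @ u) / (v.conj() @ u)
    return lam, u, v

def dominant_eigenvalue_gradient(M):
    """d(lambda)/dM for a simple dominant eigenvalue: conj(v) u^T / (v^H u)."""
    lam, u, v = dominant_eigen_data(M)
    return np.outer(v.conj(), u) / (v.conj() @ u)

# Sanity check against a finite difference in a single entry.  A matrix with
# positive entries has a unique dominant eigenvalue (Perron-Frobenius), so the
# power iteration converges.
rng = np.random.default_rng(1)
M = rng.random((5, 5))
G = dominant_eigenvalue_gradient(M)
eps = 1e-6
Mp = M.copy(); Mp[2, 3] += eps
fd = (dominant_eigen_data(Mp)[0] - dominant_eigen_data(M)[0]) / eps
print(G[2, 3], fd)   # these should agree to several digits
```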
If , then define a function
by setting
We say that is an
-SRDR projector if
is a local maximum for
. We say that
is a strong
-SRDR projector if
is an
-SRDR.
Observation: If is an
-SRDR projector, then there is usually a constant
with
and a projection
of rank
with
-SRDR projectors are usually strong
-SRDR projectors. If
, then set
precisely when there are
where
. If
and
are
-SRDR projectors, then we typically have
In fact,
for some
and if we set
, then
for some
.
We say that a -SRDR projector is constant factor normalized if
. If
are two constant factor normalized
-SRDR projectors of
, then
and where if we set
, then
Furthermore, we typically have
We shall call the projection matrix
an
-LSRDR projection operator of
Observation: Let be a constant factor normalized
-SRDR projector of
, and suppose that
for
. Let
and suppose that
is a projection matrix. Let
be a dominant eigenvector of
and let
be a dominant eigenvector of
. Then in our experiments, the matrices
can all be written as positive semidefinite matrices multiplied by a complex constant.
The eigenvalues of are all simple eigenvalues.
Let be random complex
-positive semidefinite matrices. Let
and let
for
. Then the sequence
converges to a positive semidefinite matrix
and the sequence
converges to a positive semidefinite matrix
. The positive semidefinite matrices
do not depend on the initialization
There exist complex constants
where
and
.
Here, , and
and
.
Let be random complex
-positive semidefinite matrices. Let
and let
for
. Then
.
The matrix is the dominant eigenvector of the superoperator
. The matrix
is the dominant eigenvector of the superoperator
. The matrix
is the dominant eigenvector of the superoperator
The matrix
is the dominant eigenvector of the superoperator
. Furthermore, we have
. These eigenvectors correspond to real positive eigenvalues:
and
Application: LSRDRs may be used to compress data that can be represented as a collection of matrices. Suppose that is a collection of matrices. Let
be a constant factor normalized
-SRDR of
. Let
for
. Then the matrices
can be approximately recovered from
and
. The matrix
is an approximation for the matrix
which we shall call an LSRDR principal component (LSRDRPC) for
. We observe that if
are two constant factor normalized
-SRDRs of
, then since
, we have
. In other words, the LSRDRPC for
is typically unique, or at the very least, there will be few LSRDRPCs for
Observation: If is an
-SRDR of
, then there is usually a strong
-SRDR projector
where
for
Observation: If are self-adjoint, then whenever
is a constant factor normalized
-SRDR projector of
, we usually have
for some
-SRDR projector of
with
and where
. Furthermore, if
is another
-SRDR projector of
with
, then
is a unitary matrix, and
, and
If
is the inclusion mapping, and
is the orthogonal projection, then
is an
-SRDR projector.
Observation: If are real square matrices, then whenever
is a
-SRDR projector of
, we usually have
for some
-SRDR projector of
where
are real matrices. Furthermore, if
are real constant factor normalized
-SRDR projectors of
, then there is a real matrix
where
.
Observation: If are real symmetric matrices, then whenever
is a
-SRDR projector of
, we usually have
for some
-SRDR projector of
where
are real matrices with
. In fact, if
are real constant factor normalized
-SRDR projectors of
with
, there is a real orthogonal matrix
with
.
Observation: If are complex symmetric matrices, then whenever
is a
-SRDR projector of
, we usually have
for some
-SRDR projector of
and where
. Furthermore, if
are constant factor normalized
-SRDR projectors of
, and
, then
.
Application: Dimensionality reduction of various manifolds. Let denote the quotient complex manifold
where we set
precisely when
or
. One can use our observations about LSRDRs to perform dimensionality reduction of data in
, and we can also use dimensionality reduction to map data in
and
to data in
and
respectively.
Let be the collection of all
positive semidefinite rank-1 real matrices. Let
be the collection of all positive semidefinite rank-1 complex matrices. Let
be the collection of all symmetric rank-1 complex matrices. The sets
can be put into a canonical one-to-one correspondence with
respectively. Suppose that
and
is an
-SRDR of
. If
, then
, and every element of
can be considered as a dimensionality reduction of
. Similarly, if
, then
, and if
, then
.
-SRDRs can be used to produce dimensionality reductions that map subsets
to
where
.
Suppose that . Then let
for
. Then there is some subspace
where if
are the inclusion and projection mappings, then
is a constant factor normalized
-SRDR projector for
. In this case,
is a dimensionality reduction of the tuple
.
Suppose that . Then let
for
. Suppose that
is a constant factor normalized
-SRDR projector with respect to
. Then
is a dimensionality reduction of
One may need to normalize the data in the above paragraphs before taking a dimensionality reduction. Suppose that
. Let
be the mean of
. Let
. Suppose that
is a dimensionality reduction of
. Then
is a normalized dimensionality reduction of
.
Application: (clustering for neural networks with bilinear layers) Suppose that are finite dimensional real inner product spaces. We say that a function
is a quadratic form if
is a quadratic form for each
. We say that a linear operator
is symmetric if
is a symmetric linear operator for each
. The quadratic forms
can be put into a one-to-one correspondence with the symmetric linear operators
by the relation
for
.
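As a quick illustration of this correspondence (using the standard convention that the quadratic form q_j corresponds to the symmetric operator A_j via q_j(v) = ⟨A_j v, v⟩, which is my reading since the exact relation did not survive formatting), the raw weights B_j of a bilinear layer can simply be symmetrized:

```python
import numpy as np

# A quadratic form f: R^n -> R^m given coordinatewise by f(v)_j = v^T B_j v
# corresponds to the symmetric operators A_j = (B_j + B_j^T) / 2 via
# f(v)_j = <A_j v, v>.  (Standard convention; assumed here, since the post's
# exact relation was not reproduced.)
rng = np.random.default_rng(2)
n, m = 4, 3
Bs = [rng.standard_normal((n, n)) for _ in range(m)]   # raw bilinear layer weights
As = [(B + B.T) / 2 for B in Bs]                       # corresponding symmetric operators

v = rng.standard_normal(n)
f_v = np.array([v @ B @ v for B in Bs])   # output of the bilinear layer
g_v = np.array([v @ A @ v for A in As])   # same values from the symmetric A_j
print(np.allclose(f_v, g_v))              # True
```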
Suppose that is a quadratic form and
is a linear mapping such that
for
Suppose furthermore that
is an LSRDR of
. Let
be the linear operators where
and suppose that
. Then let
. Then
is an orthogonal projection operator, and the space
is a cluster in
. Furthermore, let
be the dominant eigenvector of
and let
be the dominant eigenvector of
. Then we may assume that the operators
are positive semidefinite with trace 1. Then whenever
, the values
and
are measures of how well the vector
belongs to the cluster
.
Observation: (Random real matrices) Suppose that and
are random real matrices; for simplicity assume that the entries in
are independent and normally distributed with mean
and variance
. Then whenever
, we have
. In particular,
.
Observation: (Random complex matrices) Suppose now that and
are random complex matrices; for simplicity assume that the entries in
are independent, Gaussian with mean
, where the real part of each entry in
has variance
and the imaginary part of each entry in
has variance
. Then
and
.
Observation: (Random permutation matrices) Suppose that and
is the standard irreducible representation. Let
be selected uniformly at random. Then
and
Observation: (Random orthogonal or unitary matrices): Suppose that , and let
be a collection of unitary matrices selected at random according to the Haar measure or let
be a collection of orthogonal matrices selected at random according to the Haar measure. Then
and
.
Suppose that is a complex matrix with eigenvalues
with
. Then define the spectral gap ratio of
as
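The formula defining the spectral gap ratio did not survive formatting; assuming the intended definition is the ratio |λ_1|/|λ_2| of the two largest eigenvalue magnitudes, it can be computed as follows.

```python
import numpy as np

def spectral_gap_ratio(M):
    """Assumed definition: |lambda_1| / |lambda_2|, with the eigenvalues of M
    ordered by decreasing absolute value."""
    mags = np.sort(np.abs(np.linalg.eigvals(M)))[::-1]
    return mags[0] / mags[1]
```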
Observation: Let . Let
be random matrices according to some distribution. Suppose that
are matrices where
is positive semidefinite (for example,
could be a constant factor normalized LSRDR projector of
). Let
. Suppose that the eigenvalues of
are
with
Then the spectral gap ratio of
is much larger than the spectral gap ratio of random matrices that follow the circular law. However, the eigenvalues
with the dominant eigenvalue
removed still follow the circular law (or semicircular or elliptical law depending on the choice of distribution of
). The value
will be approximately
. The large spectral gap ratio for
makes the version of gradient ascent that we use to compute LSRDRs work well; the power iteration algorithm for computing dominant eigenvectors converges more quickly if there is a large spectral gap ratio, and the dominant eigenvectors of a matrix are needed to calculate the gradient of the spectral radius of that matrix.
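A toy experiment (not the exact setup above, which uses the matrices produced during LSRDR training) illustrates the qualitative point: a plain Gaussian matrix has a spectral gap ratio close to 1, while adding a rank-one spike creates a single dominant eigenvalue well outside the circular bulk formed by the remaining eigenvalues.

```python
import numpy as np

def spectral_gap_ratio(M):
    mags = np.sort(np.abs(np.linalg.eigvals(M)))[::-1]
    return mags[0] / mags[1]

rng = np.random.default_rng(3)
n = 300
G = rng.standard_normal((n, n)) / np.sqrt(n)   # eigenvalues roughly fill the unit disk
u = rng.standard_normal(n)
u = u / np.linalg.norm(u)
M = G + 3 * np.outer(u, u)                     # rank-one spike of strength 3

print(spectral_gap_ratio(G))   # close to 1
print(spectral_gap_ratio(M))   # roughly 3: one dominant outlier eigenvalue
```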
Observation: Suppose that is near zero. Suppose that
are independent random complex matrices. Suppose furthermore that the entries in each
are independent, Gaussian, and have variance
. Then
Observation: (increased regularity in high dimensions) Suppose that are
-complex matrices. Suppose that
. Suppose that
are
-SRDRs of
trained by gradient ascent while
are
-SRDRs of
. Then it is more likely for
to be projectively similar to
than it is for
to be projectively similar to
. In general,
will be better behaved than
.
Application: (Iterated LSRDRs) Suppose that . Let
be
-complex matrices. Suppose that
for
. Suppose furthermore that
is an LSRDR of
whenever
. Then we shall call
an iterated LSRDR of
. One advantage of iterated LSRDRs over LSRDRs is that an iterated LSRDR
of
where each
is an
-matrix is more likely to be well-behaved than an LSRDR
of
where each
is an
-matrix. While iterated LSRDRs may seem to require more intensive computation than ordinary LSRDRs, it will take fewer rounds of gradient ascent to compute
from
than it will take to compute the LSRDR
.
Observation: (Increased regularity with more matrices) Suppose that . Let
. Suppose that
is an LSRDR of
while
is an LSRDR of
. Then the LSRDR
is generally better behaved than the LSRDR
For example, if all the matrices
are real matrices, then it is more likely that one can find
where
are all real matrices than it is to find
where
are all real matrices.
Application: (Appending matrix products) Suppose that are
-complex matrices and
. Suppose that
is an LSRDR of
. If the LSRDR
is poorly behaved, then one may modify
to obtain a better behaved LSRDR.
For example, if , then set
. Suppose that
are non-negative real numbers. Then let
. Then let
be an LSRDR of
. Then the LSRDR
typically behaves better than the LSRDR
. While each round of gradient ascent for computing the LSRDR
takes longer than each round for computing the LSRDR
, the gradient ascent will converge in fewer rounds for
than it will for
.
Block preservation
Suppose that are real or complex inner product spaces. Let
. Let
be a linear operator for
. Suppose that for each
, there are
where if
and
, then
. Let
be an LSRDR projection operator of
. Then we may typically find projection operators
for
with
.
Suppose now that is a function. For
, let
be the matrix where the
-th entry is
and every other entry is zero. Let
for
, and let
Let
be an LSRDR projection operator of
. If
, then let
be the matrix where
and where
whenever
and where
otherwise. If
, then let
be the
-diagonal matrix where the
-th diagonal entry is
for
and the
-th diagonal entry is
whenever
Then we typically have
where
is the subset of
with
and where
is maximized.
Application: Given a matrix , let
denote the sum of all the entries in
Suppose that
is a real
-matrix. Then we may use LSRDRs to find a subset
where
is maximized.
Let be the
-matrix where each entry is
, and let
. Let
be a real
-matrix. Then it is not too difficult to show that
Suppose that is an
-real matrix, and there is a unique
with
and where
is maximized. Then whenever
is maximized,
is sufficiently small, and
, we typically have
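Since the formulas in this application did not survive formatting, here is a hypothetical brute-force baseline against which an LSRDR-based method could be compared on small instances. It assumes (and this is only my guess at the intended formulation) that the task is to pick a k-element index set A maximizing the sum of the entries of the principal submatrix R[A, A].

```python
import numpy as np
from itertools import combinations

def entry_sum(M):
    """Sum of all the entries of M."""
    return float(np.sum(M))

def best_principal_submatrix(R, k):
    """Brute force over all k-element index sets A, returning the one whose
    principal submatrix R[A, A] has the largest entry sum.  (Hypothetical
    formulation; the post's exact constraint is in the omitted formulas.)"""
    n = R.shape[0]
    best_A, best_val = None, -np.inf
    for A in combinations(range(n), k):
        val = entry_sum(R[np.ix_(A, A)])
        if val > best_val:
            best_A, best_val = A, val
    return best_A, best_val

rng = np.random.default_rng(4)
R = rng.standard_normal((8, 8))
print(best_principal_submatrix(R, 3))
```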
Quaternions
We say that a complex matrix is a quaternionic complex matrix if
and
whenever
. Suppose that each
is an
quaternionic complex matrix for
. Suppose that
is an even integer with
and suppose that
is a
-SRDR of
. Then there is often some
and
where if we set
for
, then
are quaternionic complex matrices. Furthermore, if
is an
-SRDR projector, then there is some LSRDR projector
where
but where
are quaternionic complex matrices.
We observe that if are quaternionic complex matrices and
are rectangular matrices with
, then the matrix
will still have a single dominant eigenvalue with a spectral gap ratio significantly above 1 in contrast with the fact that
each have a spectral gap ratio of
since the eigenvalues of
come in conjugate pairs. In particular, if
are matrices obtained during the process of training
, then the matrix
will have a spectral gap ratio that is significantly above
; this ensures that gradient ascent will not have any problems in optimizing
.
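The defining condition for quaternionic complex matrices did not survive formatting; under the standard convention, a 2n×2n complex matrix of block form [[P, Q], [-conj(Q), conj(P)]] represents an n×n quaternionic matrix, and such matrices satisfy J·conj(A)·J^{-1} = A for J = [[0, I], [-I, 0]]. The sketch below (an assumption about the intended convention) also checks the conjugate-pair property of the spectrum mentioned above.

```python
import numpy as np

def quaternionic_complex(P, Q):
    """Standard complex representation [[P, Q], [-conj(Q), conj(P)]] of an
    n x n quaternionic matrix.  (Assumed convention; the post's exact
    defining condition was not reproduced here.)"""
    return np.block([[P, Q], [-np.conj(Q), np.conj(P)]])

rng = np.random.default_rng(5)
n = 3
P = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
Q = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
A = quaternionic_complex(P, Q)

# The defining symmetry: J conj(A) J^{-1} = A with J = [[0, I], [-I, 0]].
J = np.block([[np.zeros((n, n)), np.eye(n)], [-np.eye(n), np.zeros((n, n))]])
print(np.allclose(J @ np.conj(A) @ np.linalg.inv(J), A))   # True

# Eigenvalues come in complex-conjugate pairs, so the two largest magnitudes
# coincide and the spectral gap ratio is 1, as noted above.
mags = np.sort(np.abs(np.linalg.eigvals(A)))[::-1]
print(np.isclose(mags[0], mags[1]))                        # True
```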
Gradient ascent on subspaces
Let be a natural number. Let
be the set of all
-Hermitian matrices, complex symmetric matrices, and complex quaternionic matrices. Suppose that
. Suppose that
. Then let
be a tuple such that
is locally maximized. Suppose that
is a tuple where
is locally maximized. Then we usually have
. In this case, we would say that
is an alternate domain
-spectral radius dimensionality reduction (ADLSRDR).
Homogeneous non-commutative polynomials
Suppose that is a homogeneous non-commutative polynomial of degree
. Define a fitness function
by letting
whenever
. We say that
is an
-spectral radius dimensionality reduction of
if
is maximized.
If is a tensor of degree
, then
where
denotes the Frobenius norm of
. One should take an LSRDR of an
-tensor
only if
since there are plenty of linear algebraic tools that one can use to analyze
-tensors.
Suppose that is a homogeneous non-commutative polynomial of degree
with random coefficients. Let
be an LSRDR of
. Then whenever
is a homogeneous non-commutative polynomial of degree
with
, we typically have
. Let
. The eigenvalues of
will be symmetric under
rotations. Whenever the coefficients of
are random complex numbers and
, then
has no real eigenvalues except for a dominant eigenvalue or zero eigenvalues. In the case when the coefficients of
are random real numbers and
, the superoperator
may have other real eigenvalues, but these non-zero non-dominant eigenvalues will tend to have multiplicity 2.
A simple calculation shows that the above observations about the traces of
are equivalent.
Observation: The trace of a completely positive superoperator is always a non-negative real number. In particular, .
Observation: Suppose that are
real or complex matrices. Then
.
Therefore if and only if
whenever
.
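The omitted identity here is presumably the standard one: if Φ(X) = Σ_j A_j X A_j^*, then the matrix of Φ with respect to (row-major) vectorization is Σ_j A_j ⊗ conj(A_j), whose trace equals Σ_j |tr(A_j)|^2 ≥ 0. A quick numerical check of that identity (my assumption of the intended formula):

```python
import numpy as np

# For the completely positive map Phi(X) = sum_j A_j X A_j^*, the matrix of
# Phi with respect to row-major vectorization is sum_j kron(A_j, conj(A_j)),
# and its trace equals sum_j |tr(A_j)|^2, a non-negative real number.
rng = np.random.default_rng(6)
As = [rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4)) for _ in range(3)]

Phi = sum(np.kron(A, np.conj(A)) for A in As)
lhs = np.trace(Phi)
rhs = sum(abs(np.trace(A)) ** 2 for A in As)
print(np.isclose(lhs, rhs), lhs.real >= 0)   # True True

# Sanity check of the vectorization convention: Phi acting on vec(X).
X = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
direct = sum(A @ X @ A.conj().T for A in As)
via_vec = (Phi @ X.reshape(-1)).reshape(4, 4)
print(np.allclose(direct, via_vec))          # True
```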
LSRDRs of homogeneous non-commutative polynomials are typically well behaved even when the polynomial is of a very high degree. For example, let and let
be a system of non-commutative polynomials each of degree
. Then for
, let
. Let
. Then there are homogeneous non-commutative polynomials
where
. If
are
-SRDRs of
trained with different initializations, then we may typically find a central scalar
and some invertible
with
for
. This is despite the fact that
has degree
which could be very large.
One can use LSRDRs to map tensors in to other objects in several ways. Let
. Then since
, we may consider a tensor
in
as an element of
, and we may take our LSRDR of the tensor
Suppose that
. Then we can represent a tensor
as a homogeneous non-commutative polynomial
consisting of a linear combination of monomials of the form
. Define a fitness function
by setting
where
is some mean. Then the fitness function empirically has a unique local maximum value.
Tensors to collections of matrices
Not only are we able to obtain tuples of matrices from tensors, but we may also obtain tensors from tuples of matrices, and go back and forth between tensors and tuples of matrices. Suppose that is either the field of real numbers or the field of complex numbers. Let
. Then an LSRDR
of
is typically unique up to a scalar multiple, and if
is the field of complex numbers but each
is a real matrix, then
is a real tensor multiplied by a complex nonzero scalar.
Suppose that are finite dimensional inner product spaces over the field
. Suppose that
is a basis for
. Let
be a sequence of positive integers with
. Suppose that
is an
matrix over the field
whenever
. Then define the tensor
. If
and
is locally minimized and
, then we say that
is a matrix table to tensor dimensionality reduction of type
. Our computer experiments suggest that the matrix table to tensor dimensionality reduction of type
is generally unique. Furthermore, if
and
is an
-linear combination of the basis vectors
, then in our computer experiments, the matrix table to tensor dimensionality reduction of type
is also an
-linear combination of the basis vectors
.
Suppose that is a sparse tensor whose entries are integers and
. Suppose furthermore that
are matrix table to tensor dimensionality reductions of
. Then the values
are often positive integers. It is sometimes but not always the case that each entry in
is an integer or at least a rational number. It is sometimes the case that
- there is a partition
of
where
precisely when
,
is always a positive integer,
- there is a positive integer
where
whenever
,
- if
, then all of the entries in
are integers,
- if
, then most but not all of the entries in
are integers,
- if
, then the tensor
has repeated non-integer entries, and
- one can find non-integer entries
in
where
is a non-zero integer.
Suppose that is a random tensor in
. Then one can often find a matrix table to tensor dimensionality reduction
of type
such that
is its own matrix table to tensor dimensionality reduction
of type
.
LSRDRs to learn data
LSRDRs may be adapted to solve classification tasks. Suppose that .
Suppose that are natural numbers. Suppose that
. Let
be natural numbers. Suppose that
for
. We say that a collection
of matrices is an
-spectral radius similarity log-expected maximizer (LSRSLEM) of degree
and dimension
if
locally maximizes the fitness
Application: LSRSLEMs may be used to solve machine learning classification problems. Suppose that is a universal set. Suppose that we are given a training data set
, and we would like the machine learning algorithm to recognize objects that are similar to the elements in
. Then one would first construct a function
. One then constructs an LSRSLEM
of
Then a higher value of
indicates that the value
is more similar to the training data set
.
Observation: If each is nearly positive semidefinite, and
are LSRSLEMs of the same degree and dimension of
, then
typically have the same fitness level.
Observation: Suppose that is an LSRSLEM. Then one can typically find subspaces
of the vector space
such that each
is invariant with respect to each
and each
. In other words, there is a unitary matrix
along with
with
and where each
is a block diagonal matrix whose diagonal blocks are matrices
of sizes
. Furthermore, the matrices
typically generate the algebra
.
Application: LSRSLEMs may be used to recognize objects in a data set, but LSRSLEMs may also be used to partition data sets into clusters and produce representatives of these clusters. Suppose that is an LSRSLEM of
and each
is a block diagonal matrix with block diagonal entries
.
Our clustering function will simply be the function that maximizes the quantity
Our clustering function can be used to assign tuples
that are not contained in our original
data values to these
clusters. Our extended clustering function
will be the partial function where if
, then
whenever
is maximized.
An LSRDR of a cluster
is a representative for the
-th cluster.
Iterated LSRSLEM clustering
One can use iterated LSRSLEMs to partition data into clusters. For simplicity, assume that .
Suppose that is a random positive real number for
. Let
. Then we shall construct
whenever
using recursion on
.
From , we construct an LSRSLEM
of
of degree
and dimension
. Suppose that
is a dominant eigenvector with norm
of
whenever
. Then there is an equivalence relation
on
where for all
, there is some
where if
and
, then
whenever
and
otherwise. The equivalence relation
does not typically depend on the choices made when training the LSRSLEM
for
.
The stability of the iterations also suggests that our method of iterating LSRSLEMs will not be able to replicate the capabilities of deep neural networks.
Multiple tensors
Compositional LSRDRs
Suppose that are positive integers. Suppose that
is an
-matrix over the field
(here addition is taken modulo
) whenever
. Then we say that
is a compositional LSRDR (abbreviated CLSRDR) of
with dimensions
of
if the following quantity known as the fitness is locally maximized:
Observation: Two CLSRDRs of often have the same fitness levels.
Observation: Suppose that are positive integers with
for all
. Then a CLSRDR with dimensions
will typically have the same fitness level as a CLSRDR with dimensions
.
Observation: Suppose that is a CLSRDR of
. Then one can often find two sequences
of matrices where
for all
. Furthermore, for all
, there is a non-zero constant
where
. We say that the CLSRDR
is constant factor normalized if
for all
. In this case, the matrix
will be a projection matrix that we may denote by
.
Let denote the matrices where
is the dominant eigenvector of
and where
is the dominant eigenvector of the adjoint
. Then the matrices
are generally positive semidefinite up to a constant factor. If we train the CLSRDR twice, then each time we will get the same value for
.
Suppose furthermore that for all
. Then
whenever the CLSRDR
is constant factor normalized, and the positive semidefinite matrices
do not depend on the value
.
(Under construction: what if all but one entry is zero? Block matrices.)
Other fitness functions
Let be a finite dimensional real or complex inner product space. Suppose that
is a Borel probability measure with compact support on
. Now define a fitness function
by letting
.
If is a Borel probability measure with compact support on
, then define a fitness function
by letting
.
If are real inner product spaces, then
will generally be disconnected spaces, so one has no hope of finding an
or
that maximizes
respectively simply by using gradient ascent with a random initialization. However,
whenever
is positive definite, and gradient ascent seems to always produce the same local optimum value
for the function
when
is initialized to be a positive semidefinite matrix. If
are linear operators that maximize
, then we generally have
.
If is a complex inner product space, then gradient ascent seems to always produce the same locally optimal value
for the fitness function
, and if
are matrices that maximize the fitness function
, then we generally have
for some
. The locally optimal operator
will generally have positive eigenvalues for some
, but the operator
itself will not be positive semidefinite. If
and
is supported on
, then for each operator
that optimizes
, there is some
where
, and
is up to sign the same matrix found by real gradient ascent with positive definite initialization.
The problem of optimizing is useful for constructing MPO word embeddings and other objects for machine learning that are similar to LSRDR dimensionality reductions. While
tends to have a local optimum that is unique up to a constant factor, the fitness function
does not generally have a unique local optimum; if
are complex inner product spaces, then
has few local optima in the sense that it is not unusual for there to be a constant
where
such that
locally optimize
. In this case, the matrix
that globally optimizes
is most likely also the matrix that has the largest basin of attraction with respect to various types of gradient ascent.
MPO word embeddings: Suppose that is a matrix embedding potential,
is a finite set, and
. Suppose that
are MPO word pre-embeddings with respect to the matrix embedding potential
. Then there are often constants
for
where
for
. Furthermore, there is often a matrix
along with constants
for
where
is a real matrix for
.
Rank-1 matrix valued graph embeddings
Applying pseudodeterminism to test randomness/pseudorandomness
The uniqueness of the fitness functions (pseudodeterminism) in the machine learning algorithms in this post is useful for testing for randomness/pseudorandomness in cryptography. Here is a non-exhaustive list of ways that one can use pseudodeterministic machine learning algorithms to test data sets for randomness/pseudorandomness and to measure the quantity of randomness/pseudorandomness in the data set and obtain confidence levels of the non-randomness/pseudorandomness of an object.
Application: Let be a finite set. Let
and let
be a positive integer. Let
. Define our fitness function
by setting
If
is locally maximized, then a lower value of
indicates that the sequence
is more random/pseudorandom.
Application: Suppose that is a finite set,
is a natural number, and
is a subset. Let
and let
be a positive integer. Define a function
by setting
.
If is locally maximized, then a lower value of
indicates that the set
is more random/pseudorandom.
Best quantum channel approximation to a relation.
Let be complex Euclidean spaces. Suppose that
is such that each
is a unit vector. Then we say that a quantum channel
is a best Choi rank
approximation to
if the quantity
is locally minimized subject to the condition that
is a quantum channel of Choi rank at most
.
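For reference, a quantum channel of Choi rank at most d can be parameterized by d Kraus operators. The quantity being minimized above did not survive formatting, so the loss in the sketch below, which compares E(u_a u_a^*) to the target projection v_a v_a^*, is only a hypothetical stand-in.

```python
import numpy as np

def apply_channel(Ks, X):
    """E(X) = sum_j K_j X K_j^dagger for Kraus operators K_j.
    The number of Kraus operators bounds the Choi rank of E."""
    return sum(K @ X @ K.conj().T for K in Ks)

def normalize_kraus(Ks):
    """Rescale so that sum_j K_j^dagger K_j = I (trace preservation).
    Minimal sketch: works when the unnormalized sum is positive definite."""
    S = sum(K.conj().T @ K for K in Ks)
    w, V = np.linalg.eigh(S)
    S_inv_sqrt = V @ np.diag(w ** -0.5) @ V.conj().T
    return [K @ S_inv_sqrt for K in Ks]

def loss(Ks, us, vs):
    """Hypothetical loss: how far E(|u_a><u_a|) is from |v_a><v_a| on average.
    The post's actual minimized quantity is given by an omitted formula."""
    total = 0.0
    for u, v in zip(us, vs):
        out = apply_channel(Ks, np.outer(u, u.conj()))
        total += np.linalg.norm(out - np.outer(v, v.conj())) ** 2
    return total / len(us)

rng = np.random.default_rng(7)
n, d, pairs = 4, 2, 5
us = [w / np.linalg.norm(w) for w in rng.standard_normal((pairs, n)) + 1j * rng.standard_normal((pairs, n))]
vs = [w / np.linalg.norm(w) for w in rng.standard_normal((pairs, n)) + 1j * rng.standard_normal((pairs, n))]
Ks = normalize_kraus([rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n)) for _ in range(d)])
print(loss(Ks, us, vs))
```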
Empirical observation: Best Choi rank approximations to functions tend to be unique in the sense that the gradient descent process tends to produce the same best Choi rank
approximations when we run the gradient descent multiple times.
Suppose that is a best Choi rank
approximation to
.
Let where each
is a unit vector. For each
, let
be a unit dominant eigenvector of
. Let
. Then we typically have
. More generally, if
is a neighborhood of
of small diameter and
, then we typically have
and if
is a best Choi rank
-approximation to
, then
is typically the identity mapping.
Afterword
Even though I have published this post, it is not yet complete, since I would like to add information about functions that are related to and satisfy properties similar to those of the functions of the form . I also want the community to actually say something and communicate back before I add any more information to this post.