In this post, we shall discuss the notion of the GMNO hypergraph embedding. This graph/hypergraph embedding is related to the -spectral radii and satisfies similar properties to the -spectral radii, MPO word embeddings, and also embeddings obtained from eigenvectors of the Laplacian matrix of a graph. Even though I have published this post, I will eventually add more details to this post.
Suppose that is a graph or hypergraph with unlabeled vertex set . Let denote the division algebra of the real numbers, complex numbers, or quaternions. Our goal is to be able to construct a function for some where machine learning models can more easily work with for than directly with the elements .
Suppose that is a set which we shall call a vertex set. Let be a collection of functions where and where for . Let be a function such that . Let be a function which we shall call a multiplicity function (or you can call the vertex weight function). The function should be considered as a sort of generalized weighted hypergraph structure on the vertex set .
If , then define the fitness of with respect to by
We say that a function is a geometric mean of norm optimized (GMNO) pre-embedding for the generalized hypergraph structure and multiplicity function if is locally maximized. Here, we have not made any attempt to make the fitness function a very general function.
We say that a GMNO pre-embedding is a GMNO hypergraph embedding if the matrix is a diagonal matrix with increasing diagonal entries. If , then we say that a GMNO embedding is positive for if each entry in is positive.
Observation: which can easily be interpreted as a quantity of potential energy of the function .
One can define the class so that with an embedding , certain vertices will be far away from each other. If , and is the function defined by letting , and , then indicates that the vertices should be far away from each other.
- (Empirical uniqueness of local optimum) The GMNO embedding for a multiplicity function and generalized hypergraph that is positive for with is typically unique. If and are GMNO embeddings with which are both positive for , and have both been obtained using gradient ascent but with possibly different initial conditions and different optimizers, then we typically have . By training the GMNO embedding twice and obtaining GMNO embeddings , one can test whether the GMNO embedding is satisfactory or not; the GMNO embeddings are likely to be satisfactory precisely when .
2. (Learned realization) Suppose that . Then for a GMNO embedding that is positive for some , we typically have . One can test whether a GMNO embedding is satisfactory by training the embedding when . The GMNO embedding is likely to be satisfactory precisely when .
3. (Sphericity) If is a GMNO embedding for a suitable multiplicity function and generalized hypergraph , then will typically have very low variance. In particular, if , then will typically be near the sphere . One can train the multiplicity function while simultaneously training the GMNO embedding so that when is a fully trained GMNO embedding.
4. (Thin singularities) While the fitness function does have singularities, these singularities have a small dimension and high codimension (all but one of the singularities has codimension ). The singularities of the fitness function are unlikely to cause any issues.
5. (Interpretability) It is not to difficult to see relations between a hypergraph and its corresponding GMNO embedding . For example, from a GMNO embedding , one can easily identify clusters, intersecting clusters, and much of the geometric structure of the hypergraph .
6. (Improved performance in high dimensions) By increasing the dimension , one can produce a higher quality GMNO embedding. For example, the empirical uniqueness of local optimum and learned realization properties are often satisfied only when is sufficiently large.
7. (Fast gradient computation) The gradient of with respect to is easy to compute.
8. (Adaptability) GMNO embeddings are reasonable graph embeddings for many different kinds of graphs including graphs with a very high diameter to order ratio, graphs with several (intersecting or non-intersecting) clusters, and graphs with a low diameter but high clustering coefficient.
Other properties: GMNO hypergraph embeddings also satisfy properties that may or may not be desirable.
- (Uncentered) If is a GMNO embedding, then the expected value is typically far away from the origin
- (Dimension filling) In order to minimize , the matrix will tend to be close to a constant multiple of the identity matrix. In other words, the function will often (but not always) use all available dimensions nearly equally. This may result in a high dimensional GMNO embedding even when is inherently low dimensional. This problem can be ameliorated by reducing the dimension or by generalizing the notion of a GMNO embedding. There are cases (including positional embeddings) where one would want to embed a low dimensional graph into a high dimensional space.
- (Flat embedding) Sometimes the matrix can be well approximated by a low rank matrix. This means that not all dimensions in are used equally. The notion of a GMNO hypergraph embedding can be generalized so that the dimensions of are used more evenly.
- (Weak repulsion) For GMNO hypergraph embeddings, it is possible that there are unrelated where and are very close together. This problem can be ameliorated by increasing the dimension .
Comparison with spectral clustering: GMNO hypergraph embeddings are closely related to the eigenvalues of the Laplacian matrix of a graph for a few reasons. First of all, the eigenvalues of the Laplacian are always unique while the GMNO hypergraph embeddings are often unique. There is also a loss function resembling the loss function where the input that minimizes this loss function is simply the bottom -non-zero eigenvectors of the Laplacian matrix.
A potential application: positional embeddings GMNO hypergraph embeddings can be used to produce positional embeddings for convolutional neural networks. For example, one can represent a grid as a hypergraph. One can also add additional hyperedges to this hypergraph that represent regions in images that the CNN is training on. Suppose that is a GMNO embedding with respect to this hypergraph. Define by letting . Then a positional embedding layer is a function defined by letting . By including a positional embedding layer in a CNN, the network has not only information about the content of the image but the location of that content as well. We recommend using GMNO embedding for this positional embedding layer and using a positional embedding layer in CNNs.
Even though I have published this post, I will add more details onto this post later.