GMNO and related hypergraph embeddings

In this post, we shall discuss the notion of the GMNO hypergraph embedding. This graph/hypergraph embedding is related to the L_{2,d}-spectral radii and satisfies properties similar to those of the L_{2,d}-spectral radii, MPO word embeddings, and embeddings obtained from eigenvectors of the Laplacian matrix of a graph. Even though I have published this post, I will eventually add more details to it.

Suppose that G is a graph or hypergraph with unlabeled vertex set V. Let K denote the division algebra of the real numbers, complex numbers, or quaternions. Our goal is to construct a function f:V\rightarrow K^d for some d so that machine learning models can work with f(v) for v\in V more easily than with the elements v\in V directly.

Suppose that V is a set which we shall call a vertex set. Let S be a collection of functions s where \text{Dom}(s)\subseteq V and where s(v)\in \mathbb{R} for v\in \text{Dom}(s). Let w:S\rightarrow[0,1] be a function such that \sum_{s\in S}w(s)=1. Let m:V\rightarrow(0,\infty) be a function which we shall call a multiplicity function (or you can call m the vertex weight function). The function w shall be called a generalized weighted hypergraph structure on the vertex set V.

If f:V\rightarrow K^d, then define the fitness of f with respect to m,w by

L(f,m,w)=\sum_{s\in\text{Dom}(w)}w(s)\cdot\log(\|\sum_{u\in\text{Dom}(s)}s(u)\cdot f(u)\|)-\log(\|\sum_{v\in V}m(v)f(v)f(v)^*\|)/2.

We say that a function f:V\rightarrow K^d is a geometric mean of norm optimized (GMNO) pre-embedding for the generalized hypergraph structure w and multiplicity function m if L(f,m,w) is locally maximized. Here, we have not made any attempt to make the fitness function L(f,m,w) a very general function.
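To make the definition concrete, here is a minimal sketch that evaluates L(f,m,w), assuming the real case K=\mathbb{R} and taking the matrix norm \|\sum_{v\in V}m(v)f(v)f(v)^*\| to be the Frobenius norm (the definition above does not fix a particular matrix norm). The representation of S, w, and m as Python dictionaries and lists is just one possible choice.

```python
# Minimal sketch of the fitness L(f, m, w).  Assumptions not fixed by the text:
# K = R, and the matrix norm is the Frobenius norm.
import numpy as np

def fitness(f, m, hyperedges):
    """f: dict vertex -> np.ndarray of shape (d,)
    m: dict vertex -> positive float (the multiplicity function)
    hyperedges: list of (s, weight) pairs, where s is a dict vertex -> real
    coefficient and the weights w(s) sum to 1."""
    total = 0.0
    for s, weight in hyperedges:
        combo = sum(coeff * f[u] for u, coeff in s.items())
        total += weight * np.log(np.linalg.norm(combo))
    d = len(next(iter(f.values())))
    A = np.zeros((d, d))
    for v, vec in f.items():
        A += m[v] * np.outer(vec, vec)       # sum_v m(v) f(v) f(v)^T
    return total - 0.5 * np.log(np.linalg.norm(A, ord="fro"))
```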

We say that a GMNO pre-embedding f is a GMNO hypergraph embedding if the matrix \sum_{v\in V}m(v)f(v)f(v)^* is a diagonal matrix with increasing diagonal entries. If v_0\in V, then we say that a GMNO embedding f is positive for v_0 if each entry in f(v_0) is positive.
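Since every term in L(f,m,w) is unchanged when f is replaced by U\circ f for a unitary (or orthogonal) U and a unitarily invariant matrix norm, any GMNO pre-embedding can be rotated into a GMNO embedding by diagonalizing \sum_{v\in V}m(v)f(v)f(v)^*. Here is a sketch of this post-processing step in the real case; the function name is hypothetical, and making the embedding positive for v_0 by per-axis sign flips assumes that no coordinate of the rotated f(v_0) is zero.

```python
# Rotate a real GMNO pre-embedding so that sum_v m(v) f(v) f(v)^T becomes diagonal
# with increasing diagonal entries, then flip axes so that f(v0) has positive entries.
import numpy as np

def to_gmno_embedding(f, m, v0=None):
    verts = list(f)
    A = sum(m[v] * np.outer(f[v], f[v]) for v in verts)
    eigvals, U = np.linalg.eigh(A)            # eigenvalues returned in increasing order
    g = {v: U.T @ f[v] for v in verts}        # now sum_v m(v) g(v) g(v)^T = diag(eigvals)
    if v0 is not None:
        signs = np.sign(g[v0])                # assumes no entry of g(v0) is zero
        g = {v: signs * g[v] for v in verts}  # per-axis sign flips keep the matrix diagonal
    return g
```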

Observation: If \|\cdot\| denotes the Frobenius norm, then \|\sum_{v\in V}m(v)f(v)f(v)^*\|^2=\sum_{u,v\in V}|\langle \sqrt{m(u)}f(u),\sqrt{m(v)}f(v)\rangle|^2, which can easily be interpreted as a quantity of potential energy of the function f.

One can define the collection S so that, in an embedding f:V\rightarrow K^d, certain vertices will be pushed far away from each other. If u,v\in V,u\neq v, and s:\{u,v\}\rightarrow\{-1,1\} is the function defined by letting s(u)=1,s(v)=-1, and s\in S, then s indicates that the vertices u,v should be far away from each other.
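As a concrete (and purely illustrative) construction, one can take one function s per edge of a graph with s equal to 1 on both endpoints, which rewards embedding adjacent vertices in similar directions, together with a few functions of the above \pm 1 form on sampled non-adjacent pairs acting as repulsive terms. This convention is an assumption rather than part of the definition.

```python
# Hypothetical construction of the pair (S, w) from an edge list: each edge {u, v}
# gives s(u) = s(v) = 1, each sampled non-edge gives s(u) = 1, s(v) = -1, and the
# weights are uniform so that they sum to 1.  `vertices` should be a list.
import random

def hypergraph_from_graph(vertices, edges, num_negative=0, seed=0):
    rng = random.Random(seed)
    edge_set = {frozenset(e) for e in edges}
    signed = [{u: 1.0, v: 1.0} for u, v in edges]      # attractive terms
    while len(signed) < len(edges) + num_negative:     # assumes enough non-edges exist
        u, v = rng.sample(vertices, 2)
        if frozenset((u, v)) not in edge_set:
            signed.append({u: 1.0, v: -1.0})           # repulsive terms
    weight = 1.0 / len(signed)
    return [(s, weight) for s in signed]
```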

Desirable properties:

1. (Empirical uniqueness of local optimum) The GMNO embedding f for a multiplicity function m and generalized hypergraph w that is positive for v_0 with \|f(v_0)\|=1 is typically unique. If K_1,K_2\in\{\mathbb{R},\mathbb{C},\mathbb{H}\} and f:V\rightarrow K_1^d,g:V\rightarrow K_2^d are GMNO embeddings with \|f(v_0)\|=\|g(v_0)\|=1 which are both positive for v_0, and f,g have both been obtained using gradient ascent but with possibly different initial conditions and different optimizers, then we typically have f=g. By training the GMNO embedding twice and obtaining GMNO embeddings f,g, one can test whether the GMNO embedding is satisfactory or not; the GMNO embeddings f,g are likely to be satisfactory precisely when f=g.

2. (Learned realization) Suppose that K\in\{\mathbb{C},\mathbb{H}\}. Then for a GMNO embedding f:V\rightarrow K^d that is positive for some v_0\in V, we typically have f[V]\subseteq\mathbb{R}^d. One can test whether a GMNO embedding is satisfactory by training the embedding when K\in\{\mathbb{C},\mathbb{H}\}. The GMNO embedding f is likely to be satisfactory precisely when f[V]\subseteq\mathbb{R}^d.

3. (Sphericity) If f:V\rightarrow\mathbb{R}^d is a GMNO embedding for a suitable multiplicity function m and generalized hypergraph w, then \|f(v)\| will typically have very low variance. In particular, if E[\|f(v)\|]\approx 1, then f[V] will typically be near the sphere S^{d-1}. One can train the multiplicity function m while simultaneously training the GMNO embedding f so that f[V]\subseteq S^{d-1} when f is a fully trained GMNO embedding.

4. (Thin singularities) While the fitness function f\mapsto L(f,m,w) does have singularities, these singularities have a small dimension and high codimension (all but one of the singularities has codimension d). The singularities of the fitness function are unlikely to cause any issues.

5. (Interpretability) It is not too difficult to see relations between a hypergraph and its corresponding GMNO embedding f:V\rightarrow K^d. For example, from a GMNO embedding f:V\rightarrow K^d, one can easily identify clusters, intersecting clusters, and much of the geometric structure of the hypergraph.

6. (Improved performance in high dimensions) By increasing the dimension d, one can produce a higher quality GMNO embedding. For example, the empirical uniqueness of local optimum and learned realization properties are often satisfied only when d is sufficiently large.

7. (Fast gradient computation) The gradient of L(f,m,w) with respect to f is easy to compute (see the sketch after this list).

8. (Adaptability) GMNO embeddings are reasonable graph embeddings for many different kinds of graphs including graphs with a very high diameter to order ratio, graphs with several (intersecting or non-intersecting) clusters, and graphs with a low diameter but high clustering coefficient.
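Regarding property 7, here is a minimal sketch of gradient ascent on L(f,m,w) using automatic differentiation (PyTorch, real case, Frobenius norm for the matrix term); the hyperedge representation is the one used in the earlier sketches, and the hyperparameters are illustrative.

```python
# Gradient ascent on L(f, m, w) via autograd; minimizing `loss` = -L maximizes L.
import torch

def train_gmno(vertices, hyperedges, m, d, steps=2000, lr=0.05, seed=0):
    torch.manual_seed(seed)
    idx = {v: i for i, v in enumerate(vertices)}
    F = torch.randn(len(vertices), d, requires_grad=True)
    mvec = torch.tensor([float(m[v]) for v in vertices])
    opt = torch.optim.Adam([F], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = 0.0
        for s, w in hyperedges:
            combo = sum(c * F[idx[u]] for u, c in s.items())
            loss = loss - w * torch.log(torch.norm(combo))
        A = (F * mvec[:, None]).T @ F                  # sum_v m(v) f(v) f(v)^T
        loss = loss + 0.5 * torch.log(torch.norm(A))   # Frobenius norm of A
        loss.backward()
        opt.step()
    return {v: F[idx[v]].detach().numpy() for v in vertices}
```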

Other properties: GMNO hypergraph embeddings also satisfy properties that may or may not be desirable.

  1. (Uncentered) If f:V\rightarrow K^d is a GMNO embedding, then the expected value E(f(v)) is typically far away from the origin \mathbf{0}.
  2. (Dimension filling) In order to minimize \|\sum_{v\in V}m(v)f(v)f(v)^*\|, the matrix \sum_{v\in V}m(v)f(v)f(v)^* will tend to be close to a constant multiple of the identity matrix. In other words, the function f will often (but not always) use all available dimensions nearly equally. This may result in a high dimensional GMNO embedding f even when (V,m,w) is inherently low dimensional. This problem can be ameliorated by reducing the dimension d or by generalizing the notion of a GMNO embedding. There are cases (including positional embeddings) where one would want to embed a low dimensional graph into a high dimensional space.
  3. (Flat embedding) Sometimes the matrix \sum_{v\in V}m(v)f(v)f(v)^* can be well approximated by a low rank matrix. This means that not all dimensions in K^d are used equally. The notion of a GMNO hypergraph embedding can be generalized so that the dimensions of K^d are used more evenly.
  4. (Weak repulsion) For GMNO hypergraph embeddings, it is possible that there are unrelated vertices u,v where f(u) and f(v) are very close together. This problem can be ameliorated by increasing the dimension d; if d is sufficiently large, f(u) is unlikely to be near f(v) unless the vertices u,v are related.
  5. (Homophilicity) GMNO hypergraph embeddings are not good for predicting links between nodes that are dissimilar from each other. For example, in a bipartite graph with bipartition A,B, each edge must connect one block of the partition to the other block, but the GMNO hypergraph embedding has no way of knowing that the nodes in A should not be connected to other nodes in A.

Comparison with spectral clustering: GMNO hypergraph embeddings are closely related to embeddings obtained from the eigenvectors of the Laplacian matrix of a graph for a few reasons. First of all, the eigenvalues and eigenvectors of the Laplacian are essentially uniquely determined by the graph, while GMNO hypergraph embeddings are often unique. There is also a loss function resembling the loss function f\mapsto L(f,m,w) whose minimizer is simply given by the eigenvectors corresponding to the d smallest non-zero eigenvalues of the Laplacian matrix.
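For comparison, the Laplacian-based baseline can be computed directly. The following sketch is the standard spectral embedding (the eigenvectors for the d smallest non-zero eigenvalues of L=D-A), not the GMNO construction itself.

```python
# Spectral embedding baseline: bottom d non-zero eigenvectors of the graph Laplacian.
import numpy as np

def laplacian_embedding(adjacency, d):
    A = np.asarray(adjacency, dtype=float)      # symmetric adjacency matrix
    L = np.diag(A.sum(axis=1)) - A
    eigvals, eigvecs = np.linalg.eigh(L)        # eigenvalues in increasing order
    cols = np.flatnonzero(eigvals > 1e-10)[:d]  # skip the (near-)zero eigenvalues
    return eigvecs[:, cols]                     # one row per vertex, d coordinates
```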

A potential application: positional embeddings. GMNO hypergraph embeddings can be used to produce positional embeddings for convolutional neural networks. For example, one can represent a 64\times 64 grid as a hypergraph. One can also add additional hyperedges to this hypergraph that represent regions in the images that the CNN is training on. Suppose that f:\{0,\dots,63\}\times\{0,\dots,63\}\rightarrow K^d is a GMNO embedding with respect to this hypergraph. Define \hat{f}:\{0,\dots,63\}\times\{0,\dots,63\}\times\{1,\dots,d\}\rightarrow\mathbb{R} by letting \hat{f}(i,j,k)=f(i,j)_k, the k-th coordinate of f(i,j). Then a positional embedding layer is a function M:\mathbb{R}^{\{0,\dots,63\}\times\{0,\dots,63\}\times A}\rightarrow \mathbb{R}^{\{0,\dots,63\}\times\{0,\dots,63\}\times(A\cup\{1,\dots,d\})} defined by letting M(g)=g\cup\hat{f}. By including a positional embedding layer in a CNN, the network has information not only about the content of the image but also about the location of that content. We recommend using a GMNO embedding for this positional embedding layer and using such a positional embedding layer in CNNs (of course, it may be easier to add the positional information simply by training the CNN, so one may want to run experiments on this).
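Here is a minimal sketch of such a positional embedding layer in PyTorch, assuming the GMNO embedding of the 64\times 64 grid is real-valued and has already been computed and stored as a tensor of shape (d, 64, 64); the class name is hypothetical.

```python
# Positional embedding layer: concatenate a precomputed embedding of the 64x64 grid
# to the channels of the input, as described above.
import torch
import torch.nn as nn

class GMNOPositionalEmbedding(nn.Module):
    def __init__(self, grid_embedding):        # grid_embedding: tensor of shape (d, 64, 64)
        super().__init__()
        self.register_buffer("pos", grid_embedding)

    def forward(self, x):                      # x: (batch, channels, 64, 64)
        pos = self.pos.unsqueeze(0).expand(x.shape[0], -1, -1, -1)
        return torch.cat([x, pos], dim=1)      # (batch, channels + d, 64, 64)
```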

Embeddings from generalized means

Suppose that w is a generalized weighted hypergraph structure on a vertex set V and m:V\rightarrow(0,\infty) is a multiplicity function. Let M be a weighted mean function (for example, M could denote the weighted p-mean for some non-zero real number p). Let F:[0,\infty)^d\rightarrow[0,\infty) be a continuous function such that F(\lambda x_1,\dots,\lambda x_d)=\lambda\cdot F(x_1,\dots,x_d) whenever \lambda,x_1,\dots,x_d\geq 0.

Then define L_{M,F}(f,m,w)=\frac{M((w(s),\|\sum_{u\in\text{Dom}(s)}s(u)\cdot f(u)\|)_{s\in\text{Dom}(w)})}{F(\lambda_1,\dots,\lambda_d)} where \lambda_1,\dots,\lambda_d are the eigenvalues of the matrix \sum_{v\in V}m(v)f(v)f(v)^*.

Then if f maximizes L_{M,F}(f,m,w), we say that f is an MNO (mean of norm optimized)-hypergraph pre-embedding. If also the matrix \sum_{v\in V}m(v)f(v)f(v)^* is a diagonal matrix with increasing diagonal entries, then we say that f is an MNO hypergraph embedding. If each entry in f(v_0) is positive, then we say that f is positive for v_0.
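Here is a sketch of the generalized fitness, taking M to be the weighted p-mean and, purely for illustration, F to be the sum of the eigenvalues (i.e. the trace); these particular choices of M and F are assumptions.

```python
# Sketch of L_{M,F}(f, m, w): weighted p-mean of the norms over the hyperedges,
# divided by F applied to the eigenvalues of sum_v m(v) f(v) f(v)^T.
import numpy as np

def generalized_fitness(f, m, hyperedges, p=1.0, F=np.sum):
    norms, weights = [], []
    for s, w in hyperedges:
        combo = sum(c * f[u] for u, c in s.items())
        norms.append(np.linalg.norm(combo))
        weights.append(w)
    norms, weights = np.array(norms), np.array(weights)
    numerator = (weights @ norms**p) ** (1.0 / p)      # weighted p-mean (p != 0)
    d = len(next(iter(f.values())))
    A = np.zeros((d, d))
    for v, vec in f.items():
        A += m[v] * np.outer(vec, vec)
    return numerator / F(np.linalg.eigvalsh(A))
```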

Even though I have published this post, I will add more details to this post later.
