Viewing Matrices and Probability as Graphs (math3ma.com)
245 points by _Microft on March 6, 2019 | hide | past | favorite | 29 comments


Re viewing matrices as graphs

Under the usual matrices-as-linear-maps interpretation, an n×m matrix is a linear map from the free vector space on m elements to the free vector space on n elements, M : Vect(Y) -> Vect(X), where X and Y are as in the article.

For a regular map of finite sets Y -> X, we can draw a picture showing how each element of Y goes to an element of X by drawing a bunch of dots for Y and a bunch of dots for X with an arrow from each dot in Y to its image in X. This is a directed bipartite graph where each dot in Y is the source of exactly one edge.

How is a linear map, Vect(Y) -> Vect(X), different? Each element of Y goes not to a single element of X, but to a linear combination of elements of X. If y goes to a x1 + b x2, we can think of this as two arrows out of y, each one now being weighted, one arrow to x1 with weight a and one to x2 with weight b (considering edges with weight 0 to be non-existent if you like). This is the graph shown in the article.
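A minimal numpy sketch of this reading, with made-up weights (the names y0, x0, etc. are just labels for the dots):

```python
import numpy as np

M = np.array([[2.0, 0.0],
              [1.0, 3.0]])  # M[i, j] is the weight of the edge y_j -> x_i

# list each nonzero entry as a weighted edge from a dot in Y to a dot in X
edges = [(f"y{j}", f"x{i}", M[i, j])
         for i in range(M.shape[0])
         for j in range(M.shape[1])
         if M[i, j] != 0]  # zero-weight edges are omitted
print(edges)  # [('y0', 'x0', 2.0), ('y0', 'x1', 1.0), ('y1', 'x1', 3.0)]
```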

This interpretation shows why matrix multiplication is path composition. It's exactly analogous to how for the map of finite sets, you find the image of z by first following its arrow to y in Y, then following its arrow to x in X. Here you first follow z to all its images in Y, then you follow those to all their images in X. Relation composition is only a special case.
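A quick numpy check of this path-composition reading, with arbitrary example weights:

```python
import numpy as np

A = np.array([[1, 2],
              [0, 3]])   # edges Y -> X
B = np.array([[4, 0],
              [5, 6]])   # edges Z -> Y

# (A @ B)[i, k] sums the products of weights over every
# two-step path z_k -> y_j -> x_i:
paths = np.array([[sum(A[i, j] * B[j, k] for j in range(2))
                   for k in range(2)]
                  for i in range(2)])
assert (paths == A @ B).all()
```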

And of course the connection between XxY->R and Vect(Y)->Vect(X) is the usual correspondence between a rank (1,1) tensor and a linear map.

Re probabilities: this corresponds to imagining a random process that transforms elements of Y into elements of X where y turns into x with probability p(x,y). The probability that you end up with x (the marginal probability) is the probability that any dot in Y transformed into x, ie. sum over all the arrows that wind up at x.
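A small numpy sketch of this marginalization, with illustrative probabilities (columns of P sum to 1, and the prior over Y is made up):

```python
import numpy as np

P = np.array([[0.2, 0.7],
              [0.8, 0.3]])          # P[i, j] = probability that y_j turns into x_i
prior = np.array([0.5, 0.5])        # distribution over Y

# marginal probability of each x: sum over all arrows that wind up at x
marginal = P @ prior
print(marginal)  # [0.45 0.55]
```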


If you like this idea, also check out https://graphicallinearalgebra.net.

It also describes fundamental linear algebra operations in terms of graphs, but with a slightly different flavor, and much more detail.


thank you so much for sharing! this looks very interesting.


There is an API formalizing this style of matrix/graph computation called GraphBLAS, coming soon to a language and database near you.

Here is an easy-to-read but math-focused introduction to the topic by Dr Jeremy Kepner, Head and Founder of the MIT Lincoln Laboratory Supercomputing Center:

http://www.mit.edu/~kepner/GraphBLAS/GraphBLAS-Math-release....


What do you mean "coming soon" -- GraphBLAS 2.3.0 was just released [1] -- it's been here! ;)

[1] http://faculty.cse.tamu.edu/davis/GraphBLAS.html

[2] http://graphblas.org

For more context, depth, and previous discussions, see...

https://hn.algolia.com/?query=GraphBLAS&sort=byPopularity&pr...


It's true. I look forward to seeing more language bindings, like Python, JavaScript, etc. I expect it will be soon. For my own GraphBLAS education I have been working on a PostgreSQL extension:

https://github.com/michelp/pggraphblas


The semi-ring described at the end on Z_2 = {0,1} with the 'interesting' addition function where 1+1=1 might be more familiar here as Boolean logic, where:

0 -> False

1 -> True

x -> AND

+ -> OR

(a AND b)=True iff a=True and b=True; (axb)=1 iff a=1 and b=1

(a OR b)=False iff a=False and b=False; (a+b)=0 iff a=0 and b=0
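A tiny numpy illustration of matrix multiplication over this Boolean semiring (the matrices are arbitrary examples), spelling out OR-of-ANDs instead of sum-of-products:

```python
import numpy as np

A = np.array([[1, 1],
              [0, 1]], dtype=bool)
B = np.array([[0, 1],
              [1, 0]], dtype=bool)

# Boolean matrix product: OR plays the role of +, AND plays the role of x.
C = np.array([[np.any(A[i, :] & B[:, j]) for j in range(2)]
              for i in range(2)])
print(C.astype(int))  # [[1 1]
                      #  [1 0]]
```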


And vice versa. Take a graph, write it down as a matrix, apply a discrete Laplacian operator. Then - just as an example - find the multiplicity of the zero eigenvalue. This is exactly the number of connected components in the graph. Pure magic.
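A quick numpy check of this, on a made-up graph with two components:

```python
import numpy as np

# Two disjoint edges: 0-1 and 2-3, so two connected components.
A = np.array([[0, 1, 0, 0],
              [1, 0, 0, 0],
              [0, 0, 0, 1],
              [0, 0, 1, 0]])
L = np.diag(A.sum(axis=1)) - A          # graph Laplacian: degree matrix minus adjacency
eigvals = np.linalg.eigvalsh(L)
components = int(np.sum(np.isclose(eigvals, 0.0)))
print(components)  # 2
```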


A fun before and after for me was when trying to explore correlations over samples with a lot of features. In particular, on survey results on why teams adopt different programming languages:

V1: Correlations as an interactive heatmap --

http://lmeyerov.github.io/projects/socioplt/viz/rank.html

V2: Correlations as an interactive graph --

https://labs.graphistry.com/graph/graph.html?dataset=PyGraph...

Nowadays when we're dealing with wide event data (security, fraud, operations, etc. investigations), even more fun!


I like how the material was presented. Anyone know what program was used to create the illustrations -- these are fantastic for explanations!


I believe they are hand drawn. Check out more posts on the blog, they are similarly wonderfully presented!


There's a probability theory and associated software package that uses stochastic matrices for inference/conditioning/etc: https://efprob.cs.ru.nl/


There's also graphical linear algebra, which takes matrices into a more category-theoretic interpretation that is also visual: https://graphicallinearalgebra.net


> [...] what do matrices corresponding to equivalence relations look like?

Actually, the graphic under "Block matrices correspond to disconnected graphs" illustrates this for us already! (Just mentally replace all the non-zero entries with 1; the graphs structurally remain the same.)

I initially thought it would be "a disconnected set of cliques", but we've already established that we're modeling rectangular matrices as bipartite graphs between the input and output spaces. Square matrices in particular can be considered to operate over the same vector space, so we could have more arbitrary graphs -- such as cliques.
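A small numpy sanity check on an example equivalence relation (classes {0,1} and {2,3}), showing its matrix is block-diagonal with all-ones blocks and satisfies the relation axioms:

```python
import numpy as np

E = np.array([[1, 1, 0, 0],
              [1, 1, 0, 0],
              [0, 0, 1, 1],
              [0, 0, 1, 1]])
assert (E == E.T).all()                          # symmetric
assert (np.diag(E) == 1).all()                   # reflexive
# transitivity: composing the relation with itself relates no new pairs
assert (((E @ E) > 0).astype(int) == E).all()
```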


This seems related to the way graphs are often implemented in code - https://en.wikipedia.org/wiki/Adjacency_matrix

Just in that case you can start with any graph, not necessarily bipartite, so you don't divide vertices into 2 sets X and Y, just use the same set of all vertices V twice, so the matrix is always square:

    V * V -> R
I wonder if the properties from this article still hold on adjacency matrices. Would be very useful.


Yes, it's really the same graph. There's really no difference between drawing two copies of V with arrows going from one to the other, and drawing one copy of V with the arrows staying inside that copy. Either way, if you trace the arrows from a given vertex you will end up at the same spots.

So for example the interpretation of matrix multiplication as path composition given here explains why the nth power of the adjacency matrix gives the number of length-n walks.
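For instance (a small numpy check; the directed triangle is just an example graph):

```python
import numpy as np

# Directed 3-cycle 0 -> 1 -> 2 -> 0; A[i, j] = 1 iff there is an edge i -> j.
A = np.array([[0, 1, 0],
              [0, 0, 1],
              [1, 0, 0]])
A3 = np.linalg.matrix_power(A, 3)
print(A3)  # the identity matrix: exactly one length-3 walk from each vertex to itself
```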

(I think the precise relation is that if A is the adjacency matrix for G, then if you draw the graph from this article for the matrix A, the adjacency matrix of that graph will be the block matrix

    0 A
    A 0
That's assuming the edges in the article are regarded as undirected; if they are directed, just replace one of the As with 0.)


Heh. No one noticed but that should have been

    0 A^T
    A  0
In the matrix-as-arrows-from-dots-in-Y-to-dots-in-X picture, the transpose means "turn all the arrows around". This also explains the connection between the transpose and the categorical dual.
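A quick numpy sanity check of the corrected block form, with a made-up 2x3 matrix A (vertices ordered as the three dots of Y followed by the two dots of X):

```python
import numpy as np

A = np.array([[1, 2, 0],
              [0, 3, 4]])                 # arrows from 3 dots in Y to 2 dots in X
B = np.block([[np.zeros((3, 3), dtype=int), A.T],
              [A, np.zeros((2, 2), dtype=int)]])
# the undirected bipartite drawing has a symmetric adjacency matrix
assert (B == B.T).all()
print(B.shape)  # (5, 5)
```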


Thanks.

It's crazy to me that I had graph theory at university, and many graph algorithms, and I don't remember ever hearing that multiplying adjacency matrices has these properties.

Maybe I just forgot about it.

It would make some algorithms much easier :)


Given this seamless transition between linear algebra and graph theory, could someone summarize how exactly one or the other formulation helps to solve problems better?

I wouldn't want to read up a huge list of things, just trying to get a sense of what's being achieved by this kind of (re)formulation.


See Bayesian networks, Markov chains...


I've been thinking about this lately. Specifically, I see a connection between systems theory and graph theory. My intuition tells me that all relationships between systems can be expressed as graphs.

Can anyone confirm this? I'm not very familiar with either field, so I may be asking a basic question.


You would have to be more precise in describing what you mean by "all relationships between systems". A simple graph (meaning vertices and edges with no additional information) can encode the notion of a system and the existence of relationships between two systems, but without more structure you can't say much more. But this is already useful. If you have a big system you're trying to understand, a good first step is to draw a graph of the major components with edges indicating a dependency.


Systems interact through inputs and outputs, right? I mean, one system's output can be another system's input and even the second system's output can be the first one's input.

I guess my question was if there's something that can't be described or modelled with graphs.

As I said, my intuition is fuzzy, because I don't know much graph theory.


Is there a typo in the header "Matrices with no nonzero entires correspond to complete bipartite graphs"??

Shouldn't it be "Matrices with all nonzero entries" or "Matrices with no zero entries"?


I do recall looking up the definition of 'bijection', and seeing that it slightly resembles a graph. As a generalization: so an n-dimensional tensor can be viewed as a weighted n-partite graph?


An n-partite graph has the problem that edges can still only connect two vertices at a time. I think you would also need to include hyper-edges (faces, solids, etc). The joint probabilities would be represented by these hyper-edges, and the bounding ("lower") edges represent progressively more marginal probabilities. Higher category theory studies these kinds of objects, so it wouldn't particularly surprise me.

I wonder if simplicial complexes would work here? https://en.wikipedia.org/wiki/Simplicial_complex


No, because the space of tensors of order n with k-dimensional arguments is k^n-dimensional, while a complete n-partite graph with partitions of size k has only n(n-1)k^2/2 edges, which doesn't give you enough coefficients to uniquely identify a tensor if n > 2.

You'd need an n-uniform n-partite hypergraph, where coefficients are assigned to n-tuples instead of pairs. So for a tensor of order 3, that would mean weighted triangles, which are somewhat hard to draw, and everything gets less intuitive.
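A quick arithmetic check of the dimension count, for the illustrative case n = 3, k = 4:

```python
# order-3 tensor with 4-dimensional arguments
k, n = 4, 3
tensor_coeffs = k ** n                      # 64 independent tensor entries
pairwise_edges = n * (n - 1) // 2 * k * k   # 48 edges in the complete 3-partite graph
hyperedge_weights = k ** n                  # one weight per n-tuple: 64, a perfect match
print(tensor_coeffs, pairwise_edges)        # 64 48
```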


I'm not following. Why are M12 and M31 ignored in the first example, and why are only edges given values while vertices coordinates are ignored?


It's funny because I could write a "viewing probability graphs as matrices"

For me, probabilities are graphs, and matrices are just a way to represent them.



