Why learn Graph Theory? Here are some reasons ...

daeken · on March 16, 2010

I have to throw my personal favorites into the ring: decompilation and deobfuscation.

In either case, after a certain amount of analysis you'll end up with a set of nodes (typically basic blocks) which connect to each other, but they'll be what I call 'immature' nodes. You know they connect, but they all effectively end with gotos. To make any sense of them, you have to make them into mature nodes: if, if-else, while, do-while, for, etc. Via graph reduction this is possible.

For instance, a few years back I was reverse-engineering Apple's Fairplay DRM and ran into an odd obfuscation scheme. There were dozens of functions which had the same structure: a switch inside a while statement; a state machine. After some hand-analysis, I figured out that at the end of each state, it'd set the state variable to either a constant value (a jump) or one of two conditional values (a conditional branch). At this point, it became clear that they were simply taking basic blocks and turning them into a big obfuscated state machine.

To make this into something worthwhile, I wrote a script that would go through the function and 'execute' it, to determine the state connections automatically. Then from there, I wired them up into a big graph, with two types of nodes: Fallthrough (no conditional) and Branch (conditional). Now, that helps a lot in terms of showing you what's happening, but not nearly as much so as control flow structures do, so the battle was only half over.

Now, if you look at common control flow structures from a graph perspective, they all look pretty straightforward (using something like dot graph notation):

    A; if(X) {B} else {C} D  becomes A -> B, A -> C, B -> D, C -> D.
    A; if(X) {B} C  becomes A -> B, A -> C, B -> C
    A; while(X) {B} C  becomes A -> B, A -> C, B -> C

So if you walk over the graph looking for these patterns (only looking at single nodes, mind you), replacing them with their mature equivalents, and recursively do that until you don't change anything, you can end up with a fully deobfuscated, cleanly readable result.

tptacek · on March 16, 2010

Great story. Worth pointing out here that reversers have been running with this idea full speed since ~2000. For instance, Halvar Flake and his team at Zynamics use graph isomorphisms of basic blocks in BinDiff, which is the industry standard tool for reversing security patches back to vulnerabilities.

Obviously, anyone who's ever used IDA Pro is familiar with control flow graphs and bblocks; hit the space bar in IDA and you get it graphically.

daeken · on March 16, 2010

Yea, there's a lot of interesting things that have been done with graphs. I'm currently working on a reversing platform, and one of the core things I'm doing is elegant graph manipulation. It's amazing how much you can do when you have nice graphs, e.g. native pattern matching over graphs.

anatoly · on March 16, 2010

Most of these examples are either naturally graphs already, or on the contrary, can be presented as graphs through ingenious reductions (e.g. coloring as an approach to register optimization). So transforming the domain problem to a graph theory problem is either trivial or hard.

I found that a huge benefit of knowing some graph theory lies in being able to approach problems where this transformation is easy - neither trivial nor hard, but easy once you have some experience and intuition working with graphs. Every time you have a bunch of entities, and you can kinda think of their interdependencies as a binary relationship of some sort, there's a warning bulb going off in your brain - "is this a graph problem?". And if it is, then very possibly you immediately get lots of insight into your problem, basically for free.

This occurred to me a few weeks ago as I was thinking about this puzzle: design a circuit that inverts three inputs, but uses only two NOT gates (plus arbitrary amount of ORs and ANDs). It's an interesting puzzle to solve manually, but I also wanted to quickly write a program to find the solution. Circuits are trivially graphs, but there're too many of them (an infinity) to iterate over. Once you see the problem as a reachability problem in a certain fixed graph, however (easy, but not trivial), the code writes itself. Someone with no exposure to graph theory might easily just balk at this easy problem, not knowing how to proceed.

assemble · on March 16, 2010

I can get the circuit with 4 XORs and 1 NOT, or 3 XORs and 1 XNOR. I can't see a way to simplify to just ANDs, ORs and 2 NOTs. It's an interesting problem, were you able to find a AND, OR and 2 NOT solution? Or did you wind up using XORs?

anatoly · on March 16, 2010

No, the requirement is specifically to use just ANDs, ORs and two NOTs. There is a solution, I did find it. It's not particularly easy to find.

lmkg · on March 16, 2010

On a related note, Graph Theory is one of the top reasons to learn Linear Algebra. All graphs (included directed, weighted, and multi-graphs) can be represented intuitively by adjacency matrices, and matrix operations often end up being meaningful in terms of the graph they represent. Seeing the connections between a graph and its matrix helps to understand both of them, and being able to switch back and forth between mental models is often useful. For example, Markov Chains are the go-to tool in many fields of modeling, are most easily thought of as weighted graphs, and are most easily manipulated as matrices.

kmak · on March 16, 2010

Graph theory was easily the most practical thing I learned from programming contests. Most of the applications are "obvious", but only if you play with enough graphs.

ableal · on March 16, 2010

He misunderstood the circuits problem. It's not a graph planarity problem - anything significant is not going to be planar. Only very simple circuits are wireable with less than two layers (and then by, for instance, sneaking a couple of lines under a resistor, etc. - not planar).

But you do get to use graphs to solve the component placement problem, in heuristics that try to minimize total wire length. That hopefully means that wiring delays are smaller, and routing more feasible. One of the classic heuristics involved successive bipartions on the min-cut of the graph.

Nitpicking aside, it's a good post, well argued. My intro book recommendation: http://www.reddit.com/r/mathbooks/comments/aho2h/graph_theor...

michael_dorfman · on March 16, 2010

I like the cheap Dover intro: http://www.amazon.com/Introduction-Graph-Theory-Advanced-Mat...

antester · on March 16, 2010

If this is the same Dover book we used in our graph theory class, I found it difficult to learn from without a professor explaining the unclear sections. Not just doing the exercises, but some of the things they presented as trivial had some important steps missing.

argv_empty · on March 16, 2010

This is an NP-complete problem, so you need to know that no optimal solution can be found in a reasonable time for large problems.

Apparently, the interference graphs that arise from SSA code are a well-behaved sort of graph, colorable in polynomial time.

http://digbib.ubka.uni-karlsruhe.de/volltexte/documents/6532

mark_l_watson · on March 16, 2010

Good summary list. For most developers, there is a nice abstraction barrier for graph theory because tools like neo4j, efficient RDF stores, search libraries, etc. make it fairly easy to build apps without getting one's hands dirty in the details. (But knowing the details makes it more likely to get creative and novel apps.)

Off topic, but: since so many things are well described as graphs, the GraphViz tool is great for communication: few lines of code to write your data to a 'dot' file, and a few seconds later you have a visual representation of your data.

simonw · on March 16, 2010

What's a good curriculum / reading list for learning graph theory? What kind of things do you need to understand before you can check it off the "things to learn more about" list?

RiderOfGiraffes · on March 16, 2010

This book is on-line, free, and the few bits I dipped into are sufficiently accurate.

http://www.ecp6.jussieu.fr/pageperso/bondy/books/gtwa/gtwa.h...

Not sure what it would be like to learn from - certainly you would have to "Read Like Math" and not "Read Like Prose". You would be strongly advised to do the exercises properly and not just skim.

showerst · on March 16, 2010

I have this: http://www.amazon.com/Introductory-Graph-Theory-Gary-Chartra... and quite enjoy it, although it's fairly beginner if you're a math person.

There's also a more thorough list by a real pro under the 'network theory' section here: http://measuringmeasures.com/blog/2010/3/12/learning-about-m...

owinebarger · on March 16, 2010

I agree with his assessment. Any data representation can be construed as a graph - this is the basis of garbage collection. Besides the areas where graphs (or networks) are explicit in a problem, the most powerful use comes in analyzing a given model/algorithm. If you don't take that step to the meta-level, you might not see the use for graph theory.

j_baker · on March 16, 2010

I find that the word "dependency" is a huge indicator that I'm going to need some kind of graph theory.

petern · on March 17, 2010

Awesome post, love the pragmatic and no-bullshit approach! There are some more real world graphy projects on the Neo4j wiki, http://wiki.neo4j.org/content/Neo4j_In_The_Wild

swah · on March 16, 2010

Hm, blog has two entries, both are on HN at this moment... is it considered ok to ask for friends to post links to your blog? Just askin...

RiderOfGiraffes · on March 16, 2010

I found this entry by following the other. This one deserves to be here on merit, hence I posted it.

Disclaimer: I have no connection with the author, my PhD is in Graph Theory.

_tggb · on March 16, 2010

Perfectly fine to do; even better if you post it yourself. If the posts are interesting, they'll get upvoted. If not, they won't.

eru · on March 16, 2010

Yes, self posts are perfectly fine, as long as you don't overspam.

Daishiman · on March 16, 2010

I'm the blog post author. I just uploaded the reply to the language speed article. I was actually very surprised when I woke up today and saw thousands of entries in my access log and referrals from social news sites.

Locke1689 · on March 17, 2010

Can I just add in anything and everything having to do with networks?

I've run into a number of forest problems due to distributed systems.

wlievens · on March 16, 2010

Good post, but it's not like anyone with a bit of imagination can't come up with (most of) this list himself.

timwiseman · on March 16, 2010

This is true of most things.

A few things require true genius, but most things can be done by anyone with enough effort. But few people put in the effort so when someone does and produces good results, they deserve the credit.

sid0 · on March 16, 2010

One very cool thing I learned today is that knowledge amongst a number of people, with the help of modal logic and Kripke possible world semantics, essentially reduces to a graph reachability problem.