Would you guys mind open sourcing your data set, including the cases where you currently fail (presumably graphs with edge crossings -- I couldn't help but notice that all the examples are planar graphs)? I think I may have some ideas on how to tackle some of the unresolved issues you list at the end of the blog post.
Yeah, we intentionally omitted edge crossings. There's prior research on dealing with them, and those approaches work well (and are clever), so we thought it wouldn't be terribly useful for us to re-invent that part. The papers are linked near the end of the blog post, where we compare our approach with previous attempts.
Instead, we concentrated on different segmentation strategies (the other approaches started with a sensible binarization of the image, which precludes color-based segmentation), as well as on getting visual characteristics such as shape and color right. The algorithms still don't cope well with cases where features are much larger than we expect (e.g. photos are very different from screenshots). That'd certainly be an area for improvement.
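A toy sketch of the binarization point, not the actual pipeline: the tiny image, the threshold of 128, and the function names are all made up for illustration. It counts connected regions once after thresholding to black and white and once using the raw colors, to show how two touching nodes of different color can collapse into one blob after binarization.

```python
from collections import deque

WHITE, RED, BLUE = (255, 255, 255), (200, 30, 30), (30, 30, 200)

# Hypothetical 3x6 "screenshot": a red node touching a blue node on white.
IMAGE = [
    [WHITE, WHITE, WHITE, WHITE, WHITE, WHITE],
    [WHITE, RED,   RED,   BLUE,  BLUE,  WHITE],
    [WHITE, WHITE, WHITE, WHITE, WHITE, WHITE],
]


def binarize(image, threshold=128):
    """Mark a pixel as foreground (True) if its mean intensity is below threshold."""
    return [[sum(px) / 3 < threshold for px in row] for row in image]


def count_regions(labels, background):
    """Count 4-connected regions of equal label, skipping the background label."""
    height, width = len(labels), len(labels[0])
    seen = [[False] * width for _ in range(height)]
    regions = 0
    for y in range(height):
        for x in range(width):
            if seen[y][x] or labels[y][x] == background:
                continue
            regions += 1
            seen[y][x] = True
            queue = deque([(y, x)])
            while queue:  # breadth-first flood fill over same-label neighbors
                cy, cx = queue.popleft()
                for ny, nx in ((cy + 1, cx), (cy - 1, cx), (cy, cx + 1), (cy, cx - 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and not seen[ny][nx]
                            and labels[ny][nx] == labels[cy][cx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
    return regions


binary_regions = count_regions(binarize(IMAGE), background=False)
color_regions = count_regions(IMAGE, background=WHITE)
print(binary_regions, color_regions)
```

In this toy case the binarized image yields a single foreground region, while grouping by exact color keeps the two nodes apart. Real screenshots would need a tolerance on color distance rather than exact equality because of anti-aliasing.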
Our data set is ... basically most of what the screenshots in the article show (one of them opens an album with more images). We didn't have time for thorough testing on thousands of different graphs. That's something we'd certainly have to do if we wanted to publish anything ;-)