Would you guys mind open sourcing your data set, including the cases where you c...

ygra · on Nov 19, 2019

Yeah, we intentionally omitted edge crossings. There's prior research on handling them, and those approaches work well (and are clever), so we didn't think it would be terribly useful to reinvent that part. The papers are linked near the end of the blog post, where we compare our approach with previous attempts.

Instead, we concentrated on different segmentation strategies (the other approaches started with a sensible binarization of the image, which precludes color-based segmentation) and on getting visual characteristics such as shape and color right. The algorithms still don't handle cases well where features are much larger than we expect (e.g. photos are very different from screenshots). That would certainly be an area for improvement.
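To illustrate why binarizing first rules out color-based segmentation, here is a toy sketch. It is not the actual implementation from the blog post. It assumes pixels are plain RGB tuples, and the helpers `binarize` and `segment_by_color`, the luma threshold of 128 and the quantization step of 64 are all hypothetical choices made for illustration. A red node and a blue node both end up as "ink" after thresholding, so they can no longer be told apart, while grouping by quantized color keeps them separate.

```python
from typing import Dict, List, Tuple

RGB = Tuple[int, int, int]


def luminance(p: RGB) -> float:
    # Rec. 601 luma weights; a common way to reduce RGB to one gray value.
    r, g, b = p
    return 0.299 * r + 0.587 * g + 0.114 * b


def binarize(pixels: List[RGB], threshold: float = 128.0) -> List[int]:
    # Hypothetical helper: 1 = dark "ink" pixel, 0 = light background.
    # All color information is discarded at this step.
    return [1 if luminance(p) < threshold else 0 for p in pixels]


def segment_by_color(pixels: List[RGB], step: int = 64) -> List[int]:
    # Hypothetical helper: quantize each channel into buckets of size `step`
    # and give every distinct (r, g, b) bucket its own segment label.
    labels: Dict[RGB, int] = {}
    out: List[int] = []
    for r, g, b in pixels:
        key = (r // step, g // step, b // step)
        if key not in labels:
            labels[key] = len(labels)
        out.append(labels[key])
    return out


if __name__ == "__main__":
    # White background, a red node pixel and a blue node pixel.
    pixels = [(255, 255, 255), (200, 30, 30), (30, 30, 200)]
    print(binarize(pixels))          # [0, 1, 1]: red and blue collapse together
    print(segment_by_color(pixels))  # [0, 1, 2]: red and blue stay distinct
```

Real images would need smarter thresholds and color clustering than fixed buckets, but the loss of information from binarization is the same.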

Our data set is ... basically most of what the screenshots in the article show (one of them opens an album with more images). We didn't have time to thoroughly test thousands of different graphs. That's something we'd certainly have to do if we wanted to publish anything ;-)