
This is pretty similar to Sparkey[0] and bam[1]. Sparkey also grew out of cdb's limitations. It supports block-level compression, as Riffle does, and is optimized for bulk writes. Riffle's linear-time merge, borrowed from Sorted String Tables, is a nice alternative to accepting writes at runtime. bam is cool in that it takes a plain separated-values file as input and builds an index file from a minimal perfect hash function over that file.

[0]: https://github.com/spotify/sparkey
[1]: https://github.com/StefanKarpinski/bam
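
For anyone unfamiliar with the SSTable-style merge mentioned above, here is a rough sketch of the idea. It is not Riffle's actual code: the function name `merge_sorted_runs`, the in-memory list-of-pairs representation, and the "newer run wins on duplicate keys" rule are all assumptions made for illustration. It also assumes keys are unique within each run.

```python
def merge_sorted_runs(older, newer):
    """Merge two key-sorted lists of (key, value) pairs.

    Runs in O(len(older) + len(newer)) because each input is scanned once.
    Assumption: when both runs contain the same key, the entry from
    `newer` replaces the one from `older`.
    """
    merged = []
    i = j = 0
    while i < len(older) and j < len(newer):
        old_key = older[i][0]
        new_key = newer[j][0]
        if old_key < new_key:
            merged.append(older[i])
            i += 1
        elif old_key > new_key:
            merged.append(newer[j])
            j += 1
        else:
            # Same key in both runs: keep the newer entry, skip the older one.
            merged.append(newer[j])
            i += 1
            j += 1
    # At most one of these tails is non-empty; it is already sorted.
    merged.extend(older[i:])
    merged.extend(newer[j:])
    return merged
```

Because both inputs are already sorted, no hashing or random access is needed during the merge, which is what makes rebuilding the file in bulk a reasonable substitute for accepting individual writes at runtime.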



There are a lot of variants on this design out there. I had seen 'bam' but not the Spotify implementation. An additional constraint that I didn't mention in the post was avoiding JNI, which adds some nasty failure modes for remote installations that can be very hard to debug. This meant any C implementation was off-limits for us.

It's unfortunate that using the JVM means that some wheels need to be reinvented, but those are the breaks, I guess.



