
"While memory-mapping is used for the hashtable, values are read directly from disk, decoupling our I/O throughput from how much memory is available."

Whether you're mmap'ing or using read(), you're hitting the page cache before you hit disk, and potentially evicting its LRU page. Glancing through the source, it doesn't look like they're using actual "direct I/O" (which, to be performant, would need its own caching layer).

That being the case, for lots of tiny reads & writes I'd expect mmap to be superior to read() and write().
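For anyone who wants to try this, here is a minimal Python sketch of the two access paths being compared. It is not taken from the project under discussion. It assumes a file of hypothetical fixed-size 64-byte values and a Unix system (os.pread is Unix-only). Neither path uses O_DIRECT, so both are served from the OS page cache. The difference is that pread() costs one system call per lookup, while the mmap slice is a plain memory access after the first page fault.

```python
import mmap
import os
import random
import tempfile

VALUE_SIZE = 64  # hypothetical fixed value size, for illustration only


def read_with_pread(fd: int, offset: int, size: int) -> bytes:
    # One pread() system call per lookup; the kernel copies the bytes
    # out of the page cache (reading from disk first on a miss).
    return os.pread(fd, size, offset)


def read_with_mmap(mm: mmap.mmap, offset: int, size: int) -> bytes:
    # A plain memory access: the first touch of a page raises a page fault
    # that maps it in from the same page cache; later touches need no syscall.
    return mm[offset:offset + size]


def demo(num_values: int = 1000, lookups: int = 100) -> list:
    """Write random values to a temp file, then look them up both ways."""
    results = []
    with tempfile.TemporaryFile() as f:
        f.write(os.urandom(num_values * VALUE_SIZE))
        f.flush()
        fd = f.fileno()
        with mmap.mmap(fd, 0, access=mmap.ACCESS_READ) as mm:
            for _ in range(lookups):
                offset = random.randrange(num_values) * VALUE_SIZE
                results.append((read_with_pread(fd, offset, VALUE_SIZE),
                                read_with_mmap(mm, offset, VALUE_SIZE)))
    return results


pairs = demo()
print(all(a == b for a, b in pairs))  # prints True: same bytes either way
```

The sketch only checks that the two paths return identical data. Timing each function on a file much larger than RAM is one way to test the performance claim on your own hardware.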



A caching layer for random reads where the dataset is 10x larger than memory isn't hugely useful. If you get a hit, great, but you can't count on it.
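The hit-rate point can be made concrete with a small simulation. It assumes uniformly random lookups and an LRU cache holding one tenth of the keys; both are illustrative assumptions, not measurements from the post. Under uniform access every cached key is equally likely to be requested, so once the cache is full the hit rate is roughly cache_size / num_keys. Here that is about 10%, regardless of the replacement policy.

```python
import random
from collections import OrderedDict


def lru_hit_rate(num_keys: int, cache_size: int, accesses: int, seed: int = 0) -> float:
    """Simulate an LRU cache under uniformly random key lookups."""
    rng = random.Random(seed)
    cache = OrderedDict()  # keys ordered from least to most recently used
    hits = 0
    for _ in range(accesses):
        key = rng.randrange(num_keys)
        if key in cache:
            hits += 1
            cache.move_to_end(key)  # mark as most recently used
        else:
            cache[key] = None
            if len(cache) > cache_size:
                cache.popitem(last=False)  # evict the least recently used key
    return hits / accesses


# Dataset 10x larger than the cache: expect a hit rate close to 0.1.
print(f"{lru_hit_rate(num_keys=10_000, cache_size=1_000, accesses=200_000):.3f}")
```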

For memory mapping to make sense, you need to fetch a big chunk of data, whereas read() gets a page's worth. For the data size and read pattern described in the post, the latter is much more desirable.


It's the opposite. As I said, read() and a read from an mmap'd array both hit the OS page cache first, and will bring in ~4K of data on a miss; read() also has the overhead of a system call. For tiny reads/writes the advice is to use mmap. This is different if you're doing "direct I/O" and bypassing the OS page cache because you have your own caching layer, but I don't think they do.
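The "~4K on a miss" arithmetic is the same for both paths. The kernel moves data between disk and the page cache in whole pages, so a tiny value costs at least one page of I/O on a cold cache, and two if it straddles a page boundary. Here is a small sketch of that arithmetic. It assumes the page size reported by Python's mmap.PAGESIZE (commonly 4096). It ignores readahead and fault-around, which often fetch more. The function names are illustrative.

```python
import mmap

PAGE_SIZE = mmap.PAGESIZE  # commonly 4096 on x86-64 Linux


def pages_touched(offset: int, length: int, page_size: int = PAGE_SIZE) -> range:
    """Indices of the pages that a read of `length` bytes at `offset` falls into."""
    if length <= 0:
        return range(0)
    first = offset // page_size
    last = (offset + length - 1) // page_size
    return range(first, last + 1)


def bytes_faulted_in(offset: int, length: int, page_size: int = PAGE_SIZE) -> int:
    # On a cold page cache each touched page is read from disk in full,
    # whether the access came from read() or from touching an mmap'd region.
    return len(pages_touched(offset, length, page_size)) * page_size
```

For example, `bytes_faulted_in(100, 64, page_size=4096)` returns 4096, while `bytes_faulted_in(4090, 10, page_size=4096)` returns 8192 because the 10 bytes span two pages.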


Whose advice? Check out RocksDB's front page: http://rocksdb.org. In my experience, what you're saying isn't borne out empirically; rather, mmap should be used when there's decent coherency w.r.t. the available memory. Without knowing what you're basing your belief on, I really can't address it.


It looks like they are making some claims about OS-level bottlenecks, specifically in the virtual memory subsystem. This is something I'd like to look into; all I can find is that particular quote, with no explanation of where they think the bottleneck actually lies. The experience of, e.g., the SQLite folks seems to be different.

https://www.sqlite.org/mmap.html



