
> In particular, binary search has awful cache locality.

That makes me think of an idea which I'm certain isn't original, but I can't recall the jargon for it.

Suppose you have a fixed-length sorted array, whose binary search can be visualized as a tree. Traverse that tree in level order and pop it into a helper array, arranged much like a heap: the midpoint comes first, and the left and right children follow at offsets you can predict.

So the original array [A,B,C,D,E,F,G] gets a search structure of [(D,3), (B,1), (F,5), (A,0), (C,2), (E,4), (G,6)].

While the deeper items get spread out, the most frequently visited entries cluster together near the low indexes, minimizing large jumps early in the search. Since each entry stores its original index, you can always switch back to the initial array when a different strategy (e.g. a linear scan) becomes preferable.
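
Roughly, the construction and lookup could look like this (a quick C++ sketch of the idea as described above, not anyone's library code; the names are made up for illustration):

    // Rough sketch: build the level-order helper array of
    // (value, original index) pairs from a sorted input, then
    // search it with heap-style index arithmetic (2k+1 / 2k+2).
    #include <cstddef>
    #include <utility>
    #include <vector>

    // Walk the sorted array in order while filling slot k, then its
    // children. Returns the next unconsumed sorted index i.
    template <typename T>
    std::size_t build(const std::vector<T>& sorted,
                      std::vector<std::pair<T, std::size_t>>& eyt,
                      std::size_t i = 0, std::size_t k = 0) {
        if (k < sorted.size()) {
            i = build(sorted, eyt, i, 2 * k + 1);      // left subtree first
            eyt[k] = {sorted[i], i};                   // midpoint of this subrange
            i = build(sorted, eyt, i + 1, 2 * k + 2);  // then right subtree
        }
        return i;
    }

    // Returns the original index of the first element >= x,
    // or (size_t)-1 if every element is smaller.
    template <typename T>
    std::size_t lower_bound_eyt(const std::vector<std::pair<T, std::size_t>>& eyt,
                                const T& x) {
        std::size_t k = 0, best = static_cast<std::size_t>(-1);
        while (k < eyt.size()) {
            if (eyt[k].first < x) {
                k = 2 * k + 2;            // go right, no new candidate
            } else {
                best = eyt[k].second;     // remember candidate, go left
                k = 2 * k + 1;
            }
        }
        return best;
    }

    // Usage:
    //   std::vector<char> sorted = {'A','B','C','D','E','F','G'};
    //   std::vector<std::pair<char, std::size_t>> eyt(sorted.size());
    //   build(sorted, eyt);
    //   lower_bound_eyt(eyt, 'E');  // -> 4, E's index in the original array

For the example above, build() produces exactly the [(D,3), (B,1), (F,5), ...] layout, and lower_bound_eyt(eyt, 'E') touches D, F, E in order and returns the original index 4.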



Take a look at "Array Layouts for Comparison-Based Searching". TL;DR: there are schemes like this, called "Eytzinger" and "van Emde Boas".

https://arxiv.org/ftp/arxiv/papers/1509/1509.05053.pdf

The topic of arranging keys like this within index pages (or maybe elsewhere) has come up on the pgsql-hackers mailing list before:

https://www.postgresql.org/message-id/flat/3B774C9E-01E8-46A...


Here is a writeup of the idea and some numbers.

http://bannalia.blogspot.com/2015/06/cache-friendly-binary-s...


Yes, this is a nice trick. You can use all kinds of implicit search trees for this. See also my reply to a sibling comment, which links to a paper evaluating the performance of this approach, including the time it takes to compute the new layout: https://news.ycombinator.com/item?id=18881938



