An Introduction to Cache-Oblivious Data Structures (rcoh.me)
218 points by rusbus on Feb 6, 2018 | 31 comments



For everyone interested in this, I would like to recommend the MIT Advanced Data Structures course by Prof. Erik Demaine (mentioned in the article). The part of the course about cache-oblivious data structures starts with this lecture:

https://www.youtube.com/watch?v=V3omVLzI0WE


Great suggestion -- Demaine's lecture videos were so illuminating for me when I studied cache-oblivious algos. Here's a corresponding paper describing the cache-oblivious Funnelsort algorithm: http://erikdemaine.org/papers/BRICS2002/paper.pdf


I'd like to second that; Demaine is an awesome teacher.


I watched the lectures last week but found them difficult to understand. Can you recommend any beginner data structures course?


As @Cursuviam said, Introduction to Algorithms is a pretty good introduction:

https://www.youtube.com/playlist?list=PLUl4u3cNGP61Oq3tWYp6V...

There are two things that make the lectures difficult to follow, IMO:

The first thing is that you need to be comfortable with the names given to things and have a good grasp of the simpler data structures: different kinds of trees, linked lists, graphs and so on, along with the running times of their operations. Go watch Algorithms and you'll be fine :)

The second is that the subject is really hard. For most of the course we're learning about (close to) state-of-the-art data structures, so don't expect to understand everything in one go.

If you want to understand it in depth, you'll probably have to spend more time studying each lecture. Try thinking about how you'd solve the problem yourself, then work through the proposed solution, going through the video slowly and making sure you understand the details, and try to implement at least some of the structures.

I used these lectures as a way to get comfortable with the field and get a broad overview of the kinds of solutions that usually work, so I still have to do most of that in-depth second pass.


Why not Introduction to Algorithms? (also taught by Demaine)


Trivia: why are they called "cache-oblivious" and not "cache-aware"?

The paper that coined the term is here: http://cacs.usc.edu/education/cs653/Frigo-CacheOblivious-FOC...

Basically, cache-aware algorithms came first: they assumed specific cache sizes and other properties of the machine. "Cache-oblivious" algorithms were a refinement that performs well across many cache sizes without knowing them in advance.
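
To make the distinction concrete, here's a minimal Python sketch of both styles applied to matrix transposition (one of the examples in the Frigo et al. paper above). The BLOCK constant and the base-case cutoff are illustrative, not tuned for any real cache:

    BLOCK = 64  # cache-aware: a tile size chosen to fit a known cache

    def transpose_aware(a, b, n):
        # Cache-aware: iterate in BLOCK x BLOCK tiles sized for the cache.
        for i0 in range(0, n, BLOCK):
            for j0 in range(0, n, BLOCK):
                for i in range(i0, min(i0 + BLOCK, n)):
                    for j in range(j0, min(j0 + BLOCK, n)):
                        b[j][i] = a[i][j]

    def transpose_oblivious(a, b, i0, i1, j0, j1):
        # Cache-oblivious: recursively split the larger dimension in half.
        # Once a subproblem fits in *any* level of cache it stays there,
        # yet no cache size appears anywhere in the code.
        di, dj = i1 - i0, j1 - j0
        if di + dj <= 16:  # small base case; exact cutoff doesn't matter
            for i in range(i0, i1):
                for j in range(j0, j1):
                    b[j][i] = a[i][j]
        elif di >= dj:
            mid = (i0 + i1) // 2
            transpose_oblivious(a, b, i0, mid, j0, j1)
            transpose_oblivious(a, b, mid, i1, j0, j1)
        else:
            mid = (j0 + j1) // 2
            transpose_oblivious(a, b, i0, i1, j0, mid)
            transpose_oblivious(a, b, i0, i1, mid, j1)

The cache-aware version is fastest when BLOCK matches the actual cache, but degrades when it doesn't; the recursive version adapts to every cache level at once.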

All in all, it's silly that the "cache-oblivious" term is the one that survived, because now "cache-unaware" and "cache-oblivious" algorithms mean opposite things, contradicting the dictionary definition of oblivious.


It all becomes clearer if you replace it in your head with "cache-size-oblivious" algorithms.


Nope. Still sounds like "having no awareness of cache sizes and lacking a strategy for dealing with them."

No matter how you prefix it, 'oblivious' means you are missing something you are expected to be responsible for. It is a word expressing negative sentiment.


I've read about this for a few mins now, and want to check if my understanding is correct.

There are these classes of algorithms:

1. Cache-aware algos are tailored to a specific environment's cache behavior. They may perform poorly in other environments.

2. Cache-unaware algos do not consider cache behavior at all.

3. Cache-oblivious algos take near-maximal advantage of cache behavior, regardless of what that behavior is.

4. ???? algos, when given cache behavior as input, behave as cache-aware algos, and as cache-unaware ones otherwise.


4. Cache adaptive algos?


see: Clipless Pedals


Serverless computing?


Wouldn't a cache-aware data structure depend on assuming certain cache sizes and other properties, to match a cache-aware algorithm?


Another fun one is radix sort[1] (see the sketch below). We used to use it for sorting alpha'd quads (think particle systems/UI/etc.) from front to back. It easily annihilates anything else in terms of raw performance for datasets under 10,000 elements with int/f32 keys, and it's stable(!) for identical values.

This talk[2] from Herb Sutter (starting at 22:30) is one of my favorite talks and does a great job of explaining why this is so important (and why dense arrays are still king in many areas).

[1] http://stereopsis.com/radix.html

[2] https://channel9.msdn.com/Events/Build/2014/2-661
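
For anyone curious, here's a minimal Python sketch of the idea: a stable LSD radix sort over 32-bit keys, plus the standard float-to-sortable-int flip described in the stereopsis article. The quad/depth usage at the end is hypothetical:

    import struct

    def float_flip(f):
        # Map a float32 to a uint32 whose unsigned order matches float order.
        i = struct.unpack('<I', struct.pack('<f', f))[0]
        # Negative floats: flip all bits; positive: just set the sign bit.
        return (i ^ 0xFFFFFFFF) if (i >> 31) else (i | 0x80000000)

    def radix_sort(items, key):
        # Stable LSD radix sort: four passes over 8-bit digits.
        for shift in (0, 8, 16, 24):
            buckets = [[] for _ in range(256)]
            for item in items:
                buckets[(key(item) >> shift) & 0xFF].append(item)
            items = [item for bucket in buckets for item in bucket]
        return items

    # Usage: sort quads front to back by a (hypothetical) depth field.
    # quads = radix_sort(quads, key=lambda q: float_flip(q.depth))

Each pass is stable, so ties keep their relative order across passes, which is where the stability guarantee comes from.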


The article looks great, but a doubling of speed actually seems kind of disappointing (what I'd like to see is something like a "big O" improvement).

I have been investigating cache-oblivious data structures for a while. They're impressive, complex and a bit difficult to fathom: for a cache-oblivious tree, you have to consider both the relational structure created by pointers and the layout of all the pointers in memory. (The article seems to give a nicer overview than the academic articles I've looked at.)

The thing is, in most of the large applications I've tried to optimize, you could squeeze several speed doublings out of fairly easy inner-loop rewrites as soon as you started profiling the app. After that, you got more marginal results from optimizing this and that. Consider: if 25% of the time is spent retrieving data, halving that portion only cuts total running time by 12.5%. Is that 12.5% worth the complexity of implementing a cache-oblivious structure? Even if your application is a full database, I'm not sure the trade-offs are worth it.


The advantage of squeezing more performance out of data structures is that you can package them up in standard libraries and provide that performance for potentially thousands of applications out there. Inner-loop rewrites are great but they are tied to a specific application.


Cache-friendly algorithms tend to give on the order of a 10-50x speed improvement, so they're definitely worth it. The problem is that once you've built your app, you've locked yourself into an architecture that can't be changed enough to take advantage of them.

Dig up some of the posts on "data-oriented design" from the gamedev space if you want to see it done properly; the sketch below shows the core idea.
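
For a flavor of what that looks like, here's a tiny, hypothetical sketch of the array-of-structs (AoS) vs struct-of-arrays (SoA) idea. In real gamedev code these would be contiguous native arrays rather than Python lists, but the access-pattern point is the same: when a hot loop touches only a couple of fields, SoA keeps exactly those fields contiguous in memory:

    # AoS: fields of each particle are interleaved; a position update
    # drags every particle's unrelated fields through the cache too.
    class Particle:
        def __init__(self, x, vx, color):
            self.x, self.vx, self.color = x, vx, color

    def step_aos(particles, dt):
        for p in particles:
            p.x += p.vx * dt

    # SoA: one dense array per field; the update streams through only
    # the data it actually needs.
    class ParticlesSoA:
        def __init__(self, n):
            self.x = [0.0] * n
            self.vx = [0.0] * n
            self.color = [0] * n

    def step_soa(ps, dt):
        for i in range(len(ps.x)):
            ps.x[i] += ps.vx[i] * dt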


It actually may be an order of magnitude, as you suggest: the difference gets bigger as the tree gets bigger. I just stopped benchmarking at about 300MB of data, where the perf was almost 2x. If you run it on a 1GB tree it might be 4x.


Sure, maybe. The problem is that this doesn't seem enough to compensate for the added complexity.

The thing is, a self-balancing binary tree is so much faster than the alternatives for large amounts of data that you really have to use one. Here, I'm not so sure.

Anyway, thinking about this led me to this discussion about fractal trees and LSM trees:

https://www.quora.com/What-are-the-major-differences-between...

Another question here is how this will play out if high-performance DBs end up moving to GPUs, where cache levels and scans would be quite different.


> Sure, maybe. The problem is that this doesn't seem enough to compensate for the added complexity.

A 50% improvement could be massively significant if you're processing a huge data-set on thousands of nodes and it's still taking days/weeks/more of wall-clock time. Given the choice between waiting longer, buying more compute power, or using a more complex algorithm, the best choice will depend a lot on the circumstances of the project. If it's a one-off, then waiting or paying for power will likely win out. Of course, in some circumstances paying for more power won't help: you can only get so much out of each node, and for some processes factors like interconnect throughput and latency might eat into the ROI significantly as the number of nodes grows.

In some cases a 10% or smaller boost might be worth the extra complexity. And if the algorithm can be generalised and packaged as a library, you don't have to deal with that complexity each time the method is useful.


Neat--I wrote a C++ implementation of a cache-oblivious (implicit) van Emde Boas binary search tree a [long] while back:

https://github.com/lwu/veb-tree/blob/master/veboas.cpp


Dumb question: is it appropriate to describe fractal trees as an "extreme" version of a log-structured merge tree? At a glance (i.e. if you squint really hard), the buffer flushing process in a fractal tree seems kind of similar to the tablet merging process in LSM trees in that they both batch operations to amortize secondary memory access costs, except fractal trees generalize it to all operations.


Yes. Additionally, fractal trees have a predictable and compact memory structure, while LSM trees are more sparse. The average case for queries is also better for the fractal structure, and the implementation is actually easier.
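
For intuition, here's a minimal, hypothetical sketch of that buffering idea (nothing like Tokutek's actual implementation): writes accumulate in per-node buffers and are pushed down a level in batches, so a single flush amortizes the secondary-storage cost over many messages. BUFFER_SIZE is illustrative:

    import bisect

    BUFFER_SIZE = 4  # messages a node may hold before flushing

    class Node:
        def __init__(self, pivots=None, children=None):
            self.buffer = []                # pending (key, value) messages
            self.pivots = pivots or []      # split keys; len == len(children) - 1
            self.children = children or []  # empty list means this is a leaf

        def insert(self, key, value):
            self.buffer.append((key, value))
            if len(self.buffer) > BUFFER_SIZE and self.children:
                self.flush()

        def flush(self):
            # Push every buffered message one level down in a single batch;
            # this may recursively trigger flushes further down the tree.
            for key, value in self.buffer:
                child = self.children[bisect.bisect_right(self.pivots, key)]
                child.insert(key, value)
            self.buffer = []

    # Usage: a two-level toy tree with a single pivot at key 100.
    # root = Node(pivots=[100], children=[Node(), Node()])
    # for k in range(20): root.insert(k, 'v')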


Really enjoyed this post, thanks.

I'm not sure I really understood how the cache-oblivious solution works in this case, though; I can't get an intuition for how the code linked on GitHub relates to the description in the article.


OP here. It is admittedly a little hairy. The place to start is https://github.com/rcoh/treelayout/blob/8adb9bb9727bdaee671d...

The rough idea is that we lay out the top chunk, then the bottom chunks, and then wire them together. There's a sketch of that layout step below.
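
For anyone trying to build that intuition, here's a minimal Python sketch of the idea (not the actual treelayout code): recursively emit the top half of the tree, then each bottom subtree, so every recursive chunk ends up contiguous. The "wiring" step in a real implementation would record where each bottom chunk starts so the top chunk's leaves can find their children:

    def height(node):
        # Nodes are (key, left, right) tuples; a single node has height 1.
        if node is None:
            return 0
        _, left, right = node
        return 1 + max(height(left), height(right))

    def veb_layout(node, out):
        # Append the keys of node's subtree to out in van Emde Boas order.
        h = height(node)
        if h <= 1:
            if node is not None:
                out.append(node[0])
            return
        top_h = h // 2  # cut the tree roughly in half by height
        bottoms = []    # roots of the subtrees hanging below the cut

        def cut(n, depth):
            # Copy the top top_h levels; collect bottom-subtree roots.
            if n is None:
                return None
            if depth == top_h:
                bottoms.append(n)
                return None
            key, left, right = n
            return (key, cut(left, depth + 1), cut(right, depth + 1))

        top = cut(node, 0)
        veb_layout(top, out)   # lay out the top chunk...
        for b in bottoms:
            veb_layout(b, out) # ...then each bottom chunk in turn

    # Usage on a complete height-3 tree:
    # tree = (4, (2, (1, None, None), (3, None, None)),
    #             (6, (5, None, None), (7, None, None)))
    # out = []
    # veb_layout(tree, out)  # -> [4, 2, 1, 3, 6, 5, 7]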


Do fractal tree indexes lend themselves to copy-on-write designs? (B-trees do.)

EDIT: Also, are fractal tree indexes roughly like COW b-trees?


Are fractal tree indexes in the public domain, or are they patented?


They are patented but there is a patent grant in place.

https://github.com/Tokutek/ft-index/blob/master/PATENTS

Disclaimer: IANAL. Consult proper legal professionals etc.


One thing I've always wondered about patent grants in open source: it seems to say only that it's okay to use this library, not necessarily to implement the patented algorithm yourself.

Open sourcing the code and including a patent grant probably signal that the owner is benevolent and isn't likely to sue you for re-implementing their algorithm, but I don't think that's spelled out anywhere.


In the linked document they go out of their way to specify that the patent grant only applies to "THIS IMPLEMENTATION" of the algorithm. I'm not one hundred percent sure what that means with regard to reimplementing the algorithm.



