Alignment and other optimisations that increase size tend to show their greatest benefit in microbenchmarks; once the working set no longer fits in cache, the extra cache misses start to erode that performance gain.
That's highly dependent on access patterns and domain space.
When I was optimizing render sorting on mobile hardware, we knew we would never see 2k+ draw calls, and cache-aware tuning gave us huge end-user-facing benefits.
I've long had the idea for trying to write a Valgrind tool to help with this by analyzing struct usage. Something to profile how hot and cold the various fields of my structs are, and also to correlate which fields in a struct are frequently accessed together (i.e., within N cycles of each other). A tool for the profile part of "profile before optimizing" to go with the optimizations you mentioned.
I'm not sure how feasible this is. But if someone else wants to steal this idea and implement it for me, be my guest. :-)
The problem with this kind of instrumentation is that it is very expensive to collect, which affects the data collected in a way that may skew it from true runtime performance. Maybe that is still good enough! (It also feels difficult to implement.)
He shows the complete opposite: how to separate data so that fields won't land on the same cache line. That is of course nonsense for single-threaded access, but beneficial for concurrent access, where it avoids false sharing.