Cache line alignment in C++ – How it makes your program faster (ryonaldteofilo.medium.com)
49 points by mfiguiere on Sept 16, 2023 | 9 comments


Alignment and other optimisations that increase size tend to show their greatest benefit in microbenchmarks, but as the data grows and no longer fits in the cache, you'll find that the extra cache misses start to reduce performance.


That's highly dependent on access patterns and domain space.

When I was optimizing render sorting on mobile hardware, we knew we would never see more than 2k draw calls, and cache-aware tuning gave us huge end-user-facing benefits.


Exactly. I find that splitting a struct into two structs based on memory access patterns (one for the fields used all the time, the other for the rest of the data) is a huge win.


I've long had the idea of writing a Valgrind tool to help with this by analyzing struct usage: something to profile how hot and cold the various fields of my structs are, and also to correlate which fields in a struct are frequently accessed together (i.e., within N cycles of each other). A tool for the profiling part of "profile before optimizing", to go with the optimizations you mentioned.

I'm not sure how feasible this is. But if someone else wants to steal this idea and implement it for me, be my guest. :-)


The problem with this kind of instrumentation is that the data is very expensive to collect, and that overhead may skew it away from true runtime performance. Maybe that is still good enough! (It also feels difficult to implement.)


He shows the complete opposite: how to separate data so that it won't end up on the same cache line. That is of course nonsense for single-threaded access, but beneficial for concurrent access.


When the data no longer fits in the cache, you also need a cache prefetching strategy, if your access pattern allows one at all.


From what I understand, GCC has some compiler flags to automate this (-falign-*): https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html


Currently those GCC flags are only about instruction alignment. The article shows a simple example of runtime data alignment.



