Hacker News

Struct of arrays (called MultiArrayList in Zig): instead of storing big structs in an array, you store each field in a separate array, and if you need the full struct you reconstruct it from the arrays.

The benefit is that each array's items are smaller and carry no padding, and it also improves cache locality when you only touch some of the fields.




Now I remember reading a blog post recently, about a library (one of the Rust ones, maybe?) that was doing both: array-of-structs-of-arrays.

The idea was to split the main arrays up into “chunks” that were structs, themselves containing rows of data (still array oriented).

The result is that you retain the cache/layout/performance benefits of the SoA layout, except each chunk is already packaged up into lengths ready for pushing through SIMD. Quite a clever idea.

Found it: https://www.rustsim.org/blog/2020/03/23/simd-aosoa-in-nalgeb...


Sounds like a primitive of columnar databases.


Entity-Component Systems do exactly this for better cache locality.


this is a great way to store structured data in js since it saves the memory cost of having repeated keys. e.g.:

  // struct of arrays: one array per field
  records = {
    time: [1000, 1001],
    price: [20, 25],
    volume: [50, 15]
  }

  // vs. array of structs: keys repeated per record
  records = [
      { time: 1000, price: 20, volume: 50 },
      { time: 1001, price: 25, volume: 15 }
  ]
  // not a big difference with 2 records, but for xxxx records...


Decent JS engines will use "hidden classes" to dedupe keys for you already, so this isn't necessary to save space; the technique is pretty old and dates to Self. Still, the arrangement may help with locality of reference.


In practice, most js engines these days can ‘recognise’ the ‘class’ of these objects (if you create them the same way in a few places): the memory representation ends up as one word for the ‘class’, which says that time is at field 0, price at field 1, and volume at field 2, followed by the data itself. The main reason for this is to speed up code that reads the fields, rather than to save memory.


I remember BASIC in the 80s when all I had were arrays of integers or strings. To have complex data structures I used one array per field. I don't think the CPU had cache though (Z80) /grin


Wouldn’t that hurt locality? Since now you need to do multiple accesses across the entire heap to reconstruct one object.


It depends. For operations or aggregates on a single field, it improves cache locality, whereas if operations act on all, or most, fields of the struct, it hurts cache locality. The exact same tradeoff differentiates OLTP (transactional/row stored) databases and OLAP (analytics/column stored) databases.


You wouldn't typically use this when you need lots of random accesses, but when you process the data sequentially.

It also simplifies and speeds up SIMD loads and stores, because you can fill entire SIMD registers with contiguous, non-strided memory accesses.





