Generally the secret sauce in these things is the query optimiser heuristics.
The actual data structures and algorithms are often relatively simple.
Having said that, I’ve read their whitepaper on how they implement hash tables, and… it’s way more complex than I had assumed.
They cater for scenarios like many duplicated keys, parallel construction, unbalanced load across CPU cores, etc…
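
To make a couple of those scenarios concrete, here's a toy sketch of my own (not what the whitepaper describes): a partitioned hash table build where duplicated keys share a single entry holding a list of rows. The names `build_partitioned`, `probe` and `partition_sizes` are made up for illustration, and the partitions are filled sequentially here, whereas a real engine would presumably hand each one to a different core.

```python
from collections import defaultdict


def build_partitioned(rows, key, num_partitions=4):
    """Build one small hash table per partition; each maps key -> list of rows."""
    partitions = [defaultdict(list) for _ in range(num_partitions)]
    for row in rows:
        k = row[key]
        # Rows with the same key always land in the same partition,
        # and duplicates are appended to one shared list.
        partitions[hash(k) % num_partitions][k].append(row)
    return partitions


def probe(partitions, k):
    """Return all build-side rows matching k (empty list if none)."""
    # .get() avoids creating an empty entry in the defaultdict on a miss.
    return partitions[hash(k) % len(partitions)].get(k, [])


def partition_sizes(partitions):
    """Number of rows per partition; a lopsided result means skewed load."""
    return [sum(len(v) for v in p.values()) for p in partitions]
```

If most rows share one key, `partition_sizes` shows a single partition holding nearly everything, which is the unbalanced-load problem in miniature. Handling that well is where the real complexity comes in.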