And of course there is no branching in MemSQL for this use case. There is also no hashing, because the number of groups is small, so you can use an array rather than a hash table.
Finally, if you compress the data rather than summing over an uncompressed array, you get a much more compact data representation, which keeps you from hitting the memory bandwidth ceiling this quickly (4 threads).
When you say an array and not a hash table, do you just mean a simple perfect hash table indexed by the offset of the dictionary id? We use this fairly extensively for inputs with a bounded domain (e.g. dictionary-encoded strings, moderately sized integer ranges, even binned values, numeric or timestamp), but we call it perfect hashing. I assume we're talking about the same thing, but I wanted to clarify.
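To make the question concrete, here is a minimal sketch of what "an array instead of a hash table" could look like for a COUNT(*) ... GROUP BY over a dictionary-encoded column. This is an illustration only, not MemSQL's or anyone else's actual implementation; the name `count_by_group` and its parameters are made up for the example, and it assumes every dictionary id is already known to fall in [0, num_groups).

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: COUNT(*) ... GROUP BY on a dictionary-encoded column.
// Assumption: every value in dict_ids is in the range [0, num_groups).
std::vector<std::uint64_t> count_by_group(const std::vector<std::uint32_t>& dict_ids,
                                          std::size_t num_groups) {
    // One counter per possible dictionary id, so the id itself is the slot.
    std::vector<std::uint64_t> counts(num_groups, 0);
    for (std::uint32_t id : dict_ids) {
        // Direct index: no hash function, no probing, no per-row conditional.
        ++counts[id];
    }
    return counts;
}
```

Whether you call this an array lookup or a perfect hash with the identity function, the inner loop is the same: one load, one increment, one store per row.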
I’m still of the opinion that it’s important to demonstrate performance on more complex queries with joins, subqueries, subselects, and clustered data movements. A count(*) with a group by is a very, very simple case.
Yes. Branching will absolutely hurt. The good old X100 paper teaches how to avoid branching: http://cidrdb.org/cidr2005/papers/P19.pdf.
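For anyone who hasn't seen the trick, here is a rough sketch of a filter written with and without a data-dependent branch. It shows predication as a general technique, not the exact code from the paper or from any engine; the function names and the "less than a bound" predicate are assumptions for the example.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Branching version: the `if` depends on the data, so the CPU has to
// predict it for every row. Returns the number of selected rows.
std::size_t select_less_than_branching(const std::vector<std::int32_t>& col,
                                       std::int32_t bound,
                                       std::vector<std::uint32_t>& out) {
    out.resize(col.size());
    std::size_t n = 0;
    for (std::size_t i = 0; i < col.size(); ++i) {
        if (col[i] < bound) {
            out[n++] = static_cast<std::uint32_t>(i);
        }
    }
    out.resize(n);
    return n;
}

// Predicated version: always write the row index, then advance the output
// cursor by 0 or 1 depending on the comparison result.
std::size_t select_less_than_predicated(const std::vector<std::int32_t>& col,
                                        std::int32_t bound,
                                        std::vector<std::uint32_t>& out) {
    out.resize(col.size());
    std::size_t n = 0;
    for (std::size_t i = 0; i < col.size(); ++i) {
        out[n] = static_cast<std::uint32_t>(i);         // unconditional store
        n += static_cast<std::size_t>(col[i] < bound);  // 1 if the row qualifies, else 0
    }
    out.resize(n);
    return n;
}
```

Both versions produce the same selection vector. The predicated one does a little more work per row but has no branch whose outcome depends on the data, which tends to matter most when selectivity is near 50% and the branch is hard to predict. A compiler may already turn the first form into the second, so it's worth measuring before assuming a difference.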