it depends on your query obviously. In general, I did very deep benchmarking of ...

arp242 · on Aug 4, 2023

"2-5x times slower" can mean the difference from 2 seconds to 4 to 10 seconds. Two seconds is still (barely) acceptable for interactive usage, ten seconds: not so much. You're also going to need less beefy servers, or fewer servers.

I also "just" use PostgreSQL for all of this by the way, but the limitations are pretty obvious. You're much more limited in what you can query with good performance, unless you start creating tons of queries or pre-computed data and such, which have their own trade-offs. Columnar DBs are "overhyped" in the sense that everything in programming seems to be, but they do exist for good reasons (the reason I don't use it are because they also come with their own set of downsides, as well as just plain laziness).

zX41ZdbW · on Aug 3, 2023

ClickHouse can do large GROUP BY queries, not limited by memory: https://clickhouse.com/docs/en/sql-reference/statements/sele...

riku_iki · on Aug 3, 2023

as explained in https://github.com/ClickHouse/ClickHouse/issues/47521#issuec... it can't, that parameters only applies on pre aggregation phase but not aggregation.

Feature request is not implemented yet: https://github.com/ClickHouse/ClickHouse/issues/40588

zX41ZdbW · on Aug 4, 2023

ClickHouse uses "grace hash" GROUP BY with the number of buckets = 256.

It can do size about 256 times larger than a memory because only one bucket has to be in memory while merging. It works for distributed query processing as well and is enabled by default.

About the linked issue - it looks like it is related to some extra optimization on top of what already exists.

riku_iki · on Aug 4, 2023

> only one bucket has to be in memory while merging.

its hard for me to judge about implementation details, but per that person reply memory is also multiplied by number of threads which do aggregation.

benn0 · on Aug 3, 2023

Do you have happen to have any documentation about your benchmarking? I'm also considering these options at the moment (currently using pg+timescaledb) and interested in what you found.

riku_iki · on Aug 3, 2023

I don't have documentation.

I just created large tables, and tried to join, group by, sort them in pg, clickhouse, duckdb, looked what failed or being slow, and tried to resolve it.

I am happy to answer specific questions, but I didn't use timescaledb.