The referenced paper on uniserve: https://petereliaskraft.net/res/uniserve.pdf is interesting, but seems to focus on systems where storage and compute are colocated, but it doesn't discuss (or maybe I skimmed too quickly) more modern architectures where compute and storage are separated (usually with a caching layer built into the compute nodes). In those architectures, most concerns about shifting data around at query time are moot.
Also in my experience building the scatter-gather query functionality and re-aggregation is usually the easiest part. The hard part is figuring out how to build fair multi-tenancy and QoS into what is essentially a massively parallel user facing real-time data lake.
That's a great point, and I definitely agree that supporting disaggregated architectures is important and a potential next step for the project. It raises new challenges--systems like Snowflake need to know a lot about how data is represented on disk in order to efficiently move it around--but it ought to be possible to define new abstractions for those representations (or reuse existing ones) in a way that cuts across a lot of systems.
Also in my experience building the scatter-gather query functionality and re-aggregation is usually the easiest part. The hard part is figuring out how to build fair multi-tenancy and QoS into what is essentially a massively parallel user facing real-time data lake.