It's a cool example, but really an antipattern. Nowadays everyone gets that analysts want access to raw data, since they know best which aggregations they need, while data engineers stay away from pre-aggregating and focus on building self-service data access tooling. Win-win this way.
How about building a DuckDB-accessible catalog on top of S3? Instead of calling read_parquet, you would select from tables that are themselves mapped to S3 paths, aka external tables.
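
Rough sketch of what I mean, in Python. One way to get the "external table" feel is a plain view per dataset that wraps read_parquet. The table names, the bucket and the paths below are made up, and `view_statements` / `register_catalog` are just helper names I picked for illustration, not anything DuckDB ships.

```python
# Hypothetical catalog: table name -> S3 glob. Bucket and paths are invented.
CATALOG = {
    "events": "s3://example-bucket/events/*.parquet",
    "users": "s3://example-bucket/users/*.parquet",
}


def view_statements(catalog):
    """Build one CREATE VIEW statement per catalog entry, sorted by table name."""
    statements = []
    for table, path in sorted(catalog.items()):
        # Rough guard: the name is interpolated unquoted, so keep it a plain identifier.
        if not table.isidentifier():
            raise ValueError(f"unsafe table name: {table!r}")
        # Double any single quotes so the path stays a valid SQL string literal.
        escaped = path.replace("'", "''")
        statements.append(
            f"CREATE OR REPLACE VIEW {table} AS "
            f"SELECT * FROM read_parquet('{escaped}')"
        )
    return statements


def register_catalog(con, catalog):
    """Run the statements on a connection-like object that has .execute()."""
    for statement in view_statements(catalog):
        con.execute(statement)
```

With the duckdb Python package you would pass in something like `duckdb.connect("catalog.duckdb")` and call `register_catalog(con, CATALOG)`. After that analysts can run `SELECT ... FROM events` without knowing the paths. This assumes the httpfs extension is loaded and S3 credentials are configured on that connection. A view only stores the query, so the data stays in S3 and is read at query time.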