

amluto 3 months ago | on: DuckLake is an integrated data lake and catalog fo...

But I *want* a file per range! I’m already writing out an entire chunk of rows, and that chunk is a good size for a Parquet file, and that chunk doesn’t overlap the previous chunk. Sure, metadata in the Parquet file handles this, but a query planner has to read that metadata, whereas putting the range in the file path in some sensible way would let the planner skip the file without reading it at all.

mrlongroots 3 months ago

I have the same gripe. You want a canonical standard that's like "hive partitioning" but defines the range [val1, val2) as column=val1_val2. It's a trivial addition on top of Parquet.
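To make the idea concrete, here is a rough Python sketch of what such a convention could look like. Everything in it is an assumption for illustration: the `column=val1_val2` spelling is taken from the comment above, but the helper names (`range_path`, `parse_range`, `prune`), the integer-only bounds, and the `part-0.parquet` file name are hypothetical and not part of any existing standard or library.

```python
import re
from typing import List, Optional, Tuple

# Hypothetical convention: a path segment "<column>=<lo>_<hi>" means the file
# only holds rows with lo <= column < hi (half-open range, integer bounds).
_RANGE_RE = re.compile(r"(?P<col>[^/=]+)=(?P<lo>-?\d+)_(?P<hi>-?\d+)")


def range_path(column: str, lo: int, hi: int, filename: str = "part-0.parquet") -> str:
    """Build a path that encodes the range [lo, hi) for one column."""
    if lo >= hi:
        raise ValueError("range must be non-empty: lo < hi")
    return f"{column}={lo}_{hi}/{filename}"


def parse_range(path: str, column: str) -> Optional[Tuple[int, int]]:
    """Return (lo, hi) for `column` if the path encodes one, else None."""
    for segment in path.split("/"):
        match = _RANGE_RE.fullmatch(segment)
        if match and match.group("col") == column:
            return int(match.group("lo")), int(match.group("hi"))
    return None


def prune(paths: List[str], column: str, query_lo: int, query_hi: int) -> List[str]:
    """Keep only files whose range can overlap the query range [query_lo, query_hi).

    Files with no range in their path are kept, because nothing can be
    concluded about them without opening the file.
    """
    kept = []
    for path in paths:
        bounds = parse_range(path, column)
        if bounds is None or (bounds[0] < query_hi and query_lo < bounds[1]):
            kept.append(path)
    return kept
```

With files written as `ts=0_100/...`, `ts=100_200/...`, and `ts=200_300/...`, a query for `ts` in [150, 250) would only need to open the last two. The pruning decision uses nothing but the path strings, which is the point of the proposal.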

amluto 3 months ago

That would do the trick, as would any other spelling of the same thing.
