I'm not super sure about it being in the filename, if only because my understanding is that some of the lakes use it for partitioning and other metadata (metameta-data?).
Imo range is probably the most useful statistic in a folder/file name anyways for partitioning purposes. My vote would be for `^` as the range separator to minimize risk of collision and confusion. i.e. `timestamp=2025-03-27T00:00:00-0800^2025-03-30-0700` or `hour=0^12`,`hour=12^24`. `^` is valid across all systems, and I'd be very surprised if it was commonly used as a property/column name. Only collision I can think of is that its start-of-line in regex
Maybe convince DuckDB and/or clickhouse-local and/or polars.scan_parquet to implement it as a pilot? If it's a success, other tools might follow suit.
Or maybe something like DuckLake could have an option to put column statistics in the filenames. I raised this as a discussion:
https://github.com/duckdb/ducklake/discussions/92