On that subject, I've often thought of writing a combined decompressor/JSON parser that can parse the contents of the compression dictionary before decompression, for faster JSON parsing and lower memory usage.
Thanks, but that wasn't quite what I meant. What I was thinking about was processing existing gzipped or zstd-compressed JSON, rather than inventing a new compression format.
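To make that concrete, here's a minimal sketch (in Go, purely as an illustration) of the kind of thing I mean: stream the decompressed bytes straight into an incremental JSON decoder, so neither the decompressed file nor the whole document ever sits in memory at once. The `events.json.gz` filename and the `id` field are made up, and it assumes the file is a stream of concatenated or newline-delimited JSON objects:

```go
package main

import (
	"compress/gzip"
	"encoding/json"
	"fmt"
	"io"
	"log"
	"os"
)

func main() {
	// Open an existing gzip-compressed JSON file (hypothetical name).
	f, err := os.Open("events.json.gz")
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	// Streaming decompressor: decompressed bytes are produced on demand,
	// never materialised as one big buffer. For zstd you could swap in a
	// streaming zstd reader (e.g. from a third-party package) here.
	zr, err := gzip.NewReader(f)
	if err != nil {
		log.Fatal(err)
	}
	defer zr.Close()

	// Incremental JSON decoder reading straight from the decompressor,
	// so only one record is held in memory at a time.
	dec := json.NewDecoder(zr)
	for {
		var rec map[string]any
		if err := dec.Decode(&rec); err == io.EOF {
			break
		} else if err != nil {
			log.Fatal(err)
		}
		fmt.Println(rec["id"]) // hypothetical field, just to show access
	}
}
```

That covers the memory side with existing formats; going further and pre-parsing the compression dictionary itself, as suggested above, would need much deeper integration with the decompressor internals.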
I could see how customising the compressor could be helpful - a bit like how there's an `--rsyncable` option to `gzip` - but I'd like to keep compatibility with existing formats. I like to avoid coming up with new formats whenever possible, because you might want to mine the data in 20 years' time - and if it's a custom format, it'll be much harder to deal with.
I don't have a current use case for this, but I like thinking about it. In particular, I like data stores to be as simple as possible. Managing persistent state is the hard bit of most systems: unlike runtime state or code, you have to manage forwards and backwards compatibility, migrations, etc., and you can't just "turn it off and on again" to fix issues.
A lot of document and key/value databases that store arbitrary JSON tend to work this way (for example, Cassandra does this, IIRC). Some will even automatically build indices for you (though hand-tuning is usually better).