I mean, sure, this is compression in the sense that I can send you a tiny "compressed text" and all you need to decompress it is this multi-gigabyte model!
But we don’t usually count the size of the decompression program when evaluating compression ratio. E.g., 7-Zip is about 1 MB, but you don’t factor that in when judging a particular 7z file.
Maybe not multi-gigabyte, but within a year you'll basically be guaranteed to find at least one small model on any new system or phone. We may even get some "standard" model everyone can reliably use as a reference.
At that point it would be useful. Although I wonder if it wouldn’t make more sense to train one specifically for the job: current LLMs can predict HTML, sure, but they’re large and slow for the task.
There's an existing idea along these lines called shared dictionary compression: everyone pre-agrees on a dictionary that captures statistical priors about the data, and both compressor and decompressor use it to improve the compression ratio.
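To make that concrete, here's a minimal sketch using the preset-dictionary (`zdict`) support in Python's standard zlib module. The shared dictionary and the HTML snippets below are made up for illustration; in practice the dictionary would be built from representative samples of the data being exchanged.

```python
# Minimal sketch of shared-dictionary compression with Python's standard
# library zlib. The dictionary and documents are made-up placeholders.
import zlib

# The pre-agreed "prior": bytes both sides already have before any message
# is sent. Only documents that resemble it benefit.
SHARED_DICT = (
    b'<!DOCTYPE html><html><head><meta charset="utf-8"><title></title>'
    b'</head><body><div class="content"><p></p></div></body></html>'
)

def compress(data: bytes) -> bytes:
    """Sender side: compress against the shared dictionary."""
    c = zlib.compressobj(level=9, zdict=SHARED_DICT)
    return c.compress(data) + c.flush()

def decompress(blob: bytes) -> bytes:
    """Receiver side: only works if you have the same dictionary."""
    d = zlib.decompressobj(zdict=SHARED_DICT)
    return d.decompress(blob) + d.flush()

doc = (
    b'<!DOCTYPE html><html><head><meta charset="utf-8"><title>Hi</title>'
    b'</head><body><div class="content"><p>Hello</p></div></body></html>'
)

with_dict = compress(doc)
plain = zlib.compress(doc, 9)
assert decompress(with_dict) == doc
print(f"original: {len(doc)} B, plain zlib: {len(plain)} B, with shared dict: {len(with_dict)} B")
```

Brotli takes this to its logical conclusion by shipping a built-in static dictionary derived from web content, which is roughly the "standard model everyone has" idea in miniature.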