I mean, sure, this is compression in the sense that I can send you a tiny "compressed text" and all you need to decompress it is this multi-gigabyte model!
But we don’t usually count the size of the decompression program when evaluating compression ratio. E.g., 7-Zip is about 1 MB, but you don’t factor that in when judging a particular 7z file.
Maybe not multi-gigabyte, but within a year you'll basically be guaranteed to find at least one small model on any new system or phone. We may even get some "standard" model everyone can reliably use as a reference.
At that point it would be useful. Although I wonder if it wouldn’t make more sense to train one specifically for the job: current LLMs can predict HTML, sure, but they’re large and slow for the task.
There's an existing idea along these lines called shared dictionary compression: everyone pre-agrees on a dictionary that captures statistical priors about the data, and both compressor and decompressor use it to improve the compression ratio.
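To make that concrete, here's a minimal sketch using the preset-dictionary (`zdict`) support in Python's standard zlib module. The shared dictionary and the HTML snippets below are made up for illustration; in practice the dictionary would be built from representative samples of the data being exchanged.

```python
# Minimal sketch of shared-dictionary compression with Python's standard
# library zlib. The dictionary and documents are made-up placeholders.
import zlib

# The pre-agreed "prior": bytes both sides already have before any message
# is sent. Only documents that resemble it benefit.
SHARED_DICT = (
    b'<!DOCTYPE html><html><head><meta charset="utf-8"><title></title>'
    b'</head><body><div class="content"><p></p></div></body></html>'
)

def compress(data: bytes) -> bytes:
    """Sender side: compress against the shared dictionary."""
    c = zlib.compressobj(level=9, zdict=SHARED_DICT)
    return c.compress(data) + c.flush()

def decompress(blob: bytes) -> bytes:
    """Receiver side: only works if you have the same dictionary."""
    d = zlib.decompressobj(zdict=SHARED_DICT)
    return d.decompress(blob) + d.flush()

doc = (
    b'<!DOCTYPE html><html><head><meta charset="utf-8"><title>Hi</title>'
    b'</head><body><div class="content"><p>Hello</p></div></body></html>'
)

with_dict = compress(doc)
plain = zlib.compress(doc, 9)
assert decompress(with_dict) == doc
print(f"original: {len(doc)} B, plain zlib: {len(plain)} B, with shared dict: {len(with_dict)} B")
```

Brotli takes this to its logical conclusion by shipping a built-in static dictionary derived from web content, which is roughly the "standard model everyone has" idea in miniature.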