When can we bring this to the web? Zstd, aka RFC 8478[1], is so good. That it can continue to improve at all feels almost unbelievable, but @Cyan4973 et al. continue to make it faster, somehow.
Especially on mobile, with large assets, I feel like zstd's lightning fast decompression time could be a huge win. It used to be that Brotli was the obvious choice for achieving high compression, but it doesn't feel so clear to me now. Here's one random run-off between the two[2].
The other obvious use case is if there is large-ish dynamic-ish data, where the cost of doing Brotli compression each time might be too high.
Brotli is such an ugly hack (hardcoded dictionary with a snapshot of the world as it looked from Mountain View on some random day...), the quicker it dies the better.
The contents of that hardcoded dictionary are really weird, too. It includes a lot of bizarrely specific entries like:
in popular culture
Holy Roman Emperor
It is important to
examples include the
have speculated that
aria-hidden="true">·<
(new Date).getTime()}
:url" content="http://
Many of the English phrases (like "in popular culture"!) are highly characteristic of content from the English Wikipedia. I've speculated before that this may be the result of the use of a compression benchmark which included (or consisted entirely of) content from that site, such as https://cs.fit.edu/~mmahoney/compression/textdata.html
If you're curious, here's a full dump of the dictionary:
However, the idea seems sound? I can't help wondering why Google, who can fund GPT-3-scale projects, couldn't come up with a better initial dictionary. Especially for small responses it is a big win.
The static dictionary is not why Brotli reaches excellent compression density. Much of it comes from the more advanced context modeling and, in general, more dynamic entropy modeling.
Brotli wins over Zstd on web compression even when you fill the static dictionary with zeros.
No Mountain View-based engineers participated in the development of Brotli; it was built in Zurich. The static dictionary was distilled from a language corpus originally covering 40 languages. To reduce the binary size while keeping effectiveness, I reduced it to six: English, Spanish, Chinese, Russian, Hindi, Arabic. What got into the dictionary, and in which order (both words and transforms), was chosen by how much entropy each entry reduced in a relatively large test corpus of diverse material (for example, over 110 natural languages).
Brotli was originally designed as a faster-to-decode replacement for LZMA for fonts, to enable WOFF2; LZMA was too slow for that use. WOFF2 is just geometry, no natural languages there. W3C observed that Brotli performed similarly to LZMA in density but was significantly (~5x) faster to decode -- which enabled the WOFF2 launch.
Unlike SDCH, Brotli's performance does not degrade when the static dictionary ages. This is because Brotli only uses invalid LZ77 commands to codify static dictionary entries. If you don't use them, you are not paying for them. Just like Zstd, users can bring their own dictionaries when they prefer that -- a functionality that was in Brotli before it was introduced in Zstd.
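If anyone wants to try the bring-your-own-dictionary route, here's a minimal sketch with the zstd CLI (file names here are made up; assumes you have a directory of small, similar samples):

    # Train a shared dictionary from many small, similar samples
    # (e.g. JSON API responses); the output name is arbitrary.
    zstd --train samples/*.json -o api.dict

    # Compress and decompress with the same dictionary on both sides.
    zstd -D api.dict response.json -o response.json.zst
    zstd -D api.dict -d response.json.zst -o response.json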
Compared to Zstd, Brotli degrades less on languages heavy in multi-byte UTF-8 (Korean, Vietnamese, Chinese, Japanese, etc.), and especially on mixed data such as HTML commands interleaved with such UTF-8 text.
Unlike LZMA, Brotli's context modeling works for very short data, too -- it becomes about 0.6% worse on gigabyte corpora, but can be 15+% better on the shortest documents.
Unlike Zstd, Brotli delivers the data in a streamable order, i.e., it hides less data during the transfer. The user can decode much more out of a partial Brotli stream than out of the corresponding fraction of a Zstd stream (the shadowed amount is tens of bits vs. tens of kilobytes in Zstd). This is because Brotli does not reshuffle data within blocks to save CPU work at decoding time. Any shadowing of data makes dependent resource loads start later, or dependent processing (JavaScript or HTML parsing) happen later.
Most of Brotli's advances come from algorithms other than LZ77. The LZ77 parts of Zstd and Brotli are essentially the same. LZ77 is proven optimal when both the data and the sliding window are infinitely long -- this means that the longer the data, the less difference you see. If you benchmark Brotli vs. Zstd with real-life data (like HTML pages of 75 kB), you see different performance behaviour than if you compare them using a 100 MB or 10 GB file, where most of Brotli's benefits disappear.
Zstd's encoder has seen more work over the years, and some features (like long-distance matching) are missing from Brotli. Reaching parity in the encoder could reduce the gap. An interesting option would be to have a single encoder for both.
Compiling is also difficult: one continual topic has been performance degradations when moving from GCC/Clang to MSVC. Brotli was never optimized for MSVC, and it seems something is going badly wrong there. Also, in the past several benchmarks compared a non-optimized build of Brotli against a release build of Zstd. This was because until summer 2020, Brotli compiled without options produced a non-optimized build -- you needed to configure a release build specifically to get it right.
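For what it's worth, a sketch of getting an optimized build with CMake (standard CMake flags, nothing Brotli-specific; newer checkouts may default to an optimized build anyway):

    git clone https://github.com/google/brotli.git
    cd brotli && mkdir out && cd out
    # Without an explicit build type, older checkouts produced an unoptimized build.
    cmake -DCMAKE_BUILD_TYPE=Release ..
    cmake --build .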
You cite RFC 8478 as though it being an RFC gives it weight, but always remember when looking at IETF RFCs to check the Category, which is Informational in this case, not Standards Track.
> Despite use of the word "standard" as part of its name, readers are advised that this document is not an Internet Standards Track specification; it is being published for informational purposes only.
This is an IETF-stream document, which I believe (though I’m not certain) implies that it’s at least a little more than just Facebook saying “here, can we publish this?”, but it hasn’t come through any working group, so it hasn’t been subject to the full IETF experience which is so helpful at improving things by collaboration.
(It’s also obsoleted by RFC 8878, to which the same caveats apply.)
> It’s also obsoleted by RFC 8878, to which the same caveats apply.
Tangent: I decided to sit down and read that RFC yesterday. So I downloaded the PDF to my e-reader (because it doesn't support mono fonts for the plain-text option). Turns out my Kobo H2O for some reason can't render the "fi" and "ff" ligatures, which made it a real struggle to read at times ("compressed les", "Human Encoding Trees").
Does anyone have any advice for how to turn plaintext RFCs into minimal PDFs with an embedded mono font?
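One approach that might work (I haven't verified it on a Kobo): wrap the RFC text in a verbatim block and let pandoc/xelatex embed a real mono font. Assumes pandoc, xelatex, and the DejaVu fonts are installed:

    # Wrap the plain text so pandoc treats it verbatim, then embed the font.
    { printf '```\n'; cat rfc8878.txt; printf '```\n'; } > rfc8878.md
    pandoc rfc8878.md -o rfc8878.pdf \
      --pdf-engine=xelatex \
      -V monofont="DejaVu Sans Mono" -V geometry:margin=2cm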
Zstd is so much better than the commonly-used alternatives that I get mildly annoyed when given a .tar.{gz,xz,bz2}. It's not like it's a huge deal, but a much smaller file (compared to gz), or a similarly sized one with much faster decompression (compared to xz, bz2), just makes me a tiny bit happier.
I agree with the general premise that there's no reason to ever use gzip anymore (unless you're in an environment where you can't install stuff), but interestingly my experience with the tradeoffs is apparently not the same as yours. I tend to find that zstd and gzip give pretty similar compression ratios for the things I tend to work with, but that zstd is way faster, and that xz offers better compression ratios than either, but is slow. So like, my personal decision matrix is "if I really care about compression, use xz; if I want pretty good compression and great speed -- that is, if before I would have used gzip -- use zstd; and if I really want the fastest possible speed and can give up some compression, use lz4."
1. There are two speeds: compression and decompression; lz4 only beats zstd when decompressing ("zstd -1" will compress faster than lz4, and you can crank that up several levels and still beat lz4_hc on compression). bzip2 is actually fairly competitive at compression for the ratios it achieves but loses badly at decompression.
2. "zstd --ultra -22" is nearly identical compression to xz on a corpus I just tested (an old gentoo distfiles snapshot) while decompressing much faster (I didn't compare compression speeds because the files were already xz compressed).
[edit]
Arch Linux (which likely tested a larger corpus than I did) reported a 0.8% regression in size when switching from xz to zstd at compression level 20. This supports your assertion that xz will beat zstd in compression ratio.
[edit2]
bzip2 accidentally[1] handily outperforms all the other compression algorithms I've tried on large files that are all zeros; for example, 1GB of zeroes with "dd if=/dev/zero bs=$((1024*1024)) count=1024 |bzip2 -9 > foo.bz2" produces a file that is only 785 bytes. zstd gives 33k and xz 153k. Of course, my non-codegolfed script for generating 1GB of zeros is only 38 bytes...
1: There was a bug in the original BWT implementation that had degenerate performance on long strings of identical bytes, so bzip2 includes an RLE pass before the BWT.
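If you want to reproduce the comparison yourself (exact sizes will vary by version, and I'm guessing at the zstd/xz levels used above):

    # 1GB of zeros through each compressor; compare output sizes.
    dd if=/dev/zero bs=$((1024*1024)) count=1024 | bzip2 -9 | wc -c
    dd if=/dev/zero bs=$((1024*1024)) count=1024 | zstd | wc -c
    dd if=/dev/zero bs=$((1024*1024)) count=1024 | xz | wc -c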
Maybe in a generic-you sense ("one also cares"), but if by "you" you mean me, no, most of my compression needs are in situations where I control both the compression and decompression sides of the interaction, e.g., deciding how to store business data at rest on s3, and debating the tradeoffs between cost of space, download time, and decompression time/CPU use. We migrated a bunch of workflows at my last job from gzip to either lz4 or zstd to take advantage of better tradeoffs there, and if I were building a similar pipeline from scratch now, gzip would not be a contender. Adding an extra dependency to my application is pretty trivial, in exchange for shaving ten minutes' worth of download and decompression time off of every CI run.
Is this the same "zstd" compression used in Fedora's btrfs transparent block level compression? I have been thoroughly impressed with it in Fedora 34. If that's true, I had no idea that it was a Facebook project. Color me shocked.
> In June 2012, Chris Mason left Oracle for Fusion-io, which he left a year later with Josef Bacik to join Facebook. While at both companies, Mason continued his work on Btrfs.[27][17]
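It is the same algorithm; the kernel carries a port of the zstd library. For anyone wanting to try it, btrfs exposes it as a mount option (the per-level syntax needs kernel 5.1+; the device and mount point here are placeholders):

    # Mount a btrfs filesystem with zstd compression at level 3.
    mount -o compress=zstd:3 /dev/sda2 /mnt

    # Or persist it in /etc/fstab (UUID elided):
    # UUID=...  /mnt  btrfs  compress=zstd:1  0 0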
> Single file Libs
> This move reflects a commitment on our part to support this tool and this pattern of using zstd going forward.
I love that they are moving toward supporting an amalgamation build. I and many others reach for SQLite because of this feature, and I think this will really increase the adoption of Zstd.
Glad to hear it! It's a pretty hefty single file, so it probably won't be qualifying for https://github.com/nothings/single_file_libs anytime soon... but hopefully people find it useful nonetheless.
Fun fact: when we started compressing our analytical events (furry JSON arrays with lots of context), we dropped about 60% of our total inbound traffic. I was afraid that compression would eat the client's battery, but we eventually got the reverse effect: saving energy on the radio, whether WiFi or 4G/5G.
For background transfers, reducing processing time can also maximize sleep/low-power modes (Doze is one of them on Android, but I think there are some other low-power states too).
There's some interesting (imo) research about optimizing CPU governors on Android that details some of the tradeoffs (like how running a single core at high power might in some cases beat running longer at low power, if the core can be hotplugged/disabled sooner).
Speaking of context, sincere question: what does "furry" indicate in this one? Because I'm guessing it's not about people who like to dress up as animals, but I'm also not sure what the metaphor of "hairy" data represents. Noise? Roughness?
NB: English is my second language. "Furry" is a superlative form of "hairy" for me.
Above I meant that our analytical events are not short bits of data like "main screen open at 11:25:15 UTC", but a bunch of stuff, most of it repeating in every element of the array: phone OS, vendor, version, connection type, session time, previous screen, etc., all grouped into nested objects. No wonder it compresses down to next to nothing.
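A toy illustration of the effect (the event shape is invented for the example):

    # Generate 1000 near-identical analytics events and compare sizes.
    for i in $(seq 1000); do
      printf '{"os":"Android 12","vendor":"Acme","conn":"wifi","screen":"main","t":%d}\n' "$i"
    done > events.json
    wc -c < events.json
    zstd -q events.json -o events.json.zst
    wc -c < events.json.zst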
zstd performance is great on really old and weak CPUs, at its default compression level. I recently tested it with a whole system file tree on a 2005 vintage Pentium M single core (thinkpad x41 tablet) running a bare bones debian install, and was pleasantly surprised.
[1] https://datatracker.ietf.org/doc/html/rfc8478
[2] https://peazip.github.io/fast-compression-benchmark-brotli-z...