When can we bring this to the web? Zstd, aka RFC 8478[1], is so good. That it can continue to improve at all feels almost unbelievable, but @Cyan4973 et al. continue to make it faster, somehow.
Especially on mobile, with large assets, I feel like zstd's lightning fast decompression time could be a huge win. It used to be that Brotli was the obvious choice for achieving high compression, but it doesn't feel so clear to me now. Here's one random run-off between the two[2].
The other obvious use case is if there is large-ish dynamic-ish data, where the cost of doing Brotli compression each time might be too high.
Brotli is such an ugly hack (hardcoded dictionary with a snapshot of the world as it looked from Mountain View on some random day...), the quicker it dies the better.
The contents of that hardcoded dictionary are really weird, too. It includes a lot of bizarrely specific entries like:
in popular culture
Holy Roman Emperor
It is important to
examples include the
have speculated that
aria-hidden="true">·<
(new Date).getTime()}
:url" content="http://
Many of the English phrases (like "in popular culture"!) are highly characteristic of content from the English Wikipedia. I've speculated before that this may be the result of the use of a compression benchmark which included (or consisted entirely of) content from that site, such as https://cs.fit.edu/~mmahoney/compression/textdata.html
If you're curious, here's a full dump of the dictionary:
However, the idea seems sound? I can't help wondering why Google, who can fund GPT-3-scale projects, couldn't come up with a better initial dictionary. Especially for small responses it is a big win.
The static dictionary is not why Brotli reaches excellent compression density. Much of it comes from the more advanced context modeling and, in general, more dynamic entropy modeling.
Brotli wins over Zstd on web compression even when you fill the static dictionary with zeros.
No Mountain View-based engineers participated in the development of Brotli; it was built in Zurich. The static dictionary was distilled from a language corpus originally covering 40 languages. To reduce the binary size while keeping effectiveness, I reduced it to six: English, Spanish, Chinese, Russian, Hindi, Arabic. What got into the dictionary, and in which order (both words and transforms), was chosen by how much entropy each entry reduced in a relatively large test corpus of diverse material (for example, over 110 natural languages).
Brotli was originally designed as a faster-to-decode replacement for LZMA for fonts, to enable WOFF2; LZMA was too slow for that use. WOFF2 is just geometry, no natural languages there. W3C observed that Brotli performed similarly to LZMA in density but was significantly (~5x) faster to decode -- which enabled the WOFF2 launch.
Unlike SDCH, Brotli's performance does not degrade when the static dictionary ages. This is because Brotli only uses invalid LZ77 commands to codify static dictionary entries. If you don't use them, you are not paying for them. Just like Zstd, users can bring their own dictionaries when they prefer that -- a functionality that was in Brotli before it was introduced in Zstd.
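If anyone wants to try the bring-your-own-dictionary route, here's a minimal sketch with the zstd CLI (file names here are made up; assumes you have a directory of small, similar samples):

    # Train a shared dictionary from many small, similar samples
    # (e.g. JSON API responses); the output name is arbitrary.
    zstd --train samples/*.json -o api.dict

    # Compress and decompress with the same dictionary on both sides.
    zstd -D api.dict response.json -o response.json.zst
    zstd -D api.dict -d response.json.zst -o response.json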
Compared to Zstd, Brotli degrades less on languages heavy in multi-byte UTF-8 (Korean, Vietnamese, Chinese, Japanese, etc.), and especially on mixed data such as HTML commands interleaved with such UTF-8 text.
Unlike LZMA, Brotli's context modeling works for very short data, too -- it becomes about 0.6% worse on gigabyte corpora, but can be 15+% better on the shortest documents.
Unlike Zstd, Brotli delivers the data in a streamable order, i.e., it hides less data during the transfer. The user can decode much more out of a partial Brotli stream than out of the corresponding fraction of a Zstd stream (the shadowed amount is tens of bits vs. tens of kilobytes in Zstd). This is because Brotli does not reshuffle data within blocks to save CPU work at decoding time. Any shadowing of data makes dependent resource loads start later, or dependent processing (JavaScript or HTML parsing) happen later.
Most of Brotli's advances come from algorithms other than LZ77. The LZ77 parts of Zstd and Brotli are essentially the same. LZ77 is proven optimal when both the data and the sliding window are infinitely long -- this means that the longer the data, the less difference you see. If you benchmark Brotli vs. Zstd with real-life data (like HTML pages of 75 kB), you see different performance behaviour than if you compare them using a 100 MB or 10 GB file, where most of Brotli's benefits disappear.
Zstd's encoder has seen more work over the years, and some features (like long-distance matching) are missing from Brotli. Reaching parity in the encoder could reduce the gap. An interesting option would be to have a single encoder for both.
Compiling is also difficult: one continual topic has been performance degradations when moving from GCC/Clang to MSVC. Brotli was never optimized for MSVC, and it seems something is going badly wrong there. Also, in the past several benchmarks compared a non-optimized build of Brotli against a release build of Zstd. This was because until summer 2020, Brotli compiled without options produced a non-optimized build -- you needed to configure a release build specifically to get it right.
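For what it's worth, a sketch of getting an optimized build with CMake (standard CMake flags, nothing Brotli-specific; newer checkouts may default to an optimized build anyway):

    git clone https://github.com/google/brotli.git
    cd brotli && mkdir out && cd out
    # Without an explicit build type, older checkouts produced an unoptimized build.
    cmake -DCMAKE_BUILD_TYPE=Release ..
    cmake --build .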
You cite RFC 8478 as though it being an RFC gives it weight, but always remember when looking at IETF RFCs to check the Category, which is Informational in this case, not Standards Track.
> Despite use of the word "standard" as part of its name, readers are advised that this document is not an Internet Standards Track specification; it is being published for informational purposes only.
This is an IETF-stream document, which I believe (though I’m not certain) implies that it’s at least a little more than just Facebook saying “here, can we publish this?”, but it hasn’t come through any working group, so it hasn’t been subject to the full IETF experience which is so helpful at improving things by collaboration.
(It’s also obsoleted by RFC 8878, to which the same caveats apply.)
> It’s also obsoleted by RFC 8878, to which the same caveats apply.
Tangent: I decided to sit down and read that RFC yesterday. So I downloaded the PDF to my e-reader (because it doesn't support mono fonts for the plain-text option). Turns out my Kobo H2O for some reason can't render the "fi" and "ff" ligatures, which made it a real struggle to read at times ("compressed les", "Human Encoding Trees").
Does anyone have any advice for how to turn plaintext RFCs into minimal PDFs with an embedded mono font?
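One approach that might work (I haven't verified it on a Kobo): wrap the RFC text in a verbatim block and let pandoc/xelatex embed a real mono font. Assumes pandoc, xelatex, and the DejaVu fonts are installed:

    # Wrap the plain text so pandoc treats it verbatim, then embed the font.
    { printf '```\n'; cat rfc8878.txt; printf '```\n'; } > rfc8878.md
    pandoc rfc8878.md -o rfc8878.pdf \
      --pdf-engine=xelatex \
      -V monofont="DejaVu Sans Mono" -V geometry:margin=2cm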
Zstd is so much better than the commonly-used alternatives that I get mildly annoyed when given a .tar.{gz,xz,bz2}. It's not like it's a huge deal, but a much smaller file (compared to gz), or a similarly sized one with much faster decompression (compared to xz, bz2), just makes me a tiny bit happier.
I agree with the general premise that there's no reason to ever use gzip anymore (unless you're in an environment where you can't install stuff), but interestingly my experience with the tradeoffs is apparently not the same as yours. I tend to find that zstd and gzip give pretty similar compression ratios for the things I tend to work with, but that zstd is way faster, and that xz offers better compression ratios than either, but is slow. So like, my personal decision matrix is "if I really care about compression, use xz; if I want pretty good compression and great speed -- that is, if before I would have used gzip -- use zstd; and if I really want the fastest possible speed and can give up some compression, use lz4."
1. There are two speeds: compression and decompression; lz4 only beats zstd when decompressing ("zstd -1" will compress faster than lz4, and you can crank that up several levels and still beat lz4_hc on compression). bzip2 is actually fairly competitive at compression for the ratios it achieves but loses badly at decompression.
2. "zstd --ultra -22" is nearly identical compression to xz on a corpus I just tested (an old gentoo distfiles snapshot) while decompressing much faster (I didn't compare compression speeds because the files were already xz compressed).
[edit]
Arch Linux (which likely tested a larger corpus than I did) reported a 0.8% regression in size when switching from xz to zstd at compression level 20. This supports your assertion that xz will beat zstd in compression ratio.
[edit2]
bzip2 accidentally[1] handily outperforms all the other compression algorithms I've tried on large files that are all zeros; for example, 1GB of zeroes with "dd if=/dev/zero bs=$((1024*1024)) count=1024 |bzip2 -9 > foo.bz2" produces a file that is only 785 bytes. zstd gives 33k and xz 153k. Of course, my non-codegolfed script for generating 1GB of zeros is only 38 bytes...
1: There was a bug in the original BWT implementation that had degenerate performance on long strings of identical bytes, so bzip2 includes an RLE pass before the BWT.
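If you want to reproduce the comparison yourself (exact sizes will vary by version, and I'm guessing at the zstd/xz levels used above):

    # 1GB of zeros through each compressor; compare output sizes.
    dd if=/dev/zero bs=$((1024*1024)) count=1024 | bzip2 -9 | wc -c
    dd if=/dev/zero bs=$((1024*1024)) count=1024 | zstd | wc -c
    dd if=/dev/zero bs=$((1024*1024)) count=1024 | xz | wc -c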
Maybe in a generic-you sense ("one also cares"), but if by "you" you mean me, no, most of my compression needs are in situations where I control both the compression and decompression sides of the interaction, e.g., deciding how to store business data at rest on s3, and debating the tradeoffs between cost of space, download time, and decompression time/CPU use. We migrated a bunch of workflows at my last job from gzip to either lz4 or zstd to take advantage of better tradeoffs there, and if I were building a similar pipeline from scratch now, gzip would not be a contender. Adding an extra dependency to my application is pretty trivial, in exchange for shaving ten minutes' worth of download and decompression time off of every CI run.
Is this the same "zstd" compression used in Fedora's btrfs transparent block level compression? I have been thoroughly impressed with it in Fedora 34. If that's true, I had no idea that it was a Facebook project. Color me shocked.
> In June 2012, Chris Mason left Oracle for Fusion-io, which he left a year later with Josef Bacik to join Facebook. While at both companies, Mason continued his work on Btrfs.[27][17]
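It is the same algorithm; the kernel carries a port of the zstd library. For anyone wanting to try it, btrfs exposes it as a mount option (the per-level syntax needs kernel 5.1+; the device and mount point here are placeholders):

    # Mount a btrfs filesystem with zstd compression at level 3.
    mount -o compress=zstd:3 /dev/sda2 /mnt

    # Or persist it in /etc/fstab (UUID elided):
    # UUID=...  /mnt  btrfs  compress=zstd:1  0 0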
> Single file Libs
> This move reflects a commitment on our part to support this tool and this pattern of using zstd going forward.
I love that they are moving toward supporting an amalgamation build. I and many others reach for SQLite because of this feature, and I think this will really increase the adoption of Zstd.
Glad to hear it! It's a pretty hefty single file, so it probably won't be qualifying for https://github.com/nothings/single_file_libs anytime soon... but hopefully people find it useful nonetheless.
Fun fact: when we started compressing our analytical events (furry JSON arrays with lots of context), we dropped about 60% of our total inbound traffic. I was afraid that compression would eat the client's battery, but we eventually got the reverse effect: saving energy on the radio, whether WiFi or 4G/5G.
For background transfers, reducing processing time can also maximize sleep/low-power modes (Doze is one of them on Android, but I think there are some other low-power states too).
There's some interesting (imo) research about optimizing CPU governors on Android that details some of the tradeoffs (like how running a single core at high power might in some cases beat running longer at low power, if the core can be hotplugged/disabled sooner).
Speaking of context, sincere question: what does "furry" indicate in this one? Because I'm guessing it's not about people who like to dress up as animals, but I'm also not sure what the metaphor of "hairy" data represents. Noise? Roughness?
NB: English is my second language. "Furry" is a superlative form of "hairy" for me.
Above I meant that our analytical events are not short bits of data like "main screen open at 11:25:15 UTC", but a bunch of stuff, most of it repeating in every element of the array: phone OS, vendor, version, connection type, session time, previous screen, etc., all grouped into nested objects. No wonder it compresses down to next to nothing.
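A toy illustration of the effect (the event shape is invented for the example):

    # Generate 1000 near-identical analytics events and compare sizes.
    for i in $(seq 1000); do
      printf '{"os":"Android 12","vendor":"Acme","conn":"wifi","screen":"main","t":%d}\n' "$i"
    done > events.json
    wc -c < events.json
    zstd -q events.json -o events.json.zst
    wc -c < events.json.zst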
zstd performance is great on really old and weak CPUs, at its default compression level. I recently tested it with a whole system file tree on a 2005 vintage Pentium M single core (thinkpad x41 tablet) running a bare bones debian install, and was pleasantly surprised.
[1] https://datatracker.ietf.org/doc/html/rfc8478
[2] https://peazip.github.io/fast-compression-benchmark-brotli-z...