
Zstd is so much better than the commonly-used alternatives that I get mildly annoyed when given a .tar.{gz,xz,bz2}. It's not like it's a huge deal, but a much smaller file (compared to gz), or a similarly sized one with much faster decompression (compared to xz, bz2), just makes me a tiny bit happier.


I agree with the general premise that there's no reason to ever use gzip anymore (unless you're in an environment where you can't install stuff), but interestingly my experience with the tradeoffs is apparently not the same as yours. I tend to find that zstd and gzip give pretty similar compression ratios for the things I tend to work with, but that zstd is way faster, and that xz offers better compression ratios than either, but is slow. So like, my personal decision matrix is "if I really care about compression, use xz; if I want pretty good compression and great speed -- that is, if before I would have used gzip -- use zstd; and if I really want the fastest possible speed and can give up some compression, use lz4."
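For concreteness, that decision matrix maps to roughly these commands (the levels are just illustrative defaults, not anything I tuned or benchmarked):

    # best ratio, slow
    xz -9 data.tar
    # good ratio, fast -- the old gzip niche
    zstd -3 data.tar
    # fastest, weaker ratio
    lz4 -1 data.tar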


A few comments:

1. There are two speeds: compression and decompression; lz4 only beats zstd when decompressing ("zstd -1" will compress faster than lz4, and you can crank that up several levels and still beat lz4_hc on compression; rough commands for comparing both ends are after point 2). bzip2 is actually fairly competitive at compression for the ratios it achieves but loses badly at decompression.

2. "zstd --ultra -22" is nearly identical compression to xz on a corpus I just tested (an old gentoo distfiles snapshot) while decompressing much faster (I didn't compare compression speeds because the files were already xz compressed).

[edit]

Arch Linux (which likely tested a larger corpus than I did) reported a 0.8% regression in size when switching from xz to zstd at compression level 20. This supports your assertion that xz will beat zstd in compression ratio.

[edit2]

bzip2 accidentally[1] handily outperforms all other compression algorithms I've tried on large files that are all zeros; for example, 1GB of zeros compressed with "dd if=/dev/zero bs=$((1024*1024)) count=1024 |bzip2 -9 > foo.bz2" produces a file of only 785 bytes. zstd gives 33k and xz 153k. Of course, my non-code-golfed script for generating 1GB of zeros is only 38 bytes...

1: There was a bug in the original BWT implementation that had degenerate performance on long strings of identical bytes, so bzip2 includes an RLE pass before the BWT.
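For anyone who wants to reproduce the all-zeros test, it boils down to something like this (I didn't note the zstd/xz levels above, so treat those two as default-level guesses):

    dd if=/dev/zero bs=$((1024*1024)) count=1024 | bzip2 -9 > zeros.bz2
    dd if=/dev/zero bs=$((1024*1024)) count=1024 | zstd > zeros.zst
    dd if=/dev/zero bs=$((1024*1024)) count=1024 | xz > zeros.xz
    ls -l zeros.*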


Most of the time you also care about ease of use and compatibility.


Maybe in a generic-you sense ("one also cares"), but if by "you" you mean me, no, most of my compression needs are in situations where I control both the compression and decompression sides of the interaction, e.g., deciding how to store business data at rest on s3, and debating the tradeoffs between cost of space, download time, and decompression time/CPU use. We migrated a bunch of workflows at my last job from gzip to either lz4 or zstd to take advantage of better tradeoffs there, and if I were building a similar pipeline from scratch now, gzip would not be a contender. Adding an extra dependency to my application is pretty trivial, in exchange for shaving ten minutes' worth of download and decompression time off of every CI run.
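The shape of that kind of pipeline change is roughly the following (bucket and file names here are made up, and the level is just an example):

    # compress locally, upload, and stream-decompress on download
    zstd -19 --rm data.ndjson
    aws s3 cp data.ndjson.zst s3://example-bucket/data.ndjson.zst
    aws s3 cp s3://example-bucket/data.ndjson.zst - | zstd -d -o data.ndjson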


Your comment made me curious what a Zstandard-compressed tar file's extension would be, and apparently it's .tar.zst
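In case it's useful: recent GNU tar (and bsdtar) can produce that directly, or you can pipe through zstd yourself (names here are placeholders):

    tar --zstd -cf archive.tar.zst somedir/
    tar --zstd -xf archive.tar.zst
    # or, piping explicitly:
    tar -c somedir/ | zstd -19 -o archive.tar.zst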


I understand they probably tried to keep to 3 letters (because history), but I'm unreasonably annoyed they didn't go with .zstd :(

Even the MIME type uses the full name, application/zstd.


The only problem I have with it is that when I first heard about it, I thought it was another name for the old-school .Z/compress algorithm.



