Are you agreeing or disagreeing with ars' claim that XZ provides a better compression ratio than Zstd? My data shows that it's true in at least one common use case (distribution of open-source software source archives).

I've seen similar comparative ratios from files up to the multi-gigabyte range, for example VM images. In what cases have you seen XZ produce worse compression ratios than Zstd?



Generally speaking, the top end of xz very slightly beats the top end of zstd. However, xz typically takes several times as long to extract, and in my experience it usually takes longer to compress as well.

Example with a large archive (representative of compiled software distribution, such as package management formats):

    $ time xz -T0 -9k usrbin.tar 
    
    real 2m0.579s
    user 8m46.646s
    sys 0m2.104s
    
    $ time zstd -T0 -19 --long usrbin.tar

    real 1m47.242s
    user 6m34.845s
    sys 0m0.544s
    /tmp$ ls -l usrbin.tar*
    -rw-r--r-- 1 josh josh 998830080 Jul 23 23:55 usrbin.tar
    -rw-r--r-- 1 josh josh 189633464 Jul 23 23:55 usrbin.tar.xz
    -rw-r--r-- 1 josh josh 203107989 Jul 23 23:55 usrbin.tar.zst
    /tmp$ time xzcat usrbin.tar.xz >/dev/null

    real 0m9.410s
    user 0m9.339s
    sys 0m0.060s
    /tmp$ time zstdcat usrbin.tar.zst >/dev/null
    
    real 0m0.996s
    user 0m0.894s
    sys 0m0.065s
Comparable compression ratio, faster to compress, 10x faster to decompress.
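(Concretely, from the listing above: xz shrinks the ~999 MB tarball to about 19.0% of its original size versus about 20.3% for zstd -19, while decompression takes 9.4s versus 1.0s.)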

And if you do need a smaller output file than xz produces, you can get that at a cost in compression time:

    $ time zstd -T0 -22 --ultra --long usrbin.tar 

    real 4m32.056s
    user 9m2.484s
    sys 0m0.644s
    $ ls -l usrbin.tar*
    -rw-r--r-- 1 josh josh 998830080 Jul 23 23:55 usrbin.tar
    -rw-r--r-- 1 josh josh 189633464 Jul 23 23:55 usrbin.tar.xz
    -rw-r--r-- 1 josh josh 186113543 Jul 23 23:55 usrbin.tar.zst
And it still takes the same amount of time to extract, 10x faster than xz.
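One practical note, if my reading of the zstd man page is right: plain `--long` uses the default 27-bit window, which zstdcat handles with no extra flags, as above. If you raise the window in search of a better ratio, the decompressor needs a matching flag:

    $ zstd -T0 -22 --ultra --long=31 usrbin.tar    # 2 GiB match window
    $ zstd -d --long=31 usrbin.tar.zst             # decompressor must match, or zstd refuses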


That seems fine -- it's a tradeoff between speed and compression ratio, which has existed ever since compression went beyond RLE.

Zstd competes against Snappy and LZ4 in the market of transmission-time compression. You use it for things like RPC sessions, where the data is being created on-the-fly, compressed for bandwidth savings, then decompressed+parsed on the other side. And in this domain, Zstd is pretty clearly the stand-out winner.
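For example, compressing a stream on the fly across a network hop looks something like this (hypothetical host and directory; the flags are standard zstd):

    $ tar -cf - ./data | zstd -3 | ssh host.example.com 'zstd -d | tar -xf -'

With `--adapt` in place of a fixed level, zstd will even tune its compression level to the throughput of the pipe.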

When it comes to archival, wall-clock performance matters less. Doubling the compress/decompress time for a 5% improvement in compression ratio can be an attractive trade, and high-compression XZ is in many cases faster than high-compression Zstd while delivering better ratios.

---

EDIT, since the parent post added numbers: I spot-tested running zstd with `-22 --ultra` on files in my archive of source tarballs, and wasn't able to find cases where it outperformed `xz -9`.
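For anyone who wants to reproduce the spot test, it was along these lines (the archive path is illustrative):

    $ for f in ~/tarballs/*.tar; do
        x=$(xz -9 -c "$f" | wc -c)
        z=$(zstd -22 --ultra --long -c "$f" | wc -c)
        echo "$f xz=$x zstd=$z"    # compare compressed byte counts
      done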


I think you're missing the point about the tradeoffs people are actually willing to make: absolute compression ratio loses to 80% of the compression ability when it comes with big gains in decompression speed. In other words, include round-trip CPU time if you want something to agree or disagree with; we're not talking about straight compression ratios.

Arch Linux is a case study: a large distributor of open-source software that switched from xz-compressed binaries to zstd, and they didn't do it for teh lulz[0].

[0] https://archlinux.org/news/now-using-zstandard-instead-of-xz...


I'm not missing the point. I'm responding to the thread, which is about whether XZ offers better compression ratios than Zstd.

Whether it's faster in some, many, or most cases isn't really relevant.


Yup, and how much better? About 1%. "zstd and xz trade blows in their compression ratio. Recompressing all packages to zstd with our options yields a total ~0.8% increase in package size on all of our packages combined, but the decompression time for all packages saw a ~1300% speedup."



