One of the coolest parts of zstd is the simple support for custom dictionaries. If you have a lot of mid-sized blobs that you want to compress separately (so that you can decompress them separately), you can create a common dictionary that covers the entire corpus. In real-world use cases the compression ratio can go from 3x to 9x: https://github.com/facebook/zstd#the-case-for-small-data-com...
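For anyone who hasn't tried it, the library side is roughly the sketch below. This is only an illustration: the synthetic sample corpus, buffer sizes, and compression level are made up, and the CLI exposes the same workflow via zstd --train.

    // Sketch of zstd's dictionary workflow: train once over many small
    // samples, then compress each blob independently against the shared
    // dictionary so every blob stays individually decompressible.
    #include <string>
    #include <vector>
    #include <zdict.h>  // ZDICT_trainFromBuffer
    #include <zstd.h>   // ZSTD_compress_usingDict

    int main() {
        // Synthetic corpus of small, similar records (stand-in for real blobs).
        std::vector<std::string> samples;
        for (int i = 0; i < 10000; ++i)
            samples.push_back("{\"id\":" + std::to_string(i) +
                              ",\"status\":\"ok\",\"region\":\"us-east-1\"}");

        // ZDICT wants one concatenated buffer plus the size of each sample.
        std::string concatenated;
        std::vector<size_t> sampleSizes;
        for (const auto& s : samples) {
            concatenated += s;
            sampleSizes.push_back(s.size());
        }

        // Train a dictionary (16 KB budget here; tune for your data).
        std::vector<char> dict(16 * 1024);
        size_t dictSize = ZDICT_trainFromBuffer(
            dict.data(), dict.size(),
            concatenated.data(), sampleSizes.data(),
            static_cast<unsigned>(sampleSizes.size()));
        if (ZDICT_isError(dictSize)) return 1;  // e.g. corpus too small or too uniform

        // Compress each blob against the shared dictionary.
        ZSTD_CCtx* cctx = ZSTD_createCCtx();
        for (const auto& s : samples) {
            std::vector<char> out(ZSTD_compressBound(s.size()));
            size_t written = ZSTD_compress_usingDict(
                cctx, out.data(), out.size(), s.data(), s.size(),
                dict.data(), dictSize, /*compressionLevel=*/3);
            if (ZSTD_isError(written)) return 1;
            // Persist out[0..written); decompress later with
            // ZSTD_decompress_usingDict() and the same dictionary bytes.
        }
        ZSTD_freeCCtx(cctx);
        return 0;
    }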
What was the final resolution of the patent issue with this library? Is there an active covenant from Facebook not to sue users?
This is a little-known feature borrowed from zlib, and it possibly predates even that.
I ran into this many moons ago while working on a large J2ME app, trying to shrink the zip file. We hit the point where looking at the compression algorithm was our next most reasonable step, and since I had prior zlib experience from decoding PNGs, I dug into it.
I discovered that Sun had partly exposed the dictionary support, and I ended up filing an enhancement request with some numbers from my POC. But as it turns out, Sun had already begun work on their dense archive format, which achieved a multiple of the improvements I was getting, so it went nowhere.
Alphabetizing the constant pool got us a fraction of the benefit, and I discovered that, because of the way the constants are stored, suffix sorting them got us only another 1.1% compression, so I dropped it.
Which only makes things more confusing. Is it dual-licensed with both licenses applying, or dual-licensed at your choice? That only complicates the patent question further.
The whole patent issue was about the BSD + Patents license Facebook used (they addressed the concerns over React by relicensing to MIT and dropping the PATENTS file).
The PATENTS file appears to be present in zstd 1.3.0 but not 1.3.1, so it looks like a similar change has been carried over here?
I'm left with questions though, because the development branch has both a LICENSE file (BSD) and a COPYING file (GPL).
I'd like to be able to load log files into a database and have them take the same space as the compressed log files, not the uncompressed size (which can easily be 20x more). I guess this requires a database built on compact data structures or something similar, but surely simple support for dictionary compression of database text fields would help? Text fields are small chunks of text of the same type, so they should be amenable to good compression with a shared dictionary. I've searched around and found some support in RocksDB (sketched below), but that's pretty much it.
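For reference, the RocksDB support I found looks roughly like this. It's a hedged sketch: max_dict_bytes and zstd_max_train_bytes are the CompressionOptions fields I'm aware of, so check your RocksDB version's options.h for the exact names and defaults; the database path and sample record are made up.

    // Hedged sketch: per-SST-file dictionary compression in RocksDB.
    #include <rocksdb/db.h>
    #include <rocksdb/options.h>

    int main() {
        rocksdb::Options options;
        options.create_if_missing = true;
        options.compression = rocksdb::kZSTD;

        // Size of the dictionary RocksDB trains per SST file, and how many
        // bytes of sampled blocks it may use for training.
        options.compression_opts.max_dict_bytes = 16 * 1024;
        options.compression_opts.zstd_max_train_bytes = 100 * 16 * 1024;

        rocksdb::DB* db = nullptr;
        rocksdb::Status s = rocksdb::DB::Open(options, "/tmp/logdb", &db);
        if (!s.ok()) return 1;

        // Small, same-shaped text values are exactly what benefits here.
        db->Put(rocksdb::WriteOptions(), "log:000001",
                "2018-04-27T12:00:00Z INFO request handled in 12ms");
        delete db;
        return 0;
    }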
Most compression libraries offer dictionary support, although it's somewhat obscure. For example, there is no function in zlib to actually create a dictionary for you. Bindings and higher-level libraries often ignore the dictionary support entirely.
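zlib just takes raw bytes: you hand the same byte string to deflateSetDictionary() on the way in, and on the way out you supply it via inflateSetDictionary() once inflate() reports Z_NEED_DICT. A minimal sketch, with a made-up dictionary string:

    // Minimal sketch of zlib's dictionary hooks. zlib never builds the
    // dictionary for you; kDict below is a made-up byte string of common
    // substrings (zlib favors strings placed near the end of the dictionary).
    #include <string>
    #include <vector>
    #include <zlib.h>

    static const char kDict[] = "\"region\":\"us-east-1\",\"status\":\"ok\",\"id\":";

    std::vector<unsigned char> compress_with_dict(const std::string& in) {
        z_stream zs{};
        deflateInit(&zs, Z_BEST_COMPRESSION);
        deflateSetDictionary(&zs, reinterpret_cast<const Bytef*>(kDict),
                             sizeof(kDict) - 1);

        std::vector<unsigned char> out(deflateBound(&zs, in.size()));
        zs.next_in   = reinterpret_cast<Bytef*>(const_cast<char*>(in.data()));
        zs.avail_in  = static_cast<uInt>(in.size());
        zs.next_out  = out.data();
        zs.avail_out = static_cast<uInt>(out.size());
        deflate(&zs, Z_FINISH);  // single shot; output fits within deflateBound()
        out.resize(zs.total_out);
        deflateEnd(&zs);
        return out;
    }
    // Decompression: call inflate(), and when it returns Z_NEED_DICT,
    // supply the same bytes via inflateSetDictionary() and continue.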
"Zstandard, or zstd as short version, is a fast lossless compression algorithm, targeting real-time compression scenarios at zlib-level and better compression ratios. It's backed by a very fast entropy stage, provided by Huff0 and FSE library."
A very important aspect of this project is the zlibWrapper. It provides an interface for transparently reading zstd-compressed, zlib-compressed, or uncompressed files with one API. It's quite fast, provides an excellent balance between speed and compression ratio, and is generally my preferred way to work with compressed data.
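Roughly, your existing zlib streaming code doesn't change: you include the wrapper's header instead of zlib.h, link against the wrapper, and the calls get routed. The sketch below is ordinary zlib inflate() code plus that header swap; the wrapper-specific behavior (format auto-detection on read, and switching deflate() output to zstd) is from memory of zlibWrapper/README.md in the zstd repo, so verify against that before relying on it.

    // Sketch: plain zlib inflate() code, but built against zstd's
    // zlibWrapper so the same loop decodes zstd or zlib streams.
    #include <vector>
    #include <zstd_zlibwrapper.h>  // drop-in stand-in for <zlib.h>

    std::vector<unsigned char> decompress_any(const std::vector<unsigned char>& in) {
        z_stream zs{};
        inflateInit(&zs);  // wrapper is expected to sniff zstd vs zlib framing

        std::vector<unsigned char> out;
        unsigned char chunk[16 * 1024];
        zs.next_in  = const_cast<unsigned char*>(in.data());
        zs.avail_in = static_cast<uInt>(in.size());
        int ret = Z_OK;
        while (ret == Z_OK) {
            zs.next_out  = chunk;
            zs.avail_out = sizeof(chunk);
            ret = inflate(&zs, Z_NO_FLUSH);
            out.insert(out.end(), chunk, chunk + (sizeof(chunk) - zs.avail_out));
        }
        inflateEnd(&zs);
        return out;  // caller should check that ret ended up as Z_STREAM_END
    }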
Their emphasis on branchless coding is even more valuable now than it was when zstd was released, since we are living in a post-Spectre/Meltdown world.
I dislike Facebook as a company, but I cannot find fault with their engineering.
[Edit/Off-topic: I want someone to make an LZ77/DEFLATE/Led Zeppelin pun-based product sometime. Please. That joke has been begging to be made since at least 1990.]
In this case, though, the core innovation actually happened pre-Facebook. Post-Facebook it got more polish, tweaking, multi-threading, etc., but I don't recall any breakthrough kind of change.
Which algorithm is best is use-case dependent. As of now, zstd offers best-in-class compression for a wider variety of use cases than lz4. lz4 (created by the same author) still wins for high-throughput software compression, yes. But zstd --fast 4 or 5 are getting pretty close.
It's not obvious to me what the relevant measurements are on the zstd side, but I'm pretty sure lz4 wins considerably where code size and RAM footprint are major considerations, as in some bootloader and embedded firmware situations.
I wonder if it's possible to create a blockchain where the proof-of-work consists of building a better compression dictionary instead of doing useless hashing.
To give a little detail on why this is not possible: a proof-of-whatever should be quick to verify but hard to generate. Compression might sound like it can meet both requirements. However, the raw data needs to be very large for the problem to be non-trivial, and large data is a red flag in any blockchain application.
I wish he’d stop tweaking it at this point. Newer versions are incompatible with older versions, which is a cardinal sin if you want mass adoption for something like this.
Or at least identify a stable subset of some sort and put it into another tool that I would not be hesitant to use.
So when I need something very fast I use Snappy or lz4, and when I need decent compression ratio I use pigz.
Looks like xenial has 0.5, so that's consistent with the grandparent's claim. xenial-updates has 1.3.1, however. Seems like the best option for Ubuntu users would be to pull this from the xenial-updates repo. Also note that Xenial is about to be replaced by 18.04.
Also, be careful with universe packages. They are fast and loose with them. The version of redis commonly deployed on 14.04 has a known security issue that bit us some time ago (CVE-2015-4335). See https://bugs.launchpad.net/ubuntu/trusty/+source/redis/+bug/... .
Zstandard 0.5 (like all versions below 0.8.1) was a pre-release, where forward compatibility was never intended. However, Zstandard is still backward compatible with versions as far back as 0.4. All releases since August 2016 have been forward and backward compatible.