We haven’t benchmarked FastLanes directly against LanceDB yet, but here’s a quick look at the compression side:
LanceDB supports:
FSST
Bit-packing
Delta encoding
Opaque block codecs: GZIP, LZ4, Snappy, ZLIB
So in that regard, it’s quite similar to Parquet — a mix of lightweight codecs and general-purpose block compression.
FastLanes, on the other hand, introduces Expression Encoding — a unified compression model that allows combining lightweight encodings to achieve better compression ratios. It also integrates multiple research efforts from CWI into a single file format:
Lance contributor here. This sounds about right. We haven't really innovated too much in the compression space. Most of our efforts have been around getting rid of row groups and the resulting changes in decoding patterns.
Our current approach is pretty similar to Parquet for scalar types. We allow a mix of general-purpose and lightweight codecs for small types, and require lightweight-only codecs for larger types (string, binary).
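
For anyone following the thread who hasn't worked with lightweight encodings, here is a rough Python sketch of two of the codecs mentioned above, delta encoding and bit-packing, chained together. It is an illustration only, not how Lance, Parquet, or FastLanes implement them: the function names are made up for this example, the input is assumed to be a non-empty, non-decreasing list of integers (so every delta is non-negative), and a real implementation would work on fixed-size blocks with SIMD-friendly layouts rather than on one big Python integer.

```python
def delta_encode(values):
    """Return the first value and the differences between neighbours."""
    deltas = [b - a for a, b in zip(values, values[1:])]
    return values[0], deltas


def delta_decode(first, deltas):
    """Rebuild the original values by running a prefix sum over the deltas."""
    out = [first]
    for d in deltas:
        out.append(out[-1] + d)
    return out


def bit_pack(values, width):
    """Pack non-negative integers into bytes, `width` bits per value."""
    acc = 0
    for i, v in enumerate(values):
        if v < 0 or v >= (1 << width):
            raise ValueError(f"{v} does not fit in {width} bits")
        acc |= v << (i * width)
    nbytes = (len(values) * width + 7) // 8
    return acc.to_bytes(nbytes, "little")


def bit_unpack(data, width, count):
    """Inverse of bit_pack: read `count` values of `width` bits each."""
    acc = int.from_bytes(data, "little")
    mask = (1 << width) - 1
    return [(acc >> (i * width)) & mask for i in range(count)]


def compress(values):
    """Delta-encode, then bit-pack the deltas at the narrowest width that fits.

    Assumes `values` is non-empty and non-decreasing (illustrative restriction).
    """
    first, deltas = delta_encode(values)
    width = max(1, max(deltas, default=0).bit_length())
    return first, width, len(deltas), bit_pack(deltas, width)


def decompress(first, width, count, packed):
    """Undo compress(): unpack the deltas, then prefix-sum them."""
    return delta_decode(first, bit_unpack(packed, width, count))


if __name__ == "__main__":
    timestamps = [1000, 1003, 1004, 1010, 1017]
    first, width, count, packed = compress(timestamps)
    print(width, len(packed))
    print(decompress(first, width, count, packed) == timestamps)
```

The point of chaining is that delta encoding shrinks the value range, which lets bit-packing use a much narrower width than the raw values would need. Decoding stays a cheap unpack plus a prefix sum, with no general-purpose decompressor involved.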