Any comparisons with Lance/LanceDB?

azimafroozeh · 2025-07-24T18:40:16

We haven’t benchmarked FastLanes directly against LanceDB yet, but here’s a quick look at the compression side:

LanceDB supports:

FSST

Bit-packing

Delta encoding

Opaque block codecs: GZIP, LZ4, Snappy, ZLIB

So in that regard, it’s quite similar to Parquet — a mix of lightweight codecs and general-purpose block compression.
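For readers less familiar with that split, here's a minimal Python sketch of the two families. It is an illustration only, not LanceDB or Parquet code, and the function names are made up. Bit-packing stores every integer at one fixed bit width, so a single value can be decoded straight from its bit offset. An opaque block codec (zlib here, standing in for GZIP/ZLIB) has to inflate the whole block before any value can be read.

```python
import zlib


def bitpack(values: list[int]) -> tuple[bytes, int]:
    """Pack non-negative ints at the smallest fixed bit width that fits them all."""
    width = max(1, max(values).bit_length())
    acc = 0
    for i, v in enumerate(values):
        acc |= v << (i * width)  # value i occupies bits [i*width, (i+1)*width)
    nbytes = (len(values) * width + 7) // 8
    return acc.to_bytes(nbytes, "little"), width


def bitpack_get(packed: bytes, width: int, i: int) -> int:
    """Random access: decode value i from its own bit range only."""
    start = i * width
    first, last = start // 8, (start + width - 1) // 8
    chunk = int.from_bytes(packed[first:last + 1], "little")
    return (chunk >> (start % 8)) & ((1 << width) - 1)


def block_get(compressed: bytes, i: int) -> int:
    """Opaque block codec: the whole block is inflated to read one 4-byte value."""
    raw = zlib.decompress(compressed)
    return int.from_bytes(raw[i * 4:(i + 1) * 4], "little")


values = [3, 7, 1, 0, 5, 6, 2, 4] * 128  # 1,024 small non-negative ints
packed, width = bitpack(values)
block = zlib.compress(b"".join(v.to_bytes(4, "little") for v in values))

print(width, len(packed))  # 3 384
print(bitpack_get(packed, width, 9), block_get(block, 9))  # 7 7
```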

FastLanes, on the other hand, introduces Expression Encoding — a unified compression model that allows combining lightweight encodings to achieve better compression ratios. It also integrates multiple research efforts from CWI into a single file format:

The FastLanes Compression Layout: Decoding >100 Billion Integers per Second with Scalar Code, VLDB '23: https://dl.acm.org/doi/pdf/10.14778/3598581.3598587

ALP (Adaptive Lossless Floating-Point Compression), SIGMOD '24: https://ir.cwi.nl/pub/33334/33334.pdf

G-ALP (GPU-parallel variant of ALP), DaMoN '25: https://azimafroozeh.org/assets/papers/g-alp.pdf

White-box Compression (self-describing, function-based), CIDR '20: https://www.cidrdb.org/cidr2020/papers/p4-ghita-cidr20.pdf

CCC (Exploiting Column Correlations for Compression), MSc Thesis '23: https://homepages.cwi.nl/~boncz/msc/2023-ThomasGlas.pdf
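This toy Python cascade gives a rough sense of why combining lightweight encodings can beat any single one. It applies delta encoding, then frame-of-reference (FOR), then bit-packing to a sorted, timestamp-like column. It is an assumption-laden illustration with hypothetical helper names, not FastLanes' Expression Encoding, whose actual design is in the papers above. It only reports the bit width each scheme would need per value.

```python
def bits(values: list[int]) -> int:
    """Bit width needed to bit-pack non-negative ints."""
    return max(1, max(values).bit_length())


def frame_of_reference(values: list[int]) -> tuple[int, list[int]]:
    """Subtract the minimum so the residuals are small and non-negative."""
    base = min(values)
    return base, [v - base for v in values]


def delta(values: list[int]) -> tuple[int, list[int]]:
    """Keep the first value and store successive differences."""
    return values[0], [b - a for a, b in zip(values, values[1:])]


# Sorted, timestamp-like column whose steps are either 8 or 11.
ts = [1_700_000_000 + 10 * i + (i % 3) for i in range(1000)]

plain_bits = bits(ts)                       # bit-packing alone
_, for_residuals = frame_of_reference(ts)
for_bits = bits(for_residuals)              # FOR -> bit-packing
first, diffs = delta(ts)
base, residuals = frame_of_reference(diffs)
cascade_bits = bits(residuals)              # delta -> FOR -> bit-packing

# Decoding reverses the cascade: add the base back, then prefix-sum.
decoded = [first]
for r in residuals:
    decoded.append(decoded[-1] + r + base)
assert decoded == ts

print(plain_bits, for_bits, cascade_bits)  # 31 14 2
```

The cascade also stores a little metadata (the first value and the delta base), which is negligible next to 1,000 values at 2 bits each.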

westonpace · 2025-07-24T22:16:13

Lance contributor here. This sounds about right. We haven't really innovated too much in the compression space. Most of our efforts have been around getting rid of row groups and the resulting changes in decoding patterns.

Our current approach is pretty similar to Parquet for scalar types. We allow a mix of general-purpose and lightweight codecs for small types and require lightweight-only codecs for larger types (string, binary).
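Here is one way to read that policy, as a minimal Python sketch. The type names, codec set, and helpers (`allowed_codecs`, `pick_codec`) are hypothetical and are not Lance's API. Dictionary encoding stands in for the lightweight codecs, zlib stands in for the general-purpose ones, and "small types" is assumed to mean fixed-width scalars. The writer tries every codec the policy allows for the column's type and keeps the smallest output.

```python
import zlib

# Assumption: "small types" = fixed-width scalars, "larger types" = string/binary.
SMALL_TYPES = {"int32", "int64", "float32", "float64"}
LARGE_TYPES = {"string", "binary"}


def plain(values: list[bytes]) -> bytes:
    """Length-prefixed concatenation: the uncompressed baseline."""
    return b"".join(len(v).to_bytes(4, "little") + v for v in values)


def dictionary(values: list[bytes]) -> bytes:
    """Lightweight: each distinct value stored once, then a 4-byte code per row."""
    uniques = list(dict.fromkeys(values))
    code = {v: i for i, v in enumerate(uniques)}
    header = len(uniques).to_bytes(4, "little") + plain(uniques)
    return header + b"".join(code[v].to_bytes(4, "little") for v in values)


def general_zlib(values: list[bytes]) -> bytes:
    """Opaque general-purpose codec applied to the whole block."""
    return zlib.compress(plain(values))


def allowed_codecs(dtype: str) -> dict:
    """Hypothetical policy: general-purpose codecs are allowed only for small types."""
    codecs = {"plain": plain, "dictionary": dictionary}
    if dtype in SMALL_TYPES:
        codecs["zlib"] = general_zlib
    elif dtype not in LARGE_TYPES:
        raise ValueError(f"unsupported type: {dtype}")
    return codecs


def pick_codec(dtype: str, values: list[bytes]) -> str:
    """Choose the allowed codec with the smallest encoded size."""
    codecs = allowed_codecs(dtype)
    return min(codecs, key=lambda name: len(codecs[name](values)))


ints = [(i % 4).to_bytes(4, "little") for i in range(1000)]
names = [b"alpha", b"beta", b"gamma"] * 300
print(pick_codec("int32", ints), pick_codec("string", names))  # zlib dictionary
```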

Nice work on the paper :)