I'd expect a comparison with compact (or even succinct) constructions for arbitr...

boyd · on May 22, 2024

My understanding is that a perfect hash function maps elements elements to a unique integer (i.e., it's a one-to-one mapping). I think PHF data structures will also always return a value. So if you look up an element not in the constructed PHF, you'll always get a "false positive" value.

In contrast, a B-field lets you map a key to an arbitrary number of (typically non-unique) values. So I could map a million elements to "1", another million to "2", etc.

I'm not especially current (or fluent!) in that literature though, so would love pointers to anything that doesn't have the above constraints.

pkhuong · on May 22, 2024

The MWHC construction represents minimal (monotone!) perfect hash functions as arbitrary functions to the ceil(log(n)) bits needed to store the rank... where the value happens to be the rank, but could be anything.

boyd · on May 22, 2024

... meaning it is an "injective" function that maps unique key-value pairs, correct? Genuinely asking, I have glancing familiarity via their use in assembly algorithms but (a) don't have a formal math/CS background; and (b) haven't read any of the papers recently.

pkhuong · on May 22, 2024

No, it doesn't have to be injective. In theory, the range can be any group. It's k bits in practice (with addition mod 2^k or xor as the group operator), but k need not have any relationship with `lg(|S|)`.

boyd · on May 23, 2024

I think we're somewhat talking past one another -- in any case, we'll add more in the README on minimal perfect hash functions and the differences. In short, you'd need to also have a data structure (e.g., a Bloom filter) for checking if the key is in your MPHF and then also a mapping of 1..n MPHF values to your actual values.

pkhuong · on May 23, 2024

There's a classic solution to detecting most missing entries: make your value a pair of a signature and the actual value. m signature bits result in a 2^-m false match rate for keys not in the input map.

Again, the MWHC construction does not need to map the hashed keys to ranks, they can map to the values directly.