Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

The MWHC construction represents minimal (monotone!) perfect hash functions as arbitrary functions to the ceil(log(n)) bits needed to store the rank... where the value happens to be the rank, but could be anything.


... meaning it is an "injective" function that maps unique key-value pairs, correct? Genuinely asking, I have glancing familiarity via their use in assembly algorithms but (a) don't have a formal math/CS background; and (b) haven't read any of the papers recently.


No, it doesn't have to be injective. In theory, the range can be any group. It's k bits in practice (with addition mod 2^k or xor as the group operator), but k need not have any relationship with `lg(|S|)`.


I think we're somewhat talking past one another -- in any case, we'll add more in the README on minimal perfect hash functions and the differences. In short, you'd need to also have a data structure (e.g., a Bloom filter) for checking if the key is in your MPHF and then also a mapping of 1..n MPHF values to your actual values.


There's a classic solution to detecting most missing entries: make your value a pair of a signature and the actual value. m signature bits result in a 2^-m false match rate for keys not in the input map.

Again, the MWHC construction does not need to map the hashed keys to ranks, they can map to the values directly.




Consider applying for YC's Winter 2026 batch! Applications are open till Nov 10

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: