
I'd love to see someone with a large dataset containing those variables check what percentage of their records overlap on them.


About a year ago, I did a project for de-duping customer-entered contact info. We had fewer than 100k records for any particular region (Bay Area, New York, etc.). From what I recall, we were able to confirm identity in about 80% of cases using name, DOB, and zip code.

By far the best metric is phone number; adding it bumped us up to over 97%. If the comparison were limited to only three fields, we would use name, DOB, and phone number. In practice, we also compare zip code, street address, and contacts (like friends on Facebook), in that order.
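For anyone curious what that kind of matching might look like, here is a minimal Python sketch (not the actual pipeline from that project). It assumes records with hypothetical fields `name`, `dob`, `zip_code`, and `phone`: name and DOB must agree, phone number decides when both records have one, and zip code is the fallback. The normalization rules (lowercasing names, comparing the last 10 phone digits, using the 5-digit zip prefix) are also assumptions.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Contact:
    # Hypothetical record layout; field names are assumptions for illustration.
    name: str
    dob: str  # assumed ISO date string, e.g. "1980-04-12"
    zip_code: str
    phone: Optional[str] = None


def normalize_name(name: str) -> str:
    # Lowercase and collapse whitespace so "Jane  DOE" matches "jane doe".
    return " ".join(name.lower().split())


def normalize_phone(phone: Optional[str]) -> Optional[str]:
    # Keep digits only and compare the last 10 (assumes US-style numbers).
    if not phone:
        return None
    digits = re.sub(r"\D", "", phone)
    return digits[-10:] if digits else None


def same_person(a: Contact, b: Contact) -> bool:
    # Name and DOB must agree in every case.
    if normalize_name(a.name) != normalize_name(b.name) or a.dob != b.dob:
        return False
    pa, pb = normalize_phone(a.phone), normalize_phone(b.phone)
    # Phone number is the strongest signal, so it decides when both have one.
    if pa and pb:
        return pa == pb
    # Otherwise fall back to the 5-digit zip code.
    return a.zip_code.strip()[:5] == b.zip_code.strip()[:5]


def dedupe(records: list[Contact]) -> list[list[Contact]]:
    # Greedy clustering: compare each record to the first record of each cluster.
    clusters: list[list[Contact]] = []
    for rec in records:
        for cluster in clusters:
            if same_person(cluster[0], rec):
                cluster.append(rec)
                break
        else:
            clusters.append([rec])
    return clusters


records = [
    Contact("Jane Doe", "1980-04-12", "94110", "(415) 555-0100"),
    Contact("jane  doe", "1980-04-12", "94107", "415.555.0100"),
    Contact("Jane Doe", "1980-04-12", "10001", "212-555-0199"),
    Contact("John Roe", "1975-01-30", "94110"),
]
clusters = dedupe(records)
print(len(clusters))  # → 3
```

The first two records merge on phone number despite different zip codes; the third shares name and DOB but has a different phone, so it stays separate. Comparing only against each cluster's first record keeps the sketch short; a real system would more likely use blocking plus something like union-find, and would add the street address and contact comparisons mentioned above.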

It would be interesting to see whether, on average, people change their phone numbers less often than their names (marriage) or addresses (renters).


That's interesting, considering Google is trying to get me to give them my phone number. *puts on a tin foil hat*



