Hacker News | eudaimonia22's comments

Removal of personally identifiable information (PII) from text data is the first obvious application that came to mind.


If you do that, you will likely wipe out a lot of non-name data too. Just think of common words that double as last names, like "Brown"[0].

It seems to me that if you have PII in text data, you should treat the whole thing as PII.

[0] Although the dataset doesn't actually contain "Brown" as a valid first or last name, go figure.
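
To make the over-redaction concern concrete, here is a minimal sketch of naive list-based name scrubbing. The `NAMES` set and the `redact_names` helper are hypothetical stand-ins for illustration, not the dataset or any tool discussed in this thread.

```python
import re

# Hypothetical stand-in for a name list; NOT the actual dataset discussed here.
NAMES = {"alice", "green", "black"}

def redact_names(text, names=NAMES):
    """Replace every alphabetic word found in `names` (case-insensitive) with [NAME]."""
    def replace(match):
        word = match.group(0)
        return "[NAME]" if word.lower() in names else word
    return re.sub(r"[A-Za-z]+", replace, text)

print(redact_names("Alice Green painted the fence green and black."))
# prints: [NAME] [NAME] painted the fence [NAME] and [NAME].
```

The colors get redacted along with the actual name, because a plain lookup cannot tell "Green" the person from "green" the color.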


Yeah, calling it "exhaustive" is plainly ridiculous. (It does have "Black" and "Green", if you want alternative examples, though "White" is also missing...) Seems like just scraping author names from a few bibliographic databases could've filled in some of the more obvious holes.

