Notice that we're talking about de-identified patient data here. There is a utility/privacy trade-off when using data containing private information: on the one hand, the patient's personal information must absolutely be protected, on the other hand, many processes could benefit from the information in such a data collection even without the knowledge of which data point belongs to which individual.
If de-identification is done right, it would be a bit of a stretch to talk about "surveillance", because that's the whole point of de-identification: remove any information from the records that allows a third party to identify the underlying person from whom the data originated. Note especially that this includes inference attacks: not only should any occurrences of names be removed or masked, but so should any information that would allow an informed attacker to re-infer that identity, i.e., to cross-link the patient data back to a specific person.
The elephant in the room, however, is the "If" at the beginning of the previous paragraph. As I see it, the problem lies not in wanting to build functionality that actually uses the collected information, but in whether appropriate privacy prerequisites were put in place beforehand.
HIPAA's Safe Harbor rules already consider that too much information, and k-anonymity (under the Expert Determination route) can hardly be applied if you need to provide the full zip code and have a data set that will grow and shrink over time.
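The k-anonymity property the comment refers to is mechanical to check: a release is k-anonymous if every combination of quasi-identifier values is shared by at least k records. A minimal sketch (the records and field names here are hypothetical, not from any real registry) shows why full zip codes break it — they produce equivalence classes of one:

```python
from collections import Counter

def is_k_anonymous(rows, quasi_identifiers, k):
    """True if every quasi-identifier combination appears in at least k rows."""
    counts = Counter(
        tuple(row[q] for q in quasi_identifiers) for row in rows
    )
    return min(counts.values()) >= k

# Hypothetical records: with a full ZIP, one resident forms a class of one.
rows = [
    {"zip": "90210", "age_group": "30-39", "sex": "F"},
    {"zip": "90210", "age_group": "30-39", "sex": "F"},
    {"zip": "10001", "age_group": "60-69", "sex": "M"},  # unique -> re-identifiable
]
print(is_k_anonymous(rows, ["zip", "age_group", "sex"], k=2))  # False
```

And a growing/shrinking data set makes it worse: a release that is k-anonymous today can silently stop being so after the next batch of inserts or deletions, so the check would have to be re-run against every published snapshot.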
It takes about 33 bits of entropy to narrow down to a single individual worldwide, and about 28 bits within the USA (log2 of roughly 330 million people).
UID, gender, age group, zip code and city, plus of course your medication habits, are probably enough to de-anonymize with a reasonable amount of confidence. Say age group is one of 8; age plus gender is ~5 bits of entropy. Zip plus city is ~8 bits. So that's 15 bits left on a good day.

Throw in any off-the-shelf targeted marketing data (usually worth 10-25 bits, iirc) and you might as well use the SSN as the patient ID.
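The arithmetic above can be checked directly. This sketch assumes uniformly distributed attributes (real distributions leak somewhat less, the comment's figures are deliberately rough, and its 5-bit age+gender estimate is a touch generous — a uniform count gives log2(8 × 2) = 4 bits):

```python
import math

def bits(n_possibilities):
    """Bits of entropy revealed by a uniformly distributed attribute."""
    return math.log2(n_possibilities)

us_population = 330_000_000
target = bits(us_population)            # ~28.3 bits to single out one US resident

age_gender = bits(8) + bits(2)          # 8 age groups x 2 sexes -> 4 bits
zip_city = 8                            # rough figure for ZIP + city
remaining = target - age_gender - zip_city

print(round(target, 1), round(remaining, 1))  # -> 28.3 16.3
```

With ~16 bits left, a marketing-data side channel worth 10-25 bits closes the gap on its own, which is the comment's point.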
As others noted, it is de-identified in name only. There is clearly sufficient information to make this something like a ROT13 analogue.
> If de-identification is done right
There is the rub, indeed. As a general rule, I don't trust de-identification. People mostly seem to reason poorly about how datasets can be merged and this has repeatedly failed.
Worse, I have seen it proposed to shut people up about privacy in situations where the proposer knew full well it would fail. De-identification was merely a prop in a con.
I would suggest that, if sensitive de-identified data is to be used by government, it first go through a public challenge round. Let the public take a shot at re-identifying it; that would build confidence, and it would help suppress a little conspiratorial nonsense too, something we could use right now.
They should put their money where their mouth is and release the de-identified info of the high ranking DEA personnel. If they're so confident it's de-identified, it shouldn't be a problem. If that's a problem for them, the rest of us should definitely not trust it.
That's the rub, isn't it? How can anyone think US intel/LEO agencies will settle for de-identified anything? Their definition of anonymous seems to be "we didn't look at it yet."
The issue is that once they target an individual as potentially abusing their prescriptions, it's only a matter of time before they seek a warrant to properly ID and raid that person. Guilty or innocent, those raids never go well for anyone, or their dogs.
States already have similar registries that are not de-identified. Police or other local officials can review these registries almost at will. My provider requires patients to sign an expansive privacy waiver.
The US medical profession completely rolled over and sold out their patients. I can't figure out why, unless it is part of a deal to avoid being pursued or prosecuted for their part in creating the opioid crisis.