Most importantly, do NOT collect data you do not need. I don’t see why universities need their students’ SSNs, apart from reporting salaries paid to grad students as employees.
As far as I was aware, many private US universities explicitly didn’t collect this data in an effort to maintain deniability when dealing with immigrants (especially “illegal immigrants”).
Of course, many undergraduates are also part-time employees of the school, so presumably they need to provide this information.
For large lecture classes (100+ people), my university (professors/TAs) would post grades on the wall outside the lecture hall. For "privacy", instead of names they used SSNs.
Given the construction of SSNs (first five digits are a key for state-and-date), and our large population of students from different states, reconciling SSN to human was trivial.
Ironically, the foreign students were in better shape because the registrar issued them an ID number which was not their SSN.
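A rough sketch of what the registrar was effectively doing for those students: handing out an ID drawn at random, so nothing about the person can be read back out of the digits. The function name and the nine-digit length are my own assumptions for illustration, not anything the university published.

```python
import secrets


def issue_student_id(existing: set[str], digits: int = 9) -> str:
    """Return a random numeric ID that is not already in use.

    Hypothetical helper: the ID encodes no state, date, or other
    personal attribute, unlike an SSN-derived identifier.
    """
    while True:
        candidate = "".join(secrets.choice("0123456789") for _ in range(digits))
        if candidate not in existing:
            existing.add(candidate)
            return candidate


issued: set[str] = set()
roster = {name: issue_student_id(issued) for name in ["student_a", "student_b"]}
```

Posting grades against IDs like these would still leak if the mapping got out, but at least the ID itself tells a bystander nothing.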
The SSN is like a foreign key into the rest of the government's systems. The ID isn't the problem; the problem is the presumption that the ID presented corresponds to the correct person.
This would go away if governments had safe, secure and unique identifiers for all individuals, with all the necessary data attached to them and stored safely by the government. That way, the only things a business needs to collect are your unique ID and some secure token, controlled by the individual, that allows the third party to confirm your data with the government. This whole "privacy" thing is probably a solvable problem if we think outside our comfortable box, but instead we're trying to optimize within the local maximum we've inherited.
Exactly, it should be like an API token or a signed blob, that allows the ID owner X (the student) to ask the verifier V (part of Govt) to verify ID for query entity Q. This string can be checked by those holding the private key for Q.
X:Q = V->generatepair(Qpub)
// Generate a unique ID string for interacting with the university. Not confidential, because not verifiable by anyone.
Tok = V->encode(Xpriv, X:Q, Qpub, property:FullName, Vpriv)
// Generate a token string unique to the pairing of X:Q, for a specified property like FullName, signed by the verifier.
FullName = decode(Tok, Xpub, Qpriv)
// The query entity (university) can decode this blob, but no one else can.
If Q loses confidentiality of Qpriv and all the Tok values, then that data is exposed. But having it doesn't let the attacker prove they are X to a different entity.
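To make the pairing idea concrete, here is a minimal Python sketch using only the standard library. It is a simplification, not the scheme above: it swaps the public-key operations for HMAC with a secret shared between V and each Q, so it shows authenticity and per-entity IDs but not the confidentiality that encrypting to Qpub would give. All names (`V_SECRET`, `pairwise_id`, `issue_token`, `check_token`) are hypothetical.

```python
import hashlib
import hmac
import json
import secrets

# Hypothetical verifier-side secret, standing in for Vpriv.
V_SECRET = secrets.token_bytes(32)


def pairwise_id(person: str, entity: str) -> str:
    """Derive the X:Q identifier: stable for one (person, entity) pair,
    and not linkable across entities without V_SECRET."""
    msg = json.dumps([person, entity]).encode()
    return hmac.new(V_SECRET, msg, hashlib.sha256).hexdigest()[:16]


def _tag(entity_key: bytes, pid: str, prop: str, value: str) -> str:
    payload = json.dumps([pid, prop, value]).encode()
    return hmac.new(entity_key, payload, hashlib.sha256).hexdigest()


def issue_token(person: str, entity: str, prop: str, value: str,
                entity_key: bytes) -> dict:
    """Verifier attests that `prop` has `value` for this pairing.
    entity_key is shared between V and Q (assumption: replaces Qpub/Qpriv)."""
    pid = pairwise_id(person, entity)
    return {"id": pid, "property": prop, "value": value,
            "tag": _tag(entity_key, pid, prop, value)}


def check_token(token: dict, entity_key: bytes) -> bool:
    """Q checks that the token was issued for it and was not altered."""
    expected = _tag(entity_key, token["id"], token["property"], token["value"])
    return hmac.compare_digest(expected, token["tag"])


uni_key = secrets.token_bytes(32)   # held by V and the university
bank_key = secrets.token_bytes(32)  # held by V and some other entity

tok = issue_token("alice", "university", "FullName", "Alice Example", uni_key)
print(check_token(tok, uni_key))   # True: the university accepts it
print(check_token(tok, bank_key))  # False: useless at a different entity
```

Because the ID and the tag are both bound to one pairing, a dump of the university's tokens gives an attacker nothing they can present elsewhere as X.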
I'm sure more rigorous schemes have been thought out, but there is so much inertia in changing anything.
All good points. I'd add one more: make sure you have detection and response capabilities.
Many attacks move from an initial point of compromise to finding and attacking the target information. If you can detect this activity early, it might be possible to reduce the severity of the breach.
I learned a valuable lesson: no matter how “secure” you think your servers are, at some point all the data on them will be compromised.
So...
* Back up often, in many places
* Secure it as much as you can
* Encrypt as much as you can
* Airgap sensitive stuff
And even with all of that you will still be compromised. Eventually.
Protect your data so you can spin up on a new server if you need to.
Sigh.