However, attempts to contact more than 100 of the addresses in the sample saw many returned as undeliverable with auto-responses reading: "This account has been disabled or discontinued," which might suggest that the data is old.
Or another sad possibility is that this may be representative of any sample of yahoo email addresses.
Hotmail/Outlook does still delete email from inactive accounts, and then the account itself. You have to log in at least once every 270 days (9 months) to keep the system from deleting your mail, and at least once every 12 months to keep your email address/account alive.[1] But you will get emails to your secondary email address as a warning before this happens. Gmail does the same, but likely on a longer time horizon[2].
I still remember too. 28 days? I was easily off-grid for 28 days.
At the moment, we are heavily focusing on "exit strategies" for outsourcing agreements in regulated industries. Banking is leading the way, followed by insurance, but it's hard not to recommend to any CRO in a company with business to lose.
I worked for a listed game company. We often deleted inactive accounts to make the revenue report look better. Once an account is deleted, all the money its owner paid can immediately be recognized as revenue.
While it is probably true that a big chunk of yahoo addresses are defunct, this could probably be said about most free providers.
The more interesting fact about Yahoo mail is that Verizon will now manage a very large percentage of all free email accounts on the planet. The US estimate for Yahoo, AOL and Verizon combined would be somewhere near 35-40%.
And before the No Way AOL reply trickles in from somewhere, AOL is still estimated near 10% of total active addresses in the US, and is substantiated by the ESPs. edit* With the demographic of your sample being very linked to that percentage.
There's also a point in cleaning up old accounts in order to protect people's privacy. These might be accounts the user has forgotten about or can no longer access (forgotten password). Even though the user can't access the account, it might still be accessible to others with malicious intent. Maybe the password was weak, or maybe the user used the same password on multiple services and it is now publicly available in some data dump.
I still have a very desirable e-mail address from when they initially released a lot of abandoned e-mails, after Mayer took over. I haven't really used it since.
In the interest of making this a learning experience for myself and others, I would like to get any feedback on the following questions.
What would be considered as strong/good/secure password/authentication algorithms if one had to implement this today?
What would you recommend today as a good/secure authentication library that one can use with a micro framework like Python Flask (or others)? What about the authentication library in the batteries included Django framework?
What about recommended general authentication libraries for other web application frameworks such as Phoenix/Elixir, Node based, Go based, etc?
Here are some links from OWASP/Google that offer some details:
What would be considered as strong/good/secure password/authentication algorithms if one had to implement this today?
Use salted bcrypt or another slow salted hashing algorithm. The goal is to make it expensive to crack. For reference, the GPU version of hashcat (cudahashcat or oclhashcat) is very popular, and look at these benchmarks to choose a hashing algorithm and to understand what to avoid like the plague:
Clearly md5 is a no no and bcrypt and a few others are a huge improvement.
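To make "slow salted hashing" concrete, here is a minimal sketch using only the Python standard library. Note that it swaps in PBKDF2-HMAC-SHA256 rather than bcrypt, because bcrypt needs a third-party package; the function names and the iteration count are assumptions for illustration, not a recommendation from this thread.

```python
import hashlib
import hmac
import os

# Illustrative assumption: tune this to your own hardware and current guidance.
ITERATIONS = 600_000


def hash_password(password, iterations=ITERATIONS):
    """Return (salt, iterations, derived_key), all of which get stored."""
    salt = os.urandom(16)  # fresh random salt for every password
    key = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return salt, iterations, key


def verify_password(password, salt, iterations, expected_key):
    """Recompute the key with the stored salt and compare in constant time."""
    key = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return hmac.compare_digest(key, expected_key)
```

The per-password salt means two users with the same password get different stored values, and the iteration count is what makes each guess expensive for an attacker.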
However, I feel compelled to add (and please don't flame me for this, just sharing reality) that even MD5 can actually be very hard to crack if your users use a long password (12 chars or above) and a large enough character set. In other words, using 12 char or more passwords with uppercase, lowercase, numbers and symbols is hard to crack even if they used a crappy algorithm like md5.
So the ideal combo is a strong hashing algo and enforcing complex long passwords.
Enforce length. Every other notion of "complex" has a very poor return (in terms of memorability:crackability) compared to just throwing a couple more characters on the end. E.g. if we suppose your "uppercase, lowercase, numbers and symbols" makes 60 possible characters, a 12-character password drawn from that set has roughly the same level of security as a 15-letter lowercase-only password.
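The arithmetic behind that comparison can be checked with a quick sketch. It assumes every character is chosen uniformly at random, which real users don't do, so treat the numbers as upper bounds; `keyspace_bits` is just a helper name made up here.

```python
import math


def keyspace_bits(alphabet_size, length):
    """Bits of entropy for a uniformly random password of the given length."""
    return length * math.log2(alphabet_size)


mixed = keyspace_bits(60, 12)   # 12 chars from a 60-symbol set
lower = keyspace_bits(26, 15)   # 15 lowercase letters

print(f"mixed: {mixed:.1f} bits, lowercase: {lower:.1f} bits")
# prints "mixed: 70.9 bits, lowercase: 70.5 bits"
```

So three extra lowercase letters buy about as much as the whole mixed-character rule.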
Is it even necessary to actually enforce password complexity? Or is it enough that those characters theoretically could be in the password? The size of the password-space is the same.
Yes, absolutely, for uniqueness and protection against "brute force guessing". It would be hard to find two people using the following password: "2#1mf!fAk", but it is extremely easy to find two people with the password "123456". Essentially you want to avoid having a password that is in a lookup table ("rainbow table").
And no, when you make the complexity super hard. I really dislike that every website has a different complexity rule. Also, be aware that a passphrase is a form of password. Instead of choosing "j2%25a^a2" as your password, you can choose a phrase you construct, like "hacker news is really awesome right buddy" (assuming the website caps at a reasonable length; allowing an arbitrarily long password is a DoS vulnerability by itself). The pitfall of a password/passphrase is reusing a common one, which goes back to the first point - YES.
The real bummer is reusing your password and passphrase for every website. I have several sets of passwords. One set for financial/sensitive website for example.
Whoever is trying to find a password that hashes to the known hash will order the passwords they try in order to prioritise testing more frequently used passwords like aaaaaaaaaaaa rather than mj(8anZ0$uQ,! , so if you can encourage people to choose a less predictable password you increase the cost of discovering the password for an attacker.
Correct. They'll also start with dictionaries of a few hundred million passwords which they'll run through fairly quickly. Then they'll use mask attacks as @ximeng said in ascending order of length and complexity.
While we found some MD5 weaknesses, it's not broken in the general case. If I give you a well salted MD5 hash of a password, your only option is still bruteforce.
While you can produce two files with the same MD5, it doesn't help you to reverse a hash.
What's shocking to me is that Yahoo! still uses MD5 hashes. Those can be cracked very quickly with hashcat and tools like it. There's some confusion about the age of the data in question, but I hope they've moved away from MD5 since the breach occurred.
Too bad as well that very, very few sites support 80 character long passwords.
Ever since I switched to a password manager, I've always made sure that the length of it is the maximum length that the site will accept.
I am getting pretty pissed with the sites that have ridiculous "security" schemes like 1 capital letter, 1 number, 1 special character, must rotate every three months...but will only allow a password between 8-12 characters long.
Grandparent was suggesting that 80-bit, not 80-character, passwords were sufficient -- though I agree as a rule that more characters is generally better.
MD5 caps your overall complexity at 128 bits (realistically probably 110 bits with modern attacks). Typical english text has about 3 bits/character, completely random ascii is 7 bits/character (though that includes control codes you probably can't put in a password field). So 80 characters of typical English would be ~240 bits (obviously 80 characters of just 'aaa...' would be less), so definitely overkill for MD5; ~36 characters (110/3) is the point where you max out what you can do with MD5, and ~26 characters (80/3) is good enough. You can reduce these numbers a bit by using upper case, symbols etc., but personally I find it easier to remember 26 characters of normal English than 13 characters of random letters and symbols.
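Those length estimates can be reproduced with a few lines. This is only a sketch: the 3 bits/character figure is the estimate quoted above, 95 is the number of printable ASCII characters, and `chars_for` is a made-up helper name. It rounds up to whole characters, so it gives 27 and 37 where the comment rounds down to ~26 and ~36.

```python
import math

ENGLISH_BITS_PER_CHAR = 3.0           # estimate quoted in the comment above
PRINTABLE_ASCII_BITS = math.log2(95)  # 95 printable ASCII characters, about 6.57 bits


def chars_for(target_bits, bits_per_char):
    """Smallest whole number of characters that reaches target_bits."""
    return math.ceil(target_bits / bits_per_char)


print(chars_for(80, ENGLISH_BITS_PER_CHAR))   # 27
print(chars_for(110, ENGLISH_BITS_PER_CHAR))  # 37
print(chars_for(80, PRINTABLE_ASCII_BITS))    # 13
```

The last line is where the "13 characters of random letters and symbols" figure comes from.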
> The passwords appear to be hashed - which means they have been scrambled - but the hacker has also published details of the algorithm allegedly used for the hash.
I like how the BBC tried to explain what a crypto hash does to plaintext, but this is a poor way to describe what a hash is, because "scramble" means re-ordering and hashing doesn't reorder. I hope this reporter takes the time to come up with a different way to explain it to the general public.
This fundamentally changes the meaning of the title. "Yahoo probes possibly huge data breach" implies that there was a data breach, and it may or may not be huge. "Yahoo probes possible huge data breach" implies that there may or may not have been a data breach to begin with.
If you really want to go there, then one could say any reference or dependence on possibility implies probability, which necessarily includes the possibility of the event not happening at all.
I don't think that was the point of the comment you are responding to. Possibly can only attach to "huge", so it parses as "probes [possibly huge] data breach". Whereas possible attaches to the whole data breach and parses as "probes possible [huge data breach]".