Yahoo probes possible huge data breach (bbc.co.uk)
184 points by JohnHammersley on Aug 2, 2016 | 52 comments


However, attempts to contact more than 100 of the addresses in the sample saw many returned as undeliverable with auto-responses reading: "This account has been disabled or discontinued," which might suggest that the data is old.

Or another sad possibility is that this may be representative of any sample of yahoo email addresses.


One of my friends had years of correspondence stored in a Yahoo Mail account, including letters from people who had then passed away.

She did a study abroad, and when she came home a year later discovered that Yahoo had closed the account. All of that correspondence was gone forever.


Classic from old Hotmail as well.

Don't think hotmail do this anymore but I am sure I fell victim to it 16 or so years ago.


Hotmail / Outlook does still delete email from inactive accounts, and then the account itself. You have to log in every <270 days (9 months) to keep the system from deleting your mail and in <12 months to keep your email address/account alive.[1] But you will get emails to your secondary email address as a warning before this happens. Gmail also does the same, but likely on a longer time horizon[2].

[1] I worked on outlook.com and http://answers.microsoft.com/en-us/outlook_com/forum/oemail-...

[2] https://www.quora.com/Will-Google-deactivate-my-Gmail-accoun...


The quora link you posted also says that gmail does NOT do that anymore - the clause was removed from their policy.


Likewise. It was a really sad day when I realized I couldn't find e-mails I sent and received when I touched the internet for the first time.

It would have been quite a ride down nostalgia lane.


I still remember too. 28 days? I was easily off-grid for 28 days.

At the moment, we are heavily focusing on "exit strategies" for outsourcing agreements in regulated industries. Banking is leading the way, followed by insurance, but it's hard not to recommend to any CRO in a company with business to lose.


I worked for a listed game company. We often deleted inactive accounts to make the revenue report look better. Once an account is deleted, all the money its owner paid can immediately be recognized as revenue.


I would think a whole year is a more than reasonable time for a free service to keep your account for.


While it is probably true that a big chunk of yahoo addresses are defunct, this could probably be said about most free providers.

The more interesting fact about Yahoo mail is that Verizon will now manage a very large percentage of all free email accounts on the planet. The US estimate for Yahoo, AOL, and Verizon combined would be somewhere near 35-40%.

And before the No Way AOL reply trickles in from somewhere, AOL is still estimated near 10% of total active addresses in the US, and is substantiated by the ESPs. edit* With the demographic of your sample being very linked to that percentage.


There's also a point in cleaning up old accounts in order to protect people's privacy. These might be accounts the user has forgotten or can no longer access (forgotten password). Even though the user can't access the account, it might still be accessible to others with malicious intent. Maybe the password was weak, or maybe the user used the same password in multiple services and it is now publicly available in some data dump.


I still have a very desirable e-mail address from when they initially released a lot of abandoned e-mails, after Mayer took over. I haven't really used it since.


In the interest of making this a learning experience for myself and others, I would like to get any feedback on the following questions.

What would be considered as strong/good/secure password/authentication algorithms if one had to implement this today?

What would you recommend today as a good/secure authentication library that one can use with a micro framework like Python Flask (or others)? What about the authentication library in the batteries included Django framework?

What about recommended general authentication libraries for other web application frameworks such as Phoenix/Elixir, Node based, Go based, etc?

Here are some links from OWASP/Google that offer some details:

https://www.owasp.org/index.php/Cryptographic_Storage_Cheat_...

https://www.owasp.org/index.php/Password_Storage_Cheat_Sheet

https://docs.google.com/document/d/1R6c9NW6wtoEoT3CS4UVmthw1...


> What would be considered as strong/good/secure password/authentication algorithms if one had to implement this today?

Use salted bcrypt or another slow salted hashing algorithm. The goal is to make it expensive to crack. For reference, the GPU version of hashcat (cudahashcat or oclhashcat) is very popular, and look at these benchmarks to choose a hashing algorithm and to understand what to avoid like the plague:

https://gist.github.com/epixoip/c0b92196a33b902ec5f3

Clearly md5 is a no no and bcrypt and a few others are a huge improvement.

However, I feel compelled to add (and please don't flame me for this, just sharing reality) that even MD5 can actually be very hard to crack if your users use a long password (12 chars or above) and a large enough character set. In other words, using 12 char or more passwords with uppercase, lowercase, numbers and symbols is hard to crack even if they used a crappy algorithm like md5.

So the ideal combo is a strong hashing algo and enforcing complex long passwords.
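The "slow salted hash" idea above can be sketched with nothing but the Python standard library. This is only an illustration, not a recommendation over bcrypt: it uses PBKDF2-HMAC-SHA256 as a stand-in because bcrypt needs a third-party package. The function names and the iteration count are assumptions made for this example.

```python
import hashlib
import hmac
import os

# Assumed work factor for illustration only; tune it for your own hardware.
ITERATIONS = 200_000


def hash_password(password: str, iterations: int = ITERATIONS) -> tuple[bytes, int, bytes]:
    """Return (salt, iterations, derived_key); store all three with the user record."""
    salt = os.urandom(16)  # fresh random salt for every password
    key = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return salt, iterations, key


def verify_password(password: str, salt: bytes, iterations: int, expected: bytes) -> bool:
    """Recompute the hash with the stored salt and compare in constant time."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return hmac.compare_digest(candidate, expected)
```

Storing the iteration count next to the hash makes it possible to raise the cost later and rehash a password the next time its owner logs in.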


Enforce length. Every other notion of "complex" has a very poor return (in terms of memorability:crackability) compared to just throwing a couple more characters on the end. E.g. if we suppose your "uppercase, lowercase, numbers and symbols" makes 60 possible characters, a 12-character uppercase, lowercase, numbers and symbols has the same level of security as a 15-letter lowercase-only password.
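The comparison above is easy to check, assuming every character is picked uniformly at random (which real users rarely do). `entropy_bits` is a helper name made up for this sketch.

```python
import math


def entropy_bits(length: int, alphabet_size: int) -> float:
    """Bits of entropy for a uniformly random password: length * log2(alphabet size)."""
    return length * math.log2(alphabet_size)


mixed = entropy_bits(12, 60)  # 12 characters drawn from a 60-symbol alphabet
lower = entropy_bits(15, 26)  # 15 lowercase letters only

print(f"12 chars from 60 symbols: {mixed:.1f} bits")
print(f"15 lowercase letters:     {lower:.1f} bits")
```

Both come out at roughly 71 bits, so three extra lowercase letters buy about the same security as the whole mixed character set.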


Is it even necessary to actually enforce password complexity? Or is it enough that those characters theoretically could be in the password? The size of the password-space is the same.


Yes, absolutely, for uniqueness and protection against brute-force guessing. It would be hard to find two people using the password "2#1mf!fAk", but it is extremely easy to find two people with the password "123456". Essentially you want to avoid having a password that appears in a lookup table ("rainbow table").

And no, when the complexity rules get too strict. I really dislike that every website has a different complexity rule. Also, be aware that a passphrase is a form of password. Instead of choosing "j2%25a^a2" as your password, you can construct a phrase such as "hacker news is really awesome right buddy" (assuming the website caps the length at a reasonable value; allowing arbitrarily long passwords is a DoS vulnerability in itself). The pitfall of passwords and passphrases is reusing common ones, which goes back to the first point: YES.

The real bummer is reusing your password and passphrase for every website. I have several sets of passwords. One set for financial/sensitive website for example.
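The point about two people sharing "123456" can be made concrete with a small sketch. The three-entry lookup table is a toy stand-in for a real precomputed table, and MD5 is used only because it is the hash under discussion, not as a suggestion.

```python
import hashlib
import os


def md5_hex(data: bytes) -> str:
    return hashlib.md5(data).hexdigest()


# Toy precomputed table: unsalted hash -> password, for a few common passwords.
common = ["123456", "password", "qwerty"]
lookup = {md5_hex(p.encode()): p for p in common}

# Two users who picked the same password get the same unsalted hash,
# and that hash is found in the table with a single dictionary lookup.
alice = md5_hex(b"123456")
bob = md5_hex(b"123456")
recovered = lookup.get(alice)

# With a random per-user salt the stored hashes differ and the table no longer matches.
salt_a, salt_b = os.urandom(16), os.urandom(16)
alice_salted = md5_hex(salt_a + b"123456")
bob_salted = md5_hex(salt_b + b"123456")
```

A salt only defeats the precomputed table. An attacker who knows the salt can still try "123456" first, which is why the password itself has to be uncommon.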


Whoever is trying to find a password that hashes to the known hash will order the passwords they try in order to prioritise testing more frequently used passwords like aaaaaaaaaaaa rather than mj(8anZ0$uQ,! , so if you can encourage people to choose a less predictable password you increase the cost of discovering the password for an attacker.


Correct. They'll also start with dictionaries of a few hundred million passwords which they'll run through fairly quickly. Then they'll use mask attacks as @ximeng said in ascending order of length and complexity.

https://hashcat.net/wiki/doku.php?id=mask_attack

So it's critically important to enforce length because short passwords will be cracked quickly even with slower hashing algorithms.
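To see why length matters so much against mask attacks, here is a rough sketch that counts how many candidates a mask covers. The charset sizes follow hashcat's documented built-in sets (?l, ?u, ?d, ?s, ?a); `mask_candidates` is a helper invented for this example and is not the number hashcat's own --keyspace option reports.

```python
# Sizes of hashcat's built-in charsets; "??" stands for a literal question mark.
CHARSET_SIZES = {"l": 26, "u": 26, "d": 10, "s": 33, "a": 95, "?": 1}


def mask_candidates(mask: str) -> int:
    """Count the candidate passwords described by a mask such as '?u?l?l?l?d?d'."""
    total = 1
    i = 0
    while i < len(mask):
        if mask[i] == "?":
            total *= CHARSET_SIZES[mask[i + 1]]  # placeholder: multiply by charset size
            i += 2
        else:
            i += 1  # a literal character adds no choices
    return total


for length in (6, 8, 10, 12):
    print(length, mask_candidates("?a" * length))
```

Each extra ?a position multiplies the work by 95, so going from 8 to 12 characters makes the full search about 81 million times larger.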


I don't know much about prebuilt end-to-end authentication libraries, but I would use bcrypt for hashing passwords.

I've seen conflicting opinions about whether scrypt is better than bcrypt, but I haven't seen strong enough arguments to convince me to switch.


In the past I've used https://pythonhosted.org/Flask-Security/ for Python Flask password management (with bcrypt) and it fit my use case.


Am I right in understanding that the passwords were hashed with MD5? WTF?


They spent an entire weekend on their corporate rebranding. How long do you think they should have spent on security?

http://adage.com/article/digital/marissa-mayer-yahoo-s-logo-...


While we found some MD5 weaknesses, it's not broken in the general case. If I give you a well salted MD5 hash of a password, your only option is still bruteforce.

While you can produce two files with the same MD5, it doesn't help you to reverse a hash.


Problem being that MD5 is incredibly fast, as opposed to e.g. Bcrypt, which makes bruteforcing so much easier.
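The speed gap is easy to measure on your own machine. A rough sketch, using PBKDF2 with an assumed 100,000 iterations as a standard-library stand-in for bcrypt; the absolute timings depend on hardware and say nothing about GPU cracking rigs.

```python
import hashlib
import os
import time


def seconds_per_guess(hash_once, guesses: int) -> float:
    """Average wall-clock time for one password guess."""
    start = time.perf_counter()
    for i in range(guesses):
        hash_once(str(i).encode())
    return (time.perf_counter() - start) / guesses


salt = os.urandom(16)

# One round of salted MD5 per guess.
fast = seconds_per_guess(lambda pw: hashlib.md5(salt + pw).digest(), 20_000)

# A deliberately slow hash per guess (assumed work factor of 100,000 iterations).
slow = seconds_per_guess(lambda pw: hashlib.pbkdf2_hmac("sha256", pw, salt, 100_000), 5)

slowdown = slow / fast
print(f"MD5: {fast:.2e} s/guess, PBKDF2: {slow:.2e} s/guess, slowdown: {slowdown:,.0f}x")
```

Whatever the exact ratio, the attacker pays it on every guess while a legitimate login pays it once.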


True. Assuming one round of MD5.


Probably old data then?


Oh man, all of my spam, and mail from people I didn't want having my real address will be compromised.


It's becoming increasingly the case that the idea of semi-reliable "privacy" is rapidly disintegrating, and perhaps this is not a bad thing.


You too? I just logged in for the first time since 2013. It turns out that in 2014 I was really into children's gaming sites. Who knew?


Using the name Peace, the hacker said the data was "most likely" from 2012.

So you're saying as long as you updated your password from the 2014 breach, you should be fine.


Rather timely, given their recent buyout.


A last minute security fix. Funny that Myspace got hacked too apparently


What's shocking to me is that Yahoo! still uses MD5 hashes. Those can be cracked almost instantly with hashcat and tools like it. There's some confusion about the age of the data in question, but I hope they've moved away from MD5 since the breach occurred.


Only for weak passwords. Strong (80-bit plus) passwords are still safe, even with unsalted MD5. Too bad people are really bad at choosing passwords.


Too bad as well that very, very few sites support 80 character long passwords.

Ever since I switched to a password manager, I've always made sure that the length of it is the maximum length that the site will accept.

I am getting pretty pissed with the sites that have ridiculous "security" schemes like 1 capital letter, 1 number, 1 special character, must rotate every three months...but will only allow a password between 8-12 characters long.


Grandparent was suggesting that 80-bit, not 80-character, passwords were sufficient -- though I agree as a rule that more characters is generally better.


wait, which one is strong enough under MD5 hashing? 80 bit or 80 character?


MD5 caps your overall complexity at 128 bits (realistically probably 110 bits with modern attacks). Typical English text has about 3 bits/character, completely random ASCII is 7 bits/character (though that includes control codes you probably can't put in a password field). So 80 characters of typical English would be ~240 bits (obviously 80 characters of just 'aaa...' would be less), so definitely overkill for MD5; ~36 characters (110/3) is the point where you max out what you can do with MD5, and ~26 characters (80/3) is good enough. You can reduce these numbers a bit by using upper case, symbols etc., but personally I find it easier to remember 26 characters of normal English than 13 characters of random letters and symbols.


80 bits of entropy (20 really random hex chars) should be safe enough.
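For anyone who wants such a password, a minimal sketch with Python's `secrets` module: 10 random bytes are 80 bits, which print as 20 hex characters. The variable name is just for illustration.

```python
import secrets

# 10 bytes from the OS CSPRNG = 80 bits, rendered as 20 hexadecimal characters.
password = secrets.token_hex(10)
print(password, len(password))
```

This only helps if the result lives in a password manager, since nobody is going to memorize it.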


The hacked data is supposedly from 2012, which was four years after MD5 was declared broken as a form of encryption.

Hopefully they've moved on by now.


> The passwords appear to be hashed - which means they have been scrambled - but the hacker has also published details of the algorithm allegedly used for the hash.

I like how the BBC tried to explain what a crypto hash does to plaintext, but this is a poor way to describe what a hash is, because "scramble" means re-ordering, and hashing doesn't reorder. I hope this reporter takes the time to come up with a different way to explain it to the general public.


Can I admit seeing this right above the Verizon post (as it is right now) makes me giggle

Well, I'm not sure Verizon is going to love NOT being a dumb pipe...


If some of these accounts have been deleted how has a hacker got a username and password for them?


Old breach from 2012. Just being reported now and some of the accounts breached have expired.


Grammar police: "Yahoo probes possibly huge data breach"


This fundamentally changes the meaning of the title. "Yahoo probes possibly huge data breach" implies that there was a data breach, and it may or may not be huge. "Yahoo probes possible huge data breach" implies that there may or may not have been a data breach to begin with.


Not at all. http://www.merriam-webster.com/dictionary/possibly

If you really want to go there, then one could say any reference or dependence on possibility implies probability, which necessarily includes the possibility of the event not happening at all.


I don't think that was the point of the comment you are responding to. Possibly can only attach to "huge", so it parses as "probes [possibly huge] data breach". Whereas possible attaches to the whole data breach and parses as "probes possible [huge data breach]".


That sentence has a different meaning.


No it doesn't.


It absolutely does.

"Possibly huge" breach -> There's a breach, but we don't know if it's huge or not.

"Possible" huge breach -> We don't know if there's a breach, but if there is, it's a huge one.

"Possible huge data breach" is more accurate in this situation.



