100M clear text passwords stolen from Russia's biggest social network (thestack.com)
134 points by twoshedsmcginty on June 6, 2016 | 89 comments


Instead of idiot questions like FizzBuzz, maybe recruiters should be asking about password storage best practices.


I've asked these questions in interviews and a lot of programmers seem to know the basics (what a hash is), but seem to fall short when it comes to things like what a salt is, modern storage techniques (bcrypt, scrypt, PBKDF2) and how rainbow table attacks can be used to crack hashes.

It's a bit tricky to fail someone in an interview because they don't know security. I think the better way to handle security in a company is to hire a programmer that knows it and teaches it to other programmers during code reviews.


Is there a good article or breakdown on modern best security practices? I am a relatively experienced programmer and know what a hash and salt are, and have read the wikipedia article on rainbow tables (though I now forget what they are other than that they make attacks of certain things easier), but that's the extent of my knowledge.


There's this good link from Thomas Ptacek: https://gist.github.com/tqbf/be58d2d39690c3b366ad


The best one I've come across is actually an answer on the Security StackExchange.

http://security.stackexchange.com/questions/211/how-to-secur...


Here's one. Title: How to safely store your users' passwords in 2016.

https://paragonie.com/blog/2016/02/how-safely-store-password...


That site recommends libsodium which by default uses Argon2. The issue is that Argon2 is not very mature yet. Also if you are using python there is no good library out there. Ideally you want something well tested since it is possible for libraries to have bugs as well.


The best resource I have found is https://www.owasp.org/index.php/Main_Page


Hashing with an insecure algorithm and without good salting isn't much better than not hashing at all though.


Nope, it is much better for those who use passwords with good entropy.
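
A rough sketch of why entropy matters here, assuming a password drawn uniformly at random from a known alphabet (real passwords rarely are). The guess rate is a made-up illustrative figure, not a measurement, and the function names are hypothetical:

```python
import math

def entropy_bits(length: int, alphabet_size: int) -> float:
    """Entropy in bits of a password drawn uniformly at random."""
    return length * math.log2(alphabet_size)

def years_to_exhaust(bits: float, guesses_per_second: float) -> float:
    """Time to try every candidate at the given (assumed) guess rate."""
    seconds = 2 ** bits / guesses_per_second
    return seconds / (365 * 24 * 3600)

# Hypothetical rate for a fast, unsalted hash on attacker hardware.
ASSUMED_RATE = 1e10

weak = entropy_bits(6, 26)     # six lowercase letters
strong = entropy_bits(16, 62)  # sixteen random alphanumerics

print(f"weak:   {weak:.1f} bits, {years_to_exhaust(weak, ASSUMED_RATE):.2e} years")
print(f"strong: {strong:.1f} bits, {years_to_exhaust(strong, ASSUMED_RATE):.2e} years")
```

Even with a fast hash, the high-entropy password stays out of reach of exhaustive search, while the short one falls almost immediately.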


How do you store salt?


That question should never even need to be asked. The library you're using should take care of that for you.

In PHP:

    $storeMe = password_hash($plaintext, PASSWORD_DEFAULT);
    if (password_verify($plaintext, $storeMe)) {
        // Logged in
    }
The detail is totally abstracted away. All sane password hashing libraries offer this API.

See: https://paragonie.com/blog/2016/02/how-safely-store-password...

EDIT - CANNOT REPLY:

  > One major failure of this article:
  >
  > You should generate a random salt for each user and store it alongside the user
  > record in the DB.
No, you shouldn't. Your library should do that for you, and store it as a single string that's opaque to the developer.

  > I completely disagree. This implies that my DB ORM handles password stuff,
  > which doesn't make sense.
See the passlib section: https://paragonie.com/blog/2016/02/how-safely-store-password...


One major failure of this article:

You should generate a random salt for each user and store it alongside the user record in the DB.


How is this a failure? That's correct. (If your password hashing library doesn't handle this automatically, of course. But it does the same internally.)


Is it normal for password hashing libraries to save salts to a database?


Yes, they usually produce a string that looks something like "salt||hash". (Salt is a non-secret value.)

This is the result of bcrypt:

   $2a$10$N9qo8uLOickgx2ZMRZoMyeIjZAgcfl7p92ldGxad68LJZdL17lhWy
    |/ \| \____________________/\_____________________________/
    |   |        salt                      hash
    |  cost 
    |
 algorithm,
  version

You store this string in the database.
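
To make the layout concrete, here is a minimal Python sketch that splits such a string into its fields. `split_bcrypt` and `BcryptParts` are names invented for illustration; in a real application you would hand the whole string to your bcrypt library's verify function rather than parse it yourself.

```python
from typing import NamedTuple

class BcryptParts(NamedTuple):
    version: str
    cost: int
    salt: str    # 22 characters in bcrypt's own base64 alphabet
    digest: str  # 31 characters

def split_bcrypt(stored: str) -> BcryptParts:
    """Split a '$2a$10$<salt><hash>' string into its fields."""
    empty, version, cost, rest = stored.split("$")
    if empty != "" or len(rest) != 53:
        raise ValueError("not a bcrypt hash string")
    return BcryptParts(version, int(cost), rest[:22], rest[22:])

parts = split_bcrypt("$2a$10$N9qo8uLOickgx2ZMRZoMyeIjZAgcfl7p92ldGxad68LJZdL17lhWy")
print(parts)
```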


The big thing about this is that it is perfectly "OK" to store the algorithm, cost, and salt alongside the hash.

Most people seem to think, myself included when I was new to it, that storing all those things together would compromise the security. The point of the hash is that it is (almost) impossible to get to the hash without the user's password, and there is no way to get to the password from the entire string you posted.


I'm naive about these things, but I was under the impression that salt just thwarted pre-computed hash tables? I guess "just" should be in quotes.

So somebody with resources and motive could still brute-force that string. It seems that storing the salt somewhere else would add a comparable amount of security as the salt itself. It seems prudent along the lines of "don't put all your eggs in one basket."


> but I was under the impression that salt just thwarted pre-computed hash tables?

Yes. Because if you had two users with the password 'dadada' they would hash to the same value.

Now 1234:dadada hashes differently than 1326:dadada, hence preventing the use of a prehashed table (you could go through all salts for common passwords, but that usually takes a while as well).
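
A quick Python sketch of that effect. PBKDF2 stands in for whatever slow hash you actually use, the short salts are taken from the example above (real salts should be long and random), and `salted_hash` is a hypothetical helper:

```python
import hashlib

def salted_hash(salt: bytes, password: bytes) -> str:
    # PBKDF2-HMAC-SHA256 is only a stand-in for the real password hash.
    return hashlib.pbkdf2_hmac("sha256", password, salt, 100_000).hex()

alice = salted_hash(b"1234", b"dadada")
bob = salted_hash(b"1326", b"dadada")
print(alice == bob)  # same password, different salts

# Without a salt, identical passwords give identical digests.
unsalted_a = hashlib.sha256(b"dadada").hexdigest()
unsalted_b = hashlib.sha256(b"dadada").hexdigest()
print(unsalted_a == unsalted_b)
```

The first comparison is false and the second is true, which is exactly what makes a single prehashed table useless against salted hashes.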


What you're thinking of is called a "pepper" and is discouraged.


Rather than expecting the password hash library to store something into your application DB, you should be managing the access to that DB yourself.

In our case, we use an immutable attribute of each user as their hash. This might be an internal identifier, or the timestamp on which their account was created, or something like that.


Rather than expecting the password hash library to store something into your application DB, you should be managing the access to that DB yourself.

You do manage it yourself. The password hashing library doesn't access your database; it produces a string that you store, which includes the salt and the password hash.

In our case, we use an immutable attribute of each user as their hash

What? You really need to talk to security-competent people.


> In our case, we use an immutable attribute of each user as their hash.

I assume you mean "as their salt". And even then, why the half-measure? Just laziness? Sure, a guessable/computable salt is better than no salt, but it's not nearly as good as a random salt.


I assume you mean "as their salt"

Yes, thanks for clarifying what I meant to type.

why the half-measure? Just laziness? Sure, a guessable/computable salt is better than no salt, but it's not nearly as good as a random salt.

But isn't the salt essentially safe to make public anyway? That being the case, how does it matter what value you use, so long as it differs between users?


Ideal salt is a large (e.g. 16 bytes or more) random byte string generated for each password.

If there's a reason for it (in most cases, there is none), some trade offs are possible, e.g.:

Salt is a large random string unique per user, not per password.

Given two hashes of passwords for the same user it reveals whether passwords are the same.

Salt is a small random string or some predictable value.

Attackers can precompute guesses and then look them up.

If you use some immutable identifier per user as salt, both of these attacks are possible. Is there a reason for this? I'm 100% certain that there's not: since you already store the password hash in your database, you can generate a large random salt for each password hash and store it.
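
For what it's worth, a minimal sketch of "large random salt per password" in Python. The iteration count and the `new_password_record` helper are assumptions for illustration; a proper password hashing library does all of this for you.

```python
import hashlib
import secrets

ITERATIONS = 100_000  # assumed work factor, tune for your hardware

def new_password_record(password: str) -> dict:
    """Return a fresh salt and hash; call on signup and on every change."""
    salt = secrets.token_bytes(16)  # 16 bytes from the OS CSPRNG
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return {"salt": salt, "hash": digest}

first = new_password_record("correct horse")
second = new_password_record("correct horse")  # same user, same password
print(first["hash"] == second["hash"])
```

Because the salt is regenerated on every call, two stored hashes for the same user no longer reveal whether the password was reused.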

As for "safe to make public": there are many things in crypto called "public" where "public" doesn't mean that the whole world is free to get it, but instead means an opposite of "private", or, as I like to call them, "non-secret". Yes, salt can be made public, but shouldn't (unless there's a reason for it — like in a kind of client-side crypto where server stores salt and sends it to clients) to avoid precomputation.


Salt is a large random string unique per user, not per password.

Of course it's per user.

But "large" makes some sense. My current implementation has maybe 20-22 bits of uniqueness in the salt, certainly less than 16 bytes.

I don't think 16 bytes is necessary even as insurance against the future. Rainbow tables are still expensive to build.

On the other hand, maybe to build just a small table addressing the stupidest passwords ("password","12345678",etc.) it's worth making it more difficult.


Of course it's per user.

What I meant is that it shouldn't be per user, it should be per password. If a user changes his password, he should get a new salt.


> I don't think 16 bytes is necessary even as insurance against the future.

The birthday problem comes into play here.

If you have 22 bits of entropy in your salt, after 2048 users (2^11) you have roughly a 40% chance that two of them share the same salt (and a 50% chance at around 2,400 users). If they also use the same password, this makes attacking your users much easier.

Don't make it easy for attackers. Use 16 bytes from a CSPRNG. Better yet: Use a password hashing library that takes care of this for you.

If you use a 128-bit (16-byte) salt, you have a 50% chance of a collision only after roughly 2^64 passwords.
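
The 2^(bits/2) figure is the usual rule of thumb. A small sketch using the standard birthday approximation p ≈ 1 - exp(-n(n-1)/2N) gives finer numbers; both function names are made up for illustration.

```python
import math

def collision_probability(n_users: int, salt_bits: int) -> float:
    """Approximate chance that at least two of n_users share a salt."""
    space = 2 ** salt_bits
    return -math.expm1(-n_users * (n_users - 1) / (2 * space))

def users_for_half(salt_bits: int) -> float:
    """Approximate number of users at which a collision is 50% likely."""
    return math.sqrt(2 * math.log(2) * 2 ** salt_bits)

print(round(collision_probability(2048, 22), 2))  # 22-bit salt, 2048 users
print(round(users_for_half(22)))                  # users needed for 50%
print(math.log2(users_for_half(128)))             # same, as a power of two
```

For a 22-bit salt this comes out to about 39% at 2048 users and 50% at about 2,400, while a 128-bit salt needs a little over 2^64 passwords.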


It being unique goes most of the way, you're right (though hopefully it actually is unique!). I was being dramatic when I said "not nearly as good". But making the salt easily guessable does allow an attacker to precompute rainbow tables, etc. So if there was a breach and an attacker got a dump of your password hashes, it might mean the difference between you having time to invalidate those passwords or not.
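
Here is a toy Python sketch of that precomputation, assuming the salt is simply the user id. The password list, the id range, and the fast SHA-256 construction are all illustrative stand-ins, not anyone's real scheme.

```python
import hashlib

COMMON_PASSWORDS = ["password", "12345678", "dadada"]

def fast_hash(salt: bytes, password: str) -> str:
    # Deliberately simple stand-in for the site's hash function.
    return hashlib.sha256(salt + password.encode()).hexdigest()

def precompute(user_ids: range) -> dict:
    """Attacker's table, built before any breach, keyed by hash."""
    table = {}
    for uid in user_ids:
        salt = str(uid).encode()  # predictable salt: the user id
        for pw in COMMON_PASSWORDS:
            table[fast_hash(salt, pw)] = (uid, pw)
    return table

table = precompute(range(1, 1001))

# Later, a leaked hash for user 42 is cracked with one dictionary lookup.
leaked = fast_hash(b"42", "dadada")
print(table.get(leaked))
```

With a random 16-byte salt the attacker cannot build this table ahead of time, so the work only starts after the dump is in hand.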

Good look at the issue here: http://security.stackexchange.com/questions/41617/do-salts-h...


But then you are using php so you already lost.


If there was any truth at all in that claim, you should be able to compromise paragonie.com right now simply for it running PHP.

Otherwise, that claim is false.


> No, you shouldn't. Your library should do that for you, and store it as a single string that's opaque to the developer.

I completely disagree. This implies that my DB ORM handles password stuff, which doesn't make sense.

Python 3 example:

Syntax: hashlib.pbkdf2_hmac(hash_name, password, salt, iterations, dklen=None)

Example:

    >>> import binascii, hashlib
    >>> dk = hashlib.pbkdf2_hmac('sha256', b'password', b'salt', 100000)
    >>> binascii.hexlify(dk)
    b'0394a2ede332c9a13eb82e9b24631604c31df978b4e2f0fbd2c549944f9d79a5'

Extremely simple and safe to use. Trivial to store and save/retrieve the salt from DB.


I'm not too familiar with this library, but on inspection this approach seems to have a couple of drawbacks that libraries like bcrypt solve for you:

1) You need to store the salt alongside the password.

2) If you want to futureproof the stretching factor (e.g. change from 100000 to 1000000), you need to store that alongside the password hash as well.

3) If you want to futureproof the hashing algorithm, you need to store that alongside the password hash.

The value of the *crypt solutions is that they store the input parameters as part of the stored secret. So you can make adjustments later on without invalidating existing stored passwords, or having to resort to annoying "double-hashes" to migrate to a new approach.

I don't understand your comment about the ORM needing to handle passwords. It's a simple fetch of a field from the DB, which you then pass as an input to your password validator. How is that any harder than fetching a salt and a hash and passing those to your validator?
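
As a sketch of what "storing the parameters with the hash" looks like if you do it by hand with PBKDF2: the `pbkdf2_sha256$iterations$salt$hash` format below is made up for illustration, as are the function names and the iteration counts. A maintained library is still the better choice.

```python
import base64
import hashlib
import hmac
import secrets

DEFAULT_ITERATIONS = 100_000  # assumed current work factor

def hash_password(password: str, iterations: int = DEFAULT_ITERATIONS) -> str:
    salt = secrets.token_bytes(16)
    dk = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, iterations)
    b64 = lambda raw: base64.b64encode(raw).decode()
    # Algorithm, cost, salt and hash all travel in one string.
    return f"pbkdf2_sha256${iterations}${b64(salt)}${b64(dk)}"

def verify_password(password: str, stored: str) -> bool:
    algorithm, iterations, salt, dk = stored.split("$")
    if algorithm != "pbkdf2_sha256":
        raise ValueError("unsupported algorithm")
    candidate = hashlib.pbkdf2_hmac(
        "sha256", password.encode(), base64.b64decode(salt), int(iterations)
    )
    # Constant-time comparison of the derived keys.
    return hmac.compare_digest(candidate, base64.b64decode(dk))

def needs_rehash(stored: str) -> bool:
    """True if the record was made with an older, cheaper setting."""
    algorithm, iterations, _, _ = stored.split("$")
    return algorithm != "pbkdf2_sha256" or int(iterations) < DEFAULT_ITERATIONS

record = hash_password("hunter2", iterations=10_000)
print(verify_password("hunter2", record), needs_rehash(record))
```

On a successful login where `needs_rehash` is true, you can hash the password again with the current settings and overwrite the single stored field, with no schema change and no double hashing.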


I did not realize that the hash and salt are concatenated


Cryptography is hard and best practice should be to use common libraries to help you get it right.


It's not about writing your own crypto. It's about knowing not to use the md5 library but the bcrypt library instead.


Ideally the md5 library documentation says "this is a digest function, not a secure hash"


Those words might as well be in greek.

How about:

You should use the md5 function to produce a string you can compare with the md5 function.

md5 is obsolete, does not offer much security, and should not be used in new programs except for the purposes of interoperating with old programs, and even in that case, one should weigh the risks of interoperation with the costs of replacing.

If you are looking for a suitable replacement for md5, you should instead be looking for specific use-cases, such as content verification (links...), authenticated content verification (links...), password verification (links...), random number generation (links...), shared secret encryption (links...)


In fact you only want to use it when your comparison can afford collisions (practical attacks can construct md5 collisions on demand). SHA-256 has no known collisions.


Personally, I think it's quite enough if a programmer at least knows that he doesn't know certain aspects of security — and instead of importing standard library's md5 module, spends at least 20 minutes on Google when faced with such a task.


Recruiters should be administering work sample tests tuned to the signal of what the company makes, and the company should be doing security audits at a regular interval.

It's strange that candidates would be expected to have any knowledge of security when security is very hard to get right. This is the domain of a dedicated specialist, not a generalist who's usually pressed for time and whose work usually isn't audited.

If you want to get security 90% right, you can spend a few days doing the cryptopals challenges. But you need to get security 100% right, because 90% right means your systems can be compromised. A situation like that pretty much demands security audits.

(It doesn't help that the go-to answer of "Who should we contact to get a security audit?" is "Someone that will charge half a year's salary of a decent engineer," though.)


Oh come on. Asking someone about password encryption is basic computer science and general intelligence. It's not about writing bcrypt. It's about thinking for five minutes about why we have passwords and what protection they need to be meaningful passwords.


That you would deny someone employment at your company based on their lack of bcrypt knowledge -- knowledge which can be gained in five minutes on Google, which candidates don't have access to during an interview -- is evidence of how broken our hiring processes are.

If you want secure systems, audit them.


Auditing isn't enough in my experience. Most audits just focus on paper security. They'll run Nessus and some other automated tools, do no filtering, have your developers actually check the results, and then go through an extensive box-checking exercise that is incapable of handling any app-specific context.

In the worst case, they may recommend outdated or harmful practices that will actually lower security. For example, complex password rules and rotations instead of strength checking and password managers. This leads to users just writing things down since they can't memorize ever changing passwords. Or they may force you to install antivirus on systems that don't need it, even though antivirus systems have been shown to be full of security flaws that actually weaken system security when installed into privileged or sensitive subsystems.

Paper security is distinct from genuine security. They overlap certainly, but paper security is not enough to make you secure.


I think you underestimate the difficulty of this concept. It's natural to me now, but I remember when I was first learning about this stuff, it took me months of on and off again study to understand why you do stuff, how you do stuff, and what's dumb to do, and that's just for learning the basics. If you hang out with someone who knows what they're doing, they can teach you what to do in a few pithy sentences, but the stuff about entropy and the differences between encryption, authentication, and hashing etc. takes time, as does understanding the nature of the likely attacks against these.

That is to say, you can tell someone to google and they'll find "Use bcrypt", but they'll still feel scared and confused because they don't even know if this is a quality suggestion or not, let alone what bcrypt is doing and why it's better than MD5.



That would be useful, but won't solve the problem alone. It's highly likely there were any number of engineers in these companies fuming about poor practice, but lacking management support to prioritise it.

Look at how TalkTalk ignored a security researcher's warnings a year before incurring a major attack. They were also storing in plain-text, available to support staff, and came up with comments like "We're squeaky-clean on security". This is a PR response, not one from the engineers who know what's going wrong. https://paul.reviews/value-security-avoid-talktalk/


In theory, everyone knows what's the best (or at least a good enough) way to do something like this.

In practice, there's a lot more things involved leading to stupid decisions like this. Something that was supposed to be temporary made permanent by growing technical debt, unclear responsibilities, moving priorities, etc.

Security is never any company's top priority just because it's not visible. Changing a color of a button is normally more important.


Yeah, I'm currently taking a computer security class for master students and we spend quite some time on passwords. Made me think that this should be in an obligatory course for bachelor students (or some stripped down version that teaches the basics).

It's mind-boggling how many well-known companies stored passwords badly.


As another comment mentioned I think the issue is far less that computer scientists don't know good PW practice and far greater that management doesn't care to give them time to work on it.


Official local news: This never happened.

Slightly less advertised news: This never happened, the data is from a 2011-2012 leak, and everyone from it was forced to change passwords long ago.

The "33,236 times" password is actually mildly obscene, meaning something close to "everything went to hell". Doesn't make any special sense to me.


Does the number 0211 after it hold any significance?


My guess: someone was registering thousands of bots using common swearing words + some number as their password, and this particular spam campaign happened to be particularly large.


Can't think of any. Could be November 2nd, could be February 2011, could be whatever.

It's not a reference to anything i'm aware of.


Thanks I was wondering what it meant. Lol. Guess it's a bit of a prophetic password


I'm sorry, I hate pointing anything negative out. Was it a recent change to store passwords in clear text?

It's worrying because the founders of VK started Telegram which claims to be end-to-end encrypted.


No, in 2007 they even sent your password to email immediately after creating account: https://twitter.com/dchest/status/739804779296219136

There was also "remind password" feature: https://twitter.com/extractor/status/739801634423857152

Also, they used to store MD5(password) in cookies.

Yes, these are the same people who made Telegram.


Sending your password in plain text in email doesn't mean it's stored in plain text; it could be copied from memory into the email before being discarded at the end of execution of the initial request.


If you ever find a service that is stupid enough to send password by email, but smart enough to store it hashed, please let me know.

Also, you missed a part of my comment where I said that they sent passwords by email when you clicked "I forgot password".


> Almost as surprising is the 24,309 times that ‘marina’ is found as a password here

It's a common name; apparently English speakers traditionally translated the Greek name Μαρίνα (Marina) as "Margaret."


Clear text passwords? I thought it's 2016. If you are not using bcrypt/scrypt or similar in 2016 then you should change profession.


The founder now runs a secure messaging service with roll-your-own crypto.


Yep. Pavel Durov was forced out by various Kremlin connected folk for daring to stand up to the FSB and the Kremlin. He also went public about their unlawful requests. He then fled Russia.

edit: ugly sentence structure.


Just yesterday I clicked on the "Forgot Password" link on the AutoTrader.com website. I was expecting a reset link in my email and instead they just emailed me my password in cleartext. This is a huge website! Not some small business. It completely baffles me.

I wish that there was some way to shame these companies. I've seen some websites that list some of these offenders but they don't appear to be effective enough. I want news articles written about these companies in the magazines that the CIOs care about, with their photo right next to the article.


This seems to be the most popular collection: http://plaintextoffenders.com/

There should be a warning in the browser that these websites are known to store password in plaintext. This can be achieved by an extension too.


> This can be achieved by an extension too.

In the website you linked, there is a "3rd party tools" link [1] that has a Chrome extension [2] and a Firefox addon [3] that do just that.

[1]: http://plaintextoffenders.com/tools

[2]: https://chrome.google.com/webstore/detail/plain-text-offende...

[3]: https://addons.mozilla.org/en-US/firefox/addon/plain-text-of...


Yes, it's 2016, but they say that passwords were collected in 2011-2012/2013. Also, phrasing in Russian sources suggests that passwords were not dumped from DB, but actively collected by other means.


To get close to 50% of the user base? Sounds like a very very high success rate.


Or PBKDF2, which would be my preference in any stuffy corporate environment. Nobody ever got fired for following NIST/FIPS standards.

I'd use bcrypt if I were working on my own project though. I think scrypt goes too far, I would actually be concerned about its speed and memory utilization hurting responsiveness for any sort of web server or other latency-sensitive system.
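
To put rough numbers on the scrypt concern: its main working buffer is about 128 * N * r bytes per hash, so the cost parameter translates directly into memory per concurrent login. The parameter values below are illustrative assumptions, and the timing depends entirely on the machine it runs on.

```python
import hashlib
import time

def scrypt_memory_bytes(n: int, r: int) -> int:
    """Approximate size of scrypt's main working buffer (128 * N * r)."""
    return 128 * n * r

for n in (2 ** 12, 2 ** 14):
    mib = scrypt_memory_bytes(n, r=8) / 2 ** 20
    line = f"N={n}: about {mib:.0f} MiB per hash"
    if hasattr(hashlib, "scrypt"):  # needs an OpenSSL build with scrypt
        start = time.perf_counter()
        hashlib.scrypt(b"password", salt=b"0123456789abcdef", n=n, r=8, p=1)
        line += f", {1000 * (time.perf_counter() - start):.0f} ms here"
    print(line)
```

At N=2^14 and r=8 that is about 16 MiB per hash, so a burst of a hundred simultaneous logins would want on the order of 1.6 GiB, which is the kind of trade-off being described.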


You just dropped like 80% of the workforce


I just logged into VK to change my password. It happily accepted a 30-character generated password (alphanumeric with upper/lowercase and special characters). However, I was not able to log in with it after changing the password. I tried using their password recovery tool that texts you an MFA code. It never came. Attempting to send another one gave me a "You exceeded daily attempts limit" error message. I guess tomorrow I'm going to TRY to log in just to delete the damn account altogether.



Seen that site being spammed all over Reddit. At best, grey, at worst, scummy.

Edit: LeakedSource has been wrong in the past and will be wrong again. https://news.ycombinator.com/item?id=11805691

They're leeches of the worst kind.


I didn't know that. Thanks.

I just happened to find more information on the leak on the page that I linked.


What makes "PolniyPizdec0211" such a popular password?


Probably a bot creating those users for various reasons.


sidenote: it roughly translates to "total shit"


Discussed in other thread. I assume those users were created automatically.


Looks like it's being sold for 1 BTC [1]. Screen Shot:

https://www.instagram.com/p/BGUE5H2yqa_/

[1] http://trdealmgn4uvm42g.onion/listing/3716


Judging from that 'gram shot, it looks like VK didn't bother actually deleting user profiles (à la Ashley Madison). Will be interesting to see the fallout from this, if confirmed.


OT question: VK is the "Russian Facebook". The article claims it has 280 million users, but a quick Google shows Russia has a population of 143 million.

What gives? Russian speaking countries? Multiple accounts per user?


VK is actually available in essentially all modern languages and is very popular in Belarus, Ukraine, and Kazakhstan; and sometimes used in countries with large numbers of immigrants from those four countries (Canada, Israel, etc) to check on family members or friends.


It's not used exclusively by people in Russia; people in Russian-speaking countries, Ukraine, and other Eastern European countries use it too.


What if the Kremlin asked them to not encrypt passwords, so they could be harvested?


Can anyone send a download link? I'm dying to analyze the passwords


Isn't this by the same folks who brought us Telegram?


It's crazy how passwords are stored in these sites with millions of users. Secure password storage is one of my top priorities when I am training interns or teaching someone web app dev.


There's an old saying, "the world runs on shitty code". I always think of that when I see headlines like these.


'dadada' password...



