True, but in this case if you can write an invalid hash into a database, you can likewise write a valid one, and as such this doesn't really enable anything.
The one thing this does get you is that the original password would still work (technically any password would still work) so it may make it harder to detect since the user wouldn't "suddenly be locked out"...
> PHP. This is a known weakness in PHP's bcrypt implementation. From Wikipedia, "Many implementations of bcrypt truncate the password to the first 72 bytes." I would hope that they're using a competent implementation that either supports longer passwords or throws an error if it's asked to hash a longer password.
Actually, it's a known weakness in BCRYPT. PHP did not implement bcrypt itself; it was ported in via crypt(3), meaning that ALL versions of bcrypt have this issue.
Some implementations raise an error on input longer than 72 bytes, but NONE of them accept longer passwords.
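For anyone who wants the "throw an error" behaviour regardless of which library they use, here is a minimal sketch of a pre-check. The helper name `check_bcrypt_input` is made up for illustration and is not part of any bcrypt library; the only fact it relies on is the 72-byte limit discussed above.

```python
BCRYPT_MAX_BYTES = 72  # bcrypt does not use input beyond this many bytes


def check_bcrypt_input(password: str) -> bytes:
    """Encode a password and refuse anything bcrypt could not fully use.

    Hypothetical helper for illustration; not an API of any bcrypt library.
    """
    encoded = password.encode("utf-8")
    if len(encoded) > BCRYPT_MAX_BYTES:
        raise ValueError(
            f"password is {len(encoded)} bytes; "
            f"bcrypt only uses the first {BCRYPT_MAX_BYTES}"
        )
    return encoded


# The limit is counted in bytes, not characters: 40 two-byte characters
# (U+00E9 encodes to 2 bytes in UTF-8) are already over the limit.
print(len(("\u00e9" * 40).encode("utf-8")))
```

Note that the check runs on the encoded bytes, so a password that looks short on screen can still be rejected.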
> I don't think we know enough to conclude that they were definitely doing it wrong, but it would be nice to know more details about the algorithm, though.
Given what has been shared so far, there are enough signs that the chances are pretty high they did something wrong. A 40-byte salt? Bcrypt only supports a 128-bit salt. So either they did something silly and custom (at which point it's no longer bcrypt), they aren't actually using bcrypt, or they did something silly like concatenating the salt + pepper + password and passing the result as the password.
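To make the salt point concrete, here is a rough sketch that splits a bcrypt hash string into its fields. The function name `split_bcrypt_hash` and the example value are invented for illustration (the example is synthetic, not a real hash); the layout itself is the usual `$version$cost$` prefix followed by 22 salt characters and 31 digest characters.

```python
import re

# Layout: $<version>$<cost>$<22-char salt><31-char digest>
BCRYPT_RE = re.compile(
    r"^\$(2[abxy]?)\$(\d{2})\$([./A-Za-z0-9]{22})([./A-Za-z0-9]{31})$"
)


def split_bcrypt_hash(hash_string: str) -> dict:
    """Split a bcrypt hash string into its fields (illustrative helper)."""
    match = BCRYPT_RE.match(hash_string)
    if match is None:
        raise ValueError("not a bcrypt-formatted hash string")
    version, cost, salt, digest = match.groups()
    return {"version": version, "cost": int(cost), "salt": salt, "digest": digest}


# Synthetic value, used only to show the field widths.
example = "$2y$10$" + "A" * 22 + "B" * 31
fields = split_bcrypt_hash(example)

# 22 characters at 6 bits each are enough for the 128-bit (16-byte) salt.
# A 40-byte salt would be 320 bits and has nowhere to go in this format.
print(len(example), len(fields["salt"]), 40 * 8)
```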
> I would recommend that the author read up on NFAs and DFAs -- they are a formalism better suited to lexers than tries.
Author here. The actual regex implementation uses an NFA. It started out using a trie, but moved away from it.
The majority of what I wanted to get across here was the use of a minimal structure (single-character).
The next step was using a maximal radix implementation (as long a prefix as possible). Then finally, throwing all of it away and going straight to parsing using a state machine.
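For readers who haven't seen the first step, a single-character trie used for longest-match token lookup could look roughly like this. This is a generic sketch in Python, not the code from the post; the names `TrieNode`, `build_trie` and `longest_match` and the token table are assumptions made for the example.

```python
class TrieNode:
    """One node per character; `token` is set where a literal ends."""

    def __init__(self):
        self.children = {}
        self.token = None


def build_trie(tokens):
    """Build a single-character trie from a {literal: token_name} mapping."""
    root = TrieNode()
    for literal, name in tokens.items():
        node = root
        for ch in literal:
            node = node.children.setdefault(ch, TrieNode())
        node.token = name
    return root


def longest_match(root, text, start=0):
    """Return (token_name, end_index) for the longest literal at `start`, or None."""
    node = root
    best = None
    for i in range(start, len(text)):
        node = node.children.get(text[i])
        if node is None:
            break
        if node.token is not None:
            best = (node.token, i + 1)
    return best


# Hypothetical token table for illustration.
trie = build_trie({"=": "ASSIGN", "==": "EQ", "===": "IDENTICAL"})
print(longest_match(trie, "a === b", start=2))
```

A radix version would merge chains of single-child nodes into one edge, and a state machine drops the tree entirely in favour of explicit transitions.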
As an attacker, I get SQL access to your DB (meaning no access to the encryption key). I then download the user names, and the hashes. I then attack the hashes offline. I recover only the weakest few percent (since you're using bcrypt). But since the weakest few are those most likely to be re-used (both by different users and by a single user across sites), they are going to be both more valuable to me and easier for the next steps:
Then, I take the highest frequency passwords and the user table, and I start validating them online in your system. Now if I do that too quickly, you'll notice and I'll be shut down. And if I do that all from the same IP, I'll be shut down.
But what if I had a botnet that I could distribute the load across? What if I kept my request rate small enough to stay under the radar of even a moderate-scale system?
I would expect to start seeing meaningful results within days.
If you had 1000 users, then I could surmise that you don't have much traffic, and hence keep the request rate down to perhaps 100 per day. In 10 days I'd have at least a few u/p combinations that I know for a fact worked.
If you had 1000000 users, I could ramp it up quite a bit higher, to perhaps 1000 or 10000 per day.
And since they all came from separate IP addresses, it could be rather difficult for you to tell an attack was going on unless you were looking specifically for it.
Does that mean you should stop immediately? No. It's not that bad of a scheme. But be aware that it doesn't give you (or your users) the level of protection that it may look like on the surface.
So after a breach, you have our (currently) 2 million hashes, and let's say you recover only the weakest few percent of the passwords, which is 60000 known good passwords. Instead of owning 60000 accounts now, you have 60000 passwords, each of which is going to require on average one million attempts before you guess the correct username. Is this not self-evidently better?
The #1 password out of 3.3 million was 123456, which was used 20,000 times.
So extrapolating that for your 2 million hashes, we'd expect the top password to appear roughly 12,000 times.
Running those numbers, we'd expect each guess to have a 12000/2000000 (about 1 in 167) chance of matching. Or more specifically, a 1988000/2000000 chance of not matching.
Compounding that over repeated guesses, we'd expect a 50% chance of finding a match after trying just 115 or so random usernames.
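If you want to check the arithmetic, here is a quick sketch. It assumes each guessed username is an independent draw (sampling with replacement), which is close enough at this scale; the variable names are mine.

```python
import math

total_hashes = 2_000_000

# Share of the top password in the 3.3 million sample, scaled to 2 million.
expected_top = total_hashes * 20_000 / 3_300_000  # roughly 12,000

# Using the rounded figure of 12,000 accounts sharing the top password.
p_match = 12_000 / total_hashes  # chance a random username matches
p_miss = 1 - p_match             # 1988000/2000000

# Solve p_miss ** n == 0.5 for n: guesses needed for a 50% chance of a hit.
tries_for_half = math.log(0.5) / math.log(p_miss)

print(round(expected_top), p_match, tries_for_half)
```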
I'm not saying it isn't an interesting approach, I just don't think it's nearly as effective as if you encrypt the hash directly (which has no attack vector unless you can get the key).
> A properly implemented, simple pepper can only help password security and can't hurt it.
Well, yes. But what is the definition of "properly"? There are definitely constructions of "pepper" that look simple, but drastically hurt overall security:
It's sort of like the difference between birth control and counting-based contraceptive methods (Standard Days Method). Executed perfectly, they are equally effective. But with a slight error, one stays roughly as effective (losing maybe 5 to 10% effectiveness overall) while the other drops drastically (down to 10 to 20% effectiveness).
Considering that encryption is as effective as a pepper, and is less prone to weakening the core password hash, I suggest using encryption instead of peppers.
I am well aware of that misuse, as I've exploited it during a CTF before. :)
I would consider using the raw byte-output version of a function a very blatant example of "improper implementation".
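For anyone following along, the raw-output problem can be sketched without a bcrypt library at all. The assumption here is a bcrypt binding that treats its input as a NUL-terminated C string, as crypt(3)-style interfaces do; `as_c_string` simulates that, and the pepper value and helper names are made up for the example.

```python
import hashlib
import hmac

PEPPER = b"hypothetical-application-secret"  # made-up value for illustration


def peppered_raw(password: bytes) -> bytes:
    """HMAC-SHA256 as raw bytes: may contain NUL bytes anywhere."""
    return hmac.new(PEPPER, password, hashlib.sha256).digest()


def peppered_hex(password: bytes) -> bytes:
    """HMAC-SHA256 as hex text: 64 ASCII characters, never a NUL."""
    return hmac.new(PEPPER, password, hashlib.sha256).hexdigest().encode("ascii")


def as_c_string(data: bytes) -> bytes:
    """Simulate what a NUL-terminated C string interface would actually see."""
    return data.split(b"\x00", 1)[0]


def find_truncated_password(limit: int = 100_000):
    """Search for a password whose raw HMAC starts with a NUL byte."""
    for i in range(limit):
        candidate = f"password{i}".encode("ascii")
        if peppered_raw(candidate)[0] == 0:
            return candidate
    return None


weak = find_truncated_password()
# With raw output the hash function would effectively see an empty string;
# with hex output it still sees all 64 characters.
print(weak, len(as_c_string(peppered_raw(weak))), len(as_c_string(peppered_hex(weak))))
```

Roughly one password in 256 lands on a leading NUL, and every one of them would collapse to the same effective input, which is what makes the construction exploitable.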
Also, I agree regarding encryption. In my example I was actually referring to the random AES key as a pepper, even though it'd probably be better called an "application secret".
> Having the caller do it seems quite pointless when the recipient is anyway doing it.
Actually, I disagree. The caller is the only one who has semantic information about what the variable (and hence its value) means. All the callee (recipient) can do is blind cast it. The caller on the other hand can interpret it because it knows the meaning (talking about the developer, not the engine).