Using a separate hash column for querying is a nice idea. It seems to me that some t...

hangonhn · on Nov 7, 2023

> From what I understand, it is important (in CBS mode) that the combination of key + IV + first block of plaintext is unique.

> So, if key and IV are constant, but data is unique it is still secure. For example, social security number is unique and nine digits, which means it fits into 128-bit block. Using constant IV to encrypt SSN should be secure, right?

I think you meant CBC mode? Yes, a unique combination of IV + key + first block gives a unique ciphertext. But if you make the IV constant, then anyone whose first plaintext block matches someone else's will get the same first ciphertext block, because all three inputs are now the same.
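To spell out that step: in CBC, each plaintext block is XORed with the previous ciphertext block before encryption, and the IV plays the role of the block before the first. So with key $K$ and the IV both fixed, the first ciphertext block depends on the first plaintext block alone:

```latex
C_1 = E_K(P_1 \oplus \mathrm{IV})

P_1 = P_1' \;\Longrightarrow\; C_1 = E_K(P_1 \oplus \mathrm{IV}) = E_K(P_1' \oplus \mathrm{IV}) = C_1'
```

Since $E_K$ is a permutation and XOR with a fixed IV is a bijection, the converse also holds: equal $C_1$ means equal $P_1$. A one-block value like an SSN is therefore encrypted deterministically, and equality between rows is exposed exactly.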

SSNs are unique(ish?), but the value space is very small (at most 10^9 nine-digit values). Do you allow your users to enter their own data? If so, an attacker can create a whole bunch of accounts that enumerate some range of SSNs and then look for anyone else with the same ciphertext in that field.
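A minimal sketch of that enumeration attack, under two assumptions: the attacker can submit arbitrary SSNs through sign-up, and can see the stored ciphertext column (e.g. through a leaked dump). For a one-block value, AES-CBC with a constant key and IV acts as a deterministic function of the plaintext, so HMAC-SHA256 from Python's standard library stands in for it here purely to model that determinism; it is not an encryption scheme. `SERVER_KEY`, `deterministic_encrypt`, and the sample rows are hypothetical.

```python
import hashlib
import hmac

SERVER_KEY = b"hypothetical-server-key"


def deterministic_encrypt(ssn: str) -> bytes:
    # Stand-in for one block of AES-CBC with a constant key and IV: the output
    # depends only on the input. HMAC is NOT encryption; only determinism matters.
    return hmac.new(SERVER_KEY, ssn.encode(), hashlib.sha256).digest()[:16]


# Hypothetical stored column: each victim's "encrypted" SSN.
victims = {
    "alice": deterministic_encrypt("123456789"),
    "bob": deterministic_encrypt("987654321"),
}


def enumerate_range(start: int, stop: int) -> dict:
    # One fake sign-up per candidate SSN; the attacker records the ciphertext
    # each one produces, building a ciphertext -> SSN lookup table.
    table = {}
    for n in range(start, stop):
        candidate = f"{n:09d}"
        table[deterministic_encrypt(candidate)] = candidate
    return table


def recover(victims: dict, table: dict) -> dict:
    # Any victim whose ciphertext appears in the table has their SSN revealed.
    return {user: table[ct] for user, ct in victims.items() if ct in table}


table = enumerate_range(123_456_000, 123_457_000)
print(recover(victims, table))  # {'alice': '123456789'}
```

One table built from the attacker's own accounts is matched against every row at once, which is why a small value space and deterministic ciphertext are a bad combination.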

zigzag312 · on Nov 7, 2023

Yes, CBC mode. Thanks, I've fixed it now.

The small range of unique values is a good argument. But wouldn't the hash field (used for lookups) expose the same information?

Are there any good online resources focused on field-level encryption techniques for databases?

hangonhn · on Nov 7, 2023

> But wouldn't hash field (used for lookup) expose the same information?

Yes, absolutely, which is why you use only a small substring of the value as the input to the hash function.

For example, for the SSN field you could hash just the first 2 digits. A digest match then only narrows down the candidates; the full values themselves may still differ. Given its small range, though, SSN is hard to secure this way. You might instead use something like the email address plus the SSN as the input to the digest function.
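A minimal sketch of the prefix idea, assuming an in-memory list in place of a real table. The comment only says "hash function"; using HMAC-SHA256 with a separate key is an assumption here, so the digests can't be recomputed without that key. `INDEX_KEY`, `prefix_index`, `candidates_for`, `find_by_ssn`, and the rows are hypothetical, and the plain `ssn` field stands in for the value you would get after decrypting the encrypted column.

```python
import hashlib
import hmac

# Hypothetical key for the lookup column, kept separate from the encryption key.
INDEX_KEY = b"hypothetical-index-key"


def prefix_index(ssn: str, prefix_len: int = 2) -> str:
    # Keyed hash of only the first `prefix_len` digits, so many different
    # SSNs share one digest and a match is not proof of equality.
    return hmac.new(INDEX_KEY, ssn[:prefix_len].encode(), hashlib.sha256).hexdigest()


# Simulated rows: `ssn` stands in for the decrypted value; only `ssn_idx`
# would be stored in the clear and indexed by the database.
rows = [
    {"id": 1, "ssn": "123456789"},
    {"id": 2, "ssn": "129999999"},
    {"id": 3, "ssn": "987654321"},
]
for row in rows:
    row["ssn_idx"] = prefix_index(row["ssn"])


def candidates_for(ssn: str) -> list:
    # The "indexed query": every row whose prefix digest matches.
    digest = prefix_index(ssn)
    return [r for r in rows if r["ssn_idx"] == digest]


def find_by_ssn(ssn: str) -> list:
    # Decrypt (here: read) each candidate and keep only exact matches.
    return [r["id"] for r in candidates_for(ssn) if r["ssn"] == ssn]


print([r["id"] for r in candidates_for("123456789")])  # [1, 2]
print(find_by_ssn("123456789"))  # [1]
```

The prefix length is the trade-off: a shorter prefix means each digest covers more values and leaks less, but each lookup returns more candidate rows to decrypt and filter.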

I don't know of any online resources specific to this topic, but I found this book very accessible for an engineer like me:

https://www.amazon.com/gp/product/1593278268/ref=ppx_yo_dt_b...

zigzag312 · on Nov 7, 2023

Thanks, I'll take a look at it.

I find encrypting data in a DB challenging, because you often still need to query that data efficiently.