Using a separate hash column for querying is a nice idea.
It seems to me that some types of data might allow a constant IV and still be secure.
From what I understand, it is important (in CBC mode) that the combination of key + IV + first block of plaintext is unique.
So, if key and IV are constant, but data is unique it is still secure. For example, social security number is unique and nine digits, which means it fits into 128-bit block. Using constant IV to encrypt SSN should be secure, right?
An email address can also be used as a unique identifier. But an email address can exceed 16 bytes, so there is no guarantee that the first block of plaintext will be unique (two different addresses can share the same first 16 bytes). So it's not secure to encrypt an email address with a constant IV.
But if we used a 16-byte hash of a unique email as the IV, there would be a very low chance of an (IV + first block of data) collision. Probably secure enough?
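To make the idea concrete, here is a minimal sketch of deriving a 16-byte IV from an email address. The helper name `iv_from_email`, the choice of SHA-256 truncated to 16 bytes, and the example addresses are all assumptions for illustration. It only shows the derivation and does not settle whether the scheme is secure.

```python
import hashlib


def iv_from_email(email: str) -> bytes:
    """Derive a 16-byte IV from an email address (hypothetical helper)."""
    # SHA-256 yields 32 bytes; keep the first 16 to match the 128-bit block size.
    return hashlib.sha256(email.encode("utf-8")).digest()[:16]


# Two made-up addresses that share their first 16 bytes.
email_a = "john.smith.work1@example.com"
email_b = "john.smith.work1@example.org"

iv_a = iv_from_email(email_a)
iv_b = iv_from_email(email_b)

print(email_a[:16] == email_b[:16])      # same first plaintext block
print(iv_a != iv_b)                      # but different derived IVs
print(iv_from_email(email_a) == iv_a)    # same email always gives the same IV
```

Note that the derivation is deterministic: the same email always maps to the same IV, so two rows holding the same email would still encrypt identically.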
> From what I understand, it is important (in CBS mode) that the combination of key + IV + first block of plaintext is unique.
> So, if key and IV are constant, but data is unique it is still secure. For example, social security number is unique and nine digits, which means it fits into 128-bit block. Using constant IV to encrypt SSN should be secure, right?
I think you meant CBC mode? Yes, a unique combination of IV + key + first block gives a unique first ciphertext block. But if you make the IV constant, then anyone with the same first block as someone else ends up with the same first ciphertext block, because all three inputs are now the same.
An SSN is unique(ish?), but the range is very small. Do you allow your users to input their own data? If so, they can create a whole bunch of accounts that enumerate some range of SSNs and look for anyone with the same ciphertext for that field.
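A small sketch of that effect follows. Python's standard library has no AES, so this uses a toy 4-round Feistel cipher built on HMAC-SHA256 as a stand-in for the block cipher. That cipher, the key, and the sample values are assumptions for illustration only and must not be used for real encryption. The CBC chaining itself is the standard construction.

```python
import hashlib
import hmac
import os

BLOCK = 16  # bytes, the same block size as AES


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def toy_encrypt_block(key: bytes, block: bytes) -> bytes:
    """Toy Feistel permutation on 16-byte blocks (stand-in for AES, not secure)."""
    left, right = block[:8], block[8:]
    for i in range(4):
        f = hmac.new(key, bytes([i]) + right, hashlib.sha256).digest()[:8]
        left, right = right, xor(left, f)
    return left + right


def pad(data: bytes) -> bytes:
    """PKCS#7-style padding up to a multiple of the block size."""
    n = BLOCK - len(data) % BLOCK
    return data + bytes([n]) * n


def cbc_encrypt(key: bytes, iv: bytes, plaintext: bytes) -> bytes:
    """CBC mode: XOR each block with the previous ciphertext block, then encrypt."""
    out, prev = [], iv
    data = pad(plaintext)
    for off in range(0, len(data), BLOCK):
        prev = toy_encrypt_block(key, xor(data[off:off + BLOCK], prev))
        out.append(prev)
    return b"".join(out)


key = b"k" * 32          # made-up key
fixed_iv = bytes(BLOCK)  # constant all-zero IV

# Same SSN, same key, same IV: identical ciphertext.
ct1 = cbc_encrypt(key, fixed_iv, b"000001234")
ct2 = cbc_encrypt(key, fixed_iv, b"000001234")
print(ct1 == ct2)

# Emails sharing their first 16 bytes: identical first ciphertext block.
ea = cbc_encrypt(key, fixed_iv, b"john.smith.work1@example.com")
eb = cbc_encrypt(key, fixed_iv, b"john.smith.work1@example.org")
print(ea[:BLOCK] == eb[:BLOCK], ea[BLOCK:] == eb[BLOCK:])

# A fresh random IV per record breaks the equality.
r1 = cbc_encrypt(key, os.urandom(BLOCK), b"000001234")
r2 = cbc_encrypt(key, os.urandom(BLOCK), b"000001234")
print(r1 == r2)
```

With the constant IV, equal plaintexts (or equal first blocks) are visible directly in the stored ciphertext. With a random IV per record they are not.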
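The enumeration can be sketched roughly as below. Only determinism matters for the attack, so `register_account` uses a truncated HMAC as a stand-in for "encrypt with a fixed key and fixed IV". That stand-in, the function names, the made-up SSNs (area `000` is never issued), and the assumption that the attacker can read the stored ciphertext column are all hypothetical.

```python
import hashlib
import hmac

_SERVER_KEY = b"hypothetical server-side key"


def register_account(ssn: str) -> bytes:
    """Stand-in for the service: stores a deterministic ciphertext of the SSN.

    Models encryption with a constant key and constant IV, where the same
    SSN always produces the same stored bytes. The attacker never sees the key.
    """
    return hmac.new(_SERVER_KEY, ssn.encode("ascii"), hashlib.sha256).digest()[:16]


# Ciphertext column as the attacker sees it (assumed readable, e.g. after a leak).
stored = {
    "alice": register_account("000001234"),
    "bob": register_account("000001999"),
    "carol": register_account("000987654"),
}


def enumerate_ssns(stored_column: dict, candidates: list) -> dict:
    """Register one account per candidate SSN and match the resulting ciphertexts."""
    table = {register_account(c): c for c in candidates}
    return {user: table[ct] for user, ct in stored_column.items() if ct in table}


# The attacker tries a small slice of the nine-digit range.
candidates = [f"{n:09d}" for n in range(2000)]
recovered = enumerate_ssns(stored, candidates)
print(recovered)
```

Only users whose SSN falls inside the enumerated slice are recovered, but nothing beyond account-creation limits stops the attacker from widening the slice.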
> But wouldn't hash field (used for lookup) expose the same information?
Yes, absolutely, which is why you pick a small substring as the input to the hash function.
For example, for the SSN field, you can use the first 2 digits as the input. Even if you get a digest match, the fields themselves may not actually be the exact same value. The SSN, however, is kind of hard to secure given its small range. You might be able to use something like the email address plus the SSN as the input to the digest function.
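A minimal sketch of both variants, assuming a keyed digest (HMAC-SHA256) in place of a bare hash. The HMAC choice, the key, the function names, and the sample values are illustrative assumptions rather than anything prescribed above.

```python
import hashlib
import hmac

INDEX_KEY = b"hypothetical lookup-column key"


def ssn_lookup_digest(ssn: str, prefix_len: int = 2) -> str:
    """Digest of only the first `prefix_len` digits, stored in the lookup column."""
    prefix = ssn[:prefix_len]
    return hmac.new(INDEX_KEY, prefix.encode("ascii"), hashlib.sha256).hexdigest()


def email_ssn_lookup_digest(email: str, ssn: str) -> str:
    """Alternative: digest over the email address plus the SSN."""
    material = email.lower() + "|" + ssn
    return hmac.new(INDEX_KEY, material.encode("utf-8"), hashlib.sha256).hexdigest()


# Made-up SSNs: the first two share a two-digit prefix, the third does not.
d1 = ssn_lookup_digest("000001234")
d2 = ssn_lookup_digest("000998877")
d3 = ssn_lookup_digest("120001234")

print(d1 == d2)  # digest match, yet the underlying SSNs differ
print(d1 == d3)  # different prefix, different digest

# Same SSN under different emails gives different combined digests.
c1 = email_ssn_lookup_digest("alice@example.com", "000001234")
c2 = email_ssn_lookup_digest("bob@example.com", "000001234")
print(c1 == c2)
```

A query would filter rows by the digest first, then decrypt the candidate rows and compare the full value, since a digest match on the prefix alone does not prove the fields are equal.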
I don't have any online resources specific to this topic but I did find this book to be very accessible for someone like me who's an engineer: