I'm redoing the naming convention for our S3 keys. We have millions of objects to store on S3. I'm planning to put them all into one bucket and use the MD5 or SHA-1 digest of each file as its key. I'll keep this in sync with a database table that maps an auto-increment GUID (globally unique ID) to the digest of the file.
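As a rough sketch of the digest-as-key idea (not a finished implementation), the key could be derived from the file contents like this. The function name `digest_key` and the in-memory `key_table` dict standing in for the database table are assumptions made for illustration only.

```python
import hashlib


def digest_key(data: bytes, algorithm: str = "sha1") -> str:
    """Return the hex digest of the file contents, to be used as the S3 key."""
    return hashlib.new(algorithm, data).hexdigest()


# Hypothetical stand-in for the database table: auto-increment id -> digest.
key_table = {}
key_table[1] = digest_key(b"contents of the first file")
key_table[2] = digest_key(b"contents of the second file")
```

One side effect of this scheme is that two files with identical contents get the same key, so the mapping table ends up with two ids pointing at one S3 object.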
Then I read this:
http://paltman.com/2007/05/29/amazon-s3-and-filename-magic
but I don't really understand the advantage of storing an object on S3 whose key is the hash and which points to the GUID.
One thing I don't want to do is store the GUID as the S3 key (to prevent massive scraping of all the assets).
How are you all dealing with this? Is anyone using an MD5 or SHA-1 digest as the key? A salted hash of the GUID as the key?
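For the salted-hash option, one possible sketch is a keyed hash (HMAC) of the GUID, which is a common way to mix a secret into a digest. This swaps plain "salt + hash" for HMAC; the names `SECRET` and `key_for_id` are hypothetical, and the secret value shown is only a placeholder.

```python
import hashlib
import hmac

# Hypothetical placeholder; a real secret would be kept out of source control.
SECRET = b"replace-with-a-real-secret"


def key_for_id(object_id: int, secret: bytes = SECRET) -> str:
    """Derive an S3 key from the database id using HMAC-SHA1.

    Without the secret, the key for a given id cannot be recomputed,
    so sequential ids cannot simply be enumerated to scrape the bucket.
    """
    message = str(object_id).encode("ascii")
    return hmac.new(secret, message, hashlib.sha1).hexdigest()
```

With this approach the key can always be recomputed from the id and the secret, so the database would not strictly need to store the digest at all.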
If I were going to use them for hosting my videos or something, I'd probably just use the GUID without the need for the MD5. Just make sure the database schema leaves no chance of duplicates and it should work.
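One way to rule out duplicates at the schema level is a unique constraint on the key column. The sketch below uses SQLite purely for illustration; the table name `assets`, the column `s3_key`, and the helper `register` are assumptions, not anything from an actual setup.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE assets ("
    " id INTEGER PRIMARY KEY AUTOINCREMENT,"
    " s3_key TEXT NOT NULL UNIQUE)"
)


def register(conn: sqlite3.Connection, s3_key: str) -> bool:
    """Insert a key; return True if stored, False if it already exists."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO assets (s3_key) VALUES (?)", (s3_key,))
        return True
    except sqlite3.IntegrityError:
        # The UNIQUE constraint rejected a duplicate key.
        return False
```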
I use Bacula for network backups and it does the same thing. All paths and filenames are stored as unique ID numbers.