So no primary key or uniqueness constraints?

Macha · on May 25, 2023

Unlikely, given Reddit's past schema design. One table of "things", and then another table of attributes of those things in an entity, key, value format.

https://kevin.burke.dev/kevin/reddits-database-has-two-table...
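
For reference, here is a minimal sketch of that two-table layout using Python's built-in `sqlite3`. The table and column names (`thing`, `thing_data`, `thing_id`, `key`, `value`) are hypothetical, not Reddit's actual schema. The composite primary key is an assumption added for illustration; whether the real tables had one is exactly the open question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE thing (
        id   INTEGER PRIMARY KEY,
        kind TEXT NOT NULL            -- e.g. 'user', 'link', 'comment'
    );
    CREATE TABLE thing_data (
        thing_id INTEGER NOT NULL REFERENCES thing(id),
        key      TEXT NOT NULL,
        value    TEXT,
        PRIMARY KEY (thing_id, key)   -- assumed: one value per attribute per thing
    );
""")

conn.execute("INSERT INTO thing (id, kind) VALUES (1, 'user')")
conn.executemany(
    "INSERT INTO thing_data (thing_id, key, value) VALUES (?, ?, ?)",
    [(1, "name", "alice"), (1, "karma", "42")],
)

def load_thing(conn, thing_id):
    """Reassemble one thing's attributes into a dict."""
    rows = conn.execute(
        "SELECT key, value FROM thing_data WHERE thing_id = ?", (thing_id,)
    )
    return dict(rows.fetchall())

user = load_thing(conn, 1)
```

Every value comes back as text here, so typing and validation move into application code, which is part of the trade-off of this design.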

time0ut · on May 26, 2023

I built something inspired by this very post in 2013/2014. Not sure how the scale compares, but we insert ~10 million “things” per day, each with an average of 30 data attributes, and keep a 30-day window. It definitely uses primary and foreign keys. It took some time to tune. Had to support an additional access pattern using a non-unique index. Had to work with a DBA to get partitioning right to handle the large write volume and manage expiration efficiently. It worked out great and is still chugging along. It does suck not having all the tools of an RDBMS at your disposal, but it was a good trade-off.

bcrosby95 · on May 26, 2023

That doesn't mean those tables didn't have primary or unique keys.

According to that post, in 2010, they had about 10 million users. At a conservative 10 fields per user, you're looking at 100 million records.

I'm a bit skeptical that they table-scanned 100 million records every time they wanted to access a user's piece of data back in 2010.
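
A rough back-of-the-envelope sketch of why, in Python. The user and field counts are the figures quoted above; the B-tree fanout of 100 keys per page is an assumed round number for illustration, not a measured value from any particular engine.

```python
users = 10_000_000
fields_per_user = 10
records = users * fields_per_user   # rows in the attribute table

def btree_levels(n_records, fanout):
    """Count the levels a tree with the given fanout needs to address n_records."""
    levels, capacity = 0, 1
    while capacity < n_records:
        capacity *= fanout
        levels += 1
    return levels

fanout = 100                        # assumed keys per index page
levels = btree_levels(records, fanout)

print(f"full scan touches up to {records:,} rows")
print(f"index lookup descends about {levels} levels")
```

Under those assumptions a lookup by key reads a handful of pages, while a scan reads the whole table.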

ComodoHacker · on May 25, 2023

Modern DB engines can enforce Unique or PK constraints without indexes. Yes, they perform scans.

tomnipotent · on May 25, 2023

Every major RDBMS requires an index for both constraints.
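
One concrete data point that is easy to check locally is SQLite, via Python's built-in `sqlite3`. The `account` table below is a hypothetical example. Other engines differ in the details, so treat this as an illustration rather than proof for every RDBMS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE account (
        handle TEXT PRIMARY KEY,
        email  TEXT UNIQUE
    )
""")

# PRAGMA index_list rows are (seq, name, unique, origin, partial).
# origin is 'pk' for a primary key, 'u' for a UNIQUE constraint,
# and 'c' for an explicit CREATE INDEX.
backing = [
    (row[1], row[3])
    for row in conn.execute("PRAGMA index_list('account')")
]

for name, origin in backing:
    print(name, origin)
```

No `CREATE INDEX` was issued, yet both constraints show up with an automatically created index behind them.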

hinkley · on May 25, 2023

FK constraints however are a pretty common gotcha. You have a table that allows deletes, and every table that has a column pointing to that table gets scanned on each delete operation.

So you have to add an index on that column, or start talking about tombstoning data instead of deleting it, in which case you may still need the FK index to search out references to an old row to replace with a new one.
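
A small sketch of the gotcha with SQLite through Python's `sqlite3`. Deleting a parent row means the engine has to find any rows that reference it, which is roughly the lookup below. The `parent` and `child` tables and the index name are hypothetical, and the query stands in for the engine's internal check rather than showing its actual implementation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE parent (id INTEGER PRIMARY KEY);
    CREATE TABLE child (
        id        INTEGER PRIMARY KEY,
        parent_id INTEGER REFERENCES parent(id)
    );
""")

def referencing_lookup_plan(conn):
    """Plan for finding child rows that point at one parent row."""
    rows = conn.execute(
        "EXPLAIN QUERY PLAN SELECT 1 FROM child WHERE parent_id = 1"
    ).fetchall()
    return " | ".join(row[3] for row in rows)   # column 3 is the detail text

plan_before = referencing_lookup_plan(conn)     # no index on child.parent_id yet
conn.execute("CREATE INDEX idx_child_parent_id ON child(parent_id)")
plan_after = referencing_lookup_plan(conn)

print("before:", plan_before)
print("after: ", plan_after)
```

Before the index the plan is a scan of `child`; afterwards it is a search using `idx_child_parent_id`. The exact wording of the plan text varies between SQLite versions.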

tomnipotent · on May 25, 2023

An FK also adds the burden of keeping a copy of each related row during the lifespan of a transaction. This means small-but-frequently-updated tables that are joined to a lot can be a source of unexpected headaches.

ComodoHacker · on May 26, 2023

After checking the docs, you are right, I stand corrected. I was pretty sure Oracle and DB2 don't.