
Why is that worse than a couple of dozen joins?


Because that means your data is highly denormalized and has plenty of duplicates. But in all likelihood it means no one knows wtf this table actually represents and you should be firing people.

I've seen this play out. Usually the many columns are there because everyone misuses the table, and eventually their special little business scenario or "filter" needs to be a column. Bonus points: whoever has to reference this table has to copy over whatever the hell your PK seems to be, and the cycle repeats, a bit worse this time.
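
To make "copy over whatever your PK seems to be" concrete, here is a minimal sketch assuming a hypothetical four-column natural key (`region`, `branch`, `product`, `account_no`; none of these names come from the thread). Every table that references the previous one has to carry its whole key and then adds a column of its own, so the key gets wider with each level. It uses Python's built-in `sqlite3` so it runs as-is.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
-- Hypothetical table with a wide natural key.
CREATE TABLE account (
    region     TEXT NOT NULL,
    branch     TEXT NOT NULL,
    product    TEXT NOT NULL,
    account_no TEXT NOT NULL,
    PRIMARY KEY (region, branch, product, account_no)
);
-- Anything referencing account must carry all four key columns...
CREATE TABLE statement (
    region     TEXT NOT NULL,
    branch     TEXT NOT NULL,
    product    TEXT NOT NULL,
    account_no TEXT NOT NULL,
    period     TEXT NOT NULL,
    PRIMARY KEY (region, branch, product, account_no, period),
    FOREIGN KEY (region, branch, product, account_no)
        REFERENCES account (region, branch, product, account_no)
);
-- ...and anything referencing statement now needs five, plus its own.
CREATE TABLE statement_line (
    region     TEXT NOT NULL,
    branch     TEXT NOT NULL,
    product    TEXT NOT NULL,
    account_no TEXT NOT NULL,
    period     TEXT NOT NULL,
    line_no    INTEGER NOT NULL,
    PRIMARY KEY (region, branch, product, account_no, period, line_no),
    FOREIGN KEY (region, branch, product, account_no, period)
        REFERENCES statement (region, branch, product, account_no, period)
);
""")

def pk_width(table: str) -> int:
    # PRAGMA table_info returns (cid, name, type, notnull, dflt_value, pk);
    # pk is > 0 for columns that are part of the primary key.
    # Table names come from the constants below, so the f-string is safe here.
    return sum(1 for row in conn.execute(f"PRAGMA table_info({table})") if row[5] > 0)

for t in ("account", "statement", "statement_line"):
    print(t, pk_width(t))  # account 4, statement 5, statement_line 6
```

A single surrogate key (e.g. an `INTEGER PRIMARY KEY` on each table) would keep every foreign key to one column, which is one common way to break the cycle.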

The last place I did a brief project at had this. Cue 1000 tables spread across 25 schemas, each table with a wide PK and 20 redundant indexes, and despite all this the database performs poorly. No one can tell you what each table represents, the table names are meaningless, and the same data is everywhere. To get anything done you have to ask a small cabal of priests who know the processes that write between these tables. After about 10 years a partial rewrite happens, and now you have a third of the data on each side, with plenty of duplication and overlap, because hey.

I feel torn; I really want to name and shame this company as a warning to any developer thinking about working there.


> But in all likelihood it means no one knows wtf this table actually represents and you should be firing people.

That's almost exactly the opposite of my experience, but then I've worked on smaller teams with long-term team members, so perhaps that's why.

Our tables are wide because the law requires us to store the data for 10 years, and it's easier and faster to store things in a single row rather than spread them out over several dozen detail tables.

We don't denormalize in the sense of storing only invoice items and repeating the header data on each item. We do denormalize in the sense of storing the full name and address of each party rather than keeping that in a detail table. So there's minimal, if any, duplication of data. When we do duplicate, it's almost always because of the law, i.e. data might change in one table but must not change in the other, to preserve history.
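
A minimal sketch of keeping full party data on the row itself, assuming hypothetical `customer` and `invoice` tables (the thread doesn't describe an actual schema). The invoice copies the customer's name and address at the moment it is issued, so a later change to the customer record doesn't rewrite history. Python's built-in `sqlite3` keeps it self-contained.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (
    id      INTEGER PRIMARY KEY,
    name    TEXT NOT NULL,
    address TEXT NOT NULL
);
-- Hypothetical invoice table: the party's name and address are copied in
-- at issue time instead of being joined from customer later.
CREATE TABLE invoice (
    id               INTEGER PRIMARY KEY,
    customer_id      INTEGER NOT NULL REFERENCES customer (id),
    customer_name    TEXT NOT NULL,
    customer_address TEXT NOT NULL,
    issued_on        TEXT NOT NULL
);
""")

conn.execute("INSERT INTO customer (id, name, address) VALUES (1, 'Acme Ltd', 'Old Street 1')")

def issue_invoice(customer_id: int, issued_on: str) -> int:
    # Snapshot the customer's current name and address into the invoice row.
    cur = conn.execute(
        """INSERT INTO invoice (customer_id, customer_name, customer_address, issued_on)
           SELECT id, name, address, ? FROM customer WHERE id = ?""",
        (issued_on, customer_id),
    )
    return cur.lastrowid

inv = issue_invoice(1, "2015-03-01")

# The customer moves, and the master record is updated in place...
conn.execute("UPDATE customer SET address = 'New Road 9' WHERE id = 1")

# ...but the stored invoice still shows the address it was issued to.
print(conn.execute("SELECT customer_address FROM invoice WHERE id = ?", (inv,)).fetchone()[0])
# Old Street 1
```

The trade-off is deliberate: the customer table holds the current state, while each invoice row is an immutable record of what was true when it was issued.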

For child tables we always have a unique ID, sequence or autoinc, and the FK to the parent, but we include the root-level ID as well. This way we can select all the child rows in a single select regardless of level, again without tons of joins.
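
Here is a minimal sketch of the root-level ID idea, assuming a hypothetical three-level hierarchy: `invoice` (root), `invoice_item` (child) and `item_discount` (grandchild). Each grandchild row has its own ID, an FK to its parent item, and the root `invoice_id`, so all of an invoice's discounts come back from one select on one table. The join version is included only for comparison.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE invoice (
    id INTEGER PRIMARY KEY
);
CREATE TABLE invoice_item (
    id         INTEGER PRIMARY KEY,                            -- own unique ID
    invoice_id INTEGER NOT NULL REFERENCES invoice (id),       -- parent, which is also the root
    descr      TEXT NOT NULL
);
CREATE TABLE item_discount (
    id         INTEGER PRIMARY KEY,                            -- own unique ID
    item_id    INTEGER NOT NULL REFERENCES invoice_item (id),  -- FK to parent
    invoice_id INTEGER NOT NULL REFERENCES invoice (id),       -- root-level ID
    amount     REAL NOT NULL
);
-- Index the root column so "everything under invoice X" is a cheap lookup.
CREATE INDEX item_discount_root ON item_discount (invoice_id);
""")

conn.executemany("INSERT INTO invoice (id) VALUES (?)", [(1,), (2,)])
conn.executemany(
    "INSERT INTO invoice_item (id, invoice_id, descr) VALUES (?, ?, ?)",
    [(10, 1, "widget"), (11, 1, "gadget"), (20, 2, "gizmo")],
)
conn.executemany(
    "INSERT INTO item_discount (item_id, invoice_id, amount) VALUES (?, ?, ?)",
    [(10, 1, 5.0), (11, 1, 2.5), (20, 2, 1.0)],
)

# Without the root column you would walk the hierarchy with a join.
with_join = conn.execute(
    """SELECT d.id, d.amount FROM item_discount d
       JOIN invoice_item i ON i.id = d.item_id
       WHERE i.invoice_id = ? ORDER BY d.id""",
    (1,),
).fetchall()

# With the root column: one select on the grandchild table, no joins.
direct = conn.execute(
    "SELECT id, amount FROM item_discount WHERE invoice_id = ? ORDER BY id",
    (1,),
).fetchall()

print(direct)  # [(1, 5.0), (2, 2.5)]
```

Nothing in this sketch enforces that a discount's `invoice_id` matches its parent item's. A composite foreign key `(item_id, invoice_id)` against a `UNIQUE (id, invoice_id)` constraint on `invoice_item` is one way to have the database check it.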

This way our core data is just a handful of tables, and there's not much doubt about what goes where.


I feel you... I mean, "...you should be firing people" is my day-to-day way of thinking.

My thought is that hyper-specialization and the grind breed this type of data structure. But so many companies are forced to choose, and they generally sacrifice the database side of things. Then you end up with this kind of unruly structure.

Database theory and practice should be a MUST in all software development curricula.



