Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

Based on my understanding of DB storage (which is decades old and as I write the comment my explanation seems stupid but it’s what I was taught. Please someone correct me).

Image a table with the following:

id (int), title (varchar), created_date (date)

That data is stored in a similar order on disk (is this still true?).

So, if the initial insert has a title of 10 characters, then the date will get placed after those 10 characters on disk:

Int, 10 characters, date

Later, the row is updated with a longer title of 500 characters.

Because it’s longer, you now have to get more space for the title on another part of the disk.

Since there are columns after the title, it’ll either leave dead space where the old 10 character title was or it’ll need to move those other columns too.

I can’t remember which I was taught. I just remember that columns after variable columns get impacted when the data in a preceding variable column changes.

If the title were at the end of the row, the db could expand the data without needing to move other columns over (if there happened to be available space right there).

If you were updating with a shorter value, it could shorten the row by moving the terminating character to the new shorter location, freeing up the space the longer title was using.

Bottom line is you want to keep variable columns which get updated most frequently towards the end of your table structure.

Knowing which column that would be isn’t always clear when creating the table.

Again, this understanding was taught to me over 20 years ago. I’m probably remembering parts of it wrong and DB storage has likely (hopefully) advanced since then.



> That data is stored in a similar order on disk (is this still true?).

There should be no requirement for this. Columns in relations are not conceptually ordered, so it shouldn't matter for the things you're doing with the data anyway, and the database should be able to reorder the data in whatever way it likes, since desire to isolate the user from physical data structures was one of the main reasons for the rise of RDBMS.


1. That’s not exactly true, AFAIK all RDBMS return columns in table order when order is unspecified (`*`), and while they could reorder on retrieval

2. postgres definitely does not, and column tetris is absolutely a thing in the same way struct packing is (with the additional complexity of variable-size columns)


But the order of attributes in the presentation of the result relation has nothing to do with physical layout of base relvars -- primarily because any relationship between the two is purely coincidental. The vast majority of useful queries will not reuse the order of attributes in base relvars, so optimizing for the unusual trivial case by prohibiting better rearrangements that could be useful for a much larger number of use cases seems rather pointless.

And what PostgreSQL does is of course an implementation detail of PostgreSQL.


> the database should be able to reorder the data in whatever way it likes

My point was that you’re trying to prevent the DB from reordering the data because there’s a performance cost when that happens.


Why would you be trying to prevent the DB from reordering the data? You're not supposed to have better knowledge of what's good for your use case than an RDBMS that can collect usage statistics on queries and such. Ditto for compilers rearranging structures and such. When you start having hundreds of tables and thousands of queries, I don't see how you can do a better job than an automated system at that point.


> Why would you be trying to prevent the DB from reordering the data?

Sure. “Reduce the need” would be a better word than “prevent”.

If I can do a good job organizing the table columns (as described above) it’ll lower the need for the DB to reorder data.

Reduced need to reorder, improves performance.


DB storage is a lot more sophisticated than what you were taught.

Most databases use Slotted Pages to organize storage. Pages are fixed size and numbered by their offset within the database file. The page header contains the number of rows, followed by an array of offsets for individual rows within the page. Rows themselves generally are stored at the end of the page filling downwards. The storage engine can move around rows in arbitrary ways to consolidate free space.

Fundamentally there's no connection between SQL schema order and how table storage is organized on disk. For example in a column store there's often no contiguous row stored anywhere, instead there's just separate indexes per column.


That would seem like an argument in favor of allowing column reordering.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: