
I’m not familiar with the VACUUM changes, but the situation you’re describing suggests something is wrong with the table definition or database configuration.


How do you figure?

Postgres docs are quite clear. Table space is not reclaimed without a VACUUM FULL. So delete a column in a big table? You are storing that data forever.


Not quite true. Regular VACUUM marks dead tuples as available for re-use, so the system can overwrite the dead tuples with fresh ones after a VACUUM. VACUUM FULL completely rewrites and repacks the table.

https://www.postgresql.org/docs/current/sql-vacuum.html
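
A minimal sketch of the difference (table name and data are made up for illustration):

    -- hypothetical table, just to show the effect
    CREATE TABLE t (id bigint, payload text);
    INSERT INTO t SELECT g, repeat('x', 100) FROM generate_series(1, 1000000) g;
    DELETE FROM t WHERE id <= 900000;

    SELECT pg_size_pretty(pg_table_size('t'));  -- still the full size

    VACUUM t;       -- dead tuples become reusable; the file stays the same size,
                    -- but new rows written to t can fill that freed space
    VACUUM FULL t;  -- rewrites the table into a new compact file
                    -- (takes an ACCESS EXCLUSIVE lock while it runs)

    SELECT pg_size_pretty(pg_table_size('t'));  -- now much smaller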


Postgres has a free space map, which allows it to reuse the space of deleted tuples for new or updated tuples created in the same table. If no new/updated tuples are created in the table after the DELETE, you have fragmentation, which means the file remains large so the space can't be used for other tables (or other unrelated data residing in the same filesystem).

The nature of fragmentation means that you need to move a lot of data around to actually make that file smaller. In Postgres, that's typically done with VACUUM FULL. The problem of fragmentation is not unique to Postgres, it's a fundamental issue; but perhaps other systems are able to move the data around in a less disruptive way.

If you just delete a column, that creates a different type of fragmentation within the tuples themselves (e.g. you delete the middle column, and the tuple itself doesn't shrink, it just ignores that middle column). You are right that this can be a problem. Postgres could be improved to rewrite tuples to eliminate the wasted space from deleted columns in the middle, which would probably be worth the computation if it's already performing cleanup on the page.
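
A rough sketch of that last point (table name and data are made up); DROP COLUMN is a metadata-only change, so as I understand it the old bytes stay inside each existing tuple until the rows are rewritten:

    -- hypothetical wide table with a column we later drop
    CREATE TABLE wide (id bigint, junk text);
    INSERT INTO wide SELECT g, repeat('x', 200) FROM generate_series(1, 500000) g;

    -- dropping the column is instant, but does not shrink existing rows
    ALTER TABLE wide DROP COLUMN junk;
    SELECT pg_size_pretty(pg_table_size('wide'));  -- roughly unchanged

    -- the dropped column's bytes go away only when the tuples are rewritten,
    -- e.g. by a full-table rewrite such as VACUUM FULL
    VACUUM FULL wide;
    SELECT pg_size_pretty(pg_table_size('wide'));  -- now much smaller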


...okay? What's the use case where data shrinks?

The space from the data that is "not removed" will just be reused by new data.

The only real use case is "we've loaded way too much data, removed it, and want to recover that space because we will never need it", and that is not enough to matter: a database usually has its own filesystem, and most filesystems can't be shrunk online, so any shrinking needs downtime.


> What's the use case where data shrinks?

Parent literally provided one: deleting a column.

Another is that it's very easy in PostgreSQL to bloat indexes. Load a bunch of data. Update (or delete) that data and now your index is bloated.

The only resolution is to REINDEX (or VACUUM FULL).


The nice thing about index bloat is that REINDEX has a CONCURRENTLY option, so there's no need to block writes.
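
For reference, a sketch of what that looks like (index and table names are placeholders; REINDEX ... CONCURRENTLY needs PostgreSQL 12 or newer):

    -- rebuild one bloated index without blocking writes to the table
    REINDEX INDEX CONCURRENTLY my_table_pkey;

    -- or rebuild every index on the table in one pass
    REINDEX TABLE CONCURRENTLY my_table;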


True, good point.


If rewriting the table is part of a regular workflow, then database configuration can play a significant role in its performance. However, I still contend that regularly rewriting tables is an anti-pattern.

Are you able to partition any of your tables so that you can VACUUM FULL the partitions individually?
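
Roughly what that could look like, as a sketch (table name and partition bounds are made up); each partition is its own table, so it can be rewritten or dropped on its own:

    -- hypothetical range-partitioned table
    CREATE TABLE events (
        id         bigint,
        created_at timestamptz NOT NULL
    ) PARTITION BY RANGE (created_at);

    CREATE TABLE events_2024 PARTITION OF events
        FOR VALUES FROM ('2024-01-01') TO ('2025-01-01');
    CREATE TABLE events_2025 PARTITION OF events
        FOR VALUES FROM ('2025-01-01') TO ('2026-01-01');

    -- rewrite a single partition; the exclusive lock only touches that partition
    VACUUM FULL events_2024;

    -- or, for time-based data, reclaim space instantly instead of DELETE + VACUUM
    DROP TABLE events_2024;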


Yeah, partitioning is my eventual end goal. It’s getting less and less awkward to partition with each PG version. It’s still a little awkward in Django. But heck, I haven’t tried it in a year, maybe worth another shot.


Have you tried pg_repack?


No, but someone else suggested it. If it works on Google Cloud I might be in business.
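
In case it helps, typical usage looks roughly like this (database and table names are placeholders); pg_repack rebuilds the table online, swapping in a compacted copy and holding exclusive locks only briefly:

    -- the extension has to be installed in the target database first
    CREATE EXTENSION pg_repack;

    -- the work itself is then driven by the pg_repack client from a shell, e.g.:
    --   pg_repack --dbname=mydb --table=public.my_big_table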



