
I’m not familiar with the VACUUM changes, but the situation you’re describing suggests something is wrong with the table definition or database configuration.


How do you figure?

Postgres docs are quite clear. Table space is not reclaimed without a VACUUM FULL. So delete a column in a big table? You are storing that data forever.


Not quite true. Regular VACUUM marks dead tuples as available for re-use, so the system can overwrite the dead tuples with fresh ones after a VACUUM. VACUUM FULL completely rewrites and repacks the table.

https://www.postgresql.org/docs/current/sql-vacuum.html
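
A minimal sketch of the difference (table name and data are made up for illustration):

    -- hypothetical table, just to show the effect
    CREATE TABLE t (id bigint, payload text);
    INSERT INTO t SELECT g, repeat('x', 100) FROM generate_series(1, 1000000) g;
    DELETE FROM t WHERE id <= 900000;

    SELECT pg_size_pretty(pg_table_size('t'));  -- still the full size

    VACUUM t;       -- dead tuples become reusable; the file stays the same size,
                    -- but new rows written to t can fill that freed space
    VACUUM FULL t;  -- rewrites the table into a new compact file
                    -- (takes an ACCESS EXCLUSIVE lock while it runs)

    SELECT pg_size_pretty(pg_table_size('t'));  -- now much smaller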


Postgres has a free space map, which allows it to reuse the space of deleted tuples for new or updated tuples created in the same table. If no new/updated tuples are created in the table after the DELETE, you have fragmentation, which means the file remains large so the space can't be used for other tables (or other unrelated data residing in the same filesystem).

The nature of fragmentation means that you need to move a lot of data around to actually make that file smaller. In Postgres, that's typically done with VACUUM FULL. The problem of fragmentation is not unique to Postgres, it's a fundamental issue; but perhaps other systems are able to move the data around in a less disruptive way.

If you just delete a column, that creates a different type of fragmentation within the tuples themselves (e.g. you delete the middle column, and the tuple itself doesn't shrink, it just ignores that middle column). You are right that this can be a problem. Postgres could be improved to rewrite tuples to eliminate the wasted space from deleted columns in the middle, which would probably be worth the computation if it's already performing cleanup on the page.
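
A rough sketch of that last point (table name and data are made up); DROP COLUMN is a metadata-only change, so as I understand it the old bytes stay inside each existing tuple until the rows are rewritten:

    -- hypothetical wide table with a column we later drop
    CREATE TABLE wide (id bigint, junk text);
    INSERT INTO wide SELECT g, repeat('x', 200) FROM generate_series(1, 500000) g;

    -- dropping the column is instant, but does not shrink existing rows
    ALTER TABLE wide DROP COLUMN junk;
    SELECT pg_size_pretty(pg_table_size('wide'));  -- roughly unchanged

    -- the dropped column's bytes go away only when the tuples are rewritten,
    -- e.g. by a full-table rewrite such as VACUUM FULL
    VACUUM FULL wide;
    SELECT pg_size_pretty(pg_table_size('wide'));  -- now much smaller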


...okay? What's the use case where data shrinks?

The space from the data that is "not removed" will just be reused by new data.

The only real use case is "we've loaded way too much data, removed it, and want to recover that space because we will never need it", and that is not enough to matter: a database usually has its own filesystem, and most filesystems can't be shrunk online, so any shrinking needs downtime.


> What's the use case where data shrinks?

Parent literally provided one: deleting a column.

Another is that it's very easy in PostgreSQL to bloat indexes. Load a bunch of data. Update (or delete) that data and now your index is bloated.

The only resolution is to REINDEX (or VACUUM FULL).


The nice thing about index bloat is that REINDEX has a CONCURRENTLY option, so there's no need to block writes.
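
For reference, a sketch of what that looks like (index and table names are placeholders; REINDEX ... CONCURRENTLY needs PostgreSQL 12 or newer):

    -- rebuild one bloated index without blocking writes to the table
    REINDEX INDEX CONCURRENTLY my_table_pkey;

    -- or rebuild every index on the table in one pass
    REINDEX TABLE CONCURRENTLY my_table;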


True, good point.


If rewriting the table is part of a regular workflow, then database configuration can play a significant role in its performance. However, I still contend that regularly rewriting tables is an anti-pattern.

Are you able to partition any of your tables so that you can VACUUM FULL the partitions individually?
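
Roughly what that could look like, as a sketch (table name and partition bounds are made up); each partition is its own table, so it can be rewritten or dropped on its own:

    -- hypothetical range-partitioned table
    CREATE TABLE events (
        id         bigint,
        created_at timestamptz NOT NULL
    ) PARTITION BY RANGE (created_at);

    CREATE TABLE events_2024 PARTITION OF events
        FOR VALUES FROM ('2024-01-01') TO ('2025-01-01');
    CREATE TABLE events_2025 PARTITION OF events
        FOR VALUES FROM ('2025-01-01') TO ('2026-01-01');

    -- rewrite a single partition; the exclusive lock only touches that partition
    VACUUM FULL events_2024;

    -- or, for time-based data, reclaim space instantly instead of DELETE + VACUUM
    DROP TABLE events_2024;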


Yeah, partitioning is my eventual end goal. It’s getting less and less awkward to partition with each PG version. It’s still a little awkward in Django. But heck, I haven’t tried it in a year, maybe worth another shot.


Have you tried pg_repack?


No, but someone else suggested it. If it works on Google Cloud I might be in business.
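
In case it helps, typical usage looks roughly like this (database and table names are placeholders); pg_repack rebuilds the table online, swapping in a compacted copy and holding exclusive locks only briefly:

    -- the extension has to be installed in the target database first
    CREATE EXTENSION pg_repack;

    -- the work itself is then driven by the pg_repack client from a shell, e.g.:
    --   pg_repack --dbname=mydb --table=public.my_big_table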



