
Also, CoW isn't some kind of magic. There are two meanings I can think of here:

A) When you modify a file, everything, including the parts you didn't change, is copied to a new location. I don't think this is how btrfs works.

B) Allocated storage is never overwritten in place, but modifying part of a file doesn't copy the unchanged parts. A file's content is composed of a sequence (list or tree) of extents (contiguous, variable-length runs of 1 or more blocks), and if you change part of the file, you first allocate a new, disconnected extent somewhere and write your change there. Then, when you're done writing, the existing extents are trimmed so that the range you changed is carved out, and the sequence of extents becomes {old part before your change}, {your change}, {old part after your change}. This leaves behind an orphaned extent, containing the old content of the part you changed, which is now free. From what evidence I can quickly gather, this is how btrfs works.
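
To make (B) concrete, here's a toy sketch in Python of the extent bookkeeping. It's purely illustrative: the names (Extent, alloc, cow_write) and the bump allocator are made up for the example, not btrfs's actual structures. An overwrite goes to freshly allocated space, the overlapped extents are trimmed, and the old blocks are left behind as a hole.

    # Toy model of CoW extent bookkeeping (strategy B). Illustrative only,
    # not btrfs's actual on-disk structures or allocator.

    class Extent:
        def __init__(self, file_off, disk_off, length):
            self.file_off = file_off  # offset within the file (in blocks)
            self.disk_off = disk_off  # offset on disk (in blocks)
            self.length = length

    next_free = 0  # bump allocator: new data always goes to fresh space

    def alloc(length):
        """Grab a fresh run of blocks; nothing is ever overwritten in place."""
        global next_free
        off = next_free
        next_free += length
        return off

    def cow_write(extents, off, length):
        """Overwrite [off, off+length) of the file, CoW style."""
        new_disk = alloc(length)            # 1. write the change elsewhere
        result = []
        for e in extents:                   # 2. carve the changed range out
            end = e.file_off + e.length     #    of the existing extents
            if end <= off or e.file_off >= off + length:
                result.append(e)            # extent untouched by the write
                continue
            if e.file_off < off:            # keep the part before the change
                result.append(Extent(e.file_off, e.disk_off, off - e.file_off))
            if end > off + length:          # keep the part after the change
                skip = off + length - e.file_off
                result.append(Extent(off + length, e.disk_off + skip,
                                     end - (off + length)))
            # the overlapped middle of e is now orphaned free space
        result.append(Extent(off, new_disk, length))  # 3. splice in the change
        return sorted(result, key=lambda e: e.file_off)

    # One contiguous 100-block file; overwrite blocks 40..49.
    extents = [Extent(0, alloc(100), 100)]
    extents = cow_write(extents, 40, 10)
    for e in extents:
        print(f"file {e.file_off:3}..{e.file_off + e.length - 1:3} -> disk {e.disk_off}")
    # Result: three extents instead of one, and disk blocks 40..49 of the
    # original allocation are now a hole in previously contiguous space.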

Compared to an ordinary file system, where changes that don't increase the size of a file are written directly to the original blocks, it should be fairly obvious that strategy (B) results in more fragmentation, since both appending to and simply modifying a file cause a new allocation, and the latter leaves a new hole behind.

While strategy (A) with contiguous allocation could eliminate internal (file) fragmentation, it would also be much more sensitive to external (free space) fragmentation, requiring lots of spare capacity and/or frequent defrag.

Either way, the use of CoW means you need more spare capacity, not less. It's designed to allow more work to be done in parallel, which fits modern hardware and software better, under the assumption that there's also ample extra space to work with. Denying it that extra space is going to make it suffer worse than a non-CoW file system would.

Which is exactly why you periodically do maintenance to compact the free space. Thus it isn't an issue in practice unless you have a very specific workload, in which case you should probably be using a specialized solution. (Although I've read that apparently you can even get a workload like postgres working reasonably well on zfs, which surprises me.)

If things get to the point where there's over 1 TB of fragmented free space on a filesystem, that is entirely the fault of the operator.


What argument are you driving at here? The smaller the free space, the harder it is to run compaction. The larger the free space, the easier it is. There are some confounding forces in certain workloads, but the general principle stands.

"Your free space shouldn't be very fragmented when you have such large amounts free!" is exactly why you should keep large amounts free.
