It's strange to me that there's no file system with basic features like journaling and support for files larger than a couple of GB that is supported across all major desktop OSes (macOS, Windows, Linux, and FreeBSD). All platforms support the same I/O standards like USB or DisplayPort, so why did filesystems never make the cut to become a cross-system standard?
Imagine if you could have a backup drive (with reasonable modern data protections) that you could just plug into different systems and save all your files to. Isn't it odd that such a simple thing isn't possible? I guess network attached storage has gotten pretty accessible at this point so there's no need for it?
I think the basic problem is that FAT is generally "good enough," and in the increasingly common case where it isn't, exFAT is close to universal and addresses the only problem that consumers frequently run into (the file size limit).
While FAT/exFAT leave open the possibility of a variety of types of filesystem inconsistency, these seem to be fairly rare in actual practice, probably in good part due to Windows disabling write caching on devices it thinks are portable. This kind of special handling of external devices is sort of distasteful, and leads to some real downsides on Windows (e.g. LDM handling USB devices weirdly), but using a newer file system doesn't really eliminate that problem - NTFS and ext* external devices require special handling on mounting to avoid the problems that come from file permissions traveling from machine to machine, for example.
> in good part due to Windows disabling write caching on devices it thinks are portable. This kind of special handling of external devices is sort of distasteful
What's distasteful is that on my Linux machine with a lot of RAM, when I copy a multi-gigabyte file to a USB key it "completes" the copy almost immediately, when actually all it has done is copy the file into a RAM buffer. Then when I try to disconnect the drive, it hangs for ages while it actually finishes the write. IMO Windows does it better here (although I never realised exactly what they did; nice to know).
GUI file copy tools should be using O_DIRECT, or periodically calling fsync()/sync(). An argument could also be made that the kernel write cache should have a size limit, so that one-off write latency is masked but very slow bulk I/O is not.
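Something like this rough sketch of the periodic-fsync() idea (not what any particular file manager actually does; the 1 MiB buffer and 64 MiB flush interval are just numbers I picked):

```c
/* Rough sketch: copy SRC to DST, fsync()ing every ~64 MiB so progress
 * roughly tracks data that has reached the device, not data that is
 * only parked in the page cache. */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

#define BUF_SZ     (1u << 20)    /* 1 MiB read/write buffer            */
#define SYNC_EVERY (64u << 20)   /* flush roughly every 64 MiB written */

int main(int argc, char **argv)
{
    if (argc != 3) {
        fprintf(stderr, "usage: %s SRC DST\n", argv[0]);
        return 1;
    }

    int src = open(argv[1], O_RDONLY);
    int dst = open(argv[2], O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (src < 0 || dst < 0) { perror("open"); return 1; }

    static char buf[BUF_SZ];
    unsigned long long total = 0, since_sync = 0;
    ssize_t n;

    while ((n = read(src, buf, sizeof buf)) > 0) {
        for (ssize_t off = 0; off < n; ) {            /* handle short writes */
            ssize_t w = write(dst, buf + off, n - off);
            if (w < 0) { perror("write"); return 1; }
            off += w;
        }
        total += n;
        since_sync += n;
        if (since_sync >= SYNC_EVERY) {               /* bound dirty data    */
            if (fsync(dst) < 0) { perror("fsync"); return 1; }
            since_sync = 0;
            fprintf(stderr, "\r%llu MiB on disk", total >> 20);
        }
    }
    if (n < 0) { perror("read"); return 1; }

    if (fsync(dst) < 0) { perror("fsync"); return 1; } /* final flush */
    fprintf(stderr, "\rdone: %llu MiB\n", total >> 20);
    close(src);
    return close(dst) < 0;
}
```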
O_DIRECT seems like overkill, and the lack of write buffering could be a real detriment in some circumstances. Syncing at the end of each operation (from the user's perspective) should be the best mix of throughput and safety, but it makes it hard to do an accurate progress bar. Before the whole batch operation is finished, it may be useful to periodically use madvise or posix_fadvise to encourage the OS to flush the right data from the page cache—but I don't know if Linux really makes good use of those hints at the moment.
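The hint I have in mind would look roughly like this (a hedged sketch; whether the kernel actually acts on the advice is exactly the open question above):

```c
/* After a chunk has been flushed, tell the kernel we won't read it back,
 * so a huge bulk copy doesn't evict everything else from the page cache.
 * posix_fadvise() is purely advisory; Linux is free to ignore it. */
#define _POSIX_C_SOURCE 200112L
#include <fcntl.h>
#include <unistd.h>

/* Flush the dirty data of `fd`, then ask the kernel to drop the now-clean
 * cached pages covering [start, start+len). Returns 0 on success. */
int flush_and_drop(int fd, off_t start, off_t len)
{
    if (fdatasync(fd) < 0)
        return -1;
    /* posix_fadvise returns an errno value directly (0 on success). */
    return posix_fadvise(fd, start, len, POSIX_FADV_DONTNEED);
}
```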
On really new kernels, it might work well to use io_uring to issue linked chains of read -> write -> fdatasync operations for everything the user wants to copy, and base the GUI's progress bar on the completion of those linked IO units. That will probably ensure the kernel has enough work enqueued to issue optimally large and aligned IOs to the underlying devices. (Also, any file management GUI really needs to be doing async IO to begin with, or at least on a separate thread. So adopting io_uring shouldn't be as big an issue as it would be for many other kinds of applications.)
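A hedged sketch of what one such chain might look like with liburing (the file names and the 1 MiB chunk size are made up; real code would keep many chains in flight and would have to handle short reads/writes, which break a link):

```c
/* Sketch: copy a file as a series of linked read -> write -> fdatasync
 * chains via io_uring. Needs a recent kernel and -luring. */
#include <fcntl.h>
#include <liburing.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>

#define CHUNK (1 << 20)                     /* one linked chain per 1 MiB */

int main(int argc, char **argv)
{
    if (argc != 3) { fprintf(stderr, "usage: %s SRC DST\n", argv[0]); return 1; }

    int src = open(argv[1], O_RDONLY);
    int dst = open(argv[2], O_WRONLY | O_CREAT | O_TRUNC, 0644);
    struct stat st;
    if (src < 0 || dst < 0 || fstat(src, &st) < 0) { perror("open"); return 1; }

    struct io_uring ring;
    if (io_uring_queue_init(64, &ring, 0) < 0) {
        fprintf(stderr, "io_uring_queue_init failed\n");
        return 1;
    }

    char *buf = malloc(CHUNK);
    if (!buf) return 1;
    off_t done = 0;

    while (done < st.st_size) {
        unsigned len = (st.st_size - done < CHUNK) ? st.st_size - done : CHUNK;

        /* read -> write -> fdatasync, linked so each step waits on the last */
        struct io_uring_sqe *sqe = io_uring_get_sqe(&ring);
        io_uring_prep_read(sqe, src, buf, len, done);
        io_uring_sqe_set_flags(sqe, IOSQE_IO_LINK);

        sqe = io_uring_get_sqe(&ring);
        io_uring_prep_write(sqe, dst, buf, len, done);
        io_uring_sqe_set_flags(sqe, IOSQE_IO_LINK);

        sqe = io_uring_get_sqe(&ring);
        io_uring_prep_fsync(sqe, dst, IORING_FSYNC_DATASYNC);

        io_uring_submit(&ring);

        /* Reap the three completions; a GUI would advance its progress bar
         * only when the fsync completion comes back. */
        for (int i = 0; i < 3; i++) {
            struct io_uring_cqe *cqe = NULL;
            if (io_uring_wait_cqe(&ring, &cqe) < 0 || cqe->res < 0) {
                fprintf(stderr, "chain failed: %s\n",
                        cqe ? strerror(-cqe->res) : "wait error");
                return 1;
            }
            io_uring_cqe_seen(&ring, cqe);
        }
        done += len;
        printf("\r%lld / %lld MiB", (long long)(done >> 20),
               (long long)(st.st_size >> 20));
        fflush(stdout);
    }
    printf("\n");
    io_uring_queue_exit(&ring);
    free(buf);
    close(src);
    close(dst);
    return 0;
}
```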
Not always. If you're reading from an SSD and writing to a slow USB 2.0 flash drive, you could enqueue in one second a volume of writes that will take the USB drive tens of seconds to sync(), leading to a very unresponsive progress bar. You almost have to do a TCP-like ramp-up of block sizes until you discover where the bottleneck is.
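Something along these lines, purely illustrative (all the thresholds and the "aim for roughly one flush per second" target are invented):

```c
/* Adapt how much data goes between flushes, so a slow USB 2.0 stick gets
 * small, frequent fdatasync() calls and a fast SSD gets big ones. */
#include <stddef.h>

/* batch   = bytes written between the last two flushes
 * elapsed = seconds that writing + fdatasync()ing that batch took
 * returns the batch size to use for the next flush interval           */
size_t next_batch(size_t batch, double elapsed)
{
    const size_t min_batch = 1u << 20;     /* 1 MiB   */
    const size_t max_batch = 256u << 20;   /* 256 MiB */

    if (elapsed < 0.5 && batch < max_batch)
        return batch * 2;                  /* device keeps up: ramp up  */
    if (elapsed > 2.0 && batch > min_batch)
        return batch / 2;                  /* flushes stall: back off   */
    return batch;                          /* near the ~1 s sweet spot  */
}
```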
Which distro/desktop? My standard Ubuntu 18.04 with GNOME, mounting through the file manager, doesn't do this: copying to a slow USB drive is as glacial as it should be, but copying between internal drives is instant and hidden.
Default GNOME on Ubuntu 20.04. How much free RAM do you normally have? If you don't have enough to buffer the whole operation, then it's not a problem.
Now that I think about it, this might actually explain some bugs I've seen when copying multiple files. Copying one file seems to work, but then when copying a second the progress sits at 0%; it's probably waiting for the first transfer to sync.
I don’t know if it’s just because I’m largely on Mac and Apple’s drivers suck, but I have run into lots of data corruption issues with exFAT on large drives (1TB+). More than enough for me to stop using it.
(Now I use NTFS and Tuxera’s commercial Mac driver, because I don’t know how else to have a cross-platform filesystem without a stupidly-low file-size limit.)
It isn't a first-class FS in any OS, so it lacks some polish tooling-wise etc., but it should work fine for basic file transfer jobs.
I'm using it on an external HDD for copying/watching video files between Linux and Windows boxes and haven't had any problems yet.
On the other hand, 7-8 years ago I tried to use a UDF partition for sharing a common Thunderbird profile between Windows and Linux and got some strange errors on the Windows side after a while. I didn't dig further, so it may have been a non-UDF OS or tooling issue, or it may have been solved in the meantime.
> why did filesystems never make the cut to become a cross-system standard?
Many reasons, the most important being patents and vendors' insistence on keeping stuff proprietary (exFAT, NTFS, HFS, ZFS).
Also, there are fundamental differences on the OS level:
- how users are handled: Unices generally use numerical UIDs/GIDs while Windows uses SIDs
- Windows has/had a roughly 260-character (MAX_PATH) limit on the total length of a file's path
- Unices have the octal mode bits for permissions; old Windows had four attribute bits (write-protected, archived, system, hidden) and that's it
- Windows, Linux and OS X have fundamentally different capabilities for ACLs - a cross-platform filesystem needs to handle all of them, and do so in a sane way
- don't get me started on the mess of advanced fs features (sparse files, transparent compression, immutability of files, transparent encryption, checksums, snapshots, journals)
- Exotic stuff like OS X and NTFS additional metadata streams that don't even have a representation in Linux or most/all BSDs
And finally, embedded devices and bootloaders. Give me a can of beer and a couple weeks and I'll probably be able to hack together a FAT implementation from scratch. Everything else? No f...in' way. Stuff like journals is too advanced to handle in small embedded devices. The list goes on and on.
Filesystems have never been standardised. In mainframe/mini days manufacturers supplied a selection of OS options for the same hardware, and there was no expectation that the various filesystems would be compatible between different OSs.
Which is why we have abstraction layers like Samba (etc) on top of networked drives. They're descendants of vintage cross-OS utilities like PIP which provide a minimal interface that supports folder trees and basic file operations.
But a lot of OS-specific options remain OS-specific, and there's literally no way to design a globally compatible file system that implements them all.
This isn't to say a common standard is impossible, but defining its features would be a huge battle. And including next-gen options - from more effective security and permissions, to content database support, to some form of literally global file ID system, to smart versioning - would be even more of a challenge.
> Windows has/had a roughly 260-character (MAX_PATH) limit on the total length of a file's path
The actual path of a file in Windows can be practically unlimited, but exceeding that limit requires either the special \\?\ prefix notation (which looks like UNC network syntax) or relative paths. Recent versions of Windows also include a setting that removes the limitation from the APIs, partly because of development issues like deeply nested node_modules packages.
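For reference, a tiny Win32 sketch of that prefix (the path here is made up); with the wide-character APIs, the \\?\ form allows paths of up to roughly 32,767 characters:

```c
/* Open a file via the \\?\ prefix, which bypasses the legacy MAX_PATH
 * (260-character) limit when used with the wide-character Win32 APIs. */
#include <windows.h>
#include <stdio.h>

int main(void)
{
    /* Hypothetical path; note the escaped backslashes produce \\?\C:\... */
    const wchar_t *path = L"\\\\?\\C:\\some\\very\\deeply\\nested\\file.txt";

    HANDLE h = CreateFileW(path, GENERIC_READ, FILE_SHARE_READ, NULL,
                           OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
    if (h == INVALID_HANDLE_VALUE) {
        fprintf(stderr, "CreateFileW failed: %lu\n", GetLastError());
        return 1;
    }
    CloseHandle(h);
    return 0;
}
```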
UDF looks a lot like that file system on paper, but my understanding is that it practically falls apart because there are too many bugs and too much variation in which features are really supported among the various implementations. Everyone agrees on the DVD-ROM subset, but beyond that it seems like a crapshoot.
Using a non-FAT filesystem for portable storage opens up lots of issues regarding broken permissions. Try to create a file on an ext4-formatted USB flash drive using your current user; rest assured that, unless you remembered to set its permissions to 777, on another computer you'll have to chmod it because it still belongs to the creator's (possibly dangling) UID and GID. If you don't have root access, and unless your user coincidentally has the same UID as on your other machine, you're screwed and have to go back, grumbling, to a system where you have administrative privileges.
Same thing applies to NTFS, but I've seen that Windows usually creates files on removable drives with extremely open permissions, and in my experience NTFS-3G just straight up ignores NTFS ACLs on the drives it mounts, so more often than not it JustWorks™ in common use cases.
I think a journaled exFAT-like filesystem would be perfect for this task, but given how hard it was for exFAT to even start to displace FAT32, even if it actually existed I wouldn't expect it to succeed any time soon.
Not standardising means being able to distinguish yourself more easily; it's just that for anything that integrates with third-party stuff, standardisation is way cheaper.
Apple doesn't make monitors or flash drives, so making their own specs for those wouldn't be beneficial to them and would increase prices of compatible products.
But a filesystem is something they do make and being able to do whatever they want with it is quite beneficial, with no downsides as there are no third parties (that they care about) integrating with it that would have to put in extra work to support it (besides their direct competitors, which is a nice bonus).
In all three cases you list, it's a third-party module providing support, rather than it being a standard feature of the OS that can be expected to be generally available on more than 1% of the install base.
ZFS development moves so fast that it is common for my (FreeBSD-based) FreeNAS box to warn me when I upgrade my OS that certain actions will make it incompatible with the prior version of FreeNAS.
That is fine and appropriate for a drive that will be connected to the system for the foreseeable future.
That kind of compatibility concern makes me squeamish about using ZFS for a drive that I want to share between different systems. If it's easy to make it incompatible between two releases for the same system, that smells like a waiting nightmare trying to keep it compatible between Linux, FreeBSD and Windows.
Yeah, it should stabilize in the next couple of months with the release of OpenZFS 2.0, as that release is supposed to signify the unification of ZFS on Linux and FreeBSD; ZFS on FreeBSD is being rebased onto ZFS on Linux. There's also been some talk of adding macOS support to OpenZFS, but that's still up in the air.
Agreed! I’ve been running FreeBSD on various computers for very close to a decade now, and still run it on my mail server. But one problem I faced a couple of years ago, when I sold my old laptop that ran FreeBSD, was that my other computer at home at the time was running Linux, while the external HDD I’d been using with the laptop was encrypted with GELI.
Since I didn’t have money for any more hard drives at the time, I couldn’t transfer the data to anything else. So then when I wanted to access that data I’d do so via a FreeBSD VM running in VirtualBox. The performance was... not great.
I took the data that I needed the most, and let the rest of it sit at rest.
This week I wanted to use the drive again, and in the end because I was doing general cleanup, I decided to install FreeBSD on my desktop temporarily.
I actually love FreeBSD, but the reason I prefer to have my desktop running Linux is in big part that I want software on the computer to be able to take advantage of CUDA with the GTX 1060 6GB graphics card that I have in it, and unfortunately only Nvidia's Linux driver has CUDA; their FreeBSD driver does not.
I was actually looking at installing VMware vSphere on the computer instead, so that I could easily jump between running Linux and running FreeBSD with what I understand will probably be good performance, compared to VirtualBox at least. But the NIC in my machine is not supported and vSphere would not install. I found some old drivers and messed around with VMware tooling that required PowerShell, which turned out not to work with the open-source version of PowerShell on any operating system other than Windows. So then I downloaded a VM image of Win 10 from Microsoft [0] and used that to try to build a vSphere installer with drivers for my NIC. No luck at the first attempt, unfortunately.

A decade ago I probably would have kept trying to make that work, but at this point in my life I said ok fine, fuck it. I ordered an Intel I350 NIC online second-hand for about $40 shipping included, and the guy I bought it from sent it the next day; it is expected to arrive tomorrow. Meanwhile, I installed FreeBSD on the desktop. When the NIC arrives I will do some benchmarking of vSphere to decide whether to use vSphere on the desktop, stick with FreeBSD for a while on that machine, or put it back to just Linux again.
Anyways, that’s a whole lot more about my life and the stuff that I spend my spare time on than anyone would probably care to know :p but the point that I was getting to is that, with OpenZFS 2.0 I will be able to use ZFS native encryption instead of GELI and I will be able to read and write to said HDD from both FreeBSD and Linux.
I still need to scrape together money for another drive first before I can switch from GELI + ZFS to ZFS with native encryption though XD
Oh, and one more thing, with the external drive I was having a lot of instability with the USB 3.0 connection on FreeBSD, leading to a bit of pain with transferring data because the drive would disconnect now and then and I’d have to start over. But yesterday I decided to shuck the drive – that is, to remove the enclosure and to connect the drive with SATA like you would any other regular internal drive. It worked out excellently, the WD Essentials enclosure was easier to pry open than I had feared, and a video on YouTube showed me how to do it [1]. As prying tools I used a couple of plastic rulers. As a bonus, it also looks like I/O performance is better with the direct SATA connection than what I was getting with the USB 3.0 connection.
Speaking of that, some people have reported finding that the drives in their WD Essentials external drives were WD Red HDDs. I didn’t have the same luck with mine; mine was a WD Blue. But idk if WD Red is even common at the capacity mine has anyway. Mine is “only” 5TB, and I think the people who have been talking about finding WD Red drives in theirs have often bought 8TB models. Idk. The main thing for me anyways is just to have my data and someplace to store it ^^
The ZFS driver is still early in development and quite unstable! I’ve used it in read-only mode just so I could have some access to my ZFS pool while booted into Windows, and although it kind of worked in that use case it would still do weird things like randomly refuse to open certain files.
One thing that occasionally causes data interoperability problems for me is forgetting that Windows can't have colons in filenames. Not really sure what a filesystem driver could do about that.
You don't need to rebuild a disk because some clusters were orphaned. It doesn't mean the disk was corrupt; it just means there is some allocated space that isn't used. It's trivial to fix, too, and there is no danger there. Actually, what journaling does is roll back that allocated space, and doing it manually doesn't make it less safe. For backup scenarios exFAT is perfectly feasible.
What is feasible and what is desirable are two different things.
With journalling, I don't have to know or care about any of this: I restart and chances are it's all back to normal. This is desirable.
You can store your backups on stone tablets, with a machine that carves rock to write 1s and 0s and a conveyor belt that feeds new tablets. That is perfectly feasible. It is also not desirable.
Yes, there are still dreams about bringing your home directory with you on a stick anywhere you go and just plugging it into the nearest public toilet when you need it.
Ironically, a fully encrypted Linux installation with Btrfs or ZFS on a removable drive is quite easy to make and it works really well. I've got one I made from an old SSD I had lying around and a 2.5" USB-C caddy, and it's wonderful: you can have your work environment everywhere you want, and even plug it into a running machine and boot it up using Hyper-V or KVM.