It's a little more than a key-value store. The keys are impure.
One of the first issues is that the disk likes to work in terms of blocks, which lend themselves to arrays, preferably of fixed size so that related data is contiguous. This leads to a limit on the number of files in a directory, so nesting directories is one of the easiest ways to accommodate more files.
But it's also a little more than that. Directories group semantically related files together, which means their meta-data lives in the same directory inode and can be read in as a group. That's more efficient: chances are you'll often access many related files at once, even if only to list a directory, so it helps the file system to have structure that lets the meta-data of related files be read in one go. It's an optimization that uses our own semantic information to structure the data.
This arises from disks being a really crappy way to access data: they are slow and work best with large sequential reads. That makes a disk a poor fit for a persistent key-value store whose meta-data alone may be much larger than the amount of available memory.
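A toy sketch of the locality point (the names and layout are invented for illustration, not a real on-disk format): if a directory block holds the meta-data of all its entries, one sequential read services a whole listing, instead of one seek per file.

    from dataclasses import dataclass

    @dataclass
    class Inode:              # per-file meta-data; fields are illustrative
        name: str
        size: int
        mtime: float

    @dataclass
    class DirectoryBlock:     # one contiguous block of related meta-data
        entries: list[Inode]

    def list_directory(block: DirectoryBlock) -> list[str]:
        # A single "disk read" of the block yields every entry's
        # meta-data at once, rather than a random seek per file.
        return [f"{e.name}\t{e.size}" for e in block.entries]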
But I tend to think of it as a KV store most of the time anyway and often wonder why we have the silly idea of directories.
Yes, but often not just one. Now we can use links (mainly symbolic), but this can be a mess.
Tags may be a better way for an end user to sort out the data. You don't have to lose the hierarchical structures that directories give us. But it may be simpler to think of a file as an atomic blob that just lies at the root of some disk and belongs to a number of (sub)categories.
Sounds good, but what uniquely identifies a file? Right now it's path + name.
If I have two files with the same name, one tagged A and the other tagged A and B, are they the same file or not? What if I add a tag of B to the first one?
I think we should use several mechanisms at once to identify files.
Tags. The default mechanism for sorting and searching files. The assumption is that most files are passive data. When sharing a file, its tags should be sent along with it, so the receiving system can propose them by default to its user. Note that one may want to categorize tags themselves (meta-tags?); I'm not sure, but it may be necessary if a given system uses many tags.
Descriptive names. This is the user-facing name of the file. No need for it to be unique. Like tags, a file's descriptive name should be sent along with it.
Locations. It may be important to know where a given file is physically located. It is cool to transparently access more files when you plug your thumb drive in. It is less cool to forget to actually copy the ones you need.
Unique keys. Devised by the system and not directly accessible by the user. When a search yields several files with the same descriptive name, or when two files share tags and name and location, the system can be explicit about the ambiguity.
Unique names. Devised by the user. The system checks uniqueness (or, more likely, uniqueness by location). They would follow a directory structure convention. Discouraged by default. Their primary usefulness would probably be for system and configuration files, which need to be accessed automatically and unambiguously by programs. They may be implemented on top of descriptive names (the system could treat descriptive names that begin with "/" as unique, and enforce that uniqueness).
There. End users would primarily use tags, descriptive names, and locations. With the right defaults, users may actually be able to avoid making a mess of their data. To prevent unwanted access to sensitive system files, the system can by default exclude those files from search results: typically those both tagged "system" and located on the main drive. Unique names would be for programs, power users, and those who want to fiddle with their system settings (either directly or through a friendly interface). Unique keys belong to the kernel. A rough sketch of this data model follows.
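A minimal sketch, assuming the mechanisms above; every field and function name here is invented for illustration, not a spec.

    from __future__ import annotations
    import uuid
    from dataclasses import dataclass, field

    @dataclass
    class FileRecord:
        key: uuid.UUID = field(default_factory=uuid.uuid4)  # system-devised unique key
        descriptive_name: str = ""                          # user-facing, not unique
        tags: set[str] = field(default_factory=set)
        locations: set[str] = field(default_factory=set)    # e.g. {"main drive", "thumb drive"}

        @property
        def unique_name(self) -> str | None:
            # Convention from above: a descriptive name starting with "/"
            # is treated as a unique name (uniqueness enforced elsewhere).
            return self.descriptive_name if self.descriptive_name.startswith("/") else None

    def search(files, tags=frozenset(), name=None, location=None):
        # Search, not pointing, is the default access mode: each criterion
        # narrows the list, and the unique key disambiguates identical hits.
        return [f for f in files
                if set(tags) <= f.tags
                and (name is None or f.descriptive_name == name)
                and (location is None or location in f.locations)]

So search(files, tags={"music"}) would return every music file, and two hits with the same descriptive name would still differ by key.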
Hard to say. It may be brilliant, and may be the Future Of Files, for all I know.
My first reaction, though, is that it sounds a bit confusing to me, and very confusing for novice users.
Right now, Mom understands that "C:\My Documents\bird.jpg" is not the same as "C:\My Documents\My Pictures\bird.jpg". The rule is simple: unique names per folder.
This is kind of a paradigm shift. Right now, the default when dealing with files is to point to them. What I envisioned in the grandparent was to make search the default: tags, descriptive names, and locations are all search criteria.
In a way, it is more complicated: instead of 0 or 1 file, you now get a whole list. On the other hand, everyone understands search. My hope is, the initial extra complexity would be dwarfed by the ease of sorting and finding your files. Because right now, one or the other is difficult: it's hard (or at least bothersome) to properly sort one's data in a directory tree, but it's even harder to find it if your disk is messy.
Now there are two snags we might hit. First, I'd like to do away with unique names, because they get us back to the old, difficult-to-manage directory tree. Second, to have good tags, you have to internationalize them. For music, for instance, French-speaking folks would like to use "musique", while English-speaking ones will use "music". It has to work transparently when they exchange files, or else it would defeat the purpose of default tags. I can think of solutions such as aliases, normalization at download time, or standard tag names that can be translated by the system (sketched below), but I'm not sure that's really feasible or usable.
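One possible shape for the normalization idea (the alias table is a made-up example, not a standard): store canonical tag identifiers internally and translate at the edges.

    # Canonical tag IDs are system-level; user-visible names get mapped onto them.
    CANONICAL_TAGS = {"music", "system"}
    ALIASES = {
        "musique": "music",   # French
        "musik": "music",     # German
    }

    def normalize_tag(tag: str) -> str:
        # Applied at download/import time so exchanged files keep their
        # default tags across locales; unknown tags pass through as-is.
        t = tag.lower()
        return t if t in CANONICAL_TAGS else ALIASES.get(t, t)

The display layer would do the reverse mapping, showing "musique" to a French user for the canonical "music".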