Hacker News

I'm just now starting to crack open some Lisp tutorials, coming from the OOP side of the fence, so please excuse the ignorance. For my learning purposes, would you be open to explaining how symbolic expressions would circumvent the need for a file system?


Instead of a file system (a system that consumes a block device and exposes a database mapping a hierarchy of tree-nodes with stringly-keyed names to seekable byte-buffers), picture something like Erlang’s DETS (a system that consumes a seekable byte-buffer and exposes a tuple store, where the tuples are regular in-memory objects that happen to live in an isolated heap which is memory-mapped from the file.)

Now consider that anything that consumes a seekable byte-buffer as its backing store could just-as-well be modified to consume a block device as its backing store.

Voila: your OS now has “durable memory” in place of a filesystem.

(And, in fact, in modern systems you can skip the whole block-device layer and just sit your tuple store directly on top of NVMe.)
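Python's shelve module (a persistent dict backed by dbm) gives a rough feel for this "durable objects instead of byte files" model. It is only an analogy to DETS, not an implementation of it:

```python
import os
import shelve
import tempfile

path = os.path.join(tempfile.mkdtemp(), "store")

# A shelf is a dict-like persistent mapping: ordinary Python objects
# are transparently pickled into an on-disk database as you assign them.
with shelve.open(path) as db:
    db["user:1"] = ("alice", 42)  # store a tuple; no serialization code

# Reopen later: the objects are still there, as if memory were durable.
with shelve.open(path) as db:
    name, age = db["user:1"]
```

Replace the dbm file underneath with a block device (or NVMe namespace) and you have the "durable memory" picture above.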


I've often wondered, instead of a filesystem, why not a database?

A filesystem is effectively a hierarchical model database with a very limited feature set (usually no transactions, minimal or no schema enforcement, very limited query language.)

Why not add some of those features?

Filesystem transactions: Windows supports them, but Microsoft has deprecated them; part of the reason, I think, is that transactions required the use of a different API to access files, which meant almost nobody used them; if they had been accessible through the same filesystem API as non-transactional operations, they might have seen more adoption.

Schema enforcement: I could create a directory called "images", and then specify that all files in images have to be of MIME type image/*, and the FS will refuse to let me put a text file or executable in there. I could have a directory called "logs", and require all file names in that directory to have names containing a valid date/timestamp. I could enforce the rule that a .json file has to contain well-formed JSON, or a .xml file must contain well-formed XML.

Querying: filesystem could support SQL queries over file extended attributes. (It doesn't have to be SQL, SQL syntax is pretty ugly.) "Find all files of MIME type image/png which are greater than 100KB in size?" "How many executable files are there? How much space do they consume?"
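A sketch of that query model in userspace, using SQLite over metadata gathered with os.walk. The table layout is invented; a database-backed filesystem would keep this index up to date itself on every write:

```python
import mimetypes
import os
import sqlite3
import tempfile

root_dir = tempfile.mkdtemp()
# Stand-in files (contents are junk; only the metadata matters here).
with open(os.path.join(root_dir, "photo.png"), "wb") as f:
    f.write(b"\x00" * 200_000)  # a 200 KB "image"
with open(os.path.join(root_dir, "notes.txt"), "wb") as f:
    f.write(b"hello")

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE files (path TEXT, mime TEXT, size INTEGER)")

# Walk the tree once, recording the metadata a database-backed FS
# would maintain automatically.
for parent, _dirs, names in os.walk(root_dir):
    for name in names:
        path = os.path.join(parent, name)
        mime = mimetypes.guess_type(name)[0] or "application/octet-stream"
        db.execute("INSERT INTO files VALUES (?, ?, ?)",
                   (path, mime, os.path.getsize(path)))

# "Find all files of MIME type image/png greater than 100 KB"
big_images = [p for (p,) in db.execute(
    "SELECT path FROM files WHERE mime = 'image/png' AND size > 102400")]
```

The executable-count and space-consumption questions become one-line `SELECT COUNT(*)` / `SELECT SUM(size)` queries over the same table.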


> I've often wondered, instead of a filesystem, why not a database?

yeah, that's what a lot of people wondered in the 70s too, and it failed spectacularly. It failed again in the 2000s with Microsoft's WinFS.


Did it fail (outside mainframes) for technical reasons, or because the filesystem won the popularity contest? There are plenty of examples in tech where one thing wins out for reasons other than technical merit.


With that kind of remark I guess you never used a mainframe.


In what way do mainframe file systems differ? As far as I'm aware, they largely use hierarchical file systems, like the confusingly named HFS or ZFS -- both of which are colliding acronyms from within IBM. HFS is the older Hierarchical File System, and ZFS the newer z/Architecture File System.

As far as I'm aware, the main difference is support for record oriented files, but the naming and lookup isn't so different.


Well, I think even more than HFS or zFS, they use the MVS classic filesystem (datasets etc.) But that too is semi-hierarchical. (I say "semi" because to some extent the hierarchy is just a naming convention, but to some extent it's real – the initial qualifiers of a dataset name can be really hierarchical in that they can select which catalog is used; and then of course PDS/PDSE members are an additional one-level hierarchy on the end.)

But you are right, there is nothing especially database-oriented about file lookup and naming on IBM mainframes. Record-oriented files and key-sequenced VSAM files (and once upon a time ISAM too) are database-oriented features, but they relate to file contents not file naming/lookup/etc.

I think the idea of catalogs is interesting, in that they permit a separation between the naming of datasets and the volumes they are stored upon. A dataset can be moved to another volume without changing the name used to access it. That is arguably more complex on Unix-like systems, since you need to muck around with symlinks or bind mounts to get the same effect.



I installed RDB last week on my hobby I64 machine. Although I had glancing encounters with VMS through work, it has been a lot of fun to explore some of the features of the OS in depth. I remember having a WinNT RDB eons ago too. But it seems to have disappeared from the web. The Internet Archive has a copy of "Rdb : a comprehensive guide" for loan if anyone is interested.


People like the simplicity of the file/folder paradigm. That said, it may make sense to build your OS's file/folder abstraction on top of a more powerful database system. The ad-hoc, user-oriented nature of filesystems causes lots of problems when you try to build reliable software on top of them. There's no reason an app's internal network cache (for example) needs to be stored in the same way as a folder of user-facing documents.


I feel like the modern OSes have all developed a two-layer storage paradigm, where, from the kernel and base-userland's perspective, you just have an ordinary filesystem; and then, from the Desktop Environment's perspective, this filesystem gets combined with a content indexer to derive a more database-like system that tracks "documents."

And documents themselves can be databases (e.g. SQLite3 databases) in a way that's understood by this same database-like system, such that there can be a vertical integration between changes to "documents" that happen through Desktop-Environment-level APIs, and events elsewhere in the system.

(For example: the interaction in macOS/iOS between Core Data "documents" and iCloud. If you have a KV-store-typed Core Data "document", then iCloud backs it with a KV-store on its backend, and syncs changes between your KV-store and the cloud KV-store keywise, rather than just re-uploading the entire document whenever it changes.)

From a userland perspective, it's a bit hard to understand why these abstractions haven't been "pushed down the stack" into the base system, such that even POSIX utilities can see and deal with "documents" as a whole.

But from a systems-development perspective, I think I understand why it hasn't happened—modern OSes are developed from the inside out, in a way where development on the kernel or the base-system usually involves stripping away all the complexities of the higher level. A good example of this layering is Android: there's the Linux base-system, and then the Android runtime on top.

People who need to develop new kernel extensions for Android devices, need to work on them "at a Linux level", rather than trying to figure out what's going on by peering into the Linux part of the stack from the Android-runtime part of the stack. When they're doing that, none of the Android niceties are available. If the whole filesystem was vertically integrated such that you didn't have one before the Android runtime successfully booted, that'd be really annoying in this case. A filesystem is very useful for doing this kind of development. (Picture if Linux didn't have the initramfs abstraction—if, before the real root filesystem was mounted, there was just no filesystem at all, and so no POSIX userland utilities to rely on. That'd make developing a boot process much harder.)


Yeah, this "two-layer storage paradigm" is exactly what I'm getting at. There are a lot of other ways you could slice it, though. System services want simple APIs with high performance -- they'd prefer not to pay the cost of the higher-level user document representation.

Personally, I think user documents shouldn't be a kernel concept -- it should be a protocol implemented by user-space system services. This would let any app expose its storage as "documents" without having to change its internal representation.


The concept is nice, but every time somebody has tried to implement it there has been some miserable failure either from a business or technical sense. See the Be File System for the former, and WinFS for the latter.

In fact, the only successful implementation of the concept (even if only partially) I've seen is the object-oriented storage on IBM i (formerly i5/OS and OS/400).


Mainframes like the OS/400, now IBM i, do have databases instead of plain file systems.


Most mainframe and minicomputer filesystems support record-oriented files and indexed/keyed files. In Unix/Windows land, if my file has 80-byte fixed-width records, that's an application file format detail, and the filesystem knows nothing about it. In mainframe/minicomputer systems, the file is declared to the filesystem as F80 (fixed-width 80-byte records), and the FS will force all reads/writes to be in multiples of the record size.
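For illustration, the Unix view of an F80 file is just a flat byte stream, and the record structure has to be reimposed by the application (a minimal sketch):

```python
RECLEN = 80  # F80: fixed-width 80-byte records, a convention the
             # Unix filesystem knows nothing about

def read_records(data: bytes):
    """Split a flat byte stream into fixed-width records, enforcing the
    multiple-of-record-size rule a record-oriented FS would apply."""
    if len(data) % RECLEN != 0:
        raise ValueError("file length is not a multiple of the record size")
    return [data[i:i + RECLEN] for i in range(0, len(data), RECLEN)]

records = read_records(b"A" * 80 + b"B" * 80)  # two 80-byte records
```

On a record-oriented filesystem this splitting (and the length check) happens below the application, in the FS itself.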

Now, you can call this a "database", although it is basically a flat-file database (ISAM/VSAM). But then, many mainframe/minicomputer relational database products use this as their underlying storage layer, and put SQL on top of it. But the SQL access is an application layer, not part of the actual OS-level filesystem. In some cases, you can have legacy apps directly accessing the files via ISAM/VSAM, bypassing the relational layer, while having newer apps going through the SQL interface instead to read/write the very same files.

Where OS/400 / IBM i makes this picture a bit more complex, is the relational layer is shipped as part of the OS, not as a separate product. What I can't work out, is how deep the integration actually is. Is it just like a relational database bundled with the OS (like how many Linux distros bundle MySQL and Postgres?) Or does it reach deep down into the OS kernel? There is not much technical info available on OS/400 internals, and I wonder if IBM's marketing/evangelism makes the integration sound deeper than it really is?

But, the idea I had was you can use SQL queries to locate ordinary files. Imagine if I had a view VW_FILES which contains metadata (name/directory/size/etc) of every file in my filesystem, and I could query it with SQL. I don't believe OS/400 / IBM i offers that kind of feature. It just allows SQL queries of the contents of database files.


Last time I touched OS/400 was in 1993, when my sole job was to manage weekly tape backups, so I don't remember many technical details.

I ended up reading OS/400 Redbooks years later.

However I remember that all file related activities were done via the catalog management tools, files weren't visible as such.


The Newton had a really nice object-database for a file system: https://en.wikipedia.org/wiki/Soup_(Apple)


IBM i isn't a mainframe operating system; that would be z/OS -- there's a world of difference between the two and the hardware they run on.

That being said, both share some similar design choices in data storage, among other things.


I tend to put all IBM and Unisys systems that survived UNIX under the same mainframe umbrella, even if it isn't technically correct.


As the file clerk for a small law firm, I've been thinking about exactly this. The fact that an OS exposes only hierarchical folders is such a loss.

Even aliases aren't well thought out, at least on Windows. No automatic updating if the name changes, no "find folders that contain an alias to this one", etc.

Don't get me started on versioning. Apple's Time Machine is a decent interface, but it's solving a different sort of problem.


There are a multitude of different tools you can use to handle versioning, from file system snapshots (if you want time based versioning), to VCS such as git (if you want transaction based versioning) and then there is stuff like Dropbox and S3 which will do versioning per file change.

Each will have their own pros and cons but it’s fair to say there are a wealth of options available.


Absolutely, but that requires me to implement and set up long-term maintenance for a software system, and then train non-technical people in its use. Not gonna happen.

The other thing is that the model has been the basis for software's approach, in a way that has too much inertia. For example, Word documents that are drafts are saved as individual files. Why? If the filesystem enabled it, programs could save deltas and have an interface to freeze a draft.


> Absolutely, but that requires me to implement and set up long-term maintenance for a software system, and then train non-technical people in its use. Not gonna happen.

You'd have to train people how to use versioning even if versioning was baked into the file system itself.

> The other thing is that the model has been the basis for software's approach, in a way that has too much inertia. For example, Word documents that are drafts are saved as individual files. Why? If the filesystem enabled it, programs could save deltas and have an interface to freeze a draft.

Those temporary files are there as a volatile backup of the running state of the program. It's not really the same thing as a "draft", so it shouldn't be treated the same. That is, unless you're literally talking about unsaved documents, in which case I'm not really sure how you expect the file system to predict what you want to do with a file when even Word -- your example running application -- doesn't yet know what the user intends.

So what you're asking for there doesn't really make a whole lot of sense. That is, unless you're trying to describe an auto-save feature -- in which case Word already has literally that feature bundled as part of its default setup. Of course, you do still need to save the file once to at least nominate where to auto-save to. But saving your drafts is good practice anyway (particularly on Windows).


I vaguely remember hearing (maybe ~2007) that Microsoft was starting to use an actual real database for their filesystem, and that they then abandoned the idea. Could this be related to the transaction API you mention, or a separate project? Either way, seeing none of the major OSes go in that direction in the past 10-20 years gives me confidence that it's probably just not practical.


Two different things. Transactional NTFS actually shipped as part of Vista, and still exists in Windows 10, but was rarely used, and is now deprecated.

Vista/Longhorn was also supposed to feature WinFS, which integrated NTFS with an embedded version of SQL Server, so you could do SQL queries to find files on your filesystem. It was pulled from Vista before release because it wasn't ready. Some of the enhancements to SQL Server developed as part of the project were released in SQL Server 2008 (e.g. the FILESTREAM data type), but the core idea was abandoned.

That wasn't the first time Microsoft had tried something like that. In the 1990s, their Cairo project had an object-based filesystem (OFS). That got abandoned as well, although just like WinFS, some of the technologies developed for it actually got released and used (e.g. COM Structured Storage, which was used as the basis of the file format in older versions of Office).


> Filesystem transactions

Journaling file systems all have transactions. Do you want to run multiple actions in the same transaction? That'd be neat.

> Schema enforcement

How would the database know that a blob of bytes is an image or an XML file? Will you need to implement heuristics for every possible file type, or do users establish the type of the file? If the latter, what's the point?

> Querying

unix `find` can do that. It's also pretty ugly


> Journaling file systems all have transactions. Do you want to run multiple actions in the same transaction? That'd be neat.

What I want is a transaction in the sense of "these actions should all succeed or all fail, and any observer should only see the initial or final state, never any intermediate states produced by the actions". The lack of such a mechanism leads to a lot of bugs and security-critical race conditions.

For example, let's say I want to rename /var/logs and create a new /var/logs that should be used from then on. Normally I would do something like 'mv /var/logs /var/logs.old && mkdir /var/logs' (or the equivalent in my language's API). But what if the first command succeeds and the second fails? What if somebody executes 'ln -s /etc /var/logs' after the execution of the first and before the execution of the second command?
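Without filesystem transactions, the best you can usually do is prepare the replacement first and shrink the window to the gap between two renames. Each rename is atomic on POSIX, but the pair is not (Linux's renameat2 with RENAME_EXCHANGE can swap two paths atomically, but there is no portable equivalent). A sketch:

```python
import os

def rotate_logs(logs):
    """Replace `logs` with a fresh empty directory, keeping the old
    one as `logs + ".old"`. NOT atomic as a whole: a crash or a
    malicious 'ln -s' can still land between the two renames."""
    os.mkdir(logs + ".new")          # prepare the replacement fully first
    os.rename(logs, logs + ".old")   # race window opens here...
    os.rename(logs + ".new", logs)   # ...and closes here
```

A transactional filesystem would let you wrap both renames in a single all-or-nothing unit, closing that window entirely.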

> unix `find` can do that. It's also pretty ugly

For Windows, Everything [1] is great. It builds and maintains an index of all filenames (and some file attributes) and searches as fast as you can type. Great for finding everything if you know any part of its filename. Such a capability is the norm for databases, but rare for file systems.

1: https://www.voidtools.com/


In the Unix and GNU/Linux world, "locate" with its updatedb database builder has existed since before Windows 95.

It's part of the standard install of various mainstream distros.

To find all cpp files, just:

  locate '*.cpp'
Find all .conf files under /etc:

  locate '/etc/*.conf'
https://en.wikipedia.org/wiki/Locate_(Unix) says:

"locate was first created in 1982.[1] The BSD and GNU Findutils versions derive from the original implementation."


You could make a ZFS clone of a given filesystem, do the operation on the clone, and then promote the clone to be the master to effectively get this. If I understand correctly, the promotion operation is atomic, so I don't think it would be possible to see an intermediate state.

This said you couldn't have multiple operations in flight at once for the same dataset.


How about hooking into SQLite in some way, so that an instance of it acts like a file system?


I had started work on a FUSE file system which used MySQL as a backend. I opted for MySQL because it offered a few features over SQLite which were desirable to me at the time, but it would have been trivially easy for me to switch DB engines to SQLite (particularly as I have a fair amount of experience embedding SQLite into applications already).

Sadly time commitments got in the way, so I'd only gotten as far as building the read interfaces (in effect it only got as far as being a read-only file system).
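For a sense of what the backing store for such a filesystem can look like, here is a minimal inode-style SQLite schema. The table and column names are invented for illustration, not taken from that project:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE inodes (
        id     INTEGER PRIMARY KEY,
        parent INTEGER REFERENCES inodes(id),  -- NULL for the root
        name   TEXT NOT NULL,
        is_dir INTEGER NOT NULL,
        data   BLOB                            -- file contents
    );
    CREATE UNIQUE INDEX by_name ON inodes(parent, name);
""")

# Each FUSE operation (lookup, readdir, read) becomes a query:
db.execute("INSERT INTO inodes VALUES (1, NULL, '/', 1, NULL)")
db.execute("INSERT INTO inodes VALUES (2, 1, 'hello.txt', 0, ?)",
           (b"hi\n",))
(data,) = db.execute(
    "SELECT data FROM inodes WHERE parent = 1 AND name = 'hello.txt'"
).fetchone()
```

The read-only interfaces mentioned above map naturally onto SELECTs; writes would add INSERT/UPDATE inside a transaction per FUSE call.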


> consumes a block device and exposes a database mapping a hierarchy of tree-nodes with stringly-keyed names to seekable byte-buffers

Sounds pretty much like a filesystem to me. Isn't this basically what Linux already does?


OP is describing a file system in the text you quoted. So I should hope that it sounds like what Linux does now.


Robert Strandh's LispOS proposal [1] has a single-level object store as an alternative to a conventional file system. See especially [2] and [3]

[1] https://github.com/robert-strandh/LispOS

[2] https://github.com/robert-strandh/LispOS/blob/master/Documen...

[3] https://github.com/robert-strandh/LispOS/blob/master/Documen...


From the README of this specific project, it's clear that the line between where the language ends and the OS starts is blurred.

If an OS is an exclusive environment for a single language, then that language's abstractions for handling data could be pushed much deeper than we're used to.





