I understand conceptually how you could architect a desktop/server OS based on a local database instead of a local filesystem, and it's deeply intriguing to me in terms of how everything could share a common data language that is far more flexible than files but far more structured than text. Presumably, things like terminal output would be formatted as query results instead of text. The database wouldn't reside in files on disk; it would reside directly in blocks on disk and in memory as required.
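To make that concrete, here's a toy sketch of what the `ps` analogue might look like if terminal output were a result set rather than a text stream. Everything here is invented for illustration (sqlite3 just stands in for the OS-managed store; no real OS exposes this schema or API):

```python
# Hypothetical sketch only: "os_db" stands in for an OS-managed store;
# the schema and query are invented, not any real operating system's API.
import sqlite3

os_db = sqlite3.connect(":memory:")
os_db.execute("CREATE TABLE processes (pid INTEGER, name TEXT, rss_kb INTEGER)")
os_db.executemany(
    "INSERT INTO processes VALUES (?, ?, ?)",
    [(1, "init", 1024), (42, "editor", 81920), (99, "shelld", 2048)],
)

# The `ps` analogue: "terminal output" is a result set, not a text stream.
rows = os_db.execute(
    "SELECT pid, name, rss_kb FROM processes WHERE rss_kb > ? ORDER BY rss_kb DESC",
    (1500,),
).fetchall()
for pid, name, rss_kb in rows:
    print(pid, name, rss_kb)
```

The point being that a front end could re-sort or re-filter those rows by re-querying, instead of parsing columns of text.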
But this seems to be proposing a distributed database that runs across computers, which confuses me. Anything that runs across computers seems to me to be application-level, not OS-level, by definition. What does it even mean to be running a distributed database on "microkernel services" instead of on fully-fledged OS's? And then... where is the OS-level CPU coming from? If the database is distributed among computers then fine, but which computer is "running" the OS?
Both the nomenclature and distributed aspect are really throwing me off here.
So I would strongly disagree with your notion that an OS cannot be a distributed system. Several OSes, such as Plan 9, were explicitly designed as distributed systems from the beginning. For Plan 9, running it on a single computer is more the edge case than the common case it was designed for: a small team with workstations sharing a central server.
All of the computers are "running" the OS. The OS is more than one service on more than one machine.
"Research and experimentation efforts began in earnest in the 1970s and continued through the 1990s, with focused interest peaking in the late 1980s. A number of distributed operating systems were introduced during this period; however, very few of these implementations achieved even modest commercial success."
So looks like they're trying to keep the idea going?
I think there is a very real need for a distributed operating system. Operating systems ultimately just virtualize hardware, and it's clear to anyone who has used VMware or Xen that we're building such a system piecemeal and constantly reinventing the operating system wheel. I think this is absolutely where OS design is inevitably destined to evolve.
It wasn't the right time yet. Virtualization has only just come to the data center and taken off. People no longer have their single expensive PC, but utilize a multitude of devices. Cloud storage is just making its way onto people's phones and devices. The full suite of VMware products is essentially already a commercially lucrative operating system for the data center / cloud. I think this is a long-term trend rather than a fad. It's not the hottest new framework, but rather the culmination of incremental changes over decades. We might not call it an operating system - we'll probably give it a fancier name to generate hype. But the idea is the same nonetheless.
Your comment is about 10 years out of date. Possibly more.
VMware usage is actually on the decline now, with many companies switching to cloud services (which are generally running Xen). And containerisation, while not a like-for-like replacement for virtualisation, has also eaten away at some of VMware's market share, particularly with services like k8s. Those that are dependent on running Windows might still use VMware, but Microsoft Hyper-V has eaten away at VMware's market share too.
* Conditions apply. Use the comment in the context of the HN bubble, and ignore that a huge part of the industry is still deploying WordPress on a VPS, using Excel spreadsheets as databases, and Gmail for bug tracking.
The GP was talking about data centres. I’ve worked in data centres and thus replied with the same context as the GP was discussing.
Also people deploying Wordpress on a VPS and using Excel are categorically NOT doing any form of distributed computing, which is what this discussion is about. So it’s correct to discount those contexts.
What I’m noticing here is a lot of people are confused about the terms “distributed” and “data centre”.
But even in cloud computing, the really compelling offerings (i.e. serverless) are all containers. Xen is only used as an additional security boundary (i.e. Docker running on top of Xen). Containers are the software abstraction providing the “distributed” aspect of the GP’s argument.
And as for Hyper-V, that’s only there for people locked into the Windows ecosystem. You rarely see much distributed computing happening in Windows. Frankly it’s silly to even mention Windows in the same conversation as Plan 9 and other distributed systems.
Okay, but let's say containers are how we manage applications at scale. And so now you want these containers to still be distributed among regions, among different computers, to be increasingly interconnected, online, to be efficient. Maybe you even want Kubernetes on bare metal for extra performance. Let's say that's the trend. Where is it going? It's moving towards grouping together heterogeneous hardware in different locations as a single abstraction and managing that. You're going to want to abstract away various forms of memory, processors, GPUs, networking, and file systems; you're going to want something like a task manager; you're going to want to manage permissions and licenses. That is, you want the next layer of abstraction beyond that of a single computer. Docker, virtualization, hyperconverged/disaggregated infrastructure - whatever the specific incarnation is, they represent different solutions to the same underlying, long-term trends.
You’re making a really vague meta point here. Yes, I do agree that scaling horizontally is presently more economical than scaling vertically. But that has also been true for years — for about as long as x86 servers have been around, so even longer than your original argument about virtualisation. Furthermore, you don’t really need any distributed layer to achieve that either. In the early days I used to manage fleets of x86 bare metal servers with little more than a few shell scripts I hastily cobbled together.
Saying “infrastructure needs to scale” is such a generalised truism that it doesn’t really contribute anything to the discussion. And the way Plan 9 manages scale is vastly different from how Docker, VMware and other solutions manage it.
I do get the general point you’re making. I honestly do. But as I said in my initial post, it’s not a new or emerging trend like you claimed. It’s already been the industry norm for the lifetime of most engineers’ careers.
So the real crux of our disagreement isn’t about technology nor whether infrastructure at scale is even needed, it’s the timeline you suggested. You’re out by a couple of decades — so far out that entire architectural designs to solve these problems have come and gone.
If you were looking for a technology that is inevitably on its way out, I'd reckon that something like OpenVZ might be a better fit for such criteria, albeit for different reasons (being pinned to an older kernel version for the most part).
> VMware usage is actually on the decline now, with many companies switching to cloud services (which are generally running Xen). And containerisation, while not a like-for-like replacement for virtualisation, has also eaten away at some of VMware's market share, particularly with services like k8s. Those that are dependent on running Windows might still use VMware, but Microsoft Hyper-V has eaten away at VMware's market share too.
Containers are pretty great, however I wouldn't say that they're always exclusive to virtualization - I've seen plenty of useful setups splitting physical hardware into virtual machines per project/team for access control and hard resource limit reasons, with the team then using container orchestrators inside those for easier deployment/management of apps and further per-service resource limits (e.g. how one would otherwise use something like systemd slices).
Sure, there are benefits to running containers directly on hardware, but there are also challenges. The VM-based setup gives you more flexibility (including running things other than containers in parallel, for legacy systems and/or teams where containers aren't a good fit) and the benefit of a technology that has been around for a long time. Without it, you need to worry about picking the correct rootless runtime and figuring out how to enforce resource limits across different teams with varying engineering standards (Kubernetes namespaces are helpful here, but Kubernetes is not the only orchestrator you might use, and the requirements might also vary on a per-project basis).
I don't really have a horse in the race; I merely enjoy the benefits of the various virtualization solutions, container orchestrators and runtimes, as well as any number of storage abstractions (GlusterFS and Ceph come to mind) and networking solutions (WireGuard seems like a pretty cool recent one). More so, many of those can easily work in tandem. Building on the progress of the past few decades seems like a pretty decent idea, and I don't really see VMware or other solutions as on their way out anytime soon.
None of that might be very relevant for the serverless cloud, though, or for people who just run managed Kubernetes in the cloud and don't care about the actual infrastructure, but that is nowhere near everyone.
I’m just posting my observations from having managed on-prem environments and data centres over the last couple of decades.
> Containers are pretty great, however I wouldn't say that they're always exclusive to virtualization
I literally said it’s not a like-for-like replacement. Plenty of SaaS offerings run virtualisation for security with containers on top. But in those instances the VMs aren’t used for distributed computing; they’re used as a security layer. It’s the containers that perform the distributed aspects.
The funny thing about steam engines is that we think of them as a thing of the past. But in reality steam engines just scaled up and turned into steam turbines, which are a backbone of society.
They succeeded just fine. Most people don't have a use for a distributed operating system, so they were not commercially viable. Since many of them were never meant to be commercial products, it's not really accurate to claim they were unsuccessful. The principles involved work fine to this day.
Today’s filesystems are more like the NoSQL equivalent of hierarchical databases, which were the very first database design, created in the 1960s, preceding RDBMSs.
There are plenty of databases without associated query languages. RocksDB, FoundationDB, and LMDB to name three. "Queries" are generally "list keys in this range" and all of the higher-level metadata you would want to query on (e.g. relations) are encoded in the key. I would absolutely put filesystems in this camp.
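As a toy illustration of that model (an in-memory stand-in for an ordered keyspace, emphatically not the actual API of RocksDB, FoundationDB, or LMDB): the only query is a range scan, and the directory structure is encoded in the key, so "readdir" falls out as a prefix scan.

```python
# Toy stand-in for an ordered key-value store with range scans
# (RocksDB/LMDB-style); not any real library's API.
import bisect

store = []  # sorted list of (key, value) pairs, like an LSM/B-tree keyspace

def put(key: bytes, value: bytes) -> None:
    i = bisect.bisect_left(store, (key, b""))
    if i < len(store) and store[i][0] == key:
        store[i] = (key, value)
    else:
        store.insert(i, (key, value))

def scan(lo: bytes, hi: bytes):
    """The only 'query': list keys in the range [lo, hi)."""
    i = bisect.bisect_left(store, (lo, b""))
    while i < len(store) and store[i][0] < hi:
        yield store[i]
        i += 1

# Encode the higher-level metadata (here, a directory hierarchy) in the key;
# "readdir" is then just a range scan over a key prefix.
put(b"/etc/hosts", b"...")
put(b"/etc/passwd", b"...")
put(b"/home/alice/notes.txt", b"...")

for key, _ in scan(b"/etc/", b"/etc0"):  # b"0" is the byte after b"/"
    print(key)  # b"/etc/hosts", b"/etc/passwd"
```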
Considering things like wildcards and glob matching, these don't seem far off from what you'd use in a graph DB query language (e.g. something like Neo4j). See the sketch below.
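For instance, a toy glob in Python next to an approximate (untested) Cypher analogue over a hypothetical Dir/File graph schema:

```python
# Toy illustration: a glob over path-structured keys is not far from a
# graph-pattern traversal. The Cypher below is an approximate Neo4j
# analogue over an invented schema, not tested against a real database.
from fnmatch import fnmatch

paths = ["/etc/hosts", "/var/log/syslog", "/var/log/app/errors.log"]

# "Any .log file anywhere under /var": note fnmatch's "*" crosses "/",
# so it behaves like a recursive "**" match.
matches = [p for p in paths if fnmatch(p, "/var/*.log")]
print(matches)  # ['/var/log/app/errors.log']

# Rough Cypher equivalent (hypothetical Dir-[:CONTAINS]->Dir/File schema):
#   MATCH (:Dir {name: "var"})-[:CONTAINS*]->(f:File)
#   WHERE f.name ENDS WITH ".log"
#   RETURN f
```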
Databases have schemas that can change (and that can be understood without having to scan the content), plus indexes beyond trivial ones like "name → content" and "order written to disk".
> Anything that runs across computers seems to me to be application-level, not OS-level, by definition
Novell Netware was also marketed as a Network Operating System, in the sense that none of its services were confined to the local machine: its entire purpose was to combine a network of computers into a single management unit.
Disagree: IBM i uses files. How do you create a database table in it? With the CRTPF command ("Create Physical File"). You create a file.
And I don't know what you mean by "catalogs", in the context of IBM i. Are you talking about DB2 catalog views? (Which exist in DB2 on every other platform, and most other RDBMS have something equivalent, such as the ANSI standard INFORMATION_SCHEMA)
Or are you confusing IBM i with MVS (in which the OS contains databases called "catalogs", in which you look up a file name or file name prefix to find out which disk volume a file is stored on)?
Makes sense. Still, what's the difference between a "library" and a "directory", other than the name? I think the difference is more terminological than conceptual.
"OS-level CPU" is just normal CPU that's been configured and handed out by some program (conventionally, the OS) for other things to use. If that initial process management program (call it an OS) also supports communicating with similar setups and receiving/dispatching jobs and queries then that's precisely the distributed OS you're looking for.
You could _also_ architect a comparable system as user-space networking on top of a conventional OS, but that's by no means required.
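Here's a deliberately tiny sketch of that idea, with every name in it (`Node`, `submit`, `free_cpus`) invented for illustration: the same process-management program hands out local CPU and forwards jobs to peers, and it's the ensemble of machines, not any one box, that is "running" the OS.

```python
# Deliberately tiny sketch: one process-management program hands out local
# CPU and can also forward jobs to peers. All names here are invented.
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str
    free_cpus: int
    peers: list = field(default_factory=list)

    def submit(self, job: str, cpus_needed: int) -> str:
        # Hand out local CPU if we have it...
        if self.free_cpus >= cpus_needed:
            self.free_cpus -= cpus_needed
            return f"{self.name}: ran {job}"
        # ...otherwise dispatch to a peer. The scheduler spanning both
        # machines is the "distributed OS"; no single box is "the" OS.
        for peer in self.peers:
            if peer.free_cpus >= cpus_needed:
                return peer.submit(job, cpus_needed)
        return f"{self.name}: queued {job}"

a, b = Node("a", free_cpus=1), Node("b", free_cpus=4)
a.peers.append(b)
print(a.submit("render", cpus_needed=2))  # b: ran render
```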