To people which don't understand the overall decision to create another system, ...

jpfr · on April 11, 2018

A huge portion of Linux is drivers and support for different processor architectures. Yes, development was chaotic in the nineties and the code showed. But a lot of engineering effort went into making the core really nice.

https://unix.stackexchange.com/a/223763

With regards to POSIX, it is amazing how well this API is holding up. There are quite a few implementions from GNU, BSDs, Microsoft (at least partial support in MSVC) and a few others (e.g. musl). So POSIX support is a given on most systems. Why replace it with something that breaks existing code?

https://www.musl-libc.org/faq.html

Not to say there is no bloat. But some bloat is the patina that all succesful systems take on over time. Is the bloat small enough to be managed and/or contained? I say yes.

derefr · on April 11, 2018

> So POSIX support is a given on most systems. Why replace it with something that breaks existing code?

You're not necessarily breaking existing code. Both macOS and Windows are built on non-POSIX primitives that have POSIX compatibility layers.

It seems that the conclusion most of industry has reached is that, whether or not POSIX is a useful API for your average piece of software, there are still better base-layer semantics to architect your kernel, IPC mechanisms, etc. in terms of than the POSIX ones. You can always support a POSIX "flavor" or "branded zone" or "compatibility subsystem" or whatever you want to call it, to run other people's code, after you've written all your code against the nicer set of primitives.

An potentially-enlightening analogy: POSIX is like OpenGL. Why do people want Vulkan if OpenGL exists? Well, because Vulkan is a more flexible base-layer with better semantics for high-efficiency use-cases. And if you start with Vulkan, the OpenGL APIs can still be implemented (efficiently!) in terms of them; whereas if you start with an OpenGL-based graphics driver, you can't "get to" (efficient) Vulkan support from there.

All that aside, though, I would expect that the real argument is: Fuchsia is for ChromeOS. Google are happy to be the sole maintainers of ChromeOS's kernel and all of its system services, so why not rewrite them all to take advantage of better system-primitive semantics? And Google doesn't have to worry about what apps can run on a Fuchsia-based ChromeOS, because the two ways apps currently run on ChromeOS are "as web-apps in Chrome", or "as Linux ELF executables inside a Linux ABI (or now also Android ABI) sandbox." There is no "ChromeOS software" that needs to be ported to Fuchsia, other than Chrome itself, and the container daemon.

emn13 · on April 11, 2018

Total speculation: but I seriously doubt that Fuchsia is specifically for chromeOS. The whole point of decent, efficient, simple, non-bug-prone APIs is that you probably want to implement pretty much everything on it. Simplicity and low-overhead allow for generality and flexibility.

If all you wanted to do was support chromeOS - well, typically you can add hacks even to a messy codebase to support specific usecases. And there are a bunch of linux and ?BSD distros that demonstrate that you can adapt such a system to even very small devices; small enough that there's not much niche left below. Moore's Law/Denard scaling may be comatose on the high-end; but lot's of long-tail stuff is generations behind; which implies that even really low-power IoT stuff that linux is currently ill-suited for will likely be able to run linux without too many tradeoffs. I mean; the original raspberry pi was a 65nm chip@700MHz - that's clearly overkill; and even if chip development never has a breakthrough again, there's clearly a lot of room for those kind of devices to catch up, and a lot of "spare silicon" even in really tiny stuff once you get to small process nodes.

But "being able to run linux" doesn't mean it'll be ideal or easy. And efficiency may not be the only issue; security; cost; reliable low latency... there are a whole bunch of things where improvements may be possible.

I'm guessing Fuchsia is going to be worse than linux for ChromeOS - in the sense that if ChromeOS really was what google wants it for, they could have gotten better results with linux than they'll be able to get with Fuchsia in the next few years and at a fraction of the cost. Linux just isn't that bad; and a whole new OS including all the interop and user-space and re-education pain is a huge price to pay. But the thing is: if they take that route they may end up with a well tuned linux, but that's it.

So my bet is that you'd only ever invest in something like Fuchsia if you're in it for the long run. They're not doing this "for" ChromeOS, even if that may be the first high-profile usage. They're doing this to be enable future savings and quality increases for use cases they probably don't even know they have, yet. In essence: it's a gamble that might pay off in the long run, with some applicability in the medium term - but the medium term alone just doesn't warrant the investment (and risk).

derefr · on April 11, 2018

I guess I left a bit too much implicit about my prediction on what Google's going to do: I have a strong suspicion that Google sees the Linux/POSIX basis of Android as an albatross around its neck. And ChromeOS—with its near-perfect app isolation from the underlying OS—seems to be a way of getting free of that.

ChromeOS has already gained the ability to run containerized Android apps; and is expecting to begin allowing developers to publish such containerized Android apps to the Chrome Web Store as ChromeOS apps. This means that Android apps will continue to run on ChromeOS, without depending on any of the architectural details of ChromeOS. Android-apps-on-Android prevent Android from getting away from legacy decisions (like being Linux-based); Android-apps-on-ChromeOS have no such effect.

I suspect that in the near term, you'll see Google introducing a Chrome Web Store for Android, allowing these containerized, CWS-packaged Android apps to be run on Android itself; and then, soon after that, deprecating the Play Store altogether in favor of the Chrome Web Store. At that point, all Android apps will actually "be" ChromeOS apps. Just, ones that contain Android object files.

At that point, Google can take a Fuchsia-based ChromeOS and put it on the more powerful mobile devices as "the new Android", where the Android apps will run through Linux ABI translation. But in this new Android (i.e. rebranded ChromeOS), you'll now also have the rest of the Chrome Web Store of apps available.

Google will, along with the "new Android", introduce a new "Android Native SDK" that uses the semantics of Fuchsia. Google will also build a Fuchsia ABI layer for Linux—to serve as a simulator for development, yes, but more importantly to allow people to install these new Fuchsia-SDK-based apps to run on their older Android devices. They'll run... if slowly.

Then, Google will wait a phone generation or two. Let the old Android devices rot away. Let people get mad as the apps written for the new SDK make their phones seem slow.

And then, after people are fed up, they'll just deprecate the old Android ABI on the Chrome Web Store, and require that all new (native) apps published to the CWS have to use the Fuchsia-based SDK.

And, two years after that, it'll begin to make sense again to run "the new Android" on low-end mobile devices, since now all the native apps in the CWS will be optimized for Fuchsia, which will—presumably—have better performance than native Android apps had on Android.

notriddle · on April 12, 2018

From a branding perspective, that would be terrible. They've already invested a bunch in Google Play brand that isn't Android Apps (Play Music, Play Books, etc).

Seems more likely they'll allow HTML apps into the Play Store, eventually getting rid of the Web Store entirely. They've already done the WebAPK stuff to glue HTML apps into Android.

pjmlp · on April 12, 2018

Google IO schedule has just been published.

Ironically they have two sessions named "The future of the Android app model and distribution on Google Play".

derefr · on April 12, 2018

If, as I suspect, they'd be willing to rename ChromeOS to be "just what Android is now" (like how Mac OS9 was succeeded by NeXTStep branded as Mac OSX), then I don't see why they wouldn't also be willing to rebrand the Chrome Web Store as "what the Google Play Store is now." Of course, they'd keep the music, books, etc.; those are just associated by name, not by backend or by team.

But they wouldn't keep the current content of the Play (Software) Store. The fact that every Android store—even including Google's own—are festering pits of malware and phishing attempts, is a sore spot for Google. And, given their "automated analysis first; hiring human analysts never (or only when legally mandated)" service scaling philosophy, they can't exactly fix it with manual curation. But they would dearly love to fix it.

Resetting the Android software catalogue entirely, with a new generation of "apps" consisting of only web-apps and much-more-heavily-containerized native apps (that can no longer do nearly the number of things to the OS that old native apps can do!) allows Google to move toward a more iOS-App-Store-like level of "preventing users from hurting themselves" without much effort on their part, and without the backlash they'd receive if they did so as an end unto itself. (Contrast: the backlash when Microsoft tried that in Windows 8 with an app store containing only Metro apps.)

I expect that the user experience would be that, on Fuchsia-based devices, you'd have to either click into a "More..." link in the CWS-branded-as-Play-Store, or even turn on some setting, to get access to the "legacy" Play Store, once they deprecate it. It'd still be there—goodness knows people would still need certain abandonware things from it, and be mad if it was just gone entirely; and it'd always need to stick around to serve the devices stuck on "old Android"—but it'd be rather out-of-the-way, with the New apps (of which old Chrome Apps from the CWS would likely be considered just as "new" as newly-published Fuchsia apps upon the store's launch) made front and centre.

> Seems more likely they'll allow HTML apps into the Play Store, eventually getting rid of the Web Store entirely.

I would agree if this was Apple we were talking about (who is of a "native apps uber alles" bent) but this is Google. Google want everyone to be making web-apps rather than native apps, because Google can (with enough cleverness repurposed from Chrome's renderer) spider and analyze web-apps, in a way it can't spider and analyze native apps. Android native apps are to Google as those "home-screen HTML5 bookmark apps" are to Apple: something they wish they could take back, because it really doesn't fit their modern business model.

muro · on April 12, 2018

> The fact that every Android store—even including Google's own—are festering pits of malware and phishing attempts, is a sore spot for Google.

Lol, citation needed.

jhasse · on April 12, 2018

And then they will stop releasing Fuchsia's (and Android's) source code and become the new Microsoft of the 90s.

__s · on April 11, 2018

Take for example their "replace Intel ME with Linux" project

jacksmith21006 · on April 11, 2018

Agree 100%. I very much doubt it is to replace ChromeOS.

bitmapbrother · on April 11, 2018

They may not replace ChromeOS, but I'm going to guess they'll replace the Linux kernel with Zircon.

jacksmith21006 · on April 13, 2018

Completely agree. I very much doubt Fuchsia is to replace ChromeOS.

mediocrejoker · on April 11, 2018

> You're not necessarily breaking existing code. Both macOS and Windows are built on non-POSIX primitives that have POSIX compatibility layers.

I was under the impression that MacOS was built from a POSIX kernel, all the way down.

derefr · on April 11, 2018

Nope; the XNU (macOS) kernel has Mach semantics (https://developer.apple.com/library/content/documentation/Da...), not POSIX semantics.

XNU does embed BSD (and so POSIX) semantics into the kernel—some of which are their own efficient primitives, since there's no way to efficiently implement them in terms of Mach. But whatever BSD syscalls can be implemented kernel-side in terms of Mach primitives, are.

monocasa · on April 12, 2018

It has both. Mach system calls are negative, BSD system calls are positive. The BSD side has system calls for stuff like fork() that would otherwise be pretty clearly in Mach's domain.

jacksmith21006 · on April 11, 2018

Hard to imagine going to be used for ChromeOS before Android. Android runs native on Chromebooks because it shares a common kernel with Android so it can run in a container.

That would be lost which is a huge deal. The new gnu/Linux on ChromeOS would be fine as it runs on a VM and would still work.

Now they could move Android to using a VM but that it less efficient and most importantly takes more RAM and CBs do not normally have a ton of RAM.

pjmlp · on April 12, 2018

Linux kernel is not directly exposed to userspace Android apps, so it is actually irrelevant.

In fact, starting with Android 7 they have started to lock down NDK apps that try to use APIs not listed as stable.

jacksmith21006 · on April 13, 2018

It is relevant as unless they replace both ChromeOS and Android kernels you lose the ability to run Android native on a ChromeOS box.

bitmapbrother · on April 11, 2018

With the introduction of Project Treble wouldn't a kernel swap be relatively easy providing the Treble interface contracts are met?

jacksmith21006 · on April 13, 2018

It would have to be as efficient as Linux as the big plus with ChromeOS is the ability to have a peppy computer on minimal hardware.

I am not convinced a micro kernel would be able to achieve.

naasking · on April 11, 2018

> So POSIX support is a given on most systems. Why replace it with something that breaks existing code?

Because POSIX has horrible security properties, and does not provide enough guarantees to create truly robust software. See for instance, the recent article on how you simply cannot implement atomic file operations in POSIX -- sqlite has to jump through 1,000 hoops to get something pretty robust, but it shouldn't be this way.

rurban · on April 11, 2018

The biggest problem is the blocking IO. With async throughout you can design a proper concurrent system.

zaarn · on April 12, 2018

The Linux kernel has Async IO. For a while now.

You don't need async for a proper concurrent system. Systems have been concurrent before async IO. The trick is that when a process does an IO you yield it and put it on a wait list and run literally anything else until the kernel receives the OK from the hardware control and resumes the process.

Using Async IO in Linux merely means your specific thread won't be immediately suspended until you get data (or it's in a reasonably close cache)

It would be quite silly if Linux would wait for every IO synchronously, any single core system would immediately grind to a halt.

cturner · on April 12, 2018

  > Linux kernel has Async IO

Linux offers some async io features, but does not offer async throughout.

In a fully async platform you would be able to do general-purpose programming without ever needing to use multithreading.

Example of a situation you can't do in linux: have a program doing select(2) or equivalent on both keyboard input and network input, in a single thread.

Since linux does not support this, you are steered to adapt solutions that are more complicated than a pure async model would be,

* Spin constantly looking for activity. These heats up your computer and uses battery.

* Have short timeouts on epoll, and then glance for keyboard input. This leads to jerky IO.

* Have child processes block on these operations and use unix domain sockets to feed back to a multiplexor (fiddly, kernel contention).

* The child-process thing but with shmem (fiddly)

* Something equivalent to the child process thing, but with multiple threads in a single process. (fiddly)

You would think that x-windows might help out here. What if you had a socket to X, and then multiplexed on that, instead of looking for keyboard input from a terminal? This opens new issues: what if X has only written half of an event to your socket when select notifies you? Will your X library handle this without crashing?

Rurban's comment above is correct. Linux is not async throughout.

On OSs that offer kevent you can get a fair bit further, but (I believe) you still can't do file creation/deletion asynchronously.

cturner · on April 12, 2018

This is broken. (Woke from a sleep, face-palmed. I have been in Windows IPC land and got my wires crossed, sorry.) In linux you can select on both the stdin fd and network sockets in a single call. There is a way to get a fd for AIO also. AFAIK the sync-only file creation/deletion stands.

zaarn · on April 13, 2018

Linux is async throughout since it can do task switching on blocking.

You're not the only task running.

Or do you expect Linux to pause the system when you read a file until the DMA is answered?

Linux itself is fully async, anything blocking (even interrupts to some extend) will be scheduled to resume when the reason it blocked is gone or alternatively check back regularly to resume.

A program running on Linux can do a lot of things async as mentioned via POSIX AIO. It's not impossible, go seems to do fine on that front too (goroutines are put to sleep when doing blocking syscalls unless you do RawSyscall).

The conclusion that lack of 100% asynchronity means it can't be properly concurrent is also wrong. As evident that sending something to disk also doesn't not halt the kernel.

pjmlp · on April 12, 2018

Which isn't part of POSIX.

zaarn · on April 12, 2018

AIO is definitely part of POSIX.

http://pubs.opengroup.org/onlinepubs/009695399/basedefs/aio....

pjmlp · on April 12, 2018

I thought you were speaking about some Linux specific APIs.

zaarn · on April 12, 2018

There are some Linux specific APIs that build on this and I've got the most experience with Linux so I was refering to that.

But Async IO is part of POSIX.

I was mostly responding to the comment which cited synchronous IO as a problem with POSIX. Linux is most widely used so on top of being my primary domain on the issue, it's going to ensure best understanding.

elderK · on April 12, 2018

That's not to say it's implemented as efficiently as it could be.

See here: https://linux.die.net/man/7/aio

AFAIK, system-specific APIs such as epoll or kqueue are the way to go if you're serious about performing asynchronous IO?

dingo_bat · on April 12, 2018

Is there an example of an os which doesn't have blocking io? To me it seems that blocking io will be needed at some level. You can of course put wrappers around it and present it as async. But in many applications you want to go as low as possible, and in my imagination blocking calls will be the lowest.

zzzcpan · on April 12, 2018

It's the other way around. Low level is all async everywhere, blocking is sort of just telling the kernel to wait for async op to complete before running the process.

rurban · on April 12, 2018

Midora, Singularity in an OS. Pony in a language.

The goal is to avoid blocking, not to offer async also. That's a lost cause.

Fuchsia once had the goal to offer async only, but then some manager decided that he needs some blocking, and then it was gone.

non-blocking only is lower level than blocking. You can always wait indefinitely for that callback, but usually you have a default timeout. L4 for example offers that API.

mempko · on April 11, 2018

And yet here I am, working day in and day out on a POSIX system, uptime in the years range... strange.

naasking · on April 12, 2018

Anything can be made to work with enough effort! Imagine how much easier it would be if POSIX actually had a better API.

StillBored · on April 11, 2018

> So POSIX support is a given on most systems. Why replace it with something that breaks existing code?

POSIX is a lowest common denominator API, its under specified and loosely interpreted. Which follows from its initial goals, which was basically to specify the bits common between various UNIX implementations. Those implementations obviously didn't want to change their implementation to match a rigid standard, so a lot of wiggle room exists in the standard.

The end result is that is pretty much useless for anything beyond "hello world" kinds of applications, both in terms of portability, as well as actual behavior (I could list a lot of cases, but lets leave that to google, with the starting idea to look at a couple posix API's, say close()'s errno's and the differing cases and what causes them on different OS's/filesystems). That is why there isn't a single OS out there that is _ONLY_ POSIX compliant. You need look no further than the 15 year old https://personal.opengroup.org/~ajosey/tr28-07-2003.txt and consider that the gap has widened as more performance or security oriented core API's have been introduced and linux's POSIX layer is further refined upon them.

Plus, the core API's in no way reflect the hard reality of modern hardware, leaving the standard even more under specified in the case of threads and async IO, which have been poorly bolted on.

Then there are all the bits everyone ignores, like the bits about the posix shell (ksh), and how certain utilities behave, while completely ignoring important things like determining metadata about the hardware one is running on.

I leave you with: https://stackoverflow.com/questions/2693948/how-do-i-retriev...

and https://stackoverflow.com/questions/150355/programmatically-...

Which is pretty basic information about a modern machine.

pjmlp · on April 12, 2018

> POSIX is a lowest common denominator API,...

Only useful for CLI tty old style apps and daemons.

kuschku · on April 11, 2018

> So POSIX support is a given on most systems. Why replace it with something that breaks existing code?

POSIX says that usernames and uids can overlap, and if a program takes either, it should accept the other a well. And if something could be a username or uid, it should be presumed to be a username, and resolved to a UID.

Now assume you have a user "1000" with uid 2000, and a user "2000" with uid 1000... you can see where this goes.

And this is why the BSDs have broken POSIX compatibility and require uids to be prefixed with # when they’re used on the CLI.

JdeBP · on April 14, 2018

No, it does not say that. It says, specifically for certain standard utility programs such as chown and newgrp, that particular program arguments should always be considered to be user names if such user names exist even if they happen to be in the forms of numbers.

Nor is what you claim about the BSDs true. Aside from the fact that using the shell's comment character would be markedly inconvenient, especially since the colon is a far better choice for such a marker (c.f. Gerrit Pape's chpst), the OpenBSD and FreeBSD implementations of chown and newgrp do not have such a marker either documented or implemented, and in fact operate as the SUS says.

User names and user IDs are, by contrast, explicitly called out by the standard as strings and integers, two quite different things which have no overlap.

* http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_...

dmitryminkovsky · on April 11, 2018

> But some bloat is the patina that all succesful systems take on over time.

What a nice analogy.

chris_wot · on April 11, 2018

One man’s “bloat” is another man’s feature.

temac · on April 12, 2018

> But a lot of engineering effort went into making the core really nice.

I quite like Linux but I would not say the current state of the core kernel is "really" nice. Somewhat nice, maybe, but "really" nice? You just have to look around to see exemples of non-nice things (too much things in some headers, jumping between source files back and forth for no reason, excessive use of hand coded vtables, internal "framework" that are not as simple as they should, at other time lack of abstractions and excessive access to internals of structures, etc). Granted, it is way less buggy than some other softwares, but I think I'll know when I read a really nice core, and for now I've never had the feeling that Linux has one.

Still find it quite good, to be clear.

gnulinux · on April 12, 2018

FreeBSD?

jontro · on April 11, 2018

Looks like some parts of POSIX is implemented. See this page for their rationale: https://fuchsia.googlesource.com/docs/+/master/the-book/libc...

dralley · on April 12, 2018

>Why replace it with something that breaks existing code?

https://github.com/mpv-player/mpv/commit/1e70e82baa9193f6f02...

hawski · on April 11, 2018

I recognize that Linux and especially GNU are bloated. However the Linux LOC overtime chart is useless at demonstrating it. AFAIK size increase of Linux is mainly caused by drivers. Device support is overall the most important thing OS offers. And it's hard to be small and elegant while supporting everything under the sun, which Linux can't even really achieve. One could say something about epoll instead of kqueue, ALSA instead of OSS or other similar cases. But the size itself doesn't say much.

I agree that there is much to be explored yet in OS design. However it needs deep pockets to actually support wide array of hardware. Or a long time.

I cheer for Toybox and Oil shell regarding the GNU userland.

When I see all the openat(2) and friends family of functions or chmod/fchmod, chdir/fchdir, barbaz/fbarbaz I'm thinking that it all needs a bit of a clean up. Some batching system call would be appreciated as after Meltdown syscalls are more expensive.

I personally like and keep my fingers crossed for DragonflyBSD. They are doing things that are innovative while keeping conservative core for the lack of better words. But at the same time DragonflyBSD has minuscule hardware support comparing to Linux.

salatschussel · on April 12, 2018

I don't think changing the usage of system calls in the kernel as a reaction to Meltdown is a good workaround to make. This could lead to CPU makers relying on the workaround rather than fixing the issue in their CPU architecture. The other side of the medal is that this increases the motivation to switch to a newer CPU, which generates more waste, and the fact that CPUs running slower need to run longer, and use more energy, both of which is bad for the environment, and costs money.

joelthelion · on April 11, 2018

Don't worry, by the time it is complete and mature, it will be complex and full of quirks too.

I'm not convinced that even Google has the resources to build something like that successfully. Except perhaps if they target a very specific use case.

johnklos · on April 11, 2018

"not convinced that even Google"

Google CAN'T do something small, simple and elegant, partly for the very reason that they're too big.

eli_gottlieb · on April 11, 2018

Google also can't be convinced to carry a project through to completion and release /salty.

qarioz · on April 11, 2018

Google is very good in marketing. For example go.

weberc2 · on April 12, 2018

Why do people say this? What marketing has Google ever done for Go?

mkl · on April 12, 2018

Sponsoring Go introduction workshops for university undergrads, complete with Google-swag prizes and actual Google employees flown in from another country.

So, quite a lot if that experience is anything to go by.

yebyen · on April 12, 2018

I will say that my professor Axel Schreiner at RIT offered one of the two first Golang classes at a collegiate level back in... 2009? and he reached out to Google and said, "send us some Android phones so that we can develop Go on ARM"

They obliged with a full crate of first generation Motorola phones, each one preloaded with a 30 day free Verizon plan. Every person who took that class got one, and surely all of them made it back into the school's hands at the end of the quarter.

(I'm not sure how many people actually ever compiled and executed any go binaries on arm that year; we all learned Go, and it was a great class! But as far as the class, the phones were completely unnecessary. I think that they did make a more relevant class where the phones were able to be used again the year after that.)

pjmlp · on April 12, 2018

Its branding and companies using the language with hopes of being acquired by them is already good enough.

Had Go been released at AT&T and it would have shared the same fate as Limbo.

weberc2 · on April 12, 2018

> companies using the language with hopes of being acquired by them

This is beyond belief. What companies are using Go with the hopes of being acquired by Google? Does anyone honestly believe that Google's acquisitions teams know or care about programming languages? Any business that acquires companies on that basis is doomed to failure, as is any company that hopes to be acquired on that basis.

geodel · on April 12, 2018

Well if one doesn't like something, they have to come up with part hilarious and part wild assertions like this.

ddalex · on April 12, 2018

as what ?

geodel · on April 12, 2018

The idea here is if it worked its marketing but if it doesn't its market that has spoken.

skykooler · on April 12, 2018

Making it the first Google result for "go"? (Instead of the verb, or the game)

zaarn · on April 12, 2018

When I search "go" on google the first result I get is a package courier, then the game, then the verb... then comes Golang though.

voidfunc · on April 11, 2018

Go has been a success as well because of that marketing machine.

weberc2 · on April 11, 2018

Counterpoint: the Go programming language.

panic · on April 12, 2018

Did Go start as an official Google project? It always seemed like more of a side project that happened to be developed by Google engineers.

weberc2 · on April 12, 2018

It has always been associated with Google for as long as it has been known outside of Google. :)

OtterCoder · on April 11, 2018

Which failed in its stated mission and fell back to being yet another just-ok web scripting language.

kybernetikos · on April 11, 2018

Java also has been a phenomenal success, just not as a language for applets.

jrs95 · on April 11, 2018

Um...what? Are you thinking of Dart?

silotis · on April 11, 2018

Go initially billed itself as a language for systems programming, but that claim was quickly retracted when it turned out that Go's creators had a different notion of systems programming than everyone else.

geodel · on April 12, 2018

Not everyone. Just self-proclaimed authority who decided systems means operating system or some embedded code. Lots of companies I worked have title Systems Engineer or departments Systems engineering which has nothing to do with Operating systems but just some internal applications.

dingo_bat · on April 12, 2018

> Just self-proclaimed authority who decided systems means operating system or some embedded code

Systems does mean that. There are 2 broad categories of software you can write. One is software that provides a service to the user directly. That is an application. The other kind is software that provides a service to applications. That's systems software. Do you think there's something wrong with this notion? It's pretty well accepted over the decades:

http://en.wikipedia.org/wiki/System_programming

geodel · on April 12, 2018

Well by that definition Docker, Kubernetes, etcd and so on are systems software. But people here somehow explicitly make it to mean Computer Operating Systems.

slrz · on April 12, 2018

Sure, and Go's most prolific users are in infrastructure software for distributed systems. You seem to agree with the quoted statement.

weberc2 · on April 12, 2018

By this definition, Python and JavaScript are systems languages.

jrs95 · on April 12, 2018

This was all over with way before the 1.0 release even happened. There's no point in arguing over something that was addressed several years ago before Go even reached stability. Plus, trying to say that Go was a "failure" because of this is absurd. It was an issue of terminology, not technology. Given Kubernetes, Docker, etc you would have to be totally delusional to claim that Go has been a failure.

kk__ · on April 11, 2018

30 years is a good time frame to judge a programming language.

ehsankia · on April 11, 2018

But Google as a whole isn't working on this. A lot of the projects at Google require collaboration across multiple teams, but I could see Fuschia being done by a fairly small and cohesive team.

bitmapbrother · on April 12, 2018

The last time I counted there were well over a hundred people just committing code alone.

https://fuchsia-review.googlesource.com/q/status:open

kev009 · on April 11, 2018

What do you make of, for instance, OS X or VxWorks?

Alupis · on April 11, 2018

OSX is based on Darwin, which is a Unix (BSD-like) operating system.

I'm certain you can find complexities and bloat in there too, if you look.

catwell · on April 11, 2018

I concur, macOS is a nightmare. I didn't think so before I had to write code for it, but I am currently writing code that runs on 5 OSs (Linux, Windows, macOS, Android and iOS) and macOS is by far the worst of all. In particular, everything related to filesystems was a nightmare, especially HFS+. Thankfully Apple is replacing it with the much more sane APFS.

I don't have much experience with them but I think VxWorks and QNX are good examples of simpler OSs. I do have some experience with Minix3 and it is certainly one. I guess the BSDs stand somewhere in the middle.

dfox · on April 11, 2018

QNX always had practically oriented limitations of the general microkernel idea (eg. QNX native IPC/RPC is always synchronous) which allowed it to be essentially the only reasonably performant true micro kernel OS in the 90's. Unfortunately it seems that after QNX got repeatably bought by various entities the OS got various weird compatibility-with-who-knows-what hacks.

shp0ngle · on April 11, 2018

Oh hell yeah

OS X has tons of weird quirks and compatibility mindfucks going back to their transition from OS 9 (all those .DS_Store and _filename files for "resource forks")

Backwards compatibility is #1 reason for increasing complexity. Apple is sometimes good in cutting away compatibility for the sake of cleaning up, but there are still weird issues poking now and then

hell the whole NSEverything is a compatibility thing with NextStep, which is long dead.

catwell · on April 11, 2018

Apple doesn't really appeat to care about backward compatibility though. They break lots of things with every release of the OS. I would give that excuse to Microsoft, but not Apple.

Shorel · on April 11, 2018

True. But the legacy code is still there.

Which is the worst of both worlds.

jernfrost · on April 12, 2018

Disagree. Microsoft keep making new APIs and depricating old ones. With Mac you got cocoa which goes back to 1989 and you can still use.

It might not be backwards compatible but from a developers perspective it is nice to be able to reuse old knowledge.

I think Apple has been much better than MS or Linux in continously modernizing and upgrading what they have. On windows and linux things tend to become dead ends as new flashy APIs appear.

Sure old win32 and motif apps might still run but nobody really develops using these APIs anymore.

On windows I first use win32, then MFC, then it was WinForms. Then all of that got depricated and we got WPF, silverlight and then I sort of lost track of what was going on. Meanwhile on Linux people used tcl/tk for GUIs early on. And there was motif, wxeindows. KDE and Gnome went through several full rewrites.

If we look at MacOS X as the modern version of NeXTSTEP the core technology has been remarkable stable.

Sure they have broken compatibility plenty of times but the principles and API are at their core the same.

pjmlp · on April 13, 2018

Quickdraw VR, Quickdraw 3D, NetTalk, JavaBridge, Carbon, Objective-C GC, RubyCocoa, WebObjects, ....

One just needs to look into the right spot.

blattimwind · on April 11, 2018

Doesn't work, isn't maintained, but remains as an attack vector and failure mode.

kev009 · on April 11, 2018

I'm referencing the abilities of F500 companies to sustain OS development. OS X has roots in mach, and some BSD, but to call it either one is trivializing the amount of work that has gone on.

Waterluvian · on April 11, 2018

Of course. And they don't even pretend that's untrue. Every few releases they focus on lowering bloat.

wvenable · on April 11, 2018

OS X is literally the next version of NeXTStep and, as others have pointed out, was built on other OS technologies.

dfox · on April 11, 2018

And NeXTStep itself is to large extent one big ugly hack that stems from experience of trying to build Unix on top of Mach. In fact it is not microkernel, but monolithic kernel running as one big Mach task, thus simply replacing user/kernel split with task/priviledged-task split (which to large extent is also true for OS X).

bitofhope · on April 11, 2018

>Unix on top of Mach

Didn't Mach start out as a replacement kernel for BSD? Building a Unix on top of Mach is like building a truck over truck chassis.

GeekyBear · on April 11, 2018

Correct. The Mach research project at Carnegie Mellon aimed to build a replacement kernel for BSD that supported distributed and parallel computing.

Next's VP of Software Engineering, Avie Tevanian, was one of the Mach project leads. Richard Rashid, who lead the Mach project ended up running Microsoft Research's worldwide operations.

Their work on a virtual memory subsystem got rolled back into BSD.

https://en.wikipedia.org/wiki/Mach_(kernel)

The Computer History Museum has an interesting long form interview with Avie:

Part 1: https://www.youtube.com/watch?v=vwCdKU9uYnE Part 2: https://www.youtube.com/watch?v=NtpIFrOGTHk

dfox · on April 11, 2018

It did not. Mach is traditional microkernel which provides IPC mechanism, process isolation and (somewhat controversially) memory mapping primitives and not much else.

In late 80's/early 90's there were various projects that attemted to build unix on top of that as true micro kernel architecture with separate servers for each system service. Performance of such design was horrible and there are two things that resulted from that that are still somewhat relevant: running whole BSD/SysV kernel as Mach task, which today means OS X and Tru64 (at the time both systems had same origin as both are implementations of OSF Unix) and just ignoring the problem which is approach taken by GNU/Hurd.

snvzz · on April 12, 2018

>slowness

Worth noting that's because Mach IPC is extremely slow, an order of magnitude slower than L4 family.

BurningCycles · on April 11, 2018

AFAIK OSX is built on mostly pre-existing components like Mach and FreeBSD. I know nothing about VxWorks.

saagarjha · on April 11, 2018

I don't quite see where you're trying to go with these. Are they supposed to be examples of POSIX systems or something else?

michaelmrose · on April 11, 2018

Google already has a track record of building half an OS in android and its a pretty mediocre one with some redeeming qualities.

discreteevent · on April 11, 2018

The fuchsia part of android is linux? Anyway Google has a lot of people working for it. I don't think anyone would have expected NT to come from Microsoft at the time that it did.

bitmapbrother · on April 11, 2018

Is this the same OS that smokes an iPhone X with an A11? Not bad for "half an OS".

https://www.youtube.com/watch?v=B65ND8vUaKc

michaelmrose · on April 12, 2018

You posted a video wherein the tester simply sequentially opens and closes a series of apps on a Samsung and Apple device seeing which will run through the sequence faster...and the Samsung was a bit slower.

In theory it makes me wonder if iphone's storage is slightly faster than the latest galaxy or if the process by which one loads an iphone app is slightly faster/more efficient than the one by which an android app is loaded or if the tiny selection of apps the reviewer picked are just better optimized for iphone. Nobody smoked anyone and nothing of note was learned by anyone. So much so that I wonder why you bothered to watch said link or paste it here.

I said half an OS because its built on technologies like linux and java not because its half assed even though it is.

bitmapbrother · on April 12, 2018

>You posted a video wherein the tester simply sequentially opens and closes a series of apps on a Samsung and Apple device seeing which will run through the sequence faster...and the Samsung was a bit slower.

Certain apps were slower to load on the Samsung device. Additionally, the Samsung device encoded the 4K siginificantly video faster and took round 1 by 14 seconds and round 2 by 16 seconds.

>In theory it makes me wonder if iphone's storage is slightly faster than the latest galaxy or if the process by which one loads an iphone app is slightly faster/more efficient than the one by which an android app is loaded or if the tiny selection of apps the reviewer picked are just better optimized for iphone. Nobody smoked anyone and nothing of note was learned by anyone. So much so that I wonder why you bothered to watch said link or paste it here.

The iPhone X has faster storage and a significantly faster SoC. A 30 second win by the Samsung phone is what I could call getting smoked.

>I said half an OS because its built on technologies like linux and java not because its half assed even though it is.

And iOS is was a decedent of MacOS which itself is a decedent of NeXTSTEP. It also uses a language 11 years older than Java. So it sounds like iOS also meets your criteria of being "half assed".

SlowBro · on April 11, 2018

> Sometimes it's even more simple re-inventing the wheel than understand why a wheel was build a fractal design

When I saw your comment I immediately was reminded of Joel on Software: "The single worst strategic mistake that any software company can make is to rewrite the code from scratch."[1]

[1] https://www.joelonsoftware.com/2000/04/06/things-you-should-...

mikekchar · on April 11, 2018

Tips on surviving a rewrite in a mid-large sized company.

1) Get yourself placed on the "Legacy team" that is supposed to "minimally support the application while we transition to the new system".

2) Do whatever the hell you want with the code base (i.e., refactor as much as you want) because nobody cares about the boring legacy system.

3) Respond directly to user/customer needs without having to worry about what upper management wants (because they are distracted by the beautiful green-field rewrite).

4) Retain your job (and probably get a promotion) when they cancel the rewrite.

dane-pgp · on April 12, 2018

Alternatively, tips for not surviving a rewrite in a mid-large sized company.

1) Get stuck on the "Legacy team", as expensive contractors are called in to produce the rewritten version from scratch in an unrealistically small fraction of the time the original took.

2) Be told you can only fix bugs (with more layers of short term hacks), not waste time with refactors that have "no business value" for a code base that will be scrapped "soon".

3) Don't add any new features, to prevent the legacy system becoming a perpetually moving target that would delay the beautiful green-field rewrite even longer.

4) Hate your job, and watch your coworkers leave in disgust, further starving the team of resources, until you end up quitting too.

vorg · on April 12, 2018

My last two IT jobs ever were both for businesses which cancelled huge rewrites after being sold to another company. Someone up top pitched the businesses as white elephants ripe for massive cost savings half-way during the rewrite.

The programmers with the political skills to get put in the "Rewrite team" will have had their jobs in the "Legacy team" protected in case of project cancellation. Or they will knife you in the back to get their old jobs back -- they will know about the cancellation and have plenty of time to maneuver before you know what's going on.

pjmlp · on April 12, 2018

There is another path to survival:

1) Get on the new team

2) Ensure to get into the more interesting tech modules

3) Improve your CV

4) Use your soft skills to mingle with everyone and be aware of where the wind blows

5) In case of iceberg ahead, jump ship to a better job taking advantage of the new skills

SlowBro · on April 12, 2018

There is yet another path to survival:

1) Do a startup

ddalex · on April 12, 2018

Or, simpler

1) Quit job 2) move to the woods 3) shun all electrical devices 4) mediate until you become one with the universe

mr_toad · on April 12, 2018

On the other hand, Firefox today is much better than Netscape ever was. And with the never re-write philosophy we wouldn’t have Rust.

candiodari · on April 12, 2018

I think that actually speaks to Joel's point. None of those were rewrites.

Firefox started as a fresh UI reskin for the old browser interface, and indeed continued as "mostly a skin" for years (incidentally, so did Chrome).

(You can still get the non-firefox skin incidentally, it's "Mozilla Seamonkey")

Then the Rust rewrite. Rust was a hobby project, and they rewrote layout engine interface, and nothing more than that. Then CSS.

Now it's an HTML parser, a layout engine, a parallel CSS engine, and a GPU webpage renderer (still "missing"/non-rust are 2d compositing and javascript). Each of those components replaced another in a working version and there were at least beta versions of firefox of all the parts.

jhasse · on April 12, 2018

You don't know what we would have instead.

washadjeffmad · on April 12, 2018

Potential is worthless. We have Rust, we don't have what might have been produced had Rust not happened, and as far as I know, no one is working on that hypothetical product.

reificator · on April 11, 2018

I live by that quote every day, but it's not a silver bullet.

tialaramex · on April 11, 2018

It isn't, but in twenty years getting paid to write software I have far more regrets where I rewrote and shouldn't have, than where I should have rewritten and didn't.

If you're Google and you have people with these abilities kicking about, it's probably not a crazy investment to see what happens. We've got a HN story elsewhere in the list on post-quantum key agreement experiments in Chrome, again there's a fair chance this ends up going nowhere but if I was Google if throw a few resources at this just in case.

But on the whole I expect Fuchsia to quietly get deprecated while Linux lives on, even if there's lots to like about Fuchsia.

yohui · on April 11, 2018

> post-quantum key agreement experiments in Chrome

Link for reference: https://news.ycombinator.com/item?id=16811554

z3t4 · on April 11, 2018

you usually get it right on the second time. only hope the first dont get too popular.

noir_lord · on April 11, 2018

or run the entire business (when it works) without which the whole company would grind to a halt.

A rewrite made no sense to me since I'd end up maintaining version A alongside version B with B constantly lagging A unless I severely restricted the scope of B in which case it'd be an incomplete (though better written/more maintainable A).

Instead I went the isolate (not always easy), shim, rewrite, replace, remove shim approach.

It does feel a bit like spinning plates blindfold sometimes in the sense I'm always expect to hear a crash.

So far I've replaced the auth system, the reports generation system, refactored a chunk of the database, implemented an audit system, changed the language version, brought in proper dependency management, replaced a good chunk of the front end (jquery soup to Vue/Typescript), rewritten the software that controls two production units and implemented an API for that software so that it isn't calling directly into the production database.. and done it without any unplanned down time (though I'm still not sure how - mostly through extensive testing and spending a lot of time on planning each stage).

It's slower because I have to balance new features against refactor time but I have management buy-in and have kept it, mostly through been extremely clear about what I'm working on and what the benefits are and have built up some nice momentum in terms of deploying new stuff that fixes problems for users.

The really funny part is that even though I'm re-factoring ~40% of the time I'm deploying new features faster than previous dev who wrote the mess...because I spent the time fixing the foundations in places I knew I'd need for new features going forwards.

dfox · on April 11, 2018

In my experience the second time leads to architecture astronautics... third time is when you get it right.

Althought in OS space one might argue that second generation of time sharing OSes (TOPS-10, ITS, MCP...) got more things right than are right in Unix and such.

kps · on April 12, 2018

Perhaps I'm subject to a giant whoosh here, but this subthread is recapitulating The Mythical Man-Month piece by piece.

zaarn · on April 12, 2018

Reinventing well explored areas of software engineering from first principles!

tandr · on April 11, 2018

I like the term "architecture astronautics". There is probably a joke hiding somewhere in a plain sight about "Plan 9" wrt Unix...

dfox · on April 11, 2018

For OS it means that you should pick one abstraction for process state and one abstraction for IO and in fact you can have same abstraction for both. In this view Plan 9 makes sense while modern Unix with files, sockets, AIO, signals, various pthread and IPC primitives and so on does not (not to mention the fact that on every practical POSIX implementation various such mechanisms are emulated in terms of other synchronisation mechanisms)

kyrra · on April 11, 2018

(the below is my ignorant understand of drivers in Fuschia and linux)

It's not just the kernel design, it's the driver design as well. Fuschia has drivers being ELF binaries that hook into Device Manager process[0]. I believe they are developing a standard API for how drivers interact with Fuschia.

Linux doesn't really have this today, which means that the driver must be in the kernel tree to have it keep up with the kernel's interface. And with ARM lacking something like a BIOS for phones, each chip that Qualcomm et al make requires a full BSP to get it working. And from what I understand, Linus and others don't want these (relatively) short-lived processors to be checked into mainline linux (plus you have the lag time of that code being available). Android's Project Treble[1] aims to address this some by creating a stable API for hardware makers to use to talk with Linux, but it is not an ideal solution.

[0] https://fuchsia.googlesource.com/zircon/+/HEAD/docs/ddk/over...

[1] https://android-developers.googleblog.com/2017/05/here-comes...

sorisos · on April 12, 2018

There is device tree [0] to resolve the problem with chip support but I guess it would make "universal" drivers more complex and harder to maintain if all hardware features are to be supported (in an effective way). Seems like the Fuschia model might handle this better.

[0] https://en.wikipedia.org/wiki/Device_tree

petecox · on April 11, 2018

> Linus and others don't want these (relatively) short-lived processors to be checked into mainline linux

I don't think that's true. e.g. 4.17 is accepting code to run Samsung Galaxy S3 (released nearly 5 years ago). It is left to volunteers since most vendors dump a source release to comply with the GPL but themselves don't upstream it to Linus.

StillBored · on April 11, 2018

Because it can take 5 years, in which time the product is obsolete.

But, the core idea behind SBSA and SBBR is to form a common platform to which the vendors conform their machines so they don't have to keep up-streaming special little drivers for their special little SOCs. Only time will tell if its a success, but a large part of the community has effectively declared that they aren't really going to conform to the ideals of a standard platform. So, the ball keeps rolling and new DT properties keep getting added, and new pieces of core platform IP keep showing up. At this point arm64, despite just being a few years old already looks as crufty as much older architectures with regard to GIC versions, dozens of firmware interfaces, etc due to the lack of a coherent platform/firmware isolation strategy.

ansible · on April 11, 2018

> ... POSIX is not so great as well ...

Ugh, tell me about it. Just dealing with a relatively simple situation where you've got signals coming in while reading a pipe is a hassle to get completely correct.

I am so there for an operating system API that is relatively simple and sane. Where writing correct programs is, if not the easy path, at least not the hard and obscure path.

derefr · on April 11, 2018

I'd be down for an Erlang-like OS ABI (or, to put that another way, a Windows-GUI OS ABI, but for even non-GUI processes): just message-passing IPC all the way down. OS signals, disk IO, network packets, [capabilities on] allocated memory, [ACL'ed] file handles, etc: all just (possibly zero-copy) messages sitting in your process's inbox.

Of course, it's pretty hard to deal with a setup like that from plain C + libc + libpthread code, so OSes shy away from it. But if your OS also has OS-global sandboxed task-thread pools (like macOS's libdispatch)—and people are willing to use languages slightly higher-level than C where tasks built on those thread-pools are exposed as primitives—then it's not out of the question to write rather low-level code (i.e. code without any intermediating virtual-machine abstraction) that interacts with such a system.

pavlov · on April 11, 2018

> ... message-passing IPC all the way down

Wasn't QNX like that?

dfox · on April 11, 2018

QNX IPC mechanism is essentially cross-process function call, ie. message sender is always blocked while waiting for message reply and the server process cannot meaningfully combine waiting for message and some other event.

Edit: in essence QNX messages work like syscalls, with the difference that there are multiple "systems" that accept "syscalls". For what it's worth Solaris has quite similar IPC mechanism that is mostly unused.

my123 · on April 11, 2018

The NT Native API is relatively elegant, especially Alpc (a shame that it's mostly hidden by Win32 tho)

vardump · on April 12, 2018

The biggest blunder is that ALPC is not accessible through Win32 at all. ALPC is much more sane than any other IPC Windows got.

Sure, you can use ALPC, but only by using undocumented unstable NT API...

Sigh.

henesy · on April 12, 2018

Plan 9?

Viper007Bond · on April 11, 2018

Interestingly PowerShell is petty awesome in that regard. You pass around objects instead of trying to parse strings and things.

sleepydog · on April 11, 2018

I don't think that's what the parent was complaining about, it was more about handling signals correctly. It's incredibly easy to write buggy signal handlers on a Linux system. The "self-pipe" trick[1], and later, signalfd(2) have made signal handling much easier, but a lot of programs still do it the old way.

[1]: http://cr.yp.to/docs/selfpipe.html [2]: http://man7.org/linux/man-pages/man2/signalfd.2.html

vardump · on April 11, 2018

PowerShell is not Windows API, win32 is. So your comment makes little sense.

reificator · on April 11, 2018

Powershell is just that, a shell. I agree that Powershell itself is pleasant enough to use but that's not what's being discussed.

msla · on April 11, 2018

> You pass around objects instead of trying to parse strings and things.

That can't be very good for debuggability.

pzone · on April 11, 2018

Type checking and calling help methods can be useful for debuggability! If you want to figure out what you're looking at in string format, call its .ToString method.

msla · on April 12, 2018

Adding extra complexity just means more to go wrong. Plain text can't really go wrong because anything can use it, anything can edit it, anything can show it. With "objects" I'm betting that Powershell itself never fails.

msla · on April 11, 2018

"Bloat" is a loaded term. It's meaningless to say "Look at all this code! It's bloated!" when you don't know the reason those lines are there, and comparing two "solutions" is meaningless when they don't solve all the same problems.

Mainly, I'm just getting tired of people trying to cut things out of the solution by cutting them out of the problem.

Saying that it's possible to solve a problem in a simpler fashion is fine. Saying a problem shouldn't be solved, that nobody should have that problem, so therefore we won't solve it, is not fine if you then turn around and compare your cut-down solution to the full solution.

hackcasual · on April 11, 2018

Fuchsia is a microkernel architecture, so I think it being "more efficient" generally is not going to be the case. I do think it is valuable to see a microkernel architecture with large scale backing, as it simplifies security and isolation of subprocesses.

guidoism · on April 11, 2018

"More efficient" in terms of running LINPACK, maybe not. But the raw throughput of highly numeric scientific calculations isn't the goal of all architectures, even though we pretend it is.

It's possible to be more efficient at showing a bit of text and graphics which is what mobile phones do a lot more of than raw number crunching, except for games, except for games of course.

hackcasual · on April 12, 2018

LINPACK would probably run equivalently. Anything that just needs the CPU will work about the same. It's overhead like networking/disk/display where microkernels lose out. Not saying that's overall a reason not to use, as the tradeoffs in terms of isolation/simplicity/security are an area very much worth investigating.

zzzcpan · on April 12, 2018

For networking and disk io monolithic kernel has to be pretty much completely bypassed already if you want high performance, see netmap/vale architecture for example.

Not sure about display though, but don't expect monolithic kernel to help here somehow either.

zaarn · on April 12, 2018

Userspace implementations of various protocols usually suffer from various problems, most notoriously that applications can't share an interface (how would you if both try to write ethernet frames at the same time?) and lackluster performance in low-throughput scenarios (high throughput != low latency and high packet throughput != high bandwidth througput)

GPUs don't have much security at all, there is lots of DMA or mapped memory. Though on most modern monolithic kernels a lot of this work is either in modules (AMDGPU on Linux is usually a module not compiled into the kernel) or even userspace (AMDGPU-Pro in this case). Mesa probably also counts.

Microkernels aren't the ideal kernel design. Monolithic isn't either. I put most of my bets on either Modular Kernels if CPUs can get more granual security (MILL CPU looks promising) or Hybrid Kernels like NT where some stuff runs in Ring 0 where it's beneficial and the res in userspace.

zzzcpan · on April 12, 2018

> that applications can't share an interface

Of course they can share an interface, I even pointed out the vale switch as an example of this [1]. And it is very fast.

The thing is isolation and granularity that microkernels happen to have force certain design and implementation choices that benefit both performance and security on modern systems. And monolithic kernels while theoretically can be as fast and as secure actually discourage good designs.

[1] http://www.openvswitch.org/support/ovscon2014/18/1630-ovs-ri...

zaarn · on April 13, 2018

It doesn't look like netmap is actual raw access to the interface like I mentioned.

I also severely doubt that microkernels encourage efficient design. I'll give you secure but it's not inherent to microkernels either (NT is a microkernel, somewhat, and has had lots of vuln's over the years, the difference between microkernels and monolithic or hybrids like NT is that most microkernels don't have enough exposure to even get a sensible comparison going)

IMO microkernels encourage inefficient designs as everything becomes IPC and all device drivers need to switch ring when they need to do something sensitive (like writing to an IO Port unless the kernel punches holes into ring 0 that definitely don't encourage security).

Monolithic kernels don't necessarily encourage security but definitely efficiency/performance. A kernel like Linux doesn't have to switch priv ring to do DMA to the harddisk and it can perform tasks entirely in one privilege level (esp. with Meltdown, switching ring is a performance sensitive operation unless you punch holes into security).

I don't think monolithic kernels encourage bad design. I think they are what people intuitively do when they write a kernel. Most of them then converge into hybrid or modular designs which offer the advantages of microkernels without the drawbacks.

zzzcpan · on April 13, 2018

You are assuming that switching priv ring is a bottleneck, which it isn't. The cost of the switch is constant and is easily amortizable, no matter the amount of stuff you have to process.

zaarn · on April 16, 2018

The cost of a switch is non-zero. For IPC you need to switch out the process running in the CPU, for Syscalls to drivers a microkernel will have to switch into priv ring, then out, wait for the driver, then back in and back out, as it switches context.

A monolithic, hybrid or modular kernel can significantly reduce this overhead while still being able to employ the same methods to amortize the cost that exists.

A microkernel is by nature incapable of being more efficient than a monolithic kernel. That is true as long as switching processes or going into priv has a non-zero cost.

The easy escape hatch is to allow a microkernel to run processes in priv ring and in the kernel address so the kernel doesn't have to switch out any page tables or switch privs any more than necessary while retaining the ability to somewhat control and isolate the module (with some PT trickery you can prevent the module from corrupting memory due to bugs or malware)

hackcasual · on April 12, 2018

A microkernel gives programs less direct access to hardware.

gnufx · on April 11, 2018

The reason a microkernel wouldn't be more efficient is that the OS is irrelevant for the (rather useless) LINPACK benchmark. However, I want a microkernel system and capabilities for HPC. The microkernel-ish system I used in the '80s for physics was pretty fast.

lallysingh · on April 11, 2018

Or may be better at splitting up work across cores. Less lock/cache contention, better logical separation, etc.

bad_user · on April 11, 2018

No, it won’t. This is not the user land you’re talking about and in general the idea that multiple, isolated processes can do better on the same CPU, versus a monolithic process that does shared memory concurrency is ... a myth ;-)

lallysingh · on April 11, 2018

For throughput, separate processes on separate cores with loose synchronisation will do better than a monolith. You don't want to share memory, you want to hand it off to different stages of work.

Consider showing a webpage. You have a network stack, a graphics driver, and the threads of the actual browser process itself. It's substantially easier to about bottlenecking through one or more locks (for, say an open file table, or path lookup, etc) when the parts of the pipeline are more separated than a monolithic kernel.

bad_user · on April 12, 2018

“Handing off” via sharing memory is much more efficient than copying.

Lock-free concurrency is also achievable.

Again, this isn’t the user land we’re talking about, in the sense that the kernel is expected to be highly optimized.

Granted, a multi process architecture does have virtues, like stability and security. But performance is not one of them.

lallysingh · on April 12, 2018

Handing off means to stop using it and letting someone else use it. Only copy in rare cases.

Lock free concurrency is typically via spinning and retrying, suboptimal when you have real contention. It's better not to contend.

Kernel code isn't magic, its performance is dominated by cache just like user space.

High performance applications get the kernel out of the way because it slows things down.

bad_user · on April 12, 2018

> Lock free concurrency is typically via spinning and retrying, suboptimal when you have real contention.

Lock free concurrency is typically done by distributing the contention between multiple memory locations / actors, being wait free for the happy path at least. The simple compare-and-set schemes have limited utility.

Also actual lock implementations at the very least start by spinning and retrying, falling back to a scheme where the threads get put to sleep after a number of failed retries. More advanced schemes that do "optimistic locking" are available, for the cases in which you have no contention, but those have decreased performance in contention scenarios.

> Handing off means to stop using it and letting someone else use it. Only copy in rare cases.

You can't just let "someone else use it", because blocks of memory are usually managed by a single process. Transferring control of a block of memory to another process is a recipe for disaster.

Of course there are copy on write schemes, but note that they are managed by the kernel and they don't work in the presence of garbage collectors or more complicated memory pools, in essence the problem being that if you're not in charge of a memory location for its entire lifetime, then you can't optimize the access to it.

In other words, if you want to share data between processes, you have to stream it. And if those processes have to cooperate, then data has to be streamed via pipes.

> High performance applications get the kernel out of the way because it slows things down.

Not because the kernel itself is slow, but because system calls are. System calls are expensive because they lead to context switches, thrashing caches and introducing latency due to blocking on I/O. So the performance of the kernel has nothing to do with it.

You know what else introduces unnecessary context switches? Having multiple processes running in parallel, because in the context of a single process making use of multiple threads you can introduce scheduling schemes (aka cooperative multi-threading) that are optimal for your process.

zzzcpan · on April 12, 2018

System calls are not the reason the kernel is bypassed. The cost of the system calls is fixable. For example it is possible to batch them together into a single system call at the end of the event loop iteration or even share a ring buffer with the kernel and talk to the kernel the same way high performance apps talks to the nic. But the problem is that the kernel itself doesn't have high performance architecture, subsystems, drivers, io stacks, etc., so you can't get far using it and there is no point investing time into it. And it is this way, because monolithic kernel doesn't push developers into designing architecture and subsystems that talk to each other purely asynchronously with batching, instead crappy shared memory designs are adopted as they feel easier to monolithic developers, while in fact being both harder and slower to everyone.

1pfdthrow · on April 11, 2018

"better" meaning what exactly? Are you talking about running a database with high throughput, recording audio with low latency, or computing pi?

lallysingh · on April 12, 2018

And even on the latency side, you just want the kernel out of the damn way.

bad_user · on April 12, 2018

Given the topic we’re discussing, I don’t know what you’re talking about.

__d · on April 11, 2018

macOS (and iOS, tvOS, watchOS, etcOS) are built on a microkernel too (Mach).

It's not an automatic security win.

larkost · on April 11, 2018

You are mixing things up a little bit. Darwin (the underlying kernel layer of MacOS X and the rest) is actually a hybrid between a microkernel and a regular kernel. There is a microkernel there, but much of the services layered on top of it are done as a single kernel. All of that operating within one memory space. So some of the benifits from a pure microkenel are lost, but a whole lot of speed is gained.

So from a security standpoint MacOS X is mostly in the kenel camp, not the microkernel one.

bitmapbrother · on April 11, 2018

According to Wikipedia - the XNU kernel for Darwin, the basis of macOS, iOS, watchOS, and tvOS is not a microkernel.

The project at Carnegie Mellon ran from 1985 to 1994, ending with Mach 3.0, which is a true microkernel. Mach was developed as a replacement for the kernel in the BSD version of Unix, so no new operating system would have to be designed around it. Experimental research on Mach appears to have ended, although Mach and its derivatives exist within a number of commercial operating systems. These include all using the XNU operating system kernel which incorporates an earlier, non-microkernel, Mach as a major component. The Mach virtual memory management system was also adopted in 4.4BSD by the BSD developers at CSRG,[2] and appears in modern BSD-derived Unix systems, such as FreeBSD.

phlakaton · on April 12, 2018

This was, more or less, the driving philosophy behind BeOS. Therein lie some lessons for the prospective OS developer to consider.

Say what you will about how terrible POSIX is, Be recognized the tremendous value in supporting POSIX: being able to support the mounds and mounds of software written for POSIX. It chose to keep enough POSIX compatibility to make it possible to port many common UNIX shells and utilities over, while Be developers could focus on more interesting things.

So where were the problems?

One huge problem was drivers, particularly once BeOS broke into the Intel PC space. Its driver interface and model was pretty darn slick, but it was different, and vendors wouldn't support it (take the problems of a Linux developer getting a spec or reference implementation from a vendor and multiply). This cost Be, and its developer and user community, quite a bit of blood, sweat, and tears.

Another big problem was networking. Initially, socket FDs were not the same thing as filesystem FDs, which had a huge impact on the difficulty of porting networked software over to BeOS. Eventually, BeOS fixed this problem, as the lack of compatibility was causing major headaches.

The lesson: if you are looking to make an OS that will grow quickly, where you will not be forced to reinvent the wheel over and over and over again, compatibility is no small consideration.

pjmlp · on April 12, 2018

Android and ChromeOS don't expose POSIX to userspace, it hasn't hurt them.

phlakaton · on April 13, 2018

Which points to what ended up happening with BeOS: it became an internet appliance OS, and then the core of a mobile product. These were areas where the hardware and application spaces were quite constrained, and BeOS's competitive advantage in size and performance could be leveraged.

littlestymaar · on April 12, 2018

Bloat is what you get when your software has to face the real world.

Yes the Linux kernel is bloated, bloated with tons of code responsible for making it work on exotic hardware. Yes x86 is bloated, let's just remove those useless AVX instructions. Yes MS Excel is bloated, who the hell is using all those features [1]?!

You can only have two alternatives : either your software is “bloated”, or then it will be replaced by something else, which is more “bloated” and works for more people.

Notice than I'm just criticizing the “bloat” argument, I'm just not criticizing Google for creating a new OS from scratch, which can bring a lot on the table, if done properly and includes innovations from the past 30 years, like was done when creating Rust for instance.

[1]: https://www.joelonsoftware.com/2001/03/23/strategy-letter-iv...

andrepd · on April 11, 2018

I honestly don't get what your 3rd reference is complaining about. That software has... more features and is faster (with the tradeoff being code size)?

nerraga · on April 11, 2018

Is TLPI available for free now? It's a great book and I hope that people will continue to support both NoStarch and the author.

jwilk · on April 11, 2018

I see no evidence of the book becoming free on its homepage: http://man7.org/tlpi/

https://doc.lagout.org/ looks quite shady, so I guess someone uploaded their copy without copyright holder's permission.

brianon99 · on April 12, 2018

LOC is not a good way to measure the "bloatness" of a software. There is a significant amount of device driver code in linux kernel. With the number of devices exponentially increasing, it is inevitable, but it does not make the kernel more complex.

A truck is not more complex than a car. A truck is bigger because it is designed to carry more load.

ryao · on April 12, 2018

That article by Columbia claims that dbus is not POSIX, yet the communication occurs over a UNIX domain socket. I do not think that is a good example of not using POSIX for IPC. The underlying mechanism that makes it work is part of POSIX. It just extends what is available to provide IPC from user space in a way that people find more convenient.

agumonkey · on April 11, 2018

how much of linux LoC are drivers and arch versus generic code ?

CodeArtisan · on April 12, 2018

Can't tell how many LoC but there is an initiative about producing tiny linux kernel that you can read about here

https://www.linux.com/news/event/open-source-summit-na/2017/...

In the video, it is show that the binary size of "core" linux has barely grown

https://youtu.be/ynNLlzOElOU?t=14m45s

agumonkey · on April 12, 2018

Pretty cool. I had no idea about this project, thanks.

harikb · on April 11, 2018

Didn't think I will see a day where Linux-bashing didn't include a comparison to FreeBSD

jwilk · on April 11, 2018

This is TLPI cover: http://man7.org/tlpi/cover/TLPI-front-cover.png

I don't know what this thing is, but it doesn't look very appealing.

52-6F-62 · on April 11, 2018

Mind the digression, but it's a Fiddlehead, and they're delicious. https://en.wikipedia.org/wiki/Fiddlehead_fern

jasonjayr · on April 11, 2018

It's a Fiddlehead fern :)

https://en.wikipedia.org/wiki/Fiddlehead_fern

kraig911 · on April 11, 2018

Maybe they've developed fear of FOSS after this oracle debacle and just want to greenfield the whole thing out of fear.

bitmapbrother · on April 11, 2018

I think it's pretty clear why they created Fuchsia:

1. They want a modular microkernel OS with modern capabilities.

2. They want to eventually replace their internal Linux based OS with Fuchsia.

3. Fuchsia, or parts of it, will be the basis for all of their products.

4. They want to control the direction of their own OS.

chris37879 · on April 11, 2018

A lot of people speculate that that's why Flutter was written in Dart instead of Kotlin or something else. Google wanted to use a language they already have a lot of investment in, and for some reason didn't pick Go. Which honestly seems odd to me since Go can be compiled all the way down to binaries already, they had to invent that compiler for dart, but whatever. Dart is super cool and I'm looking forward to using it in Flutter's user space.

pjmlp · on April 12, 2018

Dart supports generics, has a good package manager and their devs are open to the history of programming language's design and modern tooling.

Jyaif · on April 13, 2018

Go isn't interpreted, making hot-reload impossible.

NiveaGeForce · on April 11, 2018

I like this description of Linux https://www.youtube.com/watch?v=rmsIZUuBoQs

monochromatic · on April 11, 2018

https://en.wikipedia.org/wiki/TempleOS

Problem solved!

mtgx · on April 11, 2018

It's still a shame they wrote something from scratch but they didn't do it in Rust, so they can avoid fixing a ton of bugs over the next 20-30 years.

steveklabnik · on April 11, 2018

The kernel was started before Rust 1.0, so I think that was a reasonable decision. Additionally, since it's a microkernel, the kernel is pretty small. That helps, both in the implementation now and if they ever decided to replace it. And future components can be in Rust if they want.

chris37879 · on April 11, 2018

Bonus though, I'm pretty sure the fuschia user space has rust support already. I think their alternative to vim (xi maybe? I think it was.) is written in rust natively with python api bindings for plugin support.

lambda · on April 12, 2018

Yes, xi (https://github.com/google/xi-editor) is a text editor written in Rust. It's not exactly an alternative to vim, it's more of a text editor microkernel that can have various different frontends and plugins attached which use JSON-based IPC to allow their implementation in whatever language you want. So, on macOS you can implement the frontend in Swift, and write plugins in Python or Ruby or Rust or whatever you like. On Fuchsia, the frontend is in Flutter.

lambda · on April 12, 2018

It looks like while the core kernel and a lot of components are written in C++, there are a number of components written in Rust: https://fuchsia.googlesource.com/garnet/+/master/Cargo.toml

One of the advantages of a microkernel architecture is that it is relatively easy for different components to be written in different languages.

klez · on April 11, 2018

Honestly curious: why Rust specifically and not any other "secure" systems language?

lambda · on April 12, 2018

Rust specifically because it has the zero-overhead safety properties via the borrow checker. This is something that no other safe language has, as far as I know. They generally either make you deal with a GC if you want safety, or deal with raw pointers and manual memory management if you want low overhead.

And the borrow checker, along with move semantics by default and the Send and Sync traits, help with several other aspects of safety as well; the Send and Sync traits encode information on what can safely be moved or shared between threads, and move semantics by default (and checked by the compiler, instead of at runtime as in C++), make it easier to encode state transitions on objects in ways that you can't try to perform operations on an invalid state.

But as others point out, Zircon, the Fuchsia kernel, was written before Rust 1.0 was released, and even after Rust 1.0 was released and stable it was still a bit rough to work with for a little while.

If you were starting a new project from scratch today, I'd seriously ask why not Rust, though of course there are other reasons why you might not choose it. But given the history, and how new Rust was at the time when this project was started, it makes a lot of sense.

greenhouse_gas · on April 12, 2018

>If you were starting a new project from scratch today, I'd seriously ask why not Rust, though of course there are other reasons why you might not choose it. But given the history, and how new Rust was at the time when this project was started, it makes a lot of sense.

While I'm not saying that rust is bad, Rust as a kernel language won't help you here. The simple proof is that Redox-os (which is a microkernel built in Rust) was found with the same class of bugs your average OS had.