
Fuchsia is a microkernel architecture, so I don't think it's going to be "more efficient" in general. I do think it is valuable to see a microkernel architecture with large-scale backing, as it simplifies security and isolation of subprocesses.



"More efficient" in terms of running LINPACK, maybe not. But the raw throughput of highly numeric scientific calculations isn't the goal of all architectures, even though we pretend it is.

It's possible to be more efficient at showing a bit of text and graphics, which is what mobile phones do a lot more of than raw number crunching, except for games, of course.


LINPACK would probably run equivalently. Anything that just needs the CPU will work about the same. It's overhead like networking/disk/display where microkernels lose out. Not saying that's overall a reason not to use them, as the tradeoffs in terms of isolation/simplicity/security are an area very much worth investigating.


For networking and disk IO, the monolithic kernel already has to be pretty much completely bypassed if you want high performance; see the netmap/vale architecture for example.

Not sure about display though, but I don't expect the monolithic kernel to help there somehow either.


Userspace implementations of various protocols usually suffer from various problems, most notoriously that applications can't share an interface (how would you if both try to write ethernet frames at the same time?) and lackluster performance in low-throughput scenarios (high throughput != low latency, and high packet throughput != high bandwidth throughput).

GPUs don't have much security at all; there is lots of DMA or mapped memory. Though on most modern monolithic kernels a lot of this work is either in modules (AMDGPU on Linux is usually a module, not compiled into the kernel) or even in userspace (AMDGPU-Pro in this case). Mesa probably also counts.

Microkernels aren't the ideal kernel design. Monolithic isn't either. I put most of my bets on either Modular Kernels, if CPUs can get more granular security (the MILL CPU looks promising), or Hybrid Kernels like NT where some stuff runs in Ring 0 where it's beneficial and the rest in userspace.


> that applications can't share an interface

Of course they can share an interface, I even pointed out the vale switch as an example of this [1]. And it is very fast.

The thing is, the isolation and granularity that microkernels happen to have force certain design and implementation choices that benefit both performance and security on modern systems. And while monolithic kernels can theoretically be just as fast and as secure, they actually discourage good designs.

[1] http://www.openvswitch.org/support/ovscon2014/18/1630-ovs-ri...


It doesn't look like netmap is actual raw access to the interface like I mentioned.

I also severely doubt that microkernels encourage efficient design. I'll give you secure, but it's not inherent to microkernels either (NT is a microkernel, somewhat, and has had lots of vulns over the years; the difference between microkernels and monolithic or hybrid kernels like NT is that most microkernels don't have enough exposure to even get a sensible comparison going).

IMO microkernels encourage inefficient designs, as everything becomes IPC and all device drivers need to switch rings when they need to do something sensitive (like writing to an IO port), unless the kernel punches holes into ring 0, which definitely doesn't encourage security.

Monolithic kernels don't necessarily encourage security, but they definitely encourage efficiency/performance. A kernel like Linux doesn't have to switch priv rings to do DMA to the hard disk, and it can perform tasks entirely in one privilege level (especially with Meltdown, switching rings is an expensive operation unless you punch holes into security).

I don't think monolithic kernels encourage bad design. I think they are what people intuitively do when they write a kernel. Most of them then converge into hybrid or modular designs which offer the advantages of microkernels without the drawbacks.


You are assuming that switching priv ring is a bottleneck, which it isn't. The cost of the switch is constant and is easily amortizable, no matter the amount of stuff you have to process.
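As a rough illustration of that amortization (my example, not the parent's): batching work behind a single privilege transition divides the fixed switch cost over the whole batch. A minimal C sketch using writev(), assuming a caller that already has up to 64 buffers to send:

    #include <sys/uio.h>   /* writev, struct iovec */

    /* Illustrative only: push n buffers across the user/kernel boundary with
     * one syscall instead of n, so the constant ring-switch cost is paid once. */
    static ssize_t send_batch(int fd, char *bufs[], size_t lens[], int n)
    {
        struct iovec iov[64];              /* sketch assumes n <= 64 */
        for (int i = 0; i < n; i++) {
            iov[i].iov_base = bufs[i];
            iov[i].iov_len  = lens[i];
        }
        return writev(fd, iov, n);         /* one privilege transition for the batch */
    }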


The cost of a switch is non-zero. For IPC you need to switch out the process running on the CPU; for syscalls to drivers, a microkernel will have to switch into the priv ring, then out, wait for the driver, then back in and back out, as it switches context.

A monolithic, hybrid or modular kernel can significantly reduce this overhead while still being able to employ the same methods to amortize the cost that exists.

A microkernel is by nature incapable of being more efficient than a monolithic kernel. That is true as long as switching processes or going into priv has a non-zero cost.

The easy escape hatch is to allow a microkernel to run processes in the priv ring and in the kernel address space, so the kernel doesn't have to switch out any page tables or switch privs any more than necessary, while retaining the ability to somewhat control and isolate the module (with some PT trickery you can prevent the module from corrupting memory due to bugs or malware).


A microkernel gives programs less direct access to hardware.


The reason a microkernel wouldn't be more efficient is that the OS is irrelevant for the (rather useless) LINPACK benchmark. However, I want a microkernel system and capabilities for HPC. The microkernel-ish system I used in the '80s for physics was pretty fast.


Or may be better at splitting up work across cores. Less lock/cache contention, better logical separation, etc.


No, it won’t. This is not the user land you’re talking about and in general the idea that multiple, isolated processes can do better on the same CPU, versus a monolithic process that does shared memory concurrency is ... a myth ;-)


For throughput, separate processes on separate cores with loose synchronisation will do better than a monolith. You don't want to share memory, you want to hand it off to different stages of work.

Consider showing a webpage. You have a network stack, a graphics driver, and the threads of the actual browser process itself. It's substantially easier to avoid bottlenecking through one or more locks (for, say, an open file table, or path lookup, etc.) when the parts of the pipeline are more separated than in a monolithic kernel.


“Handing off” via sharing memory is much more efficient than copying.

Lock-free concurrency is also achievable.

Again, this isn’t the user land we’re talking about, in the sense that the kernel is expected to be highly optimized.

Granted, a multi process architecture does have virtues, like stability and security. But performance is not one of them.


Handing off means to stop using it and let someone else use it. Only copy in rare cases.

Lock free concurrency is typically via spinning and retrying, suboptimal when you have real contention. It's better not to contend.

Kernel code isn't magic, its performance is dominated by cache just like user space.

High performance applications get the kernel out of the way because it slows things down.
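To make the "hand it off" point concrete, here is a minimal single-producer/single-consumer ring in C11 (names and sizes are mine, purely illustrative): the producer stops touching a buffer once it publishes the pointer, the consumer takes ownership, and the only shared state is two indices, so there are no locks and, in the happy path, no contention:

    #include <stdatomic.h>
    #include <stddef.h>

    #define SLOTS 256   /* power of two, chosen arbitrarily for the sketch */

    struct spsc_ring {
        void *slot[SLOTS];
        _Atomic size_t head;   /* only the producer writes this */
        _Atomic size_t tail;   /* only the consumer writes this */
    };

    /* Producer: publish a buffer and stop using it. Returns 0 if the ring is full. */
    static int ring_push(struct spsc_ring *r, void *buf)
    {
        size_t h = atomic_load_explicit(&r->head, memory_order_relaxed);
        size_t t = atomic_load_explicit(&r->tail, memory_order_acquire);
        if (h - t == SLOTS)
            return 0;                                   /* full */
        r->slot[h % SLOTS] = buf;
        atomic_store_explicit(&r->head, h + 1, memory_order_release);
        return 1;
    }

    /* Consumer: take ownership of the next buffer, or NULL if none is ready. */
    static void *ring_pop(struct spsc_ring *r)
    {
        size_t t = atomic_load_explicit(&r->tail, memory_order_relaxed);
        size_t h = atomic_load_explicit(&r->head, memory_order_acquire);
        if (t == h)
            return NULL;                                /* empty */
        void *buf = r->slot[t % SLOTS];
        atomic_store_explicit(&r->tail, t + 1, memory_order_release);
        return buf;
    }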


> Lock free concurrency is typically via spinning and retrying, suboptimal when you have real contention.

Lock free concurrency is typically done by distributing the contention between multiple memory locations / actors, being wait free for the happy path at least. The simple compare-and-set schemes have limited utility.
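A small sketch of "distributing the contention between multiple memory locations" (a sharded counter; the shard count and padding are arbitrary choices of mine): each thread updates its own slot on the hot path and a reader sums the shards, so no single cache line is fought over:

    #include <stdatomic.h>

    #define NSHARDS 16   /* arbitrary; ideally one per core */

    /* Each shard is padded out to its own cache line to avoid false sharing. */
    struct sharded_counter {
        struct { _Atomic long v; char pad[56]; } shard[NSHARDS];
    };

    static void counter_add(struct sharded_counter *c, int thread_id, long n)
    {
        /* Hot path: threads touch different memory locations, so no contention. */
        atomic_fetch_add_explicit(&c->shard[thread_id % NSHARDS].v, n,
                                  memory_order_relaxed);
    }

    static long counter_read(struct sharded_counter *c)
    {
        long sum = 0;
        for (int i = 0; i < NSHARDS; i++)
            sum += atomic_load_explicit(&c->shard[i].v, memory_order_relaxed);
        return sum;
    }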

Also actual lock implementations at the very least start by spinning and retrying, falling back to a scheme where the threads get put to sleep after a number of failed retries. More advanced schemes that do "optimistic locking" are available, for the cases in which you have no contention, but those have decreased performance in contention scenarios.
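For reference, a bare-bones sketch of that spin-then-sleep pattern on Linux (C11 atomics plus a raw futex call; the spin count is arbitrary, and real locks also track whether anyone is actually waiting):

    #include <stdatomic.h>
    #include <linux/futex.h>
    #include <sys/syscall.h>
    #include <unistd.h>

    static _Atomic int lock_word = 0;   /* 0 = unlocked, 1 = locked */

    static void lock(void)
    {
        /* Phase 1: optimistic spinning, cheap when there is no contention. */
        for (int i = 0; i < 100; i++) {
            int expected = 0;
            if (atomic_compare_exchange_weak(&lock_word, &expected, 1))
                return;
        }
        /* Phase 2: stop burning CPU; let the kernel park the thread until unlock(). */
        for (;;) {
            int expected = 0;
            if (atomic_compare_exchange_strong(&lock_word, &expected, 1))
                return;
            syscall(SYS_futex, &lock_word, FUTEX_WAIT, 1, NULL, NULL, 0);
        }
    }

    static void unlock(void)
    {
        atomic_store(&lock_word, 0);
        syscall(SYS_futex, &lock_word, FUTEX_WAKE, 1, NULL, NULL, 0);
    }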

> Handing off means to stop using it and letting someone else use it. Only copy in rare cases.

You can't just let "someone else use it", because blocks of memory are usually managed by a single process. Transferring control of a block of memory to another process is a recipe for disaster.

Of course there are copy-on-write schemes, but note that they are managed by the kernel, and they don't work in the presence of garbage collectors or more complicated memory pools. In essence, the problem is that if you're not in charge of a memory location for its entire lifetime, you can't optimize access to it.

In other words, if you want to share data between processes, you have to stream it. And if those processes have to cooperate, then data has to be streamed via pipes.
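A bare-bones illustration of that streaming model (payload and buffer size are made up): two processes cooperate only through a pipe, and each one stays in charge of its own memory for its entire lifetime:

    #include <stdio.h>
    #include <unistd.h>
    #include <sys/wait.h>

    int main(void)
    {
        int fds[2];
        if (pipe(fds) == -1)
            return 1;

        if (fork() == 0) {                 /* child: consumer process */
            close(fds[1]);
            char buf[128];
            ssize_t n;
            while ((n = read(fds[0], buf, sizeof buf)) > 0)
                fwrite(buf, 1, (size_t)n, stdout);
            close(fds[0]);
            return 0;
        }

        close(fds[0]);                     /* parent: producer streams, never shares */
        const char msg[] = "work item\n";
        for (int i = 0; i < 3; i++)
            write(fds[1], msg, sizeof msg - 1);
        close(fds[1]);
        wait(NULL);
        return 0;
    }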

> High performance applications get the kernel out of the way because it slows things down.

Not because the kernel itself is slow, but because system calls are. System calls are expensive because they lead to context switches, thrashing caches and introducing latency due to blocking on I/O. So the performance of the kernel has nothing to do with it.

You know what else introduces unnecessary context switches? Having multiple processes running in parallel, because in the context of a single process making use of multiple threads you can introduce scheduling schemes (aka cooperative multi-threading) that are optimal for your process.
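As a toy example of what "the process picks its own scheduling" can look like (my sketch, using run-to-completion tasks rather than real fibers or threads): tasks yield by returning, and the loop decides the order, so there are no involuntary context switches at all:

    #include <stdio.h>

    typedef int (*task_fn)(void *state);   /* return non-zero to be rescheduled */

    struct task { task_fn fn; void *state; int done; };

    /* Cooperative round-robin: a task runs until it returns (its yield point). */
    static void run_all(struct task *tasks, int n)
    {
        int remaining = n;
        while (remaining > 0) {
            for (int i = 0; i < n; i++) {
                if (tasks[i].done)
                    continue;
                if (tasks[i].fn(tasks[i].state) == 0) {
                    tasks[i].done = 1;
                    remaining--;
                }
            }
        }
    }

    static int count_to_three(void *state)
    {
        int *n = state;
        printf("step %d\n", ++*n);
        return *n < 3;
    }

    int main(void)
    {
        int a = 0, b = 0;
        struct task tasks[] = { { count_to_three, &a, 0 }, { count_to_three, &b, 0 } };
        run_all(tasks, 2);                  /* interleaves the two tasks cooperatively */
        return 0;
    }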


System calls are not the reason the kernel is bypassed. The cost of system calls is fixable: for example, it is possible to batch them together into a single system call at the end of the event loop iteration, or even share a ring buffer with the kernel and talk to the kernel the same way high performance apps talk to the NIC. But the problem is that the kernel itself doesn't have a high performance architecture, subsystems, drivers, IO stacks, etc., so you can't get far using it and there is no point investing time into it. And it is this way because a monolithic kernel doesn't push developers into designing architecture and subsystems that talk to each other purely asynchronously with batching; instead, crappy shared memory designs are adopted because they feel easier to monolithic developers, while in fact being both harder and slower for everyone.
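For what it's worth, Linux's io_uring is one concrete form of the "share a ring buffer with the kernel" idea. A rough sketch assuming liburing, with an arbitrary queue depth and placeholder buffers: several reads are queued into the shared submission ring and a single syscall submits the whole batch:

    #include <liburing.h>

    /* Illustrative only: queue n reads (n <= 64 here) into the shared submission
     * ring, then cross into the kernel once for the whole batch. */
    static int read_batch(int fd, char bufs[][4096], int n)
    {
        struct io_uring ring;
        if (io_uring_queue_init(64, &ring, 0) < 0)
            return -1;

        for (int i = 0; i < n; i++) {
            struct io_uring_sqe *sqe = io_uring_get_sqe(&ring);
            io_uring_prep_read(sqe, fd, bufs[i], 4096, (unsigned long long)i * 4096);
        }
        io_uring_submit(&ring);              /* one syscall submits all n reads */

        for (int i = 0; i < n; i++) {
            struct io_uring_cqe *cqe;
            io_uring_wait_cqe(&ring, &cqe);  /* reap completions from the shared ring */
            io_uring_cqe_seen(&ring, cqe);
        }
        io_uring_queue_exit(&ring);
        return 0;
    }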


"better" meaning what exactly? Are you talking about running a database with high throughput, recording audio with low latency, or computing pi?


And even on the latency side, you just want the kernel out of the damn way.


Given the topic we’re discussing, I don’t know what you’re talking about.


macOS (and iOS, tvOS, watchOS, etcOS) are built on a microkernel too (Mach).

It's not an automatic security win.


You are mixing things up a little bit. Darwin (the underlying kernel layer of MacOS X and the rest) is actually a hybrid between a microkernel and a regular kernel. There is a microkernel there, but much of the services layered on top of it are done as a single kernel, all of it operating within one memory space. So some of the benefits of a pure microkernel are lost, but a whole lot of speed is gained.

So from a security standpoint MacOS X is mostly in the kernel camp, not the microkernel one.


According to Wikipedia, the XNU kernel for Darwin (the basis of macOS, iOS, watchOS, and tvOS) is not a microkernel.

The project at Carnegie Mellon ran from 1985 to 1994, ending with Mach 3.0, which is a true microkernel. Mach was developed as a replacement for the kernel in the BSD version of Unix, so no new operating system would have to be designed around it. Experimental research on Mach appears to have ended, although Mach and its derivatives exist within a number of commercial operating systems. These include all using the XNU operating system kernel which incorporates an earlier, non-microkernel, Mach as a major component. The Mach virtual memory management system was also adopted in 4.4BSD by the BSD developers at CSRG,[2] and appears in modern BSD-derived Unix systems, such as FreeBSD.



