> On the other hand, a new class of more dynamic and complex applications are also on the horizon, with an increasing demand for application constructs such as in-situ analysis, workflows, elaborate monitoring and performance tools. This complexity relies not only on rich features of POSIX, but also on the Linux APIs (such as the /proc, /sys filesystems, etc.) in particular.
If I were to design a new class of a thing to manage complexity efficiently today I would rather choose to rely on something Plan9-like or entirely new rather than the legacy "rich features of POSIX".
But you don't get the choice, given the applications. It's not clear to me how Plan9 would help with relevant problems; it was tried on Blue Gene at one stage. If I got the choice of something different, it would likely be capability-, microkernel-based, which might actually help with the jitter problem.
There is an L4-based effort for HPC at TU Dresden called FFMK. The reality is that for HPC you still need a high-performance implementation of fast-path system calls (e.g., memory and process management), and the traditional microkernel architecture is not suitable for that. As for POSIX, you don't have a choice: people who write apps are used to POSIX and Linux.
Hi, I'm another person involved with McKernel from early on. The name actually originates from Many-core Kernel (intended for many-core CPUs). In addition, a clarification regarding standalone LWKs: there are mainly two reasons why they "failed": 1) incompatibility with POSIX/Linux, and 2) lack of device drivers, and thus limited applicability to different platforms. IBM's CNK (still running on the BG/Q) is a prime example. Plenty of POSIX calls are not supported and it only runs on IBM's hardware. The multi-kernel approach solves these issues by providing basically full Linux compatibility through service offloading, and easy deployment thanks to direct reuse of Linux device drivers.
It's basically a few kernel modules and utilities that help with memory/CPU management (take cores offline, reserve memory, etc.) and provide message passing from Linux to whatever you want to run on the other cores with a queue messaging system (ikc). It's used by mckernel but should theoretically work with anything else; I would love to know if anyone else uses it. (Disclaimer: I'm working on ihk/mckernel this year.)
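For readers unfamiliar with what "taking cores offline" looks like from user space, here is a minimal sketch using the generic Linux CPU hotplug interface in sysfs. This is an assumption for illustration only: it is not IHK's own mechanism (IHK ships its own modules and utilities for reserving cores and memory), and the helper names and the `sysfs_root` parameter are hypothetical, added so the code can be tried against a scratch directory without root.

```python
import os

# Standard sysfs location for per-CPU controls on Linux.
DEFAULT_SYSFS_CPU_ROOT = "/sys/devices/system/cpu"


def cpu_online_path(cpu: int, sysfs_root: str = DEFAULT_SYSFS_CPU_ROOT) -> str:
    """Return the path of the hotplug control file for a given CPU."""
    return os.path.join(sysfs_root, f"cpu{cpu}", "online")


def set_cpu_online(cpu: int, online: bool,
                   sysfs_root: str = DEFAULT_SYSFS_CPU_ROOT) -> None:
    """Write '1' (online) or '0' (offline) to the CPU's control file.

    On a real system this needs root, and the file may be missing for
    some CPUs (commonly cpu0 on x86), in which case open() raises.
    """
    with open(cpu_online_path(cpu, sysfs_root), "w") as f:
        f.write("1" if online else "0")


def is_cpu_online(cpu: int, sysfs_root: str = DEFAULT_SYSFS_CPU_ROOT) -> bool:
    """Read back the current state of the CPU's control file."""
    with open(cpu_online_path(cpu, sysfs_root)) as f:
        return f.read().strip() == "1"
```

On a real node, `set_cpu_online(3, False)` run as root would ask Linux to stop scheduling on CPU 3. A multi-kernel setup goes further than this sketch, since the released cores then have to be booted into the lightweight kernel.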
I think the LWK concept itself is OK, but implementation-wise there needs to be some auditing done. This is in a phase of going from mostly being a research project to getting some attention and being set up for production use on some HPC sites, so I'm sure that'll get done in due time.
I doubt the number of possible privilege escalations would be much different from on your average compute node. There might actually be fewer if syscalls could be sanitized before reaching Linux.
I often hear the term “batteries included” used to describe tech that comes with lots of features/stuff out of the box, but basically if something is heavyweight, what gets advertised is its features, not its size.