Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

To the best of my knowledge, seL4 is not AVX-512 aware. The AVX-512 state is not saved or restored on a context switch, which is clearly going to impact efficiency.

At present there's 16x64-bits of register state saved (128B), but if we were to have full support for the vectors, you need to potentially add 32x512-bits to the state (plus another 16 GP registers when APX arrives). Total state that needs moving from registers to memory/cache would jump to 2304B - a 1800% increase.

Give that memory is still the bottleneck, and cache is of limited size, a full context switch is going to have a big hit - and the main issue with microkernels is that a service you want to communicate with lives in another thread and address space. You have: app->kernel->service->kernel->app for a round-trip. If both app and service use the AVX-512 registers then you're going to have to save/restore more than half a page of CPU state up to 4 times, compared with up to 2 on the monolithic kernel which just does app->kernel->app.

The amount of CPU state only seems to be growing, and microkernels pay twice the cost.



The cost of moving 4 KB is miniscule. Assume a anemic basic desktop with a 2 GHz clock and 2 instructions per clock. You would be able to issue 4 billion 8-byte stores per second resulting 32 GB/s or 32 bytes per nanosecond. Memory bandwidth is going to be on the order of 40-60 GB/s for a basic desktop, so you will not run into memory bandwidth bottlenecks with the aforementioned instruction sequence. So, 4 KB of extra stores is a grand total of 128 additional nanoseconds.

In comparison, the average context switch time of Linux is already on the order of ~1,000 nanoseconds [1]. We can also see additional confirmation of the store overhead I computed as that page measures ~3 us for a 64 KB memcpy (load + store) which would be ~187 nanoseconds for 4 KB (load + store) where as the context saving operation is a 2 KB register -> store and a 2 KB load -> register.

So, your system call overhead only increases by 10 percentage points. Assuming you have a reasonably designed system that does not kill itself with system call overhead, spending the majority of the time actually handling the service request, then it constitutes a miniscule performance cost. For example, if you spend 90% of the time executing the service code, with only 10% in the actual overhead, then you only incur a 1% performance hit.

[1] https://eli.thegreenplace.net/2018/measuring-context-switchi...


I agree that a well-designed system for most use cases won't have performance issues, since we should not just be optimizing context switches but also things like kernel bypass mechanisms, mechanisms like io_uring, and various application-guided policies that will reduce context switches. Context switches are always a problem (the essential complexity of having granular protection), and moving an extra 4KB is not negligible depending on the workload, but we are not out of options. It will take more programmer effort, is all.


I appreciate your criticism; it is reasonable and relevant in the present and the future. I wrote a reply nearby in the thread[0]. I think it's mostly solveable, but I agree it's not trivial.

[0] https://news.ycombinator.com/item?id=43456628


>AVX-512 and other x86 warts

While seL4 runs on x86, it is not recommended.

Effort seems to be focused on ARM and RISC-V, particularly the latter.

An argument could be made that x86's baggage makes it a particularly bad fit for offering the sort of guarantees seL4 does.


High-assurance security requires kernels clear or overwrite all shared state. That could become a covert, storage channel if one partition can write it and another read it. If so, it should be overwritten.


I’m not sure that seL4 even supports NUMA, so there are other tradeoffs to consider.


I think NUMA management is high level enough that in a microkernel it would be comfortably managed in userspace, unlike things relevant to performance-critical context switches. And seL4 is currently intended only for individual cores anyways.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: