
> Lock free concurrency is typically via spinning and retrying, suboptimal when you have real contention.

Lock-free concurrency is typically done by distributing the contention across multiple memory locations / actors, so that at least the happy path is wait-free. Simple compare-and-set retry schemes have limited utility.
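A minimal sketch of that distribution idea in C (the slot count and the thread-id-based slot choice are just illustrative): instead of every thread retrying a CAS on one hot counter, each thread bumps its own cache-line-padded slot and a reader sums the slots, so the write path is a single fetch-add with no retry loop.

    #include <stdatomic.h>

    #define NSLOTS 16

    /* one padded counter per slot so writers don't share a cache line */
    struct slot { _Atomic long count; char pad[64 - sizeof(_Atomic long)]; };
    static struct slot slots[NSLOTS];

    /* wait-free on the happy path: a single fetch-add, no retry loop */
    void counter_inc(int thread_id) {
        atomic_fetch_add(&slots[thread_id % NSLOTS].count, 1);
    }

    /* the reader pays the cost of walking the slots instead */
    long counter_read(void) {
        long sum = 0;
        for (int i = 0; i < NSLOTS; i++)
            sum += atomic_load(&slots[i].count);
        return sum;
    }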

Also, actual lock implementations at the very least start by spinning and retrying, falling back to a scheme where the threads are put to sleep after a number of failed retries. More advanced "optimistic locking" schemes are available for the cases in which you have no contention, but those have decreased performance under contention.
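A minimal sketch of that spin-then-sleep pattern (assumes Linux futexes; real locks such as glibc's track whether waiters exist instead of issuing a wake on every unlock):

    #include <stdatomic.h>
    #include <linux/futex.h>
    #include <sys/syscall.h>
    #include <unistd.h>

    typedef struct { atomic_int state; } hybrid_lock;   /* 0 = free, 1 = held */

    void hybrid_lock_acquire(hybrid_lock *l) {
        /* phase 1: bounded spinning, no kernel involvement */
        for (int i = 0; i < 100; i++) {
            int expected = 0;
            if (atomic_compare_exchange_weak(&l->state, &expected, 1))
                return;
        }
        /* phase 2: contended, so sleep in the kernel until woken */
        int expected = 0;
        while (!atomic_compare_exchange_weak(&l->state, &expected, 1)) {
            syscall(SYS_futex, &l->state, FUTEX_WAIT, 1, NULL, NULL, 0);
            expected = 0;
        }
    }

    void hybrid_lock_release(hybrid_lock *l) {
        atomic_store(&l->state, 0);
        /* simplification: always wake one potential waiter */
        syscall(SYS_futex, &l->state, FUTEX_WAKE, 1, NULL, NULL, 0);
    }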

> Handing off means to stop using it and letting someone else use it. Only copy in rare cases.

You can't just let "someone else use it", because blocks of memory are usually managed by a single process. Transferring control of a block of memory to another process is a recipe for disaster.

Of course there are copy-on-write schemes, but note that they are managed by the kernel and they don't work in the presence of garbage collectors or more complicated memory pools. In essence, the problem is that if you're not in charge of a memory location for its entire lifetime, then you can't optimize access to it.

In other words, if you want to share data between processes, you have to stream it. And if those processes have to cooperate, then data has to be streamed via pipes.
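A minimal sketch of that streaming approach (assumes POSIX; the message and buffer size are just illustrative): the parent writes its buffer into a pipe and the child reads out its own copy, so neither process ever touches the other's memory.

    #include <stdio.h>
    #include <string.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(void) {
        int fds[2];
        pipe(fds);
        const char *msg = "data produced by the parent";

        if (fork() == 0) {                 /* child: consumes the stream */
            close(fds[1]);
            char buf[128];
            ssize_t n = read(fds[0], buf, sizeof(buf) - 1);
            buf[n > 0 ? n : 0] = '\0';
            printf("child got: %s\n", buf);
            _exit(0);
        }

        close(fds[0]);                     /* parent: streams the data, then lets go */
        write(fds[1], msg, strlen(msg));
        close(fds[1]);
        wait(NULL);
        return 0;
    }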

> High performance applications get the kernel out of the way because it slows things down.

Not because the kernel itself is slow, but because system calls are. System calls are expensive because they cause context switches, thrash caches, and introduce latency when they block on I/O. So the performance of the kernel has nothing to do with it.

You know what else introduces unnecessary context switches? Having multiple processes running in parallel. Within a single process making use of multiple threads, you can instead introduce scheduling schemes (aka cooperative multi-threading) that are optimal for your process.
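A minimal sketch of that cooperative idea (assumes POSIX ucontext, which is obsolescent but still widely available; production runtimes use cheaper hand-rolled context switches): a task and a tiny "scheduler" loop hand control back and forth inside one process, without ever entering the kernel scheduler.

    #include <stdio.h>
    #include <ucontext.h>

    static ucontext_t main_ctx, task_ctx;

    static void task(void) {
        for (int i = 0; i < 3; i++) {
            printf("task step %d\n", i);
            swapcontext(&task_ctx, &main_ctx);   /* yield back to the "scheduler" */
        }
    }

    int main(void) {
        static char stack[64 * 1024];
        getcontext(&task_ctx);
        task_ctx.uc_stack.ss_sp = stack;
        task_ctx.uc_stack.ss_size = sizeof(stack);
        task_ctx.uc_link = &main_ctx;            /* where to go if the task returns */
        makecontext(&task_ctx, task, 0);

        for (int i = 0; i < 3; i++) {
            swapcontext(&main_ctx, &task_ctx);   /* user-space switch, no kernel scheduling */
            printf("scheduler resumed after step %d\n", i);
        }
        return 0;
    }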




System calls are not the reason the kernel is bypassed. The cost of the system calls is fixable: for example, it is possible to batch them together into a single system call at the end of the event loop iteration, or even to share a ring buffer with the kernel and talk to it the same way high-performance apps talk to the NIC.

The real problem is that the kernel itself doesn't have high-performance architecture, subsystems, drivers, I/O stacks, etc., so you can't get far using it and there is no point investing time into it. And it is this way because a monolithic kernel doesn't push developers into designing architectures and subsystems that talk to each other purely asynchronously with batching; instead, crappy shared-memory designs are adopted because they feel easier to monolithic-kernel developers, while in fact being both harder and slower for everyone.
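For what it's worth, Linux's io_uring is one existing form of that shared-ring-buffer scheme. A minimal sketch using liburing (the file path and queue depth are arbitrary) queues a read, submits everything queued with one system call, and picks the completion out of the ring shared with the kernel:

    #include <fcntl.h>
    #include <liburing.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void) {
        struct io_uring ring;
        io_uring_queue_init(8, &ring, 0);            /* SQ/CQ rings shared with the kernel */

        int fd = open("/etc/hostname", O_RDONLY);    /* arbitrary file, just for illustration */
        char buf[256];

        struct io_uring_sqe *sqe = io_uring_get_sqe(&ring);
        io_uring_prep_read(sqe, fd, buf, sizeof(buf), 0);  /* queued, no syscall yet */
        io_uring_submit(&ring);                      /* one syscall submits the whole batch */

        struct io_uring_cqe *cqe;
        io_uring_wait_cqe(&ring, &cqe);              /* completion arrives via the shared ring */
        printf("read %d bytes\n", cqe->res);
        io_uring_cqe_seen(&ring, cqe);

        io_uring_queue_exit(&ring);
        close(fd);
        return 0;
    }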



