
Not to dismiss anything or what..

but how is this kind of multithreading (one thread per core) better than proper multithreading (many threads per core)?




Perhaps you’re thinking about SMT (e.g. Intel Hyper Threading) when you say “proper multithreading”?

I’m not sure it’s valid to say that only SMT is “proper multithreading”, especially since multithreading as a concept predates it by quite a way.

SMT has quite a few performance issues since resources such as the L1, L2, and branch predictor are shared between the threads, which can lead to contention that hurts the performance of all the SMT threads sharing a physical core.

SMP is no less “proper”, and as core counts have increased significantly on commodity CPUs, the use of spinning threads bound to a single core each has become a common paradigm.
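To make the thread-per-core pattern concrete: on Linux, pinning the current thread to a single CPU can be sketched with os.sched_setaffinity (a minimal, Linux-only sketch; CPU numbering and the set of allowed CPUs vary by system):

```python
import os

def pin_current_thread(core: int) -> None:
    # Linux-only: pid 0 means the calling thread.
    os.sched_setaffinity(0, {core})

# Pin to the first CPU this process is allowed to run on.
first_cpu = min(os.sched_getaffinity(0))
pin_current_thread(first_cpu)
print(os.sched_getaffinity(0))  # now a single-element set
```

A real thread-per-core runtime would do this once per worker thread at startup, so each worker stays on its core and keeps its caches warm.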

Oversubscription without SMT (i.e. many threads per core) is possible, but unless you have a workload where each thread is I/O bound with a substantial amount of time spent blocking, the overhead of scheduling and context switching means throughput will likely decrease.
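The sizing rule that follows from this can be sketched with a thread pool (the 8x multiplier for I/O-bound work is a common heuristic, not a hard rule):

```python
import os
from concurrent.futures import ThreadPoolExecutor

# CPU-bound work: one worker per core avoids oversubscription.
# I/O-bound work: more workers than cores can help, since each
# thread spends most of its time blocked rather than computing.
cpu_workers = os.cpu_count() or 1
io_workers = cpu_workers * 8  # heuristic for blocking-I/O workloads

with ThreadPoolExecutor(max_workers=cpu_workers) as pool:
    results = list(pool.map(lambda x: x * x, range(16)))
print(results[:4])  # [0, 1, 4, 9]
```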


All SMT does is allow multiple instruction counters on the same superscalar core. It increases utilization of all compute units and therefore increases throughput.

Of course it increases latency, since those resources are not fully exclusive to a particular thread anymore.

Whether or not it's a good thing depends on what you care about. You could also argue that a good program would be able to saturate a single superscalar core with a single thread and thus wouldn't benefit from SMT at all, but I think that would be hard to guarantee in practice.


I don’t disagree, I’m just trying to unpack what the GP might have meant by “proper multithreading”.


Doing multiple threads per core is not "proper multithreading", since it's a well-known antipattern.


..it's not good because it's bad?

Why is it an antipattern? This is news to me.


More threads on a single core mean more context switches, which reduce the effective number of instructions you can process.


Well sure, but why would that make it "improper multithreading"? Is polymorphism based on vtables not "proper OOP"? We rely on many abstractions that aren't free in terms of CPU cycles because it makes development easier or less error prone.

And the cost of setting up, say, one thread per HTTP request will likely be negligible, because blocking I/O is where the time is spent anyway..


It's not improper, it's just non-optimal.

And we have had non-blocking I/O for quite some time now.
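For illustration, here is a minimal sketch of readiness-based non-blocking I/O using Python's selectors module, with a socketpair standing in for a network connection:

```python
import selectors
import socket

# A connected socket pair simulates client and server ends.
a, b = socket.socketpair()
a.setblocking(False)
b.setblocking(False)

sel = selectors.DefaultSelector()
sel.register(b, selectors.EVENT_READ)

a.send(b"ping")
for key, _ in sel.select(timeout=1.0):
    # select() said this socket is readable, so recv() won't block.
    data = key.fileobj.recv(1024)
print(data)  # b'ping'
```

An event loop generalizes this: register many sockets, wait for whichever become ready, and only then do the (now non-blocking) reads and writes.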


Any networking program doing blocking I/O is doing it wrong.

Your I/O should only be done synchronously if it's non-blocking.

Now disk I/O is muddier; it's actually quite different from networking since it's more transparently managed by the operating system.


Kernel threads do not scale, and neither does the scheduler.

Userland threads (or fibers, or stackful coroutines) do scale better though.
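The scaling difference is easy to demonstrate with asyncio coroutines, Python's flavor of userland tasks: spawning tens of thousands is routine, where the same number of kernel threads would be prohibitive.

```python
import asyncio

async def worker(n: int) -> int:
    await asyncio.sleep(0)  # yield to the userland scheduler
    return n

async def main() -> int:
    # 10,000 tasks, all multiplexed onto a single kernel thread.
    tasks = [asyncio.create_task(worker(i)) for i in range(10_000)]
    results = await asyncio.gather(*tasks)
    return sum(results)

total = asyncio.run(main())
print(total)  # 49995000
```

Each task here is just a small heap object plus a suspended stack frame, so switching between them is a function call in userland rather than a trip through the kernel scheduler.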


My limited understanding is that there's less cache thrashing (from multiple different workloads scheduled on the same core) and less scheduler overhead (from fewer threads overall).


Just to add that scheduling overhead goes away with SMT (assuming you don’t oversubscribe), but the sharing of caches and branch prediction logic is still an issue as you point out.





