Perhaps you’re thinking about SMT (e.g. Intel Hyper Threading) when you say “proper multithreading”?
I’m not sure it’s valid to say that only SMT is “proper multithreading”, especially since multithreading as a concept predates it by quite a way.
SMT has quite a few performance issues, since resources such as the L1 and L2 caches and the branch predictor are shared between the threads, which can lead to contention that hurts the performance of all the SMT threads sharing a physical core.
SMP is no less “proper”, and as core counts have increased significantly on commodity CPUs, the use of spinning threads bound to a single core each has become a common paradigm.
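To make the thread-per-core idea concrete, here's a minimal sketch that spawns one worker per core and pins each to its own core. It assumes Linux's `os.sched_setaffinity` (other platforms need a different API, hence the `hasattr` guard), and the worker body is just a placeholder:

```python
# Sketch: one worker thread pinned per core.
# os.sched_setaffinity is Linux-only, so we guard with hasattr;
# on other platforms the workers simply run unpinned.
import os
import threading

def worker(core_id, results):
    if hasattr(os, "sched_setaffinity"):
        # pid 0 means "the calling thread" for this syscall.
        os.sched_setaffinity(0, {core_id})
    # Placeholder for the real per-core work loop.
    results[core_id] = True

n_cores = os.cpu_count() or 1
results = {}
threads = [threading.Thread(target=worker, args=(i, results))
           for i in range(n_cores)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

In a real spinning-thread design each worker would busy-poll its own queue instead of returning, but the affinity call is the part that keeps a thread from migrating between cores.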
Oversubscription without SMT (i.e. many threads per core) is possible, but unless you have a workload where each thread is I/O bound with a substantial amount of time spent blocking, the overhead of scheduling and context switching means throughput will likely decrease.
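The I/O-bound case is easy to demonstrate: when threads spend their time blocked, the OS overlaps the waits, so oversubscription costs little. A small sketch (the `time.sleep` stands in for a blocking read):

```python
# Sketch: 32 "I/O-bound" threads on however few cores you have.
# Each blocks for 50 ms; because the waits overlap, total wall
# time stays close to 50 ms rather than 32 * 50 ms.
import threading
import time

def io_bound_task():
    time.sleep(0.05)  # stand-in for a blocking recv()/read()

start = time.monotonic()
threads = [threading.Thread(target=io_bound_task) for _ in range(32)]
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.monotonic() - start
```

Swap the sleep for a CPU-bound loop and the picture inverts: the threads contend for cores and the scheduling overhead is pure loss.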
All SMT does is provide multiple architectural thread contexts (instruction pointers and register state) on the same superscalar core. It increases utilization of the execution units and can therefore increase throughput.
Of course it increases latency, since those resources are not fully exclusive to a particular thread anymore.
Whether or not it's a good thing depends on what you care about. You could also argue that a good program would be able to saturate a single superscalar core with a single thread and thus wouldn't benefit from SMT at all, but I think that would be hard to guarantee in practice.
Well sure, but why would that make it "improper multithreading"? Is polymorphism based on vtables not "proper OOP"? We rely on many abstractions that aren't free in terms of CPU cycles because it makes development easier or less error prone.
And the cost of setting up, say, one thread per HTTP request will likely be negligible, because blocking I/O is where the time is spent anyway.
Any networking program doing blocking I/O is doing it wrong.
Your I/O should only be done synchronously if it's non-blocking.
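A minimal sketch of that pattern using the stdlib `selectors` module: the code is synchronous (no callbacks, no async runtime), but the socket is non-blocking and is only read once the selector reports it ready. The `socketpair` stands in for a network peer:

```python
# Sketch: synchronous, non-blocking I/O via readiness notification.
import selectors
import socket

a, b = socket.socketpair()  # stand-in for a connected network socket
a.setblocking(False)
b.setblocking(False)

sel = selectors.DefaultSelector()
sel.register(a, selectors.EVENT_READ)

b.sendall(b"hello")

received = b""
# Synchronous readiness loop: recv() is only called once select()
# says it won't block, so the thread never sleeps inside recv().
for key, events in sel.select(timeout=1.0):
    received = key.fileobj.recv(1024)

sel.close()
a.close()
b.close()
```

In a real server the loop would run forever and multiplex many registered sockets; the point is that "synchronous" and "blocking" are orthogonal.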
Disk I/O is muddier. It's actually quite different from networking, since it's more transparently managed by the operating system.
My limited understanding is that there's less cache thrashing (from multiple different workloads scheduled on the same core) and less scheduler overhead (from fewer threads overall).
Just to add that scheduling overhead goes away with SMT (assuming you don’t oversubscribe), but the sharing of caches and branch prediction logic is still an issue as you point out.
But how is this kind of multithreading (one thread per core) better than proper multithreading (many threads per core)?