I thought "distributed" would mean distributed over the network, which I could definitely make use of! (Or perhaps I should just use zookeeper or equivalent for that ;)
Anyway, in this case the term "distributed" is being used to describe a mechanism which reduces memory contention when Go is utilizing multiple cores on one machine.
I'd love to see more exhaustive analysis of the performance implications for this technique across a wide variety of usage scenarios.
N00b question here but when people refer to mutexes, they are usually talking about multiple cores on a single machine right? Isn't there a different terminology for locks taken across the network in distributed systems?
I don't think I've ever heard the word mutex used for synchronizing access to data over a network. Mutex kind of implies that there's the synchronization primitive and there's the data, and both can be touched separately; it's just an agreement that we touch the mutex first. However, network protocols usually just transfer updates to/from the data and then let the server do the synchronization and access pairing for each connection.
I don't think it would make sense at all to run a "mutex service" on one server and ask clients to grab that mutex before accessing a separate "data service", since the data service itself would happily accept updates from anyone without any synchronization at all.
¹ or just multiple tasks, in general. Even a single-core machine needs mutexed access to shared data because several tasks could try to concurrently use that memory location and the scheduler could switch tasks at a critical point. However, if you're strictly limited to a single core, and the system allows it, you can just disable interrupts while you're manipulating the data.
> I don't think it would make sense at all to run a "mutex service" on one server and asking clients to grab that mutex before accessing another "data service"
I first came across one in the context of a distributed cache which sat on top of a database. I'm not at all convinced it was a good design, but it was there!
No. They are usually talking about two threads accessing the same data.
Let's suppose we have a shared int and want to add 2 to it. We would do:
1. int i = sharedInt
2. i = i + 2
3. sharedInt = i
(That's what the compiler actually does when you write sharedInt += 2.)
If the runtime decides to stop the current thread after step 2 and lets a second thread run all three steps, we would lose an update unless we used a mutex.
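Here's a minimal Go sketch of the same thing, with the race fixed by a mutex (the counter name and goroutine count are just for illustration):

```go
package main

import (
	"fmt"
	"sync"
)

// sharedInt and its mutex; names are illustrative only.
var (
	mu        sync.Mutex
	sharedInt int
)

func addTwo() {
	// Without the lock, the read/add/write below can interleave
	// between goroutines and updates get lost.
	mu.Lock()
	i := sharedInt // 1. read the shared value
	i = i + 2      // 2. compute locally
	sharedInt = i  // 3. write it back
	mu.Unlock()
}

func main() {
	var wg sync.WaitGroup
	for n := 0; n < 1000; n++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			addTwo()
		}()
	}
	wg.Wait()
	fmt.Println(sharedInt) // always 2000 with the mutex; racy without it
}
```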
I don't know but I think so. I just wanted to give him an oversimplified example (you could also use any complicated computation).
I'm not sure, though, if every compiler/interpreter makes use of them (what about VMs)?
This is available only under certain architectures (x86 is one of them as you've noted), but I've also seen scenarios where the atomic increment instruction is actually slower than using a mutex. Don't consider this feature to be a magic bullet and always test your use cases!
As for the parent discussion, usually a mutex is talking about the threading construct while a "lock service" is how you'd refer to something like etcd or zookeeper.
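On the "always test" point: a minimal Go benchmark sketch for comparing an atomic add against a mutex-protected increment under varying parallelism (the benchmark names and setup are illustrative, not from the article):

```go
package counter_test

import (
	"sync"
	"sync/atomic"
	"testing"
)

// Run with: go test -bench=. -cpu=1,4,16
// Measure your real workload; a toy counter only shows the raw primitives.

func BenchmarkAtomicAdd(b *testing.B) {
	var n int64
	b.RunParallel(func(pb *testing.PB) {
		for pb.Next() {
			atomic.AddInt64(&n, 1)
		}
	})
}

func BenchmarkMutexAdd(b *testing.B) {
	var (
		mu sync.Mutex
		n  int64
	)
	b.RunParallel(func(pb *testing.PB) {
		for pb.Next() {
			mu.Lock()
			n++
			mu.Unlock()
		}
	})
}
```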
Mutexes are typically implemented over atomic instructions. So you'd do something like an atomic compare/exchange to acquire the mutex, and if there's no contention you've got it. If there is contention you fall back to the OS's synchronization constructs, which are typically much slower. An atomic increment should always be faster than acquiring a mutex and incrementing.
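To make that fast path concrete, here is a toy spinlock built on a compare-and-swap from sync/atomic. This is only a sketch of the idea; real mutexes (including Go's sync.Mutex) park the thread via the OS under contention instead of spinning:

```go
package spinlock

import (
	"runtime"
	"sync/atomic"
)

// SpinLock is a toy lock built on an atomic compare-and-swap.
type SpinLock struct {
	state int32 // 0 = unlocked, 1 = locked
}

func (l *SpinLock) Lock() {
	// Fast path: an uncontended CAS acquires the lock in one atomic op.
	for !atomic.CompareAndSwapInt32(&l.state, 0, 1) {
		// Contended: yield and retry (a real mutex would park the thread).
		runtime.Gosched()
	}
}

func (l *SpinLock) Unlock() {
	atomic.StoreInt32(&l.state, 0)
}
```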
So what happens when you find yourself running on a CPU that wasn't in your initial affinity mask?
Also, the "sleep for 1 ms" approach used in `init()` looks wrong -- if `sched_setaffinity()` doesn't guarantee that the calling task has been migrated to one of the target CPUs on return (which I suspect it does), I don't think sleeping for a millisecond is going to change anything.
You'll acquire CPU 0's lock (since a map lookup with an invalid key yields the zero value). I agree this isn't optimal when you change the affinity of a process after it has started. You could imagine a scheme where, if this happens, you create a new lock, but that would significantly complicate things as you would now potentially need to take a read lock on the map in case it changes under you. It's annoying that CPU ID values aren't guaranteed to be without holes, but that's what we're stuck with.
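For context, a rough sketch of the per-CPU lock pattern being discussed, with the zero-value fallback to CPU 0's lock. This is not the article's actual code; it assumes Linux and that golang.org/x/sys/unix exposes a Getcpu wrapper around getcpu(2):

```go
package cpulock

import (
	"sync"

	"golang.org/x/sys/unix"
)

// CPULocks holds one lock per CPU in the initial affinity mask.
// lockIndex maps a CPU ID to a slot in locks; looking up a CPU ID that
// wasn't in the mask yields the zero value 0, i.e. CPU 0's slot.
type CPULocks struct {
	lockIndex map[int]int
	locks     []sync.Mutex
}

func New(cpus []int) *CPULocks {
	c := &CPULocks{
		lockIndex: make(map[int]int, len(cpus)),
		locks:     make([]sync.Mutex, len(cpus)),
	}
	for slot, id := range cpus {
		c.lockIndex[id] = slot
	}
	return c
}

// Lock acquires the lock for the CPU the goroutine is currently running on.
// unix.Getcpu wraps getcpu(2) on Linux; error handling is simplified here.
func (c *CPULocks) Lock() *sync.Mutex {
	cpu, _, err := unix.Getcpu()
	if err != nil {
		cpu = 0
	}
	l := &c.locks[c.lockIndex[cpu]] // unknown CPU ID -> slot 0 -> CPU 0's lock
	l.Lock()
	return l
}
```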
Yeah, the sleep is a leftover from an earlier version of the code that didn't use sched_setaffinity. I've removed it now.