Independent threads run as fast as a single core for as long as they stay independent; they only slow down overall once synchronization is introduced. Amdahl's law means the sequential part of the work eventually dominates total time, no matter how much you speed up the parallel part.
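Rough back-of-the-envelope sketch of what Amdahl's law implies here (my own illustration, the numbers are made up, not measured):

    // Amdahl's law: with serial fraction s, speedup on n threads is
    // capped at 1 / (s + (1 - s)/n).
    public class Amdahl {
        static double speedup(double serialFraction, int threads) {
            return 1.0 / (serialFraction + (1.0 - serialFraction) / threads);
        }
        public static void main(String[] args) {
            // e.g. 5% serial work on 11 threads gives ~7.3x, and the ceiling as
            // threads -> infinity is 1 / 0.05 = 20x, however many cores you add.
            System.out.printf("%.2fx%n", speedup(0.05, 11));
        }
    }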
My design is to shard by thread. Each thread or machine maintains a sorted shard of the data. Erlang takes a similar approach. If you want a totally ordered view, you do an N-way merge sort across the shards.
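The total order view is roughly this shape; the List<Long> shards here are just placeholders for whatever the per-thread structure actually is, this only sketches the merge idea:

    import java.util.*;

    // N-way merge over per-thread sorted shards using a min-heap of shard cursors.
    public class NWayMerge {
        public static List<Long> merge(List<List<Long>> shards) {
            // Heap ordered by each shard's current head element.
            PriorityQueue<Cursor> heap =
                new PriorityQueue<>(Comparator.comparingLong((Cursor c) -> c.head));
            for (List<Long> shard : shards) {
                Iterator<Long> it = shard.iterator();
                if (it.hasNext()) heap.add(new Cursor(it));
            }
            List<Long> out = new ArrayList<>();
            while (!heap.isEmpty()) {
                Cursor top = heap.poll();          // smallest current head wins
                out.add(top.head);
                if (top.advance()) heap.add(top);  // re-insert while the shard has more
            }
            return out;
        }

        // Wraps an iterator so the heap can compare the next (head) element.
        private static final class Cursor {
            final Iterator<Long> it;
            long head;
            Cursor(Iterator<Long> it) { this.it = it; this.head = it.next(); }
            boolean advance() {
                if (!it.hasNext()) return false;
                head = it.next();
                return true;
            }
        }
    }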
I have a lock-free algorithm that can do about 1.2 million synchronizations a second across 11 threads. My lock benchmark does 3,444,152 (~3.4 million) requests per second with 11 threads, so my lock-free algorithm isn't as good as locks. My lock-free algorithm doesn't block, though.
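To be concrete about the two shapes being compared, this is the general pattern (a simplified Java sketch, not the actual benchmark code):

    import java.util.concurrent.atomic.AtomicLong;
    import java.util.concurrent.locks.ReentrantLock;

    // A lock-free CAS retry loop vs. the same update guarded by a mutex.
    public class SyncShapes {
        private final AtomicLong lockFreeCounter = new AtomicLong();
        private long lockedCounter;
        private final ReentrantLock lock = new ReentrantLock();

        // Lock-free: retry compare-and-set until it wins; never blocks,
        // but every failed retry is wasted work under heavy contention.
        void lockFreeIncrement() {
            long current;
            do {
                current = lockFreeCounter.get();
            } while (!lockFreeCounter.compareAndSet(current, current + 1));
        }

        // Lock-based: a thread may block waiting for the lock, but once inside
        // the critical section the update itself is a plain write.
        void lockedIncrement() {
            lock.lock();
            try {
                lockedCounter++;
            } finally {
                lock.unlock();
            }
        }
    }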
I'm thinking of calibrating the message rate: accept some extra latency to increase throughput and limit interference on the lock. If I raise the batch to 2,500-25,000 messages, then more messages get transferred per synchronization event.
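Something like this, where each producing thread buffers locally and a single lock acquisition hands over the whole batch (the batch size, message type and shared queue here are placeholders, not my actual numbers or data structures):

    import java.util.ArrayDeque;
    import java.util.ArrayList;
    import java.util.List;

    // One BatchedSender per producing thread: buffer locally, transfer a whole
    // batch per synchronization event.
    public class BatchedSender {
        static final int BATCH_SIZE = 2_500;   // tune somewhere in the 2,500-25,000 range
        private final List<String> localBuffer = new ArrayList<>(BATCH_SIZE);
        private final ArrayDeque<List<String>> shared = new ArrayDeque<>();

        // Called by the producing thread; cheap until the buffer fills.
        void send(String msg) {
            localBuffer.add(msg);
            if (localBuffer.size() >= BATCH_SIZE) {
                flush();
            }
        }

        // One lock acquisition moves BATCH_SIZE messages, so the per-message
        // synchronization cost drops by roughly that factor, at the price of
        // added latency for messages sitting in the buffer.
        void flush() {
            List<String> batch = new ArrayList<>(localBuffer);
            localBuffer.clear();
            synchronized (shared) {
                shared.add(batch);
            }
        }
    }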
I think this is to be expected. If you have a lock anywhere, the risk is that the lock becomes the bottleneck at some point as you scale to more and more concurrently running processes wanting to acquire it.
Why merge them at all? Can a lazy merge increase performance, given that you are usually iterating over the shards anyway, so at that point most of the work is done by the iteration and not by the merge? That gives you some time to work out which thread/machine to pull from next while the main thread works on the result of the last iteration (i.e., stay one step ahead of the iterator). You could even buffer/stream values from each thread much faster than any code could do real work on them.
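Something like a streaming k-way merge: each shard thread pushes its sorted values into a bounded queue and the consumer only pulls the next smallest element on demand. The queue type, sentinel value and element type here are made up for illustration:

    import java.util.*;
    import java.util.concurrent.BlockingQueue;

    // Lazy merge over shards streamed through bounded queues; the consumer pulls
    // one element per next() call instead of materializing a merged result.
    public class LazyMerge implements Iterator<Long> {
        private static final long POISON = Long.MAX_VALUE;   // end-of-shard marker

        private final PriorityQueue<long[]> heap =            // entries: {value, shardIndex}
            new PriorityQueue<>(Comparator.comparingLong((long[] e) -> e[0]));
        private final List<BlockingQueue<Long>> queues;

        public LazyMerge(List<BlockingQueue<Long>> queues) throws InterruptedException {
            this.queues = queues;
            for (int i = 0; i < queues.size(); i++) {
                long v = queues.get(i).take();                // first element of each shard
                if (v != POISON) heap.add(new long[]{v, i});
            }
        }

        @Override public boolean hasNext() { return !heap.isEmpty(); }

        @Override public Long next() {
            long[] top = heap.poll();
            try {
                long replacement = queues.get((int) top[1]).take();   // refill from same shard
                if (replacement != POISON) heap.add(new long[]{replacement, top[1]});
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return top[0];
        }
    }

Each shard thread just put()s its sorted values followed by the POISON marker, so producers can run ahead of the consumer up to the queue capacity while the main thread is busy with the last result.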
In my experience, lock-free solutions tend not to do well when threads share a physical core (i.e., run on sibling virtual cores rather than separate physical ones), but YMMV.