Hacker News

I know the patch mentions interactive multimedia applications (games) in particular, but an actual mechanism to implement WaitForMultipleObjects on Linux would be very welcome for many high-performance multi-threaded applications.

Say you have one worker thread per CPU core. On Windows, each thread would get an Event object and you would call WaitForMultipleObjects to act on the first unit of work that was complete. On Linux you would have to roll your own solution using lower-level primitives, and it is easy to get wrong. Being able to wait on multiple events on Linux would be awesome.




Linux already has what you’re talking about with eventfd and epoll.

In Linux, each thread can get an eventfd and you can POLLIN on all of them.

In fact I would argue that futexes are the “roll your own solution” using lower-level primitives (and easier to fuck up), much more so than eventfd and epoll.

As mentioned somewhat poorly in the post, using futexes gives a performance boost, which is not surprising since they are fast userspace mutexes. FWIW I didn't think Windows events had a fast user-space path, but I may be mistaken.

For most worker pool scenarios you’re describing, the overhead of eventfd is probably in the noise.


You’re talking about interfaces for waiting on multiple kernel resources but the new futex interface enables you to wait for multiple user resources.

Though it can emulate the Win32 API for waiting on multiple “objects”, it’s strictly more powerful than WaitForMultipleObjects when you are dealing with user objects, since futexes impose very few constraints on how your user synchronization object is shaped and how it works.

So, the new interface is totally different from things like epoll. In one case the kernel is helping you wait for multiple user objects, and in the other case it’s helping you wait for multiple kernel objects. The distinction is intentional, because the whole point is that the user object that has the futex can be shaped however the user likes, and can implement whatever synchronization protocol the user likes.

Finally, it’s worth remembering that futex interfaces are all about letting you avoid going into the kernel unless there is actually something to wait for. The best part of the API is that it helps you avoid calling it. So for typical operations, if the resource being waited on can have its wait state represented as a 32-bit int in user memory, the futex-based APIs will be a lot faster.


They point out that they already have an implementation that does just this... and it fails on some programs due to running out of file descriptors (they have one program that needs ~1 million of them).


If you read the full thread, that is a bit of a red herring and beside the point (that's why I said the conveyance of the performance implication was poor)... indeed, Windows WFMO only supports 64 objects per call. They mention that the fd issue is due to leaking objects in many Windows programs, which was an odd mention and a little off the main subject. The main motivator is performance. If eventfds performed better, it would likely be better to fix the fd leak issue with a cache.

Again... eventfd and epoll cover the same use case as WFMO and Events.


Curious, how would a cache fix the fd leak issue?


Perhaps a better term would be “pool”. Anyway, what’s being leaked is “handles” or events, not actually fds. You only need as many fds as the maximum number that can be passed to a single wait call. The mapping of handles/event objects in user space does not have to be 1:1 with the kernel resource.


I recall higher performance browsers also use up large numbers of FDs; I suspect it might be for this very reason.


Yes, and you have to cobble together an event implementation out of eventfd and epoll. There are two problems (specifically for multi-platform software):

1. You'll likely get it wrong and have subtle bugs.

2. This is significantly different from the Windows model, where you wait on events. Now you have two classes of events: regular ones, and ones that can be waited on as a group. The second class also comes with its own event-manager class that needs to manage the eventfd for this group of events.

You end up with a specialised class of event that needs to be used whenever you need to wait on several of them at once. Then you realise you used a normal POSIX event somewhere else and now you want to wait on that as well, so you have to rewrite parts of your program to use your special multi-waitable event.

It's mostly trivial to write an event wrapper on top of POSIX primitives that behaves the same as Windows Events, except for the part where you might want to wait on multiple of them. I would expect that once this kernel interface is implemented we'll get GNU extensions in glibc allowing for waiting on multiple POSIX events. I absolutely do not want to roll my own thread synchronisation primitives, except for very thin wrappers over the platform-specific primitives. Rolling your own synchronisation primitives is about as fraught with peril as rolling your own crypto.

To be honest, WaitForMultipleObjects will probably become less useful in the near future. We're getting 32-core workstation CPUs today, and it's quite likely there will be workstation CPUs with more than 64 cores soon, making it impossible to use this classic Windows primitive with one event per core (it is limited to 64 handles per call), but I suspect Microsoft will provide a WaitForMultipleObjectsEx2.


On Linux your workers would push the work onto a single output queue, or could signal a condition variable pointed to by the work item. I've never really felt the need for WaitForMultiple.



