
I can't help but wonder if the author of the article has seen Quasar: http://docs.paralleluniverse.co/quasar/ It seems to be a concrete refutation of his claims of impossibility. Quasar successfully brings green threads to the JVM. It includes both channels, as now popularized by golang, and higher-level patterns like actors. Despite the framework's youth, benchmarks show it comparing reasonably well with both golang and Erlang. Quasar also provides libraries that step all the way up into OTP territory with supervision trees (though I haven't used those myself, yet).
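For anyone who hasn't looked at it yet, here's roughly what fiber-plus-channel code looks like with Quasar. This is a sketch based on the class and method names in the docs linked above (Fiber, Channels.newChannel, send/receive), not a verified snippet, and the classes need to run under Quasar's java agent so the bytecode weaving discussed below can instrument the suspendable methods:

    import co.paralleluniverse.fibers.Fiber;
    import co.paralleluniverse.fibers.SuspendExecution;
    import co.paralleluniverse.strands.channels.Channel;
    import co.paralleluniverse.strands.channels.Channels;

    public class HelloFibers {
        public static void main(String[] args) throws Exception {
            // A small buffered channel shared by two fibers.
            final Channel<String> ch = Channels.newChannel(16);

            Fiber<Void> producer = new Fiber<Void>() {
                @Override
                protected Void run() throws SuspendExecution, InterruptedException {
                    for (int i = 0; i < 5; i++)
                        ch.send("msg-" + i);   // blocks only the fiber, not a kernel thread
                    return null;
                }
            }.start();

            Fiber<Void> consumer = new Fiber<Void>() {
                @Override
                protected Void run() throws SuspendExecution, InterruptedException {
                    for (int i = 0; i < 5; i++)
                        System.out.println(ch.receive());
                    return null;
                }
            }.start();

            producer.join();
            consumer.join();
        }
    }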

The article mentions bytecode weaving, but dismisses it with rather shaky justifications. Bytecode manipulation tools are a well-established part of the JVM ecosystem; frankly, they're a big part of why I consider that ecosystem so successful. Bytecode manipulation has enabled things like:

- third-party tree-shakers/minifiers and obfuscators (e.g. ProGuard)

- cross compilers (e.g. RoboVM)

- concurrency libraries that DO have real green threads and continuations (e.g. Kilim, Quasar, and others)

- code coverage and complexity analysis tooling (e.g. JaCoCo)

- Scala

- Clojure

- Groovy

- Kotlin

- [... more languages ...]

There are two critical points about the above:

- All of these tools were built without direct cooperation from the compiler and core toolchain. That means experimentation and growth were possible from within the community.

- Everyone's tools play nice with each other! You can use Quasar as a library in Clojure, feed that bytecode into ProGuard for minification, then add code coverage instrumentation, and then feed it into RoboVM!

Given the wild success of bytecode and bytecode manipulators, I have no idea how the article can so whimsically pooh-pooh the entire field.

(Yes, I'm well aware Erlang has a VM that allows alternate languages as well. And yes, Elixir is pretty. OT, but no, I won't be investing my time in Elixir, because I like strong compile-time type systems, and Elixir doesn't have one.)

It is true that even in the presence of a full green-threading tool like Quasar, code can call legacy APIs that still block a whole kernel thread, but that's not sufficient cause to dismiss the possibilities. To quote back part of the article, blocking will always be an issue in any cooperatively multitasked environment: "There’s no real way to limit what that code can do, unless it is explicitly disallowed from [...] looping." And yet I wouldn't claim Erlang fails to give me concurrency just because it still allows loops! Part of the compromise of cooperative multitasking is precisely that, in exchange for the higher performance possible from cooperative code, poorly written code can eat arbitrary amounts of CPU before yielding. If this were a practical concern, it would be entirely possible for a bytecode-instrumenting library to inject cooperative rescheduling points even into loops; and yet I have no real desire to see this feature.

Furthermore, I strongly object to the claim ForkJoin is "notorious for its overhead". All thread synchronization is notorious for its overhead. That's well known to any programmer with experience in this area, and in no way unique to ForkJoin.

For an excellent, in-depth coverage of what exactly ForkJoin is and the problems it solves for you, see https://www.youtube.com/watch?v=sq0MX3fHkro . I highly recommend watching the entire thing despite its length, and even if you are not a JVM programmer -- even if you've been doing concurrent programming for years, you will almost certainly walk away knowing significantly more about concurrent scheduling, from the (relatively) high levels of memory fencing all the way down to CPU architecture choices and their impacts.

I'm not going to claim there are no issues with something like Quasar. In particular, I find it harder to operate in an ecosystem where very few existing libraries understand what your application is trying to do with green threads. Mostly this doesn't faze me when my application is calling out to other libraries, because I control the scheduling one step above them (just as I would in a plainer actor framework without green threads, like Akka). The problem is more with "Hollywood"-style frameworks -- the "don't call me, I'll call you" type. So far it feels like these are very hard to use when your application is using green threading but the calling framework has no clue about it. Some sort of interfacing code is required, and it usually has thread handoffs of its own, which can be moderately unpleasant and limits your scalability at that juncture. But this is a present-tense bummer, and can be solved by patching (or outright replacing) these Hollywood frameworks, or simply avoiding frameworks of that kind altogether.

But in short, I still think it's a bit unreasonable to dismiss the existence of ponies.



For me the caveat that calling a library or existing code can block the scheduler is a non-starter. I love what everyone is doing and I don't think it means they are useless, but for certain tasks and existing projects it means they are impractical to incorporate.

I think it's fair to say that what we are being offered is not ponies. It's a useful tool, but also a leaky abstraction and not what I want to be working with in the long run.

Also, the statement about Erlang and loops is odd. My understanding is that loops in Erlang are preemptible and that the VM bends over backwards to provide consistent scheduling even if it means losing some performance.


It's true that Erlang has some capabilities of preemption, but now we're getting into an altogether more interesting range of details.

Erlang is still essentially cooperative and not preemptive, if I've understood my reading correctly. That means the BEAM VM is doing something very similar to the style of instrumentation Quasar is doing: it injects yields into your code at points it thinks are reasonable. This is not quite the same thing as the true preemptive scheduling that OS-native threads get. Quasar could do this kind of safepoint injection as well, though AFAIK that's not currently a feature.

Your definition of ponies and mine seem to diverge here, and that's fine :) I agree that true preemptibility is an even higher bar we can hold scheduling frameworks to. But it's also a very complicated area to get into, it's not completely without its tradeoffs (full preemptibility pretty much gets us back to OS-native threads, right? and there are very real performance reasons there's so much momentum away from that right now), and I also feel that I can get a lot done with green threads without these features. Maybe we'll see a growing swing towards safepoint injection for pseudo-preemptibility -- I'm just making words up at this point, as far as I know; if there's a better existing terminology for these shades of grey I'd love a link -- in the coming years. I don't know where I place my bets on that, yet.

EDIT: this also appears to have been discussed before at https://news.ycombinator.com/item?id=7962838


Accidentally blocking the scheduler in Quasar is immediately detected and results in a warning with the exact stack trace of the offending operation. Also, that doesn't really "block the scheduler" but merely one of its threads. ForkJoin is more than capable of dealing with the occasional kernel thread blocking.


That's my point. I don't want a warning. I want it to just work, à la Erlang. Injecting notifications to the threading framework that I might block is not ponies.

Out of curiosity, how is Quasar detecting blocking?

To my knowledge ForkJoin doesn't detect blocking?

From the JDK 7 ForkJoin javadoc: > However, no such adjustments are guaranteed in the face of blocked IO or other unmanaged synchronization

Sure the framework has extra threads and will work around it via work stealing, but you can get a lot of blocked threads at the worst possible time when you hit a correlated source of blocking.

You also lose thread affinity once work stealing kicks in.

Notifying a framework of potential blocking is certainly less nasty than what I do to work around blocking without one, but for an existing project of sufficient scope it's tough to transition.


AFAIK Erlang doesn't even warn you if you call blocking C code. The way Quasar does it is as follows: every time a fiber becomes runnable, it has a counter incremented. Every once in a while (I think every 100ms) a special (kernel) thread goes over all of FJ's worker threads and takes note of the fiber each is currently running and that fiber's counter (this requires some memory fences, but we take advantage of those already present in Quasar, so there's no added overhead). If it encounters the same fiber with the same count twice, you've got a "runaway fiber" that's either blocking or spinning too long. You can further examine the thread's state to see if it's blocked or not, to figure out which of the two is happening.
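If it helps to see the shape of the idea, here's a toy sketch of that detection scheme -- NOT the actual implementation, and all names below are made up. Each worker publishes which task it picked up and that task's run counter; a watchdog samples those pairs every ~100ms, and a pair seen twice in a row means the task never yielded:

    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.ConcurrentMap;
    import java.util.concurrent.Executors;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.TimeUnit;

    public class RunawayDetector {
        static final class Sample {
            final long taskId, runCount;
            Sample(long taskId, long runCount) { this.taskId = taskId; this.runCount = runCount; }
            boolean same(Sample o) { return o != null && o.taskId == taskId && o.runCount == runCount; }
        }

        private final ConcurrentMap<Thread, Sample> current = new ConcurrentHashMap<Thread, Sample>();
        private final ConcurrentMap<Thread, Sample> lastSeen = new ConcurrentHashMap<Thread, Sample>();

        // A worker thread calls this each time a (lightweight) task becomes runnable on it.
        public void onTaskPickedUp(Thread worker, long taskId, long runCount) {
            current.put(worker, new Sample(taskId, runCount));
        }

        // Runs on a dedicated watchdog thread every 100ms.
        void scan() {
            for (Thread worker : current.keySet()) {
                Sample now = current.get(worker);
                Sample before = lastSeen.put(worker, now);
                if (now != null && now.same(before)) {
                    // Same task, same run count as the previous scan: it never yielded.
                    // It is either blocking the worker or spinning; the stack trace tells which.
                    System.err.println("Runaway task " + now.taskId + " on " + worker.getName());
                    for (StackTraceElement e : worker.getStackTrace())
                        System.err.println("    at " + e);
                }
            }
        }

        public void start() {
            ScheduledExecutorService watchdog = Executors.newSingleThreadScheduledExecutor();
            watchdog.scheduleAtFixedRate(new Runnable() {
                public void run() { scan(); }
            }, 100, 100, TimeUnit.MILLISECONDS);
        }
    }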

Just to clarify: it's perfectly OK to call blocking code on Quasar fibers. In fact, it's encouraged. But the blocking call must be "fiber aware", and there's a project called Comsat that takes many popular Java libraries and makes them fiber-blocking without changing their APIs.

This leads me to another point, which is time-slice-based preemption of fibers. That's a feature Quasar had early in its evolution, but it has since been taken out (Quasar is preemptive, but doesn't offer time-slice scheduling). The reason is that time-slice scheduling is great when you have hundreds of threads running, but quite terrible when you have a million, because it means the threads (lightweight or not) constantly compete for CPU cycles that the CPU just can't keep up with. In Java, plain threads are still available (with the same API as fibers, i.e. new Thread vs. new Fiber etc.), so for long-running computations you're better off using a kernel thread; work-stealing schedulers aren't great at scheduling such tasks anyway. In Erlang, you don't have access to kernel threads, so time-slice scheduling is necessary to support the occasional heavy-computation process.
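A rough sketch of that division of labor, assuming the Fiber API as described in the docs (crunchNumbers and handleRequest are hypothetical stand-ins): long-running CPU-bound work goes on a plain kernel thread that the OS time-slices, while the fine-grained, mostly-waiting work goes on fibers.

    import co.paralleluniverse.fibers.Fiber;
    import co.paralleluniverse.fibers.SuspendExecution;

    public class MixedWorkload {
        public static void main(String[] args) throws Exception {
            // Long-running, CPU-bound work: a plain kernel thread, which the OS time-slices.
            Thread cruncher = new Thread(new Runnable() {
                public void run() { crunchNumbers(); }
            });
            cruncher.start();

            // Fine-grained, mostly-waiting work: cheap fibers, one per request.
            for (int i = 0; i < 100_000; i++) {
                final int id = i;
                new Fiber<Void>() {
                    @Override
                    protected Void run() throws SuspendExecution, InterruptedException {
                        handleRequest(id);   // fiber-blocking I/O, channel ops, etc.
                        return null;
                    }
                }.start();
            }
            cruncher.join();
        }

        static void crunchNumbers() { /* heavy numeric loop, runs for seconds */ }
        static void handleRequest(int id) throws SuspendExecution, InterruptedException { /* ... */ }
    }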


Author of Quasar here, and apparently the target of the criticism in the article. It's kind of hard to make out the author's main claim, but let me respond to a few of the more specific ones:

1. ForkJoin is not "notorious for its overhead". In fact, it is among the best implemented, best performing work stealing schedulers out there. Scheduling a task with ForkJoin takes a few nanos, and is almost as cheap as a plain method call. Don't take my word for it: go ahead and benchmark it.

2. Like Go, Quasar doesn't constrain the running code from mutating shared state -- if you're using Quasar from Java, that is. But it's still just as useful as Go, and when used from Clojure, it's even more flexible than Erlang, and actually quite safe.

3. My macbook isn't cruddy.

4. The stuff possible with Quasar, like running a plain Java RESTful service on fibers to gain a 4x increase in server capacity -- without changing the code and without even starting to parallelize the business logic with actor/CSP -- speaks for itself.

5. I'm not spreading FUD on threads -- you can watch my talk at JVMLS (linked in the article) to see my precise point: kernel threads cannot be used to model domain concurrency one-to-one, because the concurrency requirements of modern applications (and the capabilities of modern hardware) exceed the number of threads supported by the kernel by several orders of magnitude. Fibers keep the (excellent) abstraction provided by threads as the unit of software concurrency, while making the implementation more suitable for modern soft-realtime workloads. When your average programmer can spawn a (lightweight) thread without thinking about it -- say, one for each request, and even many more -- concurrency becomes a lot easier.

6. The linked Paul Tyma slides are completely irrelevant. I've got nothing against doing kernel-thread-blocking IO. The problem is writing simple yet scalable code to process incoming requests. Modern hardware can support over a million open TCP sockets, but not nearly as many active kernel threads. Asynchronous libraries give you the scalability but fail on the simplicity requirement; fiber-blocking IO gives you both the performance and the simplicity of blocking code.

7. As to the "strawman benchmark" with "too many threads", the author is welcome to repeat the experiment using a thread pool with as few or as many threads as he'd like -- the result would be the same: switching kernel threads costs about 10-20us, while task-switching fibers costs 0.5us (and can be improved).

> few existing libraries understand what your application is trying to do with green threads

That's exactly the purpose of the Comsat project, which integrates existing third-party libraries with Quasar fibers. You're right, integrating "inverted" frameworks does require more work, but so far Comsat integrates servlets, JAX-RS services, and Dropwizard.

[1]: http://blog.paralleluniverse.co/2014/05/29/cascading-failure...


While you're here, can I just thank you for Quasar/Pulsar? Amazing piece of engineering work, every language deserves a threading library like that.


> That's exactly the purpose of the Comsat project, which integrates existing third-party libraries with Quasar fibers.

Indeed! :) And Comsat is a hugely important part of Quasar's growing usability in real-world applications. (I'm using the servlet & JAX-RS code right now. So yeah, it's safe to say I'm thrilled about those integrations.)

Quasar also has great abstractions available if one needs to generate new bindings to any code which can currently produce callbacks: FiberAsync [1] is every bit as simple to use as the docs indicate.
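To illustrate, here's roughly how I'd wrap a callback-style API with FiberAsync -- the LookupService interface is made up for the example, and the requestAsync / asyncCompleted / asyncFailed shape is my reading of the javadoc linked below, so double-check against the docs. From fiber code, the resulting lookup() reads like a plain blocking call even though underneath it's the same old callback:

    import co.paralleluniverse.fibers.FiberAsync;
    import co.paralleluniverse.fibers.SuspendExecution;

    // Hypothetical callback-style API we want to expose as a fiber-blocking call.
    interface LookupService {
        void lookup(String key, Callback cb);
        interface Callback {
            void onResult(String value);
            void onError(Exception e);
        }
    }

    // Adapter: turns LookupService's callback into a call that blocks only the fiber.
    class FiberLookup extends FiberAsync<String, Exception> {
        private final LookupService svc;
        private final String key;

        FiberLookup(LookupService svc, String key) { this.svc = svc; this.key = key; }

        @Override
        protected void requestAsync() {
            svc.lookup(key, new LookupService.Callback() {
                public void onResult(String value) { asyncCompleted(value); } // wakes the fiber
                public void onError(Exception e)   { asyncFailed(e); }
            });
        }

        // Must be called from within a fiber; it suspends the fiber until the callback fires.
        static String lookup(LookupService svc, String key) throws Exception, SuspendExecution {
            return new FiberLookup(svc, key).run();
        }
    }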

But there is still slightly more than zero work required when dealing with Hollywood-style frameworks that haven't already been adapted. It's totally manageable; at the same time, it's my personal hope that in the long run we see more frameworks growing up that deal with green threading naturally.

[1] http://docs.paralleluniverse.co/quasar/javadoc/co/parallelun...


A tangential question:

Does any of this benefit a desktop app? I realize that most of the green-thread interest lies in async I/O and I/O-bound workloads. But can a desktop app with a couple of dozen threads (a mix of I/O and CPU loads) gain something from Quasar?


I'd say Yes.

A) Frankly, channels result in prettier, more maintainable code. I've seen enough questionable uses of LinkedBlockingQueue to last me a lifetime. The inability to so much as "close" a BlockingQueue in the face of multiple concurrent consumers is an unbelievable cramp -- it won't bother you until it does, but when it does, it's a bellyflop-onto-concrete sort of sensation. (A sketch of the closeable-channel alternative follows after B below.)

B) I'm even more pessimistic than pron's sibling response about the scalability of threads. As an anecdote, a Minecraft server with even a few dozen concurrent players starts to feel the limitations of naively scheduled threads. Part of this comes down to the choice of concurrent data structures, how interaction with shared data structures is batched, the granularity of locks, the devil is in the details, etc. etc., but I'd venture that the abstractions of green threads and channels make good code a heck of a lot easier to write.
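Here's the sketch I promised in A) -- the closeable-channel pattern, using the Quasar API names as I understand them (Channels.newChannel, send/receive/close; process() is a stand-in). One close() call lets every consumer fiber drain the channel and exit cleanly, where a BlockingQueue would need a poison pill per consumer:

    import co.paralleluniverse.fibers.Fiber;
    import co.paralleluniverse.fibers.SuspendExecution;
    import co.paralleluniverse.strands.channels.Channel;
    import co.paralleluniverse.strands.channels.Channels;

    public class FanOut {
        public static void main(String[] args) throws Exception {
            final Channel<String> jobs = Channels.newChannel(64);

            // Several concurrent consumers draining the same channel.
            Fiber<?>[] workers = new Fiber<?>[8];
            for (int i = 0; i < workers.length; i++) {
                workers[i] = new Fiber<Void>() {
                    @Override
                    protected Void run() throws SuspendExecution, InterruptedException {
                        String job;
                        while ((job = jobs.receive()) != null)  // null once closed and drained
                            process(job);
                        return null;                            // every consumer exits cleanly
                    }
                }.start();
            }

            Fiber<Void> producer = new Fiber<Void>() {
                @Override
                protected Void run() throws SuspendExecution, InterruptedException {
                    for (int i = 0; i < 1000; i++)
                        jobs.send("job-" + i);
                    jobs.close();   // one call, instead of one poison pill per consumer
                    return null;
                }
            }.start();

            producer.join();
            for (Fiber<?> w : workers)
                w.join();
        }

        static void process(String job) { /* ... */ }
    }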

Truly trivial apps with one "compute" thread and one "UI" thread are unlikely to see serious performance gains. Similarly, applications whose workloads are highly parallel (say, somewhere around $num_cpus threads that exchange information only once every few hundred million cycles -- spitballing a bit, but for context I think the Doug Lea talk I linked earlier (https://www.youtube.com/watch?v=sq0MX3fHkro) mentions that a thread unpark can take up to a million cycles in a worst-case scenario) are unlikely to see serious gains either. So there are situations where green threading can't help you from a purely performance perspective, yes. But in practice, it's my observation that it's startling how quickly "simple" apps end up doing enough concurrent UI or network operations that naive threading starts getting unpleasant.


Quasar shines when there's a lot of inherent concurrency in the problem domain. That concurrency mostly arises on servers, where you have many concurrent requests, but it's also common in simulations/games. If your domain has no inherent (large-scale) concurrency, then the OS is more than capable of handling dozens or even hundreds of threads (very) efficiently.


Quasar is an amazing piece of engineering work, and I really don't understand how pron has been able to do so much in such a relatively short period of time (including insights into the ideas and implementation).


Erlang loops will not block the OS thread because the BEAM VM implements preemptive multi-tasking at the (green) process level.


Not quite. BEAM implements a cooperative multitasking environment using reductions that are checked at function calls. In practice you will rarely write code that runs forever without calling a function or returning, but it's possible, especially if you use NIFs.

This will generally cause scheduler collapse and result in all sorts of weird problems, so in the most recent version of the BEAM, there's 'dirty scheduler' support so that you can work around that problem if you have native code that runs for a long time (> 1 ms).

A good primer on all of this is http://jlouisramblings.blogspot.com/2013/01/how-erlang-does-....


> I can't help but wonder if the author of the article has seen Quasar

Yes, given that the post links to Quasar code :-) Take a look at what the "bytecode weaving" link points to...


He links to Quasar in the article...


And yet I don't understand why the article is FUD'ing about it!

> hopefully without altering its meaning

...What?

I'm using Quasar right now. It works as advertised. Bytecode manipulation tools are not some scary unexplored arena of JVM tooling.



