
I can't help but wonder if the author of the article has seen Quasar: http://docs.paralleluniverse.co/quasar/ It seems to be a concrete refutation of his claims of impossibility. Quasar successfully brings green threads to the JVM. It includes both channels, as now popularized by golang, and higher-level patterns like actors. Despite the framework's youth, benchmarks show it comparing reasonably well with both golang and Erlang. Quasar also provides libraries that step all the way up into OTP territory with supervision trees (though I haven't used those myself, yet).
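For anyone who hasn't looked at it yet, here's roughly what fiber-plus-channel code looks like with Quasar. This is a sketch based on the class and method names in the docs linked above (Fiber, Channels.newChannel, send/receive), not a verified snippet, and the classes need to run under Quasar's java agent so the bytecode weaving discussed below can instrument the suspendable methods:

    import co.paralleluniverse.fibers.Fiber;
    import co.paralleluniverse.fibers.SuspendExecution;
    import co.paralleluniverse.strands.channels.Channel;
    import co.paralleluniverse.strands.channels.Channels;

    public class HelloFibers {
        public static void main(String[] args) throws Exception {
            // A small buffered channel shared by two fibers.
            final Channel<String> ch = Channels.newChannel(16);

            Fiber<Void> producer = new Fiber<Void>() {
                @Override
                protected Void run() throws SuspendExecution, InterruptedException {
                    for (int i = 0; i < 5; i++)
                        ch.send("msg-" + i);   // blocks only the fiber, not a kernel thread
                    return null;
                }
            }.start();

            Fiber<Void> consumer = new Fiber<Void>() {
                @Override
                protected Void run() throws SuspendExecution, InterruptedException {
                    for (int i = 0; i < 5; i++)
                        System.out.println(ch.receive());
                    return null;
                }
            }.start();

            producer.join();
            consumer.join();
        }
    }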

The article mentions bytecode weaving, but dismisses it with rather shaky justifications. Bytecode manipulation tools are a well-established part of the JVM ecosystem; frankly, they're a big part of why I consider that ecosystem so successful. Bytecode manipulation has enabled things like:

- third-party tree-shakers/minifiers and obfuscators (e.g. ProGuard)

- cross compilers (e.g. RoboVM)

- concurrency libraries that DO have real green threads and continuations (e.g. Kilim, Quasar, and others)

- code coverage and complexity analysis tooling (e.g. JaCoCo)

- Scala

- Clojure

- Groovy

- Kotlin

- [... more languages ...]

There are two critical points about the above:

- All of these tools were built without direct cooperation from the compiler and core toolchain. That means experimentation and growth were possible from within the community.

- Everyone's tools play nice with each other! You can use Quasar as a library in Clojure, feed that bytecode into ProGuard for minification, then add code coverage instrumentation, and then feed it into RoboVM!

Given the wild success of bytecode and bytecode manipulators, I have no idea how the article can so whimsically pooh-pooh the entire field.

(Yes, I'm well aware Erlang has a VM that allows alternate languages as well. And yes, Elixir is pretty. OT, but no, I won't be investing my time in Elixir, because I like strong compile-time type systems, and Elixir doesn't have one.)

It is true that even in the presence of a full green-threading tool like Quasar, code can call legacy APIs that still block a whole kernel thread, but that's not sufficient cause to dismiss the possibilities. To quote back part of the article, blocking will always be an issue in any cooperatively multitasked environment: "There’s no real way to limit what that code can do, unless it is explicitly disallowed from [...] looping." And yet I wouldn't claim Erlang fails to give me concurrency just because it still allows loops! Part of the compromise of cooperative multitasking is precisely that, in exchange for the higher performance possible from cooperative code, poorly written code can eat arbitrary amounts of CPU before yielding. If this were a practical concern, it would be entirely possible for a bytecode-instrumenting library to inject cooperative rescheduling points even into loops; and yet I have no real desire to see this feature.

Furthermore, I strongly object to the claim ForkJoin is "notorious for its overhead". All thread synchronization is notorious for its overhead. That's well known to any programmer with experience in this area, and in no way unique to ForkJoin.

For an excellent, in-depth coverage of what exactly ForkJoin is and the problems it solves for you, see https://www.youtube.com/watch?v=sq0MX3fHkro . I highly recommend watching the entire thing despite its length, and even if you are not a JVM programmer -- even if you've been doing concurrent programming for years, you will almost certainly walk away knowing significantly more about concurrent scheduling, from the (relatively) high levels of memory fencing all the way down to CPU architecture choices and their impacts.

I'm not going to claim there are no issues with something like Quasar. In particular, I find it harder to operate in an ecosystem where very few existing libraries understand what your application is trying to do with green threads. Mostly this doesn't faze me when my application is calling out to other libraries, because I control the scheduling one step above them (just as I would in a plainer actor framework without green threads, like Akka). The problem is more with "Hollywood"-style frameworks -- the "don't call me, I'll call you" type. So far it feels like these are very hard to use when your application is using green threading but the calling framework has no clue about it. Some sort of interfacing code is required, and it usually has thread handoffs of its own, which can be moderately unpleasant and limits your scalability at that juncture. But this is a present-tense bummer, and can be solved by patching (or outright replacing) these Hollywood frameworks, or simply avoiding frameworks of that kind altogether.

But in short, I still think it's a bit unreasonable to dismiss the existence of ponies.



For me the caveat that calling a library or existing code can block the scheduler is a non-starter. I love what everyone is doing and I don't think it means they are useless, but for certain tasks and existing projects it means they are impractical to incorporate.

I think it's fair to say that what we are being offered is not ponies. It's a useful tool, but also a leaky abstraction and not what I want to be working with in the long run.

Also, the statement about Erlang and loops is odd. My understanding is that loops in Erlang are preemptible and that the VM bends over backwards to provide consistent scheduling even if it means losing some performance.


It's true that Erlang has some capabilities of preemption, but now we're getting into an altogether more interesting range of details.

Erlang is still essentially cooperative and not preemptive, if I've understood my reading correctly. That means the BEAM VM is doing something very similar to the style of instrumentation Quasar is doing: it injects yields into your code at points it thinks are reasonable. This is not quite the same thing as the true preemptive scheduling that OS-native threads get. Quasar could do this kind of safepoint injection as well, though AFAIK that's not currently a feature.

Your definition of ponies and mine seem to diverge here, and that's fine :) I agree that true preemptibility is an even higher bar we can hold scheduling frameworks to. But it's also a very complicated area to get into, it's not completely without its tradeoffs (full preemptibility pretty much gets us back to OS-native threads, right? and there are very real performance reasons there's so much momentum away from that right now), and I also feel that I can get a lot done with green threads without these features. Maybe we'll see a growing swing towards safepoint injection for pseudo-preemptibility -- I'm just making words up at this point, as far as I know; if there's a better existing terminology for these shades of grey I'd love a link -- in the coming years. I don't know where I place my bets on that, yet.

EDIT: this also appears to have been discussed before at https://news.ycombinator.com/item?id=7962838


Accidentally blocking the scheduler in Quasar is immediately detected and results in a warning with the exact stack trace of the offending operation. Also, that doesn't really "block the scheduler" but merely one of its threads. ForkJoin is more than capable of dealing with the occasional kernel thread blocking.


That's my point. I don't want a warning. I want it to just work, à la Erlang. Injecting notifications to the threading framework that I might block is not ponies.

Out of curiosity, how is Quasar detecting blocking?

To my knowledge ForkJoin doesn't detect blocking?

From the JDK 7 ForkJoin javadoc: > However, no such adjustments are guaranteed in the face of blocked IO or other unmanaged synchronization

Sure the framework has extra threads and will work around it via work stealing, but you can get a lot of blocked threads at the worst possible time when you hit a correlated source of blocking.

You also lose thread affinity once work stealing kicks in.

Notifying a framework of potential blocking is certainly less nasty than what I do to work around blocking without one, but for an existing project of sufficient scope it's tough to transition.


AFAIK Erlang doesn't even warn you if you call blocking C code. The way Quasar does it is as follows: every time a fiber becomes runnable, it has a counter incremented. Every once in a while (I think every 100ms) a special (kernel) thread goes over all of FJ's worker threads and takes note of the fiber each is currently running and that fiber's counter (this requires some memory fences, but we take advantage of those already present in Quasar, so there's no added overhead). If it encounters the same fiber with the same count twice, you've got a "runaway fiber" that's either blocking or spinning too long. You can further examine the thread's state to see if it's blocked or not, to figure out which of the two is happening.
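If it helps to see the shape of the idea, here's a toy sketch of that detection scheme -- NOT the actual implementation, and all names below are made up. Each worker publishes which task it picked up and that task's run counter; a watchdog samples those pairs every ~100ms, and a pair seen twice in a row means the task never yielded:

    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.ConcurrentMap;
    import java.util.concurrent.Executors;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.TimeUnit;

    public class RunawayDetector {
        static final class Sample {
            final long taskId, runCount;
            Sample(long taskId, long runCount) { this.taskId = taskId; this.runCount = runCount; }
            boolean same(Sample o) { return o != null && o.taskId == taskId && o.runCount == runCount; }
        }

        private final ConcurrentMap<Thread, Sample> current = new ConcurrentHashMap<Thread, Sample>();
        private final ConcurrentMap<Thread, Sample> lastSeen = new ConcurrentHashMap<Thread, Sample>();

        // A worker thread calls this each time a (lightweight) task becomes runnable on it.
        public void onTaskPickedUp(Thread worker, long taskId, long runCount) {
            current.put(worker, new Sample(taskId, runCount));
        }

        // Runs on a dedicated watchdog thread every 100ms.
        void scan() {
            for (Thread worker : current.keySet()) {
                Sample now = current.get(worker);
                Sample before = lastSeen.put(worker, now);
                if (now != null && now.same(before)) {
                    // Same task, same run count as the previous scan: it never yielded.
                    // It is either blocking the worker or spinning; the stack trace tells which.
                    System.err.println("Runaway task " + now.taskId + " on " + worker.getName());
                    for (StackTraceElement e : worker.getStackTrace())
                        System.err.println("    at " + e);
                }
            }
        }

        public void start() {
            ScheduledExecutorService watchdog = Executors.newSingleThreadScheduledExecutor();
            watchdog.scheduleAtFixedRate(new Runnable() {
                public void run() { scan(); }
            }, 100, 100, TimeUnit.MILLISECONDS);
        }
    }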

Just to clarify: it's perfectly OK to call blocking code on Quasar fibers. In fact, it's encouraged. But the blocking call must be "fiber aware", and there's a project called Comsat that takes many popular Java libraries and makes them fiber-blocking without changing their APIs.

This leads me to another point, which is time-slice-based preemption of fibers. That's a feature Quasar had early in its evolution, but it has since been taken out (Quasar is preemptive, but doesn't offer time-slice scheduling). The reason is that time-slice scheduling is great when you have hundreds of threads running, but quite terrible when you have a million, because it means the threads (lightweight or not) constantly compete for CPU cycles that the CPU just can't keep up with. In Java, plain threads are still available (with the same API as fibers, i.e. new Thread vs. new Fiber etc.), so for long-running computations you're better off using a kernel thread; work-stealing schedulers aren't great at scheduling such tasks anyway. In Erlang, you don't have access to kernel threads, so time-slice scheduling is necessary to support the occasional heavy-computation process.
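A rough sketch of that division of labor, assuming the Fiber API as described in the docs (crunchNumbers and handleRequest are hypothetical stand-ins): long-running CPU-bound work goes on a plain kernel thread that the OS time-slices, while the fine-grained, mostly-waiting work goes on fibers.

    import co.paralleluniverse.fibers.Fiber;
    import co.paralleluniverse.fibers.SuspendExecution;

    public class MixedWorkload {
        public static void main(String[] args) throws Exception {
            // Long-running, CPU-bound work: a plain kernel thread, which the OS time-slices.
            Thread cruncher = new Thread(new Runnable() {
                public void run() { crunchNumbers(); }
            });
            cruncher.start();

            // Fine-grained, mostly-waiting work: cheap fibers, one per request.
            for (int i = 0; i < 100_000; i++) {
                final int id = i;
                new Fiber<Void>() {
                    @Override
                    protected Void run() throws SuspendExecution, InterruptedException {
                        handleRequest(id);   // fiber-blocking I/O, channel ops, etc.
                        return null;
                    }
                }.start();
            }
            cruncher.join();
        }

        static void crunchNumbers() { /* heavy numeric loop, runs for seconds */ }
        static void handleRequest(int id) throws SuspendExecution, InterruptedException { /* ... */ }
    }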


Author of Quasar here, and apparently the target of the criticism in the article. It's kind of hard to make out the author's main claim, but let me respond to a few of the more specific ones:

1. ForkJoin is not "notorious for its overhead". In fact, it is among the best implemented, best performing work stealing schedulers out there. Scheduling a task with ForkJoin takes a few nanos, and is almost as cheap as a plain method call. Don't take my word for it: go ahead and benchmark it.

2. Like Go, Quasar doesn't constrain the running code from mutating shared state -- if you're using Quasar from Java, that is. But it's still just as useful as Go, and when used from Clojure, it's even more flexible than Erlang, and actually quite safe.

3. My macbook isn't cruddy.

4. The stuff possible with Quasar, like running a plain Java RESTful service on fibers to gain a 4x increase in server capacity -- without changing the code and without even starting to parallelize the business logic with actor/CSP -- speaks for itself.

5. I'm not spreading FUD on threads -- you can watch my talk at JVMLS (linked in the article) to see my precise point: kernel threads cannot be used to model domain concurrency one-to-one, because the concurrency requirements of modern applications (and the capabilities of modern hardware) exceed the number of threads supported by the kernel by several orders of magnitude. Fibers keep the (excellent) abstraction provided by threads as the unit of software concurrency, while making the implementation more suitable for modern soft-realtime workloads. When your average programmer can spawn a (lightweight) thread without thinking about it -- say, one for each request, and even many more -- concurrency becomes a lot easier.

6. The linked Paul Tyma slides are completely irrelevant. I've got nothing against doing kernel-thread-blocking IO. The problem is writing simple yet scalable code to process incoming requests. Modern hardware can support over a million open TCP sockets, but not nearly as many active kernel threads. Asynchronous libraries give you the scalability but fail on the simplicity requirement; fiber-blocking IO gives you both the performance and the simplicity of blocking code.

7. As to the "strawman benchmark" with "too many threads", the author is welcome to repeat the experiment using a thread pool with as few or as many threads as he'd like -- the result would be the same: switching kernel threads costs about 10-20us, while task-switching fibers costs 0.5us (and can be improved).

> few existing libraries understand what your application is trying to do with green threads

That's exactly the purpose of the Comsat project, which integrates existing third-party libraries with Quasar fibers. You're right, integrating "inverted" frameworks does require more work, but so far Comsat integrates servlets, JAX-RS services, and Dropwizard.

[1]: http://blog.paralleluniverse.co/2014/05/29/cascading-failure...


While you're here, can I just thank you for Quasar/Pulsar? Amazing piece of engineering work, every language deserves a threading library like that.


> That's exactly the purpose of the Comsat project, which integrates existing third-party libraries with Quasar fibers.

Indeed! :) And Comsat is a hugely important part of Quasar's growing usability in real-world applications. (I'm using the servlet & JAX-RS code right now. So yeah, it's safe to say I'm thrilled about those integrations.)

Quasar also has great abstractions available if one needs to generate new bindings to any code which can currently produce callbacks: FiberAsync [1] is every bit as simple to use as the docs indicate.
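To illustrate, here's roughly how I'd wrap a callback-style API with FiberAsync -- the LookupService interface is made up for the example, and the requestAsync / asyncCompleted / asyncFailed shape is my reading of the javadoc linked below, so double-check against the docs. From fiber code, the resulting lookup() reads like a plain blocking call even though underneath it's the same old callback:

    import co.paralleluniverse.fibers.FiberAsync;
    import co.paralleluniverse.fibers.SuspendExecution;

    // Hypothetical callback-style API we want to expose as a fiber-blocking call.
    interface LookupService {
        void lookup(String key, Callback cb);
        interface Callback {
            void onResult(String value);
            void onError(Exception e);
        }
    }

    // Adapter: turns LookupService's callback into a call that blocks only the fiber.
    class FiberLookup extends FiberAsync<String, Exception> {
        private final LookupService svc;
        private final String key;

        FiberLookup(LookupService svc, String key) { this.svc = svc; this.key = key; }

        @Override
        protected void requestAsync() {
            svc.lookup(key, new LookupService.Callback() {
                public void onResult(String value) { asyncCompleted(value); } // wakes the fiber
                public void onError(Exception e)   { asyncFailed(e); }
            });
        }

        // Must be called from within a fiber; it suspends the fiber until the callback fires.
        static String lookup(LookupService svc, String key) throws Exception, SuspendExecution {
            return new FiberLookup(svc, key).run();
        }
    }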

But there is still slightly more than zero work required when dealing with Hollywood-style frameworks that haven't already been adapted. It's totally manageable; at the same time, it's my personal hope that in the long run we see more frameworks growing up that deal with green threading naturally.

[1] http://docs.paralleluniverse.co/quasar/javadoc/co/parallelun...


A tangential question:

Does any of this benefit a desktop app? I realize that most of the green-thread interest lies in async I/O and I/O-bound workloads. But can a desktop app with a couple of dozen threads (a mix of I/O and CPU loads) gain something from Quasar?


I'd say Yes.

A) Frankly, channels result in prettier, more maintainable code. I've seen enough questionable uses of LinkedBlockingQueue to last me a lifetime. The inability to so much as "close" a BlockingQueue in the face of multiple concurrent consumers is an unbelievable cramp -- it won't bother you until it does, but when it does, it's a bellyflop-onto-concrete sort of sensation. (A sketch of the closeable-channel alternative follows after B below.)

B) I'm even more pessimistic than pron's sibling response about the scalability of threads. As an anecdote, a Minecraft server with even a few dozen concurrent players starts to feel the limitations of naively scheduled threads. Part of this comes down to the choice of concurrent data structures, how interaction with shared data structures is batched, the granularity of locks, the devil is in the details, etc. etc., but I'd venture that the abstractions of green threads and channels make good code a heck of a lot easier to write.
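Here's the sketch I promised in A) -- the closeable-channel pattern, using the Quasar API names as I understand them (Channels.newChannel, send/receive/close; process() is a stand-in). One close() call lets every consumer fiber drain the channel and exit cleanly, where a BlockingQueue would need a poison pill per consumer:

    import co.paralleluniverse.fibers.Fiber;
    import co.paralleluniverse.fibers.SuspendExecution;
    import co.paralleluniverse.strands.channels.Channel;
    import co.paralleluniverse.strands.channels.Channels;

    public class FanOut {
        public static void main(String[] args) throws Exception {
            final Channel<String> jobs = Channels.newChannel(64);

            // Several concurrent consumers draining the same channel.
            Fiber<?>[] workers = new Fiber<?>[8];
            for (int i = 0; i < workers.length; i++) {
                workers[i] = new Fiber<Void>() {
                    @Override
                    protected Void run() throws SuspendExecution, InterruptedException {
                        String job;
                        while ((job = jobs.receive()) != null)  // null once closed and drained
                            process(job);
                        return null;                            // every consumer exits cleanly
                    }
                }.start();
            }

            Fiber<Void> producer = new Fiber<Void>() {
                @Override
                protected Void run() throws SuspendExecution, InterruptedException {
                    for (int i = 0; i < 1000; i++)
                        jobs.send("job-" + i);
                    jobs.close();   // one call, instead of one poison pill per consumer
                    return null;
                }
            }.start();

            producer.join();
            for (Fiber<?> w : workers)
                w.join();
        }

        static void process(String job) { /* ... */ }
    }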

Truly trivial apps with one "compute" thread and one "UI" thread are unlikely to see serious performance gains. Similarly, applications whose workloads are highly parallel (say, somewhere around $num_cpus threads that exchange information only once every few hundred million cycles -- spitballing a bit, but for context I think the Doug Lea talk I linked earlier (https://www.youtube.com/watch?v=sq0MX3fHkro) mentions that a thread unpark can take up to a million cycles in a worst-case scenario) are unlikely to see serious gains either. So there are situations where green threading can't help you from a purely performance perspective, yes. But in practice, it's my observation that it's startling how quickly "simple" apps end up doing enough concurrent UI or network operations that naive threading starts getting unpleasant.


Quasar shines when there's a lot of inherent concurrency in the problem domain. That concurrency mostly arises on servers, where you have many concurrent requests, but it's also common in simulations/games. If your domain has no inherent (large-scale) concurrency, then the OS is more than capable of handling dozens or even hundreds of threads (very) efficiently.


Quasar is an amazing piece of engineering work, and I really don't understand how pron has been able to do so much in such a relatively short period of time (including insights into the ideas and implementation).


Erlang loops will not block the OS thread because the BEAM VM implements preemptive multi-tasking at the (green) process level.


Not quite. BEAM implements a cooperative multitasking environment using reductions that are checked at function calls. In practice you will rarely write code that runs forever without calling a function or returning, but it's possible, especially if you use NIFs.

This will generally cause scheduler collapse and result in all sorts of weird problems, so in the most recent version of the BEAM, there's 'dirty scheduler' support so that you can work around that problem if you have native code that runs for a long time (> 1 ms).

A good primer on all of this is http://jlouisramblings.blogspot.com/2013/01/how-erlang-does-....


> I can't help but wonder if the author of the article has seen Quasar

Yes, given that the post links to Quasar code :-) Take a look at what the "bytecode weaving" link points to...


He links to Quasar in the article...


And yet I don't understand why the article is FUD'ing about it!

> hopefully without altering its meaning

...What?

I'm using Quasar right now. It works as advertised. Bytecode manipulation tools are not some scary unexplored arena of JVM tooling.



