This is a great writeup, and reignites my interest in Java. (I've long considered "Java Concurrency in Practice" to be the _best_ Java book ever written.)
I haven't been able to figure out how the "unmount" of a virtual thread works. As stated in this article:
> Nearly all blocking points in the JDK have been adapted so that when encountering a blocking operation on a virtual thread, the virtual thread is unmounted from its carrier instead of blocking.
How would I implement this logic in my own libraries? The underlying JEP 425[0] doesn't seem to list any explicit APIs for that, but it does give other details not in the OP writeup.
> How would I implement this logic in my own libraries?
There's no need to if your code is in Java. We had to change low-level I/O in the JDK because it drops down to native.
That's not to say every Java library is virtual-thread-friendly. For one, there's the issue of pinning (see the JEP) that might require small changes (right now the problem is most common in JDBC drivers, but they're already working on addressing it). The bigger issue, mostly in low-level frameworks, is implicit assumptions about a small number of shared threads, whereas virtual threads are plentiful and are never pooled, so they're never shared. An example of such an issue is in Netty, where they allocate very large native buffers and cache them in ThreadLocals, which assumes that the number of threads is low, and that they're reused by lots of tasks.
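To make the Netty issue concrete, here's a minimal sketch (class and field names hypothetical, not Netty's actual code) of the ThreadLocal buffer-caching pattern described above. It's a win when a few pooled platform threads each run many tasks; with plentiful, never-pooled virtual threads, every thread allocates its own large buffer and the cache is never reused:

```java
import java.nio.ByteBuffer;

public class BufferCache {
    // Fine with a small, pooled set of platform threads: each thread
    // allocates one big native buffer and reuses it across many tasks.
    // With millions of short-lived virtual threads, each one pays the
    // allocation and the cached buffer is used exactly once.
    private static final ThreadLocal<ByteBuffer> CACHE =
            ThreadLocal.withInitial(() -> ByteBuffer.allocateDirect(1024 * 1024));

    public static ByteBuffer buffer() {
        return CACHE.get();
    }

    public static void main(String[] args) {
        ByteBuffer a = buffer();
        ByteBuffer b = buffer();
        // Same thread -> same cached buffer instance.
        System.out.println(a == b); // prints "true"
    }
}
```

The reuse assumption only holds while threads are long-lived and shared, which is exactly what virtual threads break.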
> An example of such an issue is in Netty, where they allocate very large native buffers and cache them in ThreadLocals, which assumes that the number of threads is low, and that they're reused by lots of tasks.
Fixing Netty is very high yield. Every modern Java server application I'm aware of uses Netty. Quarkus, Vertx, Micronaut, Java-GRPC, ...
Then Graal? Virtual threads in Graal with a Netty that isn't 60MB would be superb.
What about just shimming Netty? Is that in Oracle's scope? There are already selectable backends for Netty. Why not have "virtualthread-graalcompatible" that uses your already fixed Java IO? It would reduce so much pain, and make Java competitive with golang for the first time ever.
GraalVM native images have recently gained support for virtual threads. So you can have AOT compiled fast starting binaries that use virtual threads, if you want (or very soon at least, I can't recall if it's out yet or not).
The main gap vs Go would then be the speed of the AOT compile. But you normally develop on HotSpot anyway.
Netty already works with Loom. There are people doing experiments with it where it shows some small performance gains even. They are incrementally improving it so it works better when Loomified, but it does work.
Graal also has a “fast build” mode, likely way slower still than go’s compilation, but there is that. It is meant for development though, you will likely want an optimized build for prod. But yeah, one should probably just develop in the traditional way, and then test it out in native after a few iterations.
Do you need Netty in a virtual-thread world? IMO Netty made non-blocking IO in Java tractable, but virtual threads do it better and more broadly, so what role does Netty play now? What does it bring, other than thread efficiency, that can't be achieved more easily now?
Conversely, some applications would like a leaky abstraction they have some control over. Some caching will likely remain beneficial to link to a carrier thread.
As a member of the Cassandra community I’m super excited to get my hands on virtual threads come the next LTS (and Cassandra’s upgrade cycle), as it will permit us to solve many outstanding problems much more cheaply.
I hope by then we’ll also have facilities for controlling the scheduling of virtual threads on carrier threads. I would rather not wait another LTS cycle to be able to make proper use of them.
LTS is a designation by our sales organisation for arbitrarily chosen versions so they can offer a support service for legacy codebases -- i.e. people willing to pay for the privilege of not getting new features [1]. Why anyone would wait for something intended for the sole purpose of not adding new features to get a new feature --and so enjoying the very worst of both worlds -- is beyond me. The development organisation has no consideration of support offerings. All releases are equal, and the assumption is that those who want new features obviously do not want LTS and vice-versa.
Anyway, the mention of the perennially misunderstood Java LTS is a pet peeve of mine, so I'm sorry if this comment was overly aggressive.
[1]: There are many legacy applications that aren't actively developed. They have no use for new features, and new features sometimes requires changing configurations -- a hassle they don't have the people to do. So LTS is a subscription service that allows them to get releases without new features so they can keep running legacy apps without much maintenance. It's a great service, but obviously the opposite of what actively maintained codebases want; for them we have the regular upgrade model.
Well, it is not just Oracle that has adopted the LTS designations. AdoptOpenJDK and others are also selecting the same LTS versions to provide longer term support promises for, including security and other improvements.
A major project like Cassandra that is non-trivial to upgrade (but is desirable to upgrade, and to have security fixes for) simply cannot hop Java versions every year and impose that additional burden on our users, and nor can we pick a Java version that is not guaranteed security updates past some near-term horizon. So we pick versions that people are expected to have available to them, for the lifetime of that release, in their environment.
Honestly I’m not sure what you’re upset about, I am a bit surprised at the vehemence of your response to that element of my comment. Also a little disappointed you didn’t engage with the rest of my comment; I hope that doesn’t mean I also end up disappointed with the near future of virtual threads.
A major project like Cassandra will find that it is easier to use the current version (before LTS existed, people had to upgrade to the six monthly feature releases, but because they didn't get a new version number people didn't care as much). If it does cause trouble, let us know, because LTS really isn't intended for actively maintained projects that want new features and isn't the recommended path for them. Just note that the free upgrade services called LTS are not quite the same; they just include backports from mainline and don't support the whole JDK.
Anyway, I'm sorry about my tone. I know that the change in the version numbering scheme confused people to pick the wrong upgrade path for themselves, and it's our fault for miscommunicating. But I don't know when features will land, or when those who want new features with an LTS service will be able to use them. But I can say that our process assumes that those who want long-term support are trying to avoid new features and are happier when a big feature misses the next release with LTS, so while missing one release normally means a mere 6 month delay, those who wait for LTS for actively developed codebases (even though it's due to a misunderstanding) might have to wait a further couple of years.
Well, whatever each of our perceptions about the utility of selecting an LTS, there are realities we all occupy - and LTS releases are a part of Cassandra's reality for the time being. Perhaps that will change in future, but I do not anticipate it very soon.
But I will be pushing for the adoption of virtual threads once they become more useful for the community (which I think depends on the previously mentioned improvements). So, whatever realities JEP 425 operates within, I do hope these improvements land by Java 21, so that my job is made easier.
Either way, really excited about the work, whenever it transpires that we can use it. Thanks for your efforts delivering it so far.
The good work to slim down and better compartmentalize the JDK has historically created enough backward incompatibility risks for me that I prefer staying on the same version longer than 6 months. If I want security updates for the version I’m on LTS is the best (only?) way.
I think Java has never had better backward compatibility than now. The difficulties migrating to 9 were 1. due to 9 being the last major release ever, and 2. libraries that hacked JDK 8's internals and were not portable, so they broke in a big release. The overall upgrade costs now are also lower than ever before, and we know that because some companies do understand that using the current version is easier and cheaper than an old one. Having said that, if you want to stay on an old version for a long time, then yes, use one with LTS, but then you might as well upgrade very slowly (not every two years) because upgrades will be less pleasant.
I agree, Java compatibility is much better now. Around Java 9 there were also breaking runtime changes, like the removal of the javax.* EE modules and the unbundling of JavaFX (deprecated in 9, removed in 11).
> 6 months is not really a long time for enterprise software.
But Java has always had semi-annual feature releases, and there wasn't even LTS -- people had to upgrade to a new feature release every six months. It's just that we dropped major releases altogether and then gave the feature releases new version numbers, and that confused many people (who might not have even been aware that some of the minor releases in the past were actually quite significant feature releases). In other words, people upgraded to new feature releases every six months in the Java 7 and 8 era, too; now with major releases gone it's even easier, so it doesn't make sense that projects that were fine with such upgrades in the past all of a sudden need the new LTS model when things are even easier than before.
Would those intermediate releases make breaking runtime changes like dropping nashorn, removing APIs and changing default encoding modes?
That’d be pretty bad behavior when maintaining backward compatibility.
I know of a very large education company that trained their support staff to downgrade the Java 8 version of end users when they experienced problems (until they dropped Java on the front-end for web). Maybe the feature releases is why?
They're not "intermediate releases". In the past there were three kinds of releases, major (every few years), feature (aka "limited update", every six months), and patch (every quarter). Now there are two: feature and patch, with the feature releases getting the integer numbers now that major releases are gone. Oracle's sales arbitrarily selects some feature versions for which to offer an LTS service, and other companies follow their choice. BTW, they can choose to offer LTS even for releases that have already been made and retroactively make them "LTS releases." There's absolutely nothing special about them, and the development of the JDK ignores the availability of such offerings. We produce feature releases, and if someone wants to pick some of them to offer support services for longer durations than for other versions -- that's up to them.
Feature releases, now and before, sometimes made what you call "breaking runtime changes" that might require changing the command line. Actual breaking changes to APIs are rare, now as before (e.g. the last major release, 9, removed some 6 methods, I think, and that was probably the biggest such change in Java's history, although the future degradation of the Security Manager is probably bigger). One difference between feature releases now and then is that, with major releases gone, feature releases can change the spec. This virtually always means adding new APIs.
> Maybe the feature releases is why?
Feature releases existed in Java 8, too, people just forget because they didn't get their own version number back when major releases existed. They were even less reliable back then. The biggest factor in Java compatibility issues is without a doubt libraries relying on JDK internals. That was less of a problem in the 6-7 era for the simple reason that Java stagnated due to lack of resources in Sun's last years. JDK 16 finally turned on strong encapsulation, so this problem is likely to recede.
I'm not saying that upgrading feature releases is risk-free, but it's always been that way, only people forgot or didn't notice so much with the old numbering scheme, and LTS wasn't available then at all. And it's also likely that the upgrades now are slightly more difficult, but in exchange there is no need for major upgrades ever again. For actively developed code, upgrading with every feature release is overall easier, cheaper and safer than staying on an old version, skipping updates, and doing a big transition every few years.
When the version number scheme changed and LTS was introduced, many companies got confused and stopped their practice of upgrading to new feature releases. At the same time, many don't understand what LTS is or that the free offerings don't actually maintain the full JDK, just backport fixes from mainline to the intersection of the existing features (e.g. Nashorn and CMS aren't getting maintained in the free "LTS" offerings).
Java Concurrency in Practice is a fantastic book. I had DL as a professor for about a half dozen courses in undergrad, including Concurrent and Parallel Programming. Absolutely fantastic professor, with a lot of insight into how parallel programming really works at the language level. One of the best courses I've taken.
Seems like a good development. I've been doing Node.js for the last few years after letting go of Java. But there's something uneasy about async/await. For one thing, it's difficult to debug how the async functions interact.
Debugging asynchronicity is complex in any language, no?
Blocking endpoints are mostly irrelevant because they alter the temporal flow of things, which means your code executed in debug mode is not 100% "isomorphic" with your code executed in run mode.
That's seamless though. So if you have a failure you get a single stack trace that includes everything. In JS the debuggers sometimes glue stack traces together which works for basic stuff but incurs a major runtime overhead and doesn't work for production failures.
Locally the concept of multi-threaded debugging is easier than async-await since a single flow typically maps to a single thread and you can just step over it. If something happens asynchronously it's just IO and you can ignore that part. As far as you're concerned it's just one thread that you're stepping over/into. Variable context, scope etc. are maintained the same and passed seamlessly.
I'm trying to understand what exactly is different about debugging async/await vs. debugging threads. Isn't making an async-call the same as starting a new thread, from the programmer's point of view?
In my environment, with the WebStorm debugger, I can debug async calls in which I have halted in the stack. But I can not inspect the variable values in the earlier "trace" that started the async call.
Is it just a matter of debugger capabilities or is there something that makes thread-based debugging fundamentally less confusing?
Ah, maybe I get it. When starting an async call to read a file, for instance, the value of variables is no longer available in the callback. Whereas in a (real) thread they are, because from my point of view the thread was simply "sleeping" while the IO was happening. When the IO is over, I'm back in the same context, except I now have the read file contents available for my inspection.
So, reading a file in a thread-based system does NOT require you to start a new thread, whereas in async-await you essentially do have to create a new async-context (which is like a new thread) to read a file. No?
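For what it's worth, here's a small Java sketch of that thread-based view: the locals declared before the blocking read are still on the same stack afterwards, which is exactly what a debugger shows when you step over the line (all names here are illustrative):

```java
import java.nio.file.Files;
import java.nio.file.Path;

public class ReadFile {
    public static void main(String[] args) throws Exception {
        Path tmp = Files.createTempFile("demo", ".txt");
        Files.writeString(tmp, "hello");

        String label = "contents: ";         // a local in scope before the I/O
        String data = Files.readString(tmp); // the thread blocks during the read

        // After the blocking call, both locals are still on the same stack;
        // no callback, no separate async context.
        System.out.println(label + data); // prints "contents: hello"
        Files.delete(tmp);
    }
}
```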
WebStorm is indeed amazing. It takes separate stack traces and glues them together which means it needs to keep old stack traces in memory then constantly compare the objects passed along to figure out which stack to glue where.
As I said, it's problematic for production. So in the IDE you can indeed see the glued stack but in the browser... Or on the server...
Then there's the context. Since glued stack traces could be in-theory separate threads (at least in Java async calls) you might get weird behaviors where values of objects across the stack can be wildly different.
And no, you don't have a separate thread doing the IO. That's exactly the idea Loom is solving. Java's traditional stream IO is thread based, but we have a faster underlying implementation in NIO which uses the native select calls. The idea here is that a thread receives a callback when a connection has data waiting, so the system can wake up the thread as needed, very efficiently. So there's no additional thread.
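A minimal sketch of that readiness model using the JDK's own Selector: one thread registers interest in a channel and only wakes up when there's work to do (here the select simply times out, because no client ever connects):

```java
import java.net.InetSocketAddress;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;

public class SelectSketch {
    public static void main(String[] args) throws Exception {
        try (Selector selector = Selector.open();
             ServerSocketChannel server = ServerSocketChannel.open()) {
            server.bind(new InetSocketAddress(0)); // ephemeral port
            server.configureBlocking(false);
            server.register(selector, SelectionKey.OP_ACCEPT);

            // One thread can wait on readiness events for many channels;
            // with no pending connections this returns 0 after the timeout.
            int ready = selector.select(50);
            System.out.println(ready); // prints "0"
        }
    }
}
```

Frameworks like Netty are built around this loop; virtual threads hide the same mechanism behind ordinary blocking calls.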
Yes I think I got that. I was not saying that Java creates a new thread for every IO operation but that async/await in JavaScript etc. must do something like starting a new "pseudo-thread". And that is why debugging in Java is easier - because it doesn't need to start a new thread. That's what I was trying to understand. Thanks.
Debugging is hard regardless of the concurrency model but building an understanding of what the code is supposed to be doing is way easier when the code reads sequentially versus async.
As far as debugging changing the scheduling of the program, it's not so bad when the tooling evolves out of the concurrency model, which I imagine will happen once virtual threads catch on in java. For example, in erlang, you can trace processes on a running system by pid, and basically get a sequence diagram of execution with messages between processes and function calls within a process, as well as the actual terms themselves. Because execution doesn't pause, you can even do it in production (if you're careful...). So while it's not a traditional debugger, in the "pause execution here" sense, it's still a way to inspect the system that fits well into an actor model. If such a thing doesn't exist in java already, I'm sure it will soon.
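As a small illustration of the "reads sequentially" point, a sketch using virtual threads (assumes JDK 21+): the blocking sleep unmounts the virtual thread instead of parking an OS thread, the code reads top to bottom, and a failure inside the lambda would produce one ordinary stack trace:

```java
import java.time.Duration;

public class Sequential {
    public static void main(String[] args) throws Exception {
        System.out.println("step 1");

        Thread vt = Thread.ofVirtual().start(() -> {
            try {
                // A blocking call: on a virtual thread this unmounts it
                // from its carrier rather than blocking an OS thread.
                Thread.sleep(Duration.ofMillis(10));
                System.out.println("step 2");
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        vt.join(); // plain sequential flow: wait, then continue
        System.out.println("step 3");
    }
}
```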
Yes, and yes can become big, but you can scope it to certain processes, function calls, modules, pattern matches, etc, etc. So it's fine if you know what you're doing.
Legend has it that a major cell network was briefly taken offline due to a poorly thought out trace on a production system.
Does your library use any of the JDK's blocking APIs like Thread.sleep, Socket or FileInputStream, directly or transitively? If so, it is already compatible. The only thing you should check is whether you're using monitors for synchronization, which currently cause the carrier thread to get pinned. The recommendation is to use locks instead.
Yes but it only really matters if you're blocking on IO whilst inside a synchronized block. If you're using it to protect in-memory data structures then it's not a big deal.
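A sketch of that recommendation (assumes JDK 21+; names are illustrative): guard shared in-memory state with a java.util.concurrent lock rather than a synchronized block, and keep blocking I/O out of the critical section:

```java
import java.util.concurrent.locks.ReentrantLock;

public class PinningExample {
    private final ReentrantLock lock = new ReentrantLock();
    private int hits;

    // Inside a synchronized block, a blocking call would currently pin
    // the carrier thread. A j.u.c lock lets the virtual thread unmount.
    void record() {
        lock.lock();
        try {
            hits++; // short, in-memory critical section: fine either way,
                    // but do NOT do blocking I/O while holding the lock
        } finally {
            lock.unlock();
        }
    }

    public static void main(String[] args) throws Exception {
        var ex = new PinningExample();
        Thread t = Thread.ofVirtual().start(ex::record); // virtual thread
        t.join();
        System.out.println(ex.hits); // prints "1"
    }
}
```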
Agree - Java Concurrency in Practice was a revelation when it came out. Well, the whole concurrency API by Dr. Lea made it all so much saner. Very excited for virtual threads!
I don't know how they did it, but you could use that jep id as a query in the jdk issue tracker [0], and then use the issue tracker id to find the corresponding github issue [1]. (I had hoped for commits with that prefix, but there don't seem to be any for that issue.)
> I haven't been able to figure out how the "unmount" of a virtual thread works.
The native stack is just memory like any other, pointed to by the stack pointer. You can unmount one stack and mount another by changing the stack pointer. You can also do it by copying the stack out to a backing store, and copying the new thread's stack back in. I think the JVM does the latter, but not an expert.
[0] https://openjdk.org/jeps/425