I often find developers jump to async too early. Clearly a million native threads is too many, but on modern processors a couple thousand mostly idle (due to IO blocking) Java threads are fine (e.g. one per request in a typical middle tier). While Java's stack sizes are large, if a thread's stack isn't deep, most of that memory is never actually allocated.
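For concreteness, here is a minimal sketch of that thread-per-request style in plain Java (the class name, port, and echo handling are illustrative, not from the comment). Each thread spends nearly all its time blocked in readLine(), which is exactly the "mostly idle" case:

```java
// Minimal sketch of the thread-per-request model described above: one blocking
// platform thread per connection. Class name, port, and echo behaviour are
// illustrative only.
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.net.ServerSocket;
import java.net.Socket;

public class ThreadPerRequestServer {
    public static void main(String[] args) throws IOException {
        try (ServerSocket server = new ServerSocket(8080)) {
            while (true) {
                Socket socket = server.accept();
                // A couple thousand of these mostly-idle threads is usually fine;
                // the stack is reserved up front, but pages are only committed as
                // the stack actually grows.
                new Thread(() -> handle(socket)).start();
            }
        }
    }

    private static void handle(Socket socket) {
        try (socket;
             BufferedReader in = new BufferedReader(new InputStreamReader(socket.getInputStream()));
             PrintWriter out = new PrintWriter(socket.getOutputStream(), true)) {
            String line;
            while ((line = in.readLine()) != null) {
                out.println(line); // echo; the thread blocks in readLine() most of the time
            }
        } catch (IOException ignored) {
        }
    }
}
```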
> on modern processors a couple thousand mostly idle (due to IO blocking) Java threads are fine
This “mostly idle” assumption is gonna break the moment you hit a load spike and wake up most of those “idle” threads.
The success of async is often (mis)attributed to eliminating unnecessary resources, but in practice it's really because the async model implicitly enforces an invisible job queue, where usually only a single process/thread per core is doing the heavy lifting, and async I/O feeds back-pressure to the load source via the network (e.g. TCP ACKs).
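To make that concrete, here is a hedged Java sketch (queue size and names are mine, not the commenter's) of the same idea made explicit: one worker per core pulling from a bounded queue, where a full queue pushes the delay back onto whoever is producing the work:

```java
// Hedged sketch of the "invisible job queue": workers sized to the core count
// pulling from a bounded queue, with back-pressure once the queue fills.
// Queue size and class name are illustrative.
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class BoundedExecutor {
    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                cores, cores, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(10_000),            // the explicit version of the implicit queue
                new ThreadPoolExecutor.CallerRunsPolicy());   // when full, push work (and delay) back onto the producer

        for (int i = 0; i < 100_000; i++) {
            int job = i;
            pool.execute(() -> process(job)); // producers slow down once the queue is full
        }
        pool.shutdown();
    }

    private static void process(int job) {
        // stand-in for the actual per-core heavy lifting
    }
}
```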
I'm not sure scheduling the work in your own process necessarily solves the load problem better? The described queue behaviour would happen in the OS in the threads scenario, and TCP would work the same too.
Context switching is pretty cheap, but it’s hard to beat free. Eventually the overhead of OS context switching is probably going to become non-negligible in most cases. I think up to a point the answer to when it matters is “that depends on how deep your pockets are.” You can keep scaling vertically and horizontally forever with more resources, obviously, but I don’t think it would take too long for the memory and CPU efficiency of async to start to win out, at least for runtimes with proper async.
Besides that, it’s pretty hard to decide you want to go async later on. In languages where async is not omnipresent, you usually get some form of explicit continuations; async/await sugar if you’re lucky. But you can always accidentally call something not async and hang, or just spend a lot of CPU time with no ability to preempt.
I think Erlang, Go and probably a couple of others basically stand alone in this regard; everything is just async by default. It’s a shame that it requires a decent amount of runtime baggage to accomplish this.
I also think people forget that Robinhood was built with Python, Reddit used Python, and GitHub/GitLab use Ruby. Stripe, i.e. payments that need to be processed fast within SLAs, uses Ruby.
Ruby and Python can only ever have one thread running at a time due to the global interpreter lock. Keep in mind, though, that when a thread blocks on I/O it can be parked and another thread picked up to continue processing. But in essence one thread runs at a time, and these languages built billion-dollar businesses.
"Just scale horizontally, computers are cheap" is the message I get, and remember to keep it simple.
But, it’s only efficient in convenient, shared-nothing type architectures. If you are dealing with longer lived, more stateful stuff (like a game server perhaps) it’s not going to be so easy.
Of course, it’s easy for fancy VC funded stuff to defer optimizations later. If you are working solo you are probably going to want to, counter intuitively maybe, spend more time thinking about how to keep things optimal and simple. There’s probably startups out there worth real money whose entire stack could run on a $20/mo DigitalOcean box if it were better optimized, but actually runs on a $1000/mo Amazon or GCP account. Of course there is a lot to be said about that (obviously, the latter set up should, if done well, be a lot more resilient and prepared to deal with scale) but it does go a long way to demonstrate why someone might care about efficient async; the scale of money you can burn when you are bootstrapping or doing solo hobby projects is a lot different from well-funded startup or established business...
Everything has tradeoffs. I do believe Go wins in context switching, but the GC latency and CPU usage has definitely caused some problems. The Go developers have done an impressive job with GC latency and performance, but it’s hard to be cheaper than free, so sometimes Rust is a better option, for example.
Python and Ruby are fine for embarrassingly parallel problems, like serving content from well-sharded users.
I'd be very surprised if Robinhood's trade matching engine is done in Python. I'd bet they use an off-the-shelf engine in C++/Java or another language with good multi-threading support.
> computers are cheap is the message I get and remember to keep it simple.
Fully agree! Getting to product-market fit is probably easier with python and ruby.
Latency can be a key consideration. Async style code can make it a lot easier to fire off a set of requests in parallel and wait for all the responses to come back (e.g. with https://docs.hhvm.com/hack/asynchronous-operations/concurren...), rather than the serial request waterfall a basic thread-per-request model will push you towards.
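The linked Hack docs aren't reproduced here, but a rough Java equivalent of the fan-out-then-join pattern (URLs and class name are placeholders) looks like this; total latency tracks the slowest request rather than the sum of all of them:

```java
// Rough Java equivalent of "fire off a set of requests in parallel, then wait
// for all of them". URLs and class name are placeholders.
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.List;
import java.util.concurrent.CompletableFuture;

public class FanOut {
    public static void main(String[] args) {
        HttpClient client = HttpClient.newHttpClient();
        List<URI> uris = List.of(
                URI.create("https://example.com/a"),
                URI.create("https://example.com/b"),
                URI.create("https://example.com/c"));

        // Fire off all requests at once instead of awaiting each one in turn.
        List<CompletableFuture<HttpResponse<String>>> futures = uris.stream()
                .map(uri -> client.sendAsync(
                        HttpRequest.newBuilder(uri).build(),
                        HttpResponse.BodyHandlers.ofString()))
                .toList();

        // Wait for every response; total latency is roughly the slowest request,
        // not the sum of all of them (the serial waterfall).
        CompletableFuture.allOf(futures.toArray(CompletableFuture[]::new)).join();
        futures.forEach(f -> System.out.println(f.join().statusCode()));
    }
}
```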
Java may be an exception, but on Windows in general, and in .NET, every thread has its stack preallocated, generally 256KB to 4MB by default, so spawning "a couple thousand threads" as you suggest will immediately consume roughly 0.5-8GB of memory, which adds considerable memory pressure. That's not something to be taken lightly. By using (true) async IO you'll likely only need 20-50MB total, because it uses the threadpool for continuations, which is sized according to the actual available level of concurrency.
Async NetworkStreams or Sockets in .NET is a well-designed API over traditional BSD sockets’ select() function which is how you scale to tens of thousands of concurrent connections using only as many threads as you have hardware DoP - it’s not that I’m wowed that .NET has modernised it, it’s that Java’s solution is to wallpaper over a gaping hole in their fundamental design instead.
I don't see how this is wallpapering over a hole in the design. Far from it. Java has supported async IO since Java 1.4 using APIs like java.nio.channels.AsynchronousChannel. That's a more or less conventional wrapper around epoll.
What's being introduced now is a way to use that API without needing to touch the whole concept of async code. From the developer's perspective blocking calls and threads is pretty nice. It's async that's awkward and hard to use. Making the existing threads facility scale much better is a very clean approach: virtual threads introduces "virtually" no new concepts at all, in fact, if you aren't working with native code via JNI or Panama then you can act as if virtual and physical threads are the same. You literally just upgrade Java and maybe flip a couple of switches and you can now assign one thread per connection yet handle millions of connections (assuming you don't run out of other resources of course).
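As an illustrative sketch only (port and handler invented, assuming Java 21+), the "one thread per connection, but virtual" version reads like ordinary blocking code:

```java
// Illustrative sketch of "one (virtual) thread per connection" with Loom
// (Java 21+). Port and handler are invented for the example.
import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class VirtualThreadServer {
    public static void main(String[] args) throws IOException {
        try (ServerSocket server = new ServerSocket(9090);
             ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            while (true) {
                Socket socket = server.accept();
                // Each connection gets its own virtual thread; the code stays in
                // plain blocking style and the runtime parks the virtual thread
                // on I/O instead of tying up an OS thread.
                executor.submit(() -> handle(socket));
            }
        }
    }

    private static void handle(Socket socket) {
        try (socket) {
            socket.getOutputStream().write("hello\n".getBytes());
        } catch (IOException ignored) {
        }
    }
}
```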
AsynchronousChannel is part of NIO.2, which arrived in Java SE 7, not 1.4. It is also far from a drop-in replacement for InputStream/OutputStream.
Even with Loom's green threads (not the original green threads), async IO in Java will remain too otherworldly until Java's language designers cease their intransigence about adding first-class support for `await`.
I don't see how you arrived at that conclusion at all. Async/await is always strictly harder to use than green threads. Can you explain precisely why you view them as "otherworldly"?
What you’re describing is just fake-async and that is discouraged in the ecosystem. I’m not aware of any major NuGet packages that use fake-async, and certainly nothing in the BCL.
What I am describing is what a large majority of .NET consulting projects do, because async/await taints the whole call stack up to Main() or event handlers, and almost no one is doing .NET Core projects from scratch.
Just because it is discouraged at conference talks and in blog posts doesn't mean devs aren't doing it in the trenches when dealing with the crossfire of project deadlines and massive codebases.
> Just because it is discouraged at conference talks and in blog posts doesn't mean devs aren't doing it in the trenches when dealing with the crossfire of project deadlines and massive codebases.
Oh, I'm sure of that, don't worry.
I'm just fortunate that I've never had to experience that myself.
If you get to a very high number of cooperative threads, then each such thread will need a very small stack (otherwise the OOM killer will get you). I am not sure that you can impose such limitations on the stack size in Java.
Once you have an async API (of any kind), and have paid the cognitive and technical price of having one, you have a lot of flexibility about what you do on the implementation side - including thousands of threads.
If you start off with a synchronous API and thousands of threads, migrating to something else later is very expensive.
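A small hedged example of that flexibility (all names invented): the public surface is async, while the implementation is free to be a plain blocking call parked on one of thousands of ordinary threads, and can later be swapped for true non-blocking I/O without touching callers:

```java
// Hedged sketch: an async-looking API whose implementation is free to change.
// Class, method, and pool size are invented for the example.
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class UserClient {
    private final ExecutorService pool = Executors.newFixedThreadPool(1_000);

    // Callers only see a CompletableFuture; today it is backed by a blocking
    // call parked on one of many ordinary threads, tomorrow it could be real
    // non-blocking I/O, with no change to the API.
    CompletableFuture<String> fetchUser(String id) {
        return CompletableFuture.supplyAsync(() -> blockingFetch(id), pool);
    }

    private String blockingFetch(String id) {
        // stand-in for a blocking database or HTTP call
        return "user-" + id;
    }
}
```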
My app server has had non-blocking IO on regular OS threads for 10 years. Hiding things in "virtual threads" does not do you any favours in the long run because it just adds complexity; you need to understand things as close to the metal as possible to make them work as well as they can: http://github.com/tinspin/rupy
Not using async is really bad; you are wasting so much CPU. Even in an async scenario, just copying memory from user space to the kernel and back takes 30% of the CPU at full 100% utilization; imagine having all those context switches on top of that!
Furthermore, if all of the events being pushed into your application need to be processed in a serialized fashion anyway (e.g. any financial transaction system), then you will find that a single thread is the fastest way to process them. The moment you involve a lock, you lose the instruction-level parallelism game.
Threads are a huge mistake 9/10 times. I don't judge when the OS uses them, but for most of my applications I prefer to get everything into a nice ringbuffer so I can tear through tens or hundreds of millions of transactions per second with that single blessed core.
Most of this is meaningless if you use a database engine to serialize your transactions for you, but it's still worth considering from an abstract perspective IMO.
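For illustration only (sizes, names, and the single-producer/single-consumer assumption are mine), here is a bare-bones Java version of that "single blessed core tearing through a ring buffer" idea, in the spirit of the LMAX Disruptor:

```java
// Bare-bones single-producer/single-consumer ring buffer in the spirit of the
// LMAX Disruptor. Sizes, names, and the long payload are illustrative.
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongConsumer;

final class RingBuffer {
    private static final int SIZE = 1 << 16;           // power of two so we can mask instead of mod
    private final long[] slots = new long[SIZE];
    private final AtomicLong head = new AtomicLong();   // next slot to publish (written by the producer only)
    private final AtomicLong tail = new AtomicLong();   // next slot to consume (written by the consumer only)

    boolean offer(long txn) {
        long h = head.get();
        if (h - tail.get() >= SIZE) return false;       // full: back-pressure the producer
        slots[(int) (h & (SIZE - 1))] = txn;
        head.lazySet(h + 1);                            // release the slot to the consumer
        return true;
    }

    // The single "blessed" consumer thread tears through events in order;
    // no locks on the hot path.
    void drain(LongConsumer process) {
        long t = tail.get();
        long h = head.get();
        while (t < h) {
            process.accept(slots[(int) (t & (SIZE - 1))]);
            t++;
        }
        tail.lazySet(t);
    }
}
```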
More powerful. Java threads are objects with a standard API for managing them, and you can do things like await concurrent work without rolling your own command protocol using channel pairs and hoping some goroutine out there might still be answering. A JMX client can even show them in a GUI.
This makes the same concept radically cheaper by not involving the kernel, which is great because Java hasn't had green threads for a long time, and doing everything async in small worker pools worked but has admittedly been pretty painful.
Coroutines in Kotlin are only a compiler trick and are stackless. Go's and Java's are stackful and reified: small stack chunks are moved in and out of the heap onto the carrier stack.
This means you get an actual meaningful stacktrace when debugging, and not something stemming from a mysterious event-loop thread.
In Java you could save your coroutine state to disk, and wake it up later in theory.
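A tiny demonstration of the stack-trace point (method names made up, assuming Java 21+): the printed trace shows the logical call chain on the virtual thread itself rather than an event-loop frame:

```java
// Small illustration of the stack-trace point (Java 21+). Method and thread
// names are made up.
public class TraceDemo {
    public static void main(String[] args) throws InterruptedException {
        Thread t = Thread.ofVirtual().name("checkout-42").start(TraceDemo::handleRequest);
        t.join();
    }

    static void handleRequest() {
        chargeCard();
    }

    static void chargeCard() {
        // The printed trace contains chargeCard and handleRequest on the
        // virtual thread's own stack, not an opaque event-loop frame.
        new Exception("where am I?").printStackTrace();
    }
}
```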
----
EDIT: This being said, I'm 99% sure Kotlin is going to pass Loom's goodness onto their developers when it's available, probably reusing the existing coroutine API.
Kotlin has painted itself into a corner by trying to go everywhere while being married to Android.
So for every JVM/Java feature introduced after Java 6, they will face the dilemma of integrating it in a way that keeps language semantics consistent across compilation targets, living with multiple solutions to the same problem (Kotlin's own plus whatever each platform later introduced), or just exposing it via KMM and leaving the #ifdef burden to the community.
That is why platform languages always carry the trophy, even if they are the turtle most of the time.
Well, so far that hasn't been an issue. Kotlin is a pragmatic language. The differences that currently exist can be addressed just with an annotation here or there. Also, Kotlin's features are designed whilst paying careful attention to what the Java guys are doing. Look at how records have played out. You can use JVM records from Kotlin transparently, you can create them by just adding an annotation (not that there's much point in doing so, as records are mostly a labour saving device that Kotlin already had).
Value types are perhaps a better example. Kotlin has them already with nearly identical semantics to Valhalla, but without the ability for them to have more than one field due to the need for erasure. Once Valhalla arrives, Kotlin can simply remove that restriction when targeting the JVM, perhaps add another annotation or compiler flag to say "make this a real Java value type". No language changes needed beyond that.
Kotlin is semantically so close to Java already that they aren't really growing apart, they're growing together. It works well enough to justify its usage, for me.
Indeed, it just has the same role on the JVM as C on UNIX, JS on the browser,....
Just like those, it will slowly adopt whatever is more appealing from the guests and then carry on its merry way, while the others slowly lose their relevance and newcomers try yet again to challenge the place of the host language on the platform.
It sure is a dilemma. Java is catching up. When (if) Java introduces null-safety, Kotlin will lose much of its lustre, at least to me.
Virtual threads are superior to coroutines, which are still a pain in the ass in Kotlin: they cause issues with mocking and you can't even evaluate them in the REPL.
I do not, but as always this is the Java way. Being the last mover is how Java moves forward with features. They let other languages experiment first so they don't have to support a bad feature for eternity because of Java's backwards compatibility promises.
What is that breaking point where developers should switch to more scalable models? From the article:
> There is a threshold beyond which the use of synchronous APIs just doesn’t scale
What is a good rule of thumb these days?