I often find developers jump to async too early. Clearly a million native threads is too many, but on modern processors a couple thousand mostly idle (due to IO blocking) Java threads are fine (e.g. one per request in a typical middle tier). While Java's stack sizes are large, if a thread's stack isn't deep, most of that memory is never actually allocated.
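For concreteness, here is a minimal sketch of that thread-per-request style in plain Java (the class name, port, and echo handling are illustrative, not from the comment). Each thread spends nearly all its time blocked in readLine(), which is exactly the "mostly idle" case:

```java
// Minimal sketch of the thread-per-request model described above: one blocking
// platform thread per connection. Class name, port, and echo behaviour are
// illustrative only.
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.net.ServerSocket;
import java.net.Socket;

public class ThreadPerRequestServer {
    public static void main(String[] args) throws IOException {
        try (ServerSocket server = new ServerSocket(8080)) {
            while (true) {
                Socket socket = server.accept();
                // A couple thousand of these mostly-idle threads is usually fine;
                // the stack is reserved up front, but pages are only committed as
                // the stack actually grows.
                new Thread(() -> handle(socket)).start();
            }
        }
    }

    private static void handle(Socket socket) {
        try (socket;
             BufferedReader in = new BufferedReader(new InputStreamReader(socket.getInputStream()));
             PrintWriter out = new PrintWriter(socket.getOutputStream(), true)) {
            String line;
            while ((line = in.readLine()) != null) {
                out.println(line); // echo; the thread blocks in readLine() most of the time
            }
        } catch (IOException ignored) {
        }
    }
}
```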
> on modern processors a couple thousand mostly idle (due to IO blocking) Java threads are fine
This “mostly idle” assumption is gonna break the moment you hit a load spike and wake up most of those “idle” threads.
The success of async is often (mis)attributed to eliminating unnecessary resources, but in practice it's really because the async model implicitly enforces an invisible job queue, where usually only a single process/thread per core is doing the heavy lifting, and async I/O feeds back-pressure to the load source via the network (e.g. TCP ACKs).
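To make that concrete, here is a hedged Java sketch (queue size and names are mine, not the commenter's) of the same idea made explicit: one worker per core pulling from a bounded queue, where a full queue pushes the delay back onto whoever is producing the work:

```java
// Hedged sketch of the "invisible job queue": workers sized to the core count
// pulling from a bounded queue, with back-pressure once the queue fills.
// Queue size and class name are illustrative.
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class BoundedExecutor {
    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                cores, cores, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(10_000),            // the explicit version of the implicit queue
                new ThreadPoolExecutor.CallerRunsPolicy());   // when full, push work (and delay) back onto the producer

        for (int i = 0; i < 100_000; i++) {
            int job = i;
            pool.execute(() -> process(job)); // producers slow down once the queue is full
        }
        pool.shutdown();
    }

    private static void process(int job) {
        // stand-in for the actual per-core heavy lifting
    }
}
```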
I'm not sure scheduling the work in your own process necessarily solves the load problem better? The described queue behaviour would happen in the OS in the threads scenario, and TCP would work the same too.
Context switching is pretty cheap, but it’s hard to beat free. Eventually the overhead of OS context switching is probably going to become non-negligible in most cases. I think up to a point the answer to when it matters is “that depends on how deep your pockets are.” You can keep scaling vertically and horizontally forever with more resources, obviously, but I don’t think it would take too long for the memory and CPU efficiency of async to start to win out, at least for runtimes with proper async.
Besides that, it’s pretty hard to decide you want to go async later on. In languages where async is not omnipresent, you usually get some form of explicit continuations; async/await sugar if you’re lucky. But you can always accidentally call something not async and hang, or just spend a lot of CPU time with no ability to preempt.
I think Erlang, Go and probably a couple of others basically stand alone in this regard; everything is just async by default. It’s a shame that it requires a decent amount of runtime baggage to accomplish this.
I also think people forget that Robinhood was built with Python, Reddit used Python, and GitHub/GitLab use Ruby. Stripe, i.e. payments that need to be processed fast within SLAs, uses Ruby.
Ruby and Python can only ever have one thread running at a time due to the global interpreter lock. Keep in mind, though, that when a thread blocks on I/O it can be parked and another thread picked up to continue processing. But in essence one thread runs at a time, and these languages built billion-dollar businesses.
"Just scale horizontally, computers are cheap" is the message I get, and remember to keep it simple.
But, it’s only efficient in convenient, shared-nothing type architectures. If you are dealing with longer lived, more stateful stuff (like a game server perhaps) it’s not going to be so easy.
Of course, it’s easy for fancy VC funded stuff to defer optimizations later. If you are working solo you are probably going to want to, counter intuitively maybe, spend more time thinking about how to keep things optimal and simple. There’s probably startups out there worth real money whose entire stack could run on a $20/mo DigitalOcean box if it were better optimized, but actually runs on a $1000/mo Amazon or GCP account. Of course there is a lot to be said about that (obviously, the latter set up should, if done well, be a lot more resilient and prepared to deal with scale) but it does go a long way to demonstrate why someone might care about efficient async; the scale of money you can burn when you are bootstrapping or doing solo hobby projects is a lot different from well-funded startup or established business...
Everything has tradeoffs. I do believe Go wins in context switching, but the GC latency and CPU usage has definitely caused some problems. The Go developers have done an impressive job with GC latency and performance, but it’s hard to be cheaper than free, so sometimes Rust is a better option, for example.
Python and Ruby are fine for embarrassingly parallel problems, like serving content from well-sharded users.
I'd be very surprised if Robinhood's trade matching engine is done in Python. I'd bet they use an off-the-shelf engine in C++/Java or another language with good multi-threading support.
> computers are cheap is the message I get and remember to keep it simple.
Fully agree! Getting to product-market fit is probably easier with python and ruby.
Latency can be a key consideration. Async style code can make it a lot easier to fire off a set of requests in parallel and wait for all the responses to come back (e.g. with https://docs.hhvm.com/hack/asynchronous-operations/concurren...), rather than the serial request waterfall a basic thread-per-request model will push you towards.
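The linked Hack docs aren't reproduced here, but a rough Java equivalent of the fan-out-then-join pattern (URLs and class name are placeholders) looks like this; total latency tracks the slowest request rather than the sum of all of them:

```java
// Rough Java equivalent of "fire off a set of requests in parallel, then wait
// for all of them". URLs and class name are placeholders.
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.List;
import java.util.concurrent.CompletableFuture;

public class FanOut {
    public static void main(String[] args) {
        HttpClient client = HttpClient.newHttpClient();
        List<URI> uris = List.of(
                URI.create("https://example.com/a"),
                URI.create("https://example.com/b"),
                URI.create("https://example.com/c"));

        // Fire off all requests at once instead of awaiting each one in turn.
        List<CompletableFuture<HttpResponse<String>>> futures = uris.stream()
                .map(uri -> client.sendAsync(
                        HttpRequest.newBuilder(uri).build(),
                        HttpResponse.BodyHandlers.ofString()))
                .toList();

        // Wait for every response; total latency is roughly the slowest request,
        // not the sum of all of them (the serial waterfall).
        CompletableFuture.allOf(futures.toArray(CompletableFuture[]::new)).join();
        futures.forEach(f -> System.out.println(f.join().statusCode()));
    }
}
```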
Java may be an exception, but on Windows in general, and in .NET, every thread has its stack preallocated, generally 256KB to 4MB by default, so spawning "a couple thousand threads" as you suggest will immediately consume roughly 0.5-8GB of memory, which adds considerable memory pressure. That's not something to be taken lightly. By using (true) async IO you'll likely only need 20-50MB total, because it uses the threadpool for continuations, which is sized according to the actual available level of concurrency.
Async NetworkStreams or Sockets in .NET is a well-designed API over traditional BSD sockets’ select() function which is how you scale to tens of thousands of concurrent connections using only as many threads as you have hardware DoP - it’s not that I’m wowed that .NET has modernised it, it’s that Java’s solution is to wallpaper over a gaping hole in their fundamental design instead.
I don't see how this is wallpapering over a hole in the design. Far from it. Java has supported async IO since Java 1.4 using APIs like java.nio.channels.AsynchronousChannel. That's a more or less conventional wrapper around epoll.
What's being introduced now is a way to use that API without needing to touch the whole concept of async code. From the developer's perspective blocking calls and threads is pretty nice. It's async that's awkward and hard to use. Making the existing threads facility scale much better is a very clean approach: virtual threads introduces "virtually" no new concepts at all, in fact, if you aren't working with native code via JNI or Panama then you can act as if virtual and physical threads are the same. You literally just upgrade Java and maybe flip a couple of switches and you can now assign one thread per connection yet handle millions of connections (assuming you don't run out of other resources of course).
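As an illustrative sketch only (port and handler invented, assuming Java 21+), the "one thread per connection, but virtual" version reads like ordinary blocking code:

```java
// Illustrative sketch of "one (virtual) thread per connection" with Loom
// (Java 21+). Port and handler are invented for the example.
import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class VirtualThreadServer {
    public static void main(String[] args) throws IOException {
        try (ServerSocket server = new ServerSocket(9090);
             ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            while (true) {
                Socket socket = server.accept();
                // Each connection gets its own virtual thread; the code stays in
                // plain blocking style and the runtime parks the virtual thread
                // on I/O instead of tying up an OS thread.
                executor.submit(() -> handle(socket));
            }
        }
    }

    private static void handle(Socket socket) {
        try (socket) {
            socket.getOutputStream().write("hello\n".getBytes());
        } catch (IOException ignored) {
        }
    }
}
```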
AsynchronousChannel is part of NIO.2, which arrived in Java SE 7, not 1.4. It is also far from a drop-in replacement for InputStream/OutputStream.
Even with Loom's green threads (not the original green threads), async IO in Java will remain too otherworldly until Java's language designers cease their intransigence about adding first-class support for `await`.
I don't see how you arrived at that conclusion at all. Async/await is always strictly harder to use than green threads. Can you explain precisely why you view them as "otherworldly"?
What you’re describing is just fake-async and that is discouraged in the ecosystem. I’m not aware of any major NuGet packages that use fake-async, and certainly nothing in the BCL.
What I am describing is what a large majority of .NET consulting projects do, because async/await taints the whole call stack up to Main() or event handlers, and almost no one is doing .NET Core projects from scratch.
Just because it is discouraged at conference talks and in blog posts doesn't mean devs aren't doing it in the trenches when dealing with the crossfire of project deadlines and massive codebases.
> Just because it is discouraged at conference talks and in blog posts doesn't mean devs aren't doing it in the trenches when dealing with the crossfire of project deadlines and massive codebases.
Oh, I'm sure of that, don't worry.
I'm just fortunate that I've never had to experience that myself.
If you get to a very high number of cooperative threads, then each such thread will need a very small stack (otherwise the OOM killer will get you). I am not sure that you can impose such limitations on the stack size in Java.
Once you have an async API (of any kind), and have paid the cognitive and technical price of having one, you have a lot of flexibility about what you do on the implementation side - including thousands of threads.
If you start off with a synchronous API and thousands of threads, migrating to something else later is very expensive.
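A small hedged example of that flexibility (all names invented): the public surface is async, while the implementation is free to be a plain blocking call parked on one of thousands of ordinary threads, and can later be swapped for true non-blocking I/O without touching callers:

```java
// Hedged sketch: an async-looking API whose implementation is free to change.
// Class, method, and pool size are invented for the example.
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class UserClient {
    private final ExecutorService pool = Executors.newFixedThreadPool(1_000);

    // Callers only see a CompletableFuture; today it is backed by a blocking
    // call parked on one of many ordinary threads, tomorrow it could be real
    // non-blocking I/O, with no change to the API.
    CompletableFuture<String> fetchUser(String id) {
        return CompletableFuture.supplyAsync(() -> blockingFetch(id), pool);
    }

    private String blockingFetch(String id) {
        // stand-in for a blocking database or HTTP call
        return "user-" + id;
    }
}
```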
My app server has had non-blocking IO on regular OS threads for 10 years. Hiding things in "virtual threads" does not do you any favours in the long run because it just adds complexity; you need to understand things as close to the metal as possible to make them work as well as they can: http://github.com/tinspin/rupy
Not using async is really bad; you are wasting so much CPU. Even in an async scenario, just copying memory from user space to the kernel and back takes 30% of the CPU at full 100% utilization; imagine having all those context switches on top of that!
Furthermore, if all of the events being pushed into your application need to be processed in a serialized fashion anyway (e.g. any financial transaction system), then you will find that a single thread is the fastest way to process them. The moment you involve a lock, you lose the instruction-level parallelism game.
Threads are a huge mistake 9/10 times. I don't judge when the OS uses them, but for most of my applications I prefer to get everything into a nice ringbuffer so I can tear through tens or hundreds of millions of transactions per second with that single blessed core.
Most of this is meaningless if you use a database engine to serialize your transactions for you, but it's still worth considering from an abstract perspective IMO.
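For illustration only (sizes, names, and the single-producer/single-consumer assumption are mine), here is a bare-bones Java version of that "single blessed core tearing through a ring buffer" idea, in the spirit of the LMAX Disruptor:

```java
// Bare-bones single-producer/single-consumer ring buffer in the spirit of the
// LMAX Disruptor. Sizes, names, and the long payload are illustrative.
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongConsumer;

final class RingBuffer {
    private static final int SIZE = 1 << 16;           // power of two so we can mask instead of mod
    private final long[] slots = new long[SIZE];
    private final AtomicLong head = new AtomicLong();   // next slot to publish (written by the producer only)
    private final AtomicLong tail = new AtomicLong();   // next slot to consume (written by the consumer only)

    boolean offer(long txn) {
        long h = head.get();
        if (h - tail.get() >= SIZE) return false;       // full: back-pressure the producer
        slots[(int) (h & (SIZE - 1))] = txn;
        head.lazySet(h + 1);                            // release the slot to the consumer
        return true;
    }

    // The single "blessed" consumer thread tears through events in order;
    // no locks on the hot path.
    void drain(LongConsumer process) {
        long t = tail.get();
        long h = head.get();
        while (t < h) {
            process.accept(slots[(int) (t & (SIZE - 1))]);
            t++;
        }
        tail.lazySet(t);
    }
}
```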
More powerful. Java threads are objects with a standard API for managing them, and you can do things like await concurrent work without rolling your own command protocol using channel pairs and hoping some goroutine out there might still be answering. A JMX client can even show them in a GUI.
This makes the same concept radically cheaper by not involving the kernel, which is great because Java hasn't had green threads for a long time, and doing everything async in small worker pools worked but has admittedly been pretty painful.
Coroutines in Kotlin are only a compiler trick and are stackless. Go's and Java's are stackful and reified: small stack chunks are moved in and out of the heap onto the carrier stack.
This means you get an actual meaningful stacktrace when debugging, and not something stemming from a mysterious event-loop thread.
In Java you could save your coroutine state to disk, and wake it up later in theory.
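A tiny demonstration of the stack-trace point (method names made up, assuming Java 21+): the printed trace shows the logical call chain on the virtual thread itself rather than an event-loop frame:

```java
// Small illustration of the stack-trace point (Java 21+). Method and thread
// names are made up.
public class TraceDemo {
    public static void main(String[] args) throws InterruptedException {
        Thread t = Thread.ofVirtual().name("checkout-42").start(TraceDemo::handleRequest);
        t.join();
    }

    static void handleRequest() {
        chargeCard();
    }

    static void chargeCard() {
        // The printed trace contains chargeCard and handleRequest on the
        // virtual thread's own stack, not an opaque event-loop frame.
        new Exception("where am I?").printStackTrace();
    }
}
```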
----
EDIT: This being said, I'm 99% sure Kotlin is going to pass Loom's goodness onto their developers when it's available, probably reusing the existing coroutine API.
Kotlin has painted itself into a corner by trying to go everywhere while being married to Android.
So for every JVM/Java feature introduced after Java 6, they will face the dilemma of integrating it in a way that keeps language semantics consistent across compilation targets, living with multiple solutions to the same problem (Kotlin's own plus whatever each platform later introduced), or just exposing it via KMM and leaving the #ifdef burden to the community.
That is why platform languages always carry the trophy, even if they are the turtle most of the time.
Well, so far that hasn't been an issue. Kotlin is a pragmatic language. The differences that currently exist can be addressed just with an annotation here or there. Also, Kotlin's features are designed whilst paying careful attention to what the Java guys are doing. Look at how records have played out. You can use JVM records from Kotlin transparently, you can create them by just adding an annotation (not that there's much point in doing so, as records are mostly a labour saving device that Kotlin already had).
Value types are perhaps a better example. Kotlin has them already with nearly identical semantics to Valhalla, but without the ability for them to have more than one field due to the need for erasure. Once Valhalla arrives, Kotlin can simply remove that restriction when targeting the JVM, perhaps add another annotation or compiler flag to say "make this a real Java value type". No language changes needed beyond that.
Kotlin is semantically so close to Java already that they aren't really growing apart, they're growing together. It works well enough to justify its usage, for me.
Indeed, it just has the same role on the JVM as C on UNIX, JS on the browser,....
Just like those, it will slowly adopt whatever is more appealing from the guests and then carry on its merry way, while the others slowly lose their relevance and newcomers try yet again to challenge the place of the host language on the platform.
It sure is a dilemma. Java is catching up. When (if) Java introduces null-safety, Kotlin will lose much of its lustre, at least to me.
Virtual threads are superior to coroutines, which are still a pain in the ass in Kotlin: they cause issues with mocking and you can't even evaluate them in the REPL.
I do not, but as always this is the Java way. Being the last mover is how Java moves forward with features. They let other languages experiment first so they don't have to support a bad feature for eternity because of Java's backwards compatibility promises.
What is that breaking point where developers should switch to more scalable models? From the article:
> There is a threshold beyond which the use of synchronous APIs just doesn’t scale
What is a good rule of thumb these days?