Optimising for Concurrency: Comparing the BEAM and JVM virtual machines (erlang-solutions.com)
185 points by francescoc on May 13, 2020 | 106 comments



The author leaves out Kotlin, which adds support for coroutines at the language level and still compiles to Java bytecode. These are not classic continuations because they cannot be cancelled, but they're still very useful and true fibers.

There's also the Quasar library, which adds fiber support to existing Java projects, but it's mostly unmaintained since the maintainers were pulled away to work on Project Loom.

Then there's Project Loom, an active branch of OpenJDK with language support for continuations and a fiber threading model. The prototype is done and they're in the optimization phase. I expect fibers to land in the Java spec somewhere around JDK 17.

I figure it's fair to mention these, as the author's criticisms are somewhat valid but will not be for very long (a few years max?).

In summary: Java will have true fiber support "soon". This will invalidate the arguments for Erlang concurrency model. They are already outdated if you are okay using mixed Java/Kotlin coroutines or the Quasar library.

The newer Java GCs, Shenandoah and ZGC, address the author's criticisms of pause times. They already exist, are free, and are in stable releases. Dare I say they are almost certainly better than Erlang's GC. They are truly state of the art, arguably far superior to the GCs used in Go, .NET, etc. Pause times are ~10 milliseconds at the 99.5th-percentile latency for multi-terabyte heaps, with average pause times well below 1 millisecond. No other GC'd language comes close to my knowledge. His points 1 and 2 no longer apply with these collectors. You don't need 2x memory for the copy phase, and the collectors quickly return unused memory to the OS. This has been the case for several years.

Hot code reloading: the JVM supports this extensively and it's used all the time. Look into ByteBuddy, CGLIB, ASM, or Spring AOP if you want to know more. Java also supports code generation at build time using annotation processors. This is also extensively used/abused to get rid of language cruft.


> This will invalidate the arguments for Erlang concurrency model.

What about failure domains? As far as I'm concerned, this is the strongest reason for actor-based concurrency. I can design my architecture so that groups of processes that need to die together die together. And it's usually one or two lines of code, if any.

Here's a real-life example. I have a process that maintains an SSH connection to a host machine, and that SSH connection is used to query information about running VMs on that host machine. If the SSH connection dies, it kills the process that is tracking the host machine, which in turn kills the processes tracking the associated VMs, without perturbing any of the other hosts' processes or VMs. This triggers the host process to be restarted by a supervisor, which then creates a new SSH connection to query for information (possibly repopulating VM processes for tracking information). For all of this I wrote zero lines of code (which, importantly, means I made no mistakes), just one or two configuration options. More importantly, the system doesn't get stuck in an undefined state where complex query failures can cause logjams in the running system.


You can tie the fates of threads together in Java using thread groups. If you need more flexibility, or want it managed for you, the Akka framework offers this. I believe Akka gives you a model very similar to Erlang's.

In Java you would create a thread pool and configure it to restart the threads if they die. Each thread would wake up every so often to query SSH and dump its results into a queue. If the query threads die, the processes reading the queue at the other end have nothing to do, so they won't execute. It's easy to make a consumer queue that executes some code on another thread whenever data arrives.
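A rough sketch of that pattern; the actual SSH query is stubbed out with a placeholder string, and a real poller would also handle restarts:

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class SshPoller {
    public static void main(String[] args) throws Exception {
        BlockingQueue<String> results = new LinkedBlockingQueue<>();

        // Poller: wakes up periodically and dumps results into the queue.
        // The string below stands in for a real SSH query.
        ScheduledExecutorService poller = Executors.newSingleThreadScheduledExecutor();
        poller.scheduleAtFixedRate(
                () -> results.offer("vm-status"),
                0, 100, TimeUnit.MILLISECONDS);

        // Consumer: take() blocks until data arrives, so if the poller
        // dies there is simply nothing to do.
        String item = results.take();
        System.out.println("got: " + item);

        poller.shutdownNow();
    }
}
```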

Java's exposure of the underlying OS threads and cheap transfer of data between threads lets people build libraries on top that offer the memory models used by Erlang and others. It's not built in or quite as convenient, but you can use actors and fibers in Java if you want to.


Yeah, that's exactly the problem. It's an afterthought in the system. How certain can you be that the system you're using is composable with any other code brought into your system, even from outside libraries? In Erlang, failure domains are the raison d'être of the language, so everything in the ecosystem will play nice.

Ultimately, systems like Akka are extremely complicated to get right, even for experts, because you have to think about all of the VM bits underneath. I can teach (and have taught) a junior programmer basic OTP concepts with the confidence that they can't mess things up. Now, they wouldn't be able to come up with the architecture I designed as a good idea, but I could tell them to implement it (with tests!) and expect them to get it right.


That's what exceptions are for, no? If a connection dies an exception is thrown that would propagate up to the top of the thread stack. You'd then catch it and sit in a loop re-establishing the SSH connection, or terminating with a signal to whatever thread started the monitoring thread that it was dying. The act of unwinding the stack would pass through the finally handlers, closing open resources and cleaning up, before the loop starts again.
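A minimal sketch of that loop; `connect()` here is a stand-in for establishing the SSH connection, rigged to fail twice before succeeding:

```java
public class MonitorLoop {
    static int attempts = 0;

    // Stand-in for establishing the SSH connection: fails twice, then succeeds.
    static void connect() {
        attempts++;
        if (attempts < 3) throw new RuntimeException("connection refused");
    }

    public static void main(String[] args) {
        while (true) {
            try {
                connect();
                break; // connected; a real monitor would poll here
            } catch (RuntimeException e) {
                System.out.println("reconnecting after: " + e.getMessage());
            } finally {
                // stack unwinding passes through here on every failure:
                // close sockets, release resources, etc.
            }
        }
        System.out.println("connected after " + attempts + " attempts");
    }
}
```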

The failure domain here isn't precisely defined because shared data is allowed (but not required). You could define it as "anything reachable from the thread/fiber stack".


No. If you try to use exceptions to guard your failure domains in this fashion you will not have a good time.


A current discussion on the Loom mailing list is about providing Structured Concurrency [1] primitives.

It would allow you to write something like:

    try (var scope = FiberScope.open(Option.PROPAGATE_CANCEL)) {
        var fiber1 = scope.schedule(() -> sshKeepAlive());
        var fiber2 = scope.schedule(() -> trackHost());
        var fiber3 = scope.schedule(() -> trackVMs());
    }
with the guarantee that if any fiber fails (which you bind to cancelling it), all the others will be cancelled.

[1] http://250bpm.com/blog:71


Java's GC has to be best in class because of shared memory. In a shared-nothing world, doing GC in one "thread" doesn't stop other threads from executing; it also means that each heap can be very small, so you might not even need to perform GC before the thread is done executing. It's truly amazing what Java is doing, but keep in mind that Erlang has worked this way for _decades_. And still, a classic web server that spins up one thread/process per request can potentially respond to the request with zero garbage collection in the best case, irrespective of load. This will not be true for Shenandoah or ZGC.

Does Java's hot code reloading support data migration? One benefit of Erlang's model is that you can execute hooks when HCR is performed to make sure your data in memory is migrated to the new format.

But really, the most important thing about Erlang's actor model is error handling. If I spin up a process in Erlang and it fails, it won't corrupt the state of my other processes. In Java this can only be attained through discipline, since all memory is shared. Also, I can very easily specify which processes should work together as units, such that if one fails, they all fail, and can be restarted together from a known working state. This, again, requires discipline in Java.


Per-thread GC is definitely a different approach than Java takes. The trade-off is that shared memory between Java threads is nearly free. It's basically the same approach C++ uses, except Java has better concurrency primitives because of its VM. Not sure about Erlang, but data sharing between processes in JS and Python is very expensive and a frequent criticism of those languages. You can achieve zero garbage per request in Java. Typically high-performance web frameworks like Undertow and Vert.X are designed this way. User code rarely does it, but it's definitely possible.

Not sure what you mean by data migration on code reloading. I suspect the mechanisms are different enough that they can't be compared. With Java you can load arbitrary new code, but changes to existing code are limited in ways that prevent data incompatibilities. For example, you can add fields to existing objects but you can't change the type of existing ones.

Data corruption from threading is rare in Java. I can't remember the last time I ran into it. It's easy to do, but everyone is used to threads and the concurrency implementation is one of the best I've used. Java also supports thread groups to ensure that threads die and get restarted together. It's not automatic, you need to manage the groups, but I think it achieves the same.


In Erlang, processes need to send messages to each other. And those messages are copies (nothing is shared). This is less efficient than in Java, where everything is shared, but it also means that process A cannot change something that process B is looking at. So locks aren't necessary in Erlang. It also enables easy distribution: when all processes share data by messaging, it doesn't matter whether those processes are running on the same machine or are distributed over a network.

Since Erlang has one GC per process, you can create garbage in one process without triggering GC if that process is short-lived. Once the process dies, the entire heap for that process is freed. So in Java you'd have to write code in a special way to avoid GC, but in Erlang that happens automatically if your process exits before its heap needs GC. And in Erlang it's pretty normal to run one process per HTTP request, so this does happen in practice, without requiring anything of the programmer.

As for hot code reloading and data migration: when you hot-load code into an Erlang VM, a hook will be called (if defined) which allows you to migrate all data in memory to the new format. So you're not restricted by data incompatibility.

Your last paragraph is what I referred to as required discipline. Everyone that touches the code is required to understand what causes corruption and what doesn't. It also requires that you know which classes are thread-safe and which aren't, which is hopefully documented somewhere. Thread groups need to be understood (I work in Java/Kotlin every day, and I didn't know what thread groups were before today). In Erlang, data corruption due to multiple processes doesn't happen, and grouping processes together (supervision trees) is so common I can't remember the last time I saw an Erlang program without one.

Which of course doesn't mean that Erlang is superior to Java. But when you're working on something highly concurrent which needs to be fault tolerant, I'd argue that you'd get a better result with less effort than in Java. But of course, if you know Java really well and don't know Erlang at all, YMMV.


Different strokes I guess.

Erlang's model with fibers and message passing sounds close to Golang's. Java has decent support for immutable objects with immutable collections, Lombok and the FreeBuilder library (both build-time code generators), and Java 14 record types. Transparent message passing between machines is unique to Erlang.

Per-process GC isn't anything like what Java does, but the new GCs are probably fast enough that it doesn't matter in practice. For any sanely sized heap the GC pauses are around 0.5 milliseconds. This wasn't true until a few years ago, and in production most people don't know or care enough to use the new GCs.

You are right about thread safety in objects. Thankfully the JDK surface is fully documented. Third-party libraries usually are. Internal code is a crapshoot. It requires discipline, but I still find problems rare in practice because the normal patterns lend themselves to thread safety.

I think it's safe to say that Java is a lower-level language than Erlang, one which enables many of the same patterns with less convenience. You can probably get better performance with Java, but your fault tolerance completely depends on how good your coders are. Java will not save you from doing stupid things between threads.


Sounds about right :)

Just wanted to touch on one point. Golang also has shared memory, even though it encourages sharing by communicating. In Erlang you don't have a choice. Golang also doesn't have anything like supervision trees (threads that die and restart together). So in practice Golang and Erlang concurrency are very different.


Interesting, so Golang channels are basically a hybrid between the two approaches.

I envy Rust and its borrow checker. It's a pain to get used to, but it enables a "shared when you say it is" concurrency model with no overhead: no message-passing overhead, optional but safe mutability, no possibility of data corruption, zero-copy basically everywhere.


Fair point - all this is true.

As a counter-point, I've been working on a platform for the last few years which uses Kotlin and Quasar in production. Quasar was cool at first but now it's just a nightmare and I wish we never opted to use it. It leaks abstractions all over the place with @Suspendable annotations and users of the platform find the quasar related errors super confusing. Debugging is also very difficult because of Quasar. On the other hand, Kotlin is great!

If I could turn back time, I'd build the messaging/async workflow part of the platform using Erlang. I've mentioned this to a few people but they all think I'm mad... "Erlang... are you on drugs?!", which is disappointing because it's literally perfect for our use case.


I have actually heard the same about Quasar, so I have avoided using it. It hacks up the bytecode, so evil bugs appear to be common, based on my glances at the issue tracker.

Why didn't you use Kotlin coroutines? My understanding is that they achieve the same as Quasar without the insanity.

You may also want to look at Vert.X. It's evolved into a lot more than a REST framework. It uses thread-per-core and non-blocking IO to achieve high performance instead of green threads. It theoretically performs better because there aren't a lot of stacks hanging around and only one thread per core. There's a lot of callbacks though, so if you're not used to RxJava-style chaining it's hard to get used to. It's very much like Node.

Erlang or Go would be the easiest if you need a lot of threads. If you just need high performance with a lot of connections, Vert.X may suffice. Java IO in recent years is fully non-blocking, so you don't need a lot of threads for high concurrency. Vert.X can handle millions of concurrent clients, enough that you will need to tune your kernel to hit its limits. And it's built on Netty, which is rock solid.


Sounds like you made the right decision! When we started the project four years ago, coroutines were still quite experimental, so they weren't a feasible option for us. Considering all the trade-offs, moving to coroutines is not something we should do now, as I believe it would yield only marginal benefits but would break compatibility for customers.

One of the main problems with the Quasar/coroutine-based model is that the semantics are quite hard to understand for developers who are not very familiar with concurrency. They write code that _looks_ synchronous but is actually async. We get a lot of support tickets claiming there's a bug in the platform when the reality is that they don't understand what's going on. I sympathise with them, and we probably need to do a better job of hiding the complexities. As you note, the bytecode instrumentation is a bit of a pain, but not only that... it also has quite a big impact on performance!

There has been talk of doing some experiments with Akka, and that's something I'm interested in exploring. But I think, hypothetically, that rewriting parts of the platform in Erlang/OTP would yield huge productivity benefits... gen_fsm offers exactly what we need out of the box. From the little playing around I've done with Erlang, it feels like you can get a small, competent team up to speed fairly quickly.


You might also look at Kotlin coroutines in Vert.X, which we are using and which seem to work just fine.


I have been looking at this for an upcoming project where we need to handle a ton of persistent HTTP clients. Regular Vert.X is "fine" but TBH having all the callbacks sucks. My only reservation about using Kotlin is IDE support. I know it's great in IntelliJ, but licenses are expensive and I don't want to advocate something that ties us to a single IDE. Lots of our guys use Eclipse and VSCode.

I know there are plugins for Kotlin support in other IDEs; have you used them, and if so, are they any good?


IDEA Community Edition has full Kotlin support and is free.


Non-preemptive concurrency doesn't invalidate any argument. Erlang's GC is per user thread; even a primitive GC per user thread will have lower latency than Java's GC.


I want to see a source on this. Golang's GC is often touted as better than Java's, but every real-world benchmark I've seen shows that it sacrifices a lot of throughput for low pause times, essentially by running much more often.

Java's new garbage collectors, ZGC and Shenandoah, have average pause times of 0.3 milliseconds on heaps less than 4GB. I find it unlikely that another language has pause times shorter than that, given the sheer amount of work put into Java GC over the years.


> have average pause times of 0.3 milliseconds on heaps less than 4GB

The answer is that one user thread would have a lot less memory than 4GB; the GC only needs to work on heap sizes of KBs to MBs in most cases.

There is no shared memory (well, mostly).

It's comparing apples and oranges.


Don't forget ZIO (and other Cats Effect based libraries), which is the new kid on the block and has its own take on fibers and concurrent programming.

The biggest problem with Erlang is that hardly anything out there needs this level of concurrency and robustness in a single system - in the new world of microservices and serverless architectures there are other ways to cope with scaling. This is its main selling point, and unfortunately in all other areas Erlang is significantly outdated and refuses to evolve - it evolves even less than the Java language, which is a dinosaur in itself.

Having said that, I think Erlang is a fantastic teaching tool and should be on everyone's bucket list of "things to learn in this life as a software engineer".


> The biggest problem with Erlang is that hardly anything out there needs this level of concurrency and robustness in a single system - in the new world of microservices and serverless architectures there are other ways to cope with scaling.

I wondered about this myself before I started using Elixir. In practice, it turns out that when it's cheap to make things concurrent, more services take advantage of this feature.

Tests and the Elixir compiler are extremely fast because of this, and it makes the whole development experience better.

Because the primitives are so simple, people experiment more, which makes for better software. Nobody would come up with Phoenix LiveView for the Play framework in their spare time, because the Play framework is so overly complicated.


Yes, that's a problem: you can't be the only one responsible for a core part of the stack.

I did an interview at a major streaming company where one critical part was written in Erlang. It had been working great, but the guy who wrote it had left some time before and nobody there knew Erlang, so they would have to rewrite it if an update was needed.


Or someone could take a week to learn Erlang and make whatever change is needed. You won't be an Erlang expert in a week, but it's a pretty small language, so editing existing code isn't that hard. And that existing code can't be that bad, since it's still working.


> These are not classic continuations because they cannot be cancelled, but they're still very useful and true fibers.

What? Coroutines do support cancellation.


Is Java the best out there?


> Programming with concurrency primitives is a difficult task because of the challenges created by its shared memory model.

I never understood this often-repeated point. As a junior/mid-level developer I had the privilege of running self-written .jar files on government-scale systems with more than 50 cores. I used Java thread pools and concurrent data structures to do heavy cross-thread caching.

It was all pretty simple and concurrency & parallelism were never an issue but simply a necessity to make things run fast enough.

Am I a concurrent programming genius? Were the types of problems/challenges I was solving too simple? When is concurrency in Java ever hard+?

+ I know about Java masterpieces like the LMAX Disruptor that are mostly beyond my skill level, but those are low-level, write-once libraries you wouldn't write yourself.


> When is concurrency in Java ever hard?

Potentially-racey stuff:

* Synchronized primitives don't compose. You can safely `synchronized get(...)` and safely `synchronized put(...)`. But their composition put(get(...)+1) isn't synchronized. And it's hard to mentally revisit it at the end of the day: if you have a class with some methods marked synchronized, nothing will tell you whether you've synchronized the right methods. You just have to think it through again and hope you reach the same conclusions as before.
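A small sketch of the non-composition problem: each map call is individually thread-safe, but the read-modify-write sequence is not, and the fix is a single atomic compound operation:

```java
import java.util.concurrent.ConcurrentHashMap;

public class ComposeDemo {
    public static void main(String[] args) throws Exception {
        ConcurrentHashMap<String, Integer> counts = new ConcurrentHashMap<>();
        counts.put("n", 0);

        // BROKEN under concurrency: get() and put() are each thread-safe,
        // but composing them loses updates between the read and the write:
        //   counts.put("n", counts.get("n") + 1);

        // Correct: merge() performs the read-modify-write atomically.
        Runnable task = () -> {
            for (int i = 0; i < 10_000; i++) counts.merge("n", 1, Integer::sum);
        };

        Thread t1 = new Thread(task), t2 = new Thread(task);
        t1.start(); t2.start();
        t1.join(); t2.join();
        System.out.println(counts.get("n")); // 20000, no updates lost
    }
}
```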

Other (non-racey) stuff:

* Threads are heavy, CompletableFutures are light. But CFs lack the functionality of Threads. A CF can't decide to sleep for a while, nor can it be cancelled. (As an aside, BEAM threads are super light).


Java has a large set of higher-level abstractions for concurrency. You don't have to use low-level locks, but you can. (And that's just Java; there's also Scala, Clojure...)


I'm pretty firmly in the "shared memory parallelism is a Good Thing" camp, but the counter argument to your point is that having a larger set of concurrency abstractions is a Bad Thing in that any particular piece of code has to consider all of the different permutations. In a shared-nothing world, there's a lot less to worry about (except occasionally performance).


World-writeable cross-thread memory also has implications for how your GC algorithm is designed.

Erlang, for instance, scopes GC pools per process, so short-lived processes just drop the pool. Also, GC of one worker doesn't stop any others. I can't remember if it even needs to be generational, because the heaps are already sliced per process. It's the closest thing to heap arenas I've seen in a VM-based language.

Or take Lua, which is single-threaded and doesn't require VM safepoints, since everything is done via cooperative coroutines.

Java needs to assume the worst case and as such has to be conservative in some of its approaches.


As written here: https://github.com/l3nz/SlicedBread - "the over 400 rich pages of "Java concurrency in practice" show how hard it is to write and debug a good-mannered multithreaded application in standard Java."


... what are they?


https://docs.oracle.com/javase/8/docs/api/java/util/concurre...

And given that you didn't know that, you really need to study them.

java.util.concurrent is one of the greatest gems of software ever written.


So I describe the shortcomings of CompletableFutures, and you point out that there's a java.util.concurrent package?

Is this method one of its gems? A cancel() which doesn't cancel? https://docs.oracle.com/javase/8/docs/api/java/util/concurre...

> Synchronized primitives don't compose. You can safely `synchronized get(...)` and safely `synchronized put(...)`. But their composition put(get(...)+1) isn't synchronized.

So is there anything in java.util.concurrent that does compose? put(get()) has exactly the same problem up here at the 'high level' (of CountDownLatches and Semaphores) as it does at the 'low level' (of synchronized methods).


Java does not allow one thread to kill another due to its shared memory concurrency model. It actually used to, but this feature was removed because it caused so many deadlocks. The reason is that killing threads won't always release monitors and locks. Lack of the feature is intentional.

You can get a lot better nonblocking support with third party libraries. Like RxJS in JavaScript, RxJava is almost a requirement when doing non-blocking code.

You are right that true green threads would allow thread cancellation one day. Right now, Java can't because it relies on OS threads which aren't safe to cancel. Userspace threads don't have that problem


> Java does not allow one thread to kill another due to its shared memory concurrency model. The reason is that killing threads won't always release monitors and locks.

I can't follow the reasoning here. To refute it, you'd only need to show a language with shared memory and futures with cancel(), right? Aside from that, a second shortcoming isn't a good excuse for the first.

> You can get a lot better nonblocking support with third party libraries.

I don't doubt this. Another example is vavr.io, which gives much better Lists/Optionals/Streams than the standard Java 8 ones. And Joda-Time before that.

> RxJava is almost a requirement when doing non-blocking code.

Why can't CompletableFutures do what RxJava did?

> it relies on OS threads which aren't safe to cancel. Userspace threads don't have that problem

This bit was the most confusing to me. In my shitty understanding of the model, Java Threads are based on OS threads, which is why they're relatively heavy. CFs, on the other hand, are lighter and managed by the Java runtime. Why is it, then, that I can cancel Threads but I cannot cancel CFs? Your explanation would seem to justify the opposite.


Maybe I muddled my words on this. Java supports "requesting" thread cancellation, but it's not safe to forcibly kill OS threads in any language I know of. For the same reason you can't in Java, finalizers won't run. This includes C and C#, among others that use OS threads. https://stackoverflow.com/questions/13285375/how-to-kill-a-r... https://stackoverflow.com/questions/1327102/how-to-kill-a-th...

Notably C lets you do it with the big warning that it can crash everything. C# made the same decision as Java and doesn't allow it.

Java and pretty much all languages using OS threads DO allow you to end threads by nicely asking them to stop. This is not the same as forcing them to stop. In all cases, if a thread is stuck in an infinite loop or a blocking call, it probably won't cancel if you ask it to. It depends on whether the thread is written to handle cancellation properly.
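A small illustration of that cooperative model: interrupt() only sets a flag, and the thread stops because its own loop checks for it.

```java
public class CooperativeCancel {
    public static void main(String[] args) throws Exception {
        // The worker exits only because it polls the interrupt flag itself;
        // a worker that never checks would keep running forever.
        Thread worker = new Thread(() -> {
            while (!Thread.currentThread().isInterrupted()) {
                // ... a unit of work ...
            }
            System.out.println("worker saw the interrupt and exited");
        });
        worker.start();

        worker.interrupt();   // a request, not a forced kill
        worker.join(2000);    // wait for the worker to notice and finish
        System.out.println("worker alive: " + worker.isAlive());
    }
}
```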

This is also a limitation of some languages with fibers, including Erlang. You can't force a thread that's stuck in certain states to stop running in Erlang, even though it uses fibers. Some languages with fibers do allow it, though.


> Maybe I muddled my words on this.

Not at all. Thanks for your patience.

You've laid out really good arguments as to why Java Threads cannot be cancelled, and suggested that userspace threads -- which I (perhaps mistakenly) interpreted as CompletableFutures -- should be able to be cancelled.

But the thing is, I can cancel (request) Java Threads. And I cannot cancel CompletableFutures.

From my original comment:

> But CFs lack the functionality of Threads. A CF can't decide to sleep for a while, nor can it be cancelled.


What conditions do you need to get an Erlang process that you can't kill? A brutal kill, or loading the executing module twice, always worked for me (except for a couple of deadlock bugs my company added to our locally patched BEAM).


What do you mean the cancel method doesn't cancel? It does cancel, with a CancellationException stored in the future and an optional interrupt sent to the thread running the blocking operation.


> optional interrupt sent to the thread running the blocking operation

I'm not 100% sure what you mean by an optional interrupt, but I'm guessing it's the boolean flag on cancel(). Let's look!

    public boolean cancel(boolean mayInterruptIfRunning) {
        boolean cancelled = this.result == null && this.internalComplete(new CompletableFuture.AltResult(new CancellationException()));
        this.postComplete();
        return cancelled || this.isCancelled();
    }
Doesn't look like it's used.


You are right. Even the javadoc says that `CompletableFuture` doesn't support this. Bah! This will force me to read the fine print next time.

mayInterruptIfRunning - this value has no effect in this implementation because interrupts are not used to control processing.


> What do you mean the cancel method doesn't cancel?

Try it.

    @Test
    public void cannotCancelARunningFuture() {

        // Future that just sleeps (threadSleep is a helper that wraps
        // Thread.sleep and swallows InterruptedException)
        CompletableFuture<Void> sleeping =
                CompletableFuture.supplyAsync(() -> {
                    int i = 0;
                    while (true) {
                        System.out.println("zzzz: " + i++);
                        threadSleep(300);
                    }
                });

        // Make sure it started
        threadSleep(1000);

        sleeping.cancel(true);
        System.out.println("Sleep was cancelled");

        threadSleep(1000);
        System.out.println("Haha no it wasn't");
    }
===============================

  zzzz: 0
  zzzz: 1
  zzzz: 2
  zzzz: 3
  Sleep was cancelled
  zzzz: 4
  zzzz: 5
  zzzz: 6
  Haha no it wasn't


Plain Futures from an ExecutorService can be cancelled with an interrupt. CompletableFuture's cancel() never interrupts, so a running task only stops if the thread agrees to die.

You can achieve arbitrary non-blocking delays by using the crufty scheduled thread executor, or do it sanely with RxJava. Really, it's just dangerous to do non-blocking stuff in Java without a wrapper like RxJava. That's not a good thing; I look forward to the day there are real fibers.
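For example, a non-blocking delay can be built by having a scheduler complete the future later instead of sleeping a thread (a sketch; Java 9's CompletableFuture.delayedExecutor packages the same idea):

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class DelayedFuture {
    public static void main(String[] args) throws Exception {
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();

        // No thread sleeps: the scheduler completes the future after the delay.
        CompletableFuture<String> delayed = new CompletableFuture<>();
        timer.schedule(() -> delayed.complete("done"), 200, TimeUnit.MILLISECONDS);

        System.out.println(delayed.get()); // blocking get() is only for the demo
        timer.shutdown();
    }
}
```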



> > A CF can't decide to sleep for a while, nor can it be cancelled.

> How can't a CF be cancelled?

So glad you brought up the docs. CF implements cancel(boolean mayInterruptIfRunning)... which does nothing ;)

More precisely, it will not cancel a running CF. If the CF hasn't started yet, by all means cancel it. But if it's running, that cancel method does nothing.


I started to write a response, but then I remembered a Rich Hickey talk I went to where he lays out the problems with Java-style concurrency.

Clojure Concurrency - Rich Hickey https://www.youtube.com/watch?v=dGVqrGmwOAw

Even though the talk is called "Clojure Concurrency", the first half is about the problems with traditional concurrency that Clojure solves.

One of my favorite talks I've ever attended.


This version has the slides and video side by side: https://www.youtube.com/watch?v=nDAfZK8m5_8 might be easier to follow.


What are his thoughts on Erlang? (I have not finished the talk video yet.)


That it contains many good ideas, but that the lack of shared memory and the poor sequential performance leave a lot to be desired.


>It was all pretty simple and concurrency & parallelism were never an issue.

A lot of developers are not aware of which thread is going to execute their code, or of what that implies (I think it takes practice; at least it did for me), and in my experience it often leads to shared mutable state without proper guards, or deadlock hell from locks created all over the place in the hope of making things safe, or other nightmares.

>I know about Java masterpieces like the LMAX Disruptor that are mostly beyond my skill level

Both the basic idea of the Disruptor and its simplest implementation (single publisher, single subscriber) are pretty simple: just use minimal memory barriers to write and read data cycling through an array, and (busy-)wait whenever you bump into whoever is ahead (the publisher if you're the subscriber, or the subscriber if you're the publisher).

Quoting one of its authors:

« Sometimes we have absolutely no choice and we need to go parallel and use a lot of concurrency. If you do, get people in who are good at it. And actually, I found most of the people who are really good at it, their instinct is they'll do it as an absolute last resort, because they know how complicated it actually gets. There is a Scottish comedian called Billy Connolly [who said]: "people who want to own a gun, or be a politician, should be automatically barred from either of them." And I think it's the same with concurrency: anybody who just wants to do it should not be allowed. » (https://www.infoq.com/presentations/top-10-performance-myths)


>When is concurrency in Java ever hard+?

Take a look at the dated, but still relevant, book by Brian Goetz - Java Concurrency in Practice - many problems are illustrated with code examples.


"Concurrency primitives" here is probably referring to Java's fundamental mutex system as used with `synchronized`, `wait()`, and `notify()`. Java's thread pools and concurrent data structures are built on top of these and as you noted are relatively straightforward to use correctly, as they take care of the actual coordination of threads for you.
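For anyone who hasn't touched that layer: the classic guarded-block pattern those primitives give you looks like this (a toy single-slot handoff, essentially what BlockingQueue coordinates for you):

```java
// Minimal guarded-block handoff using the low-level primitives mentioned above:
// synchronized, wait(), and notifyAll().
public class OneSlot {
    private Object item; // null means "empty"

    public synchronized void put(Object o) throws InterruptedException {
        while (item != null) wait(); // wait until the slot is free
        item = o;
        notifyAll();                 // wake any waiting taker
    }

    public synchronized Object take() throws InterruptedException {
        while (item == null) wait(); // wait until something is present
        Object o = item;
        item = null;
        notifyAll();                 // wake any waiting putter
        return o;
    }
}
```

Even in this tiny example there are classic traps (wait() in a `while` loop rather than an `if`, to handle spurious wakeups; notifyAll() rather than notify(), to avoid waking the wrong waiter), which is why using the prebuilt structures is the right default.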


It's fine if you know what you're doing and are the primary maintainer. As someone who's encountered code maintained over a long period of time with lots of people coming and going, concurrency using locks is quite ugly, especially as people cargo-cult "better performing" solutions that aren't actually better and just add complexity/race conditions. My experience is primarily C/C++ but this is all agnostic to the language.


> this is all agnostic to the language.

Yes and no. Yes, in the sense that pretty equivalent things can be done in different languages. No: I'm in the same group of "geniuses" as GP, and on our current huge C++ platform project I see, for example, highly technical people struggling with and doing the wrong things with concurrency/multithreading that I don't remember seeing even mildly technical people doing on various large Java projects.


> Were the types of problems/challenges I was solving too simple?

Without more information this is the likely scenario, going by my own experience.

BTW, if it turns out you are a concurrent programming genius please write about it, eh? (Like a blog or book or something.)


It's simply FUD and a strawman, designed to drum up support for the alternative. I mean, the actor model is fine in itself, but people like to set up a strawman problem to talk about its perceived benefit.


You use concurrent data structures but didn't write them yourself, correct? I would say that isn't programming using concurrency primitives.


BEAM is amazing and IMHO there's one very sweet spot ready for optimization: math functions. I know I can escape out to C/Rust/etc. yet the majority of what I do is simple float math such as stddev and vector normalization.

The article states a benchmark showing a 5000% speedup on floats when switching from BEAM to the JVM. I would like to offer $100 as a gift incentive to anyone here who wants to work on optimizing BEAM math.


People say this isn't what BEAM is intended for and that it excels elsewhere, which, yes, I'm sure it does.

But why can't it be both? Why can't you do everything that BEAM does... and then also have an optimising JIT for the straight line maths code? Couldn't you leave all the other parts of the system the same and keep all the existing benefits? Improving one doesn't damage the other does it?


Co-author here.

The problem with number crunching or maths is that it is very difficult to cut the whole computation into smaller units and pre-emptively schedule it. If it is possible for a specific use case, then it is moderately easy to replace that part with NIFs. For effective maths you also need to convert the internal tagged number representation to the machine-native one, which is also expensive. Solving these two things in the generic case is very difficult while preserving all the good parts.


Am I correct in saying that functions written in C do not get pre-empted like Erlang functions? If that is true, you could write computationally intense code in C within a BEAM app. But I think this misses the point. Pre-emption is really cool for concurrency abstractions, and the trade off is being less good at single threaded computation. Trying to turn Erlang into something like a Bitcoin miner is kind of like combining a bunch of Roombas to make a Shop-Vac.


> Am I correct in saying that functions written in C do not get pre-empted like Erlang functions? If that is true, you could write computationally intense code in C within a BEAM app.

They cannot be pre-empted, but they must also return quickly, or risk causing lots of problems (see https://erlang.org/doc/man/erl_nif.html for slightly more detail on what this means). As such you can't just write some big function in C to do number crunching.

The NIF documentation mentions some ways around the problem, but all of them take some effort, or have tradeoffs of some sort. I was really excited when “dirty” NIFs were introduced, which can tell the BEAM that they'll run for a while, thus appearing to allow for long-running NIFs with no extra work other than setting a flag. However, it turns out that the BEAM just spins up N threads for scheduling dirty NIFs, and if you have too many such NIFs, too bad, some won't get scheduled till the others have completed. In retrospect it should have been obvious that there couldn't be a silver bullet for this problem, because it really isn't easy.

Erlang may well be my favorite language, but as you imply, it's just not going to be the right approach for everything: in my experience, it's absolutely fantastic in its niche but that niche is quite small. I think that's fine, though. For me, where Erlang does make sense, its concurrency approach makes it unbeatable, and I'll live with the performance tradeoffs. It turns out that basically all the NIFs I've had to write were just to gain access to functionality that Erlang doesn't expose (e.g. network namespaces on Linux, which are supported now, but weren't when I needed them).


You are correct.

It's actually worse than that; as I recall, the internal numerical representations of numbers do not necessarily map to the CPU's (for instance, there is no byte sizing; you have integers and floats, and they can be arbitrarily large). The work to perform that conversion, do the math, and convert back, would almost assuredly make it so that a single calculation takes more time than just doing it within the BEAM. The only way to save time would be to convert once, do a bunch of math, and convert back. Which would, yes, prevent pre-emption, AND require indication of intent (so brand new language constructs, minimally).

That's a lot to expect of the user, and a lot to implement in the language...all to avoid just writing a NIF.


Well pre-emption only happens at function call or return, or when a NIF calls the special function to allow for it; if all the math was in a single function, with no interleaved function calls (possibly after inlining?), unbox, math away, and rebox could work.

There wouldn't need to be an indication of intent, other than writing the math separate from any function calls. I don't know how much code fits this pattern, but it's an idea that could be explored. I think that's part of what HiPE is supposed to do, but I haven't looked into HiPE in a long time.


> Which would, yes, prevent pre-emption, AND require indication of intent

I don't understand why. If you have a maths-intensive operation like matrix-multiplication using untagged maths, why does that prevent pre-emption? Why does it require indication of intent?

And there's already a basically zero-overhead way to implement pre-emption - safepoints - that's what the JVM does when it wants to pre-empt in user-space.


Uh...no. A safepoint is when all threads in the JVM have blocked (which is purely cooperative, and happens during thread transitions), and, importantly, when OS threads running native code still are running, but can't return/respond to the JVM. The JVM doesn't pause those threads.

Which is the point. You can't preempt crunching those numbers if it's not within the BEAM. Which might be fine. Or it might not. Making it invisible to the user is not really a good idea when going for soft realtime properties; at least with a NIF and dirty scheduler you're being explicit about it.


> A safepoint is when all threads in the JVM have blocked

And having blocked them, you can then pre-empt them.

> You can't preempt crunching those numbers if it's not within the BEAM.

I still don't see why sorry. If you had a JIT and you compiled maths intensive code to native code, it could run efficiently in BEAM and still be pre-emptible by having a safepoint in the generated code.

How do you think Java is doing optimised numerical code that is pre-emptible from user-space? Safepoints! BEAM could do the same thing.


You, uh, realize that's not actually preemptive right? Like, having to thread in checks, that are cooperative, is by definition not preemptive?


Yes I think that's a reasonable definition of pre-emption because they don't interrupt the numerical pipeline. They cause zero data or control dependencies. But even if you don't think it fits the definition, what do you think the practical difference is when you argue about this terminology?

What did we want to achieve? We wanted to be able to run a tight loop of highly optimised, untagged numerical code but still be able interrupt it to switch threads on demand from user-space if needed.

Safepoints let us do that.

What else did we need that this doesn't cover?
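To illustrate what's being argued about, here's a toy version of a safepoint poll in plain Java, where the poll is an explicit flag check standing in for the cheap check a JIT would place on loop back-edges (this is an analogy, not how HotSpot actually implements safepoints, which use a protected-page read):

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class SafepointLoop {
    static final AtomicBoolean yieldRequested = new AtomicBoolean(false);

    // Tight, unboxed numeric loop that can still be asked to yield from another
    // thread: the "safepoint poll" runs once every 65536 iterations.
    static double sum(double[] xs) {
        double s = 0;
        for (int i = 0; i < xs.length; i++) {
            s += xs[i];
            if ((i & 0xFFFF) == 0 && yieldRequested.get()) {
                Thread.yield(); // a real scheduler would park/switch here
            }
        }
        return s;
    }

    public static void main(String[] args) {
        double[] xs = new double[1_000_000];
        java.util.Arrays.fill(xs, 1.0);
        System.out.println(sum(xs)); // 1000000.0
    }
}
```

Whether one calls this "pre-emptive" is the terminology dispute above; in the common case the check predicts perfectly and costs close to nothing, which is the practical point.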


OTP and BEAM are maintained by a very small team compared to other language projects. Working on math performance could likely reduce the amount of time to be spent on other things.

Perhaps there are some easy wins, but JIT is not an easy thing. Depending on your needs, pushing math onto a port or a NIF is probably a quicker win than trying to make it fast in Erlang. However, I wonder if the static single assignment (SSA) optimizer would offer a path towards recognizing 'straight line math code' and potentially running things much faster. But there's still an issue of potential mismatch between the very general number format with automatic bignum promotion and whatever the underlying machine provides.


> But why can't it be both? Why can't you do everything that BEAM does... and then also have an optimising JIT for the straight line maths code?

I love this attitude. BEAM is something novel and special, and I think it's important to think of how to incrementally address its current shortcomings instead of throwing our hands up. I find GHC is another place where incrementalism on top of novelty is resulting in a lot of people's wishlists to be fulfilled.


I think this is what Akka and Quasar give you. There might be some other trade-offs at play, though, I can't say.


Make this 100k and we can maybe begin to look at a research project for a prototype that no one will use. Make it 1M and we can maybe make something that works for you.

Make it 10M and we can make it work for one version of OTP in a few years.

Make it 100M, recurring for 25 years, and we can make it part of the ecosystem. This problem is hard and a lot of people have tried over the years. It always breaks down by not being able to deliver, or no one wanting to maintain it.


One option for fast math in a BEAM setting might be to use a NIF based vector library like Matrex[0]

[0] https://github.com/versilov/matrex


Yes, I mean if the ceiling for that (or all maths) benchmark is 50x, getting even 25x is good enough. At least you know you are not giving up a 50x difference just because you can't be bothered to escape to C/Rust/etc.


Can someone steer me to some good benchmarks, discussions of perf characteristics and gotchas of the BEAM? My search-fu is weak and I'm not finding the sort of content I'm after.

I'm trying to learn Elixir, and being a systems thinker, before I (can) get too comfortable I'm gonna want to dive into origin stories to build up my holistic map of why things are the way they are, and what can and can't be done. Understanding bottlenecks in the BEAM seems like it's gonna have to be part of that (the way I studied JVM tech documentation when I did perf and architecture work in Java)


I think you're going to have a hard time finding what you're after. Erlang in Anger [1] might be the closest, it will at least show some of the gotchas you run into.

From my experience, the gotchas tend to hit with emergent behavior, which is hard to benchmark, and may be repeatable in production, but is hard to model in a testing framework.

I'm not sure how much impact off-heap messaging has had, but the basic gotcha is that as a process gets bigger, it tends to run slower (because GC over more memory takes longer), and to develop a larger message queue, which makes it slower still. You need to have backpressure in your system, or small blips in processing can blow up into huge message queues that can't be processed. Monitoring for overall queue size and maximum queue size is an important health indicator.

The other basic gotcha is that Erlang/OTP tends to default to 'unlimited' resource limits and 'infinity' time outs. You often want to have limits, and timeouts, but a general system doesn't know what you want. Sometimes, the unlimited settings result in terrible system behavior if you hit larger numbers than anyone else tested, but if you hit this, it's usually easy to fix.

A good thing about OTP is that they've written as much as possible of the environment in Erlang itself, so it's easier to change things when needed than a system where most of the provided apis are implemented in C.

[1] https://erlang-in-anger.com/


In general I would say there's no good single book or resource that describes everything comprehensively. There's a lot of resources, though, but mostly scattered in various places.

The BEAM Book [1] is a good, though unfinished resource talking in general about the implementation - the memory model and the interpreter.

If you're interested in some very low-level details of the runtime, the internal documentation [2] also holds a lot of interesting details.

There are also some additional details on internals at Spawned Shelter [3].

[1]: https://blog.stenmans.org/theBeamBook/ [2]: https://github.com/erlang/otp/tree/master/erts/emulator/inte... [3]: http://spawnedshelter.com/#erlang-design-choices-and-beam-in...


I want to know the constraints to, and evolution of, sequential computation on the BEAM. I want to form opinions on how that landscape is likely to change within the lifespan of a project I'm affiliated with.

I get mostly false positives trying to find those sorts of discussions or metrics.


I am not sure I understand the problem you are trying to find information about. Maybe explain it a little bit more? Or go ask in the Elixir forum; people can try to be your librarians there


To an outsider, it seems like the BEAM documentation [and particularly, videos] go out of their way to discuss how process management and IPC communication works and how certain classes of data are managed. They talk about what makes the BEAM the BEAM, to the exclusion of all other concerns.

Prior to finding this document (http://www.cs-lab.org/historical_beam_instruction_set.html) I had no idea whether you could actually do computation on the BEAM. I was starting to wonder if they had misappropriated the term VM, and some sort of inline assembly trick was being used for everything but control flow and IPC.

Interpreted code has very, very real computational constraints and you can't assume people will know this, even now. Especially if your system is noteworthy for how it is not like other systems. Where does it stop being 'weird' and start being conventional? The boundaries describe both sides of a distinction. Even if you're only interested in the exotic part, leave some breadcrumbs for others.


Hmm, are you maybe after this page?

http://erlang.org/doc/efficiency_guide/advanced.html


> [and particularly, videos] go out of their way to discuss how process management and IPC communication works

After watching a bunch of videos on BEAM/erlang/elixir I came to the conclusion that it isn't a platform for computation, it's a platform for communication. The best video (by far) was The Soul of Erlang and Elixir • Saša Jurić https://www.youtube.com/watch?v=JvBT4XBdoUE

Two shallow benchmarks:

https://benchmarksgame-team.pages.debian.net/benchmarksgame/...

https://www.techempower.com/benchmarks/ (phoenix is at 175)


I still don't understand what you are talking about, but I suppose this is more due to my own limitations in missing a lot of the background you come from. Sorry that I am not that useful :(


The JVM supports hot code loading, although this article seems to imply only BEAM supports it.


The article mentions "The JVM allows you to change the code while the program is running."

However, that's not quite the same thing. The JVM allows you to change instructions, but not -data-. That is, if between versions you change what data a class contains, there is no way to migrate it in the running instance. The JVM either has one version of the bytecode loaded or the other; it has no concept of transitioning between them.

The BEAM has a mechanism to do that. It can have both loaded. And you can write transformation functions to allow the internal process state to transform from one to the other.

Per the article, "Hot code loading means that the application logic can be updated by changing the runnable code in the system whilst retaining the internal process state" - emphasis added. That's the key bit for maintaining uptime during an upgrade. Honestly, I don't think it's used that often, but it's there.


You can do that on the JVM too, just use separate classloaders and let the new objects reflect over the old to transition data across. It's not widely done though, for sure.
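A rough sketch of the classloader trick (here loading this same class twice just to demonstrate identity; a real setup would point the loaders at old and new jars and reflect state across, which is the part left to you):

```java
import java.net.URL;
import java.net.URLClassLoader;

// Two classloaders with no parent delegation for application classes, loading
// the "same" class twice. The JVM keys class identity on (name, loader), so
// both versions coexist in one JVM.
public class TwoLoaders {
    public static void main(String[] args) throws Exception {
        URL here = TwoLoaders.class.getProtectionDomain().getCodeSource().getLocation();
        // parent = null so each loader defines its own copy instead of delegating
        try (URLClassLoader v1 = new URLClassLoader(new URL[]{here}, null);
             URLClassLoader v2 = new URLClassLoader(new URL[]{here}, null)) {
            Class<?> a = v1.loadClass("TwoLoaders");
            Class<?> b = v2.loadClass("TwoLoaders");
            System.out.println(a.getName().equals(b.getName())); // same name
            System.out.println(a == b);                          // different classes
        }
    }
}
```

The catch, and why this is "not widely done": an instance of v1's class is not assignable to v2's type of the same name, so all cross-version data transfer has to go through reflection or serialization rather than ordinary field access.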


How do you move data? That Foo object that previously referenced both a Bar and a Baz, but you refactored it so now the Bar, not the Foo, has the reference to the Baz? Or where you changed the type of Baz from Gleep to Glorp?

Erlang, due to how actors encapsulate state, and dynamic typing, allows you to do those things pretty trivially.


There are frameworks that support this kind of evolution, e.g. Kryo can do graph->graph transformations without intermediate serialisation I believe. Or you could write it by hand. If the static typing gets in the way you can always just use a scripting language to do it, many run on the JVM.


Do you have a reference on that?

I want to make sure you're talking about the same thing.


Clojure does hot code reloading as a built in. You essentially send code to a running system and you change it. It’s enabled by a dynamic class loader. I wouldn’t say it’s common outside of Clojure though, the whole language and ecosystem is built around this concept.

To be clear: JVM enables the feature, so “technically” JVM allows hot code reload. Not sure how useful this is in practice for non-Clojure JVM users.


Runtime code generation is a common optimization in java frameworks. End-users may never see it but the majority of popular frameworks use it under the covers.

Debuggers also use the functionality to allow live code editing and expression evaluation when paused on a breakpoint


Eclipse, IntelliJ and Netbeans all support it out of the box for Java code.


ClassLoader [1] in the small, and OSGi [2] in the large, are good starting points for comparison.

[1] https://docs.oracle.com/javase/7/docs/api/java/lang/ClassLoa...

[2] https://en.wikipedia.org/wiki/OSGi



Generating code at runtime is rather common in Java frameworks. Most really popular frameworks use it: Spring for AOP, Hibernate for "bytecode enhancement". There are a number of libraries, like CGLIB and ByteBuddy, designed explicitly to make this easy.

There are limits to how much you can change in existing loaded class code, but if you are just loading up new dynamically generated code you can do pretty much anything.

It's a big reason why some of these Java frameworks are so fast. They can generate highly optimized code on the fly, load it, and have it running alongside the existing app code within a few hundred milliseconds. And the Java JIT will optimize it as if the code had been there the whole time.

This enables performance optimizations that would be impossible in AOT-compiled languages like Go, C, C++, Rust, etc.


java.lang.invoke.MutableCallSite supports linking one particular MethodHandle at a time which the VM will optimize for. Then you can use MutableCallSite::setTarget with another MethodHandle which will cause recompilation of affected paths.
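A small sketch of that mechanism (the demo class and the oldImpl/newImpl method names are mine):

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.MutableCallSite;

public class HotSwapDemo {
    static String oldImpl() { return "v1"; }
    static String newImpl() { return "v2"; }

    public static void main(String[] args) throws Throwable {
        MethodHandles.Lookup lookup = MethodHandles.lookup();
        MethodHandle v1 = lookup.findStatic(HotSwapDemo.class, "oldImpl",
                MethodType.methodType(String.class));
        MethodHandle v2 = lookup.findStatic(HotSwapDemo.class, "newImpl",
                MethodType.methodType(String.class));

        MutableCallSite site = new MutableCallSite(v1);
        MethodHandle invoker = site.dynamicInvoker();

        System.out.println(invoker.invoke()); // v1
        site.setTarget(v2);                   // swap the implementation
        MutableCallSite.syncAll(new MutableCallSite[]{ site }); // force other threads to notice
        System.out.println(invoker.invoke()); // v2
    }
}
```

Between setTarget calls the JVM is free to inline the current target through the call site, which is what makes this pattern fast as well as dynamic.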


They mentioned this for the BEAM virtual machine but not for the JVM; the JVM can actually also do hot code loading, as long as call-site signatures are not changed or added. In some cases you can make major changes to the current stack and restart the frame, which is a pretty handy feature for developers. Some commercial extensions to the JVM get around all of these limitations.


Of interest might also be Erjang, Kresten Krab's port of BEAM to JVM. https://github.com/trifork/erjang

As I understand it, it is feature complete and actually runs Erlang pretty well. Could be interesting to see some benchmark testing.


Apparently this is Erlang's BEAM, not Apache Beam (https://beam.apache.org/)?


Yes. It's about Erlang's BEAM VM.


Tl;dr: the point seems to be "shared nothing makes concurrency and GC easy". Congrats. But lots of big fast systems also use shared memory, so just relax, STFU, and understand that tradeoffs exist.



