Beam VM Wisdoms (2019)

neurostimulant · on April 21, 2021

This is quite a deep rabbit hole! Recently I decided to try learning Erlang/Elixir, so reading about the BEAM VM internals is fascinating even though I barely know the language at all.

macintux · on April 21, 2021

Welcome to the BEAM world. It’s a very interesting place, definitely not just another variation on ALGOL.

This list I maintain hasn’t been tested for broken links for a while, but it should still be useful: https://gist.github.com/macintux/6349828

hinkley · on April 21, 2021

I thought modern processors had enough branch prediction that threaded interpreters were no longer faster than a normal switch loop.

Wonder if that also applies to aarch64.

rkangel · on April 21, 2021

The BEAM runs on a variety of targets from your superscalar out-of-order targets like x64 down to much more embedded things. This may affect the performance tradeoffs.

Joker_vD · on April 21, 2021

They do, but a switch loop has only one indirect branch, which can go to $INSTRUCTION_COUNT different places. With direct threading there are $INSTRUCTION_COUNT indirect branches, one at the end of each handler, and different branches have separate histories.
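To make the difference concrete, here is a minimal sketch of both dispatch styles for a toy stack machine. Everything in it is made up for illustration: the opcodes (`OP_PUSH`, `OP_ADD`, `OP_MUL`, `OP_HALT`) and the functions `run_switch` and `run_threaded` are hypothetical and have nothing to do with BEAM's real instruction set or loop. The threaded version relies on the "labels as values" extension (`&&label`, `goto *p`), so it assumes GCC or Clang. It also looks handlers up in a table indexed by opcode, which is usually called token threading; direct threading proper would store the label addresses in the code stream itself, but the property discussed here (one indirect branch per handler) is the same.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy bytecode, invented for this sketch (not BEAM's opcodes). */
enum { OP_PUSH, OP_ADD, OP_MUL, OP_HALT, OP_COUNT };

#define STACK_MAX 16

/* Switch loop: every instruction goes through the single indirect branch
   that the compiler typically generates for the switch. */
long run_switch(const long *code)
{
    long stack[STACK_MAX];
    size_t sp = 0;

    for (;;) {
        switch (*code++) {
        case OP_PUSH:
            assert(sp < STACK_MAX);
            stack[sp++] = *code++;      /* operand follows the opcode */
            break;
        case OP_ADD:
            assert(sp >= 2);
            sp--;
            stack[sp - 1] += stack[sp];
            break;
        case OP_MUL:
            assert(sp >= 2);
            sp--;
            stack[sp - 1] *= stack[sp];
            break;
        case OP_HALT:
            assert(sp >= 1);
            return stack[sp - 1];
        default:
            assert(!"unknown opcode");
            return 0;
        }
    }
}

/* Threaded loop: each handler ends with its own indirect jump, so the
   predictor can keep a separate history for each of those jumps.
   Opcodes are trusted here; there is no bounds check on the table index. */
long run_threaded(const long *code)
{
    static void *const handlers[OP_COUNT] = {
        [OP_PUSH] = &&do_push,
        [OP_ADD]  = &&do_add,
        [OP_MUL]  = &&do_mul,
        [OP_HALT] = &&do_halt,
    };
    long stack[STACK_MAX];
    size_t sp = 0;

#define DISPATCH() goto *handlers[*code++]

    DISPATCH();

do_push:
    assert(sp < STACK_MAX);
    stack[sp++] = *code++;
    DISPATCH();
do_add:
    assert(sp >= 2);
    sp--;
    stack[sp - 1] += stack[sp];
    DISPATCH();
do_mul:
    assert(sp >= 2);
    sp--;
    stack[sp - 1] *= stack[sp];
    DISPATCH();
do_halt:
    assert(sp >= 1);
    return stack[sp - 1];

#undef DISPATCH
}
```

Feeding both functions the program `PUSH 2, PUSH 3, ADD, PUSH 4, MUL, HALT` should give the same result, (2 + 3) * 4. Whether the threaded form is actually faster on a given CPU and compiler is exactly the open question in this thread, so treat this as an illustration of the structure, not a benchmark.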

cygx · on April 21, 2021

Supposedly, some relevant performance characteristics changed after Spectre mitigations got implemented. I never got around to looking into this, and have no idea how things stand right now...

jeff-davis · on April 21, 2021

Wouldn't the branch predictor be overwhelmed at some point?

uh_uh · on April 21, 2021

What I would really like to see is a statically typed language on top of the Erlang VM (Dialyzer was not amazing last time I checked it, which was admittedly years ago).

ricketycricket · on April 21, 2021

Gleam exists: https://gleam.run/

uh_uh · on April 21, 2021

I'm looking at this now, thanks! Should have Googled before typing.

Joker_vD · on April 22, 2021

Dialyzer is such a weird beast, it's overly optimistic IMO. We have code that almost literally is

    -record(outside_error, {
        reason :: term()
    }).

    -record(inside_error, {
        reason :: binary()
    }).

    -spec to_inside_error(#outside_error{}) -> #inside_error{}.
    to_inside_error(#outside_error{reason = Reason}) ->
        #inside_error{reason = Reason}.

and it typechecks just fine. I mean, yeah, we actually always used to have binaries in #outside_error—until now. Now we've started to sometimes pass a map in it. And so, somewhere down the line calls to unicode:binary_to_characters/1 suddenly start crashing because a map is not a binary, duh, but! it all still typechecks. Amazing.

ff_ · on April 21, 2021

Are you aware of PureScript on Erlang?

https://github.com/purerl/purerl

uh_uh · on April 21, 2021

I wasn't, thanks!

ritchiey · on April 22, 2021

You can also compile OCaml/ReasonML to Erlang.

https://caramel.run/

foobarbecue · on April 21, 2021

There sure has been some definition creep of ELI5...

klibertp · on April 21, 2021

What 5 y.o. wouldn't understand an explanation as simple as this:

> This way it is easy to jump to a location in C code which handles next opcode. Just read a void* pointer and do a goto *p. This feature is an extension to C and C++ compilers. This type of VM loop is called direct-threaded dispatch virtual machine loop.

...I guess the author works with quite smart 5-year-olds, right...

skneko · on April 21, 2021

I mean ELI5 is clearly hyperbolic, but explaining dispatch like "Just read a void* pointer and do a goto *p" is definitely 1st year of CS level. This wouldn't be valid for teaching actual VM implementation fundamentals.

Some may wish courses about compilers & VMs would be as easily (and naively) explained, but this style misses a lot of critical and complex details. Nobody likes complexity, but if it exists, it exists.

whimsicalism · on April 21, 2021

Most "1st year of CS" people do not know C or understand how pointers work, let alone what an 'opcode' is.

jhgb · on April 21, 2021

C is not a mandatory subject in the first year of CS where you live? At least it used to be where I live, pretty universally (although at that time our universities were in the middle of a Pascal-to-C transition, so to speak).

whimsicalism · on April 21, 2021

The United States? No, there are no mandatory subjects for majors.

My CS curriculum didn't have any "learn this language" classes, you were just expected to know the language that the class was taught in, with the exception of one of the classes that spent some time reviewing OCaml concepts.

Mostly it was C/C++ and Python for the scientific/numeric classes.

wongarsu · on April 21, 2021

We started with a semester of ML (to make sure nobody has prior experience), followed by a semester with some quick assembly, some quick C, and a lot of Java. After that you are just expected to know whatever language the course requires.

Tomte · on April 21, 2021

I had Ada95 in my introductory classes, and today my alma mater uses Java, I think.

I would estimate many more programs use Java than C.

jhgb · on April 21, 2021

We still have C/C++ as a first year subject where I studied, with Java being a second year subject.

stephenhuey · on April 21, 2021

My alma mater had zero language courses. We were expected to learn enough of whatever languages were necessary for concepts covered in the course. Intro courses used Scheme and Java. C ended up being used in later courses after the intro ones, e.g. operating systems.

_y5hn · on April 21, 2021

I coded C++, virtual dispatch and knew what void *ptr can be used for at 10. With the right incentives children can learn anything, not that it helps in a world of domination.

atleta · on April 21, 2021

My understanding is that it has always been meant to be taken figuratively. Obviously, you won't explain this to (almost) any 5 year old. Mostly, because they will miss all the background. (Hey, what's a program at all, not to mention processors or virtual machines having virtual processors. BTW, what the heck is virtual?) Which you could explain, but halfway through you'd lose their interest because their attention span is only so long.

Other than that, I found it pretty approachable. The basic knowledge it assumes is that you know what processors (and registers) and pointers are, and what a goto instruction does. It doesn't seem like a wild expectation. (Or something that you couldn't google in a few minutes if you already have some programming knowledge. And if you don't, why would you want to start with understanding how a VM works in the first place?)

yumraj · on April 21, 2021

I think 5 here refers to 5 years of experience with Erlang, Elixir or any other BEAM language. :)

Severian · on April 21, 2021

I guess the first and foremost for an ELI5 is WTF is BEAM VM?

The "BEAM VM ELI5" page doesn't explain what it is specifically, or that it's not a hypervisor VM but a bytecode interpreter for Erlang. When I see 'VM' I think first of a hypervisor virtual machine.

Maybe I'm a bit of a curmudgeon, but I'd like to think an ELI5 would at least bring in the basics.

http://beam-wisdoms.clau.se/en/latest/eli5-vm.html

kvakvs · on April 21, 2021

This is not an "introducing BEAM VM" type of website. It's more of a reference for further reading for those who have already tried Erlang or Elixir or LFE or another language running on it. So they should come with at least an initial knowledge of what this is.

Source: I created this website.

TechBro8615 · on April 21, 2021

I've never used Erlang, but have read up on it a few times over the years. I think a large reason why it's not more popular is that the concepts are so abstract and hard to grok, partially because -- as you mention -- there is a lot of overloading of terms from Unix land. e.g. a VM isn't a VM by the definition used in most colloquial contexts [0], a process isn't a Unix process, etc. And yet despite all this, you can run it on Unix systems. This is confusing.

There isn't really a solution to it, since it's not like Erlang can start from scratch and rename everything, but I really think this kind of cognitive overhead discourages people from learning new languages. The same challenges exist in languages like Haskell.

That said -- "devs are too lazy to read" is no reason to stop innovating. Erlang seems quite cool and I hope I get an excuse to use it in production one day.

[0] Yes, I know that Beam satisfies all the technical requirements to be a VM (https://en.wikipedia.org/wiki/Comparison_of_application_virt...) My point is more that, colloquially, people usually refer to a VM in the sense of a guest operating within a Hypervisor.

monocasa · on April 21, 2021

> My point is more that, colloquially, people usually refer to a VM in the sense of a guest operating within a Hypervisor.

Python, Ruby, Java, and JavaScript want a word with you. Bytecode VMs being called just 'VMs' colloquially is extremely common.

TechBro8615 · on April 21, 2021

Yeah, I mean, ultimately my argument is that devs are too lazy to understand documentation, but of course I don't consider myself one of those, so I can't defend the claim that my biased and imaginary sample of devs understands VMs in the same way as your sample.

That said, I will stand by the argument that most people (myself included) are too lazy to understand documentation.

tasogare · on April 21, 2021

> When I see 'VM' I think first of a hypervisor virtual machine.

VM as a bytecode interpreter is a very well-known meaning of the term thanks to the JVM.

neurostimulant · on April 21, 2021

I guess most people who use Erlang (or apps made with Erlang) know what BEAM is, just like people who use Java or Java apps know what the JVM is, so the author doesn't feel the need to explain it anymore.

bmitc · on April 21, 2021

I think the ELI5 is being used rather loosely and figuratively and in spirit here, which should be clear.

> When I see ‘VM’ I think first of a hypervisor virtual machine.

That’s kind of a strange interpretation to me. In the context of programming languages, VM is quite clear, à la BEAM VM, CLR, JVM, LLVM, etc.

zorgmonkey · on April 21, 2021

I think the first sentence of the page explains pretty clearly that this is an ELI5 of BEAM VM internals, not of what the BEAM VM is.

> This is the collection of easy to read (ELI5) articles as well as in-depth knowledge such as VM internals, memory layout, opcodes etc.