WebAssembly architecture for Go

pcwalton · on March 3, 2018

The main benefit of using the Web Assembly GC is not improved performance within the language, but rather having a single GC that handles both native DOM objects and language objects. Having two GCs interoperate is a big pain, and it's very difficult to pull off without leaks or performance problems. This is especially true for DOM objects, which have no user-visible finalizers. (The closest thing is WeakMaps, but I don't believe these are enough to implement proper cross-language GC, because you can't query whether an potentially-dead object is actually dead.)

Web Assembly GC is not a performance optimization; it's necessary for correctness.

sseth · on March 3, 2018

In Chrome today, as an example, the C++ based DOM and V8 (Javascript ) objects are using different GCs. Obviously it is a pain point, but the situation of having different GCs for DOM and language objects is not new with Web Assembly.

pcwalton · on March 3, 2018

I mean, as of Oilpan the V8 garbage collector marks C++ objects by calling trace hooks, as I understand things. That isn't an option for Web Assembly, which doesn't have access to the V8 garbage collector in that way (except via the wasm GC proposal; in fact, that's fundamentally what the wasm GC proposal is).

emidoots · on March 3, 2018

It's also worth mentioning WASM has no GC today.

pjmlp · on March 3, 2018

My amd64 and ARM CPUs also don't have it.

On languages where the algorithm is an implementation detail, one can make use of reference counting with a cycle collector, which are just a few hundred lines.

Implementing one is pretty simple, making it perform well is another matter.

hajile · on March 3, 2018

Unlike your real processor though, you can't directly access memory or ensure its absolute location. You can make an advanced GC that is slow or a basic one that is not.

Sawamara · on March 3, 2018

Which is good. You can always implement GC inside it, but personally, for performance-intensive computing, I do not want GC inside WASM unless eplicitly enabled.

mutagen · on March 3, 2018

What kind of WASM file sizes can we expect? I know some work has gone into shrinking Go executables, especially in 1.7, but will we be able to produce something like the 15KB Rust Hello World WASMs? Go has a fantastic stdlib but it hasn’t prioritized web optimized file sizes (yet).

I’m looking forward to the results of this work!

steveklabnik · on March 3, 2018

A "hello world" in Rust is ~100 bytes; the more stuff you use, the bigger it gets, as less of the stdlib can be removed. The biggie is an allocator, which adds some size, but you can use things like https://github.com/fitzgen/wee_alloc and it's less than a kilobyte...

codedokode · on March 3, 2018

In Google's Doubleclick for Publishers UI they use Dart apps. The page loads several "apps" compiled to JS, each of which weights more than 2 Mb (even the "changelog" app). Page loads very slowly. So I would expect something similar for Go.

camdenlock · on March 3, 2018

This is huge. Looking forward to see what sort of front-end toolkits will pop up for this new platform (Go for the client-side web).

zenhack · on March 3, 2018

Worth noting, gopherjs has been around forever, and is mature:

https://github.com/gopherjs/gopherjs

...and already has a reasonable js interop story, whereas my understanding is that calling the dom api from wasm is not the simplest thing.

johnhenry · on March 2, 2018

I'm trying to learn more about this topic and I'm curious if anyone could clarify for me -- The reason Go would need a specific architecture for WebAssembly is because Go supports features, like garbage collection, that WebAssembly does not.

Is that right? Close? An oversimplification. Way off?

opencl · on March 3, 2018

Garbage collection isn't really a problem because no CPU architecture supports garbage collection- the compiler is generating code to implement the GC regardless. The issue is that WASM doesn't really resemble real CPUs at all. Basically all modern CPUs are register machines, but WASM is a stack machine.

Say you have the code C=A+B in your program.

The generated assembly for a real machine looks something like this: (R* denote registers and A-C are memory addresses) load A->R1 load B->R2 add R1+R2->R3 store R3->C

WASM looks more like this: push A push B add pop C

This alone makes the approach to code generation and optimization quite different.

WASM does lack some features that you would find in real CPUs, most notably the ability for execution to jump to arbitrary memory locations. They had to work around this to maintain the behavior of goroutines. Functions also live in their own address spaces which necessitates a change to how the program counter works.

hobofan · on March 2, 2018

Kind of.

It's more that Go _requires_ a runtime to support certain features which require Garbage Collection. The requirement of a runtime in turn dictates a different architecture to run on WebAssembly.

fwip · on March 2, 2018

My understanding is that they're treating Web Assembly as just another architecture to compile for - rather than spitting out x86_64 assembly, or ARM assembly, they're emitting Web Assembly.

nijaru · on March 3, 2018

As others have said, Web assembly is an architecture/target for C, C++, rust, etc. that you compile for, and it will be the same for go.

The link does talk about GC support. It looks like wasm will have one, but go’s will most likely perform better since it is tailored to go. (edit for clarification)

I assume a lot of the planning for this was in the GitHub and mailing list discussions dedicated to go’s wasm support. I’m on mobile but I can link those later.

swsieber · on March 3, 2018

I was under the impression that WASM does not have one.

steveklabnik · on March 3, 2018

Wasm doesn't have one yet, but it's a future extension to to spec.

You can, of course, compile your GC in today, and it will work. Wasm's integrated support will mean re-using the existing GC, which means smaller code size.

Side note: most people thought this would be a precursor to DOM support, but the new "host bindings" proposal opens up DOM support without needing GC support, and many people believe that now the former will land before the latter.

k__ · on March 3, 2018

Are GC implementations transparent for the languages?

Can Go simply run with the WASM GC or does it need some specific type of GC?

steveklabnik · on March 3, 2018

"the WASM GC" only exists as a proposal, and proposals can change before they land. I don't know enough about it to say, and even if I did, it might be invalid by the time things are actually accepted.

See https://github.com/WebAssembly/gc for the current proposal.

k__ · on March 4, 2018

Sure.

But one day, when there are implementations, can every language run with the WASM GC?

pjmlp · on March 3, 2018

No if you care about performance.

To go the extra mile in GC performance, the compiler and GC implementation have to collaborate, taking advantage of language specific features.

dragonwriter · on March 3, 2018

IIRC, WASM doesn't have one and, more importantly, has an architecture which makes implementing one on top of WASM efficiently difficult.

goalieca · on March 3, 2018

Follow-up question, why can't LLVM -> Web Assembly solve the problem?

ianburrell · on March 3, 2018

The regular Go compiler doesn't use LLVM. It is derived from Plan9 compiler toolchain. There has been work on a LLVM Go compiler. There is also Gccgo, Go frontend for GCC, that could use GCC WebAssembly backend.

earenndil · on March 3, 2018

Could use dragonegg[1] to run gccgo through llvm and then through emscripten.

1: https://dragonegg.llvm.org/

ecnahc515 · on March 3, 2018

It can, emscripten supports this, however the primary Go compiler doesn't use LLVM so this doesn't help really.

steveklabnik · on March 3, 2018

Emscripten does, but LLVM does without emscripten too.

johnhenry · on March 5, 2018

This is kind of what I was getting at... other languages seem to have a webasm pipeline that includes LLVM. I wanted to know what what the difference was between go and these languages that require one to use something other than LLVM.

pjmlp · on March 3, 2018

Besides the issue of Go main compilers not using LLVM, that would only be about the instructions.

There is no way around porting the runtime as well.

sitkack · on March 3, 2018

x86 elf -> wasm could solve this problem.

slrz · on March 3, 2018

No, it wouldn't. At that point you already lost lots of information and your only option is to faithfully reproduce the behaviour of the x86 machine.

You can still do it, of course and performance isn't even that bad. See Fabrice Bellard's x86 emulator using asm.js.

https://bellard.org/jslinux/tech.html

sitkack · on March 3, 2018

Ok, pick a better ISA, like RISC-V.

kodablah · on March 3, 2018

That would take quite the effort to build, but I bet lifting to LLVM first might work: https://github.com/trailofbits/mcsema

kodablah · on March 3, 2018

> Go’s garbage collection is fully supported. WebAssembly is planning to add its own garbage collection, but it is hard to imagine that it would yield a better performance than Go’s own GC which is specifically tailored to Go’s needs

I don't think it's hard to imagine reading the GC proposal. The JS collector that might be reused could be off thread, something WASM can't (yet) do.

> Most file system operations are mapped to Node.js’ “fs” module. In the browser, file system operations are currently not available.

Please please abstract this. As a maintainer of a non-JS WASM backend, I'd love to use Go too.

> Especially a “goto” operation in WebAssembly would be very helpful.

I didn't look into the Go use case enough, but curious how much better this would be than the current labeled block and labeled break approach in WASM. WASM has fairly strict stack/frame rules/types, so arbitrary gotos wouldn't work.

ramenmeal · on March 3, 2018

> WASM has fairly strict stack/frame rules/types, so arbitrary gotos wouldn't work.

kinda funny because golang follows the same idea.

kodablah · on March 3, 2018

So I am curious how the presence of them in WASM would help

dualogy · on March 3, 2018

Jump-to-arbitrary-address / labels-as-values isn't made available to Go devs by Go-the-language, but seems required by Go-the-runtime+compiler, such as for compiling Goroutines.

skybrian · on March 3, 2018

It sort of does via channels, though. Sending a value to a bufferless channel causes the next goroutine to run (eventually). This looks a bit like a goto if you squint.

Go needs multiple stacks and a way to switch to executing a different stack. Web assembly could provide this in a high-level way.

mlevental · on March 3, 2018

>non-JS WASM backend

...what's a non-js wasm backend?

emidoots · on March 3, 2018

WebAssembly can be used anywhere. It is not tied to JavaScript in nature. So think of it kind of like the JVM in a way (but it's not an implementation, just a spec).

tudelo · on March 3, 2018

So people are using wasm in a non web context ?

kodablah · on March 3, 2018

Yup: https://github.com/cretz/asmble

emidoots · on March 3, 2018

Yes. Not a ton of people (yet), but yes, some are.

EugeneOZ · on March 3, 2018

but why, if you have plenty of good languages to do this? Wasm is good for browsers because only choice we have is js.

rictic · on March 4, 2018

I'm ignorant of the details, but WASM is an assembly language that's designed from the ground up for running untrusted code. It will also have multiple independent open source implementations, which has some nice properties.

kodablah · on March 3, 2018

WASM can be good for other things too. It is portable and fairly easy to target. It's like asking "why emscripten of you have plenty of other languages to choose from, LLVM is good for executables"

mlevental · on March 3, 2018

but i'm asking what are some webasm languages/backends/use cases that aren't a JS interpreter?

jononor · on March 3, 2018

There is a work ongoing to enable WASM to be used for Etherum smart contracts, replacing the custom VM/bytecode.

One could use WASM for reconfigurable on-edge computing, all the way down to microcontrollers.

swsieber · on March 3, 2018

WASM for Ethereum would be amazing. There are a lot of languages with much better safety stories than the current one.

kodablah · on March 3, 2018

See http://webassembly.org/docs/non-web/. Portability. For example, I can use Rust on the JVM: https://github.com/cretz/asmble/tree/master/examples/rust-re...

amelius · on March 3, 2018

Strange that wasm doesn't support go-to instructions. Didn't they see the need for it coming? Most compiler backends need it, I suppose.

obl · on March 3, 2018

I'm assuming it's because they wanted existing js/asm.js JITs to easily accept wasm, and those only see structured control flow.

steveklabnik · on March 3, 2018

> Didn't they see the need for it coming?

Wasm exists because it's small and incremental; there's lots of stuff that's needed, but not in the initial spec. As long as it can be added in the future, that's what matters.

hajile · on March 3, 2018

It's easier to add instructions than to remove our restrict them. Hardware or software exploits to break out of the sandbox are huge concerns. I'm all for introducing new features slowly.

truncate · on March 3, 2018

So when they say, it generates a big switch statement, do they generate this big switch statement for whole program or for individual functions which takes enough context to continue at at point where it yielded?

skybrian · on March 3, 2018

Speculating, but it sounds to me like it's within individual functions. Go's program counter is "[split] into 2 parts: PC_F and PC_B. PC_F is the index of the function to be executed. PC_B is the index of the basic block to be executed."

MrBuddyCasino · on March 3, 2018

Afaik in theory stack machines are simpler to implement, but slower than register machines. In practise, the JVM is still pretty quick. Why was WASM designed as a stack machine?

comex · on March 3, 2018

What you’re saying applies to interpreters, which execute the program opcode by opcode, and (in the case of stack machines) keep an actual stack at runtime. WebAssembly was designed to be the input for (at least mildly) optimizing JIT compilers. Since the stack is required to be at a fixed depth at each instruction, when the compiler reads the instruction, it can map each of the inputs it pops to a single earlier instruction that pushed it[1], and record a reference directly to that in its internal IR. After that it completely forgets about the stack. Later on it’ll do register allocation for the native architecture, so the generated native code makes efficient use of registers.

Or in other words, the “stack machine” is basically just a compact encoding of an AST.

(Note that the native “stack” is an entirely separate thing which does exist at runtime.)

[1] Actually, there can be multiple possible instructions from, e.g., inside the ‘if’ and ‘else’ sides of a prior if-block (which can push values that stay on the stack after the end of the block). But since both sides have to push the same number of values, this is still tractable; it turns into a phi node in SSA representation.

MrBuddyCasino · on March 3, 2018

Makes sense, thanks for the detailed response!

FullyFunctional · on March 4, 2018

Interestingly the issues sounds a lot like what a high performance Haskell implementation would run into. One additional note though on GC: highly tuned PL implementation requires completely control over object layout, pointer representation, and GC. (!)

No off-the-shelf GC will ever be perfect for all languages, thus I think WASM would be far better off doing whatever it could to facilitate runtimes implementing their own.

(!) Example from the lazy world: lazy closures when evaluated and updated often results in an indirection. We rely on GC to shortcut these. Sometimes we even have GC perform trivial constant-time evaluations when we know the result is smaller than the suspended evaluation.

Example from the Lisp world: cdr-coded cons cells can cut size in half, but no generic GC would be able to do this.

api · on March 3, 2018

Why is WASM so weird? Why not just thinly model modern CPUs so it can be almost transliterated?

zenhack · on March 3, 2018

One big advantage of stack machines is that they tend to be better on code size, which very much matters when you're transferring code over the network. I expect other parts of the design make it easier to JIT; the lack of arbitrary gotos probably makes analyzing control flow easier, for example.

tbodt · on March 3, 2018

If you tried to look at the document and found that the entire internet has been screwing with the formatting, change "Suggesting" to "Viewing" in the top right. Seems like someone forgot to check permissions on the document.

LyndsySimon · on March 3, 2018

Honestly, the defacement is at least as interesting as the original document.

hashkb · on March 3, 2018

That's not the reason this was posted?

guessmyname · on March 3, 2018

Here is a direct link [1] for easy access; maybe Dang can change the link.

[1] https://docs.google.com/document/d/131vjr4DH6JFnb-blm_uRdaC0...

sctb · on March 3, 2018

Thanks! We updated the link from https://docs.google.com/document/d/131vjr4DH6JFnb-blm_uRdaC0....