Hacker News new | past | comments | ask | show | jobs | submit login
WebAssembly architecture for Go (docs.google.com)
235 points by diakritikal on March 2, 2018 | hide | past | favorite | 69 comments



The main benefit of using the Web Assembly GC is not improved performance within the language, but rather having a single GC that handles both native DOM objects and language objects. Having two GCs interoperate is a big pain, and it's very difficult to pull off without leaks or performance problems. This is especially true for DOM objects, which have no user-visible finalizers. (The closest thing is WeakMaps, but I don't believe these are enough to implement proper cross-language GC, because you can't query whether an potentially-dead object is actually dead.)

Web Assembly GC is not a performance optimization; it's necessary for correctness.


In Chrome today, as an example, the C++ based DOM and V8 (Javascript ) objects are using different GCs. Obviously it is a pain point, but the situation of having different GCs for DOM and language objects is not new with Web Assembly.


I mean, as of Oilpan the V8 garbage collector marks C++ objects by calling trace hooks, as I understand things. That isn't an option for Web Assembly, which doesn't have access to the V8 garbage collector in that way (except via the wasm GC proposal; in fact, that's fundamentally what the wasm GC proposal is).


It's also worth mentioning WASM has no GC today.


My amd64 and ARM CPUs also don't have it.

On languages where the algorithm is an implementation detail, one can make use of reference counting with a cycle collector, which are just a few hundred lines.

Implementing one is pretty simple, making it perform well is another matter.


Unlike your real processor though, you can't directly access memory or ensure its absolute location. You can make an advanced GC that is slow or a basic one that is not.


Which is good. You can always implement GC inside it, but personally, for performance-intensive computing, I do not want GC inside WASM unless eplicitly enabled.


What kind of WASM file sizes can we expect? I know some work has gone into shrinking Go executables, especially in 1.7, but will we be able to produce something like the 15KB Rust Hello World WASMs? Go has a fantastic stdlib but it hasn’t prioritized web optimized file sizes (yet).

I’m looking forward to the results of this work!


A "hello world" in Rust is ~100 bytes; the more stuff you use, the bigger it gets, as less of the stdlib can be removed. The biggie is an allocator, which adds some size, but you can use things like https://github.com/fitzgen/wee_alloc and it's less than a kilobyte...


In Google's Doubleclick for Publishers UI they use Dart apps. The page loads several "apps" compiled to JS, each of which weights more than 2 Mb (even the "changelog" app). Page loads very slowly. So I would expect something similar for Go.


This is huge. Looking forward to see what sort of front-end toolkits will pop up for this new platform (Go for the client-side web).


Worth noting, gopherjs has been around forever, and is mature:

https://github.com/gopherjs/gopherjs

...and already has a reasonable js interop story, whereas my understanding is that calling the dom api from wasm is not the simplest thing.


I'm trying to learn more about this topic and I'm curious if anyone could clarify for me -- The reason Go would need a specific architecture for WebAssembly is because Go supports features, like garbage collection, that WebAssembly does not.

Is that right? Close? An oversimplification. Way off?


Garbage collection isn't really a problem because no CPU architecture supports garbage collection- the compiler is generating code to implement the GC regardless. The issue is that WASM doesn't really resemble real CPUs at all. Basically all modern CPUs are register machines, but WASM is a stack machine.

Say you have the code C=A+B in your program.

The generated assembly for a real machine looks something like this: (R* denote registers and A-C are memory addresses) load A->R1 load B->R2 add R1+R2->R3 store R3->C

WASM looks more like this: push A push B add pop C

This alone makes the approach to code generation and optimization quite different.

WASM does lack some features that you would find in real CPUs, most notably the ability for execution to jump to arbitrary memory locations. They had to work around this to maintain the behavior of goroutines. Functions also live in their own address spaces which necessitates a change to how the program counter works.


Kind of.

It's more that Go _requires_ a runtime to support certain features which require Garbage Collection. The requirement of a runtime in turn dictates a different architecture to run on WebAssembly.


My understanding is that they're treating Web Assembly as just another architecture to compile for - rather than spitting out x86_64 assembly, or ARM assembly, they're emitting Web Assembly.


As others have said, Web assembly is an architecture/target for C, C++, rust, etc. that you compile for, and it will be the same for go.

The link does talk about GC support. It looks like wasm will have one, but go’s will most likely perform better since it is tailored to go. (edit for clarification)

I assume a lot of the planning for this was in the GitHub and mailing list discussions dedicated to go’s wasm support. I’m on mobile but I can link those later.


I was under the impression that WASM does not have one.


Wasm doesn't have one yet, but it's a future extension to to spec.

You can, of course, compile your GC in today, and it will work. Wasm's integrated support will mean re-using the existing GC, which means smaller code size.

Side note: most people thought this would be a precursor to DOM support, but the new "host bindings" proposal opens up DOM support without needing GC support, and many people believe that now the former will land before the latter.


Are GC implementations transparent for the languages?

Can Go simply run with the WASM GC or does it need some specific type of GC?


"the WASM GC" only exists as a proposal, and proposals can change before they land. I don't know enough about it to say, and even if I did, it might be invalid by the time things are actually accepted.

See https://github.com/WebAssembly/gc for the current proposal.


Sure.

But one day, when there are implementations, can every language run with the WASM GC?


No if you care about performance.

To go the extra mile in GC performance, the compiler and GC implementation have to collaborate, taking advantage of language specific features.


IIRC, WASM doesn't have one and, more importantly, has an architecture which makes implementing one on top of WASM efficiently difficult.


Follow-up question, why can't LLVM -> Web Assembly solve the problem?


The regular Go compiler doesn't use LLVM. It is derived from Plan9 compiler toolchain. There has been work on a LLVM Go compiler. There is also Gccgo, Go frontend for GCC, that could use GCC WebAssembly backend.


Could use dragonegg[1] to run gccgo through llvm and then through emscripten.

1: https://dragonegg.llvm.org/


It can, emscripten supports this, however the primary Go compiler doesn't use LLVM so this doesn't help really.


Emscripten does, but LLVM does without emscripten too.


This is kind of what I was getting at... other languages seem to have a webasm pipeline that includes LLVM. I wanted to know what what the difference was between go and these languages that require one to use something other than LLVM.


Besides the issue of Go main compilers not using LLVM, that would only be about the instructions.

There is no way around porting the runtime as well.


x86 elf -> wasm could solve this problem.


No, it wouldn't. At that point you already lost lots of information and your only option is to faithfully reproduce the behaviour of the x86 machine.

You can still do it, of course and performance isn't even that bad. See Fabrice Bellard's x86 emulator using asm.js.

https://bellard.org/jslinux/tech.html


Ok, pick a better ISA, like RISC-V.


That would take quite the effort to build, but I bet lifting to LLVM first might work: https://github.com/trailofbits/mcsema


> Go’s garbage collection is fully supported. WebAssembly is planning to add its own garbage collection, but it is hard to imagine that it would yield a better performance than Go’s own GC which is specifically tailored to Go’s needs

I don't think it's hard to imagine reading the GC proposal. The JS collector that might be reused could be off thread, something WASM can't (yet) do.

> Most file system operations are mapped to Node.js’ “fs” module. In the browser, file system operations are currently not available.

Please please abstract this. As a maintainer of a non-JS WASM backend, I'd love to use Go too.

> Especially a “goto” operation in WebAssembly would be very helpful.

I didn't look into the Go use case enough, but curious how much better this would be than the current labeled block and labeled break approach in WASM. WASM has fairly strict stack/frame rules/types, so arbitrary gotos wouldn't work.


> WASM has fairly strict stack/frame rules/types, so arbitrary gotos wouldn't work.

kinda funny because golang follows the same idea.


So I am curious how the presence of them in WASM would help


Jump-to-arbitrary-address / labels-as-values isn't made available to Go devs by Go-the-language, but seems required by Go-the-runtime+compiler, such as for compiling Goroutines.


It sort of does via channels, though. Sending a value to a bufferless channel causes the next goroutine to run (eventually). This looks a bit like a goto if you squint.

Go needs multiple stacks and a way to switch to executing a different stack. Web assembly could provide this in a high-level way.


>non-JS WASM backend

...what's a non-js wasm backend?


WebAssembly can be used anywhere. It is not tied to JavaScript in nature. So think of it kind of like the JVM in a way (but it's not an implementation, just a spec).


So people are using wasm in a non web context ?



Yes. Not a ton of people (yet), but yes, some are.


but why, if you have plenty of good languages to do this? Wasm is good for browsers because only choice we have is js.


I'm ignorant of the details, but WASM is an assembly language that's designed from the ground up for running untrusted code. It will also have multiple independent open source implementations, which has some nice properties.


WASM can be good for other things too. It is portable and fairly easy to target. It's like asking "why emscripten of you have plenty of other languages to choose from, LLVM is good for executables"


but i'm asking what are some webasm languages/backends/use cases that aren't a JS interpreter?


There is a work ongoing to enable WASM to be used for Etherum smart contracts, replacing the custom VM/bytecode.

One could use WASM for reconfigurable on-edge computing, all the way down to microcontrollers.


WASM for Ethereum would be amazing. There are a lot of languages with much better safety stories than the current one.



Strange that wasm doesn't support go-to instructions. Didn't they see the need for it coming? Most compiler backends need it, I suppose.


I'm assuming it's because they wanted existing js/asm.js JITs to easily accept wasm, and those only see structured control flow.


> Didn't they see the need for it coming?

Wasm exists because it's small and incremental; there's lots of stuff that's needed, but not in the initial spec. As long as it can be added in the future, that's what matters.


It's easier to add instructions than to remove our restrict them. Hardware or software exploits to break out of the sandbox are huge concerns. I'm all for introducing new features slowly.


So when they say, it generates a big switch statement, do they generate this big switch statement for whole program or for individual functions which takes enough context to continue at at point where it yielded?


Speculating, but it sounds to me like it's within individual functions. Go's program counter is "[split] into 2 parts: PC_F and PC_B. PC_F is the index of the function to be executed. PC_B is the index of the basic block to be executed."


Afaik in theory stack machines are simpler to implement, but slower than register machines. In practise, the JVM is still pretty quick. Why was WASM designed as a stack machine?


What you’re saying applies to interpreters, which execute the program opcode by opcode, and (in the case of stack machines) keep an actual stack at runtime. WebAssembly was designed to be the input for (at least mildly) optimizing JIT compilers. Since the stack is required to be at a fixed depth at each instruction, when the compiler reads the instruction, it can map each of the inputs it pops to a single earlier instruction that pushed it[1], and record a reference directly to that in its internal IR. After that it completely forgets about the stack. Later on it’ll do register allocation for the native architecture, so the generated native code makes efficient use of registers.

Or in other words, the “stack machine” is basically just a compact encoding of an AST.

(Note that the native “stack” is an entirely separate thing which does exist at runtime.)

[1] Actually, there can be multiple possible instructions from, e.g., inside the ‘if’ and ‘else’ sides of a prior if-block (which can push values that stay on the stack after the end of the block). But since both sides have to push the same number of values, this is still tractable; it turns into a phi node in SSA representation.


Makes sense, thanks for the detailed response!


Interestingly the issues sounds a lot like what a high performance Haskell implementation would run into. One additional note though on GC: highly tuned PL implementation requires completely control over object layout, pointer representation, and GC. (!)

No off-the-shelf GC will ever be perfect for all languages, thus I think WASM would be far better off doing whatever it could to facilitate runtimes implementing their own.

(!) Example from the lazy world: lazy closures when evaluated and updated often results in an indirection. We rely on GC to shortcut these. Sometimes we even have GC perform trivial constant-time evaluations when we know the result is smaller than the suspended evaluation.

Example from the Lisp world: cdr-coded cons cells can cut size in half, but no generic GC would be able to do this.


Why is WASM so weird? Why not just thinly model modern CPUs so it can be almost transliterated?


One big advantage of stack machines is that they tend to be better on code size, which very much matters when you're transferring code over the network. I expect other parts of the design make it easier to JIT; the lack of arbitrary gotos probably makes analyzing control flow easier, for example.


If you tried to look at the document and found that the entire internet has been screwing with the formatting, change "Suggesting" to "Viewing" in the top right. Seems like someone forgot to check permissions on the document.


Honestly, the defacement is at least as interesting as the original document.


That's not the reason this was posted?


Here is a direct link [1] for easy access; maybe Dang can change the link.

[1] https://docs.google.com/document/d/131vjr4DH6JFnb-blm_uRdaC0...





Consider applying for YC's Spring batch! Applications are open till Feb 11.

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: