> As described above, it is important to “warm-up” JavaScript when benchmarking, giving V8 a chance to optimize it. If you don’t do that, you may very well end up measuring a mixture of the performance characteristics of interpreted JS and optimized machine code.
Since unwarmed first execution is a very common use case on the web, and a very intentional improvement for WASM, it seems foolish to discard that when comparing. I understand for benchmark determinism warming is important in long-running systems, but when parsing/warming is a common part of every run, it deserves to be factored in. I can make WASM look better by removing intentional benefits from JS before comparing too.
You even included it in the quote: the reason it's important to warm up the code is so you don't measure a mixture of performance characteristics between interpreted JS and compiled JS. How long the warmup takes is _incredibly_ device dependent, so instead I measured Ignition and SparkPlug independently so you get a feel for the speedup. I did not discard that at all in the comparison.
My mistake if you are actually including the warm up times in the benchmarks, I misunderstood. Arguably, single-execution-from-scratch JS vs WASM benchmarks could have even more value than repeated, warmed-up benchmarks as the former simulates a common web use case. IMO you _should_ very well end up measuring a mixture of the performance characteristics of interpreted JS and optimized machine code if that represents common use.
* You present TurboFan performance as the speed of JavaScript. e.g. your tables present JavaScript with Ignition as ~40x slower than JavaScript in certain test runs.
It's a fantastic analysis, and I am a bit surprised by some of the results. Thanks for doing the work and putting together a great resource.
kodablah's point is a valid one generally. A parallel post notes that a developer can't avoid the warm-up time in JS, but they can avoid it by using WASM.
As an aside, does v8 cache the optimized native code to any degree? I know there is code caching in the major browsers to presumably avoid reparsing Javascript, but if I had a theoretical page with say an image blur function, would each visit/load go through the same analysis/optimization process, going from slow to fast?
I imagine such caching is slowly being phased out because it can be used to create 'super-cookies'. That is, you can fingerprint a user by detecting whether certain bits of javascript are or aren't cached. (Detection of being cached is just a matter of measuring execution time).
It _is_ cached: both the wasm binary itself and the optimized version, to improve startup times. The cache, however, is per origin, so no other origin can make use of it, which prevents the fingerprinting aspect.
As a programmer, there's a lot you can do to make your JS run faster under optimization (e.g. avoid deopt) but there's little you can do about the warm up (besides reducing binary size).
So, when you're trying to progressively improve performance of specific code (e.g. boost FPS) the warm up time is better ignored; it's not under your control.
Reducing size or switching to WASM are both things that you can do.
You should measure the cases you actually care about. For a long running app, start-up time is probably not the most important thing; for other apps it's very important.
If I'm trying to improve boot time of my node app, then I must benchmark the cold behavior (the interpreter). If I'm judging template engine performance, I probably want to do that with a hot VM, because that's going to mostly be steady state performance I'm looking at. The major exception to that may be if I have an exceedingly aggressive caching strategy, where most pages are generated shortly after a deployment (evict all the old pages with whatever the new code generates as output).
If you include "warmup" time in JS, then you must also include compile time with WASM (EDIT: to be clear, time to pre-compile wasm from generic bytecode into binary optimized for the particular architecture -- NOT time to compile whatever language into WASM in the first place).
If you're running a given piece of code only once, the interpreted code is almost guaranteed to be MUCH faster than compiling and then executing a large pile of code.
That makes little sense (assuming you mean by "compilation" the compilation to WASM from a higher-level language). The entire point is (or rather should be) to measure runtime impact on the user. If compilation (e.g. parsing, JIT, etc) is part of runtime, you measure that. I wouldn't expect a JS benchmark to measure optimizer/minifier time either.
WASM code is byte code that (after being downloaded) is pre-compiled into (presumed) safe native code to be executed. It is designed to be cross-platform and as such, isn't particularly close to any given architecture. This means that there is still room (and necessity) to optimize for a particular architecture when compiling.
Let's say you have some trivial task. You write and compile to wasm. You also write in JS. It'll be a couple kilobytes for JS and probably a couple hundred kilobytes for wasm with all your dependencies to get a usable environment.
Once downloaded, the JS and wasm bytecode must now be parsed. The JS interpreter starts parsing and running. Meanwhile the wasm code is pre-compiling into native code so it can start execution.
The tiny bit of JS takes 10ms to run in the interpreter. The larger WASM file takes 20ms to parse and 0.1ms to run. Which was faster?
That depends on how long the software runs. WASM only makes sense at the inflection point where the execution lasts long enough to counteract the parse time. This shouldn't come as a surprise. Compile lag is one reason why interpreted languages saw real-world use in the first place.
It's telling that v8 used to NOT have an interpreter, but then added one. Everything was first compiled, then run, which resulted in slower startup and slower overall performance. Now, the browser starts interpreting right away while also compiling hot code to machine code in the background, before switching execution over. Functions don't actually need a couple hundred runs in order for the optimizer to know what types they receive. Most functions could be optimized with a very high degree of success after only a couple executions. This isn't done because the time cost of optimizing would outweigh the benefit on code that isn't executed frequently.
> Once downloaded, the JS and wasm bytecode must now be parsed. The JS interpreter starts parsing and running. Meanwhile the wasm code is pre-compiling into native code so it can start execution.
I thought wasm's format was specifically designed so that parsing and compiling could be performed in a streaming fashion, so that you don't have to wait for the download to finish.
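For what it's worth, the streaming entry points do exist in browsers today. A minimal sketch (the import object and the export name are placeholders, and this assumes you're inside an async function or a module with top-level await):

// Compile and instantiate while the response body is still downloading,
// instead of waiting for the full ArrayBuffer first.
const { instance } = await WebAssembly.instantiateStreaming(
  fetch("module.wasm"),
  {} // whatever imports the module expects
);
instance.exports.run(); // hypothetical export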
JS can already be parsed as it is streamed (I think this has been the case since around Chrome v40 and earlier for Firefox). The JS binary AST proposal could make that even more efficient.
My example assumed you were running the code locally. If it's coming over the wire, larger WASM binaries will suffer an additional penalty over the network because download speed is much slower than parse speed.
Still, you can achieve significant speed-ups with WebAssembly in some use cases.
For example, I have a hash function library (https://github.com/Daninet/hash-wasm) where I was able to achieve a 14x speedup on SHA-1 and a 5x speedup on MD5 compared to the best JS implementations.
That's exactly the kind of thing I think WASM is good at - small, computationally expensive libraries that are easy to just plug in.
I'm more of a web developer, and every time I think "hmm, could I use this to build a webapp?", I quickly shrug it off because it would create a big headache and the JS execution is rarely the bottleneck (and if it is, it's more likely developer error and inefficiencies than the language / interpreter).
It's very similar to the Python/C distinction. Python will often drop into C for the use-cases you're describing. However, unlike WASM, Python/C is the wild west:
- The whole CPython interpreter is the "C-extension interface" which means that the CPython interpreter can hardly change or be optimized or else it will break something in the ecosystem (and for the same compatibility reason it's virtually impossible for alternative optimized interpreters to make headway), and because the interpreter is so poorly optimized the ecosystem depends on C extensions for performance. WASM presumably won't have this distinction.
- Without the abysmal build ecosystem that C and C++ projects tend to bring with them, building and deploying WASM applications will likely be pleasant and easy after a few years. Of course, if your WASM is generated from C/C++ then that's a real bummer, but fortunately this should be a much smaller fraction of the ecosystem than it is with C/Python.
Network roundtrips are unavoidable, but WASM could be used to parse a server response and generate custom HTML to use in replacing some portion of the DOM. It would likely be a lot faster than trying to do the same in pure JS, and it would obviate the use of over-complicated hacks like virtual DOM and the like.
No, parsing the response is usually way too fast to make a difference. Generating an HTML string is also usually pretty fast. The slowness happens when you ask the browser to parse that HTML string and generate the appropriate DOM, WASM is not going to get you out of that.
> The slowness happens when you ask the browser to parse that HTML string and generate the appropriate DOM
If you do it right, that step only has to happen once for each user interaction. You can entirely dispense with the need to do multiple edits to the DOM via pure JS.
It’s not “at the moment” but “continuously from the creation of the virtual DOM concept” - often slower by multiple orders of magnitude.
The misrepresentation of a virtual DOM as a performance improvement came from two things: people who were comparing virtual DOM code to sloppy unoptimized code which was regenerating the DOM on every change and React fans not wanting to believe their new favorite was a regression in any way (not to be confused with the actual React team who certainly knew how to do real benchmarks and were quite open about limitations).
There’s a line of argument that the extra overhead is worth it if the average developer writes more efficient code than they did with other approaches but I think that’s leaving a lot of room for alternatives which don’t have that much inefficiency baked into the design.
I think there’s a bit more nuance to it. React (and other vdom implementations) try to be as efficient as possible when diffing / reconciling with the DOM. Sometimes this can result in improved performance, but there are also use cases where you’ll want to provide it with hints (keys, when to be lazy, etc.). https://reactjs.org/docs/reconciliation.html
Above all I would pragmatically argue (subjectively) that the main advantage is enabling a more functional style of programs w/ terrific state management (like Elm). This can lead to fewer errors, easier debugging, and often better performance with less effort.
> I think there’s a bit more nuance to it. React (and other vdom implementations) try to be as efficient as possible when diffing / reconciling with the DOM. Sometimes this can result in improved performance, but there are also use cases where you’ll want to provide it with hints (keys, when to be lazy, etc.). https://reactjs.org/docs/reconciliation.html
The key part is remembering that every one of those techniques can be done in normal DOM as well. This is just rediscovering Amdahl's law: there is no way for <virtual DOM> + <real DOM> to be cheaper than <real DOM> alone in the general case. React has improved since the time I found a five-order-of-magnitude performance disadvantage (yes, after using keys), but the virtual DOM will always add a substantial amount of overhead to run all of that extra code, and the memory footprint is similarly non-trivial.
The better argument to make is your last one, namely that React improves your average code quality and makes it easier for you to focus on the algorithmic improvements which are probably more significant in many applications and could be harder depending on the style. For example, maybe on a large application you found that you were thrashing the DOM because different components were triggering update/measure/update/measure cycles forcing recalculation and switching to React was easier than using fastdom-style techniques to avoid that. Or simply that while it's easy to beat React's performance you found that your team saw enough additional bugs managing things like DOM references that the developer productivity was worth a modest performance impact. Those are all reasonable conclusions but it's important not to forget that there is a tradeoff being made and periodically assess whether you still agree with it.
I agree. I am curious though about how substantial the memory and diffing costs are. I don’t mean that in an "I doubt it’s a big deal" way; rather, I’m genuinely curious and haven’t been able to find any literature on the actual overhead compared to straight-up DOM manipulation. I would imagine batching updates to be an advantage of the vdom, but only if it’s still that much lighter weight (seeing as you can ignore a ton of stuff from the DOM).
> I would imagine batching updates to be an advantage of the vdom but only if it’s still that much lighter weight (seeing as you can ignore a ton of stuff from the DOM).
There are two separate issues here: one is how well you can avoid updating things which didn't change — for example, at one point I had a big table showing progress for a number of asynchronous operations (hashing + chunked uploads) and the approach I used was saving the appropriate td element in scope so the JavaScript was just doing elem.innerText = x, which is faster than anything which involves regenerating the DOM or updating any other property which the update didn't affect.
The other is how well you can order updates — the DOM doesn't have a batch update concept but what is really critical is not interleaving updates with DOM calls which require it to calculate the layout (e.g. measuring the width or height of an element which depends on what you just updated). You don't necessarily need to batch the updates together logically as long as those reads happen after the updates are completed. A virtual DOM can make that easy but there are other options for queuing them and perhaps doing something like tossing updates into a queue which something like requestAnimationFrame triggers.
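A minimal sketch of both ideas, with hypothetical names (a cached cell reference plus a write queue flushed in requestAnimationFrame, reads only after the writes):

// Keep a direct reference to the node you'll update instead of re-rendering.
const cell = document.querySelector("#progress td.percent"); // hypothetical markup

// Queue DOM writes and flush them together in one frame; do reads that force
// layout (offsetHeight, getBoundingClientRect, ...) only after all writes.
const writes = [];
function queueWrite(fn) {
  writes.push(fn);
  if (writes.length === 1) {
    requestAnimationFrame(() => {
      writes.forEach(w => w());    // all writes first...
      writes.length = 0;
      const h = cell.offsetHeight; // ...then any measurements
      console.log(h);
    });
  }
}

queueWrite(() => { cell.innerText = "42%"; });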
So you could probably describe the vdom as a smart queue. How smart it is depends on the diffing and how it pushes those changes, abstracting this from the developer. It's bound to be less efficient than an expert (like an expert writing assembly vs C), but just like any other abstraction it has both pros and cons.
The question is whether the abstraction is worth the potential savings in complexity (which maybe is not the case, but I sure do love coding in Elm).
Also whether there are other abstractions which might help you work in a way which has different performance characteristics. For example, I've used re:dom (https://redom.js.org/) on projects in the past, LitElement/lit-html are fairly visible, and I know there are at least a couple JSX-without-vdom libraries as well.
There isn't a right answer here: it's always going to be a balance of the kind of work you do, the size and comfort zones of your team, and your user community.
Very interesting, thanks for pointing out re:dom. I took a look at their benchmarks and some vdom implementations compare very well to re:dom. I was pleased to see Elm’s performance. So it seems like it can be done well when you want it.
https://rawgit.com/krausest/js-framework-benchmark/master/we...
Forcing the browser to continually parse HTML and generate a new DOM tree, recalculate layout, etc. shouldn't be faster than updating the specific nodes that need changes.
Absolutely, it's been kind of incredible progress. But it's still going to be a bottleneck more often than JS execution (in my experience at least).
Not always; I have definitely run into applications where parsing large amounts of data in code is a bottleneck, especially when building large charts. But often.
My general worry is that the performance gains from using some WASM will just get eaten up by the overhead of jumping between JS and WASM and having to copy/convert data. You might be able to reduce the problem by porting more stuff from the JS side to the WASM side, but then you risk pulling in huge chunks of your app.
JS/WASM calls are fast in V8, and still seem to be improved from time to time (e.g. see: https://v8.dev/blog/v8-release-90#webassembly). Not sure about any large-data optimizations (TBH I'm not sure what this is about, though; usually one would use JS slices into the WASM heap to avoid redundant copying).
That works if the data is already in the Wasm linear memory and you need to access it from JS. If you have strings (or whatever) in JS, you need to copy them into the linear memory for the Wasm module to use.
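Roughly what that copy looks like; a sketch, where exports.alloc and exports.memory are hypothetical exports of the wasm module:

// Encode a JS string as UTF-8 and copy it into the module's linear memory,
// then hand (pointer, length) to the wasm side.
function passStringToWasm(str, exports) {
  const bytes = new TextEncoder().encode(str);
  const ptr = exports.alloc(bytes.length); // hypothetical allocator export
  new Uint8Array(exports.memory.buffer, ptr, bytes.length).set(bytes);
  return { ptr, len: bytes.length };
}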
And there might be other benefits besides performance. I'd like to use WASM to be able to reuse server side code in languages like Rust or Go in the client, so you don't have to re-implement algorithms and tricky processing code in javascript.
I experimented with this some weeks ago and it is certainly possible.
I had a PoC where my server runs Rust and exposes a JSON REST API, using serde to serialize my Rust structs to JSON. On the web client I compiled Rust to wasm and used the Reqwest crate (an HTTP client that uses Fetch in wasm) to talk to my server; Rust structs are shared between server and client.
For me, the beauty of Rust in this setup is that cross-compiling / cross-platform support is built into the tooling (Cargo). For example, the Reqwest crate compiles down to use the browser Fetch API when running in wasm, and the same crate on the server uses a native implementation using openssl (or rustls).
I did something similar making a game. The game logic runs server side, however in order to hide latency the clients also run a WASM copy locally. Then once the server processes their moves they check that everything was in sync and, if not, reload with the server state.
(In practice the validation is probably not necessary but doesn't hurt to have).
Yeah, WebAssembly has i64/u64 types as first-class citizens, unlike JavaScript, which has to emulate them or use BigInt, which is drastically slower than native 64-bit types. That's why crypto algorithms get a lot of speed benefit. AssemblyScript also shows this. See this:
There was a Project Zero article on HN recently [1] that said that bounds check elimination was removed from V8 because it allowed attackers to easily turn a type confusion into a memory read-write primitive:
> As a result, last year the V8 team issued a hardening patch designed to prevent attackers from abusing bounds check elimination. Instead of removing the checks, the compiler started marking them as “aborting”
But this post, which also appears to be written by someone from google with access to v8 developers, states that :
> You never go out of bounds. This means TurboFan does not need to emit bounds checks [...]
Does someone here know more about bounds check elimination in TurboFan? Are the checks removed in some cases but not others?
This makes bounds checks aborting in some parts of the pipeline, but it doesn't mean that later optimizations can't determine that a check is dead code and remove it. For example: https://doar-e.github.io/blog/2019/05/09/circumventing-chrom...
~lol, imagine not doubling the capacity of a dynamic array upon allocation.~
In all seriousness, this is a great read, and I was mildly surprised JS was about the same speed as Wasm once TurboFan kicks in. As a compiler engineer, it's nice to step back and appreciate the myriad of runtime-based optimizations that can be done with modern JIT compilers.
That being said, AssemblyScript doesn't perform any high-level optimizations during compilation, so I wonder how fast it will be once fully matured.
It’s a trade-off for simplicity. It’s a small team and they are still working towards feature completeness. Deferring optimization to Binaryen is the easy way out at the cost of not having high-level optimizations. If they finish their IR, that will most likely change.
I don't see this brought up very often, but you have a huge world of flexibility in choosing growth rates beyond just adding constants for an O(n) amortized append time or multiplying by a constant for O(1).
E.g., if you choose x -> x(1+1/log(x)) then you get an amortized append time of O(log(n)) while paying a memory overhead approaching 0% for large datasets.
The distribution of (and SLOs for) appends relative to other operations can make that kind of idea more or less attractive, but even common data structures have a lot of room for improvement if you can tailor them to your use case a little bit.
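A sketch of that growth rule (my own illustration, not a tuned implementation):

// Grow by a factor of (1 + 1/log(cap)) instead of a fixed constant: the
// relative over-allocation shrinks toward 0% as the array gets large.
function nextCapacity(cap) {
  if (cap < 8) return cap + 8; // sidestep log(x) <= 0 at tiny sizes
  return Math.ceil(cap * (1 + 1 / Math.log(cap)));
}

let cap = 8;
for (let i = 0; i < 5; i++) {
  cap = nextCapacity(cap);
  console.log(cap); // the effective growth factor approaches 1 as cap grows
}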
Wasn’t there some consideration that with 2x you could never reuse previous contiguous allocations but at 1.4 (ish?) it was an option and improved fragmentation in some cases?
Of course it depends on the behaviour and binning (or lack thereof) of your allocator.
> it can be mathematically proven that a growth factor of 2 is rigorously the worst possible because it never allows the vector to reuse any of its previously-allocated memory.
> [...]
> choosing 1.5 as the factor allows memory reuse after 4 reallocations; 1.45 allows memory reuse after 3 reallocations; and 1.3 allows reuse after only 2 reallocations
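A rough simulation of the quoted reasoning (assuming previously freed blocks can be coalesced and the still-live block can't be counted):

// After how many reallocations can the next request fit into the space
// already freed? With factor 2 the freed space never catches up.
function reallocsUntilReuse(factor, maxSteps = 64) {
  const sizes = [1];
  for (let n = 1; n <= maxSteps; n++) {
    const next = sizes[n - 1] * factor;
    const freed = sizes.slice(0, n - 1).reduce((a, b) => a + b, 0);
    if (freed >= next) return n - 1;
    sizes.push(next);
  }
  return Infinity;
}

console.log(reallocsUntilReuse(1.3));  // 2
console.log(reallocsUntilReuse(1.45)); // 3
console.log(reallocsUntilReuse(1.5));  // 4
console.log(reallocsUntilReuse(2));    // Infinity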
I'm not sure I understand the benefit of this. With a growth factor < 2, you have a chance of getting back chunks of memory that were previously used. That doesn't affect fragmentation / cache hits since all your data is always in the current chunk. What am I missing?
Depends on whether you are talking about tuning the growth factor for a single instance, or whether you are talking about tuning the default growth factor in general.
E.g. Python has put a lot of thought into its dynamic array (and dict) growth factors.
Honestly, the fact that AssemblyScript's Array implementation does not double the internal capacity but instead adds just one more slot when reallocating makes me worry about the quality of the language as a whole.
Doubling the allocation is a "hack" that is helpful when reallocations are common and thus is helpful for languages that extend arrays very often (typical of dynamic languages with GC) and where memory is cheap and plentiful.
One of the prime features of assembly language is that the person (compiler) that is generating it expects tight control over what it does. A 2*X allocation when you ask for X is unexpected.
Imagine if, when you went to the ATM and withdrew $100, the bank actually withdrew $200 from your account and held back the extra $100 so that, the next time you went to the bank and withdrew $20 it would take it out of the "held back" amount rather than doing another withdraw. I would be very unhappy with that algorithm.
ISO C++ places no such requirement on std::vector, each implementation is free to choose their own implementation provided it matches the O() notation requirements.
C++ is not like Rust where the implementation dictates the semantics.
In order for append-to-back to have O(1) amortised running time, the capacity needs to be multiplied by some constant >1. Any constant would do just fine in terms of complexity, but 2 is the obvious simple choice, being the first integer greater than 1.
If the capacity is only increased by some constant each time, rather than multiplied, this leads to O(n^2) running time for a sequence of n append-to-back operations, surely something to be avoided.
For `push` to extend capacity by just 1 is an absolutely insane default.
There is no sensible usage for a method that does that. It turns `for(let i = 0; i < n; i++) { arr.push(x); }` from linear into quadratic.
If automatic resizing exists, then it should do it in a sensible way. Otherwise it's just a footgun that you should leave out of the language like C does.
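To put rough numbers on that, here's a quick count of how many element copies n pushes cost under each policy (a sketch; it assumes every reallocation copies all existing elements):

// Count element copies for n pushes: capacity+1 growth vs capacity*2 growth.
function copiesFor(n, grow) {
  let cap = 0, len = 0, copies = 0;
  for (let i = 0; i < n; i++) {
    if (len === cap) { copies += len; cap = grow(cap); }
    len++;
  }
  return copies;
}

const n = 100000;
console.log(copiesFor(n, c => c + 1));              // ~n^2/2 copies, quadratic
console.log(copiesFor(n, c => Math.max(1, c * 2))); // < 2n copies, amortized O(1)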
>Abstract: Self’s debugging system provides complete source-level debugging (expected behavior) with globally optimized code. It shields the debugger from optimizations performed by the compiler by dynamically deoptimizing code on demand. Deoptimization only affects the procedure activations that are actively being debugged; all other code runs at full speed. Deoptimization requires the compiler to supply debugging information at discrete interrupt points; the compiler can still perform extensive optimizations between interrupt points without affecting debuggability. At the same time, the inability to interrupt between interrupt points is invisible to the user. Our debugging system also handles programming changes during debugging. Again, the system provides expected behavior: it is possible to change a running program and immediately observe the effects of the change. Dynamic deoptimization transforms old compiled code (which may contain inlined copies of the old version of the changed procedure) into new versions reflecting the current source-level state. To the best of our knowledge, Self is the first practical system providing full expected behavior with globally optimized code.
>Proceedings of the ACM SIGPLAN ‘92 Conference on Programming Language Design and Implementation, pp. 32-43, San Francisco, June, 1992.
What about performance for SpiderMonkey and JavaScriptCore? I’m a little disappointed that everything in the article is from a V8-only perspective. Is it too late to want a future where V8+Blink aren’t de facto?
WASM binary size is highly dependent on optimization level. If your WASM is anywhere near the size of your JS, something is off with your compiler settings. Since Rust and C++ are statically typed, compilers can use LTO to remove almost every unreachable instruction. JS, even with tree shaking, gets nowhere near this.
Default settings in Emscripten generate huge binaries. Rust without the "native" target or a bunch of configuration also does.
WASM bytecode is also more compact than JS. There's simply no reason the binaries should be even close to the same size.
The memory usage should also be different by an order of magnitude. WASM has basically no overhead above machine sizes for most types. Look at the Benchmarks Game: memory use of JS vs Rust. You should be seeing a difference close to that.
I can't speak for speed. But I can say I did a head-to-head comparison between the fastest pure JS PNG encoder I could find vs a C encoder transpiled to WASM, and the transpiled encoder was >10X faster.
It's hard to say if something is off in the benchmark or the compilation, but I find it hard to believe there's such a big difference between my tests and yours. Emscripten especially is not exactly easy to use and is maybe a good place to look for size and speed optimization.
> WASM bytecode is also more compact than JS. There's simply no reason the binaries should be even close to the same size.
The author does explain this pretty well. For an exact 1:1 comparison, yes, WASM beats JS for size. JS comes with built-in functionality (e.g. a garbage collector) that doesn't cost any size, but in the WASM case it needs to be brought along, taking up space. Even if you don't want GC, you don't get any WASM 'standard library'.
I really think WebAssembly missed the target. What we really needed was a language doing away with the dynamic nature of JavaScript, while adding 64 bit integers and generic SIMD instructions, and keeping high level features like strings and automatic memory management.
Instead we got the most bare-bone language imaginable, making everyone have to reinvent the wheel for everything. That is not actually fast. If WebAssembly had stuff like basic string manipulation, browsers could easily map that directly to efficient implementations. But with the current rules everything has to be provided as basic instructions that must be compiled under much stricter security rules.
WebAssembly is not assembly, it is an intermediate representation shared between two compilers, and it is actually pretty bad at that.
Eh, maybe that's what you need, but not what we need ;)
WASM is an exceptionally good standard compared to most other things on the web platform.
The thing about WASM performance is that WASM is fast (unless you do stupid things), but what's surprising to most people is that Javascript isn't slow either (at an absurdly high engineering and complexity cost compared to WASM though).
What is the point of WASM if JavaScript is just as fast? After reading the article I'm not left with the impression that getting good performance out of WASM is particularly easy either. WASM has most of the performance footguns from C, requires including a lot of details and gunk, like memory layout and an allocator, that the browser compiler would probably be better off doing on its own.
WASM is a much better compilation target than Javascript (and at least as important: it frees Javascript from being a compilation target, instead JS can focus on being a programming language written by humans again).
I'd argue that the main point of WASM is not the performance gain, but that it opens up a fairly straightforward path to use different languages on the web (e.g. it was possible to upstream a WASM backend into LLVM, but if the Emscripten team would have tried to upstream an asm.js backend into LLVM, they'd be laughed out of the room I'm sure - and asm.js also wasn't fast without special handling by the Javascript engine either).
Also don't forget that the above blog post is mostly about "WASM isn't as fast as it should be when using AssemblyScript", which is more of a problem to solve for AssemblyScript than WASM, because when used from C it's fairly easy to get "near-native" performance.
PS: all the disadvantages you're listing (like the linear memory layout) are actually massive advantages (for instance when trying to optimize cache misses) ;)
WASM can be faster than JS but you need a language that doesn't shoehorn a GC into the compiled binary. I'm not really sure what the author expected here. Modern managed runtimes usually give you the benefit of bump allocation in the nursery for free with a generational GC and the runtime has a lot more room to optimize the GC phase. None of this is possible without a native GC for webassembly.
This isn't WebAssembly being slow; the benchmarks just show the overhead of the GC. If you are writing a computationally expensive algorithm in Rust or C++, wasm can be a lot faster, but it's hard to get close to native performance (i.e. running on bare metal x86/arm).
Our webassembly prototype is about 3-4x faster in raw computation (and that is targeting webassembly exclusively) but has a much higher overhead when interacting with the DOM. That is basically the limiting factor, especially on mobile.
> I want to be very clear: Any generalized, quantitative take-away from this article would be ill-advised.
I don't think WASM and JS are "just as fast"; the author only did a couple of microbenchmarks. There are almost certainly many cases in which WASM would outperform JS, but they probably aren't going to be tight loops over an array or similar.
One nice thing about wasm is you can (sometimes) bring apps to the web very quickly. I got an emulator written in C fully ported to wasm in about an hour. To be fair this particular app hit all of the sweet spots of Emscripten, and you won't get that lucky with most apps.
Part of the reason it's bare bones is because they started with an MVP, and are continuing to add features through a community/standards process. Garbage Collection, reference types, and Interface Types are popular proposals being worked on that I think would address some of your issues. More here: https://github.com/WebAssembly/proposals
I mean, it's a virtual machine like the JVM. It's just that the code is already generally run through an optimizer, while Java bytecode typically isn't.
Yeah, that is a way to look at it. My point isn't what we call it exactly, my point is that it is a hack job that is fairly mediocre at doing what it is supposed to do.
The compiler in the browser is going to run its own optimization anyway, preoptimizing the input isn't necessarily going to help.
Having read the spec and proposals recently when looking at web assembly as a cross-platform bytecode, I have to disagree, it seems very well designed to me. They started with an MVP, and are continuously working on and adding proposals to extend it with more features, including some of the ones you want, I think. Why do you think it's a hack job?
Additionally, I think optimized webassembly should normally be a big benefit, helping both startup time and optimizations the engine might miss (also helping the engine focus on other optimizations / making simpler engines performant).
edit: Indeed, optimization before webassembly makes a big difference in the article's benchmarks, as you can see with how C++/Rust was faster than the hand-crafted AssemblyScript, theoretically because C++/Rust is going through LLVM optimizations.
I work on an application that could benefit from web assembly but the biggest hurdle for me is that it's kind of complex.
The web used to be quite simple but now it sometimes feels like you have to have a very large team in order to do anything. Sure, I could probably do web assembly but that would take time from other things that are also important.
I get that it is very useful for larger teams making larger applications like Figma. But for small teams it feels like the tooling isn't there yet to do anything useful in the timespan that I need to.
That being said, AssemblyScript seems very interesting and I probably should look into it.
It's not that complex, anything in computers can seem complex if you're not familiar with it. I would suggest that you're just not familiar with it, and that's not a knock against you, it's just something that you need to study like anything else. It's not true that you need a large team to do anything on the web these days, the same technology works today that has worked for the last two decades, but if you want to leverage the benefits of new technology you need to invest the time to understand how it works, that's just the nature of technological advancement.
Parent mentioned the lack of tooling, and that is indeed where most of the cost of using WebAssembly lies.
You don’t really need to learn the assembly language itself since you’ll probably just be calling emcc.
However, you may need to build code to marshal more than just ints and strings to the JS code. Even after you do, you’ll run into the classical issues of keeping track of object references across a GC & non-GC system.
You may need debugging and find that in-browser debuggers for WASM are primitive/non-existent. You may need to figure out how to unmangle stack traces, including mixed JS/WASM traces. Third-party tools like Sentry for error reporting may not have built-in support (they sort of recently have, and it's very under-documented).
All solvable problems, but it’s a lot of time spent not building the product. There are plenty of good use cases, but it’s usually not the ones based on the false premise that native is somehow always better than interpreted.
It looks complex until you sit down and do it and force yourself to understand what you are doing. Most things are like that.
The way I structure learning things like this in my team is by organizing spikes. We commit to diving into some technology with the goal of finding out how feasible it is for us to use it and a secondary goal of maybe getting something useful going. However the primary goal is finding out if it can work and if so exactly how. If it works out it becomes a regular thing we work on and integrate. This usually starts with studying what is there, what the risks and benefits are, etc.
At some point you reach the point where the only way to learn more is simply doing it. You can analyze something to death without fully understanding it. Just sitting down and doing it becomes the logical next step. The payoff is usually non linear: you gain more if it works than you lose if it doesn't. This is one of those things that you might suspect is valuable like that. So your job is finding that out in an efficient way.
In your team, I would task one or two of your people with spending a max of 2 days to validate that they can take a simple bit of typescript, convert it to assembly script, compile it and hook it up. Chances are pretty good that they'll have working code at the end of those two days. Worst case you lose two days. Best case you figure out it's easy and just works and you move forward.
Assuming that WebAssembly is for performance is an invalid assumption. The reason for WebAssembly is to provide a runtime environment for AOT compiled languages such as C++, Rust. The performance of programs written on those languages and compiled to the Webasm target may or may not exceed the performance of a program of the same functionality written in JS and executing in the same V8. There's no reason to expect performance to be radically different just because it's Wasm.
The headline is a bit misleading, as the main article is about AssemblyScript, a TypeScript subset compiling to WebAssembly. So it seems quite a few of the mentioned problems come from the immaturity (or design problems?) of AssemblyScript and not necessarily from wasm.
That seems to mean every bit of your title is nonsense. You aren't benchmarking webasm at its best and 'magic pixie dust' is already nonsense on its own.
As anecdata, I've also found that using typed arrays does not speed up (it actually slows down, by 10% or so) code which uses integer arrays. Making sure that arrays stay in packed-small-integer format (which is not always obvious: for example one has to replace x => -x with x => 0-x to keep the old IEEE -0 from kicking in) consistently outperforms typed arrays for me, by a large margin if allocating many small arrays, and by a small margin if allocating one large array.
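A tiny illustration of the -0 point (my own sketch):

// -0 can't be stored as a small integer, so mapping with x => -x can force
// the whole array into double storage as soon as a 0 shows up.
const a = [0, 1, 2].map(x => -x);    // a[0] is -0, elements become doubles
const b = [0, 1, 2].map(x => 0 - x); // 0 - 0 is +0, stays a small integer
console.log(Object.is(a[0], -0), Object.is(b[0], -0)); // true false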
I found similarly "meh" performance improvements when trying to port my Javascript code to webassembly: either modern Javascript implementations are absolutely amazing, or webassembly runtimes still have a ways to go.
This is most likely because allocating typed arrays is really, really slow in JavaScript. IIRC it has to do with the fact that there is a lot of flexibility regarding backing buffers or something; each typed array object has about 200 bytes of overhead compared to a dozen bytes for a plain array or object (roughly).
Typed arrays are mainly faster in scenarios where you allocate one or a few large TypedArrays once and then re-use them a lot.
That also explains your experience with many small arrays. That's basically the worst way to use TypedArrays.
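In other words, the friendly pattern looks something like this (a sketch with made-up names):

// One big allocation up front, reused across calls, instead of a fresh
// typed array per call.
const scratch = new Float64Array(1 << 16);

function smooth(values, n) {
  // assumes n <= scratch.length; results land in the preallocated buffer
  for (let i = 0; i < n; i++) {
    scratch[i] = (values[i] + (i > 0 ? values[i - 1] : values[i])) / 2;
  }
  return scratch; // caller only reads the first n entries
}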
Yes, I did think it wasn't a good way to use typed arrays, but I was still surprised that when allocating only a very small number of large arrays, plain JavaScript arrays outperformed typed arrays.
A good trick to convince JS engines that you are dealing with integers is to use the "`|0` operator":
let a = 1;
let b = 2;
let c = a+b|0; // c is guaranteed to be a 32 bit signed integer
This not only ensures correct 32 bit integer semantics (like wrapping around), but also helps the engines to use actual integer instructions in the generated machine code.
For unsigned 32 bit integers, there is `>>>0`, and
for multiplication, there is Math.imul().
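Continuing the example above:

let u = (a + b) >>> 0;         // unsigned 32 bit result (wraps modulo 2^32)
let p = Math.imul(a, b);       // 32 bit signed integer multiply with wrap-around
let q = Math.imul(a, b) >>> 0; // same product, reinterpreted as unsigned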
One of my colleagues told me to stop doing that because in (I think) V8 these values are immediately converted back to a double nowadays. So annotating all code with |0 doesn't really add speed benefits there, just extra conversions between doubles and integers. Said colleague used to maintain human-asmjs so I trust he knows what he's talking about.
>> This not only ensures correct 32 bit integer semantics (like wrapping around), but also helps the engines to use actual integer instructions in the generated machine code.
But there is only one type of number in javascript. Everything is just a double (BigInt aside). You get a 32-bit integer because the bitwise operator casts the result to one. c has the exact same semantics as any other js number.
Yeah, there are tricks to convince the engine you are using an integral type, but unless you are doing a lot of benchmarking they aren't really useful. Any compilation tier can choose to use any intermediate representation it wants.
> But there is only one type of number in javascript. Everything is just a double(BigInt aside).
There is nothing stopping JS engines from trying to infer when a number is only an integer and optimize for that. In fact, that's what Small Integer (SMI) optimizations are all about[0].
It's just that |0 isn't really able to guarantee that our number value is the type of SMI that V8 optimizes for (since the V8 SMIs are 31 bits, and bitmasking operations only guarantee 32 bit integers)
One important thing to note that this post kind of hints at: JavaScript can be optimized more than equivalent WebAssembly if you give the JS runtime enough help, because it can use runtime-only information to produce better-optimized JS. It can exploit type information gathered during runtime to devirtualize method calls and produce type-specialized code, while also doing things like escape analysis to eliminate some allocations entirely. You have to carefully identify places in your native code where you can do unchecked array accesses, etc, but the JS runtime just figures it out for you.
For any of those optimizations to happen for your WASM code, the compiler has to be able to do it statically and that can be much harder. Devirtualization in particular is essential for Java or C# to run fast and some C++ codebases also benefit tremendously from it. If you're interacting a lot with JS APIs from WASM (like issuing network requests or creating DOM elements, etc) you're going to be dealing with lots of dynamically typed data, and in those scenarios handwritten JS may actually be faster than WASM because the runtime can JIT optimal code with the right type specializations.
Note that these optimizations will fail if you aren't careful about how you write your JS: If a given function f(x,y) is passed values of different types during execution, it probably won't be fully optimized. If you have two functions f1(x,y) and f2(x,y) and ensure that each one is only passed values of a certain type, they will both be heavily optimized (iirc the JS runtime terminology for these functions is 'monomorphic') Naturally, this means uses of Function.apply and Function.call should be avoided at all costs.
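A small sketch of that monomorphic/polymorphic distinction (illustrative names):

// Polymorphic: one call site sees both numbers and strings, so the engine
// keeps a more generic, slower path around.
function describe(x, y) { return x + " / " + y; }
describe(1, 2);
describe("a", "b");

// Monomorphic: each function only ever sees one argument type, so the JIT
// can specialize each of them.
function describeNums(x, y) { return x + " / " + y; }
function describeStrs(x, y) { return x + " / " + y; }
describeNums(1, 2);
describeStrs("a", "b");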
I have seen much the same argument made about Java versus C++ for the past 25 years: that the Java byte code JIT would have more information and thus be able to optimize and do stuff like devirtualization better.
However, that has not panned out.
There are a few reasons for this. First C++ and Rust optimizers can do amazing things when they are given time. In addition, I think devirtualization is not as big a deal in C++ and Rust because in general you avoid writing code that uses virtual functions when you are writing performance sensitive code and instead use things like templates/generics where there is no indirect function calls.
Other than 3D AAA game engines, all the C++ software that I replaced with either Java or .NET solutions has kept the customers happy and lowered the TCO of their products.
This wasn't tiny CLI that occasionally lands on HN, rather large scale desktop applications or distributed computing clusters.
Winning micro-benchmarks is not everything, which is why except for Windows with WinUI (which still remains to be seen if it can move windevs away from Forms/WPF in its current incomplete state), all OS vendors are migrating to other languages for their App development SDKs, leaving C++ and Rust only for low level OS components.
I don't think C++ and Rust are just for low-level work. I've built a lot of GUI apps and distributed ones with C++.
Qt is a beast.
The issue with Java is the reverse engineering. Back in the 90's, the main selling point for Java was to prevent it, because at that time the bytecode was at least hard to understand. Now tools have grown and it's fairly easy to reverse engineer Java, even if one obfuscates it.
As for C++, inline code and template code make it a pain in the ass.
I'm sure big companies that care about intellectual property would use C++ over Java anytime. C++ also has mature obfuscation tools that make it even more difficult.
Java has its place. It's a great language if you use it server side or in an isolated env (from a commercial viewpoint).
Nevertheless, I've built many web apps using C++ too.
They were and are written in all sorts of C++ flavours, including past C++11.
Reverse engineering is never an issue with Java if one actually uses the right tooling; commercial AOT compilers have existed since around 2000, it is a matter of buying them.
Google, where I work, is largely a C++ shop despite Android Java.
AOT compilers exist for Java, .NET, JavaScript. I, however, doubt the user experience of those.
For example, GraalVM mentions the following,
"There is a small portion of Java features are not susceptible to ahead-of-time compilation, and will therefore miss out on the performance advantages. To be able to build a highly optimized native executable, GraalVM runs an aggressive static analysis that requires a closed-world assumption, which means that all classes and all bytecodes that are reachable at run time must be known at build time. Therefore, it is not possible to load new data that have not been available during ahead-of-time compilation."
Not everyone is Google, and since you work there you are surely aware of tooling like Ghidra and IDA.
GraalVM is not what I would pick for AOT Java projects, there are other products since 2000.
In any case, this isn't a comparison of language bullet points.
Just because a software product has been migrated from C++ into Java, .NET or whatever language, it doesn't mean it is a sacrilege to keep some native lib around, which is exactly where all mainstream OSes are going, with C++ being left for the bottom layers.
How many desktop GUIs is Google shipping written in pure C++?
1 - Qt is not an OS SDK. Apparently you missed that part of my comment.
2 - Qt has been migrating away from pure C++, again you also missed pure from my comment, modern Qt applications are written in Qt Quick, a JavaScript dialect, with underlying components written in C++.
C++ Widgets have hardly changed since Qt 4, other than being updated to the underlying Qt infrastructure.
Again, read my comment very carefully, then you might get it.
Afterwards go learn how to create a GUI for macOS, iOS, Android, ChromeOS, Windows, WebOS (as shipped by LG) or even Fuchsia, using only their SDKs and nothing else.
As for Qt not moving away from pure C++, go use it with pure C++ on iOS, Android and embedded devices.
> Afterwards go learn how to create a GUI for macOS, iOS, Android, ChromeOS, Windows, WebOS (as shipped by LG) or even Fuchsia, using only their SDKs and nothing else.
Why use the native SDK when there is Qt?
The same can be said for Java. Why use Swing when there is native SDK?
Many devs are too religious arguing for home team and don't embrace polyglot programming.
Just because a product is mainly written in managed language X, doesn't mean some library can't be written in something else.
C and C++ devs have forgotten the days when their beloved programs in 8 and 16 bit home computers, were a pile of inline Assembly if performance was to be anywhere of an acceptable level.
Embrace the safety and productivity of higher level languages (with AOT and JIT compilers), and let a couple of native libs be the "inline Assembly" if and only if, a profiler proves it is actually required instead of choosing a better data structure or algorithm.
Your argument basically boils down to "If you write fast C++ it will be fast", which is true. But a significant fraction of code out there is not fast C++ written by experts to be fast.
This is different than "Java will be faster than C++ because of HotSpot" arguments, because java is competing with C++. This is not a competition between JS and native C++, it's a competition between JS and WASM.
A for loop is not a very interesting application -- it is not what Java optimizes for, and chances are you didn't benchmark it correctly.
To optimize your program in a low level language you basically have to have a whole plan for the architecture of your program beforehand, and every major change to that will break your optimizations. Also, don't forget about non-standard object life cycles, which are really common. Complex C++ programs basically employ their own GCs, which will be inferior to any one included in the JVM.
Of course low-level programs have their place (plenty of them), e.g. audio processing, embedded, a million others, but the average business/CRUD app will be faster* both to execute and to produce in Java, as well as more maintainable.
* With enough time a competent team could of course write a faster version of it in C++, but it's not a good use of their time, and you would be surprised how hard it is, especially with ever-changing requirements.
A language either cares about low level details or not. You can’t have it both ways. And c++ is absolutely a low level language.
> I don't know any complex C++ program that employ their own GCs when C++ has RAII which is superior to GC.
RAII is not at all a replacement for GC. It is only suitable for a subset of object lifetimes. There are plenty of cases where you can’t really pinpoint a scope-exit where this given object should be reclaimed.
A GC is a necessity in many concurrent algorithms that simply could not be written without.
> Just give a try for C++11/14/17
I have and I like it. There are domains where I would not even start writing Java, and vice versa with C++.
Your CRUD app may have been a breeze, but what if a requirement changes, now touching the core of your program? You have to refactor, and it will be really expensive compared to a high level language. Every memory allocation/deallocation has to be thought out again and tested (and while Rust can warn about it, you still have to write a major refactor as it is another low level lang).
Being multi-paradigm is a different axis all around. Low-level (which is by the way not a well-defined concept, C is actually also high level, only assembly is low, but that usage is not that useful) means that low level details leak into your high level description of code, making the two coupled. You can’t make them invisible.
Also, as an example, think of Qt. A widget’s lifetime is absolutely not scope-based, nor is it living throughout the whole program. You have to explicitly destruct it somewhere. And there are plenty of other examples.
And as I said, I’m familiar with RAII, it’s really great when the given object is scope-based, but can’t do anything otherwise.
> C++ is a OOP language just like Java. You do it same way as you do in Java. Use inheritance.
And if the new subclass has some non-standard object life cycle you HAVE to handle that case somewhere else, modifying another aspect of the code. It is not invisible, unless you want leaking code/memory corruption.
The main problem with Java isn't being JITted, it's that it's not expressive enough. It doesn't have SIMD (yet) or value types (yet…?).
I would expect a JIT to not really be able to find a lot of magic optimization opportunities, though maybe there are some, and it'd actually be annoying if it could. The most important thing in a tool like that is predictability, because you can't make development decisions based on magic.
That may be part of it, but I imagine the JVM's safety obligations are also a significant factor. If the JIT can't elide array bounds checks, checks must be performed at runtime. Runtime type checks might be needed. Runtime arithmetic checks might also be needed. The JVM is also more constraining regarding concurrency gone awry, than the C/C++ memory model. [0] More broadly, the JVM's lack of undefined behaviour constrains the optimiser in ways the C/C++ approach does not (although I'm open to the idea that it's overstated how much of a performance win is owed to C and C++ having many kinds of undefined behaviour).
And of course there's the GC and Java's high object-churn, even where lifetimes are known statically. To my knowledge, escape analysis (the relevant family of JIT optimisations) still hasn't really addressed this.
The JIT can elide array bound checks really often, and most "low hanging" optimizations are solved quite cleverly (it's way out of scope for my knowledge, but I remember reading that null checks are elided by trapping segfaults? Does it make sense?).
There is no over/underflow checks so I don't know what you mean by arithmetic checks -- in pure number crunching the JVM is insanely fast.
And you are right in that many Java libs/programs are quite happy to create garbage, though with generational GCs it is really cheap. Escape analysis is great, but primitive classes in Project Valhalla will solve this last problem of object locality.
Sounds right. No need to generate instructions to perform the check if you can rely on a hardware trap, by means of signal-handling cleverness.
> There is no over/underflow checks so I don't know what you mean by arithmetic checks -- in pure number crunching the JVM is insanely fast.
Integer multiplication, addition, and subtraction, are all defined in Java to have wrapping behaviour, and are easily implemented. Whatever the input values, there's no way those operations can fail. (Incidentally, this is a terrible way of handling overflow. This turned up recently in discussion. [0]) Division is trickier. In Java, integer division by zero results in an exception being thrown. Apparently JVMs can implement this with signal-handling cleverness similar to dereferencing null references. [1] Two's complement integer division has another edge case, which is undefined behaviour in C/C++ but which, iirc, results in an exception in Java: INT_MIN / -1. I believe the JIT has to emit instructions to check for this, as it's not possible to leverage signal-handling there.
I don't know how well modern Java performs in floating-point arithmetic. Here's an old tirade about it [2] and discussion. [3]
> with generational GCs it is really cheap.
At the risk of going off topic: doesn't Java tend to perform somewhere around 60% the speed of C/C++, while using considerably more memory? Perhaps the GC isn't to blame, but clearly the blame belongs somewhere. It's like the way advocates of Electron will insist that modern HTML rendering engines are fast and efficient, the DOM is fast and efficient, and JavaScript is fast and efficient... and yet here we are, with Electron-based applications reliably taking several times the computational resources of competing solutions using conventional toolkits.
> primitive classes in Project Valhalla will solve this last problem of object locality
Interesting, sounds like the kind of ambitious initiative that will require deep changes to the JVM.
> At the risk of going off topic: doesn't Java tend to perform somewhere around 60% the speed of C/C++, while using considerably more memory?
It is hard to properly benchmark this generally, for small programs it is “at most” within 2-3X, but I believe for more complex applications it closes the gap quite well (many things can be “dynamically” inlined even between classes far from each other). Not sure how it fares with PGOs.
And yeah it does use more memory, both the runtime/JIT/GC and each object has considerable overhead, but I don’t think that comparing it to Electron is apt. Electron is slow because it adds additional steps to the picture, not because of the JS engine itself. V8 is similarly an engineering gem, and it can be stupidly fast from time to time.
As for the GC:
The GC itself is required for some program to work correctly. C/C++ codebases often create their own GC, and that will surely be slower than any of the multiple GCs found in the JVM. But for short-living programs the GC doesn’t even run (similarly to how some short lived C program leaves clean up to the OS), so rather the former is responsible for the bigger memory usage.
All in all, where ultimate control over memory/execution is not required (that is, you don’t need a low-level language), Java is fast enough, especially combined with it being productive and easy (and safe) to refactor, as well as having top-notch profiling tools (with such low overhead that they can be run in production as well).
Optimizations like 'these two function arguments are always int31' in v8 or spidermonkey are 100% predictable at this point and result in all your type checks and boxing being eliminated, and with the known types it also becomes much cheaper/faster to create object instances (since now if you store those values into properties of an object, that object's shape is fully known). Various properties like this can extend out into larger parts of your JS application.
There's still a lot of magic you can't rely on, but you'd be surprised how much you CAN rely on. Asm.js was built on this observation: If you write your JS following some basic rules it's actually pretty easy to land on predictable, well-optimized paths. Of course, one of WASM's advantages is that by design you're almost always on those paths and don't have to worry.
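For illustration, a toy sketch (not from the article) of the kind of code that tends to stay on those well-optimized paths: keep argument types stable, keep object shapes stable, and use asm.js-style `|0` coercions where integers are intended.

  // Always call this with numbers, never with strings/undefined mixed in,
  // so the engine can specialize it for small integers and drop the type
  // checks and boxing entirely.
  function lerpInt(a: number, b: number, t: number): number {
    return (a + (b - a) * t) | 0; // asm.js-style hint: result is a 32-bit int
  }

  // Always create these objects with the same properties in the same order,
  // so every instance shares one shape/hidden class and property stores stay cheap.
  function makePoint(x: number, y: number) {
    return { x, y }; // never add or delete properties on it later
  }

  const p = makePoint(lerpInt(0, 100, 0.25), lerpInt(0, 50, 0.25));
  console.log(p.x, p.y); // 25 12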
> The most important thing in a tool like that is predictability, because you can't make development decisions based on magic.
Fortunately you've got the best profiling tools available, so you don't have to guess. You also get to see the relative importance of the function you're trying to optimize, and whether it actually is the bottleneck (people often guess wrongly about where the bottleneck is).
It (the JVM) surely has had support for AVX for several releases, although only via autovectorization; explicit SIMD (the Vector API) has been made available as an incubating preview in Java 16.
Autovectorization is the kind of magic you can't rely on. It sort of works on a single platform but you will always run into cases it doesn't handle even if you own your own team of autovectorization engineers who tell you it's perfect.
On the other hand, the explicit Vector API will use the correct "flavor" of SIMD instructions for the platform and will gracefully fall back to a non-SIMD version if SIMD is not supported. And as far as I know, the SIMD story is quite bad with C.
It's pretty good in C with assembly, inline or not. SIMD in C usually involves a lot of aliasing violations, and the intrinsics have weird, hard-to-read names, so I find assembly easier to deal with than C here.
Compiling TypeScript to JavaScript is essentially just removing the type annotations. Compiling to a bytecode VM would be orders of magnitude more work, especially since TS is defined to have exactly the same runtime semantics as JavaScript.
TS classes are not JS classes, TS has its own implementation of async/await, etc. Just check any compiled TS code and you'll see it. It's very frustrating when you want to quickly patch a bug in a 3rd-party library.
TS can target older versions of JS without modern features, just like Babel. If you target a recent release of ES, the emitted JS should be pretty much the same as the source TS, just without the type annotations.
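For example, async/await only gets rewritten into helper machinery when you down-level; with a modern target it passes through essentially untouched (a rough sketch of tsc's behaviour; the exact helper output varies by compiler version):

  // source.ts
  export async function load(url: string): Promise<string> {
    const res = await fetch(url);
    return res.text();
  }

  // With `tsc --target ES2017` (or newer), the emitted JS keeps async/await
  // as-is and only strips the type annotations.
  //
  // With `tsc --target ES5`, the same function is rewritten in terms of the
  // __awaiter/__generator helpers, which is the hard-to-read output you run
  // into when trying to patch compiled third-party code.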
Idk man. webassembly.org — the site authored by the inventors — literally starts with “WebAssembly (abbreviated Wasm) is a binary instruction format for a stack-based virtual machine.”
It is a VM. Most VMs use a high-level intermediate representation (HIR, or bytecode). WebAssembly uses a low-level intermediate representation (LIR). A LIR is not an assembly language.
AssemblyScript initially targeted TS->WASM compilation, by only supporting a strict subset of the TS language. But at some point they dropped that idea and defined their own TS-like language. I don't know the reason for this, but my guess is that TS is too dynamic to just directly compile it to WASM?
It can, indirectly, by simply calling the JavaScript APIs via bindings. That works well enough and is also how you can use things like WebGL, OpenAL and other browser APIs.
But they are also working on more efficient bindings.
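Roughly like this (a minimal sketch, not any particular toolchain's output; the module is assumed to declare an import named "js_log" under the "env" namespace, and "bindings.wasm" is a placeholder file name):

  // JS/TS glue: every browser API the module needs is wrapped in a small
  // function and handed over through the import object.
  const importObject = {
    env: {
      js_log: (value: number) => console.log("from wasm:", value),
    },
  };

  const { instance } = await WebAssembly.instantiateStreaming(
    fetch("bindings.wasm"),
    importObject
  );

  // The exported function can call back into js_log via the binding above.
  (instance.exports.run as () => void)();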
This unfortunately introduces a lot of overhead and doesn't scale well for larger applications. WebGL calls are already incredibly slow compared to native, and the trampolining between WASM and JS world adds on top.
When WASM was released (around 2017), there was already discussion about allowing direct bindings without a JS round trip, but AFAIK there is still no actual implementation of this in any browser.
WebGL is slower than native mainly because of the additional security validations compared to a native GL driver: e.g. it cannot simply forward calls into the underlying 3D-API, but instead WebGL needs to take everything apart, look at each single piece to make sure it's correct, reassemble everything and then call the underlying 3D-API (roughly speaking).
Another problem is that WebGL relies on garbage collected Javascript objects, but this problem can't really be solved on the WASM side, even with the "anyref" proposal (this would just allow to remove the mapping layer between integer ids and Javascript objects that's currently needed).
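For the curious, the mapping layer in question looks roughly like this on the JS side (a toy sketch; WASM can only pass numbers across the boundary, so the glue code keeps the actual WebGL objects in a table and hands integer handles to the module):

  const glObjects: (WebGLBuffer | null)[] = [null]; // index 0 reserved as "no object"

  // Called from WASM: create the JS-side object, return an integer handle.
  function createBufferHandle(gl: WebGL2RenderingContext): number {
    glObjects.push(gl.createBuffer());
    return glObjects.length - 1;
  }

  // Called from WASM: resolve the handle back to the real object.
  function bindBufferByHandle(gl: WebGL2RenderingContext, target: number, handle: number): void {
    gl.bindBuffer(target, glObjects[handle]);
  }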
Doesn't seem to stop MS with Blazor (.NET), Rust, and a few others from doing this. Also, there are plenty of games running in WebAssembly using similar bindings for things like WebGL and OpenAL. As far as I know the current situation is pretty workable already and getting better; e.g. garbage collection is coming pretty soon.
I guess it depends on what you are doing. For most people doing WebAssembly, the point is avoiding, or at least minimizing, the need to interact with JavaScript. But still, it seems there are some nice virtual DOM options for Rust, e.g. https://github.com/fitzgen/dodrio, that are allegedly fast and performant (not a Rust programmer myself).
Can anyone provide any pointers or an update on this? I remember reading that it has been coming for the past 3 years, but I have never heard anything more. Google Search doesn't show any useful results.
Just you wait until Lars Bak [0] gets hired by some company to make a fast WebAssembly runtime. Until that happens I won't take any performance comparisons of WASM vs. X seriously :).
So AssemblyScript can beat JavaScript if you benchmark every function and then optimize them by hand every time it is slower?
So most (all?) of the code posted which looked like a straight port to AssemblyScript was slower than JavaScript before optimizing it? I don't know how you feel, but I personally don't want to optimize every function to get the promised speed :(
If your app is doing most of the work it needs to do in 1ms, but one path takes 200ms, then clearly you only need to optimize things on the slow path. You don't have to optimize everything to get a huge perf improvement.
For me, the showstopper regarding WebAssembly is that browsers do not support a textual version that I can just throw in where I want to hand optimize a function.
If I could just replace my slowest Javascript function with handcrafted WebAssembly code, that would be great.
But having to dabble with external compilers and splitting my code into multiple files is too much of a burden.
Shouldn't be too hard to make a library that would allow this. Would you be interested in that? So, say something like the following example: would you use it?
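Something like this, purely as a hypothetical sketch of what the API could look like; `watToBinary` is an invented stand-in for whatever text-to-wasm compiler the library would have to ship and load, and loading that compiler is where the slow startup would come from:

  // Hypothetical helper: compile WebAssembly text format at runtime and
  // return the module's exports. Not an existing package or browser API.
  declare function watToBinary(watSource: string): Uint8Array;

  async function inlineWasm(watSource: string): Promise<WebAssembly.Exports> {
    const bytes = watToBinary(watSource); // the slow part: text -> binary
    const { instance } = await WebAssembly.instantiate(bytes, {});
    return instance.exports;
  }

  // Replace one hot JS function with hand-written WebAssembly text:
  const { add } = (await inlineWasm(`
    (module
      (func (export "add") (param i32 i32) (result i32)
        local.get 0
        local.get 1
        i32.add))
  `)) as { add: (a: number, b: number) => number };

  console.log(add(2, 3)); // 5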
Obviously this would take more than a bit of time to start up (seconds), but the idea is of course that you don't do this once you deploy to production; there you'd replace it with precompiled inline WebAssembly.
You can pretty easily just ship a 1kb .wasm module and load it and export a function from it to call from JS. Of course, then all your data needs to live in wasm-accessible memory, and you can't use strings or objects anymore...
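A minimal sketch of that approach (the file name and export names are placeholders; it assumes the module exports its memory plus a simple allocator, and that the allocator returns 8-byte-aligned pointers):

  const { instance } = await WebAssembly.instantiateStreaming(fetch("kernel.wasm"));
  const { memory, alloc, sum } = instance.exports as {
    memory: WebAssembly.Memory;
    alloc: (byteLength: number) => number;
    sum: (ptr: number, length: number) => number;
  };

  const input = new Float64Array([1, 2, 3, 4]);

  // The data has to live in wasm-accessible memory: allocate inside the
  // module, then copy the JS typed array into its linear memory.
  const ptr = alloc(input.byteLength);
  new Float64Array(memory.buffer, ptr, input.length).set(input);

  console.log(sum(ptr, input.length)); // 10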
Looking at the C++ code, it seems like you could use std::push_heap/pop_heap to implement your binary heap. The code would be simpler, and there is a chance it could be faster since a lot of the standard library algorithms are very heavily optimized.
In my work, I have come to the conclusion that it seldom pays off to go "native" when working with Node.js. More often than not, rewriting some computationally heavy code in C and sticking it in as a native module yielded only marginally better results compared with properly optimized JS code. Though, that doesn't negate other advantages of using said technologies: predictable performance from the start and re-using existing code base.
Totally unrelated to the content (which was really great), I found it interesting that he shared his benchmarking setup as a _private_ gist (https://gist.github.com/surma/40e632f57a1aec4439be6fa7db95bc...) which is actually more like an opaque repository with multiple files.
It has forks, revisions, probably some tooling built around (git -> gist) but it's not indexed and can only be found by finding the link somewhere (in most cases).
Is this a more widespread recent pattern? I'm wondering what the desired outcome is, compared to just using a public repo.
JavaScript is fast. The browser is fast. But communication between the browser and JavaScript is really, really slow.
They are written in languages with incompatible memory models, so lots of data must be copied when communicating. They are running in different runtimes, so your JavaScript JIT cannot inline function calls into the DOM.
That's why, to this day, if you want to render a bunch of HTML from JavaScript, it is faster to generate a giant string of markup, pass it to the browser in a single 'innerHTML = "foo"' assignment and let the browser parse all of it, than it is to make a bunch of "createElement(); setAttribute(); appendChild();" calls.
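I.e. something like the difference between these two (a toy sketch; how big the gap is in practice depends on the browser and workload):

  const items = Array.from({ length: 1000 }, (_, i) => `item ${i}`);
  const container = document.getElementById("list")!;

  // 1) Build one big markup string and cross the JS/DOM boundary once.
  container.innerHTML = items.map((text) => `<li>${text}</li>`).join("");

  // 2) Many small DOM calls, crossing the boundary on every single one.
  for (const text of items) {
    const li = document.createElement("li");
    li.textContent = text; // setAttribute(...) would be similar
    container.appendChild(li);
  }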
WASM is theoretically better for CPU-intensive workloads. As the article states, even for CPU-intensive workloads there are still quite a few limitations to take into account. What it is good for ATM is IMHO mostly just reusing existing C, Rust, whatever (choose your WASM-supported language here) code. In practice, most web applications are not slow because of CPU bottlenecks but because of too much communication, large code size, etc. WASM in its current state does not seem to have a good answer yet for the code size issue.
That was a really informative read. I think I, like many others, figured the biggest issue with WASM right now was purely the inconvenient development flow, and that if you were willing to put up with it you'd just automatically get better performance. But there seems to be much more to it than that.
I hope WASM can continue to grow in both of those areas, because I still like the idea, but it's clearly still an immature technology.
That’s the whole point. V8 is _really_ good at taking any form of JS code and making it fast, without me having to apply optimizations. The other languages only started being competitive once I hand-optimized them.
A bit off topic, but what is the status of the hypothetical standard that would allow DOM/Web API calls to be made natively from WASM, without JS wrappers? Because AFAIK it is one of the biggest blockers for WASM application (not compute-heavy library) performance.
Is there a good comparison between different WASM languages / compilers? I imagine that, since we are in the early days, performance between compilers could vary significantly. Compare that to V8, which has had thousands of man-hours from high-level engineers.
JavaScript performance is completely vendor specific. There's no reference implementation for a JavaScript VM.
So, it's more useful to talk about, let's say, v8 performance. Much of v8 performance is also version specific, and not all of it is documented.
Code running on v8 can be blazing fast, or it can be slow. It depends on whether the code can be "optimized" (JIT compiled) and kept that way (because code can also be "deoptimized", meaning, the jitted version gets thrown away).
With JavaScript, if you want to ensure your code always runs fast as it possibly can, you have to become acquainted with the rules behind optimization and deoptimization, and start tracing optimizations and deoptimizations to make sure that your code gets optimized, and stays optimized. This process can be time consuming and can make your JavaScript look non-idiomatic and less readable.
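For example, with V8 you can watch that process happen (a toy sketch; run the compiled JS under Node with the V8 flags --trace-opt and --trace-deopt, whose output format varies between V8 versions):

  // add.ts -- compile, then run: node --trace-opt --trace-deopt add.js
  function add(a: any, b: any) {
    return a + b;
  }

  // Warm up with numbers: after enough calls, V8 optimizes `add` under the
  // assumption that both arguments are small integers.
  for (let i = 0; i < 100_000; i++) add(i, i + 1);

  // Break that assumption: the optimized code is thrown away (a "deopt")
  // and the function falls back to slower, generic code.
  add("foo", "bar");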
On the other hand, WebAssembly performance is easier to reason about with respect to what's described above.
Maybe I'm a nitpicker by default, or maybe it was so trivial that the article's author didn't think of including it in the methodology, BUT before you test the speed of your port (regardless of which language you port from and to), you should test that the end results are exactly the same.
I see no mention of him testing that the results of the original JavaScript blur algorithm and the ported one are the same. And believe me, image manipulation can bite you in the proverbial rear at the edge cases, I've been there. What I want to see is testing included in the article too, not just benchmarking his own solution. Try testing at least the classic 256 cases, that is RGB(x,x,x) (examples: RGB(0,0,0) black ... RGB(127,127,127) gray ... RGB(255,255,255) white), then a few thousand random images. Only after those tests can you safely move on to benchmarking for speed.
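A rough sketch of what such a check could look like (blurJs and blurWasm are placeholders for the original and the ported implementation, both taking and returning RGBA pixel data):

  declare function blurJs(pixels: Uint8ClampedArray, width: number, height: number): Uint8ClampedArray;
  declare function blurWasm(pixels: Uint8ClampedArray, width: number, height: number): Uint8ClampedArray;

  // Build a solid grayscale test image: R = G = B = value, alpha = 255.
  function solidImage(width: number, height: number, value: number): Uint8ClampedArray {
    const px = new Uint8ClampedArray(width * height * 4);
    for (let i = 0; i < px.length; i += 4) {
      px[i] = px[i + 1] = px[i + 2] = value;
      px[i + 3] = 255;
    }
    return px;
  }

  // The "classic 256 cases": every grayscale value from black to white.
  for (let v = 0; v < 256; v++) {
    const img = solidImage(64, 64, v);
    const a = blurJs(img.slice(), 64, 64);
    const b = blurWasm(img.slice(), 64, 64);
    if (!a.every((byte, i) => byte === b[i])) {
      throw new Error(`Mismatch for RGB(${v},${v},${v})`);
    }
  }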
Try making realtime audio and video filters in JS vs WASM. There are some domains where there is a very real difference, and it is enough to put things in the realm of the viable.
I.e. most WASM proposals must be accepted before we can use the DOM from WASM... at the pace it's been moving forward, I guess this will take at least several years (3-4+).
Webassembly can access anything you give it access to, and only that.
That’s great from a trust perspective because it means you can use a binary blob with confidence that it can’t do anything it isn't explicitly allowed to.
There’s no reason to specify access to the DOM for webassembly since you can grant that access from JS.
The hard bit is making it fast; ideally you could call between WASM and browser code with zero trampolines, but going via JS means you need two.
> There’s no reason to specify access to the DOM for webassembly since you can grant that access from JS.
There are reasons. If I could write web apps in a different language without having to use any JS, I would. It would be wonderful to be able to pick whatever language you want, compile it, and then deploy it. Having a JS bridge just seems like a clunky workaround that you have to live with.
> That’s great from a trust perspective because it means you can use a binary blob with confidence that it can’t do anything it isn't explicitly allowed to.
JavaScript is already sandboxed and can access only what you allow it to. Why have a sandbox within a sandbox?
You can bridge through JavaScript. It's not a big deal in practice. WASM is still very immature. You do not want to build your whole app in WASM if you want to keep a full head of hair.
It does limit a lot of use-cases from being viable in WASM.
Anything that needs to do a lot of DOM access will probably see a big performance hit if you rewrite it in WASM because there will be too much overhead from crossing the JS-WASM border.
It definitely won't be able to break the WASM sandbox, so my guess is that AssemblyScript itself adds runtime checks of its own and that this is a hint that we don't need those.
Imagine an HN-like page is written in this and uses unchecked in some code path for displaying user comments.
I comment something that exploits that code. Then, when you come along and view my comment, whatever my exploit does runs with your permissions instead of mine.
No, the exploit would be in the comment text. You're exploiting a bug in the comment display code when it displays your comment to another user to do something that the comment display code isn't supposed to do.
Much like exploiting a C program that handles untrusted text and doesn't bounds check it. You aren't supposed to be able to run any code at all, but a vulnerability lets you make the program do something it isn't supposed to.
Some of the easiest exploits would be prevented since you probably can't overwrite code like you can (or used to be able to, a lot of platforms have added protection against this) in C, but some exploits are still possible just by overwriting other variables with values that the program doesn't expect.
No, that's never allowed - that's the strength of Wasm. Any unchecked helpers are for language-level semantic blocks within Wasm memory itself, not for leaving the sandbox. So worst case you might overwrite and corrupt your own data.
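For what it's worth, here is a tiny AssemblyScript-flavoured sketch of that distinction (AssemblyScript's `unchecked` builtin skips its own bounds check; treat the exact abort behaviour of the checked variant as illustrative):

  // AssemblyScript (TS-like syntax), compiled to WASM with asc.
  export function get(arr: Int32Array, i: i32): i32 {
    return arr[i]; // bounds-checked: an out-of-range i aborts the module
  }

  export function getUnchecked(arr: Int32Array, i: i32): i32 {
    // No bounds check: faster, but a bad i reads whatever happens to sit at
    // that spot in the module's own linear memory. It still can't reach
    // outside the WASM sandbox or touch the host.
    return unchecked(arr[i]);
  }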
WASM lacks garbage collection and is statically typed. You'd have to write a whole Python interpreter, so I'd expect the end result would be slower than the official Python interpreter.
I imagine it would make more sense to compile Python to JavaScript and leverage the optimising JavaScript JIT engines, but no doubt an efficient transpiler would be a significant undertaking. The Transcrypt project [0] does something like this but I don't think it emphasises performance.
Someone has to come in and make it supported. Adopting nlvm isn't going to magically make it easier.
The reason it's easier in Rust is because someone who's passionate about WASM came along and did the work up front to document everything and make it as simple as possible to compile to WASM. The same is certainly possible in Nim; it just needs someone to put in some work :)
I really do like Nim, but suggesting it does WebAssembly out of the box could be misleading and do no good. The project is currently in the tough spot where it's hard to convince your team to use it, and because it's hard, there is less contribution.
As for LLVM, you might be right; I'm not an expert in that particular field. But it seems there is a lot of already baked-in stuff you'd get for free, like Wasm, debugging, and optimization. We'll see how the Crystal project manages to get Wasm and how LLVM will help them.
Seems to me that the flaw with WebAssembly is that it tries to fix the symptom, not the cause. As such, it's doomed to failure.
Just stop festooning a site with pop-ups, trackers, geolocation, subscriptions and all the countless other crap that's increasingly being shoved into web pages. The problem will go away by itself.
> Decide for yourself, whether this article is worth your time.
Did you even read the article? It goes into a deep dive on AssemblyScript and WASM including discussions on Rust’s std::vec and Go’s slices, bump allocators, TurboFan/Sparkplug/Ignition/Liftoff benchmarks, and -O3 versus -O3s flags and contains links to a pull request to try to help with one of the noticed performance issues.
This is a well-written dive into the technology, writing it off because you don't like the definition of "strongly typed" in this sentence is a bit premature.
WebAssembly IS strongly typed, and that property DOES help it to generate machine code right away. Heck, even its if statements and loops have types associated with them, and those types can help compilation from a stack machine to a register machine. The article even mentions a concrete reason later down, that deopts because a type passed to a function changed can't happen in webassembly like it can in JIT compiled JS.
And indeed, this article does seem to be worth my time.
I think the author just meant their scripting language is strongly typed, but they say wasm because that’s the only target for that language.
I wasn’t confused by that statement when reading the article. Obviously wasm isn’t strongly typed but obviously that’s not literally what the author meant.
The article was interesting and they clearly spent a substantial amount of time creating it.
Is this maybe something to do with the academic definition of strongly typed?
Because the way I understand WebAssembly, it surely is strongly typed.
edit:
"Strongly typed is a concept used to refer to a programming language that enforces strict restrictions on intermixing of values with differing data types."
Either you don't understand type systems, or you don't understand WebAssembly. In either case, I can encourage reading the article, because it sheds some light on both topics!