> As described above, it is important to “warm-up” JavaScript when benchmarking, giving V8 a chance to optimize it. If you don’t do that, you may very well end up measuring a mixture of the performance characteristics of interpreted JS and optimized machine code.
Since unwarmed first execution is a very common use case on the web, and a very intentional improvement for WASM, it seems foolish to discard that when comparing. I understand for benchmark determinism warming is important in long-running systems, but when parsing/warming is a common part of every run, it deserves to be factored in. I can make WASM look better by removing intentional benefits from JS before comparing too.
You even included it in the quote: the reason it's important to warm up the code is so you don't measure a mixture of performance characteristics between interpreted JS and compiled JS. How long the warmup takes is _incredibly_ device dependent, so instead I measured Ignition and SparkPlug independently so you get a feel for the speedup. I did not discard that at all in the comparison.
My mistake if you are actually including the warm up times in the benchmarks, I misunderstood. Arguably, single-execution-from-scratch JS vs WASM benchmarks could have even more value than repeated, warmed-up benchmarks as the former simulates a common web use case. IMO you _should_ very well end up measuring a mixture of the performance characteristics of interpreted JS and optimized machine code if that represents common use.
* You present TurboFan performance as the speed of JavaScript. e.g. your tables present JavaScript with Ignition as ~40x slower than JavaScript in certain test runs.
It's a fantastic analysis, and I am a bit surprised by some of the results. Thanks for doing the work and putting together a great resource.
kodablah's point is a valid one generally. A parallel post notes that a developer can't avoid the warm-up time in JS, but they can avoid it by using WASM.
As an aside, does v8 cache the optimized native code to any degree? I know there is code caching in the major browsers to presumably avoid reparsing Javascript, but if I had a theoretical page with say an image blur function, would each visit/load go through the same analysis/optimization process, going from slow to fast?
I imagine such caching is slowly being phased out because it can be used to create 'super-cookies'. That is, you can fingerprint a user by detecting whether certain bits of javascript are or aren't cached. (Detection of being cached is just a matter of measuring execution time).
It _is_ cached: both the wasm binary itself and the optimized version, to improve startup times. The cache, however, is per origin, so no other origin can make use of it, which prevents the fingerprinting aspect.
As a programmer, there's a lot you can do to make your JS run faster under optimization (e.g. avoid deopt) but there's little you can do about the warm up (besides reducing binary size).
So, when you're trying to progressively improve performance of specific code (e.g. boost FPS) the warm up time is better ignored; it's not under your control.
Reducing size or switching to WASM are both things that you can do.
You should measure the cases you actually care about. For a long running app, start-up time is probably not the most important thing; for other apps it's very important.
If I'm trying to improve boot time of my node app, then I must benchmark the cold behavior (the interpreter). If I'm judging template engine performance, I probably want to do that with a hot VM, because that's going to mostly be steady state performance I'm looking at. The major exception to that may be if I have an exceedingly aggressive caching strategy, where most pages are generated shortly after a deployment (evict all the old pages with whatever the new code generates as output).
If you include "warmup" time in JS, then you must also include compile time with WASM (EDIT: to be clear, time to pre-compile wasm from generic bytecode into binary optimized for the particular architecture -- NOT time to compile whatever language into WASM in the first place).
If you're running a given piece of code only once, the interpreted code is almost guaranteed to be MUCH faster than compiling and then executing a large pile of code.
That makes little sense (assuming you mean by "compilation" the compilation to WASM from a higher-level language). The entire point is (or rather should be) to measure runtime impact on the user. If compilation (e.g. parsing, JIT, etc) is part of runtime, you measure that. I wouldn't expect a JS benchmark to measure optimizer/minifier time either.
WASM code is byte code that (after being downloaded) is pre-compiled into (presumed) safe native code to be executed. It is designed to be cross-platform and as such, isn't particularly close to any given architecture. This means that there is still room (and necessity) to optimize for a particular architecture when compiling.
Let's say you have some trivial task. You write and compile to wasm. You also write in JS. It'll be a couple kilobytes for JS and probably a couple hundred kilobytes for wasm with all your dependencies to get a usable environment.
Once downloaded, the JS and wasm bytecode must now be parsed. The JS interpreter starts parsing and running. Meanwhile the wasm code is pre-compiling into native code so it can start execution.
The tiny bit of JS takes 10ms to run in the interpreter. The larger WASM file takes 20ms to parse and 0.1ms to run. Which was faster?
That depends on how long the software runs. WASM only makes sense at the inflection point where the execution lasts long enough to counteract the parse time. This shouldn't come as a surprise. Compile lag is one reason why interpreted languages saw real-world use in the first place.
It's telling that v8 used to NOT have an interpreter, but then added one. Everything was first compiled, then run, which resulted in slower startup and slower overall performance. Now, the browser starts interpreting right away while also compiling hot code to machine code in the background, before switching execution over. Functions don't actually need a couple hundred runs in order for the optimizer to know what types they receive. Most functions could be optimized with a very high degree of success after only a couple executions. This isn't done because the time cost of optimizing would outweigh the benefit on code that isn't executed frequently.
> Once downloaded, the JS and wasm bytecode must now be parsed. The JS interpreter starts parsing and running. Meanwhile the wasm code is pre-compiling into native code so it can start execution.
I thought wasm's format was specifically designed so that parsing and compiling could be performed in a streaming fashion, so that you don't have to wait for the download to finish.
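For what it's worth, the streaming entry points do exist in browsers today. A minimal sketch (the import object and the export name are placeholders, and this assumes you're inside an async function or a module with top-level await):

// Compile and instantiate while the response body is still downloading,
// instead of waiting for the full ArrayBuffer first.
const { instance } = await WebAssembly.instantiateStreaming(
  fetch("module.wasm"),
  {} // whatever imports the module expects
);
instance.exports.run(); // hypothetical export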
JS can already be parsed as it is streamed (I think this has been the case since around Chrome v40 and earlier for Firefox). The JS binary AST proposal could make that even more efficient.
My example assumed you were running the code locally. If it's coming over the wire, larger WASM binaries will suffer an additional penalty over the network because download speed is much slower than parse speed.
Still, you can achieve significant speed-ups with WebAssembly in some use cases.
For example, I have a hash function library (https://github.com/Daninet/hash-wasm) where I was able to achieve a 14x speedup on SHA-1 and a 5x speedup on MD5 compared to the best JS implementations.
That's exactly the kind of thing I think WASM is good at - small, computationally expensive libraries that are easy to just plug in.
I'm more of a web developer, and every time I think "hmm, could I use this to build a webapp?", I quickly shrug it off because it would create a big headache and the JS execution is rarely the bottleneck (and if it is, it's more likely developer error and inefficiencies than the language / interpreter).
It's very similar to the Python/C distinction. Python will often drop into C for the use-cases you're describing. However, unlike WASM, Python/C is the wild west:
- The whole CPython interpreter is the "C-extension interface" which means that the CPython interpreter can hardly change or be optimized or else it will break something in the ecosystem (and for the same compatibility reason it's virtually impossible for alternative optimized interpreters to make headway), and because the interpreter is so poorly optimized the ecosystem depends on C extensions for performance. WASM presumably won't have this distinction.
- Without the abysmal build ecosystem that C and C++ projects tend to bring with them, building and deploying WASM applications will likely be pleasant and easy after a few years. Of course, if your WASM is generated from C/C++ then that's a real bummer, but fortunately this should be a much smaller fraction of the ecosystem than it is with C/Python.
Network roundtrips are unavoidable, but WASM could be used to parse a server response and generate custom HTML to use in replacing some portion of the DOM. It would likely be a lot faster than trying to do the same in pure JS, and it would obviate the use of over-complicated hacks like virtual DOM and the like.
No, parsing the response is usually way too fast to make a difference. Generating an HTML string is also usually pretty fast. The slowness happens when you ask the browser to parse that HTML string and generate the appropriate DOM, WASM is not going to get you out of that.
> The slowness happens when you ask the browser to parse that HTML string and generate the appropriate DOM
If you do it right, that step only has to happen once for each user interaction. You can entirely dispense with the need to do multiple edits to the DOM via pure JS.
It’s not “at the moment” but “continuously from the creation of the virtual DOM concept” - often slower by multiple orders of magnitude.
The misrepresentation of a virtual DOM as a performance improvement came from two things: people who were comparing virtual DOM code to sloppy unoptimized code which was regenerating the DOM on every change and React fans not wanting to believe their new favorite was a regression in any way (not to be confused with the actual React team who certainly knew how to do real benchmarks and were quite open about limitations).
There’s a line of argument that the extra overhead is worth it if the average developer writes more efficient code than they did with other approaches but I think that’s leaving a lot of room for alternatives which don’t have that much inefficiency baked into the design.
I think there’s a bit more nuance to it. React (and other vdom implementations) try to be as efficient as possible when diffing / reconciling with the DOM. Sometimes this can result in improved performance, but there are also use cases where you’ll want to provide it with hints (keys, when to be lazy, etc.). https://reactjs.org/docs/reconciliation.html
Above all I would pragmatically argue (subjectively) that the main advantage is enabling a more functional style of programs w/ terrific state management (like Elm). This can lead to fewer errors, easier debugging, and often better performance with less effort.
> I think there’s a bit more nuance to it. React (and other vdom implementations) try to be as efficient as possible when diffing / reconciling with the DOM. Sometimes this can result in improved performance, but there are also use cases where you’ll want to provide it with hints (keys, when to be lazy, etc.). https://reactjs.org/docs/reconciliation.html
The key part is remembering that every one of those techniques can be done in normal DOM as well. This is just rediscovering Amdahl's law: there is no way for <virtual DOM> + <real DOM> to be cheaper than <real DOM> alone in the general case. React has improved since the time I found a five-order-of-magnitude performance disadvantage (yes, after using keys), but the virtual DOM will always add a substantial amount of overhead to run all of that extra code, and the memory footprint is similarly non-trivial.
The better argument to make is your last one, namely that React improves your average code quality and makes it easier for you to focus on the algorithmic improvements which are probably more significant in many applications and could be harder depending on the style. For example, maybe on a large application you found that you were thrashing the DOM because different components were triggering update/measure/update/measure cycles forcing recalculation and switching to React was easier than using fastdom-style techniques to avoid that. Or simply that while it's easy to beat React's performance you found that your team saw enough additional bugs managing things like DOM references that the developer productivity was worth a modest performance impact. Those are all reasonable conclusions but it's important not to forget that there is a tradeoff being made and periodically assess whether you still agree with it.
I agree. I am curious though about how substantial the memory and diffing costs are. I don’t mean that in an "I doubt it’s a big deal" way; rather, I’m genuinely curious and haven’t been able to find any literature on the actual overhead compared to straight-up DOM manipulation. I would imagine batching updates to be an advantage of the vdom, but only if it’s still that much lighter weight (seeing as you can ignore a ton of stuff from the DOM).
> I would imagine batching updates to be an advantage of the vdom but only if it’s still that much lighter weight (seeing as you can ignore a ton of stuff from the DOM).
There are two separate issues here: one is how well you can avoid updating things which didn't change — for example, at one point I had a big table showing progress for a number of asynchronous operations (hashing + chunked uploads) and the approach I used was saving the appropriate td element in scope so the JavaScript was just doing elem.innerText = x, which is faster than anything which involves regenerating the DOM or updating any other property which the update didn't affect.
The other is how well you can order updates — the DOM doesn't have a batch update concept but what is really critical is not interleaving updates with DOM calls which require it to calculate the layout (e.g. measuring the width or height of an element which depends on what you just updated). You don't necessarily need to batch the updates together logically as long as those reads happen after the updates are completed. A virtual DOM can make that easy but there are other options for queuing them and perhaps doing something like tossing updates into a queue which something like requestAnimationFrame triggers.
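A minimal sketch of both ideas, with hypothetical names (a cached cell reference plus a write queue flushed in requestAnimationFrame, reads only after the writes):

// Keep a direct reference to the node you'll update instead of re-rendering.
const cell = document.querySelector("#progress td.percent"); // hypothetical markup

// Queue DOM writes and flush them together in one frame; do reads that force
// layout (offsetHeight, getBoundingClientRect, ...) only after all writes.
const writes = [];
function queueWrite(fn) {
  writes.push(fn);
  if (writes.length === 1) {
    requestAnimationFrame(() => {
      writes.forEach(w => w());    // all writes first...
      writes.length = 0;
      const h = cell.offsetHeight; // ...then any measurements
      console.log(h);
    });
  }
}

queueWrite(() => { cell.innerText = "42%"; });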
So you could probably describe the vdom as a smart queue. How smart it is depends on the diffing and how it pushes those changes, abstracting this from the developer. It's bound to be less efficient than an expert (like an expert writing assembly vs C), but just like any other abstraction it has both pros and cons.
The question is whether the abstraction is worth the potential savings in complexity (which maybe is not the case, but I sure do love coding in Elm).
Also whether there are other abstractions which might help you work in a way which has different performance characteristics. For example, I've used re:dom (https://redom.js.org/) on projects in the past, LitElement/lit-html are fairly visible, and I know there are at least a couple JSX-without-vdom libraries as well.
There isn't a right answer here: it's always going to be a balance of the kind of work you do, the size and comfort zones of your team, and your user community.
Very interesting, thanks for pointing out re:dom. I took a look at their benchmarks and some vdom implementations compare very well to re:dom. I was pleased to see Elm’s performance. So it seems like it can be done well when you want it.
https://rawgit.com/krausest/js-framework-benchmark/master/we...
Forcing the browser to continually parse HTML and generate a new DOM tree, recalculate layout, etc. shouldn't be faster than updating the specific nodes that need changes.
Absolutely, it's been kind of incredible progress. But it's still going to be a bottleneck more often than JS execution (in my experience at least).
Not always; I have definitely run into applications where parsing large amounts of data in code is a bottleneck, especially when building large charts. But often.
My general worry is that the performance gains from using some WASM will just get eaten up by the overhead of jumping between JS and WASM and having to copy/convert data. You might be able to reduce the problem by porting more stuff from the JS side to the WASM side, but then you risk pulling in huge chunks of your app.
JS/WASM calls are fast in V8, and still seem to be improved from time to time (e.g. see: https://v8.dev/blog/v8-release-90#webassembly). Not sure about any large-data optimizations (TBH I'm not sure what this is about, though; usually one would use JS slices into the WASM heap to avoid redundant copying).
That works if the data is already in the Wasm linear memory and you need to access it from JS. If you have strings (or whatever) in JS, you need to copy them into the linear memory for the Wasm module to use.
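Roughly what that copy looks like; a sketch, where exports.alloc and exports.memory are hypothetical exports of the wasm module:

// Encode a JS string as UTF-8 and copy it into the module's linear memory,
// then hand (pointer, length) to the wasm side.
function passStringToWasm(str, exports) {
  const bytes = new TextEncoder().encode(str);
  const ptr = exports.alloc(bytes.length); // hypothetical allocator export
  new Uint8Array(exports.memory.buffer, ptr, bytes.length).set(bytes);
  return { ptr, len: bytes.length };
}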
And there might be other benefits besides performance. I'd like to use WASM to be able to reuse server side code in languages like Rust or Go in the client, so you don't have to re-implement algorithms and tricky processing code in javascript.
I experimented with this some weeks ago and it is certainly possible.
I had a PoC where my server runs Rust and exposes a JSON REST API, using serde to serialize my Rust structs to JSON. On the web client I compiled Rust to wasm and used the Reqwest crate (an HTTP client that uses Fetch in wasm) to talk to my server; Rust structs are shared between server and client.
For me, the beauty of Rust in this setup is that cross-compiling / cross-platform support is built into the tooling (Cargo). For example, the Reqwest crate compiles down to use the browser Fetch API when running in wasm, and the same crate on the server uses a native implementation using openssl (or rustls).
I did something similar making a game. The game logic runs server side, however in order to hide latency the clients also run a WASM copy locally. Then once the server processes their moves they check that everything was in sync and, if not, reload with the server state.
(In practice the validation is probably not necessary but doesn't hurt to have).
Yeah, WebAssembly has i64/u64 types as first-class citizens, unlike JavaScript, which has to emulate them or use BigInt, which is drastically slower than native 64-bit types. That's why crypto algorithms get a lot of speed benefit. AssemblyScript also shows this. See this:
There was a Project Zero article on HN recently [1] that said that bounds check elimination was removed from V8 because it allowed attackers to easily turn a type confusion into a memory read-write primitive:
> As a result, last year the V8 team issued a hardening patch designed to prevent attackers from abusing bounds check elimination. Instead of removing the checks, the compiler started marking them as “aborting”
But this post, which also appears to be written by someone from google with access to v8 developers, states that :
> You never go out of bounds. This means TurboFan does not need to emit bounds checks [...]
Does someone here know more about bounds check elimination in TurboFan? Are the checks removed in some cases but not others?
This makes bounds checks aborting in some parts of the pipeline, but it doesn't mean that later optimizations can't determine that a check is dead code and remove it. For example: https://doar-e.github.io/blog/2019/05/09/circumventing-chrom...
~lol, imagine not doubling the capacity of a dynamic array upon allocation.~
In all seriousness, this is a great read, and I was mildly surprised JS was about the same speed as Wasm once TurboFan kicks in. As a compiler engineer, it's nice to step back and appreciate the myriad of runtime-based optimizations that can be done with modern JIT compilers.
That being said, AssemblyScript doesn't perform any high-level optimizations during compilation, so I wonder how fast it will be once fully matured.
It’s a trade-off for simplicity. It’s a small team and they are still working towards feature completeness. Deferring optimization to Binaryen is the easy way out at the cost of not having high-level optimizations. If they finish their IR, that will most likely change.
I don't see this brought up very often, but you have a huge world of flexibility in choosing growth rates beyond just adding constants for an O(n) amortized append time or multiplying by a constant for O(1).
E.g., if you choose x -> x(1+1/log(x)) then you get an amortized append time of O(log(n)) while paying a memory overhead approaching 0% for large datasets.
The distribution of (and SLOs for) appends relative to other operations can make that kind of idea more or less attractive, but even common data structures have a lot of room for improvement if you can tailor them to your use case a little bit.
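A sketch of that growth rule (my own illustration, not a tuned implementation):

// Grow by a factor of (1 + 1/log(cap)) instead of a fixed constant: the
// relative over-allocation shrinks toward 0% as the array gets large.
function nextCapacity(cap) {
  if (cap < 8) return cap + 8; // sidestep log(x) <= 0 at tiny sizes
  return Math.ceil(cap * (1 + 1 / Math.log(cap)));
}

let cap = 8;
for (let i = 0; i < 5; i++) {
  cap = nextCapacity(cap);
  console.log(cap); // the effective growth factor approaches 1 as cap grows
}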
Wasn’t there some consideration that with 2x you could never reuse previous contiguous allocations but at 1.4 (ish?) it was an option and improved fragmentation in some cases?
Of course it depends on the behaviour and binning (or lack thereof) of your allocator.
> it can be mathematically proven that a growth factor of 2 is rigorously the worst possible because it never allows the vector to reuse any of its previously-allocated memory.
> [...]
> choosing 1.5 as the factor allows memory reuse after 4 reallocations; 1.45 allows memory reuse after 3 reallocations; and 1.3 allows reuse after only 2 reallocations
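A rough simulation of the quoted reasoning (assuming previously freed blocks can be coalesced and the still-live block can't be counted):

// After how many reallocations can the next request fit into the space
// already freed? With factor 2 the freed space never catches up.
function reallocsUntilReuse(factor, maxSteps = 64) {
  const sizes = [1];
  for (let n = 1; n <= maxSteps; n++) {
    const next = sizes[n - 1] * factor;
    const freed = sizes.slice(0, n - 1).reduce((a, b) => a + b, 0);
    if (freed >= next) return n - 1;
    sizes.push(next);
  }
  return Infinity;
}

console.log(reallocsUntilReuse(1.3));  // 2
console.log(reallocsUntilReuse(1.45)); // 3
console.log(reallocsUntilReuse(1.5));  // 4
console.log(reallocsUntilReuse(2));    // Infinity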
I'm not sure I understand the benefit of this. With a growth factor < 2, you have a chance of getting back chunks of memory that were previously used. That doesn't affect fragmentation / cache hits since all your data is always in the current chunk. What am I missing?
Depends on whether you are talking about tuning the growth factor for a single instance, or whether you are talking about tuning the default growth factor in general.
E.g. Python has put a lot of thought into its dynamic array (and dict) growth factors.
Honestly, the fact that AssemblyScript's Array implementation does not double the internal capacity but instead adds just one more slot when reallocating makes me worry about the quality of the language as a whole.
Doubling the allocation is a "hack" that is helpful when reallocations are common and thus is helpful for languages that extend arrays very often (typical of dynamic languages with GC) and where memory is cheap and plentiful.
One of the prime features of assembly language is that the person (compiler) that is generating it expects tight control over what it does. A 2*X allocation when you ask for X is unexpected.
Imagine if, when you went to the ATM and withdrew $100, the bank actually withdrew $200 from your account and held back the extra $100 so that, the next time you went to the bank and withdrew $20 it would take it out of the "held back" amount rather than doing another withdraw. I would be very unhappy with that algorithm.
ISO C++ places no such requirement on std::vector, each implementation is free to choose their own implementation provided it matches the O() notation requirements.
C++ is not like Rust where the implementation dictates the semantics.
In order for append-to-back to have O(1) amortised running time, the capacity needs to be multiplied by some constant >1. Any constant would do just fine in terms of complexity, but 2 is the obvious simple choice, being the first integer greater than 1.
If the capacity is only increased by some constant each time, rather than multiplied, this leads to O(n^2) running time for a sequence of n append-to-back operations, surely something to be avoided.
For `push` to extend capacity by just 1 is an absolutely insane default.
There is no sensible usage for a method that does that. It turns `for(let i = 0; i < n; i++) { arr.push(x); }` from linear into quadratic.
If automatic resizing exists, then it should do it in a sensible way. Otherwise it's just a footgun that you should leave out of the language like C does.
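To put rough numbers on that, here's a quick count of how many element copies n pushes cost under each policy (a sketch; it assumes every reallocation copies all existing elements):

// Count element copies for n pushes: capacity+1 growth vs capacity*2 growth.
function copiesFor(n, grow) {
  let cap = 0, len = 0, copies = 0;
  for (let i = 0; i < n; i++) {
    if (len === cap) { copies += len; cap = grow(cap); }
    len++;
  }
  return copies;
}

const n = 100000;
console.log(copiesFor(n, c => c + 1));              // ~n^2/2 copies, quadratic
console.log(copiesFor(n, c => Math.max(1, c * 2))); // < 2n copies, amortized O(1)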
>Abstract: Self’s debugging system provides complete source-level debugging (expected behavior) with globally optimized code. It shields the debugger from optimizations performed by the compiler by dynamically deoptimizing code on demand. Deoptimization only affects the procedure activations that are actively being debugged; all other code runs at full speed. Deoptimization requires the compiler to supply debugging information at discrete interrupt points; the compiler can still perform extensive optimizations between interrupt points without affecting debuggability. At the same time, the inability to interrupt between interrupt points is invisible to the user. Our debugging system also handles programming changes during debugging. Again, the system provides expected behavior: it is possible to change a running program and immediately observe the effects of the change. Dynamic deoptimization transforms old compiled code (which may contain inlined copies of the old version of the changed procedure) into new versions reflecting the current source-level state. To the best of our knowledge, Self is the first practical system providing full expected behavior with globally optimized code.
>Proceedings of the ACM SIGPLAN ‘92 Conference on Programming Language Design and Implementation, pp. 32-43, San Francisco, June, 1992.
What about performance for SpiderMonkey and JavaScriptCore? I’m a little disappointed that everything in the article is from a V8-only perspective. Is it too late to want a future where V8+Blink aren’t de facto?
WASM binary size is highly dependent on optimization level. If your WASM is anywhere near the size of your JS, something is off with your compiler settings. Since Rust and C++ are statically typed, compilers can use LTO to remove almost every unreachable instruction. JS, even with tree shaking, gets nowhere near this.
Default settings in Emscripten generate huge binaries. Rust without the "native" target or a bunch of configuration also does.
WASM bytecode is also more compact than JS. There's simply no reason the binaries should be even close to the same size.
The memory usage should also be different by an order of magnitude. WASM has basically no overhead above machine sizes for most types. Look at the Benchmarks Game: memory use of JS vs Rust. You should be seeing a difference close to that.
I can't speak for speed. But I can say I did a head-to-head comparison between the fastest pure JS PNG encoder I could find vs a C encoder transpiled to WASM, and the transpiled encoder was >10X faster.
It's hard to say if something is off in the benchmark or the compilation, but I find it hard to believe there's such a big difference between my tests and yours. Emscripten especially is not exactly easy to use and is maybe a good place to look for size and speed optimization.
> WASM bytecode is also more compact than JS. There's simply no reason the binaries should be even close to the same size.
The author does explain this pretty well. For an exact 1:1 comparison, yes, WASM beats JS for size. JS comes with built-in functionality (e.g. a garbage collector) that doesn't cost any size, but in the WASM case it needs to be brought along, taking up space. Even if you don't want GC, you don't get any WASM 'standard library'.
I really think WebAssembly missed the target. What we really needed was a language doing away with the dynamic nature of JavaScript, while adding 64 bit integers and generic SIMD instructions, and keeping high level features like strings and automatic memory management.
Instead we got the most bare-bone language imaginable, making everyone have to reinvent the wheel for everything. That is not actually fast. If WebAssembly had stuff like basic string manipulation, browsers could easily map that directly to efficient implementations. But with the current rules everything has to be provided as basic instructions that must be compiled under much stricter security rules.
WebAssembly is not assembly, it is an intermediate representation shared between two compilers, and it is actually pretty bad at that.
Eh, maybe that's what you need, but not what we need ;)
WASM is an exceptionally good standard compared to most other things on the web platform.
The thing about WASM performance is that WASM is fast (unless you do stupid things), but what's surprising to most people is that Javascript isn't slow either (at an absurdly high engineering and complexity cost compared to WASM though).
What is the point of WASM if JavaScript is just as fast? After reading the article I'm not left with the impression that getting good performance out of WASM is particularly easy either. WASM has most of the performance footguns from C, requires including a lot of details and gunk, like memory layout and an allocator, that the browser compiler would probably be better off doing on its own.
WASM is a much better compilation target than Javascript (and at least as important: it frees Javascript from being a compilation target, instead JS can focus on being a programming language written by humans again).
I'd argue that the main point of WASM is not the performance gain, but that it opens up a fairly straightforward path to use different languages on the web (e.g. it was possible to upstream a WASM backend into LLVM, but if the Emscripten team would have tried to upstream an asm.js backend into LLVM, they'd be laughed out of the room I'm sure - and asm.js also wasn't fast without special handling by the Javascript engine either).
Also don't forget that the above blog post is mostly about "WASM isn't as fast as it should be when using AssemblyScript", which is more of a problem to solve for AssemblyScript than WASM, because when used from C it's fairly easy to get "near-native" performance.
PS: all the disadvantages you're listing (like the linear memory layout) are actually massive advantages (for instance when trying to optimize cache misses) ;)
WASM can be faster than JS but you need a language that doesn't shoehorn a GC into the compiled binary. I'm not really sure what the author expected here. Modern managed runtimes usually give you the benefit of bump allocation in the nursery for free with a generational GC and the runtime has a lot more room to optimize the GC phase. None of this is possible without a native GC for webassembly.
This isn't WebAssembly being slow; the benchmarks just show the overhead of the GC. If you are writing a computationally expensive algorithm in Rust or C++, wasm can be a lot faster, but it's hard to get close to native performance (i.e. running on bare metal x86/arm).
Our webassembly prototype is about 3-4x faster in raw computation (and that is targeting webassembly exclusively) but has a much higher overhead when interacting with the DOM. That is basically the limiting factor, especially on mobile.
> I want to be very clear: Any generalized, quantitative take-away from this article would be ill-advised.
I don't think WASM and JS are "just as fast"; the author only did a couple of microbenchmarks. There are almost certainly many cases in which WASM would outperform JS, but they probably aren't going to be tight loops over an array or similar.
One nice thing about wasm is you can (sometimes) bring apps to the web very quickly. I got an emulator written in C fully ported to wasm in about an hour. To be fair this particular app hit all of the sweet spots of Emscripten, and you won't get that lucky with most apps.
Part of the reason it's bare bones is because they started with an MVP, and are continuing to add features through a community/standards process. Garbage Collection, reference types, and Interface Types are popular proposals being worked on that I think would address some of your issues. More here: https://github.com/WebAssembly/proposals
I mean, it's a virtual machine like the JVM. It's just that the code is already generally run through an optimizer, while Java bytecode typically isn't.
Yeah, that is a way to look at it. My point isn't what we call it exactly, my point is that it is a hack job that is fairly mediocre at doing what it is supposed to do.
The compiler in the browser is going to run its own optimization anyway, preoptimizing the input isn't necessarily going to help.
Having read the spec and proposals recently when looking at web assembly as a cross-platform bytecode, I have to disagree, it seems very well designed to me. They started with an MVP, and are continuously working on and adding proposals to extend it with more features, including some of the ones you want, I think. Why do you think it's a hack job?
Additionally, I think optimized webassembly should normally be a big benefit, helping both startup time and optimizations the engine might miss (also helping the engine focus on other optimizations / making simpler engines performant).
edit: Indeed, optimization before webassembly makes a big difference in the article's benchmarks, as you can see with how C++/Rust was faster than the hand-crafted AssemblyScript, theoretically because C++/Rust is going through LLVM optimizations.
I work on an application that could benefit from web assembly but the biggest hurdle for me is that it's kind of complex.
The web used to be quite simple but now it sometimes feels like you have to have a very large team in order to do anything. Sure, I could probably do web assembly but that would take time from other things that are also important.
I get that it is very useful for larger teams making larger applications like Figma. But for small teams it feels like the tooling isn't there yet to do anything useful in the timespan that I need to.
That being said, AssemblyScript seems very interesting and I probably should look into it.
It's not that complex, anything in computers can seem complex if you're not familiar with it. I would suggest that you're just not familiar with it, and that's not a knock against you, it's just something that you need to study like anything else. It's not true that you need a large team to do anything on the web these days, the same technology works today that has worked for the last two decades, but if you want to leverage the benefits of new technology you need to invest the time to understand how it works, that's just the nature of technological advancement.
Parent mentioned the lack of tooling, and that is indeed where most of the cost of using WebAssembly lies.
You don’t really need to learn the assembly language itself since you’ll probably just be calling emcc.
However, you may need to build code to marshal more than just ints and strings to the JS code. Even after you do, you’ll run into the classical issues of keeping track of object references across a GC & non-GC system.
You may need debugging and find that in-browser debuggers for WASM are primitive/non-existent. You may need to figure out how to unmangle stack traces, including mixed JS/WASM traces. Third-party tools like Sentry for error reporting may not have built-in support (they sort of recently have, and it's very under-documented).
All solvable problems, but it’s a lot of time spent not building the product. There are plenty of good use cases, but it’s usually not the ones based on the false premise that native is somehow always better than interpreted.
It looks complex until you sit down and do it and force yourself to understand what you are doing. Most things are like that.
The way I structure learning things like this in my team is by organizing spikes. We commit to diving into some technology with the goal of finding out how feasible it is for us to use it and a secondary goal of maybe getting something useful going. However the primary goal is finding out if it can work and if so exactly how. If it works out it becomes a regular thing we work on and integrate. This usually starts with studying what is there, what the risks and benefits are, etc.
At some point you reach the point where the only way to learn more is simply doing it. You can analyze something to death without fully understanding it. Just sitting down and doing it becomes the logical next step. The payoff is usually non linear: you gain more if it works than you lose if it doesn't. This is one of those things that you might suspect is valuable like that. So your job is finding that out in an efficient way.
In your team, I would task one or two of your people with spending a max of 2 days to validate that they can take a simple bit of typescript, convert it to assembly script, compile it and hook it up. Chances are pretty good that they'll have working code at the end of those two days. Worst case you lose two days. Best case you figure out it's easy and just works and you move forward.
Assuming that WebAssembly is for performance is an invalid assumption. The reason for WebAssembly is to provide a runtime environment for AOT compiled languages such as C++, Rust. The performance of programs written on those languages and compiled to the Webasm target may or may not exceed the performance of a program of the same functionality written in JS and executing in the same V8. There's no reason to expect performance to be radically different just because it's Wasm.
The headline is a bit misleading, as the main article is about AssemblyScript, a TypeScript subset compiling to WebAssembly. So it seems quite a few of the mentioned problems come from the immaturity (or design problems?) of AssemblyScript and not necessarily from wasm.
That seems to mean every bit of your title is nonsense. You aren't benchmarking webasm at its best and 'magic pixie dust' is already nonsense on its own.
As anecdata, I've also found that using typed arrays does not speed up (it actually slows down, by 10% or so) code which uses integer arrays. Making sure that arrays stay in packed-small-integer format (which is not always obvious: for example one has to replace x => -x with x => 0-x to keep the old IEEE -0 from kicking in) consistently outperforms typed arrays for me, by a large margin if allocating many small arrays, and by a small margin if allocating one large array.
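A tiny illustration of the -0 point (my own sketch):

// -0 can't be stored as a small integer, so mapping with x => -x can force
// the whole array into double storage as soon as a 0 shows up.
const a = [0, 1, 2].map(x => -x);    // a[0] is -0, elements become doubles
const b = [0, 1, 2].map(x => 0 - x); // 0 - 0 is +0, stays a small integer
console.log(Object.is(a[0], -0), Object.is(b[0], -0)); // true false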
I found similarly "meh" performance improvements when trying to port my Javascript code to webassembly: either modern Javascript implementations are absolutely amazing, or webassembly runtimes still have a ways to go.
This is most likely because allocating typed arrays is really, really slow in JavaScript. IIRC it has to do with the fact that there is a lot of flexibility regarding backing buffers or something; each typed array object has about 200 bytes of overhead compared to a dozen bytes for a plain array or object (roughly).
Typed arrays are mainly faster in scenarios where you allocate one or a few large TypedArrays once and then re-use them a lot.
That also explains your experience with many small arrays. That's basically the worst way to use TypedArrays.
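In other words, the friendly pattern looks something like this (a sketch with made-up names):

// One big allocation up front, reused across calls, instead of a fresh
// typed array per call.
const scratch = new Float64Array(1 << 16);

function smooth(values, n) {
  // assumes n <= scratch.length; results land in the preallocated buffer
  for (let i = 0; i < n; i++) {
    scratch[i] = (values[i] + (i > 0 ? values[i - 1] : values[i])) / 2;
  }
  return scratch; // caller only reads the first n entries
}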
Yes, I did think it wasn't a good way to use typed arrays, but I was still surprised that when allocating only a very small number of large arrays, plain JavaScript arrays outperformed typed arrays.
A good trick to convince JS engines that you are dealing with integers is to use the "`|0` operator":
let a = 1;
let b = 2;
let c = a+b|0; // c is guaranteed to be a 32 bit signed integer
This not only ensures correct 32 bit integer semantics (like wrapping around), but also helps the engines to use actual integer instructions in the generated machine code.
For unsigned 32 bit integers, there is `>>>0`, and
for multiplication, there is Math.imul().
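Continuing the example above:

let u = (a + b) >>> 0;         // unsigned 32 bit result (wraps modulo 2^32)
let p = Math.imul(a, b);       // 32 bit signed integer multiply with wrap-around
let q = Math.imul(a, b) >>> 0; // same product, reinterpreted as unsigned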
One of my colleagues told me to stop doing that because in (I think) V8 these values are immediately converted back to a double nowadays. So annotating all code with |0 doesn't really add speed benefits there, just extra conversions between doubles and integers. Said colleague used to maintain human-asmjs so I trust he knows what he's talking about.
>> This not only ensures correct 32 bit integer semantics (like wrapping around), but also helps the engines to use actual integer instructions in the generated machine code.
But there is only one type of number in javascript. Everything is just a double (BigInt aside). You get a 32-bit integer because the bitwise operator casts the result to one. c has the exact same semantics as any other js number.
Yeah, there are tricks to convince the engine you are using an integral type, but unless you are doing a lot of benchmarking they aren't really useful. Any compilation tier can choose to use any intermediate representation it wants.
> But there is only one type of number in javascript. Everything is just a double(BigInt aside).
There is nothing stopping JS engines from trying to infer when a number is only an integer and optimize for that. In fact, that's what Small Integer (SMI) optimizations are all about[0].
It's just that |0 isn't really able to guarantee that our number value is the type of SMI that V8 optimizes for (since the V8 SMIs are 31 bits, and bitmasking operations only guarantee 32 bit integers)
One important thing to note that this post kind of hints at: JavaScript can be optimized more than equivalent WebAssembly if you give the JS runtime enough help, because it can use runtime-only information to produce better-optimized JS. It can exploit type information gathered during runtime to devirtualize method calls and produce type-specialized code, while also doing things like escape analysis to eliminate some allocations entirely. You have to carefully identify places in your native code where you can do unchecked array accesses, etc, but the JS runtime just figures it out for you.
For any of those optimizations to happen for your WASM code, the compiler has to be able to do it statically and that can be much harder. Devirtualization in particular is essential for Java or C# to run fast and some C++ codebases also benefit tremendously from it. If you're interacting a lot with JS APIs from WASM (like issuing network requests or creating DOM elements, etc) you're going to be dealing with lots of dynamically typed data, and in those scenarios handwritten JS may actually be faster than WASM because the runtime can JIT optimal code with the right type specializations.
Note that these optimizations will fail if you aren't careful about how you write your JS: If a given function f(x,y) is passed values of different types during execution, it probably won't be fully optimized. If you have two functions f1(x,y) and f2(x,y) and ensure that each one is only passed values of a certain type, they will both be heavily optimized (iirc the JS runtime terminology for these functions is 'monomorphic') Naturally, this means uses of Function.apply and Function.call should be avoided at all costs.
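A small sketch of that monomorphic/polymorphic distinction (illustrative names):

// Polymorphic: one call site sees both numbers and strings, so the engine
// keeps a more generic, slower path around.
function describe(x, y) { return x + " / " + y; }
describe(1, 2);
describe("a", "b");

// Monomorphic: each function only ever sees one argument type, so the JIT
// can specialize each of them.
function describeNums(x, y) { return x + " / " + y; }
function describeStrs(x, y) { return x + " / " + y; }
describeNums(1, 2);
describeStrs("a", "b");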
I have seen much the same argument made about Java versus C++ for the past 25 years: that the Java byte code JIT would have more information and thus be able to optimize and do stuff like devirtualization better.
However, that has not panned out.
There are a few reasons for this. First C++ and Rust optimizers can do amazing things when they are given time. In addition, I think devirtualization is not as big a deal in C++ and Rust because in general you avoid writing code that uses virtual functions when you are writing performance sensitive code and instead use things like templates/generics where there is no indirect function calls.
Other than 3D AAA game engines, all the C++ software that I replaced with either Java or .NET solutions has kept the customers happy and lowered the TCO of their products.
This wasn't tiny CLI that occasionally lands on HN, rather large scale desktop applications or distributed computing clusters.
Winning micro-benchmarks is not everything, which is why except for Windows with WinUI (which still remains to be seen if it can move windevs away from Forms/WPF in its current incomplete state), all OS vendors are migrating to other languages for their App development SDKs, leaving C++ and Rust only for low level OS components.
I don't think C++ and Rust are just for low-level work. I've built a lot of GUI apps and distributed ones with C++.
Qt is a beast.
The issue with Java is the reverse engineering. Back in the 90's, the main selling point for Java was to prevent it, because at that time the bytecode was at least hard to understand. Now tools have grown and it's fairly easy to reverse engineer Java, even if one obfuscates it.
As for C++, inline code and template code make it a pain in the ass.
I'm sure big companies that care about intellectual property would use C++ over Java anytime. C++ also has mature obfuscation tools that make it even more difficult.
Java has its place. It's a great language if you use it server side or in an isolated env (from a commercial viewpoint).
Nevertheless, I've built many web apps using C++ too.
They were and are written in all sorts of C++ flavours, including past C++11.
Reverse engineering is never an issue with Java if one actually uses the right tooling; commercial AOT compilers have existed since around 2000, it is a matter of buying them.
Google, where I work, is largely a C++ shop despite Android Java.
AOT compilers exist for Java, .NET, JavaScript. I, however, doubt the user experience of those.
For example, GraalVM mentions the following,
"There is a small portion of Java features are not susceptible to ahead-of-time compilation, and will therefore miss out on the performance advantages. To be able to build a highly optimized native executable, GraalVM runs an aggressive static analysis that requires a closed-world assumption, which means that all classes and all bytecodes that are reachable at run time must be known at build time. Therefore, it is not possible to load new data that have not been available during ahead-of-time compilation."
Not everyone is Google, and since you work there you are surely aware of tooling like Ghidra and IDA.
GraalVM is not what I would pick for AOT Java projects, there are other products since 2000.
In any case, this isn't a comparison of language bullet points.
Just because a software product has been migrated from C++ into Java, .NET or whatever language, it doesn't mean it is a sacrilege to keep some native lib around, which is exactly where all mainstream OSes are going, with C++ being left for the bottom layers.
How many desktop GUIs is Google shipping written in pure C++?
1 - Qt is not an OS SDK. Apparently you missed that part of my comment.
2 - Qt has been migrating away from pure C++, again you also missed pure from my comment, modern Qt applications are written in Qt Quick, a JavaScript dialect, with underlying components written in C++.
C++ Widgets have hardly changed since Qt 4, other than being updated to the underlying Qt infrastructure.
Again, read my comment very carefully, then you might get it.
Afterwards go learn how to create a GUI for macOS, iOS, Android, ChromeOS, Windows, WebOS (as shipped by LG) or even Fuchsia, using only their SDKs and nothing else.
As for Qt not moving away from pure C++, go use it with pure C++ on iOS, Android and embedded devices.
> Afterwards go learn how to create a GUI for macOS, iOS, Android, ChromeOS, Windows, WebOS (as shipped by LG) or even Fuchsia, using only their SDKs and nothing else.
Why use the native SDK when there is Qt?
The same can be said for Java. Why use Swing when there is native SDK?
Many devs are too religious arguing for home team and don't embrace polyglot programming.
Just because a product is mainly written in managed language X, doesn't mean some library can't be written in something else.
C and C++ devs have forgotten the days when their beloved programs in 8 and 16 bit home computers, were a pile of inline Assembly if performance was to be anywhere of an acceptable level.
Embrace the safety and productivity of higher level languages (with AOT and JIT compilers), and let a couple of native libs be the "inline Assembly" if and only if, a profiler proves it is actually required instead of choosing a better data structure or algorithm.
Your argument basically boils down to "If you write fast C++ it will be fast", which is true. But a significant fraction of code out there is not fast C++ written by experts to be fast.
This is different than "Java will be faster than C++ because of HotSpot" arguments, because java is competing with C++. This is not a competition between JS and native C++, it's a competition between JS and WASM.
A for loop is not a very interesting application -- it is not what Java optimizes for, and chances are you didn't benchmark it correctly.
To optimize your program in a low level language you basically have to have a whole plan for the architecture of your program beforehand, and every major change to that will break your optimizations. Also, don't forget about non-standard object life cycles, which are really common. Complex C++ programs basically employ their own GCs, which will be inferior to any one included in the JVM.
Of course low-level programs have their place (plenty of them), e.g. audio processing, embedded, a million others, but the average business/CRUD app will be faster* both to execute and to produce in Java, as well as more maintainable.
* With enough time a competent team could of course write a faster version of it in C++, but it's not a good use of their time, and you would be surprised how hard it is, especially with ever-changing requirements.
A language either cares about low level details or not. You can’t have it both ways. And c++ is absolutely a low level language.
> I don't know any complex C++ program that employ their own GCs when C++ has RAII which is superior to GC.
RAII is not at all a replacement for GC. It is only suitable for a subset of object lifetimes. There are plenty of cases where you can’t really pinpoint a scope-exit where this given object should be reclaimed.
A GC is a necessity in many concurrent algorithms that simply could not be written without.
> Just give a try for C++11/14/17
I have and I like it. There are domains where I would not even start writing Java, and vice versa with C++.
Your CRUD app may have been a breeze, but what if a requirement changes, now touching the core of your program? You have to refactor, and it will be really expensive compared to a high level language. Every memory allocation/deallocation has to be thought out again and tested (and while Rust can warn about it, you still have to write a major refactor as it is another low level lang).
Being multi-paradigm is a different axis all around. Low-level (which is by the way not a well-defined concept, C is actually also high level, only assembly is low, but that usage is not that useful) means that low level details leak into your high level description of code, making the two coupled. You can’t make them invisible.
Also, as an example, think of Qt. A widget’s lifetime is absolutely not scope-based, nor is it living throughout the whole program. You have to explicitly destruct it somewhere. And there are plenty of other examples.
And as I said, I’m familiar with RAII, it’s really great when the given object is scope-based, but can’t do anything otherwise.
> C++ is a OOP language just like Java. You do it same way as you do in Java. Use inheritance.
And if the new subclass has some non-standard object life cycle you HAVE to handle that case somewhere else, modifying another aspect of the code. It is not invisible, unless you want leaking code/memory corruption.
The main problem with Java isn't being JITted, it's that it's not expressive enough. It doesn't have SIMD (yet) or value types (yet…?).
I would expect a JIT to not really be able to find a lot of magic optimization opportunities, though maybe there are some, and it'd actually be annoying if it could. The most important thing in a tool like that is predictability, because you can't make development decisions based on magic.
That may be part of it, but I imagine the JVM's safety obligations are also a significant factor. If the JIT can't elide array bounds checks, checks must be performed at runtime. Runtime type checks might be needed. Runtime arithmetic checks might also be needed. The JVM is also more constraining regarding concurrency gone awry, than the C/C++ memory model. [0] More broadly, the JVM's lack of undefined behaviour constrains the optimiser in ways the C/C++ approach does not (although I'm open to the idea that it's overstated how much of a performance win is owed to C and C++ having many kinds of undefined behaviour).
And of course there's the GC and Java's high object-churn, even where lifetimes are known statically. To my knowledge, escape analysis (the relevant family of JIT optimisations) still hasn't really addressed this.
The JIT can elide array bound checks really often, and most "low hanging" optimizations are solved quite cleverly (it's way out of scope for my knowledge, but I remember reading that null checks are elided by trapping segfaults? Does it make sense?).
There is no over/underflow checks so I don't know what you mean by arithmetic checks -- in pure number crunching the JVM is insanely fast.
And you are right in that many Java libs/programs are quite happy to create garbage, though with generational GCs it is really cheap. Escape analysis is great, but primitive classes in Project Valhalla will solve this last problem of object locality.
Sounds right. No need to generate instructions to perform the check if you can rely on a hardware trap, by means of signal-handling cleverness.
> There is no over/underflow checks so I don't know what you mean by arithmetic checks -- in pure number crunching the JVM is insanely fast.
Integer multiplication, addition, and subtraction, are all defined in Java to have wrapping behaviour, and are easily implemented. Whatever the input values, there's no way those operations can fail. (Incidentally, this is a terrible way of handling overflow. This turned up recently in discussion. [0]) Division is trickier. In Java, integer division by zero results in an exception being thrown. Apparently JVMs can implement this with signal-handling cleverness similar to dereferencing null references. [1] Two's complement integer division has another edge case, which is undefined behaviour in C/C++ but which, iirc, results in an exception in Java: INT_MIN / -1. I believe the JIT has to emit instructions to check for this, as it's not possible to leverage signal-handling there.
I don't know how well modern Java performs in floating-point arithmetic. Here's an old tirade about it [2] and discussion. [3]
> with generational GCs it is really cheap.
At the risk of going off topic: doesn't Java tend to perform somewhere around 60% the speed of C/C++, while using considerably more memory? Perhaps the GC isn't to blame, but clearly the blame belongs somewhere. It's like the way advocates of Electron will insist that modern HTML rendering engines are fast and efficient, the DOM is fast and efficient, and JavaScript is fast and efficient... and yet here we are, with Electron-based applications reliably taking several times the computational resources of competing solutions using conventional toolkits.
> primitive classes in Project Valhalla will solve this last problem of object locality
Interesting, sounds like the kind of ambitious initiative that will require deep changes to the JVM.
> At the risk of going off topic: doesn't Java tend to perform somewhere around 60% the speed of C/C++, while using considerably more memory?
It is hard to properly benchmark this generally, for small programs it is “at most” within 2-3X, but I believe for more complex applications it closes the gap quite well (many things can be “dynamically” inlined even between classes far from each other). Not sure how it fares with PGOs.
And yeah it does use more memory, both the runtime/JIT/GC and each object has considerable overhead, but I don’t think that comparing it to Electron is apt. Electron is slow because it adds additional steps to the picture, not because of the JS engine itself. V8 is similarly an engineering gem, and it can be stupidly fast from time to time.
As for the GC:
The GC itself is required for some program to work correctly. C/C++ codebases often create their own GC, and that will surely be slower than any of the multiple GCs found in the JVM. But for short-living programs the GC doesn’t even run (similarly to how some short lived C program leaves clean up to the OS), so rather the former is responsible for the bigger memory usage.
All in all, where ultimate control over memory/execution is not required (that is, you don’t need a low-level language), Java is fast enough, especially combined with it being productive and easy (and safe) to refactor, as well as having top-notch profiling tools (with such low overhead that they can be run in production as well).
Optimizations like 'these two function arguments are always int31' in v8 or spidermonkey are 100% predictable at this point and result in all your type checks and boxing being eliminated, and with the known types it also becomes much cheaper/faster to create object instances (since now if you store those values into properties of an object, that object's shape is fully known). Various properties like this can extend out into larger parts of your JS application.
There's still a lot of magic you can't rely on, but you'd be surprised how much you CAN rely on. Asm.js was built on this observation: If you write your JS following some basic rules it's actually pretty easy to land on predictable, well-optimized paths. Of course, one of WASM's advantages is that by design you're almost always on those paths and don't have to worry.
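For illustration, a toy sketch (not from the article) of the kind of code that tends to stay on those well-optimized paths: keep argument types stable, keep object shapes stable, and use asm.js-style `|0` coercions where integers are intended.

  // Always call this with numbers, never with strings/undefined mixed in,
  // so the engine can specialize it for small integers and drop the type
  // checks and boxing entirely.
  function lerpInt(a: number, b: number, t: number): number {
    return (a + (b - a) * t) | 0; // asm.js-style hint: result is a 32-bit int
  }

  // Always create these objects with the same properties in the same order,
  // so every instance shares one shape/hidden class and property stores stay cheap.
  function makePoint(x: number, y: number) {
    return { x, y }; // never add or delete properties on it later
  }

  const p = makePoint(lerpInt(0, 100, 0.25), lerpInt(0, 50, 0.25));
  console.log(p.x, p.y); // 25 12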
> The most important thing in a tool like that is predictability, because you can't make development decisions based on magic.
Fortunately you've got the best profiling tools available, so you don't have to guess. You also get to see the relative importance of the function you're trying to optimize, and whether it actually is the bottleneck (people often guess wrongly about where the bottleneck is).
It (the JVM) surely has had support for AVX for several releases, although only via autovectorization; explicit SIMD (the Vector API) has been made available as an incubating preview in Java 16.
Autovectorization is the kind of magic you can't rely on. It sort of works on a single platform but you will always run into cases it doesn't handle even if you own your own team of autovectorization engineers who tell you it's perfect.
On the other hand, the explicit Vector API will use the correct "flavor" of SIMD instructions for the platform and will gracefully fall back to a non-SIMD version if SIMD is not supported. And as far as I know, the SIMD story is quite bad with C.
It's pretty good in C with assembly, inline or not. SIMD in C usually involves a lot of aliasing violations, and the intrinsics have weird, hard-to-read names, so I find assembly easier to deal with than C here.
Compiling TypeScript to JavaScript is essentially just removing the type annotations. Compiling to a bytecode VM would be orders of magnitude more work, especially since TS is defined to have exactly the same runtime semantics as JavaScript.
TS classes are not JS classes, TS has its own implementation of async/await, etc. Just check any compiled TS code and you'll see it. It's very frustrating when you want to quickly patch a bug in a 3rd-party library.
TS can target older versions of JS without modern features, just like Babel. If you target a recent release of ES, the emitted JS should be pretty much the same as the source TS, just without the type annotations.
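For example, async/await only gets rewritten into helper machinery when you down-level; with a modern target it passes through essentially untouched (a rough sketch of tsc's behaviour; the exact helper output varies by compiler version):

  // source.ts
  export async function load(url: string): Promise<string> {
    const res = await fetch(url);
    return res.text();
  }

  // With `tsc --target ES2017` (or newer), the emitted JS keeps async/await
  // as-is and only strips the type annotations.
  //
  // With `tsc --target ES5`, the same function is rewritten in terms of the
  // __awaiter/__generator helpers, which is the hard-to-read output you run
  // into when trying to patch compiled third-party code.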
Idk man. webassembly.org — the site authored by the inventors — literally starts with “WebAssembly (abbreviated Wasm) is a binary instruction format for a stack-based virtual machine.”
It is a VM. Most VMs use a high-level intermediate representation (HIR, or bytecode). WebAssembly uses a low-level intermediate representation (LIR). A LIR is not an assembly language.
AssemblyScript initially targeted TS->WASM compilation, by only supporting a strict subset of the TS language. But at some point they dropped that idea and defined their own TS-like language. I don't know the reason for this, but my guess is that TS is too dynamic to just directly compile it to WASM?
It can, indirectly, by simply calling the JavaScript APIs via bindings. That works well enough and is also how you can use things like WebGL, OpenAL and other browser APIs.
But they are also working on more efficient bindings.
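Roughly like this (a minimal sketch, not any particular toolchain's output; the module is assumed to declare an import named "js_log" under the "env" namespace, and "bindings.wasm" is a placeholder file name):

  // JS/TS glue: every browser API the module needs is wrapped in a small
  // function and handed over through the import object.
  const importObject = {
    env: {
      js_log: (value: number) => console.log("from wasm:", value),
    },
  };

  const { instance } = await WebAssembly.instantiateStreaming(
    fetch("bindings.wasm"),
    importObject
  );

  // The exported function can call back into js_log via the binding above.
  (instance.exports.run as () => void)();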
This unfortunately introduces a lot of overhead and doesn't scale well for larger applications. WebGL calls are already incredibly slow compared to native, and the trampolining between WASM and JS world adds on top.
When WASM was released (around 2017), there was already discussion about allowing direct bindings without a JS round trip, but AFAIK there is still no actual implementation of this in any browser.
WebGL is slower than native mainly because of the additional security validations compared to a native GL driver: e.g. it cannot simply forward calls into the underlying 3D-API, but instead WebGL needs to take everything apart, look at each single piece to make sure it's correct, reassemble everything and then call the underlying 3D-API (roughly speaking).
Another problem is that WebGL relies on garbage collected Javascript objects, but this problem can't really be solved on the WASM side, even with the "anyref" proposal (this would just allow to remove the mapping layer between integer ids and Javascript objects that's currently needed).
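For the curious, the mapping layer in question looks roughly like this on the JS side (a toy sketch; WASM can only pass numbers across the boundary, so the glue code keeps the actual WebGL objects in a table and hands integer handles to the module):

  const glObjects: (WebGLBuffer | null)[] = [null]; // index 0 reserved as "no object"

  // Called from WASM: create the JS-side object, return an integer handle.
  function createBufferHandle(gl: WebGL2RenderingContext): number {
    glObjects.push(gl.createBuffer());
    return glObjects.length - 1;
  }

  // Called from WASM: resolve the handle back to the real object.
  function bindBufferByHandle(gl: WebGL2RenderingContext, target: number, handle: number): void {
    gl.bindBuffer(target, glObjects[handle]);
  }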
Doesn't seem to stop MS with Blazor (.NET), Rust, and a few others from doing this. Also, there are plenty of games running in WebAssembly using similar bindings for things like WebGL and OpenAL. As far as I know the current situation is pretty workable already and getting better; e.g. garbage collection is coming pretty soon.
I guess it depends on what you are doing. For most people doing WebAssembly, the point is avoiding, or at least minimizing, the need to interact with JavaScript. But still, it seems there are some nice virtual DOM options for Rust, e.g. https://github.com/fitzgen/dodrio, that are allegedly fast and performant (not a Rust programmer myself).
Can anyone provide any pointers or an update on this? I remember reading that it has been coming for the past 3 years, but I have never heard anything more. Google Search doesn't show any useful results.
Just you wait until Lars Bak [0] gets hired by some company to make a fast WebAssembly runtime. Until that happens I won't take any performance comparisons of WASM vs. X seriously :).
So AssemblyScript can beat JavaScript if you benchmark every function and then optimize them by hand every time it is slower?
So most (all?) of the code posted which looked like a straight port to AssemblyScript was slower than JavaScript before optimizing it? I don't know how you feel, but I personally don't want to optimize every function to get the promised speed :(
If your app is doing most of the work it needs to do in 1ms, but one path takes 200ms, then clearly you only need to optimize things on the slow path. You don't have to optimize everything to get a huge perf improvement.
For me, the showstopper regarding WebAssembly is that browsers do not support a textual version that I can just throw in where I want to hand optimize a function.
If I could just replace my slowest Javascript function with handcrafted WebAssembly code, that would be great.
But having to dabble with external compilers and splitting my code into multiple files is too much of a burden.
Shouldn't be too hard to make a library that would allow this. Would you be interested in that? So, say something like the following example: would you use it?
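Something like this, purely as a hypothetical sketch of what the API could look like; `watToBinary` is an invented stand-in for whatever text-to-wasm compiler the library would have to ship and load, and loading that compiler is where the slow startup would come from:

  // Hypothetical helper: compile WebAssembly text format at runtime and
  // return the module's exports. Not an existing package or browser API.
  declare function watToBinary(watSource: string): Uint8Array;

  async function inlineWasm(watSource: string): Promise<WebAssembly.Exports> {
    const bytes = watToBinary(watSource); // the slow part: text -> binary
    const { instance } = await WebAssembly.instantiate(bytes, {});
    return instance.exports;
  }

  // Replace one hot JS function with hand-written WebAssembly text:
  const { add } = (await inlineWasm(`
    (module
      (func (export "add") (param i32 i32) (result i32)
        local.get 0
        local.get 1
        i32.add))
  `)) as { add: (a: number, b: number) => number };

  console.log(add(2, 3)); // 5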
Obviously this would take more than a bit of time to start up (seconds), but the idea is of course that you don't do this once you deploy to production; there you'd replace it with precompiled inline WebAssembly.
You can pretty easily just ship a 1kb .wasm module and load it and export a function from it to call from JS. Of course, then all your data needs to live in wasm-accessible memory, and you can't use strings or objects anymore...
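A minimal sketch of that approach (the file name and export names are placeholders; it assumes the module exports its memory plus a simple allocator, and that the allocator returns 8-byte-aligned pointers):

  const { instance } = await WebAssembly.instantiateStreaming(fetch("kernel.wasm"));
  const { memory, alloc, sum } = instance.exports as {
    memory: WebAssembly.Memory;
    alloc: (byteLength: number) => number;
    sum: (ptr: number, length: number) => number;
  };

  const input = new Float64Array([1, 2, 3, 4]);

  // The data has to live in wasm-accessible memory: allocate inside the
  // module, then copy the JS typed array into its linear memory.
  const ptr = alloc(input.byteLength);
  new Float64Array(memory.buffer, ptr, input.length).set(input);

  console.log(sum(ptr, input.length)); // 10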
Looking at the C++ code, it seems like you could use std::push_heap/pop_heap to implement your binary heap. The code would be simpler, and there is a chance it could be faster since a lot of the standard library algorithms are very heavily optimized.
In my work, I have come to the conclusion that it seldom pays off to go "native" when working with Node.js. More often than not, rewriting some computationally heavy code in C and sticking it in as a native module yielded only marginally better results compared with properly optimized JS code. Though, that doesn't negate other advantages of using said technologies: predictable performance from the start and re-using existing code base.
Totally unrelated to the content (which was really great), I found it interesting that he shared his benchmarking setup as a _private_ gist (https://gist.github.com/surma/40e632f57a1aec4439be6fa7db95bc...) which is actually more like an opaque repository with multiple files.
It has forks, revisions, probably some tooling built around (git -> gist) but it's not indexed and can only be found by finding the link somewhere (in most cases).
Is this a more widespread recent pattern? I'm wondering what the desired outcome is, compared to just using a public repo.
JavaScript is fast. The browser is fast. But communication between the browser and JavaScript is really, really slow.
They are written in languages with incompatible memory models, so lots of data must be copied when communicating. They are running in different runtimes, so your JavaScript JIT cannot inline function calls into the DOM.
That's why, to this day, if you want to render a bunch of HTML from JavaScript, it is faster to generate a giant string of markup, pass it to the browser in a single 'innerHTML = "foo"' assignment and let the browser parse all of it, than it is to make a bunch of "createElement(); setAttribute(); appendChild();" calls.
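I.e. something like the difference between these two (a toy sketch; how big the gap is in practice depends on the browser and workload):

  const items = Array.from({ length: 1000 }, (_, i) => `item ${i}`);
  const container = document.getElementById("list")!;

  // 1) Build one big markup string and cross the JS/DOM boundary once.
  container.innerHTML = items.map((text) => `<li>${text}</li>`).join("");

  // 2) Many small DOM calls, crossing the boundary on every single one.
  for (const text of items) {
    const li = document.createElement("li");
    li.textContent = text; // setAttribute(...) would be similar
    container.appendChild(li);
  }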
WASM is theoretically better for CPU-intensive workloads. As the article states, even for CPU-intensive workloads there are still quite a few limitations to take into account. What it is good for ATM is IMHO mostly just reusing existing C, Rust, whatever (choose your WASM-supported language here) code. In practice, most web applications are not slow because of CPU bottlenecks but because of too much communication, large code size, etc. WASM in its current state does not seem to have a good answer yet for the code size issue.
That was a really informative read. I think I, like many others, figured the biggest issue with WASM right now was purely the inconvenient development flow, and that if you were willing to put up with it you'd just automatically get better performance. But there seems to be much more to it than that.
I hope WASM can continue to grow in both of those areas, because I still like the idea, but it's clearly still an immature technology.
That’s the whole point. V8 is _really_ good at taking any form of JS code and making it fast, without me having to apply optimizations. The other languages only started being competitive once I hand-optimized them.
A bit off topic, but what is the status of the hypothetical standard that would allow DOM/Web API calls to be made natively from WASM, without JS wrappers? Because AFAIK it is one of the biggest blockers for WASM application (not compute-heavy library) performance.
Is there a good comparison between different WASM languages / compilers? I imagine that, since we are in the early days, performance between compilers could vary significantly. Compare that to V8, which has had thousands of man-hours from high-level engineers.
JavaScript performance is completely vendor specific. There's no reference implementation for a JavaScript VM.
So, it's more useful to talk about, let's say, v8 performance. Much of v8 performance is also version specific, and not all of it is documented.
Code running on v8 can be blazing fast, or it can be slow. It depends on whether the code can be "optimized" (JIT compiled) and kept that way (because code can also be "deoptimized", meaning, the jitted version gets thrown away).
With JavaScript, if you want to ensure your code always runs fast as it possibly can, you have to become acquainted with the rules behind optimization and deoptimization, and start tracing optimizations and deoptimizations to make sure that your code gets optimized, and stays optimized. This process can be time consuming and can make your JavaScript look non-idiomatic and less readable.
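For example, with V8 you can watch that process happen (a toy sketch; run the compiled JS under Node with the V8 flags --trace-opt and --trace-deopt, whose output format varies between V8 versions):

  // add.ts -- compile, then run: node --trace-opt --trace-deopt add.js
  function add(a: any, b: any) {
    return a + b;
  }

  // Warm up with numbers: after enough calls, V8 optimizes `add` under the
  // assumption that both arguments are small integers.
  for (let i = 0; i < 100_000; i++) add(i, i + 1);

  // Break that assumption: the optimized code is thrown away (a "deopt")
  // and the function falls back to slower, generic code.
  add("foo", "bar");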
On the other hand, WebAssembly performance is easier to reason about with respect to what's described above.
Maybe I'm a nitpicker by default, or maybe it was so trivial that the article's author didn't think of including it in the methodology, BUT before you test the speed of your port (regardless of which language you port from and to), you should test that the end results are exactly the same.
I see no mention of him testing that the results of the original JavaScript blur algorithm and the ported one are the same. And believe me, image manipulation can bite you in the proverbial rear at the edge cases, I've been there. What I want to see is testing included in the article too, not just benchmarking his own solution. Try testing at least the classic 256 cases, that is RGB(x,x,x) (examples: RGB(0,0,0) black ... RGB(127,127,127) gray ... RGB(255,255,255) white), then a few thousand random images. Only after those tests can you safely move on to benchmarking for speed.
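A rough sketch of what such a check could look like (blurJs and blurWasm are placeholders for the original and the ported implementation, both taking and returning RGBA pixel data):

  declare function blurJs(pixels: Uint8ClampedArray, width: number, height: number): Uint8ClampedArray;
  declare function blurWasm(pixels: Uint8ClampedArray, width: number, height: number): Uint8ClampedArray;

  // Build a solid grayscale test image: R = G = B = value, alpha = 255.
  function solidImage(width: number, height: number, value: number): Uint8ClampedArray {
    const px = new Uint8ClampedArray(width * height * 4);
    for (let i = 0; i < px.length; i += 4) {
      px[i] = px[i + 1] = px[i + 2] = value;
      px[i + 3] = 255;
    }
    return px;
  }

  // The "classic 256 cases": every grayscale value from black to white.
  for (let v = 0; v < 256; v++) {
    const img = solidImage(64, 64, v);
    const a = blurJs(img.slice(), 64, 64);
    const b = blurWasm(img.slice(), 64, 64);
    if (!a.every((byte, i) => byte === b[i])) {
      throw new Error(`Mismatch for RGB(${v},${v},${v})`);
    }
  }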
Try making realtime audio and video filters in JS vs WASM. There are some domains where there is a very real difference, and it is enough to put things in the realm of the viable.
I.e. most WASM proposals must be accepted before we can use the DOM from WASM... at the pace it's been moving forward, I guess this will take at least several years (3-4+).
Webassembly can access anything you give it access to, and only that.
That’s great from a trust perspective because it means you can use a binary blob with confidence that it can’t do anything it isn't explicitly allowed to.
There’s no reason to specify access to the DOM for webassembly since you can grant that access from JS.
The hard bit is making it fast; ideally you could call between WASM and browser code with zero trampolines, but going via JS means you need two.
> There’s no reason to specify access to the DOM for webassembly since you can grant that access from JS.
There are reasons. If I could write web apps in a different language without having to use any JS, I would. It would be wonderful to be able to pick whatever language you want, compile it, and then deploy it. Having a JS bridge just seems like a clunky workaround that you have to live with.
> That’s great from a trust perspective because it means you can use a binary blob with confidence that it can’t do anything it isn't explicitly allowed to.
JavaScript is already sandboxed and can access only what you allow it to. Why have a sandbox within a sandbox?
You can bridge through JavaScript. It's not a big deal in practice. WASM is still very immature. You do not want to build your whole app in WASM if you want to keep a full head of hair.
It does limit a lot of use-cases from being viable in WASM.
Anything that needs to do a lot of DOM access will probably see a big performance hit if you rewrite it in WASM because there will be too much overhead from crossing the JS-WASM border.
It definitely won't be able to break the WASM sandbox, so my guess is that AssemblyScript itself adds runtime checks of its own and that this is a hint that we don't need those.
Imagine an HN-like page is written in this and uses unchecked in some code path for displaying user comments.
I comment something that exploits that code. Then, when you come along and view my comment, whatever my exploit does runs with your permissions instead of mine.
No, the exploit would be in the comment text. You're exploiting a bug in the comment display code when it displays your comment to another user to do something that the comment display code isn't supposed to do.
Much like exploiting a C program that handles untrusted text and doesn't bounds check it. You aren't supposed to be able to run any code at all, but a vulnerability lets you make the program do something it isn't supposed to.
Some of the easiest exploits would be prevented since you probably can't overwrite code like you can (or used to be able to, a lot of platforms have added protection against this) in C, but some exploits are still possible just by overwriting other variables with values that the program doesn't expect.
No, that's never allowed - that's the strength of Wasm. Any unchecked helpers are for language-level semantic blocks within Wasm memory itself, not for leaving the sandbox. So worst case you might overwrite and corrupt your own data.
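For what it's worth, here is a tiny AssemblyScript-flavoured sketch of that distinction (AssemblyScript's `unchecked` builtin skips its own bounds check; treat the exact abort behaviour of the checked variant as illustrative):

  // AssemblyScript (TS-like syntax), compiled to WASM with asc.
  export function get(arr: Int32Array, i: i32): i32 {
    return arr[i]; // bounds-checked: an out-of-range i aborts the module
  }

  export function getUnchecked(arr: Int32Array, i: i32): i32 {
    // No bounds check: faster, but a bad i reads whatever happens to sit at
    // that spot in the module's own linear memory. It still can't reach
    // outside the WASM sandbox or touch the host.
    return unchecked(arr[i]);
  }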
WASM lacks garbage collection and is statically typed. You'd have to write a whole Python interpreter, so I'd expect the end result would be slower than the official Python interpreter.
I imagine it would make more sense to compile Python to JavaScript and leverage the optimising JavaScript JIT engines, but no doubt an efficient transpiler would be a significant undertaking. The Transcrypt project [0] does something like this but I don't think it emphasises performance.
Someone has to come in and make it supported. Adopting nlvm isn't going to magically make it easier.
The reason it's easier in Rust is because someone who's passionate about WASM came along and did the work up front to document everything and make it as simple as possible to compile to WASM. The same is certainly possible in Nim; it just needs someone to put in some work :)
I really do like Nim, but suggesting it does WebAssembly out of the box could be misleading and do no good. The project is currently in the tough spot where it's hard to convince your team to use it, and because it's hard, there is less contribution.
As for LLVM, you might be right; I'm not an expert in that particular field. But it seems there is a lot of already baked-in stuff you'd get for free, like Wasm, debugging, and optimization. We'll see how the Crystal project manages to get Wasm and how LLVM will help them.
Seems to me that the flaw with WebAssembly is that it tries to fix the symptom, not the cause. As such, it's doomed to failure.
Just stop festooning a site with pop-ups, trackers, geolocation, subscriptions and all the countless other crap that's increasingly being shoved into web pages. The problem will go away by itself.
> Decide for yourself, whether this article is worth your time.
Did you even read the article? It goes into a deep dive on AssemblyScript and WASM including discussions on Rust’s std::vec and Go’s slices, bump allocators, TurboFan/Sparkplug/Ignition/Liftoff benchmarks, and -O3 versus -O3s flags and contains links to a pull request to try to help with one of the noticed performance issues.
This is a well-written dive into the technology, writing it off because you don't like the definition of "strongly typed" in this sentence is a bit premature.
WebAssembly IS strongly typed, and that property DOES help it to generate machine code right away. Heck, even its if statements and loops have types associated with them, and those types can help compilation from a stack machine to a register machine. The article even mentions a concrete reason later down, that deopts because a type passed to a function changed can't happen in webassembly like it can in JIT compiled JS.
And indeed, this article does seem to be worth my time.
I think the author just meant their scripting language is strongly typed, but they say wasm because that’s the only target for that language.
I wasn’t confused by that statement when reading the article. Obviously wasm isn’t strongly typed but obviously that’s not literally what the author meant.
The article was interesting and they clearly spent a substantial amount of time creating it.
Is this maybe something to do with the academic definition of strongly typed?
Because the way I understand WebAssembly, it surely is strongly typed.
edit:
"Strongly typed is a concept used to refer to a programming language that enforces strict restrictions on intermixing of values with differing data types."
Either you don't understand type systems, or you don't understand WebAssembly. In either case, I can encourage reading the article, because it sheds some light on both topics!