What I'm missing in these articles is a performance comparison. All WASMed tools I've tried were really cool proofs of concept, but the performance was always lacking at the very least.
I see several languages moving towards more and more WASM but on a technical level I don't see the benefit of WASM over something like Firecracker. Docker and other sandboxes have to deal with shared kernels and all the risks associated with that, but leveraging virtual machines instead solves that issue. There are already proof of concept implementations to replace Docker with VMs as a virtualisation layer, so I wonder if it wouldn't be better to invest time in getting those wrappers completely up and running rather than coming up with essentially "Java but we also emulate the OS".
Until WASM advocates start including benchmarks in their blogs, I'll keep watching this stuff from a distance.
No cold starts means no overhead of starting a process to answer a request, unlike most container-based serverless environments (and without having to keep pre-warmed instances, which kind of defeats the purpose). A couple of references regarding cold starts and performance in serverless environments.
Are we so what's-old-is-new-again as to be reinventing FastCGI at this point? And why are we pretending this has anything to do with WASM instead of just how your API/service is designed?
FastCGI reuses the same process for multiple requests. As I understand it wasmtime now supports very fast startup so you can use a new instance per request (avoiding the risk of inter-request bugs) with very low overhead (5 microseconds on their benchmark https://bytecodealliance.org/articles/wasmtime-10-performanc....)
With Firecracker I believe snapshot restore time is around 2-3ms. In my tests wasmtime ran about 50% the speed of native so depending on your workload it might still be faster for short running jobs where the startup time dominates. (Wasmer was maybe 80-90% of native speed but I don't know their startup times.)
> As I understand it wasmtime now supports very fast startup
WASM startup isn't going to be any faster than native code startup. It's going to be strictly worse if anything thanks to the JIT, although you can AOT that to native and then just restore parity with native code.
Which just gets back to the speed of your startup depends on what your startup does.
From what I understand, wasmtime's fast startup is conceptually similar to forking a process per instantiation, but much faster, since it uses lazy initialization and has fewer operating system resources to set up.
But I think wasmtime can always be faster to instantiate since the guarantees provided by the runtime allow it to safely reset and reuse instantiations:
"We implemented an “instance allocator” in Wasmtime that makes use of this copy-on-write (CoW) technique for very fast instantiations. It also uses a Linux syscall known as madvise to quickly “reset” the page mappings back to the original read-only heap image, so we can reuse the same mappings over and over when the same Wasm program is re-instantiated many times. (One might imagine this would be the case in a server serving many requests, for example!)"
What approach would be faster that provides similar isolation? It certainly seems a lot faster than fork and seemingly faster than the on-demand-fork I mentioned elsewhere.
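To make the per-request instantiation model concrete, here's a minimal sketch using the wasmtime Rust crate (plus anyhow for error handling). The `handle` export and the request loop are made up for illustration, and the CoW/pooling instance allocator quoted above is a separate knob on wasmtime's `Config`, not shown here.

```rust
use wasmtime::{Engine, Instance, Module, Store};

fn main() -> anyhow::Result<()> {
    let engine = Engine::default();

    // Compile once, up front (or deserialize a precompiled artifact).
    let module = Module::new(
        &engine,
        r#"(module
             (func (export "handle") (param i32) (result i32)
               local.get 0
               i32.const 1
               i32.add))"#,
    )?;

    // Per request: a brand-new Store + Instance, so no state can leak
    // between requests; instantiation itself is the cheap part.
    for request in 0..3 {
        let mut store = Store::new(&engine, ());
        let instance = Instance::new(&mut store, &module, &[])?;
        let handle = instance.get_typed_func::<i32, i32>(&mut store, "handle")?;
        println!("request {request} -> {}", handle.call(&mut store, request)?);
    }
    Ok(())
}
```

The point is that the expensive work (compilation) happens once, and the thing you repeat per request is only the cheap instantiation step the article benchmarks.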
That’s basically the history of computing! There are some intrinsic advantages to using Wasm vs VMs or containers for certain scenarios, like serverless. That’s very similar to what’s going on with cloudflare workers and V8 isolates. It’s not one size fits all by any means, but it is certainly really good for many scenarios where containers are not
Some applications can benefit a lot from CPU-specific instruction combinations which are not available to wasm
(by available I mean implicitly, i.e. your wasm code gets compiled to them).
Luckily for a lot of use-cases this doesn't matter much(1) and some degree of SIMD support is often(2) available.
(1): Without micro-optimizations, which most times aren't done anyway due to their maintenance/development cost.
(2): I'm not quite up to date. I think 128bit SIMD is available in most (all?) relevant WASI runtimes and at least some browsers.
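For what it's worth, the 128-bit SIMD that is available can be used from Rust without dropping to hand-written WAT. A sketch: the `add4` helper is made up, the intrinsic names are the real `core::arch::wasm32` ones, and you'd typically build with `-C target-feature=+simd128`.

```rust
#[cfg(target_arch = "wasm32")]
use core::arch::wasm32::{f32x4, f32x4_add, f32x4_extract_lane, v128};

// Adds two vectors of four f32s using a single 128-bit Wasm SIMD add.
#[cfg(target_arch = "wasm32")]
fn add4(a: [f32; 4], b: [f32; 4]) -> [f32; 4] {
    let va: v128 = f32x4(a[0], a[1], a[2], a[3]);
    let vb: v128 = f32x4(b[0], b[1], b[2], b[3]);
    let sum = f32x4_add(va, vb);
    [
        f32x4_extract_lane::<0>(sum),
        f32x4_extract_lane::<1>(sum),
        f32x4_extract_lane::<2>(sum),
        f32x4_extract_lane::<3>(sum),
    ]
}
```

Wider vectors (AVX-512 and friends) and the more exotic instruction combinations are exactly what you lose, as the comment above says.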
yeah I have known about this for a while, but no one I've spoken to personally believes me. to them WASM is pure win and there are no downsides.
when I mention performance, they kinda waffle a bit, saying "CPU is cheap" or something similar, and they start to show a hint of understanding when I say that cloud resources are billed by unit of CPU time, and by amount of RAM used. then I say that our mutual employer invokes lambdas hundreds of trillions of times per year and I think they briefly understand before being caught up in "new stuff is awesome" technology fetishism again.
it's exhausting.
everyone should live overseas for a couple years because it changes how you view the world... everyone should be a game developer for a couple years as well, because you will quickly notice just how unbelievably slow modern software is. more people need to see that.
security is important! portability is important! other things are important, always, and when you gain a sense of just how slow software is today in comparison to how unbelievably fast modern hardware is, it becomes very hard for me to think positively of anything that lowers performance further for almost any reason.
I wouldn't call 50% loss "pretty bad". I mean sure, it's not great, but if you were to go from Rust to C# or Java you would most likely see a similar loss.
That would depend on the type of code you write. Allocation-heavy code can be very fast on the JVM, and you can't always avoid dynamic allocations (arenas are not always a solution).
The biggest missing thing (for production) is observability.
Look at the good old JVM. It has tons of tools to analyze and understand the behavior of your production system. You can take thread dumps (stack traces of all existing threads) at any moment with negligible performance impact, you can dump the heap and analyze it off-site, you can get tons of metrics about every dark corner: mutexes, the GC process, the JIT, including, if you need it, the generated native code!
Many of these things you can get in production, not just in a sandbox.
If your system behaves strangely, live-locks, consumes more memory than you think it should, thrashes the GC, you name it, you have all the tools to understand what is wrong, find bugs or mis-configurations, etc.
With all these new-and-shiny WASM and not-so-shiny JS VMs you are mostly in the dark now. Service becomes unresponsive? Latency goes through the roof? The only thing you can do is restart.
It is not a property of WASM per se, but this infrastructure is far too immature right now compared to 25+-year-old technology.
This, a 1000 times. People like to hate on Java, but when there are problems to diagnose on production systems it is second to none.
But in my experience most people don't even know these tools exist, so the only thing they do is restart and guess where the problem might be if it persists.
Flight Recorder is a godsend and I've not seen its equal in any other language/ecosystem.
Any JVM anywhere can answer the question "why am I running slow" with a quick run of Flight Recorder. Memory, CPU, socket time, GC impact, TLAB, thread dumps, etc. It's all there in one file that imposes something like a 1-2% performance impact if you run it constantly.
Completely agree. Observability and debugging are some areas where the ecosystem is quite immature or nonexistent. My take is that wasm is more or less where the web was in 97-98: lots of excitement and possibilities, but also lots of technical challenges and experimentation.
Threading support is already implemented in some browsers and well on its way to standardization https://webassembly.org/roadmap/
That should address that concern. However, there is another way of looking at performance, in the context of serverless, where single-threaded performance is typically what matters, as well as cold-start time, etc., and that's why Wasm is popular in that scenario.
“Threading support” in this sense is a bit of a misnomer, it’s really support for thread-safe memory constructs. Creating threads is left up to the runtime. On the web, this is done with WebWorkers, but on the server side I don’t think there is yet a standard way to do it supported by major runtimes.
> Creating threads is left up to the runtime. On the web, this is done with WebWorkers
WebWorkers don't give you multi-threading behaviors (heaps/address spaces are not shared). WebWorkers would be how you launch a new process, but there's still otherwise no way to make a thread (nor even a fork() equivalent for that matter).
That only shares one allocation (like shared memory does in regular multi-process scenarios), but you still can't share the address space or even any object heaps at all. Like it's not possible to allocate javascript objects out of a SharedArrayBuffer such that you could pretend you had a shared address space by sticking everything in that.
As in, SharedArrayBuffer is equivalent to shm_open. Which means it's not even that good as a shared memory construct as it's missing all the protection enforcement of memfd (or Android's ashmem)
WASM stores everything in an array buffer; it doesn't use JavaScript objects because it's not JavaScript (though there are starting to be features that allow it to interoperate with JS objects). If it didn't store everything in a big memory array then it wouldn't really work, because C assumes exactly that kind of flat linear memory.
There is SharedArrayBuffer, but it's a web platform thing, not available in out-of-browser runtimes like wasmtime or wasmer or wasmedge (which Docker uses).
but you can have threaded wasm code in any evergreen browser, and have been able to for more than a year now as far as I'm aware
Basically the trick is that you use multiple web workers with the same WASM program and the same shared buffer. Then you also add some JS glue code to coordinate which thread is the main thread and which threads you use as a thread pool (e.g. in rust/wasm with rayon you can set it up as a worker pool).
Now there are some drawbacks (as of the last time I used it; might have gotten better):
- threads are started/managed from outside (so don't expect any kind of "spawn" function to work; generally spawning new threads is non-trivial and so is (properly) cleaning up old threads, though if you need a fixed worker pool it's all fine)
- there were some limitations wrt. threading/synchronization which made certain usages of concurrency rather slow (though many were fine)
- "synchronized" operations must not be called from WASM code called by the main JS thread. This means in most situations you need to pass data to web workers and then to WASM (instead of e.g. passing it to WASM and then using an mpmc channel in wasm to pass it to the worker pool). There are some optimizations around passing pointers as numbers to/from the web workers, but it's limited and not nice. Or at least wasn't ~a year ago.
- bugs in Safari leading to strange crashes, for code that runs nicely in all other browsers, under unclear and non-debuggable circumstances (probably fixed, I hope)
Anyway, all in all, using rust->wasm with rayon and a thread pool was already surprisingly viable ~1 year ago (rough sketch of the Rust side below).
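The Rust side of that setup can end up looking surprisingly ordinary. A rough sketch assuming the wasm-bindgen and wasm-bindgen-rayon crates; the `sum_of_squares` function is made up, and the build needs the shared-memory/atomics flags described in wasm-bindgen-rayon's docs.

```rust
use rayon::prelude::*;
use wasm_bindgen::prelude::*;

// Re-export the initializer so the JS glue can spawn the web-worker pool
// (the JS side calls `await initThreadPool(navigator.hardwareConcurrency)` first).
pub use wasm_bindgen_rayon::init_thread_pool;

#[wasm_bindgen]
pub fn sum_of_squares(input: &[i32]) -> f64 {
    // Plain rayon code; the "threads" are web workers sharing the Wasm memory.
    input.par_iter().map(|&x| (x as f64) * (x as f64)).sum()
}
```

All the caveats above (who is allowed to block, how threads are spawned and torn down) still apply; this only hides the plumbing, it doesn't remove it.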
Those are really more like separate processes. There's no ability to do a shared heap in browser JS, meaning it functionally doesn't have threads.
Whether or not workers are actually implemented as threads or processes in the runtime is irrelevant. As far as the JS code itself is concerned & what you can do with it, browser JS is lacking multi-threading. There's just no way to do a shared heap, and that is the biggest defining difference between a process and a thread.
The JavaScript heaps are distinct, that's true, but there is a single shared wasm heap which is used from multiple threads. That is enough to implement the pthreads API.
Applications like Photoshop and Google Earth use pthreads on the Web so their compiled C++ is multithreaded, very similar to how it would run natively, and with similar responsiveness and throughput speedups. Though there are some limitations too, see
That's not really threading by the definition normally used in other languages. You can't allocate JS objects or structures and read/write them from multiple threads at once. JS is an inherently thread unsafe language and likely always will be.
Not exactly what you've asked for, but I did a "dummy" benchmark of Rust vs WASM on my blog[0]. WASM is impressive technology indeed, and it will be interesting to see whether it will get into the mainstream of software engineering.
> but the performance was always lacking at the very least.
This is an easy engineering problem which will be solved when there's enough motivation and engineers working on it.
Increasing adoption is more of a business problem though, and it is unclear if performance is the bottleneck here.
If you have a wasm product that has to be as fast as native code, the solution is to find compiler engineers (or a company that specializes in this) who will solve this for your situation.
If WASM+WASI existed in 2008, we wouldn't have needed to create Docker. That's how important it is. WebAssembly on the server is the future of computing.
This sounds incredible, as if the co-founder of Docker fails to understand the crucial value proposition of Docker (hence Docker's financial troubles, maybe).
The point of Docker is the ability to take the existing Rube-Goldberg-machine configurations of software, in any and many languages (including the gluing bash scripts), and put it basically unchanged into a controlled, isolated, replicated, shippable environment, with zero performance penalty.
It's very unlike WASM / WASI approach which requires recompiling stuff, runs non-native code, and completely changes the environment in which the code has to run. It's also like 2x as slow, compared to native code. It has its important upsides, but they are very unlike Docker's, in my eyes.
This is what initially confused me about comparisons between Docker and wasm, as someone who's long been a fan of both.
As far as I know, wasm won't let me apt-get install a bunch of stuff, set up cron jobs, glue together miscellaneous bash scripts, and magically run it without containers or VMs on any host architecture. That's the use case I'm most familiar with for Docker; wasm as I knew it was just a neat way to run untrusted native code on arbitrary machines with reasonable performance and security.
I think the disconnect is that Docker is also used as essentially a glorified package manager (sort of like Snap), combined with a runtime/interface that makes it convenient for devops purposes. From that perspective, I suppose there's not much difference between running a standalone binary inside its own dedicated Linux environment, and compiling it to wasm to run directly on the host OS, so long as the API remains unchanged.
It's just needlessly confusing to go as far as to call wasm a wholesale replacement for Docker. It's like saying Java is an alternative to Windows.
for me the only use of docker is "quickly reproduce whatever $LINUX_DISTRO my users run and make sure that my stuff builds & runs on it without having to install a complete VM in virtualbox"; I also don't really understand how WASM would help with this in any way
Solomon is no longer at Docker (hasn’t been for a while) and Docker Inc is doing extremely well financially after the split and renewed focus on developers. This Wasm release also shows they are very forward looking… I am very positive on the company and what they are doing (no affiliation, other than liking Scott and the team over there)
Genuine question: How much of that financial improvement is due to them requiring companies to now pay for Docker Desktop on macOS? We found ourselves essentially with no option but to pay up for a full year on short notice. We want nothing of their other paid offerings like builds or repo hosting. The sales rep basically confirmed we're just paying for the thing we used to get for free now. The whole call was a giant "FU, too bad". Left an extremely bad taste in our mouths. Obviously we're going to focus heavily on dumping Docker Desktop as fast as possible in the next quarters.
> Obviously we're going to focus heavily on dumping Docker Desktop as fast as possible in the next quarters.
If you need to invest significant effort into dumping it, it's almost certainly cheaper to just pay for it. Especially so if the alternative makes any sort of compromise on developer experience.
My workplace is also working on dumping docker desktop. Their website says it's $24 per month per user if you have over 100 users. We have around 2000 engineers, so it's half a million dollars a year for something that used to be free.
If it takes two engineers half a year to get a replacement working you're already back in the black, and honestly I'm not even sure why (in our specific case) it would take that long when there already are free alternatives.
In our case we have multiple tools that wrap docker and at first try none of them worked with podman. Podman doesn't actually have an API socket you can interact with. There is something called podman-helper but for the life of me I could not get it to work reliably. Also the API responses were not formatted the same way so now our code has to detect and fork logic if it is podman
And this is only the first example I saw. Now we have to root out all apps everywhere across the company that might integrate this tightly with docker.
Then there are performance considerations. Docker Inc apparently did quite a bit to improve performance, especially disk perf. We need to verify all of our existing workflows still work reliably.
None of this is hard, it just takes time and effort.
Yes, there is not a clear, free alternative. The license kicks in at 10MM in revenue, at that point there are certain things that are easier to pay than try to replace (Slack, Google Apps… ). Everyone has to make their own decision of where they focus their attention/money, in my case I have authorized for every one of my reports that has needed it and just moved on …
I’m mixed on this because I generally like paying for tools but this was too fast (institutional budget year pain) and Docker Mac was characterized by long-running unfixed bugs like high CPU usage. I think they really screwed up their business model and are burning through goodwill trying to recover but I wish they’d tried working with the community.
I mostly use the CLI so I gave Podman a try, and it took less than a single lunch to completely replace Docker for me. Performance is excellent for ARM, and acceptable for the x86 containers I infrequently use.
The problem is really more one of "we operated so long without paying and now, blam, everyone pays next year".
It'd be sort of like github deciding "You know what, everyone now needs to pay $7 a month/user for github". Perfectly within their rights, but also a little bit of whiplash for a large number of people.
The next question is whether this will last. There are already competitors to docker (rancher/podman/various k8s-on-my-box things).
It feels like a bait and switch. They got us hooked on a free product and then slapped us with a bill and only a short time to change all of our software to not use it if we didn't want to pay. Also the pricing jumped up the longer you waited to sign, so we had little time to evaluate Rancher and others.
I do agree it is a product that is potentially worth paying for. I do NOT think that the product is worth the amount we were suddenly forced to pay. It felt like extortion.
Last time I checked, it was completely possible to install Virtualbox and normal Docker CLI on a Mac, and everything worked well (for me), without Docker Desktop at all.
But that last time was on Intel Macs; maybe the situation is completely different on ARM.
It doesn't work on M1 last I checked. And anyway even if they updated Virtualbox for ARM, Oracle is predatory and sends nasty emails about paying up if they detect anyone in your company uses it.
I remember that the open-source Virtualbox part is free from any nagware. The closed-source parts are those needed for desktop integration (good screen, clipboard, file sharing, etc), but they are not needed for running a Linux VM with Docker (or some other container runtime BTW).
Except there is no secure docker runtime, and there never will be. If you want secure, you have to put it in a VM, which gives you a performance penalty again.
Secure means you can run arbitrary untrusted code, and webassembly can do that, and docker can't.
Oh shucks. Docker's value proposition is emphatically not security. If you want really good isolation, run a proper VM.
Docker's value proposition is convenient, reproducible, self-contained packaging of software. It's the ability to deploy pieces of existing, battle-tested, gnarly and imperfect software next to each other, and care not about their conflicting or missing dependencies. It's more like Flatpak or AppImage, only more popular and easy.
This packaging also includes a kind of network isolation: exposing only the desired ports, making it easy to have VLANs between containers that do not interfere, etc. This is, again, not a serious security mechanism, but more of a convenience, albeit a very valuable one.
> Docker's value proposition is emphatically not security [...] It's the ability to deploy [...] gnarly and imperfect software
This sounds like a recipe for disaster to me and is why I haven't gotten into Docker.
If the software being deployed is too complicated to build and install without Docker, but Docker doesn't provide secure isolation, how can you be sure that this "gnarly and imperfect" mess of a system is secure?
It's not about the software being complicated to install without Docker. It's about Docker making the installation process uniform no matter the software.
What docker provides is a way to shrink-wrap a given build and all of its runtime dependencies.
Despite popular misconception, what docker does not give you is a deterministic way to build that software. A Dockerfile provides the RUN steps necessary to build the software, but dependencies must still be fetched over the network, introducing non-determinism.
You can spend 6 months happily using the latest release of some image, only to find that there's a critical bug or vulnerability that needs addressing ASAP. It kind of sucks for that complacency to turn to terror when you try to patch the software and rebuild, only to find inscrutable errors due to an absolutely bonkers build system.
"ERR: Version A of Foo is incompatible with version B of Bar"
Okay... but what happened here? What versions were we pulling before? Oh, the precise version isn't pinned in the build system, so.. I dunno. Great. It would really help if I knew if it was A that updated breaking B, or the reverse.
Then multiply that by 1000x.
And then add in (for Debian-esque base images) Apt repositories disappearing over time, git feature branches being deleted, tarballs falling off the edge of the internet, etc.
Now, before someone says "well, that's on you if your Dockerfile obscures so much build non-determinism!"
I agree with that statement! But that is a non sequitur with respect to the original premise: the build systems (and the web of dependencies they pull in) in third-party software you don't have ownership of is getting crazier and crazier, and Docker helps perpetuate this state of affairs, and the industry suffers as a whole.
> Despite popular misconception, what docker does not give you is a deterministic way to build that software.
I read an interesting thought: reproducibility is a spectrum. Docker isn't as reproducible as nix, but when used with version control and ci/cd, it is a damn sight more reproducible than zipping up code and FTPing it to servers.
> how can you be sure that this "gnarly and imperfect" mess of a system is secure?
You can't.
But it's not like escaping a container is going to happen because of a simple bug. You need an exploitable vulnerability in the containerized app that creates a path to escaping the container.
But yeah, if you want to isolate an app for security reasons, then you need a VM.
Right, but if Docker allows us to package much more complicated applications (as opposed to being forced to simplify) that gives the bugs more room to hide and increases the risk of unforeseen interactions resulting in security bugs in the application.
So what I'm trying to say is that making complicated applications easier to deploy doesn't seem like a win unless you also mitigate the increased security risk that comes with more complicated applications.
Docker isn’t billed as a security solution but that doesn’t mean that it doesn’t come with some good defaults. You get some namespace/cgroup/seccomp/etc defaults OOTB which is still probably an improvement over what the organization currently has.
> Docker's value proposition is convenient, reproducible, self-contained packaging of software.
Docker doesn't solve any packaging problems, though. It just piggybacks off of other package management solutions and allows ad-hoc, unmanaged modifications to OS images (the convenience) and contains that result for easy distribution.
But nothing in that process ensures reproducibility: the package managers wrapped in Dockerfiles are typically non-deterministic; what any set of commands for them will do depends on the state of the internet at the time they run. Similarly, composing Docker layers is not like composing packages: reuse of packaged objects is minimal rather than maximal, granularity is coarse, dependency management details may vary from container to container (as they may be based on different Linux distros or language-specific package management ecosystems), and it's very easy to end up with software and configuration installed with which there is no associated package management metadata.
Docker doesn't know anything about packages. Container scanning tools that do things like produce a bill of materials or scan for known vulnerabilities inside Docker containers simply have to guess at what distro is installed inside the container and then reconstruct that information in a distro-specific way to the best of their ability! Docker doesn't solve package management issues so much as punt on them (which is, of course, convenient, because package management is hard).
> [Docker is] more like Flatpak [...] only more popular and easy.
I don't think this is a sound comparison, either. Flatpak is a desktop-oriented containerization solution, for packaging graphical software that will predictably need to interact with the local filesystem, GPU, sound, and other resources. It's also a solution that tackles security updates and deduplication in a serious (and effective) way involving some discipline and enriching shared runtimes with actual metadata rather than just composing filesystem layers together.
It may indeed be easier to crap out something which will be considered a valid Docker image than it is to crap out something which will be considered a valid Flatpak application, but that does not make Docker easier. It's not easier for a desktop user to keep a collection of 50 Docker containers patched for security fixes than it is for a desktop user to keep a collection of Flatpak applications patched for security fixes. It's not easier to take a random Docker container and plug it into your operating system's native file picker or sound system for use with graphical applications than it is to do so with a random Flatpak application. It's not easier to determine what the heck exactly is actually installed in a Docker container than it is to see what is in a Flatpak container. It is not easier to plug a random Docker application into your operating system's default password manager, and so on, and so on.
Writing Flatpak packages requires actually thinking about things that Docker doesn't because Flatpak actually solves package management issues (and other things) that Docker doesn't.
Downvotes without replies is always disappointing.
I would be curious to hear what is wrong in my comment above from anyone who has actually worked on general purpose packaging (e.g., written a package to be included in or overlaid onto a ports tree, maintained RPMs built from RPM spec files, run their own Ubuntu PPA, etc.), implemented tools that scan containers (e.g., SCA scanning or SBOM generation tools), or done reproducibility research.
Would someone with an awareness of full-fledged package management solutions based on or built with containers (e.g., Luet, Distri, Flatpak) really argue that having fine-grained abstractions for reasoning about dependencies or shared runtimes, performing security updates, etc., makes no difference as to what kind of software we're talking about and what problems it solves?
To me it seems obvious that
- not all software distribution mechanisms are package management solutions
- Docker cannot see or reckon with individual packages
- the Docker ecosystem relies on rather than replaces package managers, build systems, etc.
and so on. Are there serious arguments to be had here about those things, or do people just feel like my earlier comment was somehow unkind to Docker?
Docker containers solve the packaging-for-deployment problems that a ton of people used to have. Were they not solving it, they wouldn't be so predominant.
Docker does handle deduplication on a certain level: every layer is only built once, and shared among all images that use it. This can be strategically used to seriously reduce the combined size of your containers.
Desktop users are not the target audience of Docker, except if you consider running a sham prod configuration on your dev machine desktop use. Containers are intended for the server side, and they are fine there.
Not sharing too much, and plainly embracing the existing chaotic practices of software creation and containing them, so that they don't interfere with each other, is the core value proposition of Docker containers. They do not require you to change your existing key practices at a lower level; your Babel / CMake / pyenv / whatnot setup can remain. But it changes the deployment story of it.
There are serious attempts at secure container runtimes (see gVisor) and runtimes that run container images in a real VM (see Firecracker).
This meme that containers are inherently insecure just because Docker doesn't attempt to be a security product needs to die. Docker hasn't been the only player in the container runtime space for a long time.
The world does not rely on docker for security at all, cloud providers put each tenant of container runtimes behind an additional barrier, like a VM.
Secure runtimes are a superior solution to virtualisation and separate kernels. There are only two secure runtimes in common use - for JavaScript and WebAssembly.
The point about hardware support is very unfair - if you invest the same level of effort and hardware support into webassembly, you can also improve its performance. That's like complaining that electric cars suck because there are no chargers - it's just infra.
I agree that Docker actually has a relatively good track record, but it is true that Docker will never be on the same level as a VM hypervisor that was designed from the ground up to be a security barrier.
Thus, when people bring up "Docker is insecure" I try not to get down in the weeds arguing about the specifics, and instead point out alternative projects that are designed with security in mind. I find it's a much stronger counterargument.
How many folks out there need to run arbitrary untrusted code on their systems? I think most of the time, the code they run is either their own or from a trusted third party.
And for those use cases, where secure containment of running applications is not a goal at all (beyond maybe taking some basic precautions), I fail to see any value to recompile it to WASM, CLR, JVM, Z80 bytecode or whatever else.
Those who need isolation - yeah, WASM could be a very solid alternative to having a VM. But it's a pretty niche use case.
> you have to put it in a VM, which gives you a performance penalty
This claim is dubious for many configurations of hardware accelerated virtualization. The hardware creates another ring 0 for each guest kernel, and guests run at the same level as the host. It's true that layering things like filesystems and networking incur overhead, but it's just as easy to pass through a physical disk, and bridge virtual TAP interfaces to physical NICs.
Hardware virtualization is very flexible, and there's a configuration out there that will meet the performance requirements of the vast majority of projects.
A native code VM gives really little performance penalty.
Consider VMs like those which run Javascript, or WASM. JIT compilation can get pretty close to C performance, as the JVM and LuaJIT show, though, given enough RAM at runtime and money for development.
the crucial value proposition of Docker was to show megacap cloud providers that containers are a thing for developers and to build them first class into their cloud platforms -> kubernetes. I went to a GCP conference shortly after Docker went big and everyone was talking about it. It was no surprise that Google mentioned the word "container" 100s of times throughout the full day conference and never mentioned the name Docker once.
> The point of Docker is the ability to take the existing Rube-Goldberg-machine configurations of software, in any and many languages
Well, Docker is not good at this; your Docker image can be as non-reproducible as it gets, it just pushes the problem to a different level. Nix and other package managers are the actual solution to this issue.
I just wonder how many founders and co-founders fail to understand the crucial value proposition of their business. I suspect one attribute of successful startups is that over time they come to understand that aspect. And perhaps when we hear of startups "pivoting", that is not the result of a thought process but instead a forehead-slapping "why didn't I see that?" moment.
Nobody is saying WebSphere didn’t have benefits, and configuring it was likely easier than full-blown Kubernetes.
But the similarities are shallow at best and tied you into a single VM.
Saying “this whole trend is all about replicating application servers with WASM” makes it sound derivative. I mean, yes, but only yes in the same way “cloud computing is just replicating large remote shared mainframes”.
The problem is that WASM isn't really what you're saying. Yes, it's language agnostic. But as part of its agnosticism, it is also completely lacking in basically any runtime services. Yes, with WASI we get a POSIX-type API, but we're still lacking garbage collection, sophisticated memory management, optimized complex types, monitoring conventions etc.
This is great for running existing "native"-type code compiled from C/C++/Rust, but it's profoundly unsuited to the kind of development that most application or service developers do, which is in higher-level languages with automatic memory management, monitoring/profiling services, etc. All of which either have to be re-invented in the WASM world, or run inside the WASM container at 2x or more the runtime/energy cost. And for what benefit?
It's one thing to get a game engine running in a web browser. Neat hack / potentially decent way to ship a client.
It's another thing to try to repackage existing working, relatively well engineered, server runtime systems inside it for almost no benefit at all.
TL;DR: WASM is not a universal VM appropriate for server apps. It is a solution for shipping a certain kind of application in a certain circumstance. There are other, better, solutions for "containerizing" services.
Finally, after 25 years in this industry, the world I want to head to is higher level, where things are managed declaratively with explicit, visible, well described rules and relations and logic. WASM seems to me to push the other direction. Black boxes of fairly low-level code, each reinventing its own runtime wheel and with almost no visibility from the administration side of what's happening in there. I find that kind of sad.
GC semantics are highly specific and coupled to the language. Of course WASM couldn’t (and shouldn’t!) deal with that.
I mostly agree with your overall point though. I’m not making the point that WASM right now (or even later) is the future of deploying backend services, however it is somewhat of a universal VM. If it’s a useful one is yet to be seen.
Yes, of course they're highly specific. And that's my point. Why would I run a GC language inside my WASM runtime, at a 2-3x or more performance overhead? Instead of just running that languages' existing runtime which has been tuned for years, and already provides its own virtual machine? What is the actual benefit?
Putting it more clearly: Most services development is done in languages that have their own virtual machine (JS/TS, Go, JVM, Python, .NET). In what world does it make sense to run that VM inside another VM?
Finally, I think the experiences over the last 20 years around .NET and the JVM should have shown there is in fact not really such a thing as a truly universal abstract VM. A well-written VM tends to be written towards supporting the language(s) it is built for.
... Not unless you're willing to throw away almost all added value, and then you start looking like WASM. And then what's your value beyond native code, running on the hypervisor and/or in a container?
(There are in fact proposals for adding GC hooks in WASM. I'd have to spend some time reading up on them to evaluate whether they address my objections.)
Can you actually compile an arbitrary program to WASM? I thought they had to be ported to WASI first and can't use any operating system APIs. Otherwise, how can it be sandboxed? Arbitrary programs can call into arbitrary OS-native APIs and execute arbitrary native code outside the bounds of the WASM VM, including things like JITing native code.
That works by compiling some sort of Linux VM into WASM as well, which isn't arbitrary programs (that would have to include Windows and macOS programs too).
I feel like you’re nitpicking and ignoring the actual argument you originally made?
You can compile arbitrary programs to WASM, like you can compile arbitrary programs to x86 or ARM. It’s effectively a CPU target. You can take the whole of Python and SQLite, compile them to WASM and run a web framework on top of that via WASM in the browser[1].
If those programs utilise specific os-level behaviour that can’t be shimmed, or explicitly throw an error when being compiled to WASM then of course they won’t work without modifications.
Meanwhile, the JVM is not anything like a CPU target. It’s a very high-level VM designed for a particular type of gc’d and jitted language.
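To give a feel for the "it's effectively a CPU target" point: an ordinary Rust program like the sketch below compiles unchanged with something like `cargo build --target wasm32-wasi` and runs under a WASI runtime (e.g. `wasmtime run --dir . app.wasm`). The file name and word count are made up; programs needing OS behaviour that WASI doesn't cover are the ones that need porting.

```rust
use std::fs;

fn main() -> std::io::Result<()> {
    // Plain std I/O; under WASI these calls go through whatever directories
    // the runtime has chosen to expose (e.g. `--dir .` in wasmtime).
    let text = fs::read_to_string("notes.txt")?;
    let words = text.split_whitespace().count();
    println!("notes.txt contains {words} words");
    Ok(())
}
```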
If the best counterpoint I could find is a dead proof of concept project from over a decade ago that only works on an unsupported compiler released 17 years ago, I would reevaluate my position
You realize nestedvm is compiling native code to MIPS and then inlining the MIPS interpreter into the generated class files? Had the JVM supported unsigned types, none of this would have been necessary.
Wasm is a refinement of the ideas in the JVM, with a couple good JVM on Wasm solutions already existing. The best of which is CheerpJ.
Instead of referencing NestedVM, you should link to GraalVM which I know you are aware of.
Unsigned types are trivially emulated using signed types. All arithmetic operations except division/remainder and comparison are identical on the bit level, and the latter are supported via `Long.divideUnsigned()` and friends, which can JIT to the native unsigned operations of the underlying platform.
The main difference with regard to compiling “arbitrary” programs between the JVM and WASM is that the JVM doesn’t have untyped linear memory like WASM does. WASM isn’t type-safe within the linear-memory regions (which is what allows C-like languages to be compiled more directly), whereas JVM ensures type safety for all objects.
Saying that “WASM is a refinement of the ideas in the JVM” is simply wrong, they have different design goals and therefore implement different design trade-offs.
> JVM and WASM [sic] is that the JVM doesn’t have untyped linear memory like WASM does
As you say, it can be trivially emulated; there's no reason linear memory couldn't be implemented on the JVM using an array of a primitive type (char, int, long).
The Wasm VM is more generic and has a better capabilities security model than the JVM. Had the JVM been more like Wasm (signed and unsigned primitives, capabilities model), it would have been a natural compilation target for sandboxing native code.
As Wasm gets more features it becomes more like the JVM (GC, reference types, component model). Eventually Wasm will subsume all the features that differentiate them.
WASM is pretty different to the JVM. The JVM deals with a lot of higher level constructs like objects, constructors, virtual methods, the GC etc. Which is fine if the language you’re hosting works in that way.
WASM is more like assembly - the raw intrinsics used by a hypothetical WASM CPU. So you can compile a lot more stuff to it, because it’s a more natural target than a much, much higher level VM like the JVM.
Although that's true (ish... you can run WASM on the JVM now, and also LLVM bitcode), that takes you into the realm of why you'd want to.
On the browser side, you could maybe make a ChromeOS style argument of just wanting to run everything in a browser no matter what, it's nice for it to be sandboxed etc. But if you look at what your Postgres is doing it's sort of a Linux VM, and you can run those already at full speed: that's WSL and there are various solutions on macOS like Docker Desktop.
On the server side, the reason nobody bothered trying to run Postgres on the JVM before Graal is that it's unclear what the benefit is. Security? Processes and kernel isolation seem to work well enough on the server, and Postgres is a highly trusted component anyway. CPU independence? That hasn't been useful since stuff like SPARC died; we are now just starting to see ARM chips appear on the server, but compiling native code for both Intel and ARM isn't hard, and Linux distros have had the infrastructure to do it for a long time already. For custom servers, well, not many are writing them in C/C++/Rust anyway these days. RISC-V remains mostly theoretical.
To what extent is this doing it because it can be done, vs delivering real benefits? My mind is open but the low level nature of the WASM instruction set also means the VM can't offer many benefits to the programs inside it, nor the users outside it.
The JVM has historically and famously sucked at sandboxing untrusted/partially trusted code. The JVM also isn’t a suitable compilation target for arbitrary and existing codebases.
WASM is built to sandbox untrusted/partially trusted code. WASM is built as a compilation target rather than a complete hosted VM with bells and whistles.
There are advantages and disadvantages to this. One advantage is language-agnosticism. Another might be around the ability to run user-supplied untrusted code in a much safer way. See Cloudflare functions.
One disadvantage is the lack of bells and whistles.
So, the answer to “why is this different to the JVM” is that.
"The JVM has historically and famously sucked at sandboxing untrusted/partially trusted code. The JVM also isn’t a suitable compilation target for arbitrary and existing codebases."
We need to drill into this a bit more, because the WASM ecosystem can certainly learn lessons and do better than the JVM but this isn't quite the right set of lessons to learn.
The JVM spec was written from day one to sandbox arbitrary and partially trusted code. The SecurityManager architecture is now being removed, but that's a big project exactly because it was deeply integrated into everything. It's also embeddable and a compilation target (it interprets bytecode), and later it was extended with features designed to make it more language agnostic as well at least for everything at the same level of dynamism of Java or higher (so indeed not C/C++ but yes for most other langs).
So the interesting thing to debate here is not really the design goals, which are very similar, but what concretely will make WASM more successful at achieving these goals.
For example, what made the SecurityManager difficult wasn't something fundamental to the JVM but rather that code which is both useful and sandboxed needs APIs that let it do privileged things in controlled ways, and a lot of the bugs were in those API implementations or on the boundaries. That's especially the case on the desktop where sandboxed code had to call into a lot of OS libraries.
On the web this problem is solved with the browser makers exposing JS/renderers to WASM or just saying do the tricky stuff in JS, and then wrapping the whole thing with kernel sandboxes and IPC to handle the fact that the JS/WASM sandbox itself will inevitably fail in the same way. In other contexts this, well, doesn't seem to be solved, really? The moment you start exposing APIs to the WASM code you face the same problem. Also these days you have spectre to think about, so maybe you need a separate process sandbox anyway. Alternatively you can do what the GraalVM guys are doing (in their EE) and using Intel MPKs but that's pretty advanced and I didn't hear about anyone else doing that.
Now there are still some crucial differences! The JVM wanted to allow sandboxed code to interop smoothly with higher privileged code. This opened up a bunch of reflection-based bugs whereby you could reflect your way to the SecurityManager and switch it off, confused deputy attacks and so on. There's no equivalent in WASM, but that's partly because (current?) WASM doesn't really try to define a fine grained permissions model or a way to mark some bits of code in a program as more privileged than others. It also doesn't provide a large set of pre-implemented APIs. The sandbox is whatever the developer exposes to the context. This doesn't make WASM stronger, it just means it punts the really hard bits to the user i.e. browser devs. GraalVM's new JVM sandbox (not SecurityManager based) works mostly the same way, as do process sandboxes so this is definitely the trend, but of course there was a reason the SecurityManager was created that way and it's because it requires way more code and work by the developer to sandbox code if you don't have support for tight mixing. So maybe the sandboxes that do exist will be stronger, but there'll be less sandboxing overall and permission scopes will be much wider. Is that the right tradeoff? I'm not totally sure it is but eh, people really like all-or-nothing and that's the way the industry is heading now.
At any rate that discussion is a bit academic, because you can't do an OOP capabilities type architecture in C anyway.
What about language agnosticism? Again the hard part here isn't having a common bytecode - CPUs already provide that - it's all the engine bindings and semantic alignment required. If you want to pass a std::time into JavaScript then something has to bridge that gap, if you want to call into a dynamically typed language from a statically typed language, then something has to generate interfaces for the compiler to check against and so on. Here I don't see what WASM has to do with anything really, it's just not in scope. Compiling a Python interpreter to WASM doesn't make it any easier to call Python from C++, or Java, or JavaScript. The SOTA there is Truffle/JVM by far.
Wasm composes in the way that the JVM/SecurityManager did not. You can implement your proxy functions in wasm: not only do you define the set of functions that a Wasm program uses to talk to the outside world, the proxy functions you pass in can themselves be implemented in wasm inside of another context.
Wasm can learn from the JVM for sure, but if you went almost 30 years back in time, you would have some great tricks to teach the JVM, learned from Wasm.
I don’t really have much else to add to this discussion right now (it would require too much brain power whilst sitting next to a fire), but I would like to thank you for this very detailed and informative comment.
The problem isn't that JVM deals with higher level constructs. It's that it doesn't deal with low-level constructs.
Consider .NET / CLR for another example. Like JVM, it also deals with objects, methods etc conceptually on bytecode level. But it also deals with raw pointers and pointer arithmetic and other such stuff. As a result, you can efficiently compile e.g. C into that bytecode. So wasm isn't really new in that sense, either.
JVM wasn’t just for Java developers (Kotlin, Clojure, Ruby could compile to it); but it was easiest for Java, and other languages didn’t elect to build on the JVM interface by default, rather adopting POSIX (and either a custom bytecode or assembly) as the interface.
WASM likely won’t convince all languages to switch to its bytecode by default. Its adoption story requires enough people to maintain this non-standard compilation pipeline.
Which is fine, but beyond the bytecode, the productivity will only be maintained if it offers an equal or superior interface to POSIX. Right now, WASI is not it. A lot of basic elements are experimental, like file seek, multi-process, threads, SIMD, GPU programming…
People will believe the hype, miss the caveats, use it at work, and have their project fail. Companies will blacklist the technology.
In my mind, it is too early for them to be so publicly dithyrambic. All WASM publications should link to a page that details all WASI features that are still experimental.
"JVM but for everyone who isn't a Java developer" would be good but WebAssembly isn't really suited for it. It works for some static native-code targeting languges (C, C++, Rust, Zig, etc) but doesn't work well for languages that most application developers use (.NET languages, TypeScript/JavaScript, Python, Clojure, Java, Ruby, etc).
> "JVM but for everyone who isn't a Java developer" would be good but WebAssembly isn't really suited for it. It works for some static native-code targeting languges (C, C++, Rust, Zig, etc) but doesn't work well for languages that most application developers use (.NET languages, TypeScript/JavaScript, Python, Clojure, Java, Ruby, etc)
.NET/Java/Clojure, maybe not. Ruby and Python work via an interpreter for the platform, just like they do on the JVM/.NET, except that, unlike the JVM/.NET, in both cases they can use the normal C-based interpreter, compiled for WASM.
You can do it, but it remains to be seen if we get a lot of use cases where there are more positive than negative tradeoffs from targeting a VM implementation to WebAssembly and then running a VM inside a VM that way. I guess eg Python could be useful in cases where you want to use it as a glue language between WebAssembly components...
One way of looking at it that helped me wrap my head of why “this time is different”, is that Wasm is not so much as a language but a compilation target (as say x86) so it can really run anything
That’s what JVM promised! They just made Java so there would be at least one familiar-looking (to C++ devs) language for it. Indeed, there are quite a few JVM languages out there, just not as many as some had hoped in 1995.
The JVM executes Java bytecode, which is a compilation target for Java and many other languages. In this regard, architecturally it is exactly the same.
Kind of - it was designed for Java so other languages were suboptimal for a long time, especially dynamic ones. Combined with the low quality of Java application servers and tooling, that approach was unpopular by the time things like InvokeDynamic matured and then Oracle’s licensing moves gave a lot of places reason for caution.
The biggest problem when compiling to JVM bytecode isn't dynamic languages - even if they are slower, it's tolerable. It's stuff like C++, where the whole point is being fast, but JVM simply lacks the necessary primitives to compile to.
It can run "anything" ... so long as someone has set up that project to correctly compile to a wasm target. "docker build" lets you build a package out of any software, without having to know much about it. "setting up a compiler for a new project, given the source code and maybe a separate toolchain for some other target that works", is a much more involved task.
There is no world where people are just grabbing an existing app and saying "hey, I'm gonna drop this into my wasm runtime real quick"
Well, having essentially a VM that was supposedly designed to be language-agnostic and easy to sandbox is a benefit over a VM that is designed for single language and never put much thought into embedding.
The yaml for my home test cluster takes up more lines than the full orchestrator and config I wrote at one of my past jobs to manage a 1000 VM cluster. Kubernetes might well make sense "at scale" but for anything of moderate size I can't help but feel it's massive overkill.
Literally every single one of those are well supported by Kubernetes and YAML.
1 and 3 are actually foundational to how k8s works.
Forgive me for saying this, but I’m getting a “i don’t want to invest any effort understanding anything and so Kubernetes is bad” vibes from these comments.
Hey, so I thought I remembered your username. This isn’t the first interaction we’ve had, or I’ve seen you have, that follows this similar pattern. In fact it’s the third example from you under this post!
It’s not a particularly pleasant experience to discuss anything with you, as after you make a particularly vapid comment that is naturally rebuffed you seem to just try to make snarky replies rather than engage.
Please understand that if you post your hot takes here they may be discussed and challenged, and if you don’t want this then I would refrain from initially commenting.
In response to your comment: They do. All Kubernetes resources are typed with JSON-schema definitions. Because of course they are, how else would kubernetes validate anything. https://kubernetesjsonschema.dev/
Anyone who’s used k8s at all knows this, if only from the error messages. From this you get autocompletion and a wide ecosystem of gui configuration tools that work with everything, including custom resource definitions, which is really cool.
Yes, your lack of knowledge was never in question.
> The Dunning–Kruger effect is a cognitive bias whereby people with low ability, expertise, or experience regarding a certain type of task or area of knowledge tend to overestimate their ability or knowledge
I'm perfectly capable of writing correct code by hand (be it XML or YAML), my problem is with the contents of the configuration files. Especially for the legacy Java EE servers there's loads of boilerplate stuff that has to be there for some reason but you don't have any idea why. Until something breaks and somewhere a magic setting has to be 'fixed'...
“Bad” is a value judgement. I’d go more with “taking on problems we don’t have” - J2EE made more sense if you were in a business like Atlassian’s where you sell an app to customers who wants to be able to run in it a bunch of different ways and configure many things without access to the source code. That especially made sense in the older era where apps were managed by sysadmins who didn’t have compilers and wouldn’t have gotten the source code from a vendor or wanted to pull it out of CVS.
That’s legitimate but, especially for open source developers, overkill and it pulls in a ton of maintenance work (e.g. I have several apps with a ton of CVEs in components which were never used but can’t be removed without breaking things, and thousands of lines of XML configuration which has to be analyzed to look for intentional customization to upgrade to the release version which has been patched. Now, maybe that’s doing it wrong but that’s multiple separate Java specialist development shops so it’s not just me being dense.).
Part of that is scope (Kubernetes does so much more) but most of it is the benefits of decades of experience and having a clean design. Java application servers were designed by and for Java applications so they blur a lot of lines whereas a Java (or Python or Node, etc.) developer can use Kubernetes without learning any Go tools.
You should really try the k8s plugin in intellij. It does all of the things OP asked for without adding a single extra line of complexity to the yaml file.
If this statement is true, there is no need for Docker, because JVM+JAR existed in 2008! Docker can do more than WASM or JVM+JAR: it can run non-WASM and non-JVM apps like PostgreSQL, etc...
You can compile any C / C++ app down to wasm. In fact that’s the raison d’être of the technology: to provide a portable safe way to run binaries. Here is a link to Postgres in wasm for instance: https://supabase.com/blog/postgres-wasm.
The way it works is that instead of outputting assembly for a given architecture, the compiler backend outputs wasm instructions that are designed to map onto any architecture, unlike the JVM, which was designed primarily with Java as the front end.
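To make that concrete with a sketch of my own (not from the article): a pure C function like the one below compiles to wasm with something along the lines of emcc add.c -o add.wasm, or clang with --target=wasm32; exact flags depend on the toolchain and version.

    /* add.c: no OS dependencies, so it maps cleanly onto wasm
       instructions regardless of the host architecture. */
    int add(int a, int b) {
        return a + b;
    }

The moment the code starts calling into the OS (files, sockets, processes) the story gets messier, of course.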
C/C++ code can certainly be compiled down to WASM, but you cannot interface with the operating system as you would in a normal C/C++ program. To get around that restriction postgres-wasm ships an entire Linux distribution that is run inside the browser. This comes with an immense performance penalty.
To get an impression of the performance penalty, just run the following query:
SELECT SUM(i) FROM generate_series(0, 1000000, 1) tbl(i);
This simple query completes in 100ms locally on my laptop, but takes 17265ms in postgres-wasm. That is a slowdown of 170x.
Now that is not WASM's fault - when running the same query in duckdb-wasm [1] on my laptop the query takes 10ms using WASM, and 5ms when run locally, with a slow-down of only a factor of 2. But in order to achieve those results we did have to adapt the DuckDB codebase to compile natively to WASM. That is absolutely possible but it does take engineering effort - particularly when it comes to larger older projects that are not designed from the ground up with this in mind.
This is incorrect, there is a long list of limitations that your C/C++ code must conform to in order to compile to WASM. There's a whole section dedicated to this in the Emscripten docs: https://emscripten.org/docs/porting/index.html.
The chances your existing C/C++ app will compile to WASM and run correctly are much smaller than with Docker. However, the chances your WASM-compiled code will be able to run in a browser are much higher than with Docker (which is the real "killer use case" IMO).
> not like the jvm which is designed primarily for Java in the front end
This is also wrong, JVM bytecode is explicitly designed to be polylingual and is the compilation target for many non-Java languages like Scala, Kotlin, and Clojure. WASM being a compilation target is not what makes it unique from the JVM.
Ironically you actually can run LLVM bitcode on the JVM these days, including inside an optional sandbox. It can run "anything" in the unsandboxed mode (as long as it can compile with LLVM), because it allows native calls out to the OS. So the universal [J]VM vision has actually happened, it's just nobody really knows about it.
LLVM bitcode isn't all that portable though, and of course, the binary will still be OS specific because C/C++ code relies on native APIs. The primary reason to do this is so the Graal JIT compiler can optimize native code and higher level dynamic script/bytecode together and remove interop overhead.
However, you can theoretically run whole programs this way inside the Graal sandbox. If you do that you get an emulation of POSIX that is reimplemented on top of the Java standard library, so the code becomes portable, and in managed mode there's an additional party trick - the native C/C++ malloc is replaced with garbage collected allocations and memory accesses are bounds checked. So code run this way gets all the memory safety errors blocked automatically. This upgrade comes with two costs though, one is slower execution/more memory usage, and the other is you have to buy GraalVM EE. The community edition can run bitcode, but not in the sandboxed/managed mode.
Oh, and GraalVM can also run WASM. So you can have your cake and eat it too, with everything running together via their 'polyglot' interop system.
In Emscripten, that uses a pthread implementation layer built on top of Web Workers plus a shared wasm Memory. Basically memory is shared, you have atomic instructions, and each thread of execution gets its own Web Worker.
That has some limitations, but for the most part it works just like you would expect pthreads to.
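As a rough sketch (my own example, assuming the -pthread and -sPTHREAD_POOL_SIZE flags described in the Emscripten docs; check the version you're on), it really is ordinary pthreads code that ends up running on those Worker-backed threads:

    /* threads.c: built with something like
       emcc -pthread -sPTHREAD_POOL_SIZE=2 threads.c -o threads.html */
    #include <pthread.h>
    #include <stdio.h>

    static void *worker(void *arg) {
        printf("hello from a Web Worker-backed pthread\n");
        return NULL;
    }

    int main(void) {
        pthread_t t;
        /* With a pre-allocated pool this thread can start immediately;
           see the caveats about the main thread below. */
        pthread_create(&t, NULL, worker, NULL);
        pthread_join(&t, NULL);
        return 0;
    }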
The main thread is a little special on the Web since it can't block, which can cause issues (like pthread_create doesn't immediately create an available pthread). There are workarounds for most of those issues (like pre-allocating a threadpool), and many applications work well, but sometimes not out of the box. See
> JVM can only run apps written for it: Docker & WASM don't have that limitation.
Of course they do. You can only run apps on WASM that have been compiled to WASM bytecode. You can only run apps on Docker that have been compiled to whatever bytecode is supported by the container runtime (which can be x86, ARM, x64, etc.).
> but WASM can run Postgres
WASM can run WASM-compiled Postgres. It has to be specifically compiled for WASM, which also means it generally needs to be ported to WASM first (as WASM runtimes have a lot of limitations that arbitrary C programs probably don't conform to).
JVM can run apps _compiled_ for it. WASM has the _exact same_ limitation.
Someone put in the effort to get Postgres to compile for WASM. That's great :) Maybe someday every application will compile to WASM as the preferred choice over the linux interface.
Compiling apps for different targets is VERY MUCH not a simple, low-effort task though. Something like a database, which has an incredible number of optimizations in the way it makes syscalls, will need a full stream of work to keep each target running well.
It can be done. But if "one of these things is not like the other" with your three things - Docker is the odd duck out.
Docker already has the community, tools and distribution layer. You can think of Wasm as another kind of artifact that can be run in Docker as it is today. You continue using the same tools, with a new way of packaging and running applications / runtimes.
Maybe I'm misunderstanding, but my thought was he was suggesting that the host/VM could be written in WASM, which could then run any arbitrary thing as Docker does today.
Why do you need JNI? You could just compile C-like code to run on the JVM using a byte[] array as the equivalent of memory. This is similar to how high-performance Java code is written already, to ensure that GC is completely out of the way.
In fact GCC used to have a JVM target, called GCJ. It was removed due to lack of maintenance.
That doesn't work. Java in the browser has failed; nobody uses it anymore. So even if we used JVM bytecode as the intermediary language, you would need to translate it into asm.js, and then you've introduced arbitrary complexity for literally no reason without the performance benefit of WebAssembly.
The browser engine developers effectively paid a huge up-front fixed cost, and now anyone can use WebAssembly in Node.js, which then spills over into more and more use cases, like cryptocurrencies using a modified WASM VM for their zk sync layer-two solution.
"We put all eggs in WASM basked, please adopt it!" - guy that founded a thing the rest of industry just did better before he was able to capitalize on it.
Well, I know about BSD jails and Solaris (later Illumos, etc) zones. How easy were they to deploy to an average cloud? How easy was it to reproducibly build and distribute them?
Or what else would you offer as a better docker alternative from 10 years ago?
They weren't, like, at all (yes, I have tried them). The Dockerfile for repeatable (enough) image builds, and the simple command line for running a container without having to mess with writing a config for some init system, are really the killer features of Docker.
Oh, the crimes against engineering that were required to run chroots and such; I don't miss those. Even LXC/LXD was a bit janky, desperately trying to be a "light sorta VM".
Well, if you had actually read the whole sentence instead of bluescreening in the middle of it and deciding to comment on half a sentence, which changes its meaning,
> the rest of industry just did better before he was able to capitalize on it.
you'd maybe figure out that I was talking about k8s and such picking a container format and ditching the rest of things Docker made. Not stuff that came before.
Docker as a company got relegated to "a repository", which they decided to monetize, so people started going around that too.
Hey! A WasmLabs team member here :). We're planning to port several runtimes as part of our WebAssembly Language Runtimes initiative [1]. Porting things to Wasm+WASI is sometimes challenging. There are some deep-dives in our blog around this topic [2].
I'll start by saying I'm a big fan of wasm but just a couple of comments about the articles.
I really wish the WebAssembly community would stop quoting that tweet from Solomon Hykes. It's taken somewhat out of context, and while at the time it was a big shoutout to the wasm underdog, it's now a bit overused. Wasm really needs to justify itself with more than just a tweet, and I think that it can.
"Polyglot - 40+ languages can be compiled to Wasm" is a bit misleading which anyone would see if you listed the 40+ languages. A lot of them are going to be very obscure ones and a lot of the popular ones are on that list. Factually correct but you're just setting people up to be disappointed. "40+ languages oh boy!.....What the heck is Zig? and no Python?!" (yes, I know you can sort of run Python if you run the entire interpreter)
Right now it's tricky and looks like a Rube Goldberg machine to get many things running. However, it is getting easier and easier to do so. Eventually you will not have to do the compilation yourself; there will be plenty of Wasm binaries ready to use. This will be similar to Linux: most people don't compile apps from source, they just use the distro package manager.
Docker containers will still be far more efficient than something which has to be interpreted in a VM. The real advantage of WASM/WASI is architecture independence - and perhaps some kind of trusted verification of proof assertions in the code, which would of course be easier in a VM compared to trying to verify actual binary assembly.
Wasm doesn't need to be interpreted, it's designed to be compatible with streaming compilers to produce machine code as the file is being downloaded, which can then be optimised more completely later on. In a Docker-like situation, I would imagine that you can skip the first compilation step altogether and just generate optimised code.
I suspect that still won't be as optimal as if you'd compiled the application for the target architecture in the first place, but for most applications I would suspect the performance difference will be relatively negligible.
There are some other respects in which Docker cannot be replaced.
For example, making sure a build works across different platforms and machines. There are too many ways that something may break: incorrect SDK versions, missing dependencies, etc. Docker makes sure the OS (container) is set up correctly to handle the build.
> For example, making sure a build works across different platforms and machines
Unfortunately Docker only works on Linux, since it's tied to specific syscalls. Those on other platforms (e.g. macOS) can only run it in a VM (e.g. the Docker Desktop application is built on top of a VM running Linux)
> There are too many ways that something may break: incorrect SDK versions, missing dependencies, etc.
AFAIK Docker doesn't actually address that. It provides a "Dockerfile", which is essentially just a shell script; users still have to manage dependencies themselves, e.g. by having their Dockerfile invoke an actual package manager.
> Docker makes sure the OS (container) is set up correctly to handle the build.
Containers aren't operating systems; they only need the desired executable, plus its run-time dependencies (e.g. libc).
> Docker can actually create and manage Windows containers in addition to Linux ones.
The reason why I’m bullish on WASM is the hope that there will not be a “Linux container” or a “x86 binary”, but only universal binaries and libraries. Unlike the JVM though, they’ll be sandboxed and don’t impose a GC with 200 tuning parameters. Lastly, there is language interop on an FFI level between any two languages that bind against WASM. In short, it’s an interop dream of mine.
That doesn’t mean I endorse premature shilling of everything WASM. That can do more harm than good.
WASM performance just isn't there. At scale, evaluating the costs just made me shudder.
For server scenarios, no, this just isn't it.
The JVM does a lot of very useful things that WA does not try to handle. Java is not going anywhere and is still going to need a GC. The WA runtime does _not_ handle memory safety inside of the app at all as of today...
> It should be safe to assume that Dockerfile create the same (or really similar) image everytime
Nope. Most of the Dockerfiles I've seen will do wildly-unreproducible things, like running apt/pip/yum/npm/mvn/sbt/etc. without even giving any version numbers (let alone expected hashes)
I once imagined a time in a far, faraway land where the new OS secretly in development was nothing more than a thin interface between the hardware and the software. And the software was a VM. And this was codenamed Fuchsia. And was being worked on by Google. They took away the lessons learned from ChromeOS with its LXC containers and Android Container. And realized the new OSs of the future can be anything and everything for anyone and everyone. And opening 35 applications meant running 35 different VMs made of 17 unique OSs and this was called a software's full-stack. And then I would check the memory usage only to be horrified that my 128 GB RAM was nearly full, and RAM was just not enough. Then I snapped out of this nightmare.
Are we intentionally not thinking about RAM usage in this dystopian world where we celebrate WASM-Docker progress without thinking of the drawbacks: memory inefficiencies?
Actually, Wasm is moving in the direction you are pointing at. A Wasm runtime should add only a little overhead on top of the requirements of the Wasm module itself.
However, it's true Wasm is not at that point yet. There are open threads about deallocating Wasm memory [1]. I expect these features, as well as Garbage Collection [2], will come to the standard over time. This will allow modules and runtimes to properly manage memory usage.
If I run two programs with the same shared libraries, the nonrelocated parts are not duplicated in memory (and if I fork, the relocated parts aren't either). Does wasm map shared memory from disk like this, for the translated executable code?
Fuchsia natively does not use VMs for isolation (although there might be some compat effort that tries to do that). In fact, fuchsia is shipped to one of the lowest-end devices on the market - an old gen smart display. It also uses content addressable FS to deduplicate shared deps. Remember, nightmares aren’t real.
Despite the sandboxing, one still cannot run untrusted WASM code in the same process as trusted code, due to hardware bugs. CPU vendors are not going to fix those anytime soon; their message is to always use separate address spaces for security isolation.
And since one needs an external process in any case, native containers win, as they are faster than WASM by a factor of two.
EDIT:
It does not even make sense to use WASM inside a native container as an extra security layer. With the overhead of WASM one can just put a container inside a VM and still run things faster.
Spectre is not going to be fixed for code within the same address space, and it allows untrusted code to read all process memory. Google tried to protect against that in V8, but they mostly gave up as there were way too many ways to affect the cache.
If your threat model includes hardware bugs, then a container doesn't really help, no? You can't really trust your containers without sandboxing them, and then you're killing your performance anyway.
A heavily sandboxed container has an overhead of a few percent. A hardware VM slows things down by 10-20% for a typical application. So even combining a VM with a container will still be significantly faster than WASM.
For the runtime, yes. But the cost of sending information into and out of a VM/container versus staying in the same process is costly, especially for small amounts of computation.
And you're also comparing decades of VM and container investment to a handful of years of investment in WASM. WASM code today will run faster by a huge margin in a few years as the compilers improve.
But moreover, most folks don't care about hardware bugs letting untrusted code break out of a sandbox. Bugs have been letting code break out of VMs, even, for years. If a hardware bug is discovered, you install the microcode update or kernel patch and move on.
Which is to say, the performance isn't the reason for choosing WASM. It's good enough, in many cases. Being able to write a hundred or two lines of code to get pretty-fast and pretty-damn-secure sandboxing without needing to waste your time setting up and maintaining an elaborate breakfast machine of VMs and containers is the draw.
This totally missed the point. I use Docker where the compilation story (and cross-compilation story) is a mess (looking at you, Python) and I don't have the resources to figure it all out. With Docker, I can get a portable image working in a few hours. It's a hack, but it's a convenient one. WASM does not offer this.
I am bullish on WASM because, technical merits aside, it is in the browser, and so it will be widely used because everyone knows it will be widely used. JS now runs on, or is a compilation source for, everything: embedded, frontend, backend, edge, mobile. WASM will be the same.
In addition with so many compile to JS technologies and chains, WASM is sort of another choice. Not a big deal for a team to choose it.
I don’t know enough about will it replace docker. But a lot of docker use cases are a bit of a leaky abstraction over what you are trying to achieve. For example why do I need to know what Alpine Linux is in order to run a node app? OK there is a node image that hides this detail but barely, you end up having to think about this sort of stuff.
It’s pretty cool to write a 3D game in godot, and see a browser run it like it’s nothin. It answers the question “but can it run linux/doom?” easily.
Does it have the marketshare to make AWS make significantly different decisions? Remains to be seen. I guess people paid money for serverless. Could go that way
Edit: I never thought node.js would take over half of the information sphere, so read me more like a graybeard who is too young to be one.
Yes it depends on your philosophy. Lego vs. bespoke art would be my analogy. No right answer, it depends on what you want to build. I think a lot of what I might call uncharitably “webshit” which is the run of the mill stuff that is still very valuable for solving problems can run on turnkey containers.
This seems backwards to me. Docker is built on Linux process controls and requires the Linux kernel. I believe Docker on MacOS/Windows requires a Linux VM.
> Unfortunately, one of the challenges of running Docker on macOS or Windows is that these Linux primitives are unavailable. Docker Desktop goes to great lengths to emulate them without modifying the user experience of running containers. It runs a (light) Linux VM to host the Docker daemon with additional "glue" helpers to connect the Docker client, running on the host, to that VM.
WASM and WASI would actually be a cross compatibility story because you could compile to an architecture and syscall interface that is platform agnostic.
> [...] take a look at WebAssembly as the 'successor' to containers and the next logical step in infrastructure deployment [and so on about containers]
Surely it replaces/is an alternative to images, not containers? If I have a wasm binary, there's still value in specifying the environment in which it runs, volumes it has access to, networking, etc.?
It seems this is replacing both, in that the wasm module is not run inside of a traditional linux container (at least as far as cgroups go).
> Each traditional container gets its own control group as in docker/ee44.... On the other hand, Wasm containers are included as part of the podruntime/docker control group and one can indirectly observe their CPU or Memory consumption.
One important thing about containers is that they isolate the process and it can not access files it is not explicitly allowed to.
If I'm getting this right, WASI is basically just POSIX for WASM. This means that it does not provide the kind of sandboxing that, for example, Deno does. When running a Deno program, you have to actively allow network access or write access to the disk. It uses the built-in stuff from V8 for that.
Any idea why they did not include these kinds of permissions in the WASI standard? It seems like WASI was not designed to be run without some sandbox.
That's how WASI is designed. You need to explicitly mount a specific folder so it can be accessed by the module. Sockets support is not ready yet; that's the reason there's no specific limitation around networking.
For me, the most interesting part is the component-model. It's still a proposal, but it will allow developers to specify the permissions for other modules (libraries) a main Wasm module may use. With this, you can give access to a folder to a module and that module may call another one without giving them those permissions. In other ecosystems, any library used by the "main" logic gets the same permissions.
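From the guest's point of view it just looks like normal file I/O; the capability is granted (or not) by the host. A minimal sketch of my own, assuming a WASI build (e.g. wasi-sdk clang) and an invocation along the lines of wasmtime --dir=. guest.wasm (flag syntax varies by runtime and version):

    /* guest.c: reads a file from a directory the host has pre-opened. */
    #include <stdio.h>

    int main(void) {
        /* Works only if the host granted access to the directory
           containing data.txt; any path outside the pre-opened set fails,
           since there is no ambient filesystem access to fall back on. */
        FILE *f = fopen("data.txt", "r");
        if (!f) {
            perror("fopen");
            return 1;
        }
        char line[128];
        if (fgets(line, sizeof line, f))
            printf("first line: %s", line);
        fclose(f);
        return 0;
    }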
But couldn't that lead to a lot of permission errors?
If I have a complex program with a lot of dependencies.
What happens if one of the dependencies suddenly tries to write to `~/.local/dep_name/cache.raw` and I did not explicitly allow it to do so, since I didn't know it needs that location?
In docker it would just create that folder in its own volume and the volume is deleted after the container is removed (if the volume is not named).
If I understood correctly from your comment, each WASI-runtime does not have a corresponding filesystem/volume.
Why can't this be done with native code and sandboxing? Native code in a sandbox doesn't require a VM, and there is a big perf cost for wasm. Is it just for one extra layer of security?
You have to remember that WASM comes from web browsers. It is meant to be transferred across the web and run in the browser on the target computer. It isn't very practical to compile everything to every architecture/OS combination and serve the correct implementation.
Yeah but that's not the context of this use. Why are they taking something made for browsers (where architecture neutrality is important) and using it for sandboxing, instead of faster sandboxed (not VMed) native?
They are doing it for the same reason people keep ripping JS engines out of browsers: there has been so much work done to make them fast, harden the sandbox, and port them to different platforms. It's free real estate.
How about Kubernetes, in other words, how does this scale (I understand the single process proposition, but can't see how it replaces multiple containers, which might be deployed on multiple VMs / hardware)? In other words, what is the WASM runtime running on? In the article they show a WASI layer, but that does not replace a VM / container (AFAIK), so you still need an OS to run on. I'm a bit puzzled.
EDIT: let me rephrase my question. In a docker container you can have your libraries and dependencies independent of the host system (e.g. in your container you need libc 1.0, whereas your host system has libc 2.0). This is possible because the docker container actually contains a copy of libc 1.0 if you set it up that way. But in the case of WebAssembly this is no longer the case (this is what makes it possible to have smaller images).
But then you need not only the kernel from the host, but everything else around it. Unless your code does not depend on anything, or you compile ALL your dependencies to WebAssembly, which sounds interesting - I'm not saying it is not possible, but is this how it should work?
WebAssembly is a binary format executed in a virtual machine. By default, execution is isolated from the host OS, so there is no concept of a syscall to the OS directly from your WebAssembly module. The WASM module calls certain imported functions (the WASI layer) whose implementations are provided by the runtime. The runtime then gets to decide whether and how a call that would have been a direct syscall on a real OS is actually mapped to the host. It might map it to a real syscall, or the Wasmtime runtime could be running in an embedded environment where there is no OS as we might otherwise assume.
WebAssembly also allows for powerful constructs like the component model, where components written in different languages can interact with each other.
Yeah, unsure how this replaces Compose or how it would work in pods. Is there some kind of runtime planned to replace containerd so that you still get all the k8s orchestration (liveness/readiness, anti-affinity, cgroup limits, etc.)?
The way this works is it uses a containerd shim.
In containerd-land, shims are responsible for all the platform dependent setup/management of a container.
The "normal" shim on Linux is the runc shim (io.containerd.runc.v2).
On Windows the shim is called runhcs (io.containerd.runhcs.v1).
The docker solution mentioned in the article modifies the "wasmtime" shim from https://github.com/containerd/runwasi so that it uses "wasmedge" instead.
It also happens to be using an unreleased version of dockerd.
So how does this work with compose? Currently you need to specify the runtime for the container, there should be an option in the compose yaml for this.
How does it work with pods?
You need to configure containerd's cri config with a runtime handler that specifies the wasmedge shim. Then you add a RuntimeClass to k8s and add that to your pod spec.
When will we reach a point where articles and talks no longer start with "let me explain to you what WebAssembly is"? After all, you don't see the same introductions in articles about JavaScript, or Python, or even Rust. When I see an article starting with such an introduction - after hundreds of other articles have done the same - I never know how deep to expect it to go, and whether to continue reading.
I completely understand and we debated it a lot when writing the article. At the end, given that the audience was existing Docker users, we erred on the side of adding that introduction. In other, more technical articles we just dive right in: https://wasmlabs.dev/articles/php-dev-server-on-wasm/
I thought it was pretty useful and I got quite a bit out of it. Most articles of this type gloss over why Wasm is even interesting, leaving me to wonder if Wasm is equivalent to Java Web Applets, or jump right into implementation details of some subset of the project without any context.
I often see a lot of technical articles posted to HN that are probably very interesting, but they assume the reader is living in the author's head and jump right into the details with little to no context.
It is a bit more involved (getting easier by the day) but you can do the same AND run it anywhere from a browser to an IoT device to ... a container :)
Do you have a link to a guide or any sources on the on-going work? I'm with many of the commenters here agreeing that WebAssembly does not sufficiently replace Docker containers and would love to see what's happening on this front.
Yes! Wasm builds on top of 20 years of experience and improvements of JVM, CLR. There are a few key differences, but one important one is the universal adoption by the industry (no ActiveX vs Applets war, .NET vs Java) with companies as varied as Google, Apple, Amazon, Microsoft actively cooperating on moving the standard forward. I have never seen anything like that and I hope it continues for as long as possible!
> > One of the exciting things in Visual Studio .NET is its language agnosticism. If a vendor has written a .NET-compliant language, you can use it in Visual Studio .NET. It'll work just as well as C# or C++ or Visual Basic. This isn't just a future feature-in-planning. There are already nearly two dozen languages being developed for Visual Studio .NET: Visual Basic, C#, C++, JScript, APL, Cobol, Eiffel, Fortran, Pascal, Perl, Python, RPG, Smalltalk, Oberon, Component Pascal, Haskell/Mondrian, Scheme, Mercury, Alice, and even the Java language.
Not sure what the point is here, but that reality for .NET never really came to fruition. Now only C# and a tiny sliver of F# really dominate most of development on .NET.
Well, Microsoft has always been about "our stuff is first class citizen, everything else is second class citizen". You could see that in the 2000s when Microsoft claimed Windows 2003 to be multiplatform because it could run binaries from windows 95, windows 98, windows 2000 and windows xp.
What happened with .net is that C# is first class, F# is second class, and everything else is third class citizen at best (when not directly attacked via patent litigation).
Regardless of its technical merits, .NET was never adopted universally; WebAssembly is on the path to be. It is also not exclusive: Blazor is a successful Microsoft product built on WebAssembly, leveraging .NET.
The VM part of WASM is not per se the interesting part. The really interesting part is having a VM that is not able to access the system besides what it's being explicitly allowed to by the host. This is an extremely useful security tool.
The component-model proposal makes this statement even more interesting. It will allow setting capabilities on the libraries that your Wasm module uses. For me, this is critical, as in most language ecosystems libraries get the same permissions as the main application.
Java tried that and it is an ongoing disaster that is itself the source of security bugs.
Library boundaries are not often so rigidly clear cut as to be a security boundary, ignoring also the performance & compatibility issues that come with such a thing.
Agree 100%. Also, since it came from the browser developers, not only is it OSS but it can be relied on to already be there, not a plugin your users have to install (I don't miss at all the days of ActiveX, Java Applets, Flash, etc ...)
So what's the purpose of Docker according to Docker?
Reproducible builds, consistent dev environments? I always thought it was to have production and development environments the same, but these statements contradict that..
Unless they expect you to run WASM on your servers..
> Unless they expect you to run WASM on your servers
They do, this is an emerging idea.
> I always thought it was to have production and development environments the same
This seems to be the first thing many people try to do with Docker. IMO it's actually not a great experience. In dev, you need to make changes, in prod, you shouldn't, so they're not the same at all.
Docker has many strengths: reproducible builds, consistent deployment strategies (k8s doesn't care what's in your container), a consistent DSL for building apps, the ability to extend a huge collection of other Dockerfiles to get what you want. I'm sure there are more.
The gap to native code will, most likely, always be there. Information is lost during the translation to WASM that an optimizer could have leveraged on the target architecture, not to mention WASM itself is adding overhead to satisfy the portability & security goals it is targeting.
Similarly WASM will always lag behind the state of the art for native code (eg, new SIMD or other accelerated instructions). That's the price of portability after all.
It varies by the runtime, and the codegen of the WASM itself. From some benchmarks & anecdotes I've seen, the faster runtimes (v8, Spidermonkey, WAVM) are within about 10-50% of native speeds, give or take. There's also some runtimes (like Wasmer & WAVM) which provide the option to AOT compile your WASM module to native. In those cases, the gap from native is much smaller. But so far the JIT for WASM is just immature.
That said, from what I've read, it looks like starting up a new WASM instance is pretty fast, so some places are using it for when they need to spawn up tons of instances all at once, without having to wait for a whole process to warm up.
Is there some page or document that explains how this actually works from the point of view of a traditional container / UNIX process worldview? Like, have the WASM folks implemented an emulation layer for Linux system calls? For libc? POSIX? Or maybe you need to modify your traditional programs so that they use WASM APIs instead? Just very confused at the moment :)
I get the core idea of compiling other languages to WASM, but at the end of the day they have to talk to the outside world somehow right?
So it sounds like it's a new capability-based syscall interface, but they've ported big chunks of libc (specifically musl) to that interface so that a lot of things work.
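To illustrate (a sketch of my own, not from the article): from inside the module, a "syscall" is just an imported function that the runtime provides, and the ported libc wraps these so ordinary calls like write() keep working. Names below follow wasi_snapshot_preview1, but treat the exact signature as a simplification on my part.

    /* fd_write_sketch.c: what is normally hidden behind libc. */
    #include <stddef.h>

    typedef struct { const void *buf; size_t buf_len; } ciovec_t;

    /* Declared as an import; the host (wasmtime, wasmedge, a browser
       shim, ...) decides how, or whether, this maps to a real OS write. */
    __attribute__((import_module("wasi_snapshot_preview1"), import_name("fd_write")))
    int wasi_fd_write(int fd, const ciovec_t *iovs, size_t iovs_len, size_t *nwritten);

    int main(void) {
        ciovec_t iov = { "hi\n", 3 };
        size_t written = 0;
        return wasi_fd_write(1, &iov, 1, &written); /* 0 == success */
    }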
How is it possible that the image size is so much smaller with the WASM image compared to the Docker image? They need to ship the entire php runtime compiled to WASM, so I don’t see how it can be smaller
I stopped reading once I saw the article was targeting PHP. I am sure this is a great technology but really hard to see the benefit over standard docker.
Does anyone have a pro/con list for docker and wasm at the server?
Is there a "Use Docker when..." or a "Use WASM when..." style guidance?
Docker doesn't even really fix the issue it claims to / is used for. I think Nix does, but it's pure pain to use; in my experience it does actually address the repro issues, though.