We were not able to get good enough performance compared to LMDB. We will keep working on this, though; there are probably many ways to improve performance by reducing load on the KV store.
Did you try WITHOUT ROWID? Your sqlite implementation[1] uses a BLOB primary key. In SQLite, this means each operation requires 2 b-tree traversals: The BLOB->rowid tree and the rowid->data tree.
If you use WITHOUT ROWID, you traverse only the BLOB->data tree.
Looking up lexicographically similar keys gets a huge performance boost, since SQLite can scan a single b-tree node and the data is contiguous. Your current implementation is chasing pointers to random locations in a different b-tree.
I'm not sure whether the on-disk size would get smaller or larger. It probably depends on the key size and value size compared to the 64-bit rowids. This is probably a well-studied question you could find the answer to.
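For anyone who wants to try the WITHOUT ROWID suggestion, here is a minimal sketch using Python's built-in sqlite3 module. The table names (kv_rowid, kv_clustered) and the get/scan helpers are made up for illustration and are not taken from the implementation being discussed. It only shows the two schemas side by side, not a benchmark.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Ordinary rowid table: the BLOB primary key lives in a separate index
# (key -> rowid), and the row itself lives in the rowid b-tree.
conn.execute("CREATE TABLE kv_rowid (k BLOB PRIMARY KEY, v BLOB)")

# WITHOUT ROWID table: rows are stored directly in the primary-key b-tree.
conn.execute(
    "CREATE TABLE kv_clustered (k BLOB PRIMARY KEY, v BLOB) WITHOUT ROWID"
)

# Big-endian keys so that byte order matches numeric order.
rows = [(i.to_bytes(4, "big"), b"value-%d" % i) for i in range(1000)]
conn.executemany("INSERT INTO kv_rowid VALUES (?, ?)", rows)
conn.executemany("INSERT INTO kv_clustered VALUES (?, ?)", rows)


def get(table, key):
    """Point lookup by primary key (table name is trusted, hypothetical)."""
    row = conn.execute(f"SELECT v FROM {table} WHERE k = ?", (key,)).fetchone()
    return row[0] if row else None


def scan(table, lo, hi):
    """Range scan over lexicographically adjacent keys in [lo, hi)."""
    return conn.execute(
        f"SELECT k, v FROM {table} WHERE k >= ? AND k < ? ORDER BY k",
        (lo, hi),
    ).fetchall()
```

Both tables return the same results; the difference is only in how many b-trees SQLite has to walk, so you would need to measure against your own key and value sizes to see whether it helps.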
I learned that Turso apparently plans to rewrite libsql [0] in Rust and create a more 'hackable' SQLite alternative altogether. It was apparently discussed in this Developer Voices [1] video, which I haven't watched yet.
Keep in mind that write safety comes with performance penalties. You can turn off write protections and many databases will be super fast, but easily corrupted.
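To make that trade-off concrete for SQLite specifically, here is a rough sketch of the "fast but fragile" configuration. The helper name open_fast_unsafe and the scratch.db path are my own, for illustration only. PRAGMA synchronous = OFF and PRAGMA journal_mode = MEMORY are real SQLite settings, and the SQLite docs warn that with them a power loss, OS crash, or application crash mid-transaction can leave the database corrupted.

```python
import os
import sqlite3
import tempfile


def open_fast_unsafe(path):
    """Open a SQLite database with durability protections turned off."""
    conn = sqlite3.connect(path)
    # Hand writes to the OS and continue without waiting for them to be
    # flushed to disk.
    conn.execute("PRAGMA synchronous = OFF")
    # Keep the rollback journal in RAM instead of on disk.
    conn.execute("PRAGMA journal_mode = MEMORY")
    return conn


db_path = os.path.join(tempfile.mkdtemp(), "scratch.db")
conn = open_fast_unsafe(db_path)
conn.execute("CREATE TABLE kv (k BLOB PRIMARY KEY, v BLOB) WITHOUT ROWID")
with conn:  # one transaction for the whole batch
    conn.executemany(
        "INSERT INTO kv VALUES (?, ?)",
        [(i.to_bytes(4, "big"), b"x") for i in range(1000)],
    )
```

Any benchmark against LMDB should say which of these settings were in effect, otherwise the numbers aren't comparable.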
Consider that this isn't just a random AI slopped assortment of 9,000 tests, but instead is a robust suite of tests that cover 100% of the HTML5 spec.
Does this guarantee that it functions completely with no errors whatsoever? Certainly not. You need formal verification for that. I don't think that contradicts what Simon was advocating for though in this post.
I actually hate AI in my core, to the point that if it gets too much more advanced I'll likely be in existential crisis, so don't attack me on those grounds. Given it exists, I'm going to find what's good about it though. I do think the problem of AI existing has to be confronted. Maybe one solution is what the human does is produce specs like the HTML 5 one, and what the AI does is implement it in software.
The equivalency here is not 9 billion versus 90 billion, it's 9 billion versus 90 million, and the question is how does the decline look? Does it look like the standard of living for everyone increasing so high that the replacement rate is in the single digit percentage range, or does it look like some version of Elysium where millions have immense wealth and billions have nothing and die off?
Hey Simon, given it's you ... are you concerned about LLMs attempting to escape from within the confines of a Docker container or is this more about mitigating things like supply chain attacks?
I'm concerned about prompt injection attacks telling the LLM how to escape the Docker container.
You can almost think of a prompt injection attack as a supply chain attack, but regular supply chain attacks are a concern too: what if an LLM installs a new version of an NPM package that turns out to have been deliberately infected with malware that can escape a container?
When you use docker you can have full control over the networking layer already.
You can bind its networking to another container that acts as a proxy/filter. How does WASM offer that?
With a reverse proxy you can log requests, filter them if needed, restrict the allowed domains, or do packet inspection if you want to go crazy mode.
And if an actor is able to tailor a prompt to escape Docker, I think you have bigger issues in your supply chain.
I feel WASM is a bad solution here. What does it bring that a VM or Docker can't do?
And escaping a Docker container is not that simple: it requires a lot of heavy lifting and is not always possible.
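As an illustration of the "restrict the allowed domains" part, the check a filtering proxy would apply could look roughly like this. Everything here is hypothetical (the ALLOWED_DOMAINS set and the is_allowed function are made up for the example); it is just the matching logic, not a working proxy.

```python
# Hypothetical allowlist; a real deployment would load this from config.
ALLOWED_DOMAINS = {"pypi.org", "files.pythonhosted.org"}


def is_allowed(host, allowed=ALLOWED_DOMAINS):
    """Return True if host is an allowed domain or a subdomain of one."""
    host = host.lower().rstrip(".")
    # Match on a label boundary so "notpypi.org" does not pass as "pypi.org".
    return any(host == d or host.endswith("." + d) for d in allowed)
```

The proxy container would call something like this on the requested hostname and refuse the connection when it returns False.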
Aside from my worries about container escape, my main problem with Docker is the overhead of setting it up.
I want to build software that regular users can install on their own machines. Telling them they have to install Docker first is a huge piece of friction that I would rather avoid!
The lack of network support for WASM fits my needs very well. I don't want users running untrusted code which participates in DDoS attacks, for example.
You have the same lack of network support with cgroups containers if you configure them properly. The container isn't connected and then filtered; it's as though it's disconnected. You can configure it so that it has network support that is filtered with iptables, but that does seem more dangerous, though in practice that isn't where the escapes are coming from. Alternatively, a network namespace can be left empty, without network interfaces, and a process made to use the empty namespace. That way there isn't any traffic flowing from an interface to be checked against iptables rules.
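For Docker or Podman users, the closest equivalent I know of is the --network none option, which gives the container its own network namespace with nothing but a loopback interface. Here is a small sketch that builds such a command line; the helper name no_network_argv and the alpine:3 image are only examples, and nothing is actually executed.

```python
import shlex


def no_network_argv(image, command, runtime="docker"):
    """Build an argv that runs `command` in `image` with networking disabled."""
    # --network none: the container gets a private network namespace that
    # contains only loopback, so there is nothing for iptables to filter.
    return [runtime, "run", "--rm", "--network", "none", image, *command]


argv = no_network_argv("alpine:3", ["ip", "addr"])
print(shlex.join(argv))
```

Pass the resulting list to subprocess.run if you want to execute it; passing runtime="podman" produces the same flags for Podman.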
I think that threat is generally overblown in these discussions. Yes, container escape is less difficult than VM escape, but it still requires a major kernel 0-day; it is by no means easy to accomplish. Doubly so if you have some decent hygiene and don't run anything as root or do anything else dumb.
When was the last time we heard of a container escape actually happening?
Just because you haven't heard of it doesn't mean the risk isn't real.
It's probably better to make some kind of risk assessment and decide whether you're willing to accept this risk for your users / business, and what you can do to mitigate it. The truth is the risk is always there; it just gets smaller as you layer isolation mechanisms, until it becomes insignificant.
I think you meant “container escape is not as difficult as VM escape.”
A malicious workload doesn’t need to be root inside the container; the attack surface is the shared Linux kernel.
Not allowing root in a container might prevent an escaped process from having root access outside of the namespace. But if an escape succeeds, the attacker could leverage yet another privilege escalation mechanism to go from non-root to root.
Better not rely on unprivileged containers to save you. The problem is:
Breaking out of a VM requires a hypervisor vulnerability, and those are rare.
Breaking out of a shared-kernel container requires a kernel syscall vulnerability, and those are common. The syscall attack surface is huge, and much of it is exploitable even by unprivileged processes.
They both can be made very hard to escape. The Podman community is smaller, but it's more focused on solving technical problems than Docker, which at this point is trying to increase subscription revenue. I have gotten a configuration for running something in isolation that I'm happy with in Podman, and while I think I could do exactly the same thing in Docker, it seems simpler in Podman to me.
Apologies for repeating myself all over this part of the thread, but the vulnerabilities here are something that Podman and Docker can't really do anything about as long as they're sharing a kernel between containers.
If you're going to make containers hard to escape, you have to host them under a hypervisor that keeps them apart. Firecracker was invented for this. If Docker could be made unescapable on its own, AWS wouldn't need to run their container workloads under Firecracker.
This same, not especially informative content is being linked to again and again in this thread. If container escapes are so common, why has nobody linked to any of them rather than a comment saying "There are lots" from 3 years ago?
Perspective is everything, I guess. You look at that three-year-old comment and think it's not particularly informative. I look at that comment and see an experienced infosec pro at Fly.io, who runs billions of container workloads and doesn't trust the cgroups+namespaces security boundary enough, so they go to the trouble of running Firecracker instead. (There are other reasons they landed there, but the security angle's part of it.)
Anyway if you want some links, here are a few. If you want more, I'm sure you can find 'em.
Some are covered off by good container deployment hygiene and reducing privilege, but from my POV it looks like the container devs are plugging their fingers in a barrel that keeps springing new leaks.
(To be fair, modern Docker's a lot better than it used to be. If you run your container unprivileged and don't give it extra capabilities and don't change syscall filters or MAC policies, you've closed off quite a bit of the attack surface, though far from all of it.)
But keep in mind that shared-kernel containers are only as secure as the kernel, and today's secure kernel syscall can turn insecure tomorrow as the kernel evolves. There are other solutions to that (look into gVisor and ask yourself why Google went to the trouble to make it -- and the answer is not "because Docker's security mechanisms are good enough"), but if you want peace of mind I believe it's better to sidestep the whole issue by using a hypervisor that's smaller and much more auditable than a whole Linux kernel shared across many containers.
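To make the parenthetical about deployment hygiene a bit more concrete, here is a rough sketch of a lint that flags docker run options which widen the attack surface: --privileged, --cap-add, and --security-opt with seccomp=unconfined or apparmor=unconfined. Those are real Docker options; the risky_flags helper itself is made up for illustration, it is far from exhaustive, and passing it does nothing about the shared-kernel problem described above.

```python
def risky_flags(argv):
    """Return the options in a docker-run argv that loosen default isolation."""
    findings = []
    for i, arg in enumerate(argv):
        nxt = argv[i + 1] if i + 1 < len(argv) else ""
        if arg == "--privileged":
            findings.append(arg)
        elif arg.startswith("--cap-add"):
            # Accept both "--cap-add=X" and "--cap-add X".
            findings.append(arg if "=" in arg else f"{arg} {nxt}")
        elif arg.startswith("--security-opt"):
            # Accept both "--security-opt=V" and "--security-opt V".
            value = arg.split("=", 1)[1] if "=" in arg else nxt
            if value in ("seccomp=unconfined", "apparmor=unconfined"):
                findings.append(f"--security-opt {value}")
    return findings
```

Note that this naive scan also looks at the arguments after the image name, so it can produce false positives if the containerized command happens to take similarly named options.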
I mean, Docker runs with root privileges for the most part. Yes, I know Docker can run rootless too, but Podman does it out of the box.
So if your Docker container is compromised and something manages to break out of it, I think that with default (root) Docker it might end up with root privileges, whereas with default Podman it would be running as an ordinary user's process and might need another zero-day or something to get root, y'know?
Interesting, because the repo only lists an MIT license, with no mention of those requirements. IANAL, but those license terms don't seem to be anywhere in the software repository.
Permission is hereby granted, free of charge, to any person obtaining
a copy of this software and associated documentation files (the
"Software"), to deal in the Software without restriction...
I used Cline+Claude 3.7 Sonnet for the initial draft of this LLVM PR. There was a lot of handholding, and the final version was much different from the original.