Open-sourcing Sandboxed API (googleblog.com)
158 points by gynvael on March 18, 2019 | 15 comments


I tried building something like this for Rust, in order to execute closures in a separate process[0].

I could probably automate it for a struct with a macro so that the API was autosandboxed. Also, it was purely an experiment and is not particularly safe - secrets in the "broker" process would leak to the client, for example. I had plans to deal with at least some of these issues by using a build.rs to generate a separate binary that could be exec'd, but I stopped caring.

I wonder how this sandboxapi gets around that?

The reason I didn't bother is that I found constructing the sandbox to be the hard part, not executing the code in it - how do you detect what the code needs to do in the sandbox? You end up with a policy that's either too tight or too loose.

Also, performance is miserable if you have to fork/exec every single time. So you may be tempted to cache the process across executions but that comes with its own pains. Again, curious how this is dealt with in sandboxapi.
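
To make the fork/exec trade-off concrete, here's a minimal C++ sketch of the "cache the worker" approach: fork once, keep the child alive, and ship work to it over a pipe. This is illustrative only (not the code from [0]); a real sandbox would also lock the child down (seccomp, namespaces, etc.) before it starts serving requests.

    // Keep one forked worker alive and reuse it across calls, instead of
    // paying for fork/exec on every invocation. Illustrative only.
    #include <unistd.h>
    #include <sys/wait.h>
    #include <cstdint>
    #include <cstdio>

    int main() {
      int to_child[2], to_parent[2];
      pipe(to_child);
      pipe(to_parent);

      pid_t pid = fork();
      if (pid == 0) {                       // worker: serve requests until EOF
        close(to_child[1]);
        close(to_parent[0]);
        int64_t x;
        while (read(to_child[0], &x, sizeof(x)) == sizeof(x)) {
          int64_t y = x * x;                // stand-in for the sandboxed work
          write(to_parent[1], &y, sizeof(y));
        }
        _exit(0);
      }

      close(to_child[0]);
      close(to_parent[1]);
      for (int64_t i = 0; i < 3; ++i) {     // broker: many calls, one fork
        int64_t r;
        write(to_child[1], &i, sizeof(i));
        read(to_parent[0], &r, sizeof(r));
        printf("%lld -> %lld\n", (long long)i, (long long)r);
      }
      close(to_child[1]);                   // worker sees EOF and exits
      waitpid(pid, nullptr, 0);
    }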

Personally, I want to see sandboxing support at a language level. I imagine a language, perhaps like Pony, which has isolated units of execution that connect via some form of RPC/message passing. With a type system that "taints" types based on actions, basically a capability type that propagates upwards, you could generate seccomp filters for every unit of execution. Pony seems like it's actually not far from this, from what I could tell.
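
For the "generate seccomp filters per execution unit" idea, here's a rough sketch using libseccomp (link with -lseccomp). The allowed syscall list is hard-coded here; in the imagined language it would be derived from the capability types the unit's code actually uses.

    // Enter a per-unit seccomp sandbox: any syscall not whitelisted kills the
    // process. Assumes Linux and libseccomp.
    #include <seccomp.h>
    #include <unistd.h>

    void enter_unit_sandbox() {
      scmp_filter_ctx ctx = seccomp_init(SCMP_ACT_KILL);   // default: kill
      seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(read), 0);
      seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(write), 0);
      seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(exit_group), 0);
      seccomp_load(ctx);
      seccomp_release(ctx);
    }

    int main() {
      enter_unit_sandbox();
      const char msg[] = "still allowed to write\n";
      write(1, msg, sizeof(msg) - 1);    // ok: write is whitelisted
      return 0;                          // e.g. open() here would kill the process
    }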

Sandboxing in a language like this is a matter of moving the execution unit out of process, which shouldn't be that hard.

[0] https://github.com/insanitybit/sandbox


Hi, one of the authors here.

The sandboxed library is indeed exec'd (it's a separate binary), so the only "secrets" that leak into it are those which are already there - compiled into the binary.

As for fork/exec: the idea is to preserve the execution context as long as possible - if data leaks between sequences of library API calls are not a problem (e.g. because we just convert some data for internal use), then forking/execve is not necessary. The idea behind 'transactions' (as per Sandboxed API nomenclature) is just that.
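
A toy illustration of that fork-without-execve point (invented names, not the actual Sandboxed API code): when data must not leak between API calls, each call runs in a freshly forked copy, so whatever state the library leaves behind dies with the child.

    #include <unistd.h>
    #include <sys/wait.h>
    #include <cstring>
    #include <cstdio>

    static char g_scratch[64];            // library-internal state that could leak

    void library_call(const char* input) {
      strncpy(g_scratch, input, sizeof(g_scratch) - 1);  // the lib caches its input
      printf("processed: %s\n", g_scratch);
    }

    void isolated_call(const char* input) {
      pid_t pid = fork();                 // fresh copy per call, no execve needed
      if (pid == 0) {
        library_call(input);              // leftover state dies with the child
        _exit(0);
      }
      waitpid(pid, nullptr, 0);
    }

    int main() {
      isolated_call("alice's document");
      isolated_call("bob's document");    // sees none of alice's data in g_scratch
    }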

If potential data leaks between API calls might be a problem, then a fresh instance is spawned via fork (no execve is needed here though).


Hi,

First off, thanks for the reply and congrats on the release.

Separate binary makes sense - I wanted to get there myself but didn't have the patience. I'll have a look at the code to see how that works.

Sounds like you've considered some of the areas I had concerns with already.


Neat! I wonder how this would compare to the equivalent WASM implementation.

With WASM, you wouldn't need to run the sandboxed code in a separate execution context, and you shouldn't need any OS-specific code to isolate the sandboxed executable. The whole thing would be conceptually simpler, although you'd need to pull in a WASM runtime. The actual code would run a bit slower because the WASM VM has to do bounds checks and so on, but I think function calls across the WASM boundary would be way faster than the sandboxed IPC bridge the blog post describes. Also, a WASM version would let you build your sandboxed bundle once and run it anywhere, which is neat.

So I get the sense that a WASM version of this would be simpler and more portable but have a different performance profile. Are there other considerations here? It'd be a cool experiment to play with.


If you are at all concerned about confidentiality (as opposed to just integrity) properties, you need to defend against speculative execution timing attacks ("Spectre") and other uncooperative timing channels.

Language based security like WASM can prevent untrusted code from receiving timing channels, but only by executing it absolutely deterministically and denying it anything that can be used as a clock, which definitely includes shared memory multithreading. So you would not be able to run anything originally multithreaded, except perhaps by heroic measures like a continuation passing lowering of fibers. If the code has any way to measure the passage of time, there is presently no known way to prevent it from using Spectre to read the whole memory of the process it is in.

On the other hand process sandboxing can rely on defenses provided by the operating system and hardware. These are not IMO as absolute as deterministic execution, but they are better than nothing if it's not practical to run the untrusted code deterministically.

Thus you see for example Chrome shifting its security model in the direction of assuming that anything mapped in the same OS process as untrusted JS or WASM has no confidentiality. I personally think it would be better, given the historically very limited access of JS to synchronous timing channels, to go for perfect determinism in that context. But definitely there are use cases where that isn't practical, so there is a place for process sandboxing. It also makes for a second line of defense - you can run your wasm vm inside your process sandbox, and an integrity failure of either one is not fatal!


WASM is interesting in that its memory design prevents reading anything outside the memory allocated for the VM. Given that, I don't think you would get much use out of running this within WASM.

That's not to say, of course, that WASM VMs don't employ various security measures. V8, at least, uses (mostly) the same optimization pipeline for WASM as it does for JS, so WASM gets all the years of security hardening that have gone into JS for free.


WASM isn't sandboxing; it's just a restricted compilation target that comes with some guarantees. IMO this project is sort of junk: you have to layer some RPC mess on top of encapsulating the library, when just rewriting/modifying the library to be a daemon could be simpler and wouldn't be tied to $RPC/$SANDBOX_IMPL. DJB's software was built this way.


Ah, when I got to the code snippet, I was hoping I'd see that you didn't have to change any code - that SAPI could generate the precise interface of the library you're using - but it looks like the code you have to write diverges quite a bit from the original library API.


Hi, one of the authors here.

It'd most likely lead to symbol conflicts if you wanted to use both the sandboxed library and the original (unsandboxed) one, even for constants/defines.

The other problem is memory synchronization: typical language runtimes don't provide info on whether a given pointer points to memory that will only be read by the library, or will be updated, or both (const-like annotations are rarely conclusive here), hence some additional work will always be needed to correctly wrap a library function. Also, file descriptors are just integers, and it'd require heuristics to figure out when a file-descriptor sync between the main code and the library code is needed, or whether it's just a plain int being used.

So, no, a fully automatic library interface would be quite impossible, at least without some kind of heuristics, and this would be error prone.

PS: One of the main ideas behind the project is represented by the motto "Sandbox once, use anywhere", so technically only one implementation of a library API is needed for all users using the same set of functions.


Hmm, you list those three problems, and they're real, but even given them the interface seems more different from the original than it needs to be. The symbols could just all be prefixed. File descriptors could be indicated with an annotation or a simple function call. Memory synchronization could be managed by requiring the user to use a special allocator for anything that's going to be passed to the sandbox, and putting that memory onto an exchange heap à la Singularity. Is there a reason you didn't use these techniques?
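
For the exchange-heap suggestion, a minimal sketch (assuming Linux, fork-based isolation, and a bump allocator with no free) could carve boundary-crossing memory out of a MAP_SHARED mapping created before the fork. This only illustrates the technique; it is not how Sandboxed API manages memory.

    #include <sys/mman.h>
    #include <sys/wait.h>
    #include <unistd.h>
    #include <cstddef>
    #include <cstring>
    #include <cstdio>

    // Anything passed across the sandbox boundary is allocated from this
    // shared region, so broker and sandboxee see the same bytes without copies.
    struct ExchangeHeap {
      char* base;
      size_t size;
      size_t used;

      static ExchangeHeap Create(size_t size) {
        void* p = mmap(nullptr, size, PROT_READ | PROT_WRITE,
                       MAP_SHARED | MAP_ANONYMOUS, -1, 0);
        return ExchangeHeap{static_cast<char*>(p), size, 0};
      }
      void* Alloc(size_t n) {             // bump allocator, 16-byte aligned
        void* p = base + used;
        used += (n + 15) & ~static_cast<size_t>(15);
        return p;
      }
    };

    int main() {
      ExchangeHeap heap = ExchangeHeap::Create(1 << 20);
      char* buf = static_cast<char*>(heap.Alloc(128));

      if (fork() == 0) {                  // "sandboxee": writes into shared memory
        strcpy(buf, "result produced inside the sandbox");
        _exit(0);
      }
      wait(nullptr);                      // "broker": reads it back without a copy
      printf("%s\n", buf);
    }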


Hi,

Thanks for those ideas - some early comments on those below:

> symbols could just be all prefixed

IMO, this wouldn't be conceptually different from how it works now - i.e. calling functions through a C++ object. IOW: code using a sandboxed "printf" would have to be changed to use "_sandboxed_printf" somehow (via code or linker tricks). Unless you mean something different here?

> File descriptors could be indicated with an annotation or simple function call.

Similarly to the problem above - it's probably in line with how it works now. It can definitely be simplified (even with a simple annotation "somewhere"), but again, that probably wouldn't solve the problem of having a drop-in library replacement IMO?

> use a special allocator for anything that's going to be passed to the sandbox, and putting that memory onto an exchange heap ala Singularity.

Yes, that's something we're thinking about, mainly for performance reasons. As you certainly know, it will be more complex than simply having malloc()/calloc() (and friends) operate on a shared mmap(), as the memory being referenced can also live on the stack, in .bss/.rodata, in direct mmap()s, etc.

These are all good ideas! Thanks for sharing them - in case you'd like to comment on them more, I'd like to invite you to do that on the project's mailing list.

PS: You might also be thinking about something slightly different here, i.e. moving all the memory/file-descriptor annotations into some middle layer and then exposing an API identical to the original library's. This is probably possible, even now, with some extern "C" magic, though there are no examples or tooling for how to do that yet.


> IOW: code using a sandboxed "printf" would have to be changed to use "_sandboxed_printf" somehow (via code or linker tricks).

I mean, linker tricks à la LD_PRELOAD, DYLD_INSERT_LIBRARIES are a lot less work than rewriting code. They don't even require a recompile most of the time.
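
As a sketch of the LD_PRELOAD route: an interposing library exports the same symbol as the original, so existing binaries pick up the "sandboxed" version without a recompile. The function names here (some_library_call, forward_to_sandbox) are invented for illustration.

    // shim.cpp - build and use with:
    //   g++ -shared -fPIC -o libshim.so shim.cpp
    //   LD_PRELOAD=./libshim.so ./existing_binary
    #include <cstdio>

    // Stand-in for the broker-to-sandboxee RPC stub.
    extern "C" int forward_to_sandbox(const char* data, int len) {
      return len;
    }

    // Same name and signature as the library function being replaced; the
    // dynamic linker resolves calls to it here instead of in the real library.
    extern "C" int some_library_call(const char* data, int len) {
      fprintf(stderr, "some_library_call intercepted, forwarding out of process\n");
      return forward_to_sandbox(data, len);
    }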

I think this does however bring up an interesting point: it seems like this library is trying to solve a somewhat different subset of the sandboxing problem, namely "how do I maintain separation if I'm writing code on both sides of the sandbox", whereas sandboxing can also mean "how do I stop this arbitrary binary from doing malicious things". It seems to me that the latter usually ends up requiring support in the kernel or linker but no cooperation from the sandboxed process itself, while the former requires adopting a new API (for example, macOS's XPC).


Hi,

> I mean, linker tricks à la LD_PRELOAD, DYLD_INSERT_LIBRARIES are a lot less work than rewriting code. They don't even require a recompile most of the time.

Thanks. Also, simply providing static symbols during compilation should work (in most cases), as a static linker will typically only pick up the first definition it sees for a referenced (but still undefined) symbol.

An even more "stable" (less conflict-prone) solution would probably be to use -Wl,--wrap=symbol and then provide a sandboxed __wrap_symbol() definition. Agreed, this is all doable one way or another, and we might take a look at it at some point (we'll be glad to receive any input on that, preferably on the project's mailing list).
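
A minimal sketch of the --wrap approach, using zlib's compress() as the wrapped symbol (zlib is just an example here): the linker redirects every call to compress() into __wrap_compress(), which could marshal the buffers to a sandboxed copy of the library; in this sketch it only logs and falls through to __real_compress().

    // wrap_demo.cpp - build with:
    //   g++ wrap_demo.cpp -lz -Wl,--wrap=compress
    #include <cstdio>
    #include <cstring>
    #include <zlib.h>

    extern "C" int __real_compress(Bytef*, uLongf*, const Bytef*, uLong);

    extern "C" int __wrap_compress(Bytef* dest, uLongf* destLen,
                                   const Bytef* source, uLong sourceLen) {
      fprintf(stderr, "compress() intercepted by wrapper\n");
      // A real wrapper would ship the buffers to the sandboxed zlib instance;
      // this sketch just falls through to the original implementation.
      return __real_compress(dest, destLen, source, sourceLen);
    }

    int main() {
      const char* msg = "hello hello hello";
      Bytef out[128];
      uLongf out_len = sizeof(out);
      // Resolved to __wrap_compress by the linker because of --wrap=compress.
      int rc = compress(out, &out_len,
                        reinterpret_cast<const Bytef*>(msg), strlen(msg) + 1);
      printf("rc=%d, compressed %zu -> %lu bytes\n", rc, strlen(msg) + 1,
             (unsigned long)out_len);
    }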


Would that be sandboxed-api-users@googlegroups.com?


> So far, only Linux is supported. We will look into bringing Sandboxed API to the Unix-like systems like the BSDs (FreeBSD, OpenBSD) and macOS. A Windows port is a bigger undertaking and will require some more groundwork to be done.

Oh well.



