It would most likely lead to symbol conflicts, even for constants / defines, if you wanted to use both the sandboxed library and the original (unsandboxed) one.
The other problem is memory synchronization: typical language runtimes don't tell you whether a given pointer refers to memory that the library will only read, will update, or both (const-like annotations are rarely conclusive here), so some additional work will always be needed to wrap a library function correctly. Also, file descriptors are just integers, so it would take heuristics to figure out whether a given argument is a file descriptor that needs syncing between the main code and the library code, or just a plain int.
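To make the memory-sync point concrete, here's a rough sketch in C. Everything in it is hypothetical and only for illustration: buf_dir, call_with_sync() and demo_upcase() are made-up names (not the project's API), and a plain calloc() buffer stands in for memory inside the sandbox. The point is that the copy direction has to be supplied by whoever writes the wrapper, because the function signature alone doesn't say.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical per-argument annotation supplied by the wrapper author. */
typedef enum { DIR_IN = 1, DIR_OUT = 2, DIR_INOUT = DIR_IN | DIR_OUT } buf_dir;

typedef void (*lib_fn)(char *buf, size_t len);

/* Example "library" function: upper-cases the buffer in place. */
static void demo_upcase(char *buf, size_t len) {
    for (size_t i = 0; i < len; i++)
        buf[i] = (char)toupper((unsigned char)buf[i]);
}

/* Calls fn on a separate copy of the buffer (standing in for sandbox
 * memory) and copies data across only in the annotated direction(s).
 * Returns 0 on success, -1 if the stand-in buffer can't be allocated. */
static int call_with_sync(lib_fn fn, char *host, size_t len, buf_dir dir) {
    assert(fn != NULL && host != NULL);
    char *remote = calloc(1, len ? len : 1);
    if (remote == NULL)
        return -1;
    if (dir & DIR_IN)
        memcpy(remote, host, len);   /* host -> sandbox before the call */
    fn(remote, len);
    if (dir & DIR_OUT)
        memcpy(host, remote, len);   /* sandbox -> host after the call */
    free(remote);
    return 0;
}
```

With DIR_IN the caller's buffer is left untouched even though the library modified its copy; only DIR_INOUT brings the change back. Picking the wrong annotation silently loses data, which is why guessing it automatically is risky.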
So, no, a fully automatic library interface would be quite impossible, at least without some kind of heuristics, and those would be error-prone.
PS: One of the main ideas behind the project is represented by the motto "Sandbox once, use anywhere", so technically only one implementation of a library API is needed for all users using the same set of functions.
Hmm, you list those three problems, and they're true, but even given them the interface seems like it's more different than it needs to be. The symbols could just be all prefixed. File descriptors could be indicated with an annotation or simple function call. Memory synchronization could be managed by requiring the user to use a special allocator for anything that's going to be passed to the sandbox, and putting that memory onto an exchange heap ala Singularity. Is there a reason you didn't use these techniques?
Thanks for those ideas - some early comments on those below:
> symbols could just be all prefixed
IMO, this wouldn't be conceptually different from how it works now - i.e. calling functions through a C++ object. IOW: code using a sandboxed "printf" would have to be changed to use "_sandboxed_printf" somehow (via code or linker tricks). Unless you mean something different here?
> File descriptors could be indicated with an annotation or simple function call.
Similarly to the problem above - it's probably in line with how it works now. It can definitely be simplified (even with a simple annotation "somewhere"), but again, that probably wouldn't solve the problem of having a drop-in library replacement, IMO?
> use a special allocator for anything that's going to be passed to the sandbox, and putting that memory onto an exchange heap ala Singularity.
Yes, that's something we're thinking about, mainly for performance reasons. As you certainly know, it will be more complex than simply having malloc()/calloc() (and friends) operate on a shared mmap(), as the memory referenced can also include stack/bss/rodata/direct-mmap etc.
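For reference, the "simple" version of that idea could look roughly like the sketch below: a bump allocator over one MAP_SHARED | MAP_ANONYMOUS mapping, which stays shared with a child created by fork(). The xheap type and the xheap_* functions are hypothetical names invented for this example, there is no free() and no locking, and it assumes a Linux-like system where MAP_ANONYMOUS is available.

```c
#define _DEFAULT_SOURCE          /* for MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/* Hypothetical exchange heap: one shared mapping plus a bump pointer. */
typedef struct {
    unsigned char *base;
    size_t size;
    size_t used;
} xheap;

/* Maps `size` bytes of shared anonymous memory. Returns 0 or -1. */
static int xheap_init(xheap *h, size_t size) {
    assert(h != NULL);
    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
                   MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;
    h->base = p;
    h->size = size;
    h->used = 0;
    return 0;
}

/* Hands out 16-byte-aligned chunks; returns NULL when out of space. */
static void *xheap_alloc(xheap *h, size_t n) {
    if (n == 0 || n > SIZE_MAX - 15u)
        return NULL;
    size_t aligned = (n + 15u) & ~(size_t)15u;
    if (aligned > h->size - h->used)
        return NULL;
    void *p = h->base + h->used;
    h->used += aligned;
    return p;
}

/* True if [p, p + n) lies entirely inside the shared mapping. */
static int xheap_contains(const xheap *h, const void *p, size_t n) {
    uintptr_t b = (uintptr_t)h->base, q = (uintptr_t)p;
    if (q < b || q - b > h->size)
        return 0;
    return n <= h->size - (q - b);
}
```

The xheap_contains() check is where the complication shows up: a pointer to a stack variable, a string literal in rodata, or a separately mmap()ed region fails it, so a wrapper would still have to copy such arguments (or reject them).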
These are all good ideas! Thanks for sharing them - in case you'd like to comment on them more, I'd like to invite you to do that on the project's mailing list.
PS: It might also be that you're thinking about something slightly different here, i.e. moving all annotations wrt memory/file-descriptors into some middle layer, and then exposing an API identical to the original library's. This is probably possible, even now, with some extern "C" magic, though there are no examples or tooling on how to do that yet.
> IOW: code using a sandboxed "printf" would have to be changed to use "_sandboxed_printf" somehow (via code or linker tricks).
I mean, linker tricks à la LD_PRELOAD, DYLD_INSERT_LIBRARIES are a lot less work than rewriting code. They don't even require a recompile most of the time.
I think this does however bring up an interesting point: it seems like this library is trying to solve a somewhat different subset of sandboxing, which is the question of "how do I maintain separation if I'm writing code on both sides of the sandbox", whereas sandboxing can also include "how do I stop this arbitrary binary from doing malicious things". It seems to me that the latter usually ends up requiring support in the kernel or linker but no cooperation from the sandboxed process itself, while the former requires adopting a new API (for example, macOS's XPC).
> I mean, linker tricks à la LD_PRELOAD, DYLD_INSERT_LIBRARIES are a lot less work than rewriting code. They don't even require a recompile most of the time.
Thanks. Also, simply providing static symbols during compilation should work (in most cases), as a static linker will typically only pick up the first definition it is given for a referenced (still undefined) symbol.
Probably an even more "stable" (less conflicting) solution would be to use -Wl,--wrap=symbol and then provide a sandboxed __wrap_symbol() definition. Agreed, this is all doable one way or another, and we might take a look at it at some point (we'll be glad to receive any input on that, preferably on the project's mailing list).
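A minimal sketch of the --wrap variant, using puts() as the wrapped symbol. With GNU ld, -Wl,--wrap=puts resolves undefined references to puts to __wrap_puts, and the original remains reachable as __real_puts. The sandboxed_puts() function and the sandboxed_calls counter are hypothetical stand-ins for "forward the call into the sandbox"; here it just writes to stdout locally.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical counter, only here to make the redirection observable. */
static unsigned long sandboxed_calls;

/* Hypothetical stand-in for forwarding the call into a sandbox.
 * It uses fputs()/fputc() rather than puts(), because a reference to
 * puts in this file would itself be redirected to the wrapper. */
static int sandboxed_puts(const char *s) {
    assert(s != NULL);
    sandboxed_calls++;
    if (fputs(s, stdout) == EOF || fputc('\n', stdout) == EOF)
        return EOF;
    return 1;
}

/* Picked up by the linker when building with -Wl,--wrap=puts:
 * callers in other object files keep writing puts("..."). */
int __wrap_puts(const char *s) {
    return sandboxed_puts(s);
}
```

Assuming this lives in wrap.c next to an unmodified main.c, something like `cc main.c wrap.c -Wl,--wrap=puts` should route main.c's puts() calls through the wrapper without touching its source. Constants and defines from the original headers are unaffected, since only symbol references are rewritten.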