Nsjail – A light-weight process isolation tool for Linux (nsjail.com)
87 points by LaFolle on Oct 15, 2017 | 21 comments



This seems very similar to Bubblewrap: https://github.com/projectatomic/bubblewrap


I don't see anything to suggest that nsjail has the main feature of bubblewrap: It is safe to make bubblewrap setuid-root, and therefore bubblewrap is a safe way for unprivileged users to use containers. (arguably the only safe way at the moment)

Without nsjail making that guarantee, nsjail is just yet another command line interface to namespaces.


How does this compare to firejail?


This tool is lighter-weight than firejail. nsjail seems to be a thin abstraction over Linux namespaces, while firejail contains profiles for common desktop applications and some X hackery to enable jailing of GUI programs.


author here:

Yup, nsjail doesn't have X hacks (I should work on that), though it offers some profiles for Apache-like applications:

https://github.com/google/nsjail/tree/master/configs

I believe nsjail uses one of the most advanced (if not the most advanced) seccomp-bpf config language - kafel: https://github.com/google/kafel


bwrap allows passing an FD containing the seccomp rules (--seccomp FD, in the format produced by seccomp_export_bpf). If kafel can export the compiled BPF program, it should be trivial to use kafel profiles w/ bubblewrap/atomic/flatpak/etc.
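
A minimal sketch of the FD hand-off described here, assuming a compiled seccomp filter has already been written to a file. `filter_path`, `bwrap_argv` and `run_with_filter` are hypothetical names for illustration; only `--seccomp FD` and `--ro-bind SRC DEST` are bubblewrap options, and whether kafel can produce such a file is left open, as in the comment above.

```python
import os
import subprocess


def bwrap_argv(filter_fd, command):
    """Build a minimal bwrap command line that loads seccomp rules from filter_fd."""
    # --ro-bind / / exposes the host root read-only; a real setup would add more.
    return ["bwrap", "--ro-bind", "/", "/", "--seccomp", str(filter_fd), *command]


def run_with_filter(filter_path, command):
    """Open the (hypothetical) compiled filter file and pass its descriptor to bwrap."""
    fd = os.open(filter_path, os.O_RDONLY)
    try:
        # pass_fds keeps the descriptor open, under the same number, in the child.
        return subprocess.run(bwrap_argv(fd, command), pass_fds=(fd,))
    finally:
        os.close(fd)
```

The descriptor number is only known after the file is opened, which is why the argument list is built per call rather than hard-coded.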


Is this what I should use if I want to intercept filesystem calls (and rewrite them, or generate on the fly the file that is about to be accessed)? Something else I should look into for this purpose?


author here:

Not exactly, you can technically overwrite a file with bind mounts, e.g. use

nsjail --chroot / -R /dev/null:/etc/passwd -- /bin/sh -i

This will make /etc/passwd empty, but nsjail doesn't rewrite syscalls. In order to do that, you'd have to use SECCOMP_RET_TRACE (TRACE(number) in kafel config lang), and then add some C code to nsjail which will use ptrace() to intercept and rewrite your syscall. It's possible, just not implemented, because it didn't seem like something that's required by users.
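
For anyone driving this from a script (a build pipeline, say), the bind-mount invocation above can be assembled programmatically. This is only a sketch: `nsjail_argv` is a hypothetical helper, and it uses nothing beyond the `--chroot` and `-R` flags shown in the command above.

```python
import subprocess


def nsjail_argv(command, chroot="/", ro_binds=()):
    """Build an nsjail command line with read-only bind mounts.

    ro_binds is an iterable of (source, destination) pairs, each passed
    as "-R source:destination" exactly as in the command from the thread.
    """
    argv = ["nsjail", "--chroot", chroot]
    for src, dst in ro_binds:
        argv += ["-R", f"{src}:{dst}"]
    return argv + ["--", *command]


def run_jailed(command, **kwargs):
    """Run command under nsjail (requires nsjail to be installed and on PATH)."""
    return subprocess.run(nsjail_argv(command, **kwargs))
```

Calling `nsjail_argv(["/bin/sh", "-i"], ro_binds=[("/dev/null", "/etc/passwd")])` reproduces the command line quoted above.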


It doesn't sound like NsJail does that; maybe try FUSE or SECCOMP_RET_TRACE?


Yes, SECCOMP_RET_TRACE works, but nsjail doesn't have code to support that - it didn't seem that useful when mount namespaces can police access to files.

Otherwise, it's possible to make it support that. Though, a word of caution: ptrace() is a complex and sometimes buggy interface with a lot of corner cases - iow, it's easy to make a mistake with consequences for the security of the whole setup.

PS: It's also possible to use SECCOMP_RET_TRAP (TRAP(number) in kafel, nsjail's seccomp-bpf config language) and rewrite syscalls in-process with the help of a SIGSYS signal handler.


Is there a minimum required kernel version? How does it compare to proot?

We use proot in our build pipeline and it would be interesting to look into alternatives.


Re kernel versions: It depends on when CLONE_NEWUSER and seccomp-bpf were added to the kernel for your CPU architecture. For x86-64 it was probably around 3.16; for some others (e.g. ppc64) it might be as late as 4.3. It might even work with earlier versions if you use --disable_clone_newuser and avoid using seccomp-bpf filters.
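
A quick way to compare a build box against that figure, as a sketch only. The `(3, 16)` threshold is the rough x86-64 estimate from this comment, not a documented requirement, and `parse_kernel_release` / `meets_minimum` are hypothetical helper names.

```python
import platform
import re

# Rough x86-64 estimate quoted in the thread; treat it as an assumption.
ESTIMATED_MIN_X86_64 = (3, 16)


def parse_kernel_release(release):
    """Return the leading numeric parts of a release string as a tuple of ints."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return tuple(int(part) for part in match.groups() if part is not None)


def meets_minimum(release, minimum=ESTIMATED_MIN_X86_64):
    """True if the release is at least the given (major, minor) version."""
    return parse_kernel_release(release) >= minimum


if __name__ == "__main__":
    # platform.release() returns the same string as `uname -r`.
    print(platform.release(), meets_minimum(platform.release()))
```

A release string beginning with 2.6.32 parses to `(2, 6, 32)` and fails the check. Distribution kernels often carry backports, so the version number alone is only a first hint.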

Re 'proot': I've never used it (it seems to be a configurator for the mount namespace), but nsjail seems much more advanced: cgroups support, seccomp-bpf via a configuration language, and a few more features (configs, net).


Thanks. I appreciate the response. I guess my only option until we move to a more recent kernel is `proot`, as our build boxes are still on 2.6.32, but I am happy to have found out about `nsjail` for the future.


What about older LTS systems that have CLONE_NEWUSER but only allow access to it from uid 0?


You can run it as root, and specify users/groups to switch to before executing an app. Though, CLONE_NEWUSER was meant for exactly that - using namespaces without euid==0. Some systems like Debian have a sysctl flag:

kernel.unprivileged_userns_clone

which controls this behavior. Ultimately, it's up to you whether to set it to "1", as CLONE_NEWUSER in the past opened many new attack vectors on the Linux kernel. However, I believe that currently the situation is much better, esp. after syzkaller and individual researchers reported and fixed many bugs in this area.
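
To see what a given box is set to, the flag can be read from procfs. A minimal sketch, assuming the standard sysctl-to-/proc/sys mapping; `unprivileged_userns_setting` is a hypothetical helper, and kernels that do not carry this distribution-specific knob simply have no such file.

```python
from pathlib import Path

# Procfs path for the sysctl named above; absent on kernels without this knob.
USERNS_SYSCTL = Path("/proc/sys/kernel/unprivileged_userns_clone")


def unprivileged_userns_setting(path=USERNS_SYSCTL):
    """Return True if the flag reads "1", False for any other value,
    or None when the sysctl file does not exist on this kernel."""
    try:
        value = Path(path).read_text().strip()
    except FileNotFoundError:
        return None
    return value == "1"


if __name__ == "__main__":
    print(unprivileged_userns_setting())
```

A result of `None` says nothing either way about whether unprivileged CLONE_NEWUSER works; it only means this particular switch is not present.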



This seems to be almost exactly like systemd-nspawn other than the ability to write seccomp policies in kafel.

Are there any other notable differences?


I haven't been looking at systemd-nspawn for some time, but judging from its man page:

- ability to use config files (in nsjail in protobuf format)

- 3 operational modes: one of them listens on a TCP port and runs processes on demand (inetd-style)

- support for cgroups (pid and mem limiting), here rlimits are not enough

- more expressive seccomp-bpf rules


> ability to use config files

systemd-nspawn supports ".nspawn" files (see --settings=true mode)

> socket activation

systemd can start up an nspawn thing in reaction to a systemd socket-activation request I think?

> cgroups

I guess for that you'd use 'systemd-run --scope -p MemoryLimit=10M -p CPUShares=100 -- systemd-nspawn ...'

> more expressive seccomp-bpf rules

Absolutely!


I've been using nsjail in production with good success lately. It's a solid tool.

Thank you authors! Really appreciate your work on this project.


I have become conditioned by seeing so many Javascript frameworks reach the front page over the years that I parsed this as 'JsNail' on first glance.



