One needs to assemble the jail. There may be tools for this, although I haven't used them. It's really easy if you can build a static executable. If it's a normal dynamic executable, you can use ldd to list the libraries it links against; just remember to also pull in ld.so (the runtime linker). If it dynamically loads modules (as Apache HTTP and PHP are often configured to do) or runs other executables like you asked, that's harder to automate.
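For the simple dynamic case, the copying can be sketched in a few lines of shell (the paths are illustrative, and parsing ldd output like this is fragile, so treat it as a sketch, not a robust script):

    # copy a binary plus every library ldd reports into a jail root
    BIN=/usr/local/sbin/lighttpd
    JAIL=/jails/www
    mkdir -p "$JAIL$(dirname "$BIN")"
    cp "$BIN" "$JAIL$BIN"
    for lib in $(ldd "$BIN" | awk '/=>/ { print $3 }'); do
        mkdir -p "$JAIL$(dirname "$lib")"
        cp "$lib" "$JAIL$lib"
    done
    # and don't forget the runtime linker, e.g. /libexec/ld-elf.so.1 on FreeBSD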
Packaging an entire system is more about convenience than anything else. It's also pretty difficult to package just the libs one needs when you are dependent on libc and other C libs.
I suspect that if one was really ok with it, some tooling could be built to copy/link in system libs into the rootfs automatically from the host.
That script depends on bash, bash depends on libc, etc. so those dependencies (and only those dependencies) will be put in the container. (See https://nixos.org/guides/nix-pills/enter-environment.html#id... for an example of what dependencies look like in Nix).
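With Nix you can ask for that closure directly; assuming bash comes from the Nix store, something like:

    # print every store path bash transitively depends on
    nix-store -qR $(which bash)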
> I suspect that if one was really ok with it, some tooling could be built to copy/link in system libs into the rootfs automatically from the host.
Eww, no thanks! I want my containers to be reproducible.
Can you use jails with, say, a bindfs / overlayfs / snapshot type of file-system, to run an executable in a jail that's a restricted, read-only view of your currently running system?
I did some reading and it looks like FreeBSD's unionfs or nullfs would handle the file-system part.
Yes. If it's literally read-only then it's safe and easy (just use read-only nullfs); if it's to be writable, but read-only below (like overlayfs), you'll have to use unionfs which is still a bit... temperamental.
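The read-only case really is a one-liner (paths assumed):

    # expose part of the host's userland read-only inside a jail root
    mkdir -p /jails/app/usr
    mount -t nullfs -o ro /usr /jails/app/usr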
It's more secure because you're minimizing the attack surface. The jailed process has no binaries an attacker can use to further exploit the system, not even a shell. It means that there's a minimal to non-existent /etc directory, so there is less information about the server and the network it's on. It also means there are fewer places to hide persistence mechanisms, because you control so much of the tree.
Going further, you can place the few executable files you need in a read-only filesystem, so attackers can't copy their own payloads in, further restricting what they can do. You can continue that process with firewall rules that are much more restrictive than what you could use with a more general purpose server, such as blocking any traffic that isn't to or from port 80/443. You can also virtualize the network of a jail, so an attacker wouldn't even get information about the network's layout from a compromised jail.
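As a sketch, the ruleset for a web jail can be as blunt as this (interface name assumed):

    # /etc/pf.conf - drop everything that isn't web traffic
    block all
    pass in on em0 proto tcp from any to any port { 80, 443 } keep state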
Attack path for an app on a typical general-purpose box:
RCE in the app -> ability to use anything in the OS under the app user's privileges -> privilege escalation -> the whole box is pwned (including access to all your secrets and management credentials)
Remediation process: wipe the box
Attack path for an app within a jail:
RCE in the app -> ability to use anything in the OS, but there is nothing to use, so attackers must bring their own tools (not always feasible) -> privilege escalation -> the jail is pwned, but not the box
The complete mess that is containers on Linux seems a whole lot more complicated to me. Jails do what you want in a simple way that works; containers are endlessly customizable, such that you require a whole runtime and management daemon and message-bus integration and goodness knows what else.
FreeBSD Jails were so much better than everything else out there, for a long time. I'll just copy&paste part of a comment I wrote on another HN thread some time ago, since it's relevant here:
[...] In fact, many years ago, when FreeBSD was my main OS (including on my notebook), I went as far as to isolate each app that used the internet into its own custom-setup jail [0][1].
I had Firefox, Thunderbird, Pidgin and a few others running in complete isolation from the base system, and from each other. I even had a separate Firefox jail that was only allowed to get out via a Tor socks proxy to avoid leaks (more of an experiment than a necessity, to be fair).
Communication between jails was done via commonly mounted nullfs. I also set up QoS via PF for each of them.
They were all running on the host’s Xorg, which was probably also the weakness of this setup.
It was a pretty sweet setup, but it required quite a bit of effort to maintain, even though I automated most of the stuff.
[...]
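For flavor, a per-app jail of that sort could be declared roughly like this in jail.conf (names, paths, and addresses here are assumptions for illustration, not the actual setup described above):

    # /etc/jail.conf sketch - one desktop app per jail
    firefox {
        path = "/jails/firefox";
        host.hostname = "firefox.jail";
        ip4.addr = "lo1|10.66.0.2/32";
        # shared drop-box between jails, mounted via nullfs
        mount += "/jails/shared /jails/firefox/shared nullfs rw 0 0";
        persist;  # keep the jail alive; the app itself is started with jexec
    }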
Sounds pretty cool! TBH, I don't understand how the "modern" app security models are not deemed fundamentally insecure in 2021, when any random app on any random system can simply leak your whole photo library to its servers, or delete all your files. Why do banking apps need full access to my filesystem? Why don't the reviewers of the app stores prohibit such excessive permission grabs?
Yes, there's always a question of usability, but if you're an advanced user, it's still scary that you're not afforded any control to prevent these incidents unless you go ahead and redesign the whole way all these apps work, all by yourself.
It seems terribly inefficient if every engineer has to do it on their own, and in their own incompatible way. Obviously most people simply give up after a while, since maintaining such a setup might itself be a whole full-time job.
"... I even had a separate Firefox jail that was only allowed to get out via a Tor socks proxy to avoid leaks ..."
I have looked into doing this many times and it's neither simple nor straightforward.
Specifically: jailing a GUI app that you can interact with on your desktop.
I can't remember what the most promising recipe I saw for this was, but it wasn't quite promising enough to compel me to build it ... and this discussion is always (rightly) hijacked with "just use Qubes" ...
Firejail spawning a nested Xorg works fine for me, including text-only copy-paste between "host" and "guest" and automatic file synchronization through bind-like mounts. For some firejails I also use Linux network namespaces to control traffic going through taps. My introduction to this approach was the alternative Gentoo handbook by Sakaki[1], but the principles would apply on any distro.
There's also a very interesting read on Qubes-like experience on NixOs with Wayland and XWayland[2,3].
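A minimal sketch of that pattern (the flags exist in firejail; the app and interface choices are arbitrary):

    # run firefox under a nested X server, in its own network namespace
    firejail --x11=xephyr --net=eth0 firefox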
An example would be Xephyr[1]. Archwiki[2] has a decent summary.
In my case, I have the standard xorg session started by my login manager. Then I start Xephyr with a separate DISPLAY, that shows up as just a window in the parent environment. It does look kinda like RDP or VNC.
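Roughly like this (display number and geometry are arbitrary):

    # nested X server that shows up as a window on the main display
    Xephyr :1 -screen 1280x800 &
    DISPLAY=:1 firefox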
Is there a reason why FreeBSD doesn't default to running all applications in jails? Seems like this would be a pretty huge advantage compared to the typical Unix system's almost complete lack of sandboxing.
Because if I don't run things in jails, every piece of software I run has full read/write access to all my files. Almost none of the software I run needs even read access to anything except its own files.
(noticed your reply to my other comment too, please consider this a reply to both)
I wasn't aware of HP-UX Virtual Vaults, thanks. However, I'd say FreeBSD Jails still had an advantage, due to being free and running on various hardware platforms, and more importantly on commodity hardware.
Eventually they were replaced by HP-UX Containers (SRP) a couple of years later.
Unfortunately it is hard to still find documentation, given the troubles HP-UX has gone through at HP (which kind of plays into your remark regarding FreeBSD).
Yes, but in a sense that's the essence of why the technology got left behind. Jails were a mechanism for expert admins to play with container ideas.
What the market actually wanted was Docker. And what Docker needed was Linux containers (complicated, flexible, piecewise technology) and not jails, which were higher level abstractions (but yet not high enough) with jargon and framework assumptions that didn't match Docker's needs 1:1.
Well, looking at the following things sort of gives it away:
- btrfs vs ZFS
- cgroups vs jails
- SystemTap vs DTrace
- systemd vs SMF
I get it, many of these were due to licensing issues. So they said[1]. Anyway, there are still some things left to implement for Linux. pf is my favourite (software) firewall; it would be great to see it ported to Linux.
In my experience, both Linux developers and BSD developers don't seem to care too much about porting things to the other's operating system. If you want to do things the Linux way you can use Linux, and if you want to do things the BSD way you can use BSD. That's seen as easier than trying to glue two incompatible things together.
Won't, not can't. *BSDs shipped GPL components for decades before they decided to go for purity. It's a policy decision, not an incapability or mandate.
I don't see why. BSD and GPL are equally compatible in either direction; it doesn't matter which way you go. I can see why they wouldn't want a GPL component to be mandatory, but it could be made an optional component for Linux compatibility, which seems to be the way BSD would want it anyway.
You can just compare the APIs, namespaces are like the individual components of a jail. You can use them to build something like a jail, or something different that has a different security model. This was discussed a lot in an old HN thread: https://news.ycombinator.com/item?id=13982620
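The contrast shows up even at the command line (parameters illustrative):

    # Linux: compose namespaces piecemeal
    unshare --mount --uts --ipc --net --pid --fork /bin/sh
    # FreeBSD: one jail call sets up the whole environment
    jail -c name=test path=/jails/test host.hostname=test \
        ip4.addr=10.0.0.2 command=/bin/sh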
This doesn't really answer the question. Yes, the Linux API seems more flexible, but when you think about it, it really isn't, because all the models that actually make any sense can be implemented using a simpler interface, which is what jails provide.
One real difference is that you need to be root to create a jail. It'll get fixed eventually - FreeBSD already has unprivileged chroot, jail isn't that much different.
>Yes, the Linux API seems more flexible, but when you think about it, it really isn't, because all the models that actually make any sense can be implemented using a simpler interface, which is what jails provide.
Not really; Docker would probably be the most straightforward example there. I don't think it's possible to fully port Docker to jails, or at least I've never seen a successful port; some of the network topology features seem to just not be possible or straightforward. But I could be wrong - I haven't looked into the technical details of this in years. Somebody told me it might have been working a while ago, but I never heard anything else about it since.
Needing to be root is a major deficiency though, and I can't take jails seriously with that; one of the main focuses of Linux containers in the past several years has been making unprivileged namespaces a good option.
Docker is practically dead, no wonder it disappeared :-)
Many things are hellishly complicated in Linux, due to politics and technical difficulties. Case in point: when I started working on NFSv4 ACLs, support in Linux was being "worked on"; there was a prototype. That was 12 years ago. In FreeBSD, full support for NFSv4 ACLs, from file systems to userspace tools, shipped a decade ago. In Linux it's still not there.
Docker wasn't dead 5 or 6 years ago when I heard about this. In my experience some things are easier to implement in Linux and some are easier to implement in BSD. I don't particularly care for BSD's internal politics either, such as the licensing issues mentioned elsewhere here.
It might be because it would require reimplementing all the system-specific Docker parts from scratch. Not sure though.
BSD doesn't really have any licensing issues, thanks to BSD license, but politics is directly related to project size. In FreeBSD it's pretty much unnoticeable, but in Linux it can be a huge deal.
That has not been my experience. The issues are with licenses other than BSD but that's the same in Linux; Linux can also use BSD code, a lot of Linux code is actually dual licensed as BSD already.
It's not - Linux does have problems with licenses which are incompatible with the GPL, such as MPL/CDDL. BSD doesn't, because you can't have license incompatibility without throwing the GPL into the mix - it's the only open source license that can be incompatible with others.
FreeBSD does avoid pulling restrictively licensed (closed source or GPL) code into the base system itself, but a Docker port would be third party (ports/packages), not the base system.
Not really; with Linux it's the same as it would be in BSD if they wanted to avoid conflicts with the GPL: you put that code in an optional module and have the user compile it. I'm unsure why BSD people seem to think that using the BSD license means you can avoid problems with the GPL; if you use any GPL code for any reason (and there's a lot of it), then you have to pay attention to these things. If you insist on only running BSD and CDDL code then you can avoid it, but that's going back to putting politics over software again - the kind of thing you were saying you were trying to avoid.
Also, it’s not politics - it’s mostly just that the old GNU cruft is being replaced, and newer, better solutions prefer more liberal licensing, see GCC vs LLVM.
Well, the POSIX committee had big problems specifying an ACL (mostly because of blabla)... Then Microsoft did one for NT, which was later carried over to NFS; by the time the POSIX people finally decided on an ACL, no one wanted it anymore. FreeBSD supports both, but the NFS/Microsoft one is the standard.
Note: Linux also needs root for its namespaces. Or at least CAP_SYS_ADMIN, which grants enough that it's pretty much as good as root. See setns(2) and clone(2) for details. This is one of the complaints the Plan 9 people have always had about Linux namespaces.
I'm using them for several things, but the most straightforward one is probably that namespacing can be gradually added to services; you most likely see benefits from this already if you use systemd. That's one way namespaces can be used differently from the Docker model.
What are you adding gradually, specifically? Like, a concrete example that names a namespace you may want to use. I'm trying to figure out what problems a half sandbox solves, and a vague "I just want to enable some capabilities" doesn't help here.
The sandboxing and mount-related ones are implemented with namespaces, and the idea is that none of them are mandatory, so they can be slowly added to system services. That way you get some of the benefits without needing to build a full rootfs/container for the service. I'm not sure how any of this would be done with jails, because jails require you to create a chroot and network interface, whereas in Linux the mount and network namespaces are just optional namespaces and you can still use the others without them.
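For example, systemd exposes several of these as opt-in unit directives, so an existing service can be tightened one knob at a time (the values here are just a sketch, and the unit name is hypothetical):

    # drop-in for an existing unit, e.g. via 'systemctl edit myservice'
    [Service]
    PrivateTmp=yes        # private mount namespace for /tmp
    ProtectSystem=strict  # read-only view of most of the filesystem
    ProtectHome=yes
    NoNewPrivileges=yes
    #PrivateNetwork=yes   # add later, once the service can live without the host network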
Also, with epairs you can do some really flexible networking stuff on FreeBSD: between jails, between jails and the host system, and even between jails and IPsec tunnels.
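The basic epair plumbing looks something like this (jail name and addresses assumed, and the jail must be a vnet jail):

    # create a virtual ethernet pair and push one end into a jail
    ifconfig epair create                          # creates epair0a + epair0b
    ifconfig epair0a inet 10.0.0.1/24 up           # host side
    ifconfig epair0b vnet myjail                   # move the other end into the jail
    jexec myjail ifconfig epair0b inet 10.0.0.2/24 up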
And how is it mixing and matching these APIs? Given that there's an OCI-compatible runner for jails (runj, compatible with runc - which is what Docker uses to start containers), it seems to me that Docker isn't actually using the flexibility afforded by the APIs here, but is just using a relatively fixed set of options.
If I'm wrong: what is it using, and what problems is this flexibility solving?
I haven't tested runj, but just from looking at it, it seems it is not fully compatible with everything that runc does, because the OCI spec itself specifies a lot of Linux-specific functionality.
runC is literally the abstraction layer Docker wrote internally on top of Linux containers! It exists as a separate layer now because they spun it out precisely to freeze the API and enable other efforts like runj.
And runj, IIRC (though I'm not an expert in the space) wasn't a trivial 1:1 thing and required changes to the underlying jails layer to enable it.
> The "market" did not want Docker. Docker as a product failed.
Docker as a paid product failed. Docker as a company is failing. "Docker" in the sense I meant (of the software people use to launch containers), is pervasive and dominant. It won. And jails, in comparison, "lost", because jails didn't really do what Docker wanted. And what the market wanted was Docker.
I was about to say the same. iocage is now the default jails wrapper on FreeNAS, which means that there is good documentation and support. The previous jails wrapper used by FreeNAS was not very well documented, but was written by some smart folks.
One thing I like about iocage is how easy it is to grant the jail access to the host ZFS datasets.
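For example, something along these lines (release, names, and dataset paths are assumptions):

    # create a jail, mount a host ZFS dataset into it, start it
    iocage create -r 13.2-RELEASE -n web ip4_addr="em0|192.168.1.50/24"
    iocage fstab -a web /tank/data /data nullfs rw 0 0
    iocage start web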
On a jails note, I have had issues creating a jail that can do network inspection. I believe this is an issue with network restrictions in the jails subsystem itself. E.g., I could never run nmap or get the MAC addresses of remote hosts from within a jail.
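If memory serves, that's the default raw-socket restriction, which can be relaxed per jail (jail name assumed):

    # jails disallow raw sockets by default, which breaks ping and most nmap scans
    jail -m name=myjail allow.raw_sockets=1
    # seeing MAC addresses likely also needs /dev/bpf, i.e. a devfs
    # ruleset that unhides bpf inside the jail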
(FreeBSD) jails are amazing. I just wish there were easier ways to use them more "cattle"-like, so I can augment or replace Docker/Podman. At the moment tooling and many of the real-world setups remind me a lot of "pet" LXC containers or even VMs in the Linux world.
The tooling is slowly moving in a direction I like, though :)
This is an old post of mine which I happened to find useful. Orchestration of jails has moved quite a bit forward lately! For example, you can manage your jails quite nicely with containerd today. See the great post from Samuel Karp on the topic: https://samuel.karp.dev/blog/2021/05/running-freebsd-jails-w...
I use LXC on Proxmox and I do everything with Ansible scripts. Is there something moving towards docker-like repository in LXC land? Would love to just run the latest pihole or nginx or what have you on LXC
Are they superior to firejail on Linux? I kind of always figured they were at a similar level of "sandboxing", but I never had enough interest in BSD to dig in myself.
They are completely different mechanisms for doing different kinds of stuff. Firejail sounds like something closer to Capsicum, but without the security model.
I was under the impression they were like jails: meant to sandbox a program and make it more siloed off and secure, whatever the underlying mechanisms for obtaining that. I'll research some more, I guess.
Could you elaborate on the differences? As far as I understood it firejail, or rather the Linux features that it depends upon, is far more powerful than FreeBSD jails.
From what I understand, firejail is a "syscall filter". That moves it into the same category as Capsicum (https://www.freebsd.org/cgi/man.cgi?capsicum), but without Capsicum's security model, instead implementing something ad hoc, probably using Linux's seccomp.
Jails, on the other hand, are not a sandboxing mechanism - they are system-level virtualization, like Linux namespaces, but with a simpler interface. You can use it for sandboxing, but it's not what the mechanism fundamentally is.
I always hoped for macOS to borrow FreeBSD jails for itself.
A Docker-like solution with a pretty UI could be really useful for pros. For novices, it could mean a less cumbersome security measure than the restrictions we’ve been experiencing since Catalina.
Okay, I'll admit I just want it to be more like Plan 9: sysctl, ioctl, fcntl, and sockets could all have a file-based interface. Maybe it isn't any better.
I would like an auto-jail, based on the directory you’re in (similar to asdf/rbenv).
There’s just too much complexity in software, and so many things run on your machine these days.
You can’t really trust any application anymore
Note that FreeBSD Jails were introduced in 1999, while Solaris Containers and Zones were introduced in 2004. At the time FreeBSD Jails were introduced, probably the only alternative that was widely available was chroot, which is really far from what Jails offer. Full virtualization was too slow to be practical for most scenarios back then; 1999 is the year the Pentium III was released.
No, I didn't. I was comparing full virtualization to Jails. However, I realize now that there may not even have been a full virtualization solution available back then anyway, at least not for consumer hardware. I'll have to dig a bit on Wikipedia.
I've never used Solaris Containers / Zones, but my understanding is that the implementation was similar to FreeBSD Jails, so I have no reason to believe the performance was different.
Virtual PC and VMware did full virtualization on commodity hardware at the time (Macs and Windows PCs, respectively). But it was slow enough that it was mostly used for development and testing back then, and certainly not viable for any kind of routine sandboxing.
Back in the earlier days of containerisation Linux had no options (Linux was pretty late to that particular game) and Solaris wasn’t free. So FreeBSD made a lot of sense.
These days the tooling around Linux is better and there are open source forks of Solaris so FreeBSD might seem like an odd choice for some. However I still think FreeBSD is a rock solid operating system and one that doesn’t get taken as seriously these days as it should do.
FreeBSD's appeal for me is the easy maintenance. Everything related to config is in /etc/rc.conf; tunables go in /boot/loader.conf. The ZFS implementation is rock-solid. I therefore only have to fiddle with my FreeBSD server, running an 80TB ZFS storage pool, once a year or so.
No other OS gives me this kind of comfort and stability.
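A taste of what that looks like (the values here are just examples):

    # /etc/rc.conf - services and configuration in one place
    zfs_enable="YES"
    sshd_enable="YES"
    jail_enable="YES"

    # /boot/loader.conf - boot-time tunables
    vfs.zfs.arc_max="64G"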
Perhaps it would! But the userbase is definitely what finally got me into FreeBSD. The fact that hobbyists were contributing such excellent cli tools as `iocage` [0] and `vm-bhyve` [1] really is what got me over my fears.
Way back when OpenSolaris was around, I put it on a server to experiment with. There was a lot to like about it, but one thing I didn't like was there was a lot of obvious cruft that has built up over the years. Things were in places that I didn't expect, but they had been there since 1904 when people wrote C with a quill pen, so SUN couldn't just move it around, because it would break MasterCard or something.
The base FreeBSD system is well thought out, and because it had the ability to, it coalesced into a very coherent system. Man pages are great, everything is where it should be.
To be fair, "a lot of obvious cruft that has built up over the years" also describes old-school Unix as a whole, FreeBSD included. It's just that we got used to many of those things.
While I'm thoroughly a Linux person, I agree. I understand why Docker and the Linux container won the day, but the fact that FreeBSD is one system helps a ton with the overall intelligibility of the system inside that container. Oh, this container is Debian based, this is Centos, that one is busybox... No, every jail is just a FreeBSD system you know what you get for its bones. That's really nice.
Same here, but I never understood how BSD jails or Solaris zones are better than normal hardware virtualization, which is what Qubes OS uses. In addition, you get a great UX with the latter.
Containerisation makes much more sense than virtualisation in a lot of areas:
* lower runtime overhead
* quicker deployment time
* smaller image footprint
Plus most of the advantages of VMs can still be applied.
However it does massively depend on individual use cases. Qubes aims to be ultra secure and containers in Linux weren’t up to that task at the time (the situation has since improved massively).
Sure. But for every edge case where virtualisation bridges the gap, there will be containerisation solutions that further the gap too.
Also I’m not really in the mood to engage in a dumb flame war. I’m just answering the question as to why some people favour containers for some workflows. If firecracker works for you then keep at it.
With modern hardware support for virtualization, is runtime overhead still a big deal (assuming your virtual environments are not emulating some other CPU?)
They do but not by a significant margin these days.
The kind of domains where this is an issue isn't what your average engineer would be concerned with.
Where VMs do help the average engineer is they have more secure defaults. It’s pretty easy to accidentally run containers insecurely on Linux (FreeBSD jails are a better story though)
> Same here, but I never understood how BSD jails or Solaris zones are better over normal hardware virtualization which is used in Qubes OS. In addition, you get a great UX in the latter.
That tells me you don't know the difference between HW virtualisation and OS virtualisation.
Of course I do know it. Which is why I'm asking why you would use OS virtualization if HW virtualization is possible. (Unless your hardware does not allow it, of course.)
Also, OS-level emulation: Linux system calls can be translated to Solaris system calls, allowing Linux-specific workloads to run without having to emulate hardware and a full Linux kernel. Furthermore, before DTrace was ported to Linux, this allowed debugging Linux workloads under DTrace.
Unless the software is "vendor locked" with Linuxisms. I wish I could run everything only in zones, but alas, I still have to mix in some bhyve VMs as well, e.g. for Docker support...
On your phone, a desktop a server? For real-time applications, HPC, analytics, ML/AI? Hardware virtualization has tradeoffs...sometimes massive ones, and then you trust proprietary hardware to make it faster...again.
The issue with zones in the early days was the lack of GNU software and the difficulty of compiling certain things.
For instance, awk can act differently and if you don't know the 25 year old decisions that led to the differences, it can be very confusing. (There are different behaviors and command line switches between GNU's AWK and the version from SVR4/XPG)
I've used jails for years; the only thing that's painful is upgrading from ports across all the jails. It's time-consuming. Poudriere helps, but the whole thing is far from ideal :(
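For reference, the poudriere flow is roughly this (jail name, release, and list path assumed):

    # build packages once, in a clean build jail...
    poudriere jail -c -j builder13 -v 13.2-RELEASE
    poudriere bulk -j builder13 -f /usr/local/etc/poudriere.d/pkglist
    # ...then point each runtime jail's pkg(8) repository at the
    # poudriere output and run 'pkg upgrade' inside each jail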
Another cool thing about jails is that they're really easy to convert to bhyve virtual machines if your security needs or general paranoia levels increase at any point.
They're approximately as safe as modern Docker is. Upside to Docker: more security knobs (eBPF, kernel MAC, &c); upside to jails: probably easier to get right out of the box, fewer footguns. Both jails and containers (and Solaris Zones) share a fundamental security weakness, which is a kernel shared between tenants.
It depends on what you're defending against. There's always some side-channel attack that could possibly be used to gain information; even on VMs this is true. Off the top of my head, there could be some timing attack to learn which libraries others are using, by reading in libraries and seeing if they are warm in the buffer cache - that matters if you care about sharing the same kernel. I generally find them secure enough, considering how fast they can be brought up and down.
The fundamental thing about those features (and the equivalent on every system except Windows) is that you can never get more capabilities, only less. Once you are in a jail, there is no API for getting out of it.
You can't even see a binary from the rest of your system, and exec won't get you out.
There's Windows Containers[0], which are analogous to Docker containers. They can either be run shared-kernel or in a Hyper-V container.
Also Virtualization-based security[1], which is supposed to be completely transparent so it's not really a developer or deployment tool.
Here's an example from my personal name server:
... and although this jail has a lot of content files in it, the actual UNIX userland is only what is required to run 'lighttpd'. So it's an extremely lightweight environment with very little attack surface. You can also share a lightweight environment with multiple commands - here are two other jail commands:
... see how both jailings of 'nsd' and 'unbound' point to the same '/jails/dns' userland? Once again, that userland is very, very compact: ... so, 97 files total to run both name servers. No 'make world' necessary, no building and maintaining of a full FreeBSD system - just the lightest skeleton required for both 'nsd' and 'unbound'.
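The elided output aside, the general shape of such a single-daemon jail invocation is roughly this (the name, path, and address are placeholders, not the author's actual setup):

    # minimal jail wrapped around one daemon
    jail -c name=www path=/jails/www ip4.addr=192.0.2.10 \
        exec.start="/usr/local/sbin/lighttpd -f /usr/local/etc/lighttpd.conf"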