Hacker News
Dockerfile Best Practices (github.com/hexops)
196 points by sidcool on Jan 3, 2021 | 72 comments



In general this article was very disappointing. With the exception of suggesting tini as an entrypoint, this article adds nothing beyond the very basic advice of not running stuff as root.

There's no mention of multi-stage builds, labelling images, using env variables to pass ephemeral config settings, setting up tmpfs folders to avoid cluttering containers with temp data, the importance of rebuilding images periodically... In short, the basics.
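
For reference, a minimal multi-stage build sketch of the kind the official guide covers (image tags, paths, and the Go toolchain here are purely illustrative):

    # build stage: full toolchain, thrown away afterwards
    FROM golang:1.15 AS build
    WORKDIR /src
    COPY . .
    RUN CGO_ENABLED=0 go build -o /out/app .

    # runtime stage: ships only the compiled binary
    FROM alpine:3.12
    COPY --from=build /out/app /usr/local/bin/app
    USER 10001:10001
    ENTRYPOINT ["app"]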

Edit: this stuff is already covered by Docker in its official Dockerfile best practices guide.

https://docs.docker.com/develop/develop-images/dockerfile_be...

I was expecting far more from an article on HN about Dockerfile best practices, particularly one which is currently being listed so prominently. I mean, does the HN crowd need a top ranking link to be reminded that we should not be running external-facing services as root?


Do not hesitate to contribute then...

-> https://github.com/hexops/dockerfile/pulls


> Do not hesitate to contribute then...

This info is quite literally made available by Docker itself in the form of its official best practices guide, which has existed for years now.

https://docs.docker.com/develop/develop-images/dockerfile_be...

How is anyone publishing a so-called best practices guide when they aren't even aware of the stuff covered by the official best practices guide?


Because they can post something wrong, get the real experts on HN to fix it, then use their now-awesome repository to sell consultancy services as one of the biggest Docker experts in the world.

:-)


You're hastily assuming that they "aren't aware" when they could have simply chosen to avoid redundancy.


> Do not use a UID below 10,000

I see the point but this kinda looks like extreme paranoia.

I actually advise developers who are not so strong at Docker and containers to use uid 1000 in their containers. That uid is usually the first default uid in Ubuntu, which makes running containers on their development machine easier (since they don't have to deal with file permissions and uids that have no corresponding user on their system).

Someone will boo this and maybe even downvote, but it really helps developers "think operationally", and gets their dev environment closer to prod.


Right. This is great if you are still learning the ropes. However, the title is best practices. Using uid 1000 with rw file mounts exposes your main user to potential exploits. Often uid 1000 is in the docker group for convenience.

That allows an attacker to potentially append some nasty code outside the container and get password-less sudo. Unlikely? Very. But security is all about layers.

I've seen full-fat Ubuntu containers with uid 1000, which is in the docker group on the host, and ~ mounted to run some Python Flask app, in production.


> I've seen full-fat Ubuntu containers with uid 1000, which is in the docker group on the host, and ~ mounted to run some Python Flask app, in production

If you mount ~ in your containers you have bigger problems than uid 1000.


Running with uid 1000 in the Docker group in prod is a very bad idea to begin with


On your single-user dev laptop it’s very likely that you are user 1000 and probably put yourself in the docker group so you wouldn’t have to sudo to run docker ps.

So then running your containers as that UID without user namespacing (docker’s default) opens you up to more attack surface than if it was uid 1001.


So what is the right way to configure Docker such that both the container and my host user can alter files in the volume?


Hoping to get a good answer via Cunningham's law here: the way I currently do it is to create a shared GID on the host, add the user inside the container to a group with that same GID (via build args), and chmod the shared directory appropriately. chmod -R 2775 on the folder to be shared seems to do the trick.

This is probably wrong and lazy. I think the rigorous approach is to use user namespaces

https://www.jujens.eu/posts/en/2017/Jul/02/docker-userns-rem...
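
Roughly, the approach described above looks like this (the group name, GID, paths, image name, and the SHARED_GID build arg are placeholders; it assumes the Dockerfile adds its user to a group with that GID):

    # on the host: shared group owns the directory, setgid keeps new files in it
    sudo groupadd -g 4200 shared-data
    sudo usermod -aG shared-data "$USER"
    sudo chgrp -R shared-data ./data
    sudo chmod -R 2775 ./data

    # the Dockerfile consumes the same GID via a build arg, e.g.
    #   ARG SHARED_GID=4200
    #   RUN groupadd -g "${SHARED_GID}" shared-data && usermod -aG shared-data appuser
    docker build --build-arg SHARED_GID=4200 -t myapp .
    docker run -v "$PWD/data:/data" myapp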


The docker daemon has a uid remap option called userns-remap.

https://www.objectif-libre.com/en/blog/2020/06/30/securiser-...
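
Enabling it is a daemon-level setting in /etc/docker/daemon.json (it requires a daemon restart, and existing images and volumes will appear under the remapped ownership):

    {
      "userns-remap": "default"
    }

"default" tells Docker to create and use the dockremap user, with the subordinate uid/gid ranges taken from /etc/subuid and /etc/subgid.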


This is actually worse because the filesystem does not know about user namespaces.


It's not bad advice to use UID 1000, BUT if you're using Docker both for development and continuous integration, you might run into problems because Jenkins (or whatever you're using) might not run with UID 1000.

At that point, you're going to need to change your strategy and properly map the user/group outside the container to the user/group inside the container anyway.


I find it somewhat curious that the article starts by listing examples where the root user is used to run containers (amongst other issues):

> Writing production-worthy Dockerfiles is, unfortunately, not as simple as you would imagine. Most Docker images in the wild fail here, and even professionals often[1] get[2] this[2] wrong[3].

And yet, in some of the linked URLs, people are presenting reasons for why that approach was used in particular, instead of the supposedly safer alternatives.

For example, https://github.com/caddyserver/caddy-docker/issues/104

> We actually originally did run as non-root by default, but simplicity we decided to drop that (see #24, and also #103 for some other related discussion).

> If your Dockerfile works for you, that's great. In most cases where users want to run as non-root, they also don't need to listen to :80/:443 in the container, so the setpcap magic isn't necessary at all.

> It's also worth noting that caddy is an official image, and as such needs to be similarly-shaped to other official images of the same type. At a quick glance, none of nginx, traefik, httpd, or haproxy support running as non-root out of the box either.

> Finally, it's worth considering why you want to run as non-root. What attack vectors are you trying to avoid? Container escape vulnerabilities are pretty much the only real risk, but anyone running a modern Docker version is immune to many of them. It's also worth considering user namespace remapping as a mitigation. In my experience the main reason for running as non-root is to pass compliance checks - not a bad reason, but it's also worth recognizing that non-compliance does not automatically equal decreased security (and vice-versa).

If larger projects like Nginx, Traefik, httpd, and HAProxy are all creating containers like that, it makes you wonder about the reasoning behind it. Is it easier to just run containers as root and not worry about the permissions inside of the container? If so, wouldn't it make more sense for the container runtime to have some mechanism in place that lets people do what's easy within the containers while making sure it has no harmful impact outside of them?

Because to me it seems like people will continuously take the path of least resistance, and from where I stand, it should be up to the creators of the container technologies to make sure that this path is safe by default.


Nginx and the like are starting to provide non-privileged versions of their container images.

Running as root is lazy and all but equals container escape, especially when running on anything other than a scratch image with a read-only file system.

The only reason Nginx and Traefik run as root is to bind to privileged ports (80, 443). There is no reason to do that inside a container, since you can remap exposed ports outside of the container.

Containers are not VMs and must be handled differently. You are always one RCE away from having your entire container platform compromised.
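
For example, the process can bind an unprivileged port as a non-root user inside the container and Docker publishes it wherever you like on the host (image name is a placeholder):

    docker run -p 80:8080 myimage   # host port 80 -> container port 8080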


You don't need root to open ports 80 and 443; instead you can use CAP_NET_BIND_SERVICE, which you can also grant to the container.


CAP_NET_BIND_SERVICE is a root privilege, a distinct one provided by kernel capabilities, granted to a process. In order to use it the container must be permitted to allow its processes to elevate their privileges.

If the container is running as root, permitting it is redundant, since the kernel doesn't filter root for kernel capabilities anyway.

If a privileged user sets CAP_NET_BIND_SERVICE on an executable binary using setpcap to allow a non-root user the ability in a container to bind to a privileged port, elevated privileges are still required for execve to create a process that is permitted to use the kernel capability. Think sudo but for processes.

The argument with containers is that binding to a privileged port isn’t necessary, so you shouldn’t do it. And by not doing it you improve your security posture.
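
A sketch of what that can look like in practice; NET_BIND_SERVICE is in Docker's default capability set, so the file capability on the binary is usually the missing piece for a non-root user (the base image, paths, binary name, and uid are placeholders):

    FROM debian:buster-slim
    RUN apt-get update && apt-get install -y --no-install-recommends libcap2-bin \
        && rm -rf /var/lib/apt/lists/*
    COPY myserver /usr/local/bin/myserver
    # grant the file capability so a non-root process can bind ports below 1024
    RUN setcap 'cap_net_bind_service=+ep' /usr/local/bin/myserver
    USER 10001
    ENTRYPOINT ["myserver"]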


Don't even need that on newer kernels and Docker 20.10: https://github.com/moby/moby/pull/41030
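
On older daemons the same effect can be had per container with the sysctl that PR turns on by default, assuming a recent enough kernel (image name is a placeholder):

    docker run --sysctl net.ipv4.ip_unprivileged_port_start=0 -p 80:80 myimage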


> Is it easier to just run containers as root and not worry about the permissions inside of the container? If so, wouldn't it really make more sense for the container runtime to have some sort of mechanisms in place to allow people to do what's easy within the containers while also making sure that it has no harmful impact outside of them?

This is correct in principle, but very hard in practice. This is because kernel support for containers was kind of "tacked on" and more or less scattered across the code-base. And although they're getting a lot better, there's still no easy way to reason about their security. So a lot of the advice around permissions management and access control is a kind of defence-in-depth.

I'd love it if we get to a point where containers can make strong statements around security.


Before containers everyone used VMs. VMs had a much bigger overhead and were harder to manage. But they were also secure. Now every other Docker tutorial explains containers as lightweight Linux VMs, while in terms of security they are far from it.

I don't like the idea of running my (internal, not meant to face the internet) containers so that they are publicly accessible or running their processes as root. Yet this seems to be the default with docker and I'm sure a lot of people don't bother fixing that.

There is so much confusion regarding users and user namespaces. I think something needs to change in the way docker documents those things and also in the way defaults are chosen for various configuration options.


I think it's better to say VMs are _more_ secure. You can never leave out the _more_.

Example: https://www.vmware.com/security/advisories.html


VMs offer different security. That is all.

It is more secure to run a process with seccomp filters than it is to run it without.

It is more secure to run a process with seccomp and vm isolation than just one or none of these.


You can use VMs to isolate your containers; Amazon developed Firecracker [0] for that.

[0] https://firecracker-microvm.github.io/


Were they really that significantly more secure? You still needed to do regular maintenance on the underlying image, etc. Same with Docker. The only big difference I see is that yes, breaking out of a container is easier than out of a VM. But are there any other significant vectors I should be aware of?


They were not more secure, just more isolated. The challenges are different.

Containers are just namespaced processes that share the same kernel as the host. A host has access to all container processes, uids, gids, file systems, and networks. Cgroups are used to limit resource access.

To run containers securely you need to understand how to protect running processes. You need to use unprivileged users where possible, drop all kernel capabilities not required, run Linux Security Modules (AppArmor, SELinux) to prevent processes from doing things they shouldn’t; and, run containers based on the smallest image possible, since a container should only have files that are absolutely required to run a process, and nothing more.

Even when you do it all right, in a multi-tenant environment it's not safe to run all containers on the same hosts.
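
A sketch of what some of that looks like with plain docker run (the AppArmor profile and image name are placeholders; orchestrators expose equivalent settings):

    docker run \
      --user 10001:10001 \
      --cap-drop ALL \
      --read-only --tmpfs /tmp \
      --security-opt no-new-privileges \
      --security-opt apparmor=my-profile \
      --memory 256m --cpus 0.5 \
      myimage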


So if I am understanding this correctly, the challenges of setting up a secure Linux VM and a container are more or less the same?

The point about multi-tenancy is absolutely understandable. Isn't this an old story from the PHP world with multi-tenancy? I think a good generalization is: don't run on multi-tenant systems if you do anything (!) critical (e.g. authentication or payments)?

But that of course disregards the fact that when people _can_ do something, they _will_ do it even though they shouldn't (like running E-Commerce systems in multi-tenant environments).

Another thought regarding isolation: aren't VMs essentially just running on one host as well? Is that why you said "VMs are _more_ isolated"?


That's a very important difference, because isolation and the associated increase in overall security of the system is a core purpose of any virtualization technology. Docker promises a lot here, but a lot of those promises remain unfulfilled in reality. Yes, containers are inherently easier to break out of than VMs, but even with that caveat there is room for improvement in container security. That alone is reason enough for me not to be a big fan of Docker in production.

But there are other vectors. With a VM you get a whole linux distribution, which of course increases the attack surface, but at the same time you also get much better isolation and that distribution's team of maintainers looking over your software, providing security patches, advisories, a simple way to update the system and so on. On the other hand there exist 'docker best practices' tutorials (not the posted one) that recommend not updating your base system at all in the name of reproducibility. Docker's solution to update management is manual image tagging and manual updates, possibly with help of external tooling. I don't think that's a good solution for that problem.

Imo the overall best solution is to run stuff in VMs and pick a lightweight distro for that.


Just to be sure, isn't a container a whole linux distribution as well depending on your base image? With the same distribution team, etc.?

That not-updating part is of course just plain and simple bad advice.

What solutions for update management would you recommend in the VM space?


> It's also worth noting that caddy is an official image, and as such needs to be similarly-shaped to other official images of the same type. At a quick glance, none of nginx, traefik, httpd, or haproxy support running as non-root out of the box either.

That does not sound like an adequate excuse, does it?

I'm afraid none of your remarks makes a reasonable point. Even if you believe that running random stuff as root is ok, if you do not have any reason to do that then why should we mindlessly follow bad practices? If we aim at running a safe system and there is absolutely zero drawback in following best practices, then why should we continue to make the mistake of intentionally using poor practices?


But that's the thing - if so many projects are using these bad practices, then doesn't that mean that there's something fishy going on i.e. it's difficult to do things the "proper" way and therefore most people simply don't? Why must we follow these best practices at all - why aren't they simply the default way of doing things, e.g. having to specify a user for the process in the container to run, instead of defaulting to root?

For example, look at SSL/TLS certificates - before Let's Encrypt ( https://letsencrypt.org/ ) and tools like either Certbot ( https://certbot.eff.org/ ) or even web servers like Caddy ( https://caddyserver.com/ ) which automate both provisioning and renewing certificates, people used to simply run HTTP. But now, it's easier than ever to use them for transport level security, and the stats seem to vaguely back this up, for example: https://www.welivesecurity.com/2018/09/03/majority-worlds-to...

Why should users inside of containers be any different? What are the factors that prevent safe defaults from being implemented? That's what I don't understand.

Disclaimer: I'm not advocating for running things as root; rather, my claim is that if things are hard to do, they simply won't be done unless absolutely necessary. Any tech vendor should acknowledge this and make sure that doing things the "right way" is as easy as possible.


I think the fishiness is exactly what your intuition is pointing to: root is easy, "proper" is not. There are very few applications where you need true root at runtime, and most can get by with correctly configured user space.

The issue is complexity and lack of appropriate abstractions.

The "Let's Encrypt" of dockerfile safety would be something that makes it trivial to 1) create a user 2) chmod/chown an appropriate spot on the fs, 3) ideally let the author defer these actions to always finalize the image in user mode. That way, you declare at the top that you will do these user actions, but RUN stays as root. Or just provide an SRUN, SCOPY,SADD directive which acts like running with sudo. Then, you can easily extend a layer or base image without being concerned with the details of how user space is implemented.

Also there is no standard or idiomatic protocol for setting up user space in a dockerfile.


To me the argument was pretty simple: if running software as non-root is important--and you clearly think it is--but it is even slightly annoying to write software in a container that doesn't run as root (as maybe it now has to have a special user and internally manage these users; the comment you responded to showed this with de facto evidence) then it seems pretty obvious this shouldn't be an abstraction that people who make containers have to deal with... let them write everything to run as root and then add a damned command line flag to the container runtime to run the container not as root. I don't use containers (I am too deep into security to pretend they are a security boundary and I know too much about toolchains to feel they solve actual deployment problems), so I am frankly shocked this isn't already how they work :(.


This is exactly how they work.

You either specify the user when creating the container (which can have certain implications around binding to ports below 1024... solvable, but still something to deal with), or you can opt for everything to run with a uid mapping such that uids in the container are mapped to higher uids on the host.
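
For illustration (the uid and image name are placeholders):

    # run the container's processes as an arbitrary non-root uid at start time
    docker run --user 10001:10001 myimage

    # or remap all container uids to an unprivileged host range, daemon-wide,
    # via "userns-remap" in /etc/docker/daemon.json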


It is definitely possible to run nginx as non-root in a container: https://www.rockyourcode.com/run-docker-nginx-as-non-root-us...

Also, all the sentences that begin with '>' were comments from https://github.com/caddyserver/caddy-docker/issues/104, and not from the user gbrindisi


It's also possible to run Caddy in Docker as non-root, but it requires jumping through some hoops when configuring Caddy, for example changing the HTTP and HTTPS ports in global configuration to something else, since those ports are necessary to use for ACME.

If the official Docker image did that, then most users would be very confused and we would get lots of support complaints. A cost-benefit analysis told us it was not worth the headache to run as non-root, since the Caddy project values user experience highly.


I believe we set up containers in Docker 20.10 such that they can bind to ports < 1024 by default without being given cap_net_bind: https://github.com/moby/moby/pull/41030


Even if you believe that running random stuff as root is ok, if you do not have any reason to do that then why should we mindlessly follow bad practices?

But the argument is, should it be a bad practice?


> But the argument is, should it be a bad practice?

Are you really asking if running external-facing services as root should be considered a bad practice?


It's a container. The point of Docker is to contain and encapsulate. Lock down Docker.

Maybe not a bad practice, but it should be redundant.


Another useful resource is hadolint (https://github.com/hadolint/hadolint), which not only gives additional recommendations, but also a way to enforce them.
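
Running it is a one-liner, either with a local install or via the project's container image:

    hadolint Dockerfile
    # or, without installing anything locally:
    docker run --rm -i hadolint/hadolint < Dockerfile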


Putting the binary in entrypoint instead of CMD seems like a misuse of what CMD was intended for, no? What if you want to run the image to do some one-off task instead of the normal CMD? (e.g. override the binary)


> Putting the binary in entrypoint instead of CMD seems like a misuse of what CMD was intended for, no?

No. From Docker's official reference:

> The main purpose of a CMD is to provide defaults for an executing container. These defaults can include an executable, or they can omit the executable, in which case you must specify an ENTRYPOINT instruction as well.

https://docs.docker.com/engine/reference/builder/#cmd

ENTRYPOINT should point to the entrypoint, and CMD should store default command line arguments. This allows a container image to be executed as a command line application.


Then you override entrypoint. docker run --entrypoint=foo

Docker Compose has the keyword as well.
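
A sketch of the pattern and the overrides being discussed (the binary, flags, and image name are placeholders):

    ENTRYPOINT ["/usr/local/bin/myapp"]
    CMD ["--port", "8080"]

    # docker run myimage                          -> myapp --port 8080
    # docker run myimage --port 9090 --verbose    -> myapp --port 9090 --verbose
    # docker run --entrypoint /bin/sh -it myimage -> a shell for one-off tasks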


I suggest running hadolint, which scans for most of this.


+1 for hadolint. There's also https://github.com/goodwithtech/dockle and https://github.com/aquasecurity/trivy if you want more security emphasis.


Does anyone have an example of adding a non-root user in the Dockerfile as mentioned, that they like to use in a cookie-cutter approach?


Docker's official article on best practices has a section dedicated to showing how to add user accounts.

https://docs.docker.com/develop/develop-images/dockerfile_be...


I have one here: https://github.com/nickjj/docker-flask-example/blob/ca1d4849...

The basic idea is you create a user in your Dockerfile, switch to that user with the USER instruction and now future instructions in your Dockerfile will be run as that user.

Also when COPY'ing you'll want to add --chown=myuser:myuser too.

The above Dockerfile shows examples of all of that.

I'm not a fan of customizing the UID / GID because then in development with volumes you can get into trouble. Technically you could set UID + GID as build arguments but in practice I never ran into a scenario where this was needed because 99% of the time on a dev box your uid:gid will be 1000:1000 and in production chances are you are in control of provisioning your VPS so your deploy user will be 1000:1000 too. Also you probably won't be using volumes, but if you did they will work out of the box with the above example.
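
Condensed, the pattern described above looks something like this inside the Dockerfile (a fragment; the user/group names, build args, and Python entrypoint are placeholders, and the linked Dockerfile is the complete version):

    ARG UID=1000
    ARG GID=1000
    RUN groupadd -g "${GID}" app && useradd --create-home -u "${UID}" -g app app
    WORKDIR /app
    COPY --chown=app:app . .
    USER app
    CMD ["python", "app.py"]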


Maybe you could be interested in this article: https://www.rockyourcode.com/run-docker-nginx-as-non-root-us...


Here's my approach: https://www.codrut.pro/snippets/docker-images-basic-ruby-set...

[Edit] This is what I use for local development docker images.

Note: I disagree with this article, none of the things listed are "best" practices in my opinion... more like random practices somebody prefers. Use what you like.


OP's version:

> UIDs below 10,000 are a security risk on several systems, because if someone does manage to escalate privileges outside the Docker container their Docker container UID may overlap with a more privileged system user's UID granting them additional permissions.

> [...] there may sometimes be reasons to not do what is described here, but if you don't know then this is probably what you should be doing.

Your version:

> Use what you like.

> UID=1000


I should probably also mention this example is for docker images that I use for local development with a shared file mount having the same UID/GID as your host, so you don't have to `chown` all the time while developing.


> more like random practices somebody prefers

Surely, if "X" has posed a security flaw in the past, and "Y" achieves the same as "X" without being vulnerable to the exploits (e.g. the UID advice), then "Y" is objectively better and not a matter of style, is it not?


Podman solves the UID issue quite nicely with user namespaces.


I agree with never using latest, but disagree with the use of `major.minor` over `major.minor.patch`. Subscribe to all of your core dependencies release/security notes (OS, language, framework) and bump to the most recent patch versions as soon as possible. This gets you in the habit of updating dependencies regularly which is one of the most important things to do when trying to run secure services.
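
In Dockerfile terms, the three levels of pinning side by side (tags are illustrative):

    FROM python:3            # drifts silently
    FROM python:3.9-slim     # what the article suggests
    FROM python:3.9.1-slim   # pinned patch, bumped deliberately on each release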


Very nice! Taking care of Dockerfiles is such low-hanging fruit for preventing so many issues down the line.

If you are interested, I've collected more best practices for preventing common security issues: https://cloudberry.engineering/article/dockerfile-security-b...


You give examples of what not to do, but not what an alternative is. This makes the guide useless for any intended purpose of educating.


I find this comment a bit harsh, as the blog was informative for me. If there are multiple ways to achieve a target, being told which path won't work is really helpful.


It covers WHY not to do things, which is IMO more valuable than the many "How to do X in 6 steps" blogs that have no documentation _at_all_.


Shameless plug: if you want to learn more about building better Docker images (beyond using hadolint, which is an excellent quick win) I put together a comprehensive free email course about that topic over at https://vsupalov.com/courses/


One recommendation from the official Docker documentation that I frequently see people ignore is 'do not use a supervisor process in your container'. Stuffing something like Tomcat into a Docker container is tricky: you risk having your application crash, but Docker doesn't handle it, because the Tomcat process is still running. You also see people use different supervisor processes to have multiple applications in one container, e.g. a uwsgi app and nginx in one container, while that should be two separate containers.


> (...) but Docker doesn’t handle it, because the Tomcat process is still running.

...and that's why no one uses it that way, and instead uses Docker's built-in support for healthchecks.

Check out the HEALTHCHECK entry in Docker's reference

https://docs.docker.com/engine/reference/builder/
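
A sketch (the endpoint, port, and availability of curl in the image are assumptions):

    HEALTHCHECK --interval=30s --timeout=3s --start-period=10s --retries=3 \
      CMD curl -fsS http://localhost:8080/healthz || exit 1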


All our containers expose meaningful healthcheck endpoints. We practice STONITH for fencing. And we define service health purely using SLIs that reflect actual service usage (or ability to use the service). As such, we do not rely on Docker HEALTHCHECKs. We don't want to rely on a container measuring its own health; we often had issues with developers putting the equivalent of "return true;" inside their checks…


> And we define service health purely using SLIs that reflect actual service usage (or ability to use the service). As such, we do not rely on Docker HEALTHCHECKs. We don't want to rely on a container measuring its own health

I'm not sure you fully grasp the issue or understand how Docker works. Docker's healthchecks are not "container measuring its own health". Docker's healthchecks are a standard interface that was designed to allow container orchestration services to poll containers to check if they are still in working order.

From your own description, it sounds like you tried to reinvent the wheel, and did it poorly.

And I'm sorry to break it to you, but if you have developers faking health checks in production then your choice of container runtime or container orchestration system is not the problem you need to worry about.


But they don't... they should, but don't. I've helped plenty of clients either redesign their containers or implement healthchecks.

Note that adding the healthcheck isn't enough. Docker won't actually do anything if a container is unhealthy. You need a separate process to restart the unhealthy container.


We use HashiCorp Nomad, which handles restarting unhealthy containers. I think if people are running a large number of containers (a scale of more than 1000) then they would have some sort of tool that manages them.


Is tini still considered best practice? I remember reading that it was included and now you can just docker run --init (init: true in compose)


From TFA:

> Unfortunately, although Docker did add it natively, it is optional (you have to pass --init to the docker run command). Additionally, because it is a feature of the runtime and e.g. Kubernetes will not use the Docker runtime but rather a different container runtime it is not always the default so it is best if your image provides a valid entrypoint like tini instead.
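
For reference, baking it in looks roughly like this on a Debian/Ubuntu base (the package name and binary path are per those distros; myapp is a placeholder):

    RUN apt-get update && apt-get install -y --no-install-recommends tini \
        && rm -rf /var/lib/apt/lists/*
    ENTRYPOINT ["/usr/bin/tini", "--", "myapp"]

The runtime-flag alternative is docker run --init (or init: true in Compose), which is what the quoted caveat is about.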


This is what I get for skimming, thanks for pointing it out.

I have noticed that after moving away from tini, some of my containers take a lot longer to shut down, despite the init flag. I think this might be related.


What about consolidating `RUN` stuff to as few lines as possible so that your images aren't gigantic?
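
The usual pattern chains the install and the cleanup into one layer, since the cleanup only helps image size if it happens in the same RUN (packages are placeholders); multi-stage builds help even more when heavy build-time dependencies are involved:

    RUN apt-get update \
     && apt-get install -y --no-install-recommends curl ca-certificates \
     && rm -rf /var/lib/apt/lists/*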



