... Why?
The scalability difference (in terms of disk space) between having multiple services per container and many containers is, in practice, zero.
That's because of layered filesystems, and because most language runtimes and other images pull from a small set of base images.
For example, if you have two node applications, the base "node" image is 633MB in size. If both of your applications start with "FROM node" and then add their own files (say 2MB each), the "node" portion is shared entirely; only the diff, the layer where you copy in your application's files, differs. You'll only store a single copy of the 633MB node image plus your two applications' diffs, for a total of 637MB.
That number is the same if you bundle the applications into one container (vertically scale, as you put it).
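To make that concrete, here's a minimal sketch (the app1/app2 names and the trivial Dockerfiles are placeholders, not anyone's real setup):

    # two hypothetical apps, both built FROM the same node base image
    mkdir -p app1 app2
    echo "console.log('app1')" > app1/index.js
    echo "console.log('app2')" > app2/index.js

    for app in app1 app2; do
      printf 'FROM node\nCOPY index.js /srv/index.js\nCMD ["node", "/srv/index.js"]\n' > "$app/Dockerfile"
      docker build -t "$app" "$app"
    done

    # the SHARED SIZE / UNIQUE SIZE columns show the node layers are stored only once
    docker system df -v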
This is sorta true across language runtimes too!
With some creative diffing, I experimentally tested the difference between the ruby and nodejs images. It turns out there's just under 600MB in common. The ruby image has 116MB of data the node one doesn't, and the node one has ~40MB of data the ruby one doesn't.
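The "creative diffing" was roughly the following (a sketch, not the exact commands; the tmp_* names and working files are throwaway):

    # flatten each image to a tarball and compare the extracted trees
    docker create --name tmp_ruby ruby
    docker create --name tmp_node node
    docker export tmp_ruby > ruby.tar
    docker export tmp_node > node.tar
    docker rm tmp_ruby tmp_node

    mkdir ruby_fs node_fs
    tar -xf ruby.tar -C ruby_fs
    tar -xf node.tar -C node_fs

    du -sh ruby_fs node_fs      # total size of each flattened image
    diff -rq ruby_fs node_fs    # files unique to one image, or differing between them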
Again, you'll end up using a very small amount of disk space.
These savings also apply to downloading images after the initial base layers are downloaded (e.g. docker pull ruby; docker pull node will only download roughly 750MB of data, even though that's two >600MB images).
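If you want to check the layer sharing yourself, something like this works (sketch):

    docker pull ruby
    docker pull node
    # layers are content-addressed; any digest appearing in both lists is stored (and downloaded) once
    docker image inspect --format '{{json .RootFS.Layers}}' ruby
    docker image inspect --format '{{json .RootFS.Layers}}' node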
In any case, it's strange that disk space on the order of a couple gigs is even a scaling concern at all; I'd imagine memory or your application's disk needs would far outweigh such concerns.
My main point, however, is that the claim that one service per container vs. multiple services per container makes any difference in disk space is rubbish and utterly false.
I would assume the smaller images would also result in a smaller memory footprint when running them, and generally faster startup times. You seem to know a lot about Docker; is that a wrong assumption?
The scale I'm discussing is on the order of at least several hundred container launches per second. Previous attempts at making this work involved keeping a warm elastic pool of Docker containers. I'm working with at least 11 environments (which all have separate dependency requirements).
Instead of trying to manage a very large pool of containers, I opted for a smaller pool of larger servers and scaled the microservices vertically (using tools like chroot to help isolate each service in its own silo).
My main issue with using Docker for this was the bulk of the containers. Startup time, RAM consumption, and the size of the images were all causing me issues.
Docker isn't a VM, so the memory usage should be pretty much on par with chroot. The only difference is shared libraries will need to be duplicated in each container (as nothing is shared) and loaded into memory multiple times, but that should be on the order of a few megabytes.
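You can eyeball the memory side easily enough (sketch; the "web" name and the nginx image are just placeholders):

    docker run -d --name web nginx
    docker stats --no-stream web   # cgroup-accounted memory for the container
    docker top web                 # the same processes you'd see with a chroot + ps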
The duplication is worse than that. It's a data structure problem. Docker deals in opaque disk images, a linear, order-dependent sequence of them. The data structure is built this way because Docker has no knowledge of what the dependency graph of an application really is. This greatly limits the space/bandwidth efficiency Docker can ever hope to have. Cache hits are just too infrequent.
So how do we improve? Functional package and configuration management, such as with GNU Guix. In Guix, a package describes its full dependency graph precisely, as does a full-system configuration. Because this is a graph, and because order doesn't matter (thanks to being functional and declarative), packages or systems that conceptually share branches really do share those branches on disk. The consequence of this design, in the context of containers, is that shared dependencies amongst containers running on the same host are deduplicated system-wide. This graph has the nice feature of being inspectable, unlike Docker where it is opaque, and allows for maximum cache hits.
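As a rough illustration with the stock guix CLI (plus Graphviz for the graph; the package names are just examples):

    # the full dependency graph is explicit and inspectable
    guix graph ruby | dot -Tsvg > ruby-deps.svg

    # closure sizes: what each package really costs on disk, shared branches included
    guix size ruby
    guix size node

    # everything lives under /gnu/store exactly once, no matter how many
    # containers or system generations reference it
    ls /gnu/store | head

    # you can even emit a Docker image straight from a package's closure
    guix pack -f docker ruby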
> The duplication is worse than that. It's a data structure problem. Docker deals in opaque disk images, a linear, order-dependent sequence of them. The data structure is built this way because Docker has no knowledge of what the dependency graph of an application really is. This greatly limits the space/bandwidth efficiency Docker can ever hope to have. Cache hits are just too infrequent.
This is only true when you're building your images. Distributing them doesn't have this problem. And the new content-addressability stuff means that you can get reproducible graphs (read: more dedup).
> So how do we improve? Functional package and configuration management, such as with GNU Guix. In Guix, a package describes its full dependency graph precisely, as does a full-system configuration. Because this is a graph, and because order doesn't matter (thanks to being functional and declarative), packages or systems that conceptually share branches really do share those branches on disk. The consequence of this design, in the context of containers, is that shared dependencies amongst containers running on the same host are deduplicated system-wide. This graph has the nice feature of being inspectable, unlike Docker where it is opaque, and allows for maximum cache hits.
For what it's worth, I would actually like to see proper dependency graph support with Docker. I don't think it'll happen with the current state of Docker, but if we made a fork it might be practical. At SUSE, we're working on doing rebuilds when images change with Portus (which is free software). But there is a more general problem of keeping libraries up to date without rebuilding all of your software when using containers. I was working on a side-project called "docker rebase" (code is on my GitHub) that would allow you to rebase these opaque layers without having to rebuild each one. I'm probably going to keep working on it at some point.
Your assumptions are wrong. Glibc is faster (and better) than musl. Systemd is faster (and better) than SYSV init scripts.
Moreover, for example, I can update my running containers based on Fedora 23 without restarting them by issuing "dnf update", which downloads the updated packages from a local server. That's much faster than building a container, publishing it to a hub, downloading it back, and restarting the container (even when only static files have changed).
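For example (sketch; "myapp" is a placeholder container name):

    # update packages inside the running container
    docker exec myapp dnf -y update
    # (compare with: rebuild the image, push it to the hub, pull it on the host, recreate the container)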
Faster is objective, and in most cases correct: glibc has had a lot more optimization over the years. Better is subjective and completely depends on your use case.
Similar point for systemd: it's kind of misleading to say that it's faster. It is parallel and event-driven, which definitely makes its end-to-end time shorter on parallel hardware. And again, better is subjective; it's so much more complex that it might not always be the right choice.
Also, why use systemd inside a container at all? There's just one process in there usually.
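For what it's worth (a quick sketch):

    # the single service simply runs as PID 1 in the container's PID namespace
    docker run --rm alpine sh -c 'echo "my PID is $$"'   # prints 1
    # if you need zombie reaping or signal forwarding, --init adds a tiny init (tini), not systemd
    docker run --init --rm alpine sh -c 'echo "my PID is $$"'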
Docker has very little overhead (apart from all of the setup required to start a container). In principle it has no overhead, but Linux memory accounting has implicit memory overhead (this is a kernel issue, not a Docker issue).
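A rough way to see that (sketch; "idle" is a placeholder name):

    docker run -d --name idle alpine sleep 3600
    docker stats --no-stream idle   # memory the cgroup accounts to the container
    docker top idle                 # the only thing running is "sleep 3600"
    docker rm -f idle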