Immutable Infrastructure is the future

falcolas · on May 12, 2015

> Ultimately in the end it means more time for everyone to focus on business logic versus thinking about deployments, so I’m a fan of that.

I find this amusing: our company is migrating to Docker right now, and over in operations I'm spending more and more time thinking about deployments than I've had to since I started doing operations work.

I have to think about "what major functionality has changed in Docker in the last three months, and how does that impact any/all of our images?"

I have to think about creating frequently updated base images, and how to deploy these updated base images across our regions.

I have to think about making the image layers as small as possible, to limit the bandwidth impact of updating images.

I have to think about making as few image layers as possible to avoid DevMapper IO overhead.

I have to think about externally mapped volumes to keep necessary IO as fast as possible.

I have to think about asymmetric TCP, docker vs host network interfaces, natting, and service discovery.

I have to think about private registries, token based authentication systems, and redundancy.

My developers have been able to focus on business logic since they hired an ops team. Will the amount of thinking go down over time? Some of it: the bits which can be automated fade to the back (at least until the core Docker functionality changes yet again), but there are bits and pieces will necessarily change with every deploy.

dasil003 · on May 12, 2015

The difference is the pain you are experiencing with Docker is potentially transient, whereas the problems with configuration management on the wild internet are intractable, and even unquantifiable. With Docker the scope of changes you have to deal with is at least knowable.

I say "potentially" because I think the jury is still out whether Docker can live up to its promise. Clearly it has a real purpose and value add above traditional configuration management, but the churn is an indication of how hard a problem it is. Plus, even in an ideal case, you still have to deal with security updates and other inevitable version updates and things that invalidate your current images.

KaiserPro · on May 12, 2015

The big problem is that docker/container image creation is now buried deep inside an app's build system. This means that your build system is now a critical part of your infrastructure.

parasubvert · on May 12, 2015

Isn't that the case with any continuous deployment pipeline?

Best to use the CD system to do the deploy, and avoid "hold my beer, I've got to emergency deploy this via SSH" which is high risk for minimal gain.

KaiserPro · on May 16, 2015

No I don't mean pushing patches to your App, I mean critical system libraries.

Because they are baked into the container, to do critical patches you need to rebake the image.

The Docker bit will be mostly painless, unless you've don't something stupid, however the bits around the app can and will changes.

The amount of times I've tried to use a build job from 3-6months ago and it fails horrifically is far too often.

mpdehaan2 · on May 12, 2015

author here.

I'm assuming you have your organization's docker files in soure control and Jenkins, MAYBE you are buildling on top of the Ubuntu base image. So it shouldn't be buried - you should be able to update in source control and rebuild.

(Also, this post is mostly applicable to AMIs).

And 100% absolutely, your build system SHOULD be a critical part of your infrastructure. Not mission critical in that it being down is a problem, but it's how you role out changes.

This also allows you to continuously test the software going into builds, have dependencies, and all of those things.

This is "continuous integration + continuous deployment", but it doesn't have to be continuous. But continuous is a (cough) continuum and there are steps down that road that yield benefits without going all the way.

marcosdumay · on May 12, 2015

What do I gain compared to creating a puppet deployment setup, and set my machines from there?

I see no benefit from Docker (but that's why I don't know it well). Any code that does not come directly from the source control is risky, and docker incentives the worst possible building workflow, that is the developer builds everything on his machine, and pass the binaries along for deployment. That's as bad as developing on production.

dasil003 · on May 12, 2015

Well your build system should all be in version control, so that a red herring.

The issue with a developer building something and then throwing it over the wall to ops is one possible workflow, but I think it's a stretch to say that's what Docker "encourages". It only encourages that if you have inexperienced people doing your build system. Docker actually dovetails nicely with DevOps movement to break down those kind of throw-it-over-the-wall silos, and if you have someone experienced and skilled in charge of your build, this poor workflow won't happen.

So what do you gain? Well, you gain lightning fast, reproducible server deployments. You gain production / dev parity. You gain the ability to develop multi-node distributed systems locally without crippling performance overhead. Obviously this all comes at a significant complexity which may exceed your gains (hence why I say the jury is still out), but the problems it solves are very real, and not acequately addressed by pure configuration or VM technology.

falcolas · on May 12, 2015

> lightning fast

Depending on your bandwidth to the registry (and its availability).

> reproducible

As long as the tag wasn't overwritten by someone.

> production / dev parity

Which has been enabled for years via Vagrant, and by plain virtual machines even before that.

> develop multi-node distributed systems locally

Which won't match production, unless you put in a lot of networking effort in both regions: a set of linked docker containers will behave very differently than a set of docker containers on potentially separate hosts.

dasil003 · on May 12, 2015

I'm not a cheerleader, you don't need to cherry-pick words to off-handedly dismiss. If you don't see the potential advantages that Docker brings to the table in terms of parity, immutability and performance, then you are being intellectually dishonest and there's no point having a discussion.

mpdehaan2 · on May 12, 2015

Yeah, storage seems somewhat rough in particular.

This wasn't meant to be a Docker post. AWS or any cloud that allows messing with load balancers gets you pretty much there too.

However, I like that it provides a localized image builder a lot, and I like docker files themselves a lot, and the idea that it's cross-cloud. I think the other edges can get smoothed out.

Rapzid · on May 12, 2015

Curious this isn't it? It seems that to most application developers docker is merely a faster, different vagrant.. That they have to run in a virtualbox vm because they are on OSX.. The best PaaS experience is pushing code that the PaaS provider then checks out to construct the docker container for you :|

This isn't to say there are not benefits and as a "DevOps Engineer", yay fun stuff people need me to work on!(in addition to security, performance, automation, etc), but hmmm.. There are wins in some areas and added complexity too. Plenty of work to go around.

mpdehaan2 · on May 12, 2015

I don't see this. I think application developers are largely still running out of source control, whether that be in VirtualBox or Fusion or whatever... and Docker files are possibly replacing package build steps.

Maybe it's the case though.

I still think most of the press that Vagrant gets is really "yay, cheap virtualization" attributed to Virtual Box, rather than the workflow - but maybe some people's development environments really are that hard to set up. I like VMware Fusion a bit more.

Still, when a dev env is hard to setup, I like to see automation to do this that doesn't presume I'm running it in a virtualized container. So this could be the same script that a Docker file calls to deploy in production -- whether that's bash, some config tool, etc -- but at least then you are not assuming someone chooses to adopt Vagrant, or is running virtualized, in all cases.

My point in not adopting Vagrant is decidedly minor - I like VMWare Fusion, and I didn't really want to pay for the Vagrant plugin, because my developer machines don't need to be purged that often, and developer-env setup is pretty much running a script, and often the application just runs out of source control. So there's not a lot of dev-env machine rebuild churn.

falcolas · on May 12, 2015

> Docker files are possibly replacing package build steps

Tomato, Tomato. Both are feeding a configuration file into an external tool, and uploading the results.

> I like VMware Fusion a bit more.

So pay for a Vagrant license and get VMware Fusion as a box provider. Vagrant is a lot more than a nice wrapper around virtual machines.

> I like to see automation to do this that doesn't presume I'm running it in a virtualized container

Which is out, unless you develop on a Linux machine.

> I didn't really want to pay for the Vagrant plugin

And here's the meat of the argument. You don't want to pay for a tool, and so you have a fundamental misunderstanding of what workflows are available once you have that tool, so you make do with the development environment you have.

Sorry to hear that.

parasubvert · on May 12, 2015

Or you could adopt a platform that has made a lot of those opinionated decisions in a pre-integrated config.

There's a chance it might not be a fit but then it's a matter of whether it's easier to start somewhere and tweak it vs. starting ground up.

geerlingguy · on May 11, 2015

A couple relevant snippets:

  Caveat - I think images must come from a build system.
  Using anybody’s canned images as a baseline is reckless.

  Config tools still have a place - in building the automated
  systems that allow folks to focus on business logic, but I
  think there are going to move down-stack to doing things
  like automating OpenStack.  The longer term future is about
  cloud/virt systems like this provisioning themselves, but
  they are still yet a bit too complicated in the way they are
  going about it, but I feel it’s coming.

(The post's author, Michael DeHaan, wrote Ansible in 2012, but left the project earlier this year to pursue other interests.)

My thoughts on the subject are much the same—but I think it'll take a few years for the tooling to reach the level of polish we currently have with best-of-breed CM and general provisioning tools. It's not as easy to go from 0-to-infrastructure with a Docker-based workflow (especially if you have your own registry, etc.) as opposed to a VM-plus-CM-based workflow.

mpdehaan2 · on May 12, 2015

author here.

"but left the project earlier this year to pursue other interests" - Hi Jeff! I haven't really shared reasons for leaving the company to date and can't go into that.

I also don't think it will take years and think immutable is there today.

I think we're there today with a bit of skill and minor bit of work on top, but a little bit of work from cloud/virt solutions will push us over the hump - to be able to codify the upgrade-flip idiomatically within cloud/virt systems, so you don't need something else to do it for you. I do look forward to that being surfaced.

Things like Amazon's ECS in particular strike me as particularly interesting.

geerlingguy · on May 12, 2015

> I think we're there today with a bit of skill and minor bit of work on top, but a little bit of work from cloud/virt solutions will push us over the hump.

I think that's true, but my timeline is based on how long I see it taking 'mainstream' enterprise companies to adopt immutable infra—for many of the orgs I've worked with, they've only recently completed a sea change from dedicated servers to virtualized 'cloud' deployments.

So it really depends on who you're working with; many businesses are making the move today, but cautiously. Once the early adopters complete rolling things out to production and working out the kinks, we'll finally get to the late majority/stability, and I won't have to commiserate with the thousands of sysadmins stuck with dozens, hundreds, or thousands of VMs :)

KaiserPro · on May 12, 2015

I quite like immutable infrastructure. It means you have a rollout/roll back with a flip of a dns/connection.

However I see a lot of things about docker and think two things:

1) most people really wany a mainframe 2) shared filesystems really are brlilliant.

Let me unpack that:

a mainframe provides a massive amount of hardware, a hypervisor and scheduler. This means that you can control a bunch of isolated processes through one one system easily. Something that you can't really do in docker yet (fleet/kubernetes arn't there yet. in a mainframe you can use a single script to control an entire system)

Quite a lot of web config management is about shipping packages to different machines. Almost all of this config, and therefore complication can be removed by using a shared filesystem.

For example we have many version of java. For some reason they are in seperate repos. This means that if you want to change version (openJDK to oracle etc) you have to change config. With a shared file system, in the $PATH, you'd just say oracalejava-1.8 $programme or java-1.7 $programme

If you make your deploys to a similar shared filesystem, then to change config is basically a case of ssh machine, kill old process, launche new process. Job done.

This makes each machine a dumb execution system, minimal state is contained in each node.

With NFS coming to AWS, that'll be the model for my new infrastructure. the reason why? because it works. In HPC thats how we've been managing high scale workloads for years.

runlevel1 · on May 12, 2015

I'd like to see a polished common configuration API.

Currently, we have Chef, Puppet, Ansible, and Salt (amongst others) all working to create a sane common abstraction on top of a rainbow of configuration files and command-line arguments.

Each of them have their strengths, and frankly they're all impressive in their own right. But there's only so much you can do to reign in the chaos of the bazaar.

The Docker workflow certainly has its merits. It's great for narrowing dev-prod parity and repeatability. I've embraced it, but I'm a bit anxious that we, as a community, might end up creating bigger black boxes -- Dockerfiles upon Dockerfiles.

mpdehaan2 · on May 12, 2015

Hmm, I think your boxes can be still pretty transparent in that case - if anything more so, as you've got single-bash files versus perhaps directories and directories of content?

In most organizations, all those Dockerfiles should be yours, and in source control.

Relying on the blobs from others shouldn't be neccessary for most applications. Yes - in large organizations, a management problem probably does arise, but I think if you find copy/paste rising up you have to push that down into the base image, basically?

I don't think a configuration API helps to be honest - what you would do is turn them all into a "greatest common divisor" type scenario, where as they feel different because they were trying to be different.

I think you're best solved by organizationally mandating one of them within your company, as painful as that may be.

zdw · on May 12, 2015

> I'd like to see a polished common configuration API.

https://xkcd.com/927/

atsaloli · on May 12, 2015

What effort is this? URL please?

nickbauman · on May 12, 2015

> While PaaS has not become a reality for everyone...

A telling aside. Much of what he wants already exists in Google App Engine (and probably in Elastic Beanstalk, too, but I haven't tried it) — sans the need to manage the underlying OS.

Interesting Google is going the other way with Managed VMs running on top of Kubernetes. Perhaps they will meet in the middle somewhere?

w4tson · on May 12, 2015

I thought Google was running containers in kubernetes? AFAIK you can install Kubs on VMs

kevinschumacher · on May 12, 2015

"Managed VMs" is a service offering within Google App Engine that lets you use any runtime (e.g., Python, Node.js, Go, C) but get the benefits of the GAE PaaS auto-scaling, versioning, etc.

It actually works by shipping a docker container to the Google Container Registry (privately scoped to your project), spinning up a "VM" that shows up in your Compute Engine instances list that you can then SSH into if you want (this is why you're both right; they call it Managed VMs, it shows up in your GCE instance list, but they literally are, I think, running your docker container, probably with Kubernetes under the hood).

nickbauman · on May 14, 2015

This is exactly what's happening as I've seen it.

nickbauman · on May 12, 2015

We're both correct.

ABS · on May 12, 2015

For those looking into this topic only now we interviewed 6 people who have been doing this for some time, collected their answers and published them + did a hangout with them here: https://highops.com/insights/immutable-infrastructure-6-ques... (pure content, no sales pitch anywhere)

akurilin · on May 12, 2015

Very interested in this subject. Is there a comprehensive guide on how I can pull this off TODAY with e.g. AWS?

Darragh_Hayes · on May 12, 2015

At nearForm we're also very interested in this, which is why we've been working on nscale - our own deployment solution. v0.16 was released yesterday featuring autoscaling support.

"nscale is an open toolkit supporting configuration, build and deployment of connected container sets. nscale is ideally used to support the development and operation of microservice based systems." https://github.com/nearform/nscale

Here's an interesting blog post written about the deployment immutable systems. http://www.nearform.com/nodecrunch/deployment-how-to-do-it-f...

eropple · on May 12, 2015

It's only one piece of the puzzle, but at my last company I wrote deploy_thing[1] for exactly this. An immutable config bundle and an immutable artifact tag combine to create a deploy that can be launched and put into production with an AWS auto-scaling group. Need to change a configuration, you burn down the auto-scaling group. This was a transitional step for us before moving to Mesos, but it's a good example, I think, of how to approach the problem from a minimum-viable level.

(I'm currently playing with moving it to OpenStack.)

[1] https://github.com/eropple/deploy_thing

bender80 · on May 12, 2015

The URL is throwing 404, probably due to it being private.

Would love to see it. I am working on something similar.

zerocrates · on May 12, 2015

It looks like it was supposed to a hyphen, not an underscore:

https://github.com/eropple/deploy-thing

eropple · on May 12, 2015

Whoops, yeah. My bad, I switched from an underscore to a hyphen halfway through the project. It's still inconsistent, because we had a case of company-going-unders and I haven't needed it since.

parasubvert · on May 12, 2015

With containers, there's http://lattice.cf , which deploys Docker containers in this style with routing and load balancing; it's a simplified version of Cloud Foundry.

With VMs, there's http://bosh.io which has a bit of a learning curve but is ground up designed to be about designing & deploying immutable software releases

olalonde · on May 12, 2015

Deis is relatively simple to install and get started with (http://deis.io). They have a CloudFormation config to deploy on AWS.

yo-code-sucks · on May 12, 2015

Make a dockerfile, make sure it works and you understand it. Then go into Elastic Beanstalk and click new application and there is an option for loading a dockerfile, boom. For you this may work, for me it's more complex.

axelfontaine · on May 12, 2015

If you are on the JVM, our service does exactly this today: https://boxfuse.com

No base images, minimal immutable images generated on the fly with only what your application needs, directly deployable on VirtualBox and AWS, with zero downtime updates using ELBs or Elastic IPs.

andrewchambers · on May 12, 2015

Stateless servers with kubernetes seem like a great thing to do this sort of thing with.

nickbauman · on May 12, 2015

Why go that low, though? Why not go with Managed VMs first?

andrewchambers · on May 12, 2015

Kubernetes gives you freedom about where to run your stuff since it works on multiple providers and your own hardware. Once you learn how to set it up, it isn't too bad.

Google managed kubernetes cluster gives you the best of both worlds anyway.

KaiserPro · on May 12, 2015

A decent VM system will give you HA and load balancing (ie migrating VMs from overloaded hosts) for free.

Something docker/kubernetes still can't do.

Its also worth bearing in mind that the "vm penalty" is almost trivially small compared to the cost of dev time.

deathanatos · on May 12, 2015

> […] (ie migrating VMs from overloaded hosts) for free. > Something docker/kubernetes still can't do.

This doesn't solve issues arising from the VM outright crashing though, does it? (I don't see how it could.) At the end of the day, I still need to solve that myself, and if my application can gracefully withstand being crashed, then docker/kubernetes can migrate from overloaded hosts: just crash container.

I could probably also throw in something to gracefully die on a SIGTERM, but I feel like once you can withstand a crash, throwing in a SIGTERM is fairly straightforward. The only difference is that one trains traffic, the other just outright 500s it. (Though, perhaps there's more work involved there than I realize.)

Ultimately, I want crash resistance. I want to run Chaos Monkey. I can't, right now, because I know all too well what would happen.

KaiserPro · on May 12, 2015

In HA mode, you specify that this VM must be running. If its not do a list of certain actions.

But the thing is, making your app crash gracefully can be incredibly hard, and can almost always take a lot of dev time.

its cheaper in terms of planning and execution to have two have an HA pair defined in the cluster, and another HA pair defined in software (i.e. behind a varnish node or the like)

You can have virtual IPs so that if a node is not responding the HA partner picks up the traffic. If thats properly locked up, the hypervisor can kill the machine and restart it.

as for: >migrate from overloaded hosts: just crash container.

try doing that with a clustered something, the host becomes overloaded, kills the app. not only does the traffic shift off to other nodes(which might already be started to overload), it takes resources to sync back into the cluster.

You don't want this behaviour as it causes failure hysteresis. (that is, as soon as you start to reach breaking point, the whole thing collapses and refuses to come back up, without stopping incoming traffic)

nickbauman · on May 12, 2015

FWIW you can run GAE apps on open source implementations such as AppScale.

https://github.com/AppScale/appscale

dorfsmay · on May 12, 2015

Yes, I agree, but the issue is still how do you build that infrastructure (be it an AMI, a docker image etc..)?

Docker file sare simplistic, so you end up using the same tools that were available before (ansible, salt, bash etc...).

What we're doing right now is move what we used to do at "configuration time" to "build time".

mpdehaan2 · on May 12, 2015

My point was you don't need to use the tools you use today, except to do the version flip (and probably to deploy the subtrate - unless you are doing a cloud-image solution and not trying to do something on-premise). And this will get better when cloud providers embrace higher level cluster operations more natively.

And build time is the way to go for everything you can push down into build time. There still may be some need for service-discovery type applications.

dorfsmay · on May 13, 2015

Agreed, moving more and more stuff to the build, lighting up an image should be nearly instant, so we can do auto-scala etc... so no more "pip install -r" when bringing up an image, that's now part of the build.

But the landscape for tools to build that image, be it a docker image or an AMI, right now, is the same. I find myself "going back" to ansible but now using it to build images rather than change online VMs (could be any tool, I know ansible and it suits my way of thinking).

tomc1985 · on May 12, 2015

So by immutable he means... systems you don't patch? That's it?

kolev · on May 12, 2015

Wrong! It's the present!