Scalable and resilient Django with Kubernetes (harishnarayanan.org)
96 points by hnarayanan on April 2, 2016 | 54 comments



I'm really struggling to understand why most developers feel they need all this deployment complexity.

I've run Django and other web deployments with simple shell scripts and occasionally some Python to glue it together.

Most recently I'm running a Django web server and some other custom stuff (a real-time stateful server, database, nginx, etc.).

A somewhat complex setup (or at least as complex as it needs to be), and I have no need for Docker or Kubernetes. If I want a new server, I just change a few parameters and run the script to deploy. It's straightforward and without magic.

The previous deployment flavor of the month was DSLs like Ansible, Puppet, etc. Did these solve real problems? I found they added complexity without adding much. How is the new generation different?

Caveat: I'm not Google, and I don't deploy huge server farms. There is a use case, but the vast majority of us don't fit it.


I'm the author of the original piece.

I echo exactly what you say in a giant caveat way up top in the piece. :)

Most of the first half of the article beyond that point basically tries to motivate why you'd want to try this beyond just using a classical VM approach. But the basic idea is that it raises your level of abstraction from working with machines to working with your application components (on abstracted hardware).


There are a lot more advantages to abstracting the hardware from the app. First of all, you don't need one machine per service to scale the app. All services run on the cluster, and you can add or remove machines as needed. (At night you can shrink your cluster, and grow it back during the day.)

You can also use the same cluster to run your preproduction environment, or use it for CI/CD (check out Deis). As an example, we have a small cluster with two machines, and on it there are:

- the main app
- the preprod environment
- two more feature branches (that had to be reviewed)
- playground environments that the sales team can deploy for demos and trials

All are independent apps sharing resources on the cluster.

To summarize: day to day, we maintain 7 to 10 independent app instances, and we do regular updates as new revisions arrive. The trials and feature branches, for example, are single pods with everything included (db, Redis, app, and worker).

Kubernetes is not for deploying pet projects, but as soon as you start working on a real project it's a must. You can manage complex deployments with a lot of services, keeping costs as low as two n1-standard-1 machines. And as a plus, on GKE you get monitoring and log aggregation.
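To make that concrete, here is a rough sketch of what a single-pod feature-branch instance could look like (a hypothetical example; the image names and worker command are made up, not from our actual setup). Containers in a pod share a network namespace, so the app reaches its db and Redis on localhost:

    apiVersion: v1
    kind: Pod
    metadata:
      name: feature-branch-42            # hypothetical branch instance
      labels:
        app: myapp
        env: feature-branch-42
    spec:
      containers:
      - name: app
        image: gcr.io/myproject/myapp:branch-42        # made-up image
        ports:
        - containerPort: 8000
      - name: worker
        image: gcr.io/myproject/myapp:branch-42
        command: ["python", "manage.py", "runworker"]  # hypothetical worker entry point
      - name: db
        image: postgres:9.5
      - name: redis
        image: redis:3.0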


I think the question was "is this complexity necessary?"

If the same (deployment and upgrades for production, staging and testing environments, for any number of hosts) could be achieved with a small shell script - doesn't this mean that cluster thing isn't really useful?

I think a lot of "real" projects (smaller ones) work perfectly well without any complicated cluster management tools, and aren't really hindered by lack of what those offer.


Excellent points.


Seems like there's an opportunity there. It doesn't need to be "all this deployment complexity". Just currently it is. In fact it shouldn't be; the Kubernetes level of abstraction has far less going on than the VM level.

My guess is that 2-3 years from now it will be easier to set up a Kubernetes cluster than to set up even the single VM at the top of your piece, and a corresponding dev env will run seamlessly on Windows, Linux, and OS X. The average user won't even know or care what OS they're running in production; it'll just work (think Heroku++).

Of course once it becomes ubiquitous, then containers become the new (language agnostic!) way to deploy "libraries", accessed by HTTP, and DLL hell is just moved up an abstraction level. So what have we really achieved....


Right - I saw that.

The problem I have is that most people feel like they're doing 'serious work' so they need a serious solution, when the reality is that it really is just a small percentage of companies - not usually even successful startups, for instance - that require this sort of automation.

Mostly, it adds complexity rather than reduces it.

I know that sounds like we're getting into the weeds, but if you consider that most companies are taking on additional complexity because of this, it becomes a problem.


Again, I don't generally disagree with you. But as our team is growing at my day job, I am finding more and more that we're getting bitten by some developer's "It works on my machine" or a random npm failure at deploy time. And this makes me believe that having a prebuilt (immutable, hermetically sealed) artefact that I can later deploy is the way to go.

Containers solve this problem nicely.

And once you're going down the container route, you need something like a Kubernetes to easily schedule and network them.

My personal site -- the one you're reading the article on -- is literally just a static nginx server on an old Linux box (with the files copied in by rsync). Even the curmudgeon in me now thinks it makes sense to use solutions of different complexity for different situations.


I've run production at a 'successful startup,' a million node corporate fleet, and a personal toy project or six, and consider Kubernetes the single most important work being done in the field right now. Docker, no (rkt is better architected). Kubernetes, yes[0].

Don't think of Kubernetes as a replacement for Puppet. Think of it as distributed init/supervisord with service discovery, which you are clumsily building with Puppet. Your automation ends up writing init, units, monit, supervisor, and so on, right? (You shouldn't be deploying apps with Puppet anyway because CD is not CM, but nothing like Kubernetes or Mesos really existed in the open source world until now so a whole generation learned to do it with Puppet.)

Kubernetes exists for when your scripted machine fails or you exceed the capacity of your scripted machine and need to horizontally scale. It is not a deployment tool per se; that's only part of what it does as a scheduler and resource manager. Every project, from personal to Fortune 500, needs to plan for those scenarios. I'm not saying Kubernetes is always the answer, I'm saying it solves a problem that every project universally has, despite your claims. The bonus of working that way is now you have an API to your machines and can treat them as a single unit of resources. Kubernetes is about half of building your own mini Heroku.
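To make the distributed init/supervisord analogy concrete, a minimal sketch (all names and images here are made up, not anyone's real config): the Deployment plays the supervisor, keeping a fixed number of replicas alive wherever there is capacity, and the Service provides the discovery, giving them one stable address no matter which machines the pods land on.

    # "distributed supervisord": keep three copies of the process running somewhere
    apiVersion: extensions/v1beta1   # Deployment API group as of Kubernetes 1.2
    kind: Deployment
    metadata:
      name: web
    spec:
      replicas: 3
      template:
        metadata:
          labels:
            app: web
        spec:
          containers:
          - name: web
            image: example/web:1.0   # hypothetical image
            ports:
            - containerPort: 8000
    ---
    # "service discovery": one stable name in front of wherever the pods happen to run
    apiVersion: v1
    kind: Service
    metadata:
      name: web
    spec:
      selector:
        app: web
      ports:
      - port: 80
        targetPort: 8000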

You've heard snowflakes versus cattle, right? Your way is for snowflakes. Kubernetes is for cattle. The threshold where one becomes more productive than the other is the constant debate, and as an SRE, my opinion on where that threshold lies bucks most people who talk about this. If you self-identify as a sysadmin you'll likely have one opinion, devops another, and then you have my group of crazies that generally want to crush all snowflakes. I can speak from experience that our crazy SRE ways scale pretty well to the hundreds-of-thousands-of-nodes case but also work quite well for toy systems.

Subscribe to everything CoreOS is doing. There's a spectrum of quality there, but they're running with the programmable infrastructure ideal and have largely bet the company on the Kubernetes ecosystem.

[0]: Kubernetes makes some interesting choices versus Borg and I think they're going to have trouble scaling it to that size, and Borg really shines at Google because of the global filesystem layer that exists on every machine (they don't realize this and are doing gymnastics around storage on the open source side), but if they can mature Kubernetes it'll be a solid building block for platform development, eliminate a lot of the clumsy scripts and automation that you are defending, and push us toward programmable infrastructure.


So Kubernetes is another layer on top of basic AWS primitives?

I already have AMIs, autoscaling groups, health checks, ELBs, etc. Why am I adding another unnecessary layer of abstraction?

Disclaimer: Devops/sysadmin who would like to get off the hype train and get real work done.


Not everybody is on Amazon and some of us touch our gear. You should be excited because it gives you a rapidly-improving path off of Amazon if you've gone all in on their vendor stuff. Kubernetes helps build that stuff for any vendor. Multiple startups are working on AWS in a box. That was the promise of OpenStack but now you don't need virtualization so it's automatically an order of magnitude better. Maybe two.

People don't realize how deep Amazon's hooks are at scale. We were dropping half a million USD a month at that successful startup I mentioned and could do the same work with four or five racks of hardware, storage included. I bet you could cut your opex in half on something else, and it's a safe bet because (a) I've seen it be resoundingly true even coming from three year reservations and (b) half is actually a conservative estimate. I am floored by how much money the industry throws at the fundamental inefficiency of multitenant virtualization atop AWS.

What a strange argument, considering Amazon a primitive. That's flirting with vendor koolaid. Physical and DigitalOcean/cheapo deployments are an immediate counterexample, too. Hell, if network latency is low enough, you can straddle both and schedule Kubernetes pods on whichever provider is cheaper on a given day. Spot instances, too. Lots of possibilities.


> What a strange argument, considering Amazon a primitive.

Note I said "AWS primitives", not AWS as a primitive. I think they're overly expensive unless you're using them for prototyping, as you should eventually move off to your own gear once you know what your load/compute/storage profiles look like.

I appreciate you pointing out that this is most valuable when you're _not_ running in AWS (I work at a startup that wants to use this tech in AWS, hence my not understanding why we'd waste additional abstraction for little additional benefit).

Disclaimer: I _want_ to move off of AWS to our own physical gear at some point, but our dev team is married to it because they're afraid of physical infrastructure (but it's okay when critical services like EC2, autoscaling, and IAM aren't working for several hours, apparently).


I've been tempted more than once to found a consultancy just to help get fleets off Amazon, and take payment as one month's difference in opex. I feel your pain.


I had exactly the same thing in mind... I think there's a market for that.


If you're ever interested in a partner...


The real beauty of a system like k8s or Mesos is that you can do cross-cloud and/or on-premise stuff with the same management system and tooling. Amazon is great, but what about when you want to move to GCE to arbitrage price differences for certain services?


https://libvirt.org/

> but what about when you want to move to GCE to arbitrage price differences for certain services?

Already doing so without additional (unnecessary?) tooling, like Docker or Kubernetes (build images at respective cloud providers, use existing orchestration tools to spin up or terminate spot (AWS)/preemptible (GCE) instances).


It's essentially Legos vs Play-Doh. I'd not put Kubernetes on top of AWS, but instead on whatever cheap Linux boxes you can find. That said, AWS has well-thought-out, integrated tools you need. Kubernetes is still a bit of a joyride. That should change over the next year or two.


>Don't think of Kubernetes as a replacement for Puppet. Think of it as distributed init/supervisord with service discovery, which you are clumsily building with Puppet.

It's a step up from Puppet for sure, but I don't really see the benefits of using it over Ansible.


cmon jed, just because you spent 3 months at google doesn't make you someone who ran "a million node corporate fleet." stop waving this around everywhere you go.


As someone who does pretty much this for my personal sites, and runs something a level above that at work, I can say anecdotally that your caveat is important. The way you describe works well for me on personal projects, but at work we're really starting to see the benefit of the automation and reproducibility that things like Docker and Kubernetes provide. Ansible is great, but it's still easy to end up with non-reproducible environments, and that's something we've been bitten by quite a few times.


Can you expand a bit on how you still end up with non-reproducible environments? Not that I doubt it! Just would like to hear some details, or a specific example.


Usually when you ssh into the machine to fix something broken, then forget to backport the changes to the Ansible script.


The same thing could happen with docker, too.


You usually can't ssh into a Docker container because:

1) it usually doesn't have sshd

2) your change would be thrown away at the next container restart or migration.

The only way for it to persist is to do the right thing, i.e. apply the change where you should and have the container rebuilt.


You don't need ssh to get into a Docker container.


Definitely what luos said, but also you typically run repeatedly against the same machines, with changes on each run. This can lead to a different environment depending on whether you ran once or multiple times. Conversely, with Docker you're essentially recreating everything from scratch each time you build the image (with some caching to make it fast).


How about just when your current machine (e.g. Mac) has a different supported version of Python?

Disclosure: I work at Google on Kubernetes.


Umm, you install the version of tools and libraries you've standardized on, and check them in your unit tests?


It's an interesting question, but the general problem is that even small projects have problems when bringing things to production. For example, let's use Django:

- How do you make sure the version of Django on your machine matches the one in production?
- What if you do brew update on your Mac and the package for your repo in production is out of date?
- How do you roll out a new version of your app (even allowing for downtime) and make sure it did it "completely"?
- How do you make sure your app keeps running even if your VM goes down (let's say you need to do a kernel upgrade)?
- How do you use one password for your local development database and one for your production database (and not hard-wire it into code)?

These are super common problems - you're going to have to solve them no matter what. Lots of folks do this with scripts, but one small change can cause all kinds of annoyances.

This is why containers have taken off, and all of these problems are addressed by the combination of containers and Kubernetes. It's a different way of thinking about development, but I've been able to accelerate my velocity substantially since adopting them. I'd argue that the complexity you accept as 'normal' for a small site is an unnecessary tax, but I'm biased :)
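To pick on two of those concretely, here's a minimal sketch of how the version-matching and password problems look in Kubernetes terms (all names and values below are hypothetical, not from the article): the image tag pins the exact build that ships, and the password lives in a Secret that gets injected as an environment variable rather than hard-wired into code.

    apiVersion: v1
    kind: Secret
    metadata:
      name: db-credentials
    type: Opaque
    data:
      password: cHJvZC1wYXNzd29yZA==   # base64 of a made-up production password
    ---
    apiVersion: extensions/v1beta1
    kind: Deployment
    metadata:
      name: myapp
    spec:
      replicas: 2
      template:
        metadata:
          labels:
            app: myapp
        spec:
          containers:
          - name: app
            image: gcr.io/myproject/myapp:v42     # pinned, immutable build
            env:
            - name: DATABASE_PASSWORD             # read via os.environ in settings.py
              valueFrom:
                secretKeyRef:
                  name: db-credentials
                  key: password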

Disclosure: I work at Google on Kubernetes.


All of the things you've listed can be done with simple, small, and straightforward scripts.

To check for versions of Django and other libraries, you write simple scripts to test for them. Run them in unit tests, and as a pre-check stage in deployment scripts.

Our apps do keep running if any single VM goes down. We don't need Kubernetes or anything to handle that. Just a load balancer and an ElasticSearch cluster.

Dev hosts obviously have different passwords read from config files only present in development systems.


I think the only difference between Puppet/Salt/Chef and shell scripts is that CM tools already have a collection of common building blocks for stuff like "check that package uwsgi is installed" or "check that the uwsgi config for application 'mywebsite' matches this template", plus dependencies between those blocks (including dependencies on config file templates and automatic reloads on changes). I tried both approaches and can say that Puppet and Salt actually save a bit of time, by not having to invent grep/sed/awk invocations to manipulate files.
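For example, those uwsgi building blocks come out roughly like this as a Salt state (a minimal sketch; the file paths and template name are made up):

    # install the package, keep the service running, reload when the config changes
    uwsgi:
      pkg.installed: []
      service.running:
        - watch:
          - file: /etc/uwsgi/mywebsite.ini

    # "config for application 'mywebsite' matches this template"
    /etc/uwsgi/mywebsite.ini:
      file.managed:
        - source: salt://uwsgi/mywebsite.ini.jinja
        - template: jinja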

They're a bit more verbose than `scp ./files/nginx.conf $HOST:/etc/nginx/` or `ssh $HOST sed -i -e 's/.../.../' file.conf` for most basic stuff, but way less verbose for anything even a bit more complicated.

I don't use Docker or Kubernetes, so no idea about those. But I think configuration management tools can save time, though their usefulness is debatable if, say, all you need to deploy is to scp and untar a simple archive, then poke the init system to spawn or reload a daemon.


The problem with these tutorials is they still tell you to run the database yourself, to emphasize the 'component' or one-solution-fits-all idea of kube/mesos.

However, in practice, if you're using AWS or GCloud this is usually a bad idea - just use the managed database solutions provided. They have things like backups, snapshots, restores, upgrades, HA, monitoring, and alerting baked in. These are non-trivial to do yourself.


I'm not sure I agree. The managed database services provided by Google and Amazon are both expensive, and lack configuration options. I'm sure they're already relatively well tuned, but we have tuned our Postgres installation to our workload more than they would be able to.

Running our own Postgres install costs roughly half what it would on AWS, takes very little engineering time to maintain, and can be tuned as needed.


Agreed.

When the costs go down and persistence-as-a-service is provided by more cloud hosts, it will make more sense, but right now it often doesn't.

I wonder about opportunities to provide some of these services as a white label to hosting providers like DigitalOcean.

Is anyone doing that? What stops them?


If we're talking about databases specifically, it's either a problem so easy to solve that it's difficult to charge a lot for (i.e. just Postgres/etc. running, with some way of modifying the config), or it's really difficult and requires a lot of resources to get right (i.e. scaling to many nodes, still not really solved well in Postgres/MySQL).


The selling point to the provider would be that it increases stickiness with their clients - that's big - they don't have to charge more.


I suspect that -

If the service is easy enough to provide (this means not just ease of deployment but also up-time, maintenance, etc) while fitting into their margins without issue, it doesn't provide stickiness.

If the service is sufficiently complex to provide stickiness, they either need to charge for it, or it will likely eat heavily into their margins.

Neither of us know the specifics of Digital Ocean's business plan, but I strongly suspect that were it as simple as you put it, where it's hardly any cost to them, and a huge gain due to increased stickiness, they'd already be doing it, or be moving toward it. And they might be. However, if they aren't, then it's hardly a mystery as to why they're not providing such services.


I don't know if it is a "really bad" idea. I've run stacks on both AWS and GCP, and used both RDS and CloudSQL. They are good solutions and I definitely agree that looking at them first is sensible; they will fit the bill for a lot of applications. However, you can get a pretty high level of fault tolerance using Kubernetes services mounted on persistent volumes. You don't get all the advantages of a containerized solution, but you do get dependency management and fault tolerance.
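The pattern, roughly (a hypothetical sketch; the disk and image names are made up): the database pod mounts a persistent disk that outlives any individual container, and a service gives it a stable address, so the data survives the pod being rescheduled onto another machine.

    apiVersion: v1
    kind: Pod
    metadata:
      name: postgres
      labels:
        app: postgres
    spec:
      containers:
      - name: postgres
        image: postgres:9.5
        volumeMounts:
        - name: data
          mountPath: /var/lib/postgresql/data
      volumes:
      - name: data
        gcePersistentDisk:
          pdName: postgres-disk   # a pre-created GCE persistent disk (hypothetical name)
          fsType: ext4
    ---
    apiVersion: v1
    kind: Service
    metadata:
      name: postgres
    spec:
      selector:
        app: postgres
      ports:
      - port: 5432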


I'm the author of this piece, and I agree with you in general.

I am personally experimenting with this within the context of containers because I'm trying to construct something like Vitess[1] from first principles as an intellectual exercise.

[1] https://github.com/youtube/vitess


You did a nice job on this walk-through, so thanks for the effort! I was interested to see that you ended up in the same place I have in the past when deploying databases to Kubernetes clusters. A container mounted on a PD is a bit of a hybrid, but it works well in practice. Hopefully something like GlusterFS or Flocker will end up giving us a better way to manage and deploy data volumes.


Thanks for the blog post, it was really informative; I'll definitely be sharing it at work.


Thank you! And please let me know of ways I can improve it. I already have a couple of ideas that I've indicated in the conclusion, but the more the merrier. :)


You likely don't need to run Django yourself either: https://cloud.google.com/appengine/docs/python/cloud-sql/dja...


The built-in Django libraries for App Engine are severely outdated. I have new guides that just vendor in a recent version:

https://cloud.google.com/python/django/appengine

However, there are a lot of tricky issues with Django on App Engine, so I would also look into Djangae:

https://github.com/potatolondon/djangae

As far as Kubernetes goes, the OP and I independently started working on similar material at the same time. I also have some Django/Kubernetes content, for both CloudSQL (MySQL) and Postgres in Kubernetes:

https://cloud.google.com/python/django/container-engine (CloudSQL)

https://medium.com/google-cloud/deploying-django-postgres-re... (Postgres)

My biggest reason for running the database in Kubernetes is a) CloudSQL doesn't currently support Postgres and b) it's interesting. Seems like OP had similar motivations.

Obvious disclosure: I work for GCP.


Parallel explorer and OP here. :)

I cited your work as additional reading because I really enjoyed watching your talk on this recently! While I'd figured a bulk of this stuff out, there were a few things I learnt from it too.


Oh cool, I missed your citation. I am still figuring some things out as well so I am going to check out what you've done.

Definitely happy to have you contribute to the ecosystem. It seemed like a content gap that I wanted to fill in, but at Google we would of course prefer to have a healthy external ecosystem of people writing content like yourself!


If you're running a vanilla Django site with no esoteric libraries with special requirements (like NumPy/SciPy, which need compilers etc. to get installed), this is definitely the correct way to go. But if you do have non-standard requirements that fall outside the rails of your PaaS, you need to consider other solutions.


If you are interested in mitigating this, take a look at Flynn [1]. It's analogous to Kubernetes but is actually much simpler in this case, as it can do the Heroku-style deployment but also ships HA database appliances with backups out of the box.

[1] https://flynn.io


Putting the 'should we or shouldn't we' aside, I'd just like to say thanks for putting together an article that goes all the way from a 'classic' setup to something running on a host.

I often find that posts like this are a bit too meta and in trying to write things in a general way they leave out some critical step which makes replicating their ideas difficult.


OP here. That was the plan and I'm glad you appreciate it. :)

I wanted to start from something I assumed people knew, and tried to motivate why one might want to improve on that. Only then did I introduce the new solution.

A lot of tutorials I found jumped too quickly to the 'how' without spending enough time on the 'why' I should care.


Hey Harish, thanks for your talk at the March London Django Meetup.

I'm starting to play around with Google Container Engine as you suggested, and having a good time.


You're welcome!


It's interesting that there seem to be two parallel and opposite trends in application deployment developing right now.

One is containerization, where developers are responsible for maintaining fat application stacks that can easily be redeployed and moved around.

The other is towards serverlessness, with things like [Django-Zappa](https://github.com/Miserlou/django-zappa), where scalability is handled automatically by cloud providers.

My bias is quite clear - deploying apps should be easy.



