Show HN: Gitkube – Deploy to Kubernetes Using Git Push

praseodym · on March 13, 2018

We’ve been using a similar solution for a while now: pushes to GitHub trigger Travis CI to build a Docker images. These get pushed to Docker Hub or Quay, which triggers a webhook to Keel.sh[1] running in our Kubernetes cluster. Keel then updates the Kubernetes deployment to perform a rolling update of the application.

1: https://keel.sh

deforciant · on March 13, 2018

Thanks for the shout-out! Keel author here. I have created Keel exactly for these workflows where something (Travis CI, Docker cloud, Google Cloud Builder - my favourite due to the pricing and proper speed) builds an image and then we would want to trigger a deployment update.

I was running workloads on GKE and I found that while it's easy to get images being built, there was no straightforward way to automate deployment updates. My goal was to tag a release on Github and get a new version deployed in my environment. After searching for similar tools and couldn't find anything lightweight (spinnaker or jenkins were way too big!) I ended up building Keel.

Here's an example if anyone has this problem https://keel.sh/v1/examples/enabling-auto-updates.html.

antoncohen · on March 13, 2018

Keel looks nice, thanks for making it!

Does Keel include any ability to rollback to a previously deployed version? How to you recommend handling that use case?

Keel looks similar to Weave Flux (https://github.com/weaveworks/flux). Have you looked at Flux, are you able to compare them? At first glance it looks like Keel is probably easier to setup, but Flux is more feature rich (edits k8s .yaml files in git with updated image tags, has CLI that supports listing deployments and images, can release an old image).

dankohn1 · on March 14, 2018

CNCF is currently tracking 27 offerings in the CI/CD space, of which 13 are open source: https://landscape.cncf.io/grouping=landscape&landscape=ci-cd

cheez · on March 13, 2018

What is it we can achieve with Keel that we can't with Terraform?

alexnewman · on March 13, 2018

Or by awk for that matter. Sorry to be silly but terraform solves a completely different problem, which is what happens when you have disparate cloud resources you want to manage. For instance cloud network resources. Using it to manage containers is... Not it's use case.

alberteinstein · on March 13, 2018

Gitkube is simpler since everything is contained in the cluster itself. Image is built and rolled out from within the cluster right now. (gitkube contributor here)

tirumaraiselvan · on March 13, 2018

Hey!

Just curious, why not have Travis CI do a kubectl set-image to rollout after pushing the docker image?

praseodym · on March 13, 2018

Our Kubernetes API server is only reachable from the internal network, which complicates things. Also, setting up Kubernetes RBAC to allow Travis CI to do a kubectl set-image isn’t trivial either.

The best alternative would be to have Travis CI SSH into a box with kubectl. In any case it’s bit more complicated than the setup with Keel we currently have.

ecthiender · on March 13, 2018

> Also, setting up Kubernetes RBAC to allow Travis CI to do a kubectl set-image isn’t trivial either.

Wouldn't setting up Keel on a Kubernetes cluster be the same effort (permissions, RBAC etc.)?

praseodym · on March 13, 2018

Keel has permissions to update all deployments in our cluster. Because it runs within the cluster instead of outside (like Travis CI), and the only interaction with Keel is through webhooks, there is no need for finegrained RBAC.

The webhooks only tell Keel that a new image is available, they cannot modify other parts of the Kubernetes deployment directly.

tirumaraiselvan · on March 13, 2018

On that note, Gitkube supports RBAC as the Remote objects are namespaced. This means two things:

1) Remotes are isolated - have different authorizations for say dev and staging remotes

2) Deployments are isolated - A Remote can only deploy apps in its own namespace.

hashfyre · on March 14, 2018

We are doing the exact same thing with Drone CI, however RBAC on the cluster is needed.

RBAC isn't too complicated though:

- get on specific app namespaces

- get, list, watch, patch, update on deployments

sph · on March 13, 2018

Do you know of a tool like Keel, but for Docker Swarm?

deforciant · on March 13, 2018

Hi, if you would want Keel to support Docker Swarm, please create an issue here: https://github.com/keel-hq/keel/issues.

I planned to add Docker Swarm support from the beginning but a lot of people said that it's unnecessary. It's quite simple to add new providers, they need to implement this interface:

  // Provider - generic provider interface
  type Provider interface {
 	Submit(event types.Event) error  // <- this is where new image event comes in
	TrackedImages() ([]*types.TrackedImage, error) // <- provider needs to tell some triggers which images to track (registry watcher)
	GetName() string
	Stop()
  }

Even though Keel was created for Kubernetes, later on Helm was added where Keel talks directly to Tiller and doesn't really use K8s API.

BretFisher · on March 15, 2018

Maybe https://github.com/v2tec/watchtower, but Keel looks cool.

sph · on March 15, 2018

Doesn't support Swarm, but thanks for the pointer.

lima · on March 13, 2018

Red Hat's OpenShift[1] Kubernetes distribution can do this too - you even get a webhook for GitHub copy and paste.

Kubernetes is a toolbox, not a product. There's so much you have to set up yourself (build system, registry, security, logging stack, metrics...). That's fine if you're in the business of selling Kubernetes itself, but otherwise, most teams should use something like OpenShift and not re-invent the wheel.

[1]: https://github.com/openshift/origin

merb · on March 13, 2018

well the problem with openshift on redhat/centos/atomic/whatever is that you still need to regulary update your cluster os's. however when you setup a bootkubed/self-hosted kubeadm cluster you can basically run on coreos and only need to update your self-hosted control plane and kubelet/(etcd, if not self hosted, too) this makes it way, way easier to have an up-to-date os + kube than any other solution (kubelet and etcd can be updated via etcd-wrapper and kubelet-wrapper, so this is even more insane).

BUT i think that red hat pushes for an easier upgrade path anyway, that's why they basically bought coreos. for CoreUpdate + Tectonic + MatchBox, so I think that in the end Redhat will have a similar simplier upgrade mechanism, but only time will tell. What would in fact be awesome if CoreUpdate could also update Kubernetes Kubelet's and Etcd and than it would be a no-ops os if something broke you just boot a new system and delete the broken node.

And yes kubectl is the ssh/ftp of k8s. https://www.youtube.com/watch?v=07jq-5VbBVQ

snuxoll · on March 13, 2018

You already have an easy upgrade path with Atomic: atomic host upgrade - that's all it takes.

As far as upgrading OpenShift itself, you already wrote the Ansible inventory to deploy it (unless you're a masochist and deployed it manually) - just run the upgrade playbook against your existing inventory to upgrade to the latest point or major release.

I run OpenShift Origin 3.7 on CentOS Atomic at work, in production, it's basically been set and forget - I have an ansible playbook I run once a month to do OS upgrades (have to drain pods, upgrade atomic, reboot and then allow pods back on the node and do it across all of them in sequence) and then I just run the upgrade playbook periodically as new point releases come out.

merb · on March 13, 2018

But it is still a "manual" operation, or at least a cronjobable. however with CoreUpdate it's basically automatic. No manual intervention - at least for the os part (as of now).

zwischenzug · on March 13, 2018

I'm a masochist who helps maintain this :)

https://github.com/IshentRas/cookbook-openshift3

more seriously, we couldn't use Ansible because ssh keys are verboten where I work, and Chef is the strategic CM tool.

philips · on March 13, 2018

Would this work? https://github.com/philips/ansible-kubernetes-daemonset

oblio · on March 13, 2018

Isn't there also Rancher in this space? https://rancher.com/rancher2-0/

tirumaraiselvan · on March 13, 2018

This is supposed to be very light weight and independent of any distribution or vendor.

tango12 · on March 13, 2018

(I'm from hasura.io) In fact, this is the component that powers a portion of the "git-ops" on the hasura.io platform.

Like one of the commenters said, we also do some pre-push magic right now (esp. for client-side kubernetes templating) for a more complete and opionionated CI/CD workflow.

eikenberry · on March 13, 2018

Isn't OpenShift for when you want to host your own Kubernetes? Most K8s users will be using an externally hosted solution, so tools that can build on any K8s are much more versatile and valuable.

dev1n · on March 13, 2018

Google released a tool to help with continuous deployments called skaffold [1]. Haven't used it myself (just using bitbucket pipelines) but looks like it's a relevant tool to this. Nice.

[1]: https://github.com/GoogleCloudPlatform/skaffold

paulsmal · on March 13, 2018

I really like when deployment tools rely on git hooks and doesn't require any other installations in dev environment.

truesy · on March 13, 2018

Close to opening up a service that does this for pull request branches, from GitHub to AWS ECS/Fargate.

https://conjure.sh

Glad to see this kind of workflow gaining momentum.

dlor · on March 13, 2018

I wonder how this tool handles performing the docker build itself. It looks like it runs a controller inside the cluster, but by default apps running inside kubernetes don't get access to the docker daemon.

There are a few existing workarounds, but none of them are terribly appealing.

See this longstanding issue for some more context: https://github.com/kubernetes/kubernetes/issues/1806

tirumaraiselvan · on March 13, 2018

Docker sock is being mounted from the host [1]. This could be a concern if you are running multi-tenant apps etc. One (slightly hacky) workaround is to have a Persistent Volume mount the hostPath and restrict hostPath mounting in the pod through PSPs.

[1] https://github.com/hasura/gitkube/blob/7d0e7f25b811d48089b78...

lewq · on March 13, 2018

Looks a lot like http://github.com/weaveworks/flux which we're happy users of at dotmesh.com.

Glad to see gitops getting some traction! https://www.weave.works/blog/gitops-operations-by-pull-reque...

pmarreck · on March 13, 2018

I deploy via git-push to Travis CI which then runs my test suites and if they pass, continues on with the deployment to the (virtual) hardware server via https://www.gigalixir.com/ (it's an Elixir Phoenix web app).

A properly set up CI is awesome and a huge timesaver.

linkmotif · on March 13, 2018

I use https://github.com/dminkovsky/kube-cloud-build for this, which uses Google Cloud Container Builder to perform builds. It’s a really nice workflow!

pcnix · on March 13, 2018

This certainly does seem to be a fairly easy way to deploy. I'm assuming you'd integrate standard ci/cd tools in via webhooks? Or even pre-push scripts?

tirumaraiselvan · on March 13, 2018

Yes, that is definitely in the roadmap and should be simple to do. Checkout this issue https://github.com/hasura/gitkube/issues/20

zackify · on March 13, 2018

I’ve been using drone.io to do this for a month now. Loving it:

- git push starts the process - build and deploy images to google cloud with the plugins/gcr tool - set the image in kubernetes in the next step using the google cloud sdk image

I run unit tests as a docker build step. If they fail, it won’t deploy. If the readiness probe fails after setting it in kubernetes, it won’t switch over!

lifeeth_ · on March 13, 2018

I was thinking of hasura.io while reading the page only to find this is a hasura project - Keep up the great work

tirumaraiselvan · on March 13, 2018

Haha :) This is the first of a few projects open-sourced out of the Hasura platform

minieggs · on March 13, 2018

Interesting, and real nice! I built something similar for my raspberry pi cluster with docker swarm (albeit no testing and tailored to my use case).

Kicking myself for not making "production ready" and posting here now.

bonesss · on March 13, 2018

Wow! This is highly relevant to my troubles this morning, and solves a highly relevant pain-point. I would not be surprised to find something like this mainlined into k8s over time... Nicely done :)

_5meq · on March 13, 2018

Keel does a really good job of this, good to see people are starting to have options with their k8 deployments!

My two cents:

So far my deployments have been pretty painless, so changing over to something like Gitkube / keel isn't my top priority currently.

Configuration management / Creating my kubernetes resources ( deployments / services / ingress / etc. ) in a repeatable and elegant way is my biggest pain point by far.

# Helm

I've been very unhappy with Helm.

Helm is insanely verbose. Go ahead and take a look at the kubernetes/charts repo and look at the various stable charts. Note the huge amount of copy-pasta code hanging out.

Too much of my life has been spent wrangling YAML templates that are rendered using the under-featured gotmpl library.

Also, helm's client library support is less than stellar. Also, schlepping my configurations around is a pain.

In Helm, it's a crazy amount of code to implement a `ingress -> service -> deployment` pattern that is standard for 99% of kubernetes resources. You have to write the same things over and over, which means that making changes takes forever and is brittle.

I believe this is due to gotmpl being a poor templating engine for this use case. It doesn't have enough features to allow one to develop decent abstractions over declaring k8 resources

The helm tiller doesn't expose a restful API or proper tooling to allow other resources to interact with it effectively, making automated deployments a chore. My only option is to call out to the shell and try and be devensive.

Finally, Helm doesn't really do that great a job of validating my resources are going to be valid. `helm install --dry-run` will tell you everything is great, and then break in the middle of an actual installation, leaving a half-configured mess in it's wake.

# KSonnet

KSonnet looks like an attractive option, it's certainly less verbose. But the documentation isn't there yet, and if you peek under the hood there is a MOUNTAIN of ksonnet code waiting to be read and understood. I belive it's going through some churn currently, so features and impovements have been slow to appear.

Also KSonnet doesn't have enough example implementations to hit the ground running when trying it out on my clusters.

Targeting individual contexts in Ksonnet provides some nice additional safety, and some very simple implementations using google KMS or something similar could be a really special way to safely store my secret configs at rest, similar to ansible-vault.

# Terraform

Terraform has this NAILED in the VM space. I declare my environment: `terraform plan` tells me what is going to happen, `terraform apply` checks to ensure I've not protected any of my resources with a flag, and applies the changes it listed if not.

I'd love to write my k8 resources in terraform, but it doesn't currently support modern k8 resources, and I don't LOVE how terraform handles storing my tfstate files either way.

voidfunc · on March 13, 2018

Yea I think I would love if Terraform supported modern Kubernetes concepts. The plan-apply model is IMO the best. I don't really grok why people don't think Terraform should be used this way as I've seen in other comments here and outside of HN. At the end of the day I fail to see how managing Kubernetes resources with Terraform would be any different than AWS or GCP or whathaveyou resources.

philsnow · on March 13, 2018

> The plan-apply model is IMO the best.

What would be even better: a begin-commit model where you tell AWS (or $PROVIDER) that you're beginning a transactional change, you issue a bunch of API commands, and along the way the provider is modeling what that would cause the deployment to look like, and it's also checking subsequent actions' invariants / constraints against the "running total" of all the changes (the model) instead of the snapshot of how the deployment was before the "begin".

I ended up having to create a "staging environment" at $WORK that's just another AWS account. The deploy flow for this one terraform project is:

1. run 'terraform plan' against production; this shows you syntax errors and _some_ semantic errors.

2. run 'terraform apply' against the staging AWS account; this will shake out more semantic errors, like an ec2 security group having too many rules. this step helps prevent you partially applying whatever change you're working on to production, since the production deploy would have failed when it gets to the ec2 security group change.

3. finally, run 'terraform apply' on production. I've yet to find an error that made it to this step that didn't get caught in one of the previous two steps.

None of this would be necessary if AWS had some way of modeling deployments and checking invariants/conditions against models instead of checking them in situ.

edit: something like:

  aws change create-changeset
  aws ec2 authorize-security-group-ingress --changeset-id c-12345abc ...
  aws ... --changeset-id c-12345abc
  aws change commit-changeset --changeset-id c-12345abc

antoncohen · on March 13, 2018

> 3. finally, run 'terraform apply' on production. I've yet to find an error that made it to this step that didn't get caught in one of the previous two steps.

Oh, I've seen a few...

1. Account limits that are different between accounts, e.g., number of ELB, ASGs, EIPs, etc. allowed.

2. Access that is different between accounts. The oddest one was a bug in Packer that caused an AMI to get built with an "AWS Marketplace" version of Ubuntu. To use an AMI based off a Marketplace AMI you need to "buy" the Marketplace AMI per account (even though it is free). Someone had clicked the button in the staging account to "buy" the AMI, but then it fails to deploy to productions.

3. Random other failures. Like hitting naming length limits because "prod" is longer than "qa". Or like API rate limiting slowing down deploys in one account, causing them to timeout and the TF apply to fail, even thought plan passed and the apply worked in staging. This is can then land you in an odd state where re-running apply will destroy all of a resource, instead of updating it (taking down the whole service). Which then leads you to automate `terraform taint`...

TF just wasn't designed to do app deployments.

philsnow · on March 14, 2018

> Account limits that are different between accounts, e.g., number of ELB, ASGs, EIPs, etc. allowed.

ah yeah, I had forgotten that I had to go through our TAM to get the staging account set up with all the same limits so that it was an accurate reflection.

> "prod" is longer than "qa"

Ugh, painful.

> TF just wasn't designed to do app deployments

agreed, I've gotten more and more interested in cloudformation over TF (for AWS-only deployments) over the last two years of using TF.

antoncohen · on March 13, 2018

Terraform is pretty awful for Continuous Delivery. I inherited a deployment system were new app versions were deployed with Terraform, and I did quite a lot of automation to make it not totally suck from a workflow perspective, not it's still not the right tool.

Devs should be able to merge a PR, and have it safely rollout to production (e.g., canary), possibly with a staging step and manual promotion if desired. They should have a simple ability to rollback to a previous version.

Terraform typically would involve editing TF files to update a version, committing that TF change, opening and merge a PR for the change. The running plan, reading the output of plan, then applying the plan. If you have staging and stable environments, repeat the steps. If you want to rollback, repeat the steps.

And it gets oh so much more complex. TF binary versions need to be globally consistent, because the version is stored in the state file. You basically need to only run TF from the CI/CD system, people can't be allowed to run apply locally because they might use a newer version of TF which would then break CI/CD which has the older version. So then you wrap TF commands with a Makefile that checks TF version, or you pin the TF version within the TF code.

How many resources are you managing? Hundreds? Thousands? How many apps? Are they all deployed using the same state file?

How long does it take to run a plan and apply? 10 minutes? 20 minutes? Same for a rollback...

What if Bob want to deploy version 13 of app 'foo', but when he runs plan it says it will also update app 'bar' to version 27. But Bob doesn't know anything about bar, his team only works on foo.

So you break up the TF files so every app has its own state file. Cool. But now you need to upgrade TF itself. And it is a major upgrade. Now you need to update the TF version pinned in the source files, and update the build servers to have the new TF, and run a TF command to upgrade on every state file.

Then you start adding symlinks into the TF source tree, because TF doesn't have any sane way to reuse code. But there are slight changes that need to be made for each app, and TF doesn't allow variables in some parts of the code, and modules are OK be still too verbose, and you still end up with places you can't use variables. So no you start using templates to write you TF code, so you can reuse code, because 99% is the same for each app.

Ugh.

Terraform is great for defining infrastructure that rarely changes, and is managed by an infrastructure team. When you are trying to use it for app deployments, which happen multiple times a day, and are handles by app devs, it is a royal PITA.

heavenlyhash · on March 13, 2018

I'm not of a dissenting opinion on most of these experience reports, but

> TF binary versions need to be globally consistent,

honestly, shouldn't that kind of be your baseline goal for all tools that are on multiple hosts and taking part of your CI or development processes?

antoncohen · on March 13, 2018

Yeah, sorry if I wasn't clear. I meant that the CI build servers do have a consistent version installed. But it is pretty hard to control what devs install on their laptops, and it is a little hard to write TF code without running `terraform plan` to test it, so a lot of devs may end up with TF installed. If they run `terraform apply` (and have access to write state) it could mess up the state file version.

Of course that is solved by using `terraform { required_version = "= x.x.x" }`, but that plays into my point about having tons of state files, not being able to use variables, editing tons of files, and updating tons of states.

_5meq · on March 13, 2018

Oh man I FORGOT about how bad code sharing is in terraform! Great point!

chrissoundz · on March 13, 2018

Why not just `git push`, `docker build`, `kubectl set image`?

lincolnq · on March 13, 2018

Lots of reasons:

- A local Docker build might have issues due to your local computer's environment that cause mysterious or hard-to-replicate problems.

- You might not have permission to push new images to your organization's Docker repo from your computer.

- You might be on slow internet, or your docker images large enough, such that pushing on the cloud is 10-100x faster than pushing locally. (this is my pain point)

- You need to remember, and use, whatever Docker image tagging scheme your organization requires.

arianvanp · on March 13, 2018

> A local Docker build might have issues due to your local computer's environment that cause mysterious or hard-to-replicate problems.

This always comes and bites us back isn't it? The whole promise of Docker was that we would be freed form these pains... But yet here we stand :(

Wondering what kind of issues you're running into. The first one that comes to mind for me is Kernel version mismatches. Though those shouldn't always cause that much trouble due to the "Dont break userspace mantra" that Linus has

Promarged · on March 13, 2018

> This always comes and bites us back isn't it? The whole promise of Docker was that we would be freed form these pains... But yet here we stand :(

Exactly. Using a separate build host makes builds more independent but on the other hand you'll be running images that the dev did not check.

By the way, are docker image identifiers derived from the actual contents? (Are they reproducible?)

This would suggest they are not:

> Docker images are non-reproducible: each "layer" identifier is a random hex string (and not cryptographic hash of the layer content),

Source: https://blog.bazel.build/2015/07/28/docker_build.html

dlor · on March 13, 2018

That's a bit out of date - today layer IDs are for the most part sha256's of the contents of the layer tar.gz.

However, it's still very difficult to generate the same layer ID twice. Timestamps permeate the layer contents itself in the form of file mtimes. The final image metadata itself also contains several timestamps.

Docker has a local cache of layers that helps simulate reproducibility. But if you clear that cache or use a different build machine, you will have a very hard time ever generating the same layer ID again.

tirumaraiselvan · on March 13, 2018

Because why not just `git push` :)

ecthiender · on March 13, 2018

I'm not the author but I would imagine - this allows more room for automation. You should be able to setup a CI/CD system around it with webhooks from your git repo.

asadm · on March 13, 2018

This looks exactly what I had been looking for these days. None of existing solutions fit my use case. Thanks for making this, I will give it a try soon.

vijaykodam · on March 13, 2018

Good work from Team Hasura! Also liked the HDFS on k8s tutorial posted sometime back.

deepaksahoo · on March 13, 2018

This is awesome! Thanks for open-sourcing :)

vegancap · on March 13, 2018

This is a beauty! Excellent work.

anirudhmurali · on March 13, 2018

Sexy stuff!

dominotw · on March 13, 2018

Hyping ur own stuff on HN?