You aren't a real K8s admin until your self-managed cluster crashes hard and you have to spend 3 days trying to recover/rebuild it. Just dealing with the certs once they start expiring is a nightmare.
To avoid chicken-and-egg, your critical services (Drone, Vault, Bind) need to live outside of K8s in something stupid simple, like an ASG or a hot/cold EC2 pair.
I've mostly come to think of K8s as a development tool. It makes it quick and easy for devs to mock up a software architecture and run it anywhere, compared to trying to adopt a single cloud vendor's SaaS tools, and giving devs all the Cloud access needed to control it. Give them access to a semi-locked-down K8s cluster instead and they can build pretty much whatever they need without asking anyone for anything.
For production, it's kind of crap, but usable. It doesn't have any of the operational intelligence you'd want a resilient production system to have, doesn't have real version control, isn't immutable, and makes it very hard to identify and fix problems. A production alternative to K8s should be much more stripped-down, like Fargate, with more useful operational features, and other aspects handled by external projects.
It's kind of the modus operandi of Kubernetes since inception. The core model is okay, but ops was always a barely constructed afterthought. And the network stack (kube-proxy) was literally a summer of code project.
I'm thinking a lot of that was by design - both Red Hat and Google had incentives to get you onto their value-add to get an actual production-ready system.
It also created an entire cottage industry, although much of this has faded as everyone moved to purely managed solutions. Because anything else is absolutely insane.
In the bad old days of self-managing some servers with a few libvirt VMs and such, I’d have considered a 3-day outage such a shockingly bad outcome that I’d have totally reconsidered what I was doing.
And k8s is supposed to make that situation better, but these multi-day outage stories are… common? Why are we adding all this complexity and cost if the result is consumer-PC-tower-in-a-closet-with-no-IAC uptime (or worse)?
I've been running Kubernetes in production for two years and have never experienced anything remotely close to this. The worst is a node dies every now and then and, on a rare occasion, a workload doesn't happily migrate.
Of course, my experience is in no way authoritative, but referencing this type of incident as common is pretty foreign to me and may be mostly relegated to self-managed clusters.
GKE since 2017 here. Healthcare. I think we had one major outage that involved the cluster itself. It resolved itself and we never discovered what caused it. That was in the early days, so I recall very little.
Now I'm using Fly.io. They both have their advantages. Folks tend to make kubernetes sound way more difficult than it is. It can be overkill but it can also solve so many challenges out of the box. At least when it's managed. It'll cost you though.
> may be mostly relegated to self-managed clusters.
Foreign to me too, but not surprising that people report issues as common. There are a lot of footguns in kubernetes that come from a lack of understanding.
You can build a robust kubernetes cluster that hosts an application that’s nearly impossible to bring offline without an act of god, it just takes some know-how and a tiny bit of effort/experience.
> And k8s is supposed to make that situation better, but these multi-day outage stories are… common? Why are we adding all this complexity and cost if the result is consumer-PC-tower-in-a-closet-with-no-IAC uptime (or worse)?
I'm honestly convinced it's half CV-driven development, and half just the fact that it's become the standard workaround for Python dependency hell. Python is still the easiest way to write software, and it's still basically impossible to make an application that works reliably on more than one machine because of how Python dependency management works (or rather doesn't), so you have to use Docker, and apparently Kubernetes is the standard way you deploy Docker containers.
Anecdata, but in my experience, it's been podman for new deployments. Plenty of old stuff on Docker though. It's easier to grow out of Podman and into k8s than it is to go from compose, to swarm, then k8s. Easier to get buy-in for the ease of Docker from ops, easier to get leadership buy-in on the security of Podman. Such is life.
There are good things about dotnet (I'm more of a Scala person these days, but I have plenty of respect for F#), but there's nothing in there that lets you get up and running remotely as quickly as Python. (I mean, you don't even get a REPL without doing some messing around)
> and it's still basically impossible to make an application that works reliably on more than one machine because of how Python dependency management works (or rather doesn't)
I remember we were running 500 solaris zones and 5000 vmware VMs over 2 datacenters with 0 major outages over 4 years. I remember a (single) VM crashing and it was a really big deal (turned out it was a config issue, in retrospect a funny one although our (internal) client lost some data). And I remember we were in "crisis mode" for a couple weeks because of SAN storage issues but there was no client interruption of more than 1 minute over those 2 weeks. One of our clients was running our app in a cross-datacenter cluster on bare metal with no interruption for over 20 years.
I'm not advocating for any of those specific solutions and given the choice I would probably use something else, but when I see that my previous CTO wanted kube for single-VM deployments, and a former architect colleague wanted kube for apps that were going to be used by 3 to 5 clients maximum (and in both cases to be run by very small and untrained teams), I think the kool aid has been more than drunk, and I'm now avoiding it like the plague.
Complexity and cost aren't bad when they help produce something of value that we wouldn't have otherwise.
For $150 I can fly round trip from New York to San Francisco, on a massively costly and complex giant noisy metal tube with two blades sticking out the sides that are so strong you could put a tank on each one and the blades still wouldn't droop. Why does it have to be so costly and complex, if I could do something simpler, like take a bus? Well, mostly to keep me from dying. But also to carry lots of luggage, keep costs down, and get me there 15x faster.
K8s does provide great value (as a dev tool), but lacks value in production features, and its design is shit. So I wouldn't say complexity and cost are the downside; it's the lacking production value that's the downside.
Personally, I’m a big fan for QA review sites. Deploy multiple low traffic full site clones to a cluster and spin them up and down as needed. Manual review, automated scans, etc. It’s great for that use case IMO.
In production I always want dedicated resources though.
Honestly in this day and age rolling your own k8s cluster is negligent. I've worked at multiple companies using EKS, AKS, GKE, and we haven't had 10% of the issues I see people complaining about.
I've had my fair share of outages on managed k8s solutions. The difference there is once it's hosed, your fate is 100% in the hands of cloud support and well... good luck with that one. The cloud apologists in this thread will ofc try to shame you for not buying into their marketing.
if your fate is in the hands of one of the cloud gods, what right does anyone have to blame you for what transpires?
mere mortals are not privy to all of the internal downstream impacts from that public-facing service outage. it would be like shouting into the void and expecting an answer, and, more, liking it.
no, it is easier to recognize one’s place, pay the tithes, and enjoy one god’s blessings and curses alike. do not stray and attempt to please two, it will only end in misery. (three is right out.)
Once your team has upgrades down, everything is pretty rote. This submission (Urbit, lol) seemed particularly incompetent at managing cert rotation.
The other capital lesson here? Have backups. The team couldn't restore a bunch of their services effectively, because they didn't have the manifests. Sure, a managed provider may have fewer disruptions/avoid some fuckups, but the whole point of Kubernetes is Promise Theory, is Desired State Management. If you can re-state your asks, put the manifests back, most shit should just work again, easy as that. The team had seemingly no operational system so their whole cluster was a vast special pet. They fucked up. Don't do that.
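To make "re-stating your asks" concrete, here is a minimal sketch, assuming the manifests live in a git repo (the repo URL and layout are made up):

```sh
# Hedged sketch: keep every manifest in git so a dead cluster can be re-declared
# instead of rebuilt by hand. Repo URL and layout are hypothetical.
git clone https://example.com/ops/cluster-manifests.git
kubectl apply -R -f cluster-manifests/          # re-state the desired state, recursively

# Belt and braces: periodically dump what is actually running, in case git has drifted.
kubectl get deploy,sts,ds,svc,ing,cm -A -o yaml > running-state-$(date +%F).yaml
```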
> The Root CA certificate, etcd certificate, and API server certificate expired, which caused the cluster to stop working and prevented our management of it. The support to resolve this, at that time, in kube-aws was limited. We brought in an expert, but in the end, we had to rebuild the entire cluster from scratch.
I can't even imagine how I could explain such an outage to any of my customers.
I am somewhat puzzled about how this turns into downtime as implied in the article.
The control plane can be fully down - as in: you can shut it down - and everything continues to run. I've been in that situation multiple times with large clusters. E.g. one etcd node having disk issues, pretty much turning it into a lame service (even worse than truly down). Kubelets got randomly regarded as non-healthy due to update latency. But everything continued to run. Another time API servers were leaking memory and crashing, causing some herding that would crash each server as it came up. No issues whatsoever; migrate to larger instances.
It's a pretty cool feature of kubernetes.
I am wondering what was done to let this cascade like this. The only thing I could imagine is that someone _wiped_ the etcd state, then brought it up, causing all things to go down.
It goes into downtime because the pods churn their containers, nodes come and go, and attempting to "remediate" in ways that cause deployment churn then cause services to go down without them being able to come back up. Same for any internal k8s component that relies on a certificate. It may "stay up" for a bit, but the cluster is still broken, and it gets increasingly more brokener. It's like trying to fix a flat tire on a truck that is dangling over a cliff.
Just in the last couple of years I can recall DataDog being down for most of a day and Roblox having something like a 72-hour outage. If huge public companies survived that, you probably can too. I'd argue that unless real monetary damage was done, it's actually worse for the customer to experience many small-scale outages than a very occasional big outage.
Well, the industry analysts and consultants who develop metrics have decided that multiple outages are the way to go, as it keeps people on their toes more often. And management likes busy people, as they are earning their keep.
If most infra I worked on was a single region one, sure. :) DR is so much easier in the cloud. You can have ECS scale to 0 in the DR site and when us-east-1 goes down just move the traffic there. We did that with amazon.com before AWS even existed. With AWS it became easier. There are still some challenges, like having a replica of the main SQL db if you run a traditional stack for example.
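A rough sketch of that pilot-light pattern, with made-up region/cluster/service names; the DNS flip is only described in a comment because the exact records depend on your setup:

```sh
# Pilot-light DR sketch (names are made up): the DR service normally runs at desired-count 0.
aws ecs update-service --region us-west-2 \
  --cluster dr-cluster --service web --desired-count 6

# Then shift traffic, e.g. by flipping a Route 53 failover/weighted record to the DR
# load balancer; health checks can automate that part.
```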
The challenge with these editorials is that you can never really capture the opportunity cost from the author - they didn't build the monolith without the complex infrastructure as an alternative in parallel.
I wonder how much 'sunk cost' and other psychological factors play into the statement:
> We started with Kubernetes a bit too early,
It feels like an "it wasn't that bad". But if you consider the dedicated resources and the costs of pain and suffering that may have been avoided with a simpler architecture and infrastructure, I'm forced to wonder if this short comment hides a lot under the covers.
Engineers can have very high pain thresholds sometimes.
It also sounds like they have several staff just dedicated to managing their Kubernetes cluster, even though they're using a managed Kubernetes service (AKS) for the last few years.
Wonder if they're including those engineers in the cost calculation?
I also didn't understand what they were building. Is it a simple database-backed website or something truly complicated? How many users do they have? What's the scale of operations? Why wouldn't a handful of dedicated hosts in a datacenter have worked?
I felt like I got on the k8s bandwagon relatively late, due to my natural distrust of new things, in this case containers.
So I started setting up k8s clusters on-prem at work in 2019.
It's 3.5 years later and my takeaway is that k8s should only be used after multiple resource planning meetings have established that it is absolutely necessary, or you're scaling up an existing application.
The mistake we made was, first of all, to take orders from a lead developer who thought that a service mesh was the answer to everything. And secondly, to not estimate the load our system would generate, the resources it would require, and how best to manage those.
In retrospect we now have 3 clusters (dev, staging, prod) for a service that could be hosted on 2 servers, with a job manager in the cloud.
And IaC is no excuse because you can achieve the same IaC with container hosts and quadlets set to auto-update.
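For illustration, a minimal quadlet sketch of that idea (the unit name and image are placeholders); Podman's auto-update timer then keeps the container current:

```sh
# Quadlet sketch: a systemd-managed Podman container that auto-updates from its registry.
# Unit name and image are placeholders.
sudo tee /etc/containers/systemd/web.container >/dev/null <<'EOF'
# web.container -- hypothetical example unit
[Unit]
Description=Example web container

[Container]
Image=docker.io/library/nginx:1.27
PublishPort=8080:80
AutoUpdate=registry

[Install]
WantedBy=multi-user.target
EOF

sudo systemctl daemon-reload
sudo systemctl start web.service                  # quadlet generates web.service from web.container
systemctl list-timers podman-auto-update.timer    # the timer that pulls newer images
```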
Completely agree. Kubernetes makes some problems easier to solve, but it's rare that anyone asks "do we actually need to solve these problems". It's like buying something you don't need because it's on sale.
I've found a happy middle ground to be HashiCorp's Nomad. It's a single static Go binary you can run, so you can define some of your setup in a repeatable way, and it provides things like rolling updates, task monitoring, scheduled jobs, etc. And it's not limited to only running containers, but can run executables directly on the host, VMs with qemu, etc.
If I'm running a single server, I usually get a lot of value out of throwing nomad on it.
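A minimal sketch of what that can look like, assuming the Docker driver is available; the job and image names are illustrative:

```sh
# Single-binary sketch: a dev-mode Nomad agent plus one job file (names are illustrative).
nomad agent -dev &        # use proper server/client configs in production

cat > web.nomad.hcl <<'EOF'
job "web" {
  datacenters = ["dc1"]

  group "web" {
    count = 1

    network {
      port "http" {
        to = 80
      }
    }

    task "nginx" {
      driver = "docker"

      config {
        image = "nginx:1.27"
        ports = ["http"]
      }

      resources {
        cpu    = 100
        memory = 128
      }
    }
  }
}
EOF

nomad job run web.nomad.hcl
```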
Ha, some time this century for sure. To me it's not 'unsophisticated' exactly, but it's definitely a certain sort of person - it's the 'Hi team - just sharing some learnings - please do reach out if you have any questions' sort of corporate speak.
Yeah I love those meetings where we have to double-click on a pain point and unpack what's going on so that we can do the work and stop treading water.
Not to discredit your experience, but I'm a native English speaker and I've never had the perception that it's unsophisticated. I think they can have a very slightly different connotation from one another, but in a lot of usage I think they're interchangeable.
It's corporate-speak. There are all sorts of these things.
Lessons/learnings
Requests/asks
Solutions/solves
Agreement/alignments
It definitely sounds weird if you don't spend a lot of time in that world. It's like they replace the actual noun forms with an oddly cased verb form, i.e., nominalization.
"Aligned" is probably one of my least favorite pieces of corporate lingo to become popular. Makes me imagine a D&D alignment chart and all the black and white thinking that comes with that.
And lessons is academy/state school-speak. Can't stand the word. Take your lessons home Ms Teacher, this is a place of business.
Edit: funny how this is downvoted but the corporate speak comment isn't :) we use the word learnings at home. It means what you have learned yourself, as opposed to getting a lesson about it.
This is why people have adopted corp speak. They're traumatized by words and need to replace them to help with their ptsd. In 10 years "learnings" will be replaced by something like "considerations" or "updates" when zoomer managers get to set the rules.
I’m a native English speaker too, and my immediate reaction when hearing another native speaker say “learnings” is to think they’re an idiot. I know they might be a non-idiot who just happens to talk that way so I try hard not to judge.
Still, the bottom line is that making nouns of verbs for words for which more commonly used nouns already exist makes a poor impression on many speakers.
I’m a native English speaker and don’t use the phrase, but I’ve always thought that a lesson is something taught, but a learning is something learned. The former does not always imply the latter.
Thanks for your feedback on the title of my article. English is not my first language, and in my native tongue, the distinction between “learnings” and “lessons” isn’t as pronounced in this type of context. I appreciate the nuanced perspective and will probably update my title. My main goal is to share the experiences we’ve gathered over the years, and I hope that the essence of our journey with Kubernetes shines through, regardless of the terminology.
As others said, I'm not sure it's quite unsophisticated but you're not too far off. It's a specific jargon that comes from people I might perhaps consider unsophisticated if I'm in a bad mood, but more likely they're just happy to be politically correct in a fairly harmless manner.
I feel like it's used because "lessons" may imply judgment to some people?
I feel like the word unambiguously describes exactly what it is, which is all I can really ask for from a word.
"Lesson" by itself might connote a more concrete transmission of knowledge (like a school lesson). Which is a meaningful distinction if the goal of the article is merely to muse about lessons they've learned rather than imply that this is a lesson from the writers to the audience. "Lesson learned" could imply the same thing, but is longer to say ¯\_(ツ)_/¯
I get what the comments here are saying about it sounding corporate, but I think this is a unique situation where this word actually makes sense.
Those are really American though.. like "co-worker", that isn't a word which was used in England. We'd use "colleague". It came from American English as part of the corporate lingo.
This particular case may not be a good example, but Brits tend to forget that they actually invented some of the words that they blame Americans for. Soccer is a perfect example, though I think the "Lost in the Pond" YouTube channel has a video or three with several more.
I’d say your likelihood of being misread or of giving a poor impression by using the word increases with the distance to the nearest person (other than you) holding an MBA.
It's not that. I'm American and I've never heard anyone use "learnings" instead of "lessons". It has to be coming from a specific subculture, though I have no idea where.
It's very much just bro corporate speak, if I heard someone use "learnings" instead of "lessons" irl they would definitely fall into the slot for a specific type of person in my head. Very LinkedIn.
The DX of k8s is deceptively simple. You might be forgiven for believing that you have got your own private Heroku in your own backyard which you totally own and control.
But - oh boy! - the complexity upon complexity of moving parts that are themselves a moving target.
First, there's PKI: you need to know all about certificates, signing, expiring, issuing, and reissuing, or no part of the cluster talks to the others. If you think you can get away without that, the post above had two outages, both related to certs.
Next is etcd, which you should know how to configure. A totally separate product in its own right: a distributed key-value store that holds the whole memory of the system, like what runs where and such.
Then you have whole DNS running. That again is a whole separate product in its own right whose administration you must master or else.
And then comes networking, the CNI plugin and its internals, and if you think you can skip that part, you either have to pay the likes of Weaveworks (now defunct) or Cilium etc., or be ready for an incident.
And yet I have not talked about ingress controllers, cloud controllers, their configurations and other issues.
To top it all off, you need to manage all that configuration and package it, so now you need Helm and Flux: templated (Helm uses Go templates, among the worst out there) and layered (Kustomized) YAML upon YAML, thousands of lines, and that's not all, hold on!
All that configuration language is a constantly moving target, from Helm charts to k8s manifests. Sometimes it's totally incompatible (Flux to Flux2 was almost not portable), so upgrades are going to be so painful you just can't imagine, even if you're on a managed k8s platform.
I say this from my own experience of setting up k8s self managed from scratch across different clouds. I have my scars.
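One mitigation for the YAML-upon-YAML problem described above is to render everything locally and review it before it touches a cluster. A hedged sketch; the chart path, release name and values file are placeholders, and the last command needs the helm-diff plugin:

```sh
# Render the chart to plain YAML, validate it against the API server, and diff it
# against what is currently deployed, all before anything changes in the cluster.
helm template my-release ./charts/my-app -f values/prod.yaml > rendered.yaml
kubectl apply --dry-run=server -f rendered.yaml
helm diff upgrade my-release ./charts/my-app -f values/prod.yaml
```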
Certs expiring is a common occurrence and a source of many RCAs. Not keeping definitions of your configuration separate from running servers (and no baseline) is a big issue. Not keeping secrets in a secret storage and syncing them is another red flag.
Thing is, none of these are Kubernetes issues. They're poor practices; these aren't lessons of running Kubernetes, they're poor management of a system.
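For the cert-expiry class of RCA specifically, a small sketch of proactive checking on a kubeadm-style control plane (kube-aws/kops layouts differ, but the idea carries over; the path shown is the typical kubeadm default):

```sh
# List every control-plane certificate and its expiry date, and renew before they lapse.
sudo kubeadm certs check-expiration
sudo kubeadm certs renew all        # then restart the control-plane static pods

# Or inspect any certificate file directly (typical kubeadm path shown):
sudo openssl x509 -enddate -noout -in /etc/kubernetes/pki/apiserver.crt
```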
Very similar story for my team, incl. the 2x cert expiry cluster disasters early on requiring a rebuild. We migrated from Kubespray to kOps (with almost no deviations from a default install) and it's been quite smooth for 4 or 5 years now.
I traded ELK for Clickhouse & we use Fluentbit to relay logs, mostly created by our homegrown opentelemetry-like lib. We still use Helm, Quay & Drone.
Software architecture is mostly stateless replicas of ~12x mini services with a primary monolith. DBs etc sit off cluster. Full cluster rebuild and switchover takes about 60min-90min, we do it about 1-2x a year and have 3 developers in a team of 5 that can do it (thanks to good documentation, automation and keeping our use simple).
We have a single cloud dev environment, local dev is just running the parts of the system you need to affect.
Some tradeoffs and yes burned time to get there, but it's great.
My question when looking at Kubernetes for small teams is always the same. Why?
In the blog, there are multiple days of downtime, a complete cluster rebuild, a description of how individual experts have to be crowned as the technology is too complex to jump in and out of in any real production environment, handling versioning of helm and k8s, a description of managing the underlying scripts to rebuild for disaster (I'm assuming there's a data persistence/backup step here that goes unmentioned!), and on, and on and on.
When you're already using cloud primitives, why not use your existing expertise there, their serverless offerings, and learn the IaC tooling of choice for that provider?
Yes, it will be more expensive on your cloud bill. But when you measure the TCO, is it really?
My experience with Kubernetes has been mostly bad. I always see an explosion of complexity and there is something that needs fixing all the time. The knowledge required comes on top of the existing stack.
Maybe I'm biased and just have the wrong kind of projects, but so far everything I encountered could be built with a simple tech stack on virtual or native hardware. A reverse proxy/webserver, some frontend library/framework, a backend, database, maybe some queues/logs/caching solutions on any server Linux distribution. Maintenance is minimal, dirt cheap, no vendor lock-in and easy to teach. Is everyone building the next Amazon/Netflix/Google and needing to scale to infinity? I feel there is such a huge amount of software and companies that will never require or benefit from Kubernetes.
Company CTOs, in my experience, are easily sold on the idea of infinite scalability. In practice not many companies reach that point, but many that go down this road end up building on top of dozens of layers of compute/networking abstractions that only a few experts on the team, if any, can manage competently.
I think the cost of self-managed Linux VMs and monoliths is smaller than the cloud vendors made it seem.
Containers are nice when you have to deal with a language like Python and its packaging ecosystem, but when Go/Rust/.Net/etc binaries are placed in containers as well... I think sight of what we're trying to solve in real life has been kind of lost.
Monoliths are so much easier for smaller teams. No additional tooling needed, no service discovery, instead of networks calls you have function calls, can share resources, etc. Much less overhead as well, so you may not even need to scale. The amount of requests a single Go/Rust server can handle on a dedicated machine is insanely high with modern hardware.
Same exact question I ask every single time. We just decided against k8s, again, in 2024. We are going to go with AWS ECS and Azure Container Apps (the infra has to exist in both clouds).
ECS and Container Apps provide all the benefits of k8s without the cons. What we want is to be able to run container (Docker) images with autoscaling and control which groups of instances can talk to each other (a rough sketch of that follows the list below). What we do not want to do:
- learn all of the error modes of k8s
- learn all the network modes of k8s
- learn the tooling of k8s (and the pitfalls)
- learn how to embed yaml into yaml the right way (I have seen some of the tools are doing this)
- do upgrades of k8s and figuring out what has changed the way that is backward incompatible
- learn how to manage certificates for k8s the right way
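As referenced above, here is a rough sketch of the "just run containers with autoscaling" shape on ECS. Every name here (cluster, service, files) is made up, and the task-definition/network JSON files are assumed to exist already:

```sh
# Register the container, run it as a Fargate service, and attach autoscaling
# to the service's desired count. Task/network JSON files are assumed to exist.
aws ecs register-task-definition --cli-input-json file://web-task.json
aws ecs create-service --cluster prod --service-name web \
  --task-definition web --desired-count 2 --launch-type FARGATE \
  --network-configuration file://network.json

aws application-autoscaling register-scalable-target \
  --service-namespace ecs --scalable-dimension ecs:service:DesiredCount \
  --resource-id service/prod/web --min-capacity 2 --max-capacity 10
```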
Do people try to push it that strongly for small teams? Lots of us work on bigger teams and enjoy more of the benefits.
However, I also still use Kubernetes for my personal projects, because I really appreciate the level of abstraction it supplies. Everyone always points out that you can do all the things k8s does in other ways, but what I like is that it defines a common way to do everything. I don't care that there are 50 ways to do it, I just like having one way.
What this allows is for tools to seamlessly work together. It is trivial to have all sorts of cool functionality with minimal configuration.
> because I really appreciate the level of abstraction it supplies
which are?
I am seriously asking. I use docker-compose for some of the things I do, but it never occurred to me during my 20 years in systems engineering that k8s offers any kind of great abstraction. For small systems it is easy to use docker (for example running a database for testing). For larger projects there are so many alternatives to k8s that are better, including the major cloud vendor offerings, that I have a really hard time justifying even considering k8s. That's after years of the carnage it left behind, seeing failure after failure, even customers reaching out to me in panic because there are timeouts or other issues that nobody can resolve, after someone sold them the idea that k8s has a "great level of abstraction" and put it into production.
> I don't care that there are 50 ways to do it, I just like having one way.
>> because I really appreciate the level of abstraction it supplies
> which are?
When I am creating a new service/application, I just need to define in my resource what I need... listening ports, persistent storage, CPU, memory, ingress, etc... then I am free to change how those are provided without having to change the app. If a new, better, storage provider comes along, I can switch it out without changing anything on my app.
At my work, we have on premise clusters as well as cloud clusters, and I can move my workloads between them seamlessly. In the cloud, we use EBS backed volumes, but my app doesn't need to care. On the on-prem clusters, we use longhorn, but again my app doesn't care. In AWS, we use the ELB as our ingress, but my app doesn't care... on prem, I use metallb, but my app doesn't care.
I just specify that I need a cert and a URL, and each cluster is set up to update DNS and get me a cert. I don't have to worry about DNS or certs expiring. When I deploy my app to a different cluster, that all gets updated automatically.
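A hedged sketch of what "I just ask for a URL and a cert" can look like, assuming cert-manager and something like external-dns are installed in the cluster; the host, issuer and service names are placeholders:

```sh
# One Ingress manifest: the cluster-side controllers handle the certificate and DNS record.
kubectl apply -f - <<'EOF'
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: my-app
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod   # assumed ClusterIssuer name
spec:
  tls:
    - hosts:
        - my-app.example.com
      secretName: my-app-tls
  rules:
    - host: my-app.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: my-app
                port:
                  number: 80
EOF
```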
I also get monitoring for free. Prometheus knows how to discover my services and gather metrics, no matter where I deploy. For log processing, when a new tool comes out, I can plug it in with a few lines of configuration.
The kubernetes resource model provides a standard way to define my stuff. Other services know how to read that resource model and interact with it. If I need something different, I can create my own CRD and controller.
I am able to run a database using a cluster controller with my on prem cluster without having to manage individual nodes. Anyone who has run a database cluster manually knows hardware maintenance or failure is a whole thing... with controllers and k8s nodes, I just need to use node drain and my controller will know how to move the cluster members to different nodes. I can update and upgrade the hardware without having to do anything special. Hardware patching is way easier.
The k8s model forces you to specify how your service should handle node failure, and nodes coming in or out are built into the model from the beginning. It forces you to think about horizontal scaling, failover, and maintenance from the beginning, and gives a standard way for it to work. When you do a node drain, every single app deployed to the cluster knows what to do, and the maintainer doesn't have to think about it.
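And the node-drain flow described here, with a hypothetical node name:

```sh
# Evict workloads ahead of maintenance; every well-behaved app reschedules elsewhere.
kubectl drain node-07 --ignore-daemonsets --delete-emptydir-data
# ...patch, reboot or replace the hardware...
kubectl uncordon node-07
```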
>> I don't care that there are 50 ways to do it, I just like having one way.
> Seeing everything as a nail...
I don't think that is a fair comparison, because you can create CRDs if your model doesn't fit any existing resource. However, even when you create a CRD, it is still a standard resource that hooks into all of the k8s lifecycle management, and you become part of that ecosystem.
These exist without k8s. I do not need a complex abstraction hiding the ways I need to talk to persistent storage. In fact, I believe it is impossible to create such an abstraction without very serious compromises.
> In AWS, we use the ELB as our ingress, but my app doesn't care
Your app does not care without k8s. Running python -m http.server does not even know what ELB is. I get it though. You are using k8s as IaC.
That was exactly my earlier point... of course you can do everything in k8s in other ways, but in the end you have to pick ONE way your company/team is going to do it... why not pick a well defined way, that new hires can already know, that has a ton of tooling available, and works together cohesively?
Yes, I can build each part myself, but why?
> Your app does not care without k8s. Running python -m http.server does not even know what ELB is. I get it though. You are using k8s as IaC.
Sure, but I still need a way to deploy my app, and to move it to a different location when I do hardware maintenance, and a way to get a DNS address that routes to my app.
At my shop, using k8s, I can deploy a brand new service, with a cert, a URL, and a place to run it, in a few minutes. I don't have to talk to anyone, I don't have to use any other tools or have to click on any buttons, I just helm install or kubectl apply and my service is running. I don't have to ask the datacenter ops people to find me a server, or get budget for a new AWS instance. I can deploy to an existing cluster and use a small bit of the infrastructure. I don't have to scale my individual service, I can scale the whole cluster for all services.
It is just so much easier to be a developer in this world.
> I do not need a complex abstraction hiding the ways I need to talk to persistent storage. In fact, I believe it is impossible to create such an abstraction without very serious compromises.
That's a pretty interesting take, considering EBS is itself a block-device abstraction over network-attached storage, and a pretty complex one at that, with a huge price premium.
Yes. Whenever I look at a company with less than 20 people with EKS in their stack, I don't go any further. It is such a colossal waste of velocity for a small business or early startup.
As someone who is very pro cloud -- one of my worst experiences working at a cloud provider was a push from on high to sell our customers on a 'cloud modernization initative' that centered on managed kubernetes. At the time, most of my customers were struggling with creating a stateless app, much less horizontal scaling and managing an enterprise-grade compute abstraction layer.
I think K8S is a great tool with a dedicated team and a platform built around it to meet the way that your company ships infrastructure. But what I've just mentioned only makes sense fiscally in the high X00's count or more of engineers.
> Do people try to push it that strongly for small teams?
Yes. You have to understand that a lot of people without the benefit of experience will often base their technology choices on blog posts. K8S has a lot of mindshare and blog attention, so it gets seen as the only way to run a container in a production environment, while all the important aspects of it are ignored.
I get that, but I just get frustrated in the same way I get frustrated with all the "you don't need it" responses to any topic... what about all of us that DO work for bigger companies and DO need to use this stuff? Where can we gather to talk about it without being constantly told we don't need the features?
They don’t read those blogs. And if they do, the decision makers have enough experience to know that “your dog blog doesn’t need k8s” doesn’t apply to their 100,000 MAU app.
I am literally one of the decision makers at a larger company, with more than 10000 servers in hundreds of data centers around the world.
Yes, I am experienced and smart enough to know the statements that don't apply to me. My frustration is that I want to discuss the best tools and techniques the industry is exploring, but every time I start to have those conversations, someone comments that I don't need it.
You're in the wrong spaces. I don't know where you should be to have those conversations, but I imagine it involves (social / interpersonal) networking. You need to be talking to people in the same role or at the same level as you.
Places like hacker news, or reddit, or twitter, are all full of random people, many of whom are just beginning their journey. Recommending multi node orchestration when they'd struggle to get nginx running on its own would be inappropriate. They don't need k8s. There's a significant danger of cargo culting here.
I'm in the ML space, and at every small company I try to avoid EKS. Then I hate my life. SageMaker, for example, is a giant abstracted-away mess with random holes (e.g. these types of jobs don't work on this GPU type, etc.) compared to just running things on EKS. The same goes for trying to deploy a more complex third-party application: I could just deploy their Helm chart, or I could spend a lot of time deploying it somehow in our environment.
I see it all the time at different layers of the stack. At some point some knowledge is lost due to staff turnover and the solution is to change the technology, instead of paying somebody full time to re-understand it. Why not rewrite XXX part in YYY language as nobody understands XXX anymore? Linux VMs require a good sysadmin with a taste for digging into existing scripts and playbooks. With Kubernetes we can start from scratch and we only need a Kubernetes expert! (or so they say).
Right now I'm working a lot with an oversized maven configuration that nobody understands ; I'm paid only to dig into it and maybe refactor some parts. It's made way too complicated for the task and does a lot of non-standard stuff to work around problems it created itself. But when I arrived people were blaming jenkins and wanted to move to gitlab because jenkins was becoming too complicated to work around maven (also !). Next thing you know somebody could try kubernetes or moving to the cloud or switching from RedHat to NixOS or whatever, and the problem would still be maven.
I work for a US subsidiary of a very large oil company. We are migrating from Azure to AWS for many things (it is deemed "OneCloud"). A very large number of our new EC2 instances, and even our EKS instances, were provisioned within the last 6 months as T2 instances. Some, if we were lucky, were T3. T2 was released 10 years ago. Copy + paste indeed.
I would think it's more dependent on technology requirements more than the size of the team. If all you need is some variation of LAMP stack, then you'd probably be better off with a paas like render, fly or the like.
I think size of team matters as the impact of k8s ownership as a fraction of your development velocity changes immensely as you're able to afford a platform team who can build tooling to remove the cognitive load of deploying to and managing k8s. At an ~400 engineer company I worked at, k8s bugs that actually impacted our team were in the single digits over a year, but a large part of that was the platform team that managed the ecosystem around k8s deployments.
Especially considering that the author seems to be using some Azure specific features anyway:
> While being vendor-agnostic is a great idea, for us, it came with a high opportunity cost. After a while, we decided to go all-in on AKS-related Azure products, like the container registry, security scanning, auth, etc. For us, this resulted in an improved developer experience, simplified security (centralized access management with Azure Entra ID), and more, which led to faster time-to-market and reduced costs (volume benefits).
We're starting to use k8s as a small team because the simpler offerings with GPUs available don't meet our needs. It's clear they're either built for someone else or are less reliable than an EKS cluster would be.
I'd encourage you to look at the problem space and evaluate if ECS or an external abstraction layer (like Ray) meets your needs.
I've seen both work in completely separate domains (e.g. inference on real time video streams vs. model building) -- but obviously ymmv, tech is a big domain and pretending I understand exactly what you're doing would be silly. Sometimes there is a real answer to the why!
Ah, well today I learned about ECS. I guess we’ll migrate to that once I need to add complexity to our EKS setup.
I’m new to this stuff, so it’s hard to dig through all of the possible different solutions.
I looked into Ray a bit but it seemed a little too complicated vs. just running a CUDA accelerated docker container. Most of the streamlined solutions in this space are not made for full stack web developers deploying a service that happens to need a GPU. They’re for ML devs who are trying to own the production side of their part of the product.
What I really miss in articles like this - and I understand why, to some degree - is what the actual numbers would be.
Admitting that you need at least two full-time engineers working on Kubernetes, I wonder how that kind of investment pays for itself, especially because of all the added complexity.
I desperately would like to rebuild their environment on regular VMs, maybe not even containerized and understand what the infrastructure cost would have been. And what the maintenance burden would have been as compared to kubernetes.
Maybe it’s not about pure infrastructure cost but about the development-to-production pipeline. But still.
There is just so much context that seems relevant to understanding whether an investment in Kubernetes is warranted or not.
Having worked in two separate teams/companies with both regular VMs (including AWS EC2) and Kubernetes, I think it comes out about the same in terms of the number of people needed.
I would even say that there is less work with Kubernetes, but maybe that's my preference. I don't even think that you need two full-time engineers working on it constantly; or rather, if you're working on it constantly you have bigger problems, and not with k8s. Sure, you need people to own it, but the work is periodic (mostly cluster upgrades, which are too frequent IMO), and in non-toxic companies there is always good work to be done.
Sorry for being unclear on this. In our case, we needed a couple of engineers who, in addition to their regular duties, would devote their time to Kubernetes as the go-to experts whenever necessary. Some weeks there was nothing to do; other weeks, particularly during cluster updates, they needed to focus exclusively on that work.
Thanks for the clarification. On one hand I'm happy that you shared this experience. On the other hand, I still feel we can't assess the true value, as it would require disclosing important information that you'd likely not be allowed to share.
Let me know what you're interested in learning more about, and I'll see what I can share. If you're looking to dive deeper into specific details, we could discuss them further in a video call or similar.
I appreciate the offer to me personally but I was thinking about another public blogpost that goes into great specifics such as requests per second, cpu load, user base, but it's so app-specific that even then it's difficult to assess.
What I do notice - and I'm not saying this is true for you (I don't know) - is that people get excited about technology like kubernetes or something equivalent, but it creates an additional burden that is totally not proportional to the benefits of said technology.
Stupid load-balancing proxy servers with a bunch of hosts behind them are extremely uninteresting and boring. But they're dead simple, so reliable, and easy to scale horizontally or vertically as well.
And most of all: how much do you save by dynamically upscaling and downscaling as opposed to just keeping a static environment.
If you want to share that kind of perspective, I'd rather see a public post about it that others may benefit from.
k8s is simply a set of bulletproof ideas for running production-grade services, forcing "hope is not a strategy" as much as possible; it standardises things like rollouts, rolling restarts, canary deployments, failover, etc. You can replicate it with a zoo of loosely coupled products, but a monolith you can hire for, with an impeccable production record and industry certs, will always be preferable to orgs. It's Google's way of fighting cloud vendor lock-in when they saw they were losing market share to AWS. Only large companies really need it; a small 5-person startup will do just fine on a Digital Ocean VPS with some S3 for blob storage and a CDN cache.
If anyone has any tips on keeping up with control plane upgrades, please share them. We're having trouble keeping up with EKS upgrades. But, I think it's self-inflicted and we've got a lot of work to remove the knives that keep us from moving faster.
Things on my team's todo list (aka: correct the sins that occurred before therealfiona was hired):
- Change manifest files over to Helm. (Managing thousands of lines of yaml sucks, don't do it, use Helm or similar that we have not discovered yet.)
- Setup Renovate to help keep Helm chart versions up to date.
- Continue improving our process because there was none as of 2 years ago.
One technique is to never upgrade clusters. Instead, create a new cluster, apply manifests, then point your DNS or load balancers to the new one.
That technique won't work with every kind of architecture, but it works with those that are designed with the "immutable infrastructure" approach in mind.
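A rough sketch of that blue/green cluster approach on EKS; the cluster name, version, region, context and manifest path are illustrative, and the traffic move is left as a comment because it depends on your DNS/load-balancer setup:

```sh
# Stand up the replacement cluster, re-apply the declared state, then move traffic.
eksctl create cluster --name green --version 1.30 --region eu-west-1 --nodes 3
kubectl --context green apply -R -f manifests/     # or flux bootstrap / helmfile sync
# Smoke-test "green", repoint DNS or the load balancer at its ingress,
# and keep "blue" around until you're confident enough to delete it.
```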
There's a good comment in this thread about not having your essential services like vault inside of kubernetes.
Any state that a container uses, such as databases or static assets, should be mapped to something outside k8s, no? I thought container orchestration was only for the app layer.
I never understood gitops. You introduce a whole new class of problem: syncing desired state from one place to another.
Kubernetes is a perfectly good place to keep your desired state. I think it would be in most people’s best interests to learn to maintain, failover, and recover Kubernetes clusters, so they can trust the API, rather than trying to use tools to orchestrate their orchestration
How do you deploy workloads to your clusters then? `kubectl apply -f`? Another form of CI/CD?
Assuming you have some sort of build pipeline that also pushes to your cluster, Flux does the same thing whilst ensuring whatever was pushed never diverges.
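For reference, a minimal sketch of that Flux flow, with placeholder owner/repo/path values:

```sh
# The cluster pulls its desired state from git and re-applies it whenever it drifts.
flux bootstrap github --owner=my-org --repository=fleet \
  --branch=main --path=clusters/prod
flux get kustomizations                                   # what is synced, and from which revision
flux reconcile kustomization flux-system --with-source    # force a sync right now
```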
We either install a new version of a helm chart, or we roll back. We have rollback jobs to roll back, and our CI/CD pipelines or our maintenance jobs do the install of the new version, depending on whether it's our app or a dependency.
It's not the EKS upgrade part that's a pain, it's the deprecated K8S resources that you mention. Layers of terraform, fluxcd, helm charts getting sifted through and upgraded before the EKS upgrade. You get all your clusters safely upgraded, and in the blink of an eye you have to do it all over again.
We address this by not using helm, and not using terraform for anything in the cluster. Kustomize doesn't do everything you'd want from a DRY perspective, but at least the output is pure YAML with no surprises.
We upgrade everything once a quarter. Usually takes about four hours per cluster. Occasionally we run into something that's deprecated and we lose another day, but not more than once a year.
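A small sketch of the kustomize-only approach described above; the paths, image name and tag are made up:

```sh
# A prod overlay on top of a shared base; the output is plain YAML you can read and diff.
cat > overlays/prod/kustomization.yaml <<'EOF'
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../../base
images:
  - name: example/api        # hypothetical image
    newTag: v1.42.0
patches:
  - path: replica-count.yaml
EOF

kubectl kustomize overlays/prod | less    # render and review
kubectl apply -k overlays/prod
```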
Such a pity that helm makes this so awful. I suppose one could keep using it to package up complex deployments and tweak them with a values.yaml, as long as you just use that to write out kustomize output and install that.
Go get a cluster manager like Rafay or Spectro Cloud. There are a lot of footguns in cluster management: cert management, ingress controllers, IaC (TF versioning is a pain), etc.
A cluster manager isn't cheap, but it sounds like you are getting buried. If you're on 1.23 or up, though, you at least have a year now to fix it.
In the 2000s we were saying that `snowflake` servers are bad.
New generation is re-learning the same with k8s, which can be summarized as 'snowflake k8s clusters are bad'.
Fundamentally it's the same problem.
Even if the control plane is hard down, your kubelets won't be evicting anything, so unless you need to surge, or everything crashes, you have some time to fix things. I've dealt with multi-hour control plane outages and, while stressful, they didn't have any customer-visible impact whatsoever.
I'm curious: what do you do for developer environments? Do you have a need to spin up a partial subgraph of microservices, and have them talk to each other while developing against a slice of the full stack?
> Do you have a need to spin up a partial subgraph of microservices, and have them talk to each other while developing against a slice of the full stack?
Yeah we do. We use k3s, and use kustomize to scale down or disable services from prod. minio replaces S3, postgres runs nicely in a container, environment variables for service discovery. Add in a liberal sprinkling of NodePort services to allow connections from web browsers or whatever.
Been using it for about three years now. Works great. Upgrading is easy, rebuilding is easy.
We even use it to run our one-box testing environment on an EC2 instance, rather than spinning up an actual EKS cluster. Also works great.
I love that we use the exact same tools to manage all environments -- yaml+kustomize for manifests, kubectl+k9s for ops -- so we get really used to using them.
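Roughly what standing up such an environment can look like, under the assumption of a kustomize overlay layout like the one described (paths are illustrative):

```sh
# Single-binary cluster on a laptop or a one-box EC2 instance, then a scaled-down overlay.
curl -sfL https://get.k3s.io | sh -
kubectl apply -k overlays/dev
kubectl get svc -A | grep NodePort    # the services exposed to browsers on the dev box
```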
Can’t speak for everyone but I have worked in this environment. It can work fine if you allocate a sub-slice of CPU time (.1 CPU for example) and small amounts of (overcommitted) memory, and explicitly avoid using it for things that are more easily managed by cloud provider sub-accounts and managed services, i.e. don’t force your devs to manage ownCloud or a similar stand-in for S3 - use something first-party to stand in, or S3 itself.
This doesn’t always work and the failure mode of committing to this can be doubling your hosting bill if it won’t run locally and densely packed small instances can’t handle your app.
I would recommend Tilt + kind clusters (via https://github.com/tilt-dev/ctlptl) - minimum headache setup by a large margin and runs well on linux and macs
I love tilt, but it feels like I'm doing something wrong when I write a whole new tilt deployment yaml.
I _think_ this space just hasn't matured enough to have settled on standard solutions. Between dev, CI, and staging/prod, it feels like you're defining variants of the same objects and topologies over and over. Some things try to take on dev+CI or CI+deploy (dev+deploy anyone?), but I haven't seen an answer for the whole thing that "feels right".
That’s a YAML problem, not Tilt. There have been attempts to throw more templated YAML at the problem (kustomize, etc.); I myself prefer a Starlark-based DSL like this one: https://github.com/cruise-automation/isopod
We've got pretty much a significantly scaled-down mirror of production as a developer environment. The k8s development cluster looks the same, has the same workloads, tools, services etc. The AWS infra is the same but also scaled down. It's easy to have your service running in both environments and rapidly iterate on that. Everyone is actually pretty happy with that and we've never been let down by Kubernetes itself in at least 4 years now.
It's worked great for us. Every developer runs a dev cluster on their own machines. Services like s3 are transparently replaced with mock versions. We have two builds that can be run, which really just determines which set of helm charts to deploy: the full stack or a lightweight one with just the bare necessities.
Nah, nothing that complex. The only AWS service we use that isn’t just their clone of an existing tool is S3, which we just use fakes3 for. Everything else is easily deployable in a cluster already, because it’s mostly standard systems like mysql, elasticsearch, redis, etc, that AWS has versions of, but that don’t require any special treatment. The ingresses, etc are obviously different, because it doesn’t rely on the aws loadbalancers in dev, but that’s all abstracted away and automatically handled anyways.
> The Root CA certificate, etcd certificate, and API server certificate expired, which caused the cluster to stop working and prevented our management of it.
I've run into this and learned my lesson/gained my battle scars, but it just seems like unnecessary pain. Would it have been so bad for k8s to use something simple for securing communications other than the full TLS stack, right from the beginning?
It's so cumbersome and so many people run into this footgun that also happens to be proper security practice.
A symmetric key setup is simple, and if it was available as a fallback all this pain could be avoided. It's not as secure, and you have to be careful with nonces and things, but I'll take some somewhat distant insecurity (if someone is already inside your network and reading your secrets you have other problems) for the better ergonomics and lower likelihood of blowing off my own foot.
Rolling your own key management system is not to be taken lightly. I've done it, and you really, really only want to do it when you really know other systems won't work.
Yeah but this isn't rolling your own key management system. This is the stupid simple every machine/program has the same shared secret approach.
The difficulty is securing comms between components (assuming they can reach each other, just making sure that the payloads are secret) and making sure you don't leak secrets unintentionally (forgetting nonces) and all the other hard crypto things.
But, it's not impossible to make a reasonable to use fallback system that does this, just no one does because of fear of being mocked for not just accepting the pain and bad ergonomics of TLS.
Other systems do work, but they have the footguns mentioned in the article that everyone seems to hit.
I’m in the very unusual situation of being tasked to set up a self-sufficient, local development team for a significant national enterprise in a developing country. We don’t have AWS, Google or any other cloud service here, so getting something running locally, that they can deploy code to, is part of my job. I also want to ensure that my local team is learning about modern engineering environments. And there is a large mix of unrelated applications to build, so a monolith of some sort is out of the question; there will be a mix of applications and languages and different reliability requirements.
In a nutshell, I’m looking for a general way to provide compute and storage to future, modern, applications and engineers, while at the same time training them to manage this themselves. It’s a medium-long term thing. The scale is already there - one of our goals is to replace an application with millions of existing users.
Importantly, the company wants us to be self sufficient. So a RedHat contract to manage an OpenShift cluster won’t fly (although maybe openshift itself will?)
For the specific goals that we have, the broad features of Kubernetes fit the bill - in terms of our ability to launch a set of containers or features into a cluster, run CICD, run tests, provide storage, host long- and short lived applications, etc. But I’m worried about the complexity and durability of such a complex system in our environment - in the medium term, they need to be able to do this without me, that’s the whole point. This article hasn’t helped me feel better about k8s!
I personally avoided using k8s until the managed flavours came about, and I’m really concerned about the complexity of deploying this, but I think some kind of cluster management system is critical; I don’t want us to go back to manually installing software on individual machines (using either packaging or just plain docker). I want there to be a bunch of resources that we can consume or grow as we become more proficient.
I’ve previously used Nomad in production, which was much simpler than K8s, and I was wondering if this or something else might be a better choice? How hard is k8s to set up today? What is the risk of the kind of failures these guys hit, today?
Are there any other environments where I can manage a set of applications on a cluster of say 10 compute VMs? Any other suggestions?
Without knowing a lot about their systems, I suspect something like Oxide might be the best bet for us - but I doubt we have the budget for a machine like that. But any other thoughts or ideas would be welcome.
> I doubt we have the budget for a machine like that.
Before even thinking about budget,
> for a significant national enterprise in a developing country.
I suspect we just aren't ready to sell in your country, whatever it is, for very normal "gotta get the product certified through safety regulations" kinds of reasons. We will get there eventually.
buuuuut also,
> Are there any other environments where I can manage a set of applications on a cluster of say 10 compute VMs? Any other suggestions?
Oxide would give you those VMs, but if you want orchestration with them, you'd be running kubes or whatever else, yourself, on top of it. So I don't think our current product would give you exactly what you want anyway, or at least, you'd be in the same spot you are now with regards to the orchestration layer.
Hey Steve, thank you for this comment, I did wonder if Oxide systems did container orchestration; now I know :)
Totally get it re certification etc. There is probably some kind of bilateral standards arrangement with one of the neighbouring countries but I agree with your general thrust - we’re a long way from the point where that’s actually a consideration.
Have you checked out Proxmox? 16 large servers in a cluster config could possibly be powerful enough for your needs (Proxmox lets you cluster 16 servers in each cluster); if you need more, split up each part of the services into 16-server chunks.
I've really been liking what I've been seeing with the [Ubuntu micro cloud](https://canonical.com/microcloud) product. It's basically a well coordinated effort of a deployment on lxc/lxd cluster using a (micro) ceph and ovn implementations. I think I like it because it attacks the problem at a level I can understand. (Proxmox does lxc/lxd etc also and has already been mentioned).
Again this is really more of a VM setup, so you need to orchestrate on top (so perhaps your Nomad cluster can sit on this, assuming there is some customer/org infrastructure splitting).
I personally would love to setup a new infra with Oxide systems. But for 10 VMs anything will work. If you've looked into Oxide maybe you will like SmartOS to bootstrap a small infra. Otherwise Nomad, proxmox, LXC, even VMWare is fine. If I think I need something more "serious" (understand supported/scalable with some robust cloud-like api) I would look into cloudstack from apache, which seems a lot cleaner than openstack [Removing the part about VMWare as I saw the comment about self-sufficiency]
Thanks. I expect we will eventually have a rack of servers - currently they operate their applications from a couple of full and very busy blade chassis - but I’m aiming low for the time being since we’re just getting started.
I had a pretty bad run with docker swarm a few years ago - the network stack was flakey as hell, we had to manually restart nodes quite often, and it resulted in customer downtime on multiple occasions. So I probably wouldn’t go there, even though I liked it in theory.
Well, the Amazon CEO himself said there is no shortcut to experience. I am sure gaining experience in developing infrastructure solutions will give you a respectable return in the long term. Of course, cloud vendors will be happy to sell turnkey solutions to you though.
Yeah - I do agree, over the long term, that deploying this internally would be the best outcome for them, and would give them some great skills. But this article did put the Fear in me a little.
One of the main drivers of the project is to reduce our reliance on software vendors and move to open source solutions, so a hybrid/on-prem cloud vendor is probably not on the cards for us either.
> Also, did we even face the problems Kubernetes solves at that stage? One might argue that we could have initially gone with a sizable monolith and relied on that until scaling and other issues became painful, and then we made a move to Kubernetes (or something else).
> During our self-managed time on AWS, we experienced a massive cluster crash that resulted in the majority of our systems and products going down. The Root CA certificate, etcd certificate, and API server certificate expired, which caused the cluster to stop working and prevented our management of it. The support to resolve this, at that time, in kube-aws was limited. We brought in an expert, but in the end, we had to rebuild the entire cluster from scratch.
That's crazy, I've personally recovered 1.11-ish kops clusters from this exact fault and it's not that hard when you really understand how it works. Sounds like a case of bad "expert" advice.
- 25% of the first 4GB of memory
- 20% of the next 4GB of memory (up to 8GB)
- 10% of the next 8GB of memory (up to 16GB)
- 6% of the next 112GB of memory (up to 128GB)
- 2% of any memory above 128GB
"AKS reserves an additional 2GB for system process in Windows nodes that are not part of the calculated memory."
Yes, I am an old boring fart that isn't cloud native. And this post is going to offend a lot of people. I am sorry for that, but I still believe this is an important point to make:
From my perspective, the root issue is that people are using the wrong tools for their projects, namely the wrong programming language.
The problem started when people began abusing scripting languages for something else than scripting.
Python was meant as a teaching language for kids. JavaScript was meant for some gimmicks on websites. .NET was meant for UI applications. PHP stood for "Personal Home Page Tools".
But somehow people started using these tools for something completely different than what they were meant for.
Due to people abusing those languages to write server backends, it suddenly became a problem that those languages produce code that's 1,000 to 75,000 times slower than native code.
That then brought the need for clusters, load-balancers, etc. And that the need for tools to manage those. Then they needed tools to manage those tools. And tools to manage the people who manage those tools. And every year I look at the industry, another layer of complexity is needed.
So, the author writes: "Our platform back then was 50% .Net and 50% Python" - and here are the ACTUAL learnings he should have taken away:
"Eight years ago, after our 'developers' had drafted our products in scripting languages, we hired a senior C/C++/Pascal/Rust programmer. That developer re-wrote our drafts in a clean way. He used a profiler and checked for memory leaks, and did some optimizations. Afterwards we bought three servers in two different data centers with two upstream providers each. Since then we had zero downtime, our servers are only consuming 800 Watts in total, our operating costs are minimal, and we are contributing towards having a greener planet. Looking at our company growth rate, those six servers will be good for another 8 years."
Think that's hyperbole? No, it's not. If your interpreted scripting language is 1000x slower than a compiled one, you'll simply need 1000x the resources. You are burning our planet and your money just because you weren't able to accept the fact that it's OK to use a scripting language for quick hacks and... scripting, but that it's the wrong tool for high workloads.
So, for your next project, please consult this checklist:
[ ] Is the project about teaching kids how to program?
[ ] Is the project about adding a blinking button to your website?
[ ] Is the project about running a desktop application on a Windows PC?
[ ] Is the project about doing your personal home page?
In case you have not checked any of the above boxes, you might want to consider having your code re-written in native code, avoiding 90% of your management layers and dependency hell.
And as a bonus, with the CO2 you have saved the planet you are able to buy another two SUVs! ;)
"Eight years ago, our engineers drafted our products in scripting languages in frameworks like Django or Rails, and we had a product up and running in a few months"
vs
"Eight years ago, our engineers decided to settle on Rust, and six years ago we went out of business, because it was much harder to write web services quickly that met what customers actually wanted"
This brings back so many painful memories of an entire weekend I had to spend renewing cluster certificates one by one whilst being yelled at that we were not making any money.
Good thing our control planes are now managed. I learnt so much about kubeadm and the inner workings of k8s but I'm not sure I wanna go over it again