What commodity tool should I be replacing it with that gives me:
* support for heterogeneous workloads (web servers, big batch loads, streaming workloads)
* autoscaling the workloads
* autoscaling the cluster
* auto-restarting workloads
* basically automatic networking
* the ability to have all that without the need to set up separate infrastructure, and to have the orchestrator move workloads around to optimise machine use
* the ability to hire people with experience in it already
* ability to run stateful workloads, and have disks managed and moved for you
* the ability to do all that by just providing a config
* the ability to do all that without having to let people have actual access to the machines/OS
* per container/workload topology constraints and requirements, that are network and data centre aware, without having to write any code
* have access to a wide range of tooling and docs
* be able to drop in workloads that automatically handle log collection, etc. without needing to configure each deployment.
* ability to swap out Network ingress/reverse proxy without having to redeploy or reconfigure your applications.
* ability to drop in any number of tools to do traffic shadowing, mirroring, splits, etc without reconfiguring or redeploying your applications
* ability to do all this without your devs needing to be networking experts.
* not locked to a specific cloud provider
I’m sure there’s tools out there that cover different subsets of those features, there’s probably tools that do all of them and more. K8s isn’t perfect, but it gives you an awful lot for some reasonable tradeoffs. Sure, if you’re deploying a single web-server/app, you shouldn’t bother with all this; and data centre level stuff probably has tools that make these features look like a cute picnic, but many devs I’ve worked with haven’t the capacity, ability or desire to figure that out.
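The "do all that by just providing a config" point can be made concrete. As a hedged sketch (names like `my-app` and the image are placeholders, not from this thread), a Deployment plus a HorizontalPodAutoscaler covers auto-restarting and workload autoscaling from the list above:

```yaml
# Minimal sketch: a Deployment K8s will keep running/restart,
# plus an HPA that scales it on CPU utilisation.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app                          # placeholder name
spec:
  replicas: 2
  selector:
    matchLabels: { app: my-app }
  template:
    metadata:
      labels: { app: my-app }
    spec:
      containers:
        - name: web
          image: registry.example.com/my-app:1.0   # placeholder image
          ports: [{ containerPort: 8080 }]
---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target: { type: Utilization, averageUtilization: 70 }
```

Cluster autoscaling, ingress, and topology constraints are each similarly a few lines of config on top of this, rather than separate infrastructure.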
> If you’re on a cloud provider? You’ve already got secure, restartable, restorable, on-demand-growable, discoverable, fail-over-able, imaginary machines a zillion times more robust
Sure, but you’ve still got to figure out how you want to setup, monitor, configure all of that, then stitch it together and maintain it, and also not get in the way of your main application development. To me, that’s what K8s gives me: I rely on the cloud provider to give me good machines, good networking, good storage primitives, and to heal/replace them when they fail, and K8s gives me a way to use all of that nicely, without needing to build and maintain it myself, and without my applications needing to know everything.
> How did people ever manage to do anything before k8s?
Nobody ever said it wasn’t possible. Nobody is saying “you can’t run things the way you want”, the argument is “you keep saying K8s is bad, here’s some reasons why it’s maybe not”.
> I'm going to make the claim that a load balancer + asg + basic networking setup is all you need in 95% of cases.
If you can make it support the autoscaling Spark jobs, the jupyter-hub env the analysts use, and all our gRPC APIs, and have everything mTLS'd together, and have all of the DNS + cert + routing magic, with similar setup and maintenance effort, I'll convert.
> Learning how to package services and have them run anywhere? A lost art.
The issue isn’t my team’s code; if we were only running our own stuff, and we only used a single language, packaging would likely be straightforward. But it’s the fact that we run several other applications, of varying degrees of “packaged”, and the fact that I can barely get the Python devs to use a virtualenv properly, that makes this statement a bit too unreasonable in my experience. Containers may be messy, but at least I can run 3 different Spark pipelines and 4 Python applications without needing to dig into the specifics of how each particular app packages its dependencies, because that’s always an awful experience.
> If you can make it support the autoscaling Spark jobs, the jupyter-hub env the analysts use, and all our gRPC APIs, and have everything mTLS'd together, and have all of the DNS + cert + routing magic, with similar setup and maintenance effort, I'll convert.
Let's define what you're talking about here and I can show you the way.
What is the difficulty in autoscaling Spark jobs?
What jupyter env do they use?
You say gRPC API. That's generic; what do those services really do? Sync request/response backed by a DB? Async processing? What infra do they need to "work"?
Where are you running your workloads? Cloud? Which one?
> What is the difficulty in autoscaling Spark jobs?
I mean, running spark is an awful experience at the best of times, but let’s just go from there.
Spark drivers pull messages off Kafka, and scale executors up or down dynamically based on how much load/messages they have coming through. This means you need stable host names, ideally without manual intervention. The drivers and the executors should also use workload-specific roles - we use IRSA for this, and it works quite nicely. Multiple Spark clusters will need to run, be oblivious to each other, and shouldn’t “tread on each other”, so the provisioning topology should avoid scheduling containers from the same job (or competing jobs) onto the same node. Similarly, a given cluster should ideally sit within a single AZ to minimise latency - it doesn’t matter which one, but it shouldn’t be hardcoded, because the scheduler (i.e. K8s) should be able to choose based on available compute. Some of the jobs load models, so they need an EBS volume attached as scratch space.
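The scheduling rules above map fairly directly onto standard K8s affinity config. A hedged sketch of an executor pod template (labels, image, and storage class are illustrative assumptions, not from my actual setup):

```yaml
# Sketch: anti-affinity keeps pods of the same job off one node;
# zone affinity pulls a job's pods into whichever AZ the scheduler
# places the first one in; an ephemeral EBS-backed volume is scratch space.
apiVersion: v1
kind: Pod
metadata:
  labels:
    spark-job: job-a                  # placeholder job label
spec:
  affinity:
    podAntiAffinity:                  # same job never shares a node
      requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchLabels: { spark-job: job-a }
          topologyKey: kubernetes.io/hostname
    podAffinity:                      # prefer one AZ, but don't hardcode which
      preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 100
          podAffinityTerm:
            labelSelector:
              matchLabels: { spark-job: job-a }
            topologyKey: topology.kubernetes.io/zone
  containers:
    - name: executor
      image: registry.example.com/spark:3.5   # placeholder image
      volumeMounts:
        - { name: scratch, mountPath: /scratch }
  volumes:
    - name: scratch
      ephemeral:                      # EBS via the CSI driver, created/destroyed
        volumeClaimTemplate:          # with the pod
          spec:
            accessModes: ["ReadWriteOnce"]
            storageClassName: gp3     # assumed EBS-backed storage class
            resources: { requests: { storage: 50Gi } }
```

Stable host names for the drivers come from running them under a StatefulSet (or a headless Service) rather than a bare Deployment.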
> What jupyter env do they use?
We use jupyterhub wired up to our identity provider. So an analyst logs on, a pod is scheduled somewhere in the cluster with available compute, and their unique EBS volume is automounted. If they go to lunch, or idle, state is saved, the pod is scaled down, and the EBS volume is auto-detached.
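For a sense of how little config that takes, here's a hedged sketch of Zero to JupyterHub Helm values approximating that setup (the OAuth client, hostname, storage class, and timeout are illustrative assumptions):

```yaml
# Sketch: OIDC login, one dynamically-provisioned PVC per user
# (attached on login, detached when the pod is culled), idle culling.
hub:
  config:
    GenericOAuthenticator:            # wire the Hub to an identity provider
      client_id: jupyterhub           # placeholder client
      oauth_callback_url: https://hub.example.com/hub/oauth_callback
singleuser:
  storage:
    type: dynamic                     # per-user PVC, EBS-backed
    dynamic:
      storageClass: gp3               # assumed storage class
    capacity: 20Gi
cull:
  enabled: true                       # stop idle servers; state survives on
  timeout: 3600                       # the user's volume
```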
> That's genericm what do those services really do? Sync request/response backed by a db? Async processing? What infra do they need to "work"
The API stuff is by far the easiest. Request/response, backed by DBs, plus the odd analytics and monitoring tool. The servers themselves autoscale horizontally, and some of them use other services hosted within the same cluster. All east-west traffic within the cluster is secured with mTLS via Linkerd, and between that and our metric collection setup we get automatic collection of metrics, latencies, etc. - like what you get with the AWS dashboards, but in more detail (memory usage, for one) - plus automatic log collection.
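The "automatic mTLS" part is worth spelling out, because it really is just an annotation. A hedged sketch (the namespace name is a placeholder; the annotation is Linkerd's real injection switch):

```yaml
# Sketch: with Linkerd installed, annotating a namespace injects the
# sidecar proxy into every pod scheduled there, which transparently
# provides mTLS between meshed services plus per-route latency metrics.
apiVersion: v1
kind: Namespace
metadata:
  name: apis                          # placeholder namespace
  annotations:
    linkerd.io/inject: enabled
```

The applications themselves speak plain HTTP/gRPC; none of them know the mesh exists, which is what lets you swap or reconfigure it without redeploying them.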
> Where are you running your workloads?
AWS, but with minimal coupling - S3 is basically a commodity API nowadays, and the only tight integration is IRSA, which I believe GCP has a very similar version of (Workload Identity). So most of this should work in any of the other clouds, with minimal alteration.
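The IRSA coupling is itself one annotation. A hedged sketch (the ServiceAccount name and role ARN are placeholders):

```yaml
# Sketch: a ServiceAccount annotated with an IAM role ARN. Pods that use
# this ServiceAccount get workload-specific AWS credentials via EKS's
# OIDC federation, with no shared node-level credentials.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: spark-driver                  # placeholder
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::111122223333:role/spark-driver  # placeholder ARN
```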
I find it amazing how people take the absolute shittiest practice from the past and compare it to k8s and reach the conclusion k8s is better.
Have you ever done IaC where the whole copy of the infra came up on deployment and traffic was shifted between old and new? And developers could spin up their own stack with one Cloudformation or Terraform command? Have you used cloud primitives that solve most of the problems you are reinventing with k8s?
Yep, that's the typical straw-man from K8s zealots: compare static, even bare-metal setups with the K8s API "revolution", forgetting that the "clouds" (both public and private) already had APIs that covered most of it.
Sure, there are some nice additions on top of that, like "operators" that take advantage of the main loop you already have perpetually running in a k8s cluster, but the leap is not as astronomical as some say.
If you think those APIs don't need tons of scripts to be useful to a company, then either you do everything through the AWS console, or you are an anti-k8s zealot yourself.
There are many established solutions on top of those APIs that let you manage them with code, DSLs, graphical UIs etc. (e.g. Terraform, Harness, Pulumi, Spinnaker).
I have done everything from deploying to Tomcat servers to modern k8s. I really don't like CloudFormation and ECS: arcane, poorly documented syntax. I much prefer having a Helm chart for my apps. Have you ever tried to set up private DNS in ECS for services to talk to each other? That doesn't look simple (or well documented) at all to me.
By the way, Terraform and k8s are not exclusive, I use them together.
I'm right now migrating from "anything before k8s" to k8s.
I can tell you how we managed to do anything.
We had a few servers. They ran docker-compose files scattered all over people's home directories.
We had one nginx entrypoint. It was forbidden to restart it during work hours, because it wouldn't come back up and there'd be an hours-long outage. You were expected to restart it on a weekend and then spend hours trying to start it again.
Some docker images were built locally.
Backups basically didn't exist before I came to this company. They were the first thing I fixed. Now we can restore our system if a server dies. It'll take a week or two.
Kubernetes is going to be a sane exit.
Yes, I know that it's possible to build a sane environment. The problem is, people don't bother to do that. They build it in the worst possible way. And with Kubernetes, the worst possible way is much better, because Kubernetes provides plenty of quality building blocks which you don't need to reinvent. You don't need to reinvent an nginx ingress with all the necessary scripts to regenerate its config, because you've got ingress-nginx. You don't need to invent a certbot integration, because you've got cert-manager (which is my personal favourite piece of k8s). You don't need to invent your own docker-compose storage conventions or user permissions (777 everywhere, right?), because k8s has you covered. All you need is to learn its conventions and follow them.
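To illustrate the ingress-nginx + cert-manager point: one resource gets you routing plus an auto-issued, auto-renewed TLS cert. A hedged sketch (the host, Service name, and issuer name are placeholders you'd need to have set up):

```yaml
# Sketch: an Ingress handled by ingress-nginx; the cert-manager annotation
# triggers issuance of a certificate into the named Secret, replacing a
# hand-rolled certbot + config-regeneration setup.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: my-app
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt   # assumed ClusterIssuer name
spec:
  ingressClassName: nginx
  tls:
    - hosts: [app.example.com]                    # placeholder host
      secretName: app-example-com-tls             # cert-manager populates this
  rules:
    - host: app.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: my-app                      # placeholder Service
                port: { number: 80 }
```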