
I don't understand how you would do your job as a developer without understanding the infrastructure your code runs on. I agree that it can make sense to have dedicated people do all the infrastructure setup/management/etc, but when you have an application running in production there are a lot of considerations which can't be cleanly separated from the underlying infrastructure. Not to mention troubleshooting production issues. When something is not working in prod, the first thing I do is check basic operational stuff with the underlying deployment. Are all the pods still running? Have there been any restarts? If there is some DNS/network error how can I spin up a pod in the cluster to check on various things?


With an Ops team, developers aren’t expected to operate their code. That’s the ops team’s problem. And the ops team is measured on uptime, which is a function of the code itself, which they can’t actually change—devs own that. What the ops team can do is to slow down the rate of deployments (another input to downtime/uptime). Rather than many small deployments, they’ll have larger deployments once or twice a quarter (at best).

So a desire to ship features regularly and preserve agility and quality is the “trendy” thing that the GP is talking about.


Regardless of how often you ship, things still break sometimes though right? And you still need to find out why when they do. Often the issue is some interaction between application and infrastructure which requires knowledge of both to understand. Long before k8s was a thing, when I worked in an environment like the one you describe above, I still knew how the infrastructure worked, even if I personally wasn't allowed to touch it.


> Regardless of how often you ship, things still break sometimes though right? And you still need to find out why when they do.

The point is that under the traditional model, ops is responsible for the debugging, and they are typically already familiar with the infrastructure. Of course, things in organizations are rarely neatly isolated like this, so developers would certainly help with the debugging in many cases, and having infra expertise would help.


> When something is not working in prod, the first thing I do is check basic operational stuff with the underlying deployment. Are all the pods still running? Have there been any restarts? If there is some DNS/network error how can I spin up a pod in the cluster to check on various things?

And how much less downtime would you have if domain experts were doing that part?



