No, part of managing a herd is having the right tools in place, like monitoring, logging, and observability tools.
There is nothing I can learn from accessing a VM in production that I can't learn from my monitoring system.
In prod where I work, if someone logs into a production VM we mark it tainted and replace it with a fresh instance. This keeps things nice and consistent.
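A taint-and-replace policy like that can be wired up as a login hook. This is only a sketch under assumptions the comment doesn't state: an AWS EC2 instance behind an Auto Scaling group, a tag key `tainted`, and a `DRY_RUN` flag invented here for safe testing.

```shell
#!/bin/sh
# Hypothetical SSH login hook for a taint-and-replace policy.
# Assumptions (not from the thread): AWS EC2 + Auto Scaling group,
# tag key "tainted", and a DRY_RUN flag for this sketch.

# Instance id from the EC2 metadata service (falls back to "unknown" off-EC2).
INSTANCE_ID="${INSTANCE_ID:-$(curl -sf --max-time 2 \
  http://169.254.169.254/latest/meta-data/instance-id || echo unknown)}"

taint_instance() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "would taint and replace $INSTANCE_ID"
    return 0
  fi
  # Record why the box is going away, for dashboards and audits...
  aws ec2 create-tags --resources "$INSTANCE_ID" \
    --tags Key=tainted,Value=true
  # ...then mark it unhealthy so the ASG terminates it and
  # brings up a fresh instance from the launch template.
  aws autoscaling set-instance-health \
    --instance-id "$INSTANCE_ID" --health-status Unhealthy
}

# Demo invocation; drop DRY_RUN=1 to actually taint the instance.
DRY_RUN=1 taint_instance
```

The key design choice is letting the ASG do the replacement: the hook only tags and marks the instance unhealthy, so the same code path that handles failed health checks handles tainted boxes.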
If you need an interactive session on a prod machine, you are missing tools.
> There is nothing I can learn from accessing a VM in production that I can't learn from my monitoring system.
Other than how to fix gaps and other problems with your monitoring? As you get experience, you’ll learn this is like tending a garden — you can heavily reduce your need for interactive sessions, but it never goes to zero.
I think you’re making the classic mistake of treating a guideline as a religious mandate. Yes, it’s good to have servers be easily replaceable, but that desire does not magically rewrite all existing software or retrain every IT worker.
Similarly, automation is great but you need to develop and maintain it - which almost always involves interactive work. The taint process you mentioned is a popular way to balance those needs long-term.
Finally, if you are thinking of “server” as only a production-hardened network service you’re missing out on a lot of other things enterprises use cloud services for, such as developer workstations or general virtual desktops. Many places heavily expanded that over the last year because you avoid the security concerns about having your data on easily lost/stolen laptops and can avoid turning your VPN into a massive bottleneck for the entire company.
> In prod where I work, if someone logs into a production VM we mark it tainted and replace it with a fresh instance. This keeps things nice and consistent.
That doesn't make sense from an ROI perspective at a great number of businesses. Like, "this would take a decade to pay off, and that's assuming it requires no maintenance" kind of bad ROI.
Lots of places, you script vm/server configs (even just with bash) and get CI running automated tests on important branches, and you've captured 99% of the benefit available from automation. Would the other stuff be nice? Yes, but five people saving 15 minutes per week means you can't reasonably spend the kind of time on it—for initial set-up and for ongoing maintenance—that you would if it were fifty people saving 15 minutes per week, let alone 500 (at that point you can have a couple people dedicated full-time to just that one piece of automation, and it's still saving you money).
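The arithmetic behind that point is easy to check. Using the comment's own figures (five or five hundred people saving 15 minutes per week), and build/maintenance hours that are invented here purely for illustration:

```shell
#!/bin/sh
# Back-of-envelope ROI for an automation project.
# Savings figures come from the comment above; the 300h build cost and
# 100h/yr upkeep are invented numbers for illustration only.
minutes_saved_per_week=15

# Five people: 5 * 15 * 52 / 60 = 65 hours saved per year.
small_team_hours=$(( 5 * minutes_saved_per_week * 52 / 60 ))
echo "5 people:   saves ${small_team_hours}h/yr"

# Five hundred people: 500 * 15 * 52 / 60 = 6500 hours saved per year.
big_org_hours=$(( 500 * minutes_saved_per_week * 52 / 60 ))
echo "500 people: saves ${big_org_hours}h/yr"
```

Against a hypothetical 300-hour build plus 100 hours/year of upkeep, the five-person shop never breaks even (65h/yr saved vs. 100h/yr upkeep alone), while the 500-person org pays it off in weeks — which is exactly the scale effect the comment describes.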
Plugging all our cattle into a heart-rate and blood-pressure monitor and doing frequent blood draws from every cow "just in case" is wasteful and unnecessary. There is a balance between sensible, always-available general monitoring and special-case debugging of a problem.
My rule for that is: does it come up more than once a year, or take more than 6h? Automate and tool it. Less? SSH or other special-case tools are fine.