I found that a 'refreshing' read - reveling in the old-school sysadmin wizardry, after so many years of livestock-over-pets thinking. This server sounds like the ultimate pet.
It's just nostalgia though, I can't imagine managing systems that way (even at the less skilled level I had) any more.
But at the same time it won't stop me admiring those skills - it's like watching a traditional craftsman using old tools.
I can't imagine managing lots of such systems, but a handful sounds doable, which is all you need sometimes.
I'm currently speccing out a system for an internal application that processes huge amounts of data. So far the plan is to just use standard Postgres on Debian on a huge "pet" server with hundreds of GBs of RAM and redundant 4TB NVMe drives and call it a day. It's sized for peak load, so there's no need for any kind of scaling beyond that, and it's a single machine running well-tested software (with default configs whenever possible), so maintenance should be minimal (it'll also be isolated on its own network and only ever accessed by trusted users, so the need for timely updates is minimal too).
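If I do end up deviating from the defaults, it will probably just be the usual handful of memory and storage settings in postgresql.conf; a rough sketch, with hypothetical numbers that would depend on the final RAM and workload:

    # hypothetical excerpt for a ~256 GB RAM box with NVMe storage
    shared_buffers = 64GB            # commonly ~25% of RAM
    effective_cache_size = 192GB     # planner hint: RAM left over for the OS page cache
    work_mem = 256MB                 # per sort/hash node, so keep it conservative
    maintenance_work_mem = 4GB       # speeds up VACUUM and index builds
    random_page_cost = 1.1           # on NVMe, random reads cost about the same as sequential
    effective_io_concurrency = 200   # NVMe can service many concurrent requests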
It's doable, believe me. Linux was always reliable, but its reliability improvements didn't stop over the years, so keeping a lot of servers up to date and running smoothly is easier than it was 15 years ago.
We have a lot of pet and cattle servers. Cattle can be installed in batches of 150+ in 15-20 minutes or so, with zero intervention after the initial trigger.
Pets are rarely reinstalled. We generally upgrade them over generations, and they don't consume much time after the initial configuration, which is well documented.
I prefer to manage some of them via Salt, but that's more an exercise in understanding how these things work than a must.
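The states themselves are nothing fancy; a pet's baseline is mostly packages, a config file and a service, something along these lines (file names and contents are invented for illustration):

    # /srv/salt/baseline.sls -- illustrative sketch only
    admin-tools:
      pkg.installed:
        - pkgs:
          - htop
          - tmux
          - vim

    /etc/ssh/sshd_config:
      file.managed:
        - source: salt://files/sshd_config

    sshd:
      service.running:
        - enable: True
        - watch:
          - file: /etc/ssh/sshd_config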
In today's world, hardware is less resilient than the Linux installation running on top of it, so if you are going to build such a monolith, spec it to be as redundant and hot-swappable as you can. Otherwise Murphy will find creative ways to break your hardware.
They should only be years behind given very unlucky timing, i.e., upstream releases a new version right after the stable feature freeze.
Whatever the latest version is at the time of Debian's feature freeze, that will be the version for the life of that Debian release. That's basically the point of Debian—the world will never change out from under you.
> They should only be years behind given very unlucky timing, i.e., upstream releases a new version right after the stable feature freeze.
Literally the first package I looked up is shipping a January 2020 version in bullseye, despite freezes not starting until 2021. And yes, there were additional upstream stable releases in 2020.
Stable is released every 2 years, so at most 2 years behind, and yes, on purpose? Isn't that kinda the whole point of releases like Windows LTSC, Red Hat, etc.? That you actively do not want these updates, only security fixes?
Backporting security fixes is forking the software though.
There have been instances where upstream and Debian's frozen version drifted far enough apart that the security backport was done incorrectly and introduced a new CVE. Off the top of my head, this has happened for Apache more than once.
I for one appreciate the BSD "OS and packages are separate" approach, so my software can be updated while my OS stays stable.
For Apache I never heard about that. Instead, the issue I heard about was that Debian organises/manages Apache quite differently; nothing about version drift.
This will be hosted by Hetzner, OVH or a suitable equivalent, so the “SLA” is based on assuming that they'll rectify any hardware failures within 2 days. In this case I'll gamble on backups, with the idea that in the worst-case scenario it takes us less than an hour to rebuild the machine on a different provider such as AWS.
The machine itself is only really required for a few days every quarter, with a few days' worth of leeway if we fail. Therefore I feel this is an acceptable risk.
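For what it's worth, the backup side of that gamble is deliberately boring: a nightly dump shipped off the box, roughly along these lines (database name, paths and the backup host are placeholders):

    #!/bin/sh
    # nightly-backup.sh -- sketch only
    set -eu
    STAMP=$(date +%F)
    # compressed logical dump of the one database that matters
    pg_dump --format=custom --file="/backup/app_db-$STAMP.dump" app_db
    # ship it off the machine so a dead server doesn't take the backups with it
    rsync -a "/backup/app_db-$STAMP.dump" backup-host:/srv/pg-backups/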
That sounds like a fun project, but you definitely still want to automate its setup with something like Ansible, SaltStack, Puppet, etc.
Because someday you'll get a new pet with much more CPU power that you'll want to migrate to. Or maybe, rather than upgrading to a newer version, reconfiguring disks, etc., it's just easier to move to a new system. Or the system just plain dies, the DC burns down, etc., and you need to quickly use DR to get it set up on a new machine. Having all those configs, settings, applications you install, etc. defined in a tool like Ansible, and then checked into git, is just about priceless, especially for pets or snowflakes.
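Even a tiny playbook buys you most of that; here's a minimal sketch (the host name, package list and paths are invented for the example):

    # site.yml -- illustrative only, not a drop-in config
    - hosts: bigpet
      become: true
      tasks:
        - name: Install the packages the box depends on
          ansible.builtin.apt:
            name: [postgresql, prometheus-node-exporter]
            state: present

        - name: Deploy postgresql.conf from the repo
          ansible.builtin.copy:
            src: files/postgresql.conf
            dest: /etc/postgresql/15/main/postgresql.conf
          notify: Restart postgresql

      handlers:
        - name: Restart postgresql
          ansible.builtin.service:
            name: postgresql
            state: restarted

Checked into git next to a short README, that's effectively your DR runbook.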
I agree, learning Ansible (or equivalent) is on my todo list.
In the meantime, a document (and maybe a shell script) with commands explaining how to reinstall the machine from an environment provided by the hosting provider (a PXE-booted Debian) is enough, considering the machine is only critically required for a few days every quarter and needs only software that's already packaged by the distribution.
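Something as dumb as a commented shell script goes a long way there. A rough outline of what mine would look like (package list, database name and host names are placeholders, and it assumes the base Debian install is already done):

    #!/bin/sh
    # rebuild.sh -- rough sketch of the "reinstall from the provider's PXE-booted Debian" doc
    set -eux
    # everything the application needs is already packaged by the distribution
    apt-get update
    apt-get install -y postgresql postgresql-contrib rsync
    # pull the most recent off-site dump and load it into a fresh database
    rsync -a backup-host:/srv/pg-backups/latest.dump /tmp/latest.dump
    sudo -u postgres createdb app_db
    sudo -u postgres pg_restore --dbname=app_db /tmp/latest.dump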
I was amazed at how little work was required; then I thought about what "exim4 has a new taint system" might have looked like if you tried the upgrade not knowing what exim4 was.
When you manage servers for this long, all the knowledge starts to compound fast. Many scary-looking messages turn into, "Oh, you need X, Y, Z? OK. Let's do it".
The "livestock vs pets" comparison seems off. It's assumed with the livestock you can lose one server and don't care much – though with real animal livestock if your cow gets ill you don't kill it and order more healthy cows.
And the comparison also assumes you cannot kill the "pet" server. I have many pet servers with carefully chosen names, but I can still painlessly kill them and redeploy them with the same name, because I have Ansible or SaltStack code to do so.
The term originates mostly from CERN, which does HPC stuff in its data center. We are also an HPC center, and it's very fitting.
The cattle servers are generally HPC worker nodes. Your users don't notice when a cattle server goes offline; the scheduler reschedules the lost jobs with high priority, so they restart/resume soon.
Pet servers, on the other hand, are generally coordinators of the cattle, like your shepherd dogs, keeping them in order or giving them orders. Losing one really creates bigger problems, and you need to tend to it quickly, even if you have failovers, etc. (which we certainly have).
You can redeploy a pet server pretty quickly, but they generally have an uptime of ~2 years and reinstallation periods of 5-6 years, if ever. We upgrade them as much as we can.
> The "livestock vs pets" comparison seems off. It's assumed with the livestock you can lose one server and don't care much – though with real animal livestock if your cow gets ill you don't kill it and order more healthy cows.
Honestly, in most environments it's like that. You don't delete a Postgres server because a component crashed weirdly. You take a look at that component and see if there is a deeper reason for the crash, and whether there is a more important root cause to fix that would prevent issues on a lot of other systems.
However, it's important to have the option to delete and rebuild the server. For example, we had root drive corruption on one server, caused by some storage issues at the hoster, and binaries would crash in weird ways. At that point I probably could have fixed the server by syncing binaries from other systems and such, but it was much easier to just drop it and rebuild it.
And that's very much how larger groups of animals are handled.
> And the comparison also assumes you cannot kill the "pet" server. I have many pet servers with carefully chosen names, but I can still painlessly kill them and redeploy them with the same name, because I have Ansible or SaltStack code to do so.
Those don't sound like pets. For historic reasons, I have systems on which external agencies and consultancies have done things outside of configuration management that I don't know about. And given the house of cards piled up on some of these systems, I don't think anyone knows how to redo them. That's a pet. Once I delete that system, it never comes back the same way.
Sorry to derail, but I really am irked by this cattle VS. pets analogy.
I know that a lot of meat is industrially produced with little regard for animal well-being (livestock). And I know quite a lot of farmers who grow their herds organically and name every single animal. Quite a few farmers told me they would never eat meat if they didn't know the name of the animal it came from. They know the character of every single animal in their herds.
So to me this analogy only works as long as we disregard the fact that these animals have unique characters. And that imho counters the analogy.
I prefer bots VS. pets to differentiate the two sides.
Unique character has nothing to do with it. They may have names but they're still livestock - raised en masse to be sold. Pets don't just have names, they're akin to family members. Livestock don't hang out on the couch with you while you watch TV.
Production systems are very much like livestock - spun up to serve a purpose. Getting overly attached as you might with a personal pet system is probably a mistake that will reduce your efficiency.
> So to me this analogy only works as long as we disregard the fact that these animals have unique characters. And that imho counters the analogy.
No, no... You are not mistaken. Every server in your "cattle" fleet develops its own character after a while. Some of them eat through disks, one has a wonky Ethernet port that's not quite broken, another is always a little slower than the rest.
On a more serious note, I really understand what you're saying, but I'd rather not discuss it here; the above paragraph really does hold, though.
But the farmer still kills the cow just the same for slaughter, and another one replaces it. "Please meet Sally2," I say to my 10-year-old pet dog. Note, the farmer doesn't have any bots, and even the tractor is a pet.
The analogy is fine, and I prefer to get these points of view on Portlandia.