I found that a 'refreshing' read - reveling in the old-school sysadmin wizardry,...

Nextgrid · on July 28, 2022

I can't imagine managing lots of such systems, but a handful sounds doable, which is all you need sometimes.

I'm currently speccing out a system for an internal application that processes huge amounts of data. So far the plan is to just use standard Postgres on Debian on a huge "pet" server with hundreds of GBs of RAM and redundant 4TB NVME drives and call it a day. It's sized for peak load so no need for any kind of scaling beyond that, and it's a single machine using well-tested software (and default configs whenever possible) so maintenance should be minimal (it'll also be isolated onto its own network and only ever accessed by trusted users so the need for timely updates is minimal too).

bayindirh · on July 28, 2022

It's doable, believe me. Linux was always reliable, but its reliability improvements didn't stop over the years, so keeping a lot servers up to date, and running smoothly is easier than 15 years ago.

We have a lot of pets and cattle servers. Cattles can be installed in 150+ batches in 15-20 minutes or so, with zero intervention after initial triggering.

Pets are rarely re-installed. We generally upgrade them over generations, and they don't consume much time after initial configuration which are well documented.

I prefer to manage some of them via Salt, but it's rather an exercise for understanding how these things work, rather than a must.

In today's world, hardware is less resilient than the Linux installation running on top of it, so if you are going to build such monolith, spec it to be as redundant and hot-swappable as much as you can. Otherwise murphy may find ways to break your hardware in creative ways.

metadat · on July 28, 2022

Do Debian package versions still lag far behind the current public stable version (i.e. by years)?

Postgres gets solid updates seemingly bi-annually or at least annually.

Wowfunhappy · on July 28, 2022

They should only be years behind given very unlucky timing, ie, upstream releases a new version right after the stable feature freeze.

Whatever the latest version is at the time of Debian's feature freeze, that will be the version for the life of that Debian release. That's basically the point of Debian—the world will never change out from under you.

ac29 · on July 28, 2022

> They should only be years behind given very unlucky timing, ie, upstream releases a new version right after the stable feature freeze.

Literally the first package I looked up is shipping a January 2020 version in bullseye despite freezes not starting until 2021. And yes, there were additional stable releases in 2020.

Wowfunhappy · on July 28, 2022

What package? There's probably a story there.

ac29 · on July 28, 2022

mpv, a reasonably popular media player.

sigio · on July 28, 2022

And with debian's release cycle being a lot shorter these days, most packages should be more then new enough.

If needed you can get kernel-packages and stuff like browsers from backports or some 3rd party repositories tailored to bullseye.

riku_iki · on July 28, 2022

Postgres provides apt repo with fresh versions, which you can add to your debian.

diffeomorphism · on July 28, 2022

Stable is released every 2 years, so <=2 at most, and yes on purpose? Isn't that kinda the whole point of releases like Windows LTSC, Red Hat, etc.? That you actively do not want these updates, only security fixes?

anecdotal1 · on July 29, 2022

Backporting security fixes is forking the software though.

There have been instances where upstream and Debian frozen version have drifted far enough apart that the security backport was done incorrectly and introduced a new CVE. Off the top of my head this happens for Apache more than once.

I for one appreciate the BSD "OS and packages are separate" so my software can be updated but my OS is stable

diffeomorphism · on July 30, 2022

For apache I never heard about that. Instead the issue I heard about was that debian organises/manages apache quite differently, nothing about version drift.

> BSD

No thanks, I want my software to be stable too.

baq · on July 28, 2022

Put a replacement PSU, a few fans and a small box of replacement drives in the same order.

edit: Make that 2 PSUs just to be sure...

Nextgrid · on July 29, 2022

This will be hosted by Hetzner, OVH or a suitable equivalent, so the “SLA” is based on assuming that they’ll rectify any hardware failures within 2 days. I this case I’ll gamble on backups with the idea that in the worst case scenario it takes us less that an hour or rebuild the machine on a different provider such as AWS.

The machine itself is only really required for a few days every quarter, with a few days’ worth of leeway if we fail. Therefore I feel this is a acceptable risk.

briffle · on July 28, 2022

that sounds like a fun project, but you definately want to still automate its setup with something like ansible, saltstack, puppet, etc.

Because someday, you'll get a new pet, with much more CPU power you'll want to migrate to. Or maybe rather than upgrade to a newer version, plus reconfigure disks, etc, its just easier to move to a new system, etc. Or the system just plain dies, DC burns down, etc, and you need to quickly use DR to get it setup on a new system. Having all those configs, settings, application's you install, etc, defined in a tool like ansible, and then checked into git is just about priceless especially for pets or snowflakes.

Nextgrid · on July 29, 2022

I agree, learning Ansible (or equivalent) is on my todo list.

I the meantime, a document (and maybe a shell script) with commands explaining how to reinstall the machine from an environment provided by the hosting provider (a PXE-booted Debian) is enough considering the machine is only critically required for a few days every quarter and needs only softwares that’s already packaged by the distribution.

jrumbut · on July 28, 2022

I was amazed how little work was required, then I thought about what "exim4 has a new taint system" might have looked like if you tried the upgrade not knowing what exim4 was.

Real *nix expertise is spooky to watch sometimes!

bayindirh · on July 28, 2022

When you manage servers for this long, all knowledge starts to compound fast. Many scary looking messages transform to, "Oh, you need X, Y, Z? OK. Let's do it".

tryauuum · on July 28, 2022

The "livestock vs pets" comparison seems off. It's assumed with the livestock you can lose one server and don't care much – though with real animal livestock if your cow gets ill you don't kill it and order more healthy cows.

And the comparison also assumes you cannot kill the "pet" server. I have many pet servers with carefully chosen names, but I still can painlessly kill them and redeploy with the same name because I have Ansible or SaltStack code to do so

bayindirh · on July 28, 2022

The term originates from CERN mostly, which does HPC stuff in its data center. We are also an HPC center, and it's very fitting.

The cattle servers are generally HPC worker nodes. Your users don't notice when a cattle server goes offline. Scheduler reschedules the lost jobs with high priority, so they restart/resume soon.

But, pet servers on the other hand are generally coordinators of the cattle, like your shepherd dogs, keeping them in order, or giving them orders. Losing them is really creates bigger problems, and you need to tend them quick, even if you have failovers, etc. (which we certainly have).

You can re-deploy a pet server pretty quickly, but they generally have an uptime of ~2 years, and reinstallation periods of 5-6 years, if ever. We upgrade them as much as we can.

d110af5ccf · on July 28, 2022

> I still can painlessly kill them and redeploy with the same name because I have Ansible or SaltStack code to do so

those don't sound like pets to me ...

tetha · on July 28, 2022

> The "livestock vs pets" comparison seems off. It's assumed with the livestock you can lose one server and don't care much – though with real animal livestock if your cow gets ill you don't kill it and order more healthy cows.

Honestly, in most environments, it's like that. You don't delete a postgres server because a component crashed weirdly. You take a look at that component and see if there is a deeper reason for that crash and if there is a more important root cause to fix. That would prevent issues on a lot of other systems.

However, it's important to have the option to delete and rebuild the server. For example, we had a root drive corruption cause by some storage issues at the hoster on the server and binaries would crash in weird ways. At that point, I probably could fix the server by syncing binaries from other systems and such, but it's much easier to just drop it and rebuild it.

And that's very much how larger groups of animals are handled.

> And the comparison also assumes you cannot kill the "pet" server. I have many pet servers with carefully chosen names, but I still can painlessly kill them and redeploy with the same name because I have Ansible or SaltStack code to do so

Those don't sound like pets. For historic reasons, I have systems on which external agencies and consultancies have done things outside of the configuration management I don't know. And given the house of cards piled up on some of the systems, I don't think anyone knows how to redo that system. That's a pet. Once I delete that system, it never comes back the same way.

_abox · on July 28, 2022

I still manage mine as 'pets'. I have no intention of becoming a farmer :)

But I don't do anything with cloud at work.

bayindirh · on July 28, 2022

You don't become a farmer. Cattle "arrives" if your task calls for it. :)

sdoering · on July 28, 2022

Sorry to derail, but I really am irked by this cattle VS. pets analogy.

I know that a lot of meat is industrially produced with little regard for the animal well-being (livestock). And I know quite a lot farmers growing their herds organically and naming every single one. Quite a few farmers told me they would never eat meat if they didn't know the name of the animal it came from. They know the character of every single animal in their herds.

So to me this analogy only works as long as we disregard the fact that these animals have unique characters. And that imho counters the analogy.

I prefer bots VS. pets to differentiate the two sides.

d110af5ccf · on July 28, 2022

Unique character has nothing to do with it. They may have names but they're still livestock - raised en masse to be sold. Pets don't just have names, they're akin to family members. Livestock don't hang out on the couch with you while you watch TV.

Production systems are very much like livestock - spun up to serve a purpose. Getting overly attached as you might with a personal pet system is probably a mistake that will reduce your efficiency.

bayindirh · on July 28, 2022

> So to me this analogy only works as long as we disregard the fact that these animals have unique characters. And that imho counters the analogy.

No, no... You are not mistaken. Every server in your "cattle" fleet has its own character after some point. Some of them eat through disks, one of them has a wonky Ethernet not quite broken, other one is always a little slower than the rest.

On a more serious note, I really understand what you're saying, but I'd rather not discuss it here, but the above paragraph really holds.

dade_ · on July 28, 2022

But the farmer still kill the cow just the same for slaughter, and another one replaces it. Please meet Sally2, I say to my 10 year old pet dog. Note, the farmer doesn’t have any bots and even the tractor is a pet.

Analogy is fine and I prefer to get these points of view on Portlandia.