To be fair, all of these are node-local, not distributed, which is what the parent was talking about. (Some of the ones you mention are also very old and unmaintained.)
For a distributed cron, you need a more sophisticated, transactional system that can deal with node failures.
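To illustrate why: the obvious workaround of installing the same crontab on several nodes and serializing through a shared lock, say etcd's built-in one (this assumes a reachable etcd v3 cluster and etcdctl on PATH), doesn't actually give you run-once semantics, because etcdctl lock blocks rather than fails:

    # Same crontab entry installed on every node.
    15 3 * * *  etcdctl lock cron/nightly-job /usr/local/bin/nightly-job.sh
    #
    # Each node queues behind the lock and runs the job in turn once the
    # holder finishes -- you get serialization, not exactly-once, and still
    # no retries or failure handling if a node dies mid-run.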
There are many cron replacements, but they generally don't tackle the main problems with the original cron, such as the inability to retry, queue, prevent overlapping jobs, enforce timeouts, report errors and metrics in a machine-readable way (spewing out emails is not a good solution), etc.
I think what you're describing may be a batch scheduling system, such as PBS or LSF.
Those seem to be rarer, since legitimate use cases for distributed systems (what used to be called "grid computing") were rare.
Nowadays, I assume everyone just uses Yarn on Hadoop once they decide that scaling "up" ends at a mid-range 2U server. I don't know how good its actual time-of-day/calendar-based scheduling is, though.
I disagree. If you really look at the problem space, it turns out that "classic cron" is just a "batch scheduling system" that is poorly implemented.
For example: Want to run a backup every night? With cron, you run into several issues: A backup job could fail; how do you recover? A backup job could run (due to sudden latencies) for an unexpectedly long time, overlapping with the next scheduled run; how do you prevent "dogpiling"? How do you keep a complete record of when each job ran, what it output, and whether it succeeded? And so on. Or for that matter: What if the box that is supposed to schedule this job goes down?
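To make that concrete, here's roughly the scaffolding a single cron-driven backup ends up needing, sketched with stock tools (flock and timeout from util-linux/coreutils, logger for syslog); the paths and the 4h limit are made up:

    #!/bin/sh
    # Called from cron, e.g.:  30 2 * * *  /usr/local/bin/backup-wrapper.sh
    LOCK=/run/lock/backup.lock
    LOG=/var/log/backup.log

    # flock -n: refuse to start while the previous run still holds the lock
    # (no dogpiling); timeout: kill a hung run instead of letting it wedge.
    flock -n "$LOCK" timeout 4h /usr/local/bin/backup.sh >>"$LOG" 2>&1
    status=$?

    # The closest cron gets to machine-readable reporting is whatever you
    # push to syslog yourself; cron itself only knows how to send mail.
    logger -t backup "nightly backup exited with status $status"

    # Retries, queuing, and run history are still entirely unsolved here.
    exit "$status"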
These are fundamental systems operations tasks that you want Unix to solve. Unfortunately, cron leaves all the actual hard challenges unsolved; it's fundamentally just a forker. Cron, then, isn't really useful for much except extremely mundane node-local tasks such as cleaning up $TMP. I can think of relatively few tasks in a modern environment that can use cron without running into its deficiencies.
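(The $TMP case really is the happy path: idempotent, fast, and harmless to skip. Something like this in a crontab, with the retention period obviously site-specific:)

    # Plain cron is fine here: nothing breaks if a run fails or is skipped.
    0 4 * * *  find /tmp -xdev -type f -atime +7 -delete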
This means that, for example, backups tend to be handled by a separate, overlapping system that has all of this machinery built in. This is a shame, because the Unix philosophy wisely encourages us to separate concerns into complementary tools that fit together. Instead of a rich, modular, failure-tolerant system that only knows how to execute jobs, not what the jobs are, you get various monolithic tools that build all of the logic into themselves.
Not "everyone just uses Yarn" at all. For the projects I work on, we use distributed cronjobs on Kubernetes, which solve pretty much all of the problems with cron. For many people, both Hadoop and Kubernetes are overkill, though, yet they would benefit from a resilient batch scheduler.
> I disagree. If you really look at the problem space, it turns out that "classic cron" is just a "batch scheduling system" that is poorly implemented.
I'm a bit confused. It still sounds like you're describing a batch scheduling system and that cron isn't one (because it only implements a narrow function of such a system).
What features does PBS (not cron) lack that any of the re-inventions do have?
> Unfortunately, cron leaves all the actual hard challenges unsolved
> the Unix philosophy wisely encourages us to separate out concerns into complementary tools that fit together.
I'm also having trouble reconciling these two positions. Cron does one thing, which is forking on a schedule. To leave something like "dogpiling" for another utility (e.g. dotlockfile) to solve seems consistent with the Unix philosophy.
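Concretely (using flock(1) here rather than dotlockfile, but it's the same composition):

    # cron forks on schedule; the lock utility handles overlap. Two tools,
    # one concern each.
    */10 * * * *  flock -n /run/lock/sync.lock /usr/local/bin/sync-job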
> Not "everyone just uses Yarn" at all. For the projects I work on, we use distributed cronjobs on Kubernetes
That doesn't quite refute my point, as you can feel free to consider "Yarn" merely a metaphor for the most popular built-in scheduler of whatever distributed computing platform is currently in vogue.