maliciouspickle's comments

this is not a direct answer to the original question, but problems like this are what led to the creation of orchestrator tools like airflow, luigi, dagster, prefect, etc. these tools provide features that improve task/job observability, ease of debugging, and the overall reliability of scheduled jobs/tasks.

it is a natural progression to move on from cron and adopt an orchestrator tool (many options nowadays) when you need more insight into cron, or when you start finding yourself building custom features around it.

i would do some research into orchestrators and see if there are any that meet your requirements. many have feature sets and integrations that solve some of the exact problems you’re describing.

(as a data engineer my current favorite general purpose orchestrator is dagster. it’s lightweight yet flexible)

edit: as a basic example, in most orchestrators there is a first-class way to define data quality checks. if you have less data than expected, or erroneous data (based upon your expectations), you can define this as an automated check.

you can then choose to fail the job, or set a number of retries before failing, or send a notification to some destination of your choice (they have integrations with slack and many other alerting tools).
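to make the check / retry / notify idea concrete, here is a rough sketch in plain python. it does not use any orchestrator's real API; `check_row_count`, `run_with_retries`, and the `notify` callback are hypothetical names made up for illustration, and a real tool would give you equivalents of these out of the box.

```python
import time


class DataQualityError(Exception):
    """Raised when a run's output fails an expectation."""


def check_row_count(rows, min_rows):
    # Hypothetical check: fail if the job produced less data than expected.
    if len(rows) < min_rows:
        raise DataQualityError(
            f"expected at least {min_rows} rows, got {len(rows)}"
        )


def run_with_retries(job, check, max_retries=2, notify=print, delay_seconds=0.0):
    """Run job(), validate its result with check(), and retry on failure.

    notify is called with a message after every failed attempt; in practice
    it could post to Slack or another alerting tool (assumption, not shown).
    """
    attempts = max_retries + 1
    for attempt in range(1, attempts + 1):
        try:
            result = job()
            check(result)
            return result
        except Exception as exc:
            notify(f"attempt {attempt}/{attempts} failed: {exc}")
            if attempt == attempts:
                raise  # retries exhausted: fail the job
            time.sleep(delay_seconds)
```

a cron script could wrap its main function with something like `run_with_retries(load_data, lambda rows: check_row_count(rows, 100))`, where `load_data` stands in for whatever the job actually does. once you find yourself writing this kind of wrapper by hand, that is roughly the point where an orchestrator starts to pay off.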

i like dagster because it is geared for hooking into the data itself. you can use it to ‘run a job’ like some function, but it really shines when you use its ‘data asset’ features, which track the data itself over time and provide a nice UI to view and compare data from each run. hook in alerting for anomalies and you’re good to go!

they have many more features depending on the tool, and some are more complicated to set up than others.


I would (respectfully) challenge this idea. :)

I'm not certain adding more complexity (which comes with the more powerful solutions you've suggested) will help things right now.

Cron is such a basic tool, it really shouldn't be causing any problems. I think fixing the underlying problems in the scripts themselves is important to do first.

Just my two cents though!


i completely agree with you. the author's approach seems complex and unnecessary. my basic expectation when I see something labeled as a REST API is:

1. i can submit a request via HTTP

2. data is returned as JSON by a response

3. only the minimal amount of HTTP/pagination handling is required
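as a rough illustration of those three expectations, a client should need little more than the sketch below. the response shape (an `items` list plus an optional `next` URL) is an assumption made up for this example, not something any particular API guarantees, and `opener` is only a parameter so the transport can be swapped out.

```python
import json
from urllib.request import urlopen


def get_json(url, opener=urlopen):
    """Send one HTTP GET request and decode the JSON response body."""
    with opener(url) as response:
        return json.loads(response.read().decode("utf-8"))


def get_all_items(first_url, opener=urlopen):
    """Collect items across pages by following a hypothetical 'next' link."""
    url, items = first_url, []
    while url:
        page = get_json(url, opener)
        items.extend(page["items"])  # assumed field name
        url = page.get("next")       # assumed field name; missing/None ends the loop
    return items
```

if using an API takes much more ceremony than this, it is drifting away from what i would expect from something labeled REST.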

