Hacker News
Sequential and parallel execution of long-running shell commands (github.com/nukesor)
95 points by sea-gold on March 24, 2024 | hide | past | favorite | 49 comments


I am planning to use this for my homemade system-wide update script -- it updates via the OS package manager, plus Golang tools, Rust tools, OCaml tools, NPM-managed tools, editor configs (including headless updating of NeoVim plugins), and others.

Seems like it has everything I need, the most crucial feature being separate queues / worker pools (f.ex. I only want tasks that might involve source compilation to be executed one after another on a separate queue, never in parallel).

Does anybody know a good alternative to `multitail`? I still haven't found another tool that can track the output of several commands at once on an automatically split screen, remove finished tasks from view until only the last one remains (taking the entire screen), and then say "Done, press Q to quit".


You can script up something similar to your multitail example using tmux.
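For illustration, a rough sketch of the tmux approach (the session name "watch" and the echo/sleep commands are placeholders for real long-running jobs):

```shell
# Approximate multitail with tmux: each command gets its own pane in an
# auto-split window.
command -v tmux >/dev/null || exit 0   # skip if tmux isn't installed
tmux new-session -d -s watch 'echo first; sleep 5'
tmux split-window -t watch 'echo second; sleep 5'
tmux select-layout -t watch even-vertical
# tmux attach -t watch   # watch both panes live
```

Auto-removing finished panes would take extra scripting (e.g. tmux's `remain-on-exit` option and pane hooks), so this only covers the split-screen part.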


Is it going to automatically close when all tasks complete?


That can be done by ending the command with something like "; exit 0"


What exactly is this doing that you can't do with pure bash? You can set up chains of commands in serial or in parallel, use job arrays, establish dependencies or other run conditions, hop into a backgrounded run, etc.


Actually, the GNU xargs utility is the most efficient way to perform parallel execution on a large queue of input files.

GNU parallel is also an option, but it is written in Perl and has a larger footprint.

https://www.linuxjournal.com/content/parallel-shells-xargs-u...
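A minimal illustration of the xargs approach (`echo` stands in for the real per-file work):

```shell
# Process a list of inputs with up to 4 concurrent workers,
# one argument per invocation.
printf '%s\n' one two three four |
  xargs -P 4 -n 1 echo processed
```

`-P 4` caps the number of parallel processes and `-n 1` passes one input per invocation; GNU xargs takes care of the queueing.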


You can probably achieve a good subset of its functionality in bash, it's just a nicer interface with a lot of configurability and several convenience features.

I'm generally a big fan of showing alternatives: https://github.com/Nukesor/pueue/?tab=readme-ov-file#similar...

Would you be willing to write a proper guide on how to do all of these things in bash? It would be great to have such a guide inside the Pueue wiki and link to it. It'll help people to make a more informed decision on whether they need this tool or not.


Pueue dumps the state of the queue to disk as JSON every time the state changes, so when you have a lot of queued jobs this results in considerable disk I/O. I actually changed it to compress the state file via zstd, which helped quite a bit, but then eventually just moved on to running NATS [1] locally.

[1] https://nats.io/


Interesting.

May I ask how many tasks you were managing with Pueue and what your usecase was?

I also thought about using alternative formats such as CBOR, but choosing a human-readable format like JSON made debugging and such a lot easier.

If there's a good usecase for it, I might consider switching to a more compact format.


I was running a lot of small tasks. With 3000 queued tasks my state file was around 25 MB.

zstd compressed that down to 5% of the size. I still have the code if you want to look at it, but it was just a quick experiment, so I didn't add any tests. I did add it to the config, disabled by default, though.

https://github.com/veyh/pueue/commit/e9dcf52227304b4b4a2ded4...

Protobuf could be a pretty good alternative. It can be dumped to a human-readable format with the protoc cli.


I usually just need to run one big interactive job on a server, in which case I just do

  tmux
  my-command
  Ctrl-b d   (detach)
  logout, go to bed, sleep, whatever
  log back into the server
  tmux attach

Would there be an advantage to using this over that?


Your usecase reminds me of GNU screen


They're both terminal multiplexers. In my experience, tmux is much nicer though


Probably not in practice. You do save that ctrl-b keypress tho.


Ampersand also works, no tmux needed


Does it? In my setup a backgrounded job still gets SIGHUP on logout. I usually have to `disown` it to prevent the SIGHUP!


Prefix with nohup
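Sketched out (with `sleep 5` as a placeholder for the real long-running command):

```shell
# '&' alone backgrounds a job, but many shells still send SIGHUP to it
# on logout. nohup plus disown detaches it fully.
nohup sleep 5 > task.log 2>&1 &
pid=$!
disown   # remove from the job table so the shell won't HUP it on exit
echo "detached as PID $pid"
```

Redirecting stdout/stderr explicitly avoids the default `nohup.out` file landing in whatever directory you happen to be in.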


Nope. Pueue is designed to handle your usecase times 5+.

I'm still using it for your usecase, as I'm already used to the interface by now :D.


If you see the following error:

  $ pueue
  Error: Couldn't find a configuration file. Did you start the daemon yet?

run `pueued -d` first. I think this hint should be printed on screen the first time it runs, or the daemon should be started automatically.


"pueue follow <task_id>" lets you see stdout or stderror of the specified task.

If one enqueues a single chain of tasks (no parallel tasks), is there a way to monitor stdout or stderr for the whole chain, without having to issue the follow command for each task as it starts to run? This would provide better observability of what is running, as in a shell script with the tasks listed sequentially.


> Pueue is considered feature-complete

Oh that is sweet sweet music to my ears!


This looks quite helpful actually. Thanks for sharing.


A similar tool I highly recommend: https://github.com/justanhduc/task-spooler

At first I thought it would just be a one-off tool for one of my projects, but I later discovered it has everything I need, and it has been my daily driver ever since.


I've wanted to write something like this for years! For me it's always long running rsync commands I want to chain.

I still might write one as it would be a fun way to play around with some low level code, but when I actually want to get things done I'll be checking this out.


Could you just chain them in bash script? You can do it in a dumb way or you can even do it conditionally on the exit status of the previous rsync command.
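For example (with shell functions standing in for the rsync invocations):

```shell
#!/usr/bin/env bash
# step1/step2 stand in for the real rsync commands.
step1() { echo "rsync batch 1"; }
step2() { echo "rsync batch 2"; }

step1; step2     # dumb way: always run both
step1 && step2   # conditional: run step2 only if step1 exits 0
```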


Also, editing a command in the middle if you notice a mistake becomes tricky.

Pueue also allows you to do stuff like dependencies, which get tricky in bash when a task depends on more than one task finishing.
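For illustration, here's roughly what a two-parent dependency looks like in plain bash (the task functions are placeholders):

```shell
#!/usr/bin/env bash
# "Run C only once A and B have both finished" -- the kind of
# multi-parent dependency pueue expresses declaratively.
task_a() { sleep 0.1; }   # placeholders for real jobs
task_b() { sleep 0.2; }

task_a & pid_a=$!
task_b & pid_b=$!
if wait "$pid_a" && wait "$pid_b"; then
  ran_c=yes
  echo "A and B done, starting C"
fi
```

This already requires bookkeeping for every PID, and it gets worse once tasks are added while others are running.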


If this ability is important to you then you can break your pipeline into individual files that aren’t going to be read until they are executed, giving you time to edit.


Fair point, but to be honest, at this point it's just easier to do a:

pueue add 'rsync somestuff host:location'

And if I notice any problems, I just do a `pueue edit $id` and I'm good to go. It's just a lot more convenient than manually building pipelines with files that'll be executed.

It would be something different if this was about recurrent tasks that needed to be done, though. But for one-off stuff, your approach seems a bit cumbersome.


Yes or I could use the bash `&&` but the issue is I need to know all the commands at the start. I want to be able to come back an hour later and easily add a command to the end of the chain.


You can do that by abrogating your pid


Under "Features":

>Pause/resume tasks, when you need some processing power right NOW!

How is the pause and resume done?


> How is the pause and resume done?

Perhaps by sending SIGSTOP and SIGCONT, much like hitting Ctrl+Z on the console and later running bg <pid> or fg <pid>.

Note that this is not the same as Ctrl+S & Ctrl+Q on the console – that just pauses the output display, not the process (though the process may subsequently pause if a buffer somewhere down the pipeline becomes full due to the terminal output pausing).
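The signal mechanics can be demonstrated directly from a shell (`sleep 30` stands in for a managed task; this is just the assumed mechanism, not pueue's actual code):

```shell
# Pause and resume a process by PID via SIGSTOP/SIGCONT.
sleep 30 &
pid=$!
kill -STOP "$pid"                 # pause: ps shows state 'T' (stopped)
state=$(ps -o stat= -p "$pid")
kill -CONT "$pid"                 # resume
kill "$pid"                       # clean up the demo process
echo "state while paused: $state"
```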


Yep, that's exactly how it's done :)


Thought it would be, but I didn't want to state that more authoritatively as I'd not bothered checking the docs/source. And I'm lazy like that.


Thanks.


Actually pretty sweet. `ps | grep` can be tricky to navigate sometimes. I know I've certainly grabbed the wrong process a few times.

I'm not exactly sure what advantage this has over managing `screen` sessions tho. Maybe it's cleaner from a process tree perspective?


This question has been asked quite a lot, so I wrote a FAQ ;D :

https://github.com/Nukesor/pueue/wiki/FAQ#why-should-i-use-i...


I was about to write that this has a different use case, because managing screen/tmux sessions is very manual. However the repo itself states

> Pueue is not designed to be a programmable (scriptable) task scheduler/executor. The focus of pueue lies on human interaction.

So I also don't really see its usecase and would probably opt for tmux instead. If you had many workers to run, but still did it manually, would you use pueue? I'd be interested in such a scenario!


The concurrency controls are not present using the tmux method. Though if I am running enough tasks for this to be an issue then something odd is happening.

The queue being persisted and surviving crashes could be useful, and isn't the case with tmux unless you manually script it up. Though it could be painful if the crash leaves things that remaining tasks depend upon in an odd state…

The task tree could be useful: scheduling tasks to start once other tasks are completed.


You can do this in pure bash I believe. E.g. when you launch a command you can capture its pid then wait for that to exit as you expect before running further jobs.
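A sketch of that pattern (`sleep` is a placeholder for the real job):

```shell
#!/usr/bin/env bash
# Capture the background job's PID, block until it exits, and gate the
# next job on its exit status.
(sleep 0.2; exit 0) &
pid=$!
if wait "$pid"; then
  next_ran=yes
  echo "job $pid finished ok, launching the next one"
fi
```

`wait <pid>` returns the job's exit status, so it composes with `&&`/`if` just like a foreground command.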


>The focus of pueue lies on human interaction.

How is it better than me having another window open and running my long running command there?


Well, if you're planning on running 10 commands sequentially that might take a few hours each, you would have to keep those windows open for quite a while.

See https://github.com/Nukesor/pueue/wiki/FAQ#what-can-i-use-it-...


This looks awesome and the README seems to undersell it. Would it be possible for you to put code examples all over the readme so we can see what you’re talking about without needing to dig into the codebase?


So, there's a wiki which explains many of the usecases.

I specifically didn't want to further bloat the README, as it's already super long as it is.


“Pueue is not designed to be a programmable (scriptable) task scheduler/executor.”

Any alternatives similar to Pueue with capabilities of above?


Not that I know of. If anyone knows of stuff like that, please let me know, as this is regularly asked for :D.

I would be pretty stoked to be hired to build something like that, though.


Ah, I wish this was available when I was using gnu parallel.


The title is unclear; it should include the tool name (Pueue).


I do all this with systemd user services.
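A rough sketch of that approach (the unit and script names here are made up):

```shell
# Create a oneshot user service that runs only after another unit.
mkdir -p ~/.config/systemd/user
cat > ~/.config/systemd/user/update-tools.service <<'EOF'
[Unit]
Description=Update language toolchains
After=update-os.service

[Service]
Type=oneshot
ExecStart=%h/bin/update-tools.sh
EOF
# then: systemctl --user daemon-reload
#       systemctl --user start update-tools.service
```

Logs land in the journal, viewable with `journalctl --user -u update-tools`.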



