Hacker News
Sequential and parallel execution of long-running shell commands (github.com/nukesor)
95 points by sea-gold on March 24, 2024 | hide | past | favorite | 49 comments


I am planning to use this for my homemade system-wide update script -- it updates via the OS package manager, plus Golang tools, Rust tools, OCaml tools, NPM-managed tools, editor configs (including headless updating of NeoVim plugins), and others.

Seems like it has everything I need, the most crucial feature being separate queues / worker pools (f.ex. I only want tasks that might involve source compilation to be executed one after another on a separate queue, never in parallel).

Does anybody know a good alternative to `multitail`? I still haven't found another tool that can track the output of several commands at once on an automatically split screen, remove finished tasks from view until only the last one remains (taking the entire screen), and then say "Done, press Q to quit".


You can script up something similar to your multitail example using tmux.
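For illustration, a rough sketch of the tmux approach (the session name "watch" and the echo/sleep commands are placeholders for real long-running jobs):

```shell
# Approximate multitail with tmux: each command gets its own pane in an
# auto-split window.
command -v tmux >/dev/null || exit 0   # skip if tmux isn't installed
tmux new-session -d -s watch 'echo first; sleep 5'
tmux split-window -t watch 'echo second; sleep 5'
tmux select-layout -t watch even-vertical
# tmux attach -t watch   # watch both panes live
```

Auto-removing finished panes would take extra scripting (e.g. tmux's `remain-on-exit` option and pane hooks), so this only covers the split-screen part.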


Is it going to automatically close when all tasks complete?


That can be done by ending the command with something like "; exit 0"


What exactly is this doing that you can't do with pure bash? You can set up chains of commands in serial or in parallel, use job arrays, establish dependencies or other run conditions, hop into a backgrounded run, etc.


Actually, the GNU xargs utility is the most efficient way to perform parallel execution on a large queue of input files.

GNU parallel is also an option, but it is written in Perl and has a larger footprint.

https://www.linuxjournal.com/content/parallel-shells-xargs-u...
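A minimal illustration of the xargs approach (`echo` stands in for the real per-file work):

```shell
# Process a list of inputs with up to 4 concurrent workers,
# one argument per invocation.
printf '%s\n' one two three four |
  xargs -P 4 -n 1 echo processed
```

`-P 4` caps the number of parallel processes and `-n 1` passes one input per invocation; GNU xargs takes care of the queueing.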


You can probably achieve a good subset of its functionality in bash, it's just a nicer interface with a lot of configurability and several convenience features.

I'm generally a big fan of showing alternatives: https://github.com/Nukesor/pueue/?tab=readme-ov-file#similar...

Would you be willing to write a proper guide on how to do all of these things in bash? It would be great to have such a guide inside the Pueue wiki and link to it. It'll help people to make a more informed decision on whether they need this tool or not.


Pueue dumps the state of the queue to disk as JSON every time the state changes, so when you have a lot of queued jobs this results in considerable disk I/O. I actually changed it to compress the state file via zstd, which helped quite a bit, but then eventually just moved on to running NATS [1] locally.

[1] https://nats.io/


Interesting.

May I ask how many tasks you were managing with Pueue and what your usecase was?

I also thought about using alternative formats such as CBOR, but choosing a human-readable format like JSON made debugging and such a lot easier.

If there's a good usecase for it, I might consider switching to a more compact format.


I was running a lot of small tasks. With 3000 queued tasks my state file was around 25 MB.

zstd compressed that down to 5% of the size. I still have the code if you want to look at it, but it was just a quick experiment, so I didn't add any tests. I did add it to the config, disabled by default, though.

https://github.com/veyh/pueue/commit/e9dcf52227304b4b4a2ded4...

Protobuf could be a pretty good alternative. It can be dumped to a human-readable format with the protoc cli.


I usually just need to run one big interactive job on a server, in which case I just do

  tmux
  my-command
  Ctrl-b d   (detach)
  logout, go to bed, sleep, whatever
  log back into the server
  tmux attach

Would there be an advantage to using this over that?


Your usecase reminds me of GNU screen


They're both terminal multiplexers. In my experience, tmux is much nicer though


Probably not in practice. You do save that ctrl-b keypress tho.


Ampersand also works, no tmux needed


Does it? In my setup a backgrounded job still gets SIGHUP on logout. I usually have to `disown` it to prevent the SIGHUP!


Prefix with nohup
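Sketched out (with `sleep 5` as a placeholder for the real long-running command):

```shell
# '&' alone backgrounds a job, but many shells still send SIGHUP to it
# on logout. nohup plus disown detaches it fully.
nohup sleep 5 > task.log 2>&1 &
pid=$!
disown   # remove from the job table so the shell won't HUP it on exit
echo "detached as PID $pid"
```

Redirecting stdout/stderr explicitly avoids the default `nohup.out` file landing in whatever directory you happen to be in.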


Nope. Pueue is designed to handle your usecase times 5+.

I'm still using it for your usecase, as I'm already used to the interface by now :D.


If you see the following error:

  $ pueue
  Error: Couldn't find a configuration file. Did you start the daemon yet?

run `pueued -d` first. I think this hint should be printed on screen the first time it runs, or the daemon should be started automatically.


"pueue follow <task_id>" lets you see stdout or stderror of the specified task.

If one enqueues a single chain of tasks (no parallel tasks), is there a way to monitor stdout or stderr for the whole chain, without having to issue the follow command for each task as it starts to run? This would provide better observability of what is running, as in a shell script with the tasks listed sequentially.


> Pueue is considered feature-complete

Oh that is sweet sweet music to my ears!


This looks quite helpful actually. Thanks for sharing.


A similar tool I highly recommend: https://github.com/justanhduc/task-spooler

At first I thought it would just be a one-off tool for one of my projects, but I later discovered it has everything I need, and it has been my daily driver ever since.


I've wanted to write something like this for years! For me it's always long running rsync commands I want to chain.

I still might write one as it would be a fun way to play around with some low level code, but when I actually want to get things done I'll be checking this out.


Could you just chain them in bash script? You can do it in a dumb way or you can even do it conditionally on the exit status of the previous rsync command.
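For example (with shell functions standing in for the rsync invocations):

```shell
#!/usr/bin/env bash
# step1/step2 stand in for the real rsync commands.
step1() { echo "rsync batch 1"; }
step2() { echo "rsync batch 2"; }

step1; step2     # dumb way: always run both
step1 && step2   # conditional: run step2 only if step1 exits 0
```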


Also, editing a command in the middle if you notice a mistake becomes tricky.

Pueue also allows you to do stuff like dependencies, which get tricky in bash when a task depends on more than one task finishing.
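For illustration, here's roughly what a two-parent dependency looks like in plain bash (the task functions are placeholders):

```shell
#!/usr/bin/env bash
# "Run C only once A and B have both finished" -- the kind of
# multi-parent dependency pueue expresses declaratively.
task_a() { sleep 0.1; }   # placeholders for real jobs
task_b() { sleep 0.2; }

task_a & pid_a=$!
task_b & pid_b=$!
if wait "$pid_a" && wait "$pid_b"; then
  ran_c=yes
  echo "A and B done, starting C"
fi
```

This already requires bookkeeping for every PID, and it gets worse once tasks are added while others are running.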


If this ability is important to you then you can break your pipeline into individual files that aren’t going to be read until they are executed, giving you time to edit.


Fair point, but to be honest, at this point it's just easier to do a:

pueue add 'rsync somestuff host:location'

And if I notice any problems, I just do a `pueue edit $id` and I'm good to go. It's just a lot more convenient than manually building pipelines with files that'll be executed.

It would be something different if this was about recurrent tasks that needed to be done, though. But for one-off stuff, your approach seems a bit cumbersome.


Yes or I could use the bash `&&` but the issue is I need to know all the commands at the start. I want to be able to come back an hour later and easily add a command to the end of the chain.


You can do that by abrogating your pid


Under "Features":

>Pause/resume tasks, when you need some processing power right NOW!

How is the pause and resume done?


> How is the pause and resume done?

Perhaps by sending SIGSTOP and SIGCONT, much like hitting Ctrl+Z on the console and later running bg <pid> or fg <pid>.

Note that this is not the same as Ctrl+S & Ctrl+Q on the console – that just pauses the output display, not the process (though the process may subsequently pause if a buffer somewhere down the pipeline becomes full due to the terminal output pausing).
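The signal mechanics can be demonstrated directly from a shell (`sleep 30` stands in for a managed task; this is just the assumed mechanism, not pueue's actual code):

```shell
# Pause and resume a process by PID via SIGSTOP/SIGCONT.
sleep 30 &
pid=$!
kill -STOP "$pid"                 # pause: ps shows state 'T' (stopped)
state=$(ps -o stat= -p "$pid")
kill -CONT "$pid"                 # resume
kill "$pid"                       # clean up the demo process
echo "state while paused: $state"
```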


Yep, that's exactly how it's done :)


Thought it would be, but I didn't want to state that more authoritatively as I'd not bothered checking the docs/source. And I'm lazy like that.


Thanks.


Actually pretty sweet. `ps | grep` can be tricky to navigate sometimes. I know I've certainly grabbed the wrong process a few times.

I'm not exactly sure what advantage this has over managing `screen` sessions tho. Maybe it's cleaner from a process tree perspective?


This question has been asked quite a lot, so I wrote a FAQ ;D :

https://github.com/Nukesor/pueue/wiki/FAQ#why-should-i-use-i...


I was about to write that this has a different use case, because managing screen/tmux sessions is very manual. However the repo itself states

> Pueue is not designed to be a programmable (scriptable) task scheduler/executor. The focus of pueue lies on human interaction.

So I also don't really see its usecase and would probably opt for tmux instead. If you had many workers to run, but still did it manually, would you use pueue? I'd be interested in such a scenario!


The concurrency controls are not present using the tmux method. Though if I am running enough tasks for this to be an issue then something odd is happening.

The queue being persisted and surviving crashes could be useful, and isn't the case with tmux unless you manually script it up. Though it could be painful if the crash leaves things that remaining tasks depend upon in an odd state…

The task tree could be useful: scheduling tasks to start once other tasks are completed.


You can do this in pure bash I believe. E.g. when you launch a command you can capture its pid then wait for that to exit as you expect before running further jobs.
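A sketch of that pattern (`sleep` is a placeholder for the real job):

```shell
#!/usr/bin/env bash
# Capture the background job's PID, block until it exits, and gate the
# next job on its exit status.
(sleep 0.2; exit 0) &
pid=$!
if wait "$pid"; then
  next_ran=yes
  echo "job $pid finished ok, launching the next one"
fi
```

`wait <pid>` returns the job's exit status, so it composes with `&&`/`if` just like a foreground command.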


>The focus of pueue lies on human interaction.

How is it better than me having another window open and running my long running command there?


Well, if you're planning on running 10 commands sequentially that might take a few hours each, you would have to keep those windows open for quite a while.

See https://github.com/Nukesor/pueue/wiki/FAQ#what-can-i-use-it-...


This looks awesome and the README seems to undersell it. Would it be possible for you to put code examples all over the readme so we can see what you’re talking about without needing to dig into the codebase?


So, there's a wiki which explains many of the usecases.

I specifically didn't want to further bloat the README, as it's already super long as it is.


“Pueue is not designed to be a programmable (scriptable) task scheduler/executor.”

Any alternatives similar to Pueue with capabilities of above?


Not that I know of. If anyone knows of stuff like that, please let me know, as this is regularly asked for :D.

I would be pretty stoked to be hired to build something like that, though.


Ah, I wish this was available when I was using gnu parallel.


The title is unclear; it should include the tool name (Pueue).


I do all this with systemd user services.
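A rough sketch of that approach (the unit and script names here are made up):

```shell
# Create a oneshot user service that runs only after another unit.
mkdir -p ~/.config/systemd/user
cat > ~/.config/systemd/user/update-tools.service <<'EOF'
[Unit]
Description=Update language toolchains
After=update-os.service

[Service]
Type=oneshot
ExecStart=%h/bin/update-tools.sh
EOF
# then: systemctl --user daemon-reload
#       systemctl --user start update-tools.service
```

Logs land in the journal, viewable with `journalctl --user -u update-tools`.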



