I’ll give this a go, though honestly, for the stuff I use parallel for, I don’t think parallel itself is the bottleneck; usually I use it to queue up something like 50 ffmpeg or imagemagick jobs, and the vast majority of the time for those (I would think) is spent in the program being run, not in parallel.
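For reference, the kind of invocation I mean (the filenames, formats, and job count here are just placeholders):

    # transcode up to 8 files at a time; {} is the input file, {.} strips its extension
    parallel -j 8 ffmpeg -i {} {.}.mp4 ::: *.mov

Each ffmpeg run takes seconds to minutes, so whatever bookkeeping parallel does per job is noise by comparison.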
Maybe I have a really atypical workflow on this? Or maybe I’m misunderstanding something?
I find parallel to have the best UI (besides the citation thing).
Whenever I use xargs (which is more often available on remote servers), I struggle to find the right invocation. And I learned about parallel well after using xargs, so I don't think it's just habit.
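A sketch of what I mean, with the two spellings side by side (the gzip job is just an example):

    # xargs: -0 pairs with -print0, -P sets parallelism, -n 1 passes one file per run
    find . -name '*.log' -print0 | xargs -0 -P 4 -n 1 gzip

    # parallel: same thing, -0 for null-delimited input, -j sets parallelism
    find . -name '*.log' -print0 | parallel -0 -j 4 gzip

The parallel version is the one I can reconstruct without reaching for the man page.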
Similarly to you my use cases are relatively few (hundreds?) long-running commands, so the performance of the implementation is less important.
I used to use GNU parallel for automating some server management tasks (this was circa 2009). For long-running or typical non-interactive tasks it's fine, but if you're waiting on output from something that should finish really quickly, the startup time is (or was) frustratingly long.
I ended up writing my own parallel execution code in shell, and it was much, much faster.
Instead of killing and starting a new process for every input, it keeps re-using the same coproc'ed process. It's almost like daemonizing the command into several daemons and then feeding inputs through whichever daemon becomes available.
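I don't have that code handy, but a minimal single-worker sketch of the coproc re-use idea (the uppercasing loop stands in for the real per-input work) would be something like:

    #!/usr/bin/env bash
    # One long-lived worker fed inputs over a pipe, instead of a fresh
    # process per input. WORKER[1] is its stdin, WORKER[0] its stdout.
    coproc WORKER { while read -r line; do echo "${line^^}"; done; }

    for input in foo bar baz; do
        printf '%s\n' "$input" >&"${WORKER[1]}"   # feed one input
        read -r result <&"${WORKER[0]}"           # collect one result
        echo "got: $result"
    done

    # close the worker's stdin so its read loop sees EOF and exits;
    # eval because the {var}>&- close form doesn't take array subscripts
    eval "exec ${WORKER[1]}>&-"
    wait

The real version would multiplex several of these workers and hand each input to whichever one is free.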
I can see this being useful for extremely short-lived, highly parallel jobs where the overhead of killing/starting a new process is significant relative to the job itself.