mmstick/parallel

--pipe option

nkh opened this issue · 12 comments

nkh commented

I'd like to use parallel to speed up some filters that take their input via stdin.

See GNU parallel's --pipe option.

It may be a little while before I pick development back up on this project, as I'm working on other things at the moment. There's a lot of work to be done in the Rust community, so it's hard for me to focus on one project at a time.

I'll make a mental note that the difference between --pipe and no --pipe at all is that --pipe supplies arguments from standard input to commands as their standard input instead of as arguments. Sounds simple enough to implement. If anyone's interested in implementing the feature, I accept pull requests.
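A minimal sketch of that difference in plain shell, without parallel itself. The helper names run_as_args and run_as_stdin are hypothetical and run jobs sequentially; they only illustrate where each input line ends up. The sketch follows the per-line description above, whereas GNU parallel's --pipe hands each job a block of input lines rather than a single line.

```shell
#!/bin/sh
# Hypothetical helpers (not part of parallel) that mimic the two modes
# for a single command, one input line at a time.

# Default mode: each input line is appended to the command as an argument.
run_as_args() {
    while IFS= read -r line; do
        # Redirect stdin so the command cannot consume the remaining input.
        "$@" "$line" </dev/null
    done
}

# --pipe-like mode: each input line is written to the command's stdin.
run_as_stdin() {
    while IFS= read -r line; do
        printf '%s\n' "$line" | "$@"
    done
}

# echo receives each line as an argument.
printf 'a b\nc\n' | run_as_args echo got:

# tr has no file arguments, so it must read each line from stdin.
printf 'a b\nc\n' | run_as_stdin tr a-z A-Z
```

Filters such as tr only read standard input, which is why the argument mode alone cannot drive them. Note that the stdin variant here starts one process per input line.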

In the meantime, you can work around this by passing each input via echo so that it is given to the standard input of your program, like so:

data | parallel 'echo {} | command'

nkh commented

A good workaround. I did open a new issue as there were quoting quirks. But it works fine otherwise.

nkh commented

I think there is still a problem.
seq -f 'xxx %f' 0 10000 | time -v rust-parallel 'echo {} | piper --global hi blue '\d+' red' > /dev/null

It takes 22 seconds, with all CPUs running at around 96%.
It takes the same time without the redirection to /dev/null.

It takes 0.2 seconds in GNU parallel.
It takes 0.1 seconds without any parallelization (it's a small data set).

High CPU usage doesn't sound right. Parallel should be consuming pretty much nothing.

nkh commented

Let me know how I can help with tests.

Is it the parallel process specifically that's consuming the CPU? How many tasks are being run in parallel?

nkh commented

It seems that there are just a handful of tasks; the top task takes around 12% CPU and the others seem to take very little.

Looking at top or htop is not that helpful. How would you collect the information?

nkh commented

8 tasks, around 70% user space

https://i.imgur.com/Y2Qd6vP.png

I'll start working on adding support for the --pipe option. As for speed, I'll have to look into that later.

I have piping working in my local branch, but I am going to spruce up my source code and implement quoting support before I upload it to master.

nkh commented

great!

I'll help debug the timing issue; my guess is that it will disappear once you implement --pipe. Just let me know how. It would be best if you checked in a few test scripts I can run.

Piping is now implemented in version 0.6.0.