shenwei356/rush

TODO

shenwei356 opened this issue · 11 comments

  • add example of -v
  • implement retry interval
  • add more examples on bioinformatics
  • do not send empty data
  • support continue
  • test more on Windows
  • avoid mixed lines from multiple processes, e.g., the first half of a line comes from one process and the second half comes from another process.
  • replacement string {^suffix} for removing suffix
  • add flag --eta

Please add automatic detection of whether a shell is needed or not.

OK. I'll use mattn/go-shellwords

mattn commented

go-shellwords doesn't detect multiple commands like foo; bar, sorry. BTW, I'm guessing the reason Go is faster than Rust in this result is whether a shell is spawned.

https://www.reddit.com/r/rust/comments/5penft/parallelizing_enjarify_in_go_and_rust/dcr4y7f/

I think running all commands using shell ($SHELL -c for *nix and %COMSPEC% /c for Windows) for both single command and multiple commands like foo; bar is fine.
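
As a rough illustration of the "always use a shell" approach, the sketch below picks the interpreter from $SHELL or %COMSPEC% and passes the whole command line to it. shellArgs is a hypothetical helper, and the fallbacks to sh and cmd when the variables are unset are my own assumptions, not rush's documented behavior.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"runtime"
)

// shellArgs returns the interpreter and the flag used to pass it a command
// line. The fallbacks ("cmd", "sh") are assumptions made for this sketch.
func shellArgs(goos string, getenv func(string) string) (string, string) {
	if goos == "windows" {
		if c := getenv("COMSPEC"); c != "" {
			return c, "/c"
		}
		return "cmd", "/c"
	}
	if s := getenv("SHELL"); s != "" {
		return s, "-c"
	}
	return "sh", "-c"
}

func main() {
	shell, flag := shellArgs(runtime.GOOS, os.Getenv)
	// "&&" chains two commands in both POSIX shells and cmd.exe.
	out, err := exec.Command(shell, flag, "echo foo && echo bar").CombinedOutput()
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Print(string(out))
}
```

Because the command line is handed to the shell as a single string, single commands and compound ones like foo; bar take the same code path.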

mattn commented

What I mean is: why is Rust always faster? :)
If rush can avoid spawning a shell, rush will be faster, I guess.
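
One way to avoid the shell when it is not needed is a conservative check for shell metacharacters, sketched below. This is purely a heuristic of my own for illustration; it is not how go-shellwords or rush decide, and needsShell, buildCmd, and the character set are all hypothetical. Anything containing quotes, separators, or expansions goes to the shell, so the direct path only ever sees plain whitespace-separated words.

```go
package main

import (
	"fmt"
	"os/exec"
	"strings"
)

// shellMeta lists characters that suggest shell syntax (separators, pipes,
// redirection, quoting, expansion, globbing). Deliberately conservative.
const shellMeta = ";&|<>()$`\\\"'*?[]#~=%!{}\n"

// needsShell reports whether cmdline appears to rely on shell features.
func needsShell(cmdline string) bool {
	return strings.ContainsAny(cmdline, shellMeta)
}

// buildCmd runs plain commands directly and everything else via "sh -c".
// Shell builtins such as cd would still need the shell; this sketch does
// not handle that case.
func buildCmd(cmdline string) *exec.Cmd {
	fields := strings.Fields(cmdline)
	if needsShell(cmdline) || len(fields) == 0 {
		return exec.Command("sh", "-c", cmdline)
	}
	return exec.Command(fields[0], fields[1:]...)
}

func main() {
	for _, c := range []string{"echo hello", "foo; bar", "echo $HOME"} {
		fmt.Printf("%q -> needs shell: %v, argv: %q\n", c, needsShell(c), buildCmd(c).Args)
	}
}
```

A false positive only costs one extra shell spawn, whereas a false negative would change the command's meaning, which is why the character set errs on the side of using the shell.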

I get it. Thank you.

@mattn Running commands within a shell has very little overhead for my Rust implementation when you follow the recommendation to install dash. Here's a comparison of times with and without the shell:

Without Shell

seq 1 10000 | time -v target/x86_64-unknown-linux-musl/release/parallel 'echo {}' > /dev/null

User time (seconds): 0.40
System time (seconds): 2.68
Percent of CPU this job got: 93%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:03.29

5489.640372 task-clock:u (msec)

With Shell

These are times when the shell is enabled (with dash-static-musl installed)

seq 1 10000 | time -v target/x86_64-unknown-linux-musl/release/parallel 'echo {}; echo {}' > /dev/null

User time (seconds): 0.35
System time (seconds): 2.56
Percent of CPU this job got: 128%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:02.27

4593.366103 task-clock:u (msec)

Believe it or not, the shell path with dash is actually faster than the no-shell path (0:02.27 versus 0:03.29 elapsed in the runs above). That is something I will be investigating, to see where the bottleneck is in the no-shell code path.

@mmstick The Rust implementation is indeed faster for this test. And the Go API for running a process needs to call $SHELL -c, so I did not compare the case without a shell.

What confused me was why rush_linux_amd64 performed so badly on your two computers. On my laptop, for the test seq 1 10000 | time -v $CMD 'echo {}' > /dev/null, rust-parallel is ~4X as fast as rush, but it was >100X faster on your computers.

Here's a fresh result:

$ for cmd in parallel rust-parallel rush; do echo $cmd; seq 1 10000 | time -v $cmd 'echo {}' > /dev/null; done
parallel
        Command being timed: "parallel echo {}"
        User time (seconds): 28.73
        System time (seconds): 30.66
        Percent of CPU this job got: 185%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 0:32.04

rust-parallel
        Command being timed: "rust-parallel echo {}"
        User time (seconds): 3.13
        System time (seconds): 4.82
        Percent of CPU this job got: 312%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 0:02.54

rush
        Command being timed: "rush echo {}"
        User time (seconds): 12.81
        System time (seconds): 24.45
        Percent of CPU this job got: 274%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 0:13.57

Besides, speed is not the No. 1 goal for rush right now, especially for long-running processes. I use it every day in my bioinformatics analyses and keep trying to improve its usability and stability.

Do you have any AMD hardware? Both of my systems are powered by AMD, so that could be one reason. It could also be the Intel CPU governor not retaining its max frequency long enough.

Basically, before I perform my benchmarks, I ensure that all other software is closed, that the CPU governor is set to performance via sudo cpupower frequency-set -g performance, and that transparent_hugepages is set to madvise via sudo sh -c "echo madvise > /sys/kernel/mm/transparent_hugepage/enabled". The Linux distribution I am running is Arch Linux, and I have dash-static-musl installed because of its high performance.

Would it be possible to process a set of commands that is specified in a file, for example like the "::::" argument in GNU parallel?

@mfasold -i file.txt