Why is parallel so slow for me?
d33tah opened this issue · 7 comments
d33tah@d33tah-pc:/tmp$ cat /tmp/test.sh
#!/bin/bash
export LC_ALL=C
for i in `seq 3`; do
yes "banana" | dd count=$(( 10 ** $i )) > /tmp/yes2
time /usr/bin/parallel --pipe cat </tmp/yes2 >/dev/null
time /home/d33tah/.cargo/bin/parallel --pipe cat </tmp/yes2 >/dev/null
echo
done
d33tah@d33tah-pc:/tmp$ bash /tmp/test.sh
10+0 records in
10+0 records out
5120 bytes (5.1 kB, 5.0 KiB) copied, 0.000121935 s, 42.0 MB/s
real 0m0.185s
user 0m0.140s
sys 0m0.032s
real 0m0.145s
user 0m0.040s
sys 0m0.240s
100+0 records in
100+0 records out
51200 bytes (51 kB, 50 KiB) copied, 0.000211743 s, 242 MB/s
real 0m0.125s
user 0m0.088s
sys 0m0.024s
real 0m1.348s
user 0m0.344s
sys 0m2.632s
1000+0 records in
1000+0 records out
512000 bytes (512 kB, 500 KiB) copied, 0.00160011 s, 320 MB/s
real 0m0.134s
user 0m0.100s
sys 0m0.016s
real 0m13.676s
user 0m3.236s
sys 0m26.820s
Looks like that's not the case:
[15:48:09] ➜ /tmp cat /sys/kernel/mm/transparent_hugepage/enabled
always [madvise] never
Installed via cargo install.
I'll have to investigate this when I have time to spend on this project. I'm still heavily engaged in Ion Shell development, which takes priority over this. I believe this may have to do with Parallel making a copy of the input even though the input is already a file. A fix could be to check whether stdin is a file and, if so, use it directly. I'd need to do some perf profiling to find the exact cause.
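A minimal sketch of the "is stdin a file?" check, assuming a Unix target and Rust 1.66+ (for std::os::fd). is_regular_file is a hypothetical helper, not code from Parallel; it only answers the question and does not show how the file would then be read directly.

```rust
use std::fs::File;
use std::io;
use std::os::fd::AsFd;

/// Returns Ok(true) if the descriptor refers to a regular file
/// (for example stdin redirected with `< /tmp/yes2`), and Ok(false)
/// for pipes, terminals and devices.
fn is_regular_file<F: AsFd>(handle: &F) -> io::Result<bool> {
    // Duplicate the descriptor so that dropping the temporary File
    // closes only the duplicate, never the caller's fd (e.g. fd 0).
    let dup = handle.as_fd().try_clone_to_owned()?;
    let file = File::from(dup);
    Ok(file.metadata()?.file_type().is_file())
}

fn main() {
    match is_regular_file(&io::stdin()) {
        Ok(true) => eprintln!("stdin is a regular file: it could be used directly"),
        Ok(false) => eprintln!("stdin is a pipe, terminal or device: it has to be streamed"),
        Err(e) => eprintln!("could not inspect stdin: {e}"),
    }
}
```

Run with `< /tmp/yes2` the check reports a regular file; run as `cat /tmp/yes2 | ...` it reports a pipe, which is the case where copying the input is unavoidable.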
Once Ion is complete, I will integrate it directly into Parallel, as I'll ensure that Ion can be called as a library. Then there won't be a need to call an external shell to execute commands, and Parallel will be able to use Ion as a scripting language in the same way that GNU Parallel uses Perl. This will be a major performance and feature win, given that Ion is drastically superior to Dash in both performance and feature set.
Something else you can try, though, is compiling Parallel with musl. It eliminates the shared-library dependency on glibc, which has a high cost for short-lived parallel tasks.
rustup target add x86_64-unknown-linux-musl
cargo build --release --target x86_64-unknown-linux-musl
Hi, just out of curiosity, why would setting transparent_hugepages to always hurt parallel's performance?
@amosbird It's because THP has an issue where it severely degrades memory performance when a binary uses jemalloc, especially when that program performs a lot of forks, as this one does: most of its time is spent forking. If THP is set to always, it will always take effect and aggressively purge caches that jemalloc uses.
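To check whether a machine is in the "always" state, one can read /sys/kernel/mm/transparent_hugepage/enabled: the kernel lists every mode and brackets the active one (the reporter's machine showed always [madvise] never, i.e. madvise). The sketch below assumes a Linux host; active_thp_mode is a hypothetical helper, and the warning it prints only restates the maintainer's jemalloc claim rather than measuring anything.

```rust
use std::fs;

/// Extracts the active transparent-hugepage mode from the contents of
/// /sys/kernel/mm/transparent_hugepage/enabled, where the kernel marks
/// the selected mode with square brackets, e.g. "always [madvise] never".
fn active_thp_mode(contents: &str) -> Option<&str> {
    contents
        .split_whitespace()
        .find(|word| word.starts_with('[') && word.ends_with(']'))
        .map(|word| &word[1..word.len() - 1])
}

fn main() {
    let path = "/sys/kernel/mm/transparent_hugepage/enabled";
    match fs::read_to_string(path) {
        Ok(contents) => match active_thp_mode(&contents) {
            Some("always") => eprintln!("THP is 'always': fork-heavy jemalloc programs may suffer"),
            Some(mode) => eprintln!("THP mode is '{mode}'"),
            None => eprintln!("no bracketed mode found in {path}"),
        },
        Err(e) => eprintln!("could not read {path}: {e}"),
    }
}
```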
So I have a new project, concurr. It's still in its early stages, but it has a service (concurr-jobsd) and an associated client (concurr) for controlling nodes that run that service. The syntax will be very similar to Parallel's, but it won't be drop-in compatible, as it takes a different route.
The server is built using Tokio and executes each command within an embedded instance of the Ion shell. The client sends a command template (which can contain multiple commands) to each configured node, asynchronously submits inputs to each slot on each node, and then reads the results back in the order of submission. So distributed computing is a big feature of the new solution.
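Reading results back in submission order while they finish out of order on different nodes is commonly done with a reorder buffer keyed by submission index. The sketch below illustrates that technique only; it is not concurr's code, InOrderBuffer is a hypothetical name, and the networking and Tokio parts are left out entirely.

```rust
use std::collections::BTreeMap;

/// Buffers results that arrive out of order, each tagged with the index
/// at which its input was submitted, and releases them strictly in
/// submission order.
struct InOrderBuffer<T> {
    next: usize,                 // index of the next result to emit
    pending: BTreeMap<usize, T>, // results that arrived early
}

impl<T> InOrderBuffer<T> {
    fn new() -> Self {
        InOrderBuffer { next: 0, pending: BTreeMap::new() }
    }

    /// Accepts one result and returns every result that is now ready,
    /// in submission order (possibly none).
    fn push(&mut self, index: usize, result: T) -> Vec<T> {
        self.pending.insert(index, result);
        let mut ready = Vec::new();
        while let Some(result) = self.pending.remove(&self.next) {
            ready.push(result);
            self.next += 1;
        }
        ready
    }
}

fn main() {
    // Hypothetical completion order from several nodes/slots.
    let arrivals = [(2, "c"), (0, "a"), (3, "d"), (1, "b")];
    let mut buffer = InOrderBuffer::new();
    for (index, output) in arrivals {
        for ready in buffer.push(index, output) {
            println!("{ready}");
        }
    }
}
```

Memory use grows only with the number of results that finish ahead of a slow one, so a single stalled slot is the worst case for this design.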
The client is currently very basic, though. The syntax is as follows:
concurr 'COMMAND TO EXECUTE {}' : arg1 arg2 arg3 arg4
concurr 'COMMAND {}' :: file1 file2 file3
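As with Parallel, each input is substituted for {} in the template. The sketch below is a simplified, hypothetical model of the ':' form only: expand is an illustrative helper, no shell quoting is applied, and the '::' file form is not modeled.

```rust
/// Builds one command line per input by replacing every `{}` in the
/// template with that input (simplified model; no quoting or escaping).
fn expand(template: &str, inputs: &[&str]) -> Vec<String> {
    inputs.iter().map(|&input| template.replace("{}", input)).collect()
}

fn main() {
    for command in expand("echo {}", &["arg1", "arg2", "arg3"]) {
        println!("{command}");
    }
}
```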
It doesn't yet support reading from stdin, permuting inputs, or any of the more advanced optional features of Parallel (I'm only on day 3 of development). I'll be working on that shortly. But it does offer TOML configuration and XDG app directory support. Example config:
# A list of nodes that the client will connect to.
nodes = [
"127.0.0.1:31514",
"192.168.1.3:31514",
"192.168.1.194:31514"
]
# Defines whether the client should request outputs of inputs.
outputs = true
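Under the XDG Base Directory rules, a per-user config file lives under $XDG_CONFIG_HOME when that is set to an absolute path, and under $HOME/.config otherwise. A minimal sketch of that lookup follows; the concurr/config.toml suffix and the config_path helper are hypothetical, not taken from concurr.

```rust
use std::path::{Path, PathBuf};

/// Resolves a config file location following the XDG Base Directory
/// rules: use $XDG_CONFIG_HOME if it is an absolute path, otherwise
/// fall back to $HOME/.config. The file name is hypothetical.
fn config_path(xdg_config_home: Option<&str>, home: &str) -> PathBuf {
    let base = match xdg_config_home {
        // Unset, empty, or relative values are ignored per the spec.
        Some(dir) if Path::new(dir).is_absolute() => PathBuf::from(dir),
        _ => Path::new(home).join(".config"),
    };
    base.join("concurr").join("config.toml")
}

fn main() {
    let xdg = std::env::var("XDG_CONFIG_HOME").ok();
    let home = std::env::var("HOME").unwrap_or_default();
    println!("{}", config_path(xdg.as_deref(), &home).display());
}
```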