mmstick/parallel

example comparing parallel with gnu-parallel is nowhere close to the times posted

nkh opened this issue · 5 comments

nkh commented

seq 1 10000 | time -v rust-parallel echo > /dev/null

    Command being timed: "rust-parallel echo"
    User time (seconds): 2.03
    System time (seconds): 84.76
    Percent of CPU this job got: 407%
    Elapsed (wall clock) time (h:mm:ss or m:ss): 0:21.29
    Average shared text size (kbytes): 0
    Average unshared data size (kbytes): 0
    Average stack size (kbytes): 0
    Average total size (kbytes): 0
    Maximum resident set size (kbytes): 27516
    Average resident set size (kbytes): 0
    Major (requiring I/O) page faults: 0
    Minor (reclaiming a frame) page faults: 1220681
    Voluntary context switches: 80355
    Involuntary context switches: 21519
    Swaps: 0
    File system inputs: 0
    File system outputs: 176
    Socket messages sent: 0
    Socket messages received: 0
    Signals delivered: 0
    Page size (bytes): 4096
    Exit status: 0
nkh commented

GNU parallel takes half the wall time on my system.

Let me guess: your Linux distribution is forcing transparent_hugepages to always?

cat /sys/kernel/mm/transparent_hugepage/enabled

You should file a bug report against your Linux distribution to change the value to madvise, the recommended value. The always setting conflicts with software using jemalloc for managing heaps.
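For anyone checking their own system: the kernel prints all available modes in that file and marks the active one with square brackets, for example `always [madvise] never`. The following is a minimal sketch of pulling out the active mode; the function name `thp_active_mode` is made up for illustration and is not part of parallel or the kernel tooling.

```shell
#!/bin/sh
# Print the active transparent hugepage mode, i.e. the word the kernel
# wraps in square brackets, such as "madvise" in "always [madvise] never".
# thp_active_mode is a hypothetical helper name, not an existing tool.
thp_active_mode() {
    # $1: the contents of the "enabled" file
    printf '%s\n' "$1" | sed -n 's/.*\[\([a-z]*\)].*/\1/p'
}

# Path taken from the comment above; reading it does not require root.
thp_file=/sys/kernel/mm/transparent_hugepage/enabled
if [ -r "$thp_file" ]; then
    mode=$(thp_active_mode "$(cat "$thp_file")")
    echo "active THP mode: $mode"
    if [ "$mode" = "always" ]; then
        echo "THP is forced on for all allocations; madvise is the suggested value"
    fi
fi
```

If the file is missing, the kernel was most likely built without transparent hugepage support and the script prints nothing.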

nkh commented

That was it. I'll file a bug when I have read more about it.

In the meantime, you can fix this by running:

sudo sh -c "echo madvise > /sys/kernel/mm/transparent_hugepage/enabled"

Performance will drastically increase after setting that for pretty much all parallel Rust applications.

nkh commented

a much better result indeed

seq 0 10000 | time -v rust-parallel echo > /dev/null

    Command being timed: "rust-parallel echo"
    User time (seconds): 0.58
    System time (seconds): 3.09
    Percent of CPU this job got: 209%
    Elapsed (wall clock) time (h:mm:ss or m:ss): 0:01.75
    Average shared text size (kbytes): 0
    Average unshared data size (kbytes): 0
    Average stack size (kbytes): 0
    Average total size (kbytes): 0
    Maximum resident set size (kbytes): 3644
    Average resident set size (kbytes): 0
    Major (requiring I/O) page faults: 0
    Minor (reclaiming a frame) page faults: 1795473
    Voluntary context switches: 85478
    Involuntary context switches: 21873
    Swaps: 0
    File system inputs: 0
    File system outputs: 176
    Socket messages sent: 0
    Socket messages received: 0
    Signals delivered: 0
    Page size (bytes): 4096
    Exit status: 0
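
To put the two `time -v` reports side by side, here is a small sketch that divides the "before" figures by the "after" figures. The numbers are copied from the two runs in this thread; the `ratio` helper is a made-up name for illustration. The runs are not perfectly identical (the first used `seq 1 10000`, the second `seq 0 10000`, so one extra input), so treat the ratios as approximate.

```shell
#!/bin/sh
# ratio BEFORE AFTER: print BEFORE / AFTER rounded to one decimal place.
# (hypothetical helper, only here to compare the two reports)
ratio() {
    awk -v before="$1" -v after="$2" 'BEGIN { printf "%.1f\n", before / after }'
}

# Figures from the transparent_hugepage=always run vs. the madvise run.
echo "wall clock time:   $(ratio 21.29 1.75)x lower"
echo "system time:       $(ratio 84.76 3.09)x lower"
echo "max resident set:  $(ratio 27516 3644)x smaller"
```

Most of the difference is in system time, which suggests the earlier run spent its time in the kernel rather than in rust-parallel itself.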