cathugger/mkp224o

Too much concurrency?

Closed this issue · 2 comments

Hi, I wrote the following Perl script, and I think it consistently gets better performance than mkp224o running the equivalent number of threads. That is surprising, because the script should be slower than the mkp224o implementation: it pays the process startup and shutdown costs over and over again. Perhaps improving parallelism and depending less on shared state could improve mkp224o's performance.

I will share the script below. If you find my results inaccurate, then sorry; I guess I was just lucky in this case.

I first suspected this could be an issue when I found that running mkp224o with -j10 gives only slightly better times than with -j5, not the dramatic 2x increase in speed I would expect.

I got this time for my script:
269.84
And this time for mkp224o:
184.20

I am closing the issue.

#!/usr/bin/env perl

use v5.36.0;

use strict;
use warnings;

use POSIX ":sys_wait_h";

my $number_of_names_to_get = $ARGV[0] // 10;
my $jobs                   = $ARGV[1] // 10;
my %pids;

main();

sub main {
    for ( my $i = 0 ; $i < $jobs ; $i++ ) {
        create_child();
    }
    while ( $number_of_names_to_get > 0 ) {
        check_if_some_process_finished();
    }
    kill TERM => keys %pids;
}

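sub check_if_some_process_finished {
    # Block until a child exits instead of busy-polling with WNOHANG,
    # so the parent does not compete with the workers for CPU time.
    my $pid = waitpid( -1, 0 );

    # waitpid returns -1 when there are no children left to wait for.
    die "no child processes left to wait for\n" if $pid < 0;

    delete $pids{$pid};
    $number_of_names_to_get--;
    if ($number_of_names_to_get) {
        create_child();
    }
}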

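sub create_child {
    my $pid = fork;
    die "fork failed: $!\n" unless defined $pid;
    if ( !$pid ) {
        child_work();
        exit 1;    # only reached if exec failed
    }
    $pids{$pid} = 1;
}

sub child_work {
    # Replace this process with a single-threaded mkp224o that stops
    # after finding one key.
    exec( 'mkp224o', 'berda', '-d', $ENV{HOME} . '/trash',
        '-n', 1, '-B', '-j', 1 )
      or warn "exec mkp224o failed: $!\n";
}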

I cannot let this opportunity pass without letting you know that I am really thankful for the work done in this project. I am having great fun finding cool names.

I am now not sure if my data is correct.

Performance should scale roughly linearly with the number of threads, up to the number of cores your CPU has available. Beyond that you will see diminishing, and perhaps even negative, returns.

Here's a quick test of how the performance scales on my quad-core Intel CPU:
[Figure: mkp224o multithreaded performance graph]
As you can see, it's roughly linear between 1 and 4 threads and levels off after that.

I am not sure what the context of your timings is, but if you are timing how long it takes to complete with the -n argument, you will not get accurate results. The amount of time it takes to find a match is non-deterministic (without the -p argument), and the harder it is to find a match, the more the time to find it will vary (at least in absolute terms). If you want to measure how certain arguments affect performance, you should use the -S argument to print statistics, and not rely on the timing of results.
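To put a rough number on that variance, here is a small Perl sketch. It is only a model, not anything taken from mkp224o itself: it assumes the filter is matched as a prefix of a base32 address and that every candidate key is an independent trial, so the attempts needed per key follow a geometric distribution. The function names `match_probability` and `relative_spread` are made up for this illustration.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Assumption: each candidate matches an L-character base32 prefix
# independently with probability p = 32**-L.
sub match_probability {
    my ($prefix_length) = @_;
    return 1 / 32**$prefix_length;
}

# Relative standard deviation (sd / mean) of the total number of attempts
# needed to collect $n matches. For a sum of $n geometric variables the
# mean is n/p and the variance is n(1-p)/p**2, so sd/mean = sqrt((1-p)/n).
sub relative_spread {
    my ( $p, $n ) = @_;
    return sqrt( ( 1 - $p ) / $n );
}

my $p = match_probability( length 'berda' );
printf "expected attempts per key: %d\n", 1 / $p;
for my $n ( 1, 10, 100 ) {
    printf "keys: %3d  relative spread: %.1f%%\n",
        $n, 100 * relative_spread( $p, $n );
}
```

Under this model, a run that stops after 10 keys has a standard deviation of roughly a third of its mean, so two such runs can differ by a large margin from luck alone.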

Thank you @scribblemaniac, I think the problem I am facing is that I was trying to use all my processor threads when I should have used only one thread per core.
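For anyone who wants to check the difference between hardware threads and physical cores before picking a -j value, here is a rough Perl sketch. It is Linux-only and assumes /proc/cpuinfo exposes "physical id" and "core id" fields, which is typical on x86 but not guaranteed elsewhere. The helper names `count_physical_cores` and `count_logical_cpus` are invented for this example.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Count physical cores from /proc/cpuinfo-style text by collecting the
# distinct (physical id, core id) pairs. Returns 0 if "core id" is absent.
sub count_physical_cores {
    my ($cpuinfo) = @_;
    my %cores;
    my $package = 0;    # assumed package when "physical id" is missing
    for my $line ( split /\n/, $cpuinfo ) {
        if ( $line =~ /^physical id\s*:\s*(\d+)/ ) {
            $package = $1;
        }
        elsif ( $line =~ /^core id\s*:\s*(\d+)/ ) {
            $cores{"$package:$1"} = 1;
        }
    }
    return scalar keys %cores;
}

# Count logical CPUs (hardware threads): one "processor" entry each.
sub count_logical_cpus {
    my ($cpuinfo) = @_;
    my $count = () = $cpuinfo =~ /^processor\s*:/mg;
    return $count;
}

if ( open my $fh, '<', '/proc/cpuinfo' ) {
    local $/;    # slurp the whole file
    my $text = <$fh> // '';
    close $fh;
    my $logical = count_logical_cpus($text);

    # Fall back to the logical count when the core fields are missing.
    my $cores = count_physical_cores($text) || $logical;
    printf "logical CPUs: %d, physical cores: %d\n", $logical, $cores;
    print "suggested thread count: -j $cores\n";
}
```

If the two numbers differ, the machine has more hardware threads than cores, and the smaller number is the one to pass to -j.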