A parallel executor with a maximum number of concurrent tasks at any time
Imagine that we have a task, e.g. checking whether a domain name is available for registration. We can run the following command:
whois test.com
We may end up with a one-liner with filtered output:
$ whois test.com | grep -m 1 -Eo '(Expir[a-z]* Date: [0-9-]*|No match)' | sed 's/.*: //'
2019-06-17
$ whois test_non_exist.com | grep -m 1 -Eo '(Expir[a-z]* Date: [0-9-]*|No match)' | sed 's/.*: //'
No match
The above shows that test.com will expire on 2019-06-17 and that test_non_exist.com is available.
We can put the one-liner into a bash script and loop through the input data.
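The script itself is not listed here; a plausible check_domain.sh, reconstructed from the one-liner above (assumption: it takes the domain as its only argument):

```shell
#!/usr/bin/env bash
# check_domain.sh -- print a domain's expiry date, or "No match" if it is unregistered.
# Reconstructed from the one-liner above; requires network access for whois.
whois "$1" | grep -m 1 -Eo '(Expir[a-z]* Date: [0-9-]*|No match)' | sed 's/.*: //'
```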
$ time bash check_domain.sh test.com
2019-06-17
real 0m6.601s
user 0m0.004s
sys 0m0.006s
$ time bash check_domain.sh test_non_exist.com
No match
real 0m3.275s
user 0m0.004s
sys 0m0.004s
If we have a million such tasks, looping through them sequentially will take 3.3 to 6.6 million seconds, i.e. 38 to 76 days.
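The estimate checks out with shell arithmetic (1,000,000 tasks at 3.3 to 6.6 seconds each, 86,400 seconds per day):

```shell
# sequential cost of 1,000,000 tasks at 3.3 s and 6.6 s each, in whole days
echo "$(( 1000000 * 33 / 10 / 86400 )) to $(( 1000000 * 66 / 10 / 86400 )) days"
# -> 38 to 76 days
```

Running up to 100 tasks at once should divide the wall-clock time by roughly 100, bringing it down to well under a day.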
Instead, we can hand the whole batch to a parallel executor, corun, with the maximum number of concurrent tasks set to 100:
$ ./corun --np=100 --in=input.1000.txt --out=output.1000.txt
The input file contains task identifiers and commands.
$ head input.1000.txt
able_bay: bash check_domain.sh ablebay.com
about_bay: bash check_domain.sh aboutbay.com
above_bay: bash check_domain.sh abovebay.com
act_bay: bash check_domain.sh actbay.com
add_bay: bash check_domain.sh addbay.com
after_bay: bash check_domain.sh afterbay.com
again_bay: bash check_domain.sh againbay.com
age_bay: bash check_domain.sh agebay.com
ahead_bay: bash check_domain.sh aheadbay.com
air_bay: bash check_domain.sh airbay.com
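An input file in this format can be generated from a word list; a sketch, where words.txt and the "_bay"/"bay.com" naming are illustrative assumptions:

```shell
# build "identifier: command" lines from a word list (words.txt is an assumed input)
while IFS= read -r w; do
  printf '%s_bay: bash check_domain.sh %sbay.com\n' "$w" "$w"
done < words.txt > input.txt
```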
The output file contains task identifiers and outputs. Since tasks finish at different times, the order of outputs may differ from the order of the input; this is why each task needs an identifier.
$ head output.1000.txt
buy_bay: 2018-09-30
apply_bay: 2027-02-06
claim_bay: 2019-05-13
dark_bay: 2019-07-09
candy_bay: 2018-11-10
carry_bay: 2019-02-18
bus_bay: 2019-04-18
above_bay: No match
about_bay: No match
able_bay: 2019-03-08
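Because each line is keyed by its identifier, the output can be restored to input order after the run. One way, using awk to index results by identifier and then replay the input's order:

```shell
# restore input order: key each output line on its identifier (the part before ": "),
# then walk the input file and print the matching result for each task
awk -F': ' 'NR==FNR { result[$1] = $0; next } ($1 in result) { print result[$1] }' \
    output.1000.txt input.1000.txt > output.ordered.txt
```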
When testing with 1000 example tasks and the maximum number of concurrent tasks set to 100, the total execution time is about 2.5 minutes. This includes the ramp-up and ramp-down time of the pipeline (the first 100 and last 100 tasks). Running the same tasks sequentially would take 1 to 2 hours, based on the execution time of a single task (3.3 to 6.6 seconds).
$ time ./corun --np=100 --in=input.1000.txt --out=output.1000.txt
real 2m36.791s
user 0m5.858s
sys 0m3.411s
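The corun source is not shown here, but the core idea, a bounded pool of concurrent jobs, can be sketched in bash alone. This is a minimal stand-in, not the real corun: it reads "identifier: command" lines and keeps at most NP commands running at once.

```shell
#!/usr/bin/env bash
# corun_sketch.sh -- minimal bounded-concurrency executor (illustrative, not corun itself).
# Usage: NP=100 bash corun_sketch.sh input.txt > output.txt
NP=${NP:-100}
while IFS= read -r line; do
  id=${line%%:*}                 # identifier: everything before the first ":"
  cmd=${line#*: }                # command: everything after the first ": "
  # throttle: block while NP background jobs are still running
  while [ "$(jobs -rp | wc -l)" -ge "$NP" ]; do
    sleep 0.1
  done
  # run the task and emit "identifier: output"; completion order is not input order
  { printf '%s: %s\n' "$id" "$(eval "$cmd")"; } &
done < "$1"
wait                             # drain the remaining jobs
```

Concurrent writers can in principle interleave long output lines, so a production executor would collect results more carefully; the sketch only illustrates the throttling that keeps the job count capped.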