retire dependency on `pprocess`
PaulHancock opened this issue · 1 comments
The `pprocess` module was a convenient way to implement multiprocessing for `aegean`, but it is no longer being developed.
The current `AegeanTools` has internalised a version of `pprocess` that was updated to work with Python 3; however, this module cannot be supported long-term.
`pprocess` should be replaced by appropriate use of the `multiprocessing` module.
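As a rough idea of what the replacement could look like, here is a minimal sketch using `multiprocessing.Pool`. The names `fit_island` and `run_all` are hypothetical placeholders for the per-island work and its driver; they are not existing AegeanTools functions, and the real fitting code would take the place of the dummy computation.

```python
import multiprocessing as mp


def fit_island(island_id):
    """Hypothetical stand-in for the per-island work (not an AegeanTools function)."""
    return island_id, island_id ** 2


def run_all(island_ids, cores=1):
    """Apply fit_island to every island, serially or in a process pool."""
    if cores <= 1:
        # Serial path: no pool is created, so there is no pickling or startup overhead
        return [fit_island(i) for i in island_ids]
    with mp.Pool(processes=cores) as pool:
        # Pool.map returns results in the same order as the inputs
        return pool.map(fit_island, island_ids)
```

A script would call this from under an `if __name__ == "__main__":` guard, e.g. `run_all(range(1297), cores=6)`, so that worker processes can import the module safely.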
Testing on my local machine reveals that using `--cores=1` is the fastest way to find sources in an image.
My resource monitor shows that I have 1 instance of `aegean` running and that it is using 100% of all 8 cores.
On a test image of 3k x 3k pixels, with 1297 islands and 1376 components:
- `--cores=1` takes about 4 minutes to complete
- `--cores=6` takes at least 15 minutes to complete (I got bored and `<ctrl+c>`'d it)
Using `scalene` with `--cores=1` I can see that 90% of the execution time is 'native' with the other 10% being 'python'.
According to the documentation, this means that 90% of the time is spent in C/C++ libraries, which I assume are numpy/scipy.
It seems that numpy/scipy have some parallelism built in.
In fact, upon further reading, it may not be numpy/scipy explicitly doing this: some BLAS/LAPACK and MKL functions are able to use multiple cores natively.
Since much of the work of `aegean` is fitting via `lmfit`, which in turn uses `scipy.optimize.minimize`, which in turn uses system libraries, my system is able to do the multiprocessing 'for free'.
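One way to check that the 'free' parallelism really comes from the linear algebra libraries would be to cap their thread count and see whether CPU usage drops to a single core. The sketch below sets the environment variables commonly read by OpenMP, OpenBLAS, and MKL; which of these (if any) applies depends on how numpy/scipy were built on a given system, and `limit_native_threads` is a hypothetical helper, not part of AegeanTools.

```python
import os

# Variables commonly read by threaded native backends when they are loaded.
# Which one takes effect depends on the BLAS/LAPACK build behind numpy/scipy.
THREAD_VARS = ("OMP_NUM_THREADS", "OPENBLAS_NUM_THREADS", "MKL_NUM_THREADS")


def limit_native_threads(n_threads):
    """Set the thread-count variables; call this before importing numpy/scipy."""
    for var in THREAD_VARS:
        os.environ[var] = str(n_threads)
    # Return the values that were set, for logging or inspection
    return {var: os.environ[var] for var in THREAD_VARS}


limit_native_threads(1)
# import numpy  # import only after the limits are in place
```

The same variables can be set in the shell before launching the program, which avoids any dependence on import order.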
So, long story short, removing `pprocess` may be as simple as just removing all the `aegean` multiprocessing, and using the single core version.
The `BANE` multiprocessing still needs to be managed by me since the linear algebra libraries don't do much work here.