[BUG] Parallel usage of Wavelets results in errors
chaithyagr opened this issue · 8 comments
System setup
OS: [e.g.] macOS v10.14.1
Python version: [e.g.] v3.6.7
Python environment (if any): [e.g.] conda v4.5.11
Describe the bug
While using Pysap with joblib's Parallel to carry out wavelet transforms over several channels, we run into issues: the nb_band_per_scale attribute is not being populated.
To Reproduce
coeffs, coeffs_shape = zip(*Parallel(n_jobs=self.n_cpu)(
    delayed(self._op)(data[i], self.transform[i])
    for i in numpy.arange(self.num_channels)))
with the _op method defined as:
def _op(self, data, transform):
    if isinstance(data, numpy.ndarray):
        data = pysap.Image(data=data)
    transform.data = data
    transform.analysis()
    coeffs, coeffs_shape = flatten(transform.analysis_data)
    return coeffs, coeffs_shape
Expected behavior
We expect the analysis and adjoint operations to work. Instead we get seemingly random errors, most notably that nb_band_per_scale is None.
Module and lines involved
Everything works smoothly when n_cpu=1; the issue appears as soon as we use more cores.
Are you planning to submit a Pull Request?
- Yes ---> if I find a fix, that is, but this is at a lower priority for me
- No
Can you format the code in this issue to make it more readable?
Updated the code.
Cool, can you also provide a minimal failing example so that we can directly copy-paste and investigate easily (it will also potentially be the base for a future unit test)?
Also, don't forget to include the error traceback in the issue.
Finally, remember to also format the code when it appears in text.
Well, it is not quite direct, but here is the smallest I could get.
import numpy as np
from joblib import Parallel, delayed

import pysap
from pysap.base.utils import flatten, unflatten

num_channels = 32
n_cpu = 8
N = 64

def op(data, transform):
    # Forward operator: wavelet analysis of one channel.
    if isinstance(data, np.ndarray):
        data = pysap.Image(data=data)
    transform.data = data
    transform.analysis()
    coeffs, coeffs_shape = flatten(transform.analysis_data)
    return coeffs, coeffs_shape

def adj_op(coeffs, coeffs_shape, transform):
    # Adjoint operator: wavelet synthesis from flattened coefficients.
    transform.analysis_data = unflatten(coeffs, coeffs_shape)
    image = transform.synthesis()
    return image.data

transform_klass = pysap.load_transform("db4")
transform = np.asarray([transform_klass(nb_scale=4)
                        for i in np.arange(num_channels)])
data = (np.random.randn(num_channels, N, N) +
        1j * np.random.randn(num_channels, N, N))

# Forward transforms, one job per channel.
coeffs, coeffs_shape = zip(*Parallel(n_jobs=n_cpu)(
    delayed(op)(data[i], transform[i])
    for i in np.arange(num_channels)))
coeffs_shape = np.asarray(coeffs_shape)

# Adjoint transforms, one job per channel.
image = Parallel(n_jobs=n_cpu)(
    delayed(adj_op)(coeffs[i], coeffs_shape[i], transform[i])
    for i in np.arange(num_channels))
Note that the test fails if n_cpu > 1 with the following traceback (I did not add this earlier because it does not make much sense to me):
    transform.analysis_data = unflatten(coeffs, coeffs_shape)
  File "/home/cg260486/cgr_venv/lib/python3.5/site-packages/python_pySAP-0.0.3-py3.5-linux-x86_64.egg/pysap/base/transform.py", line 264, in _set_analysis_data
    if len(analysis_data) != sum(self.nb_band_per_scale):
TypeError: 'NoneType' object is not iterable
For some reason, nb_band_per_scale is not initialized when we run in parallel. However, the code above works fine with n_cpu=1, in which case everything runs sequentially.
It looks like this is related to pysap being imported anew in each worker process, since the backend is loky. Switching the backend to threading solves the issue (sketch below). I don't think there's much left to address here.
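For reference, a minimal sketch of that workaround, reusing op, data, transform, n_cpu and num_channels from the example above (the explanation is my reading of joblib's behavior, not from the pysap docs): with a process-based backend like loky, each worker operates on a pickled copy of the transform, so state set during analysis() never propagates back to the parent's objects, whereas the threading backend shares them in memory.

# Same call as before, but with the thread-based backend: all workers
# mutate the very same transform objects in shared memory, so state set
# by analysis() (e.g. nb_band_per_scale) is visible afterwards as well.
coeffs, coeffs_shape = zip(*Parallel(n_jobs=n_cpu, backend="threading")(
    delayed(op)(data[i], transform[i])
    for i in np.arange(num_channels)))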
Closing.
I don't think we can close this. Indeed, if at some point we want to do multi-processing (and not simply multi-threading), we will potentially need to use other backends.
Can you explain what you mean by the multiple imports of pysap?
> Can you explain what you mean by the multiple imports of pysap?
For each process, a new pysap is loaded. First, this adds a lot of overhead.
In my opinion, the initialization and communication across the multiple processes are not happening correctly in the multi-process case. We may have to dig deeper; this could be an issue in joblib (most likely not) or here in pysap. One way to probe this is sketched below.
I am fine with keeping the issue open; I just felt that at this point it mostly means more debugging.
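A hypothetical diagnostic (my sketch, reusing the names from the failing example above, not something pysap ships): with loky, op() runs on pickled copies of the transforms, so any state set by analysis() in the workers should never reach the parent's objects, which would explain the traceback.

# Run the forward transforms with the default (loky) backend, then
# inspect a transform in the parent process. Each worker received a
# pickled copy, so the parent's objects were never analyzed themselves.
coeffs, coeffs_shape = zip(*Parallel(n_jobs=n_cpu)(
    delayed(op)(data[i], transform[i])
    for i in np.arange(num_channels)))
print(transform[0].nb_band_per_scale)  # presumably still None under loky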
Well, yes, there should only be overhead, not the error you were mentioning. Let's keep it open for further investigation.
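For what it's worth, one possible direction for a genuinely multi-process setup (an untested sketch of mine, not an agreed fix; op_proc and its parameters are hypothetical) is to rebuild the transform inside each worker so that no stateful object has to cross a process boundary. The adjoint side would still need the per-scale metadata rebuilt in the parent, which is exactly what the traceback shows is missing.

def op_proc(data, wavelet_name="db4", nb_scale=4):
    # Hypothetical process-safe forward op: the transform is created in
    # the worker itself, so nothing stateful is pickled across processes.
    transform = pysap.load_transform(wavelet_name)(nb_scale=nb_scale)
    if isinstance(data, np.ndarray):
        data = pysap.Image(data=data)
    transform.data = data
    transform.analysis()
    return flatten(transform.analysis_data)

coeffs, coeffs_shape = zip(*Parallel(n_jobs=n_cpu)(
    delayed(op_proc)(data[i]) for i in np.arange(num_channels)))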