erdogant/clustimage

MemoryError during import_data

ntokenl opened this issue · 3 comments

Hi,

I am encountering a MemoryError during the import_data step. After loading the images it throws a MemoryError. Is there any way I can figure out what the issue is?

Traceback (most recent call last):
  File "/home/stg/prod/combine_clustimage.py", line 56, in <module>
    results = cl.fit_transform(targetdir)
  File "/home/stg/.local/lib/python3.10/site-packages/clustimage/clustimage.py", line 352, in fit_transform
    _ = self.import_data(X, black_list=black_list)
  File "/home/stg/.local/lib/python3.10/site-packages/clustimage/clustimage.py", line 992, in import_data
    X = self.preprocessing(Xraw['pathnames'], grayscale=self.params['cv2_imread_colorscale'], dim=self.params['dim'], flatten=flatten)
  File "/home/stg/.local/lib/python3.10/site-packages/clustimage/clustimage.py", line 806, in preprocessing
    img, imgOK = zip(*imgs)
MemoryError

How many images are there in your directory?

I tried with 30,000-50,000 images.

It seems like the issue is with SciPy's LAPACK routines, and the specific error can be reproduced with:

import numpy as np
from scipy.linalg import svd

a = np.ones((30000, 30000))
u, s, vh = svd(a)

It is documented here: scipy/scipy#10337

I managed to find some solutions.

One workaround is to switch the default LAPACK driver in SciPy from 'gesdd' to 'gesvd', which I did by editing
"/usr/lib/python3/dist-packages/scipy/linalg/_decomp_svd.py"

Another approach I am trying is to use the intel-scipy Python modules from pip. The latest release seems to be incompatible with clustimage: it reports a mismatch in the version of 'numba'. Installing the release candidate of numba seems to make it work again.

I am not too sure about the accuracy or performance after these changes, but clustimage is no longer reporting the error.
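
For what it's worth, a quick way to sanity-check which BLAS/LAPACK build and which numba version actually ended up in the environment after these swaps (just standard introspection, nothing clustimage-specific):

import scipy
import numba

# Show the BLAS/LAPACK libraries SciPy was built against
scipy.show_config()

# Confirm the numba version that got installed (e.g. a release candidate)
print(numba.__version__)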

Great, nice to hear that you found a solution. However, I am not sure why it reports a mismatch in the numba version. numba is not used directly in clustimage, but it may be pulled in as a dependency of another package.
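
If you want to trace where that numba requirement comes from, one option is to scan the installed distributions for anything that declares numba as a dependency. A minimal sketch using only the standard library; the output depends entirely on your environment:

from importlib.metadata import distributions

# Report every installed package that lists numba among its requirements
for dist in distributions():
    requires = dist.requires or []
    numba_reqs = [r for r in requires if r.lower().startswith("numba")]
    if numba_reqs:
        print(dist.metadata["Name"], "->", numba_reqs)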