Label names mixed up in results and results_unique

Question


Opened this issue 2 years ago · 4 comments

First, thank you very much for the work you put into this project and for making it public! While experimenting with your code base, I came across an error when accessing the 'labels' of cl.results and cl.results_unique: the labels seem to get mixed up between the two. Here is an example:

'img_name_1' is assigned label '2' in cl.results_unique['labels']. However, when I iterate over cl.results['labels'] and look up the file with the same name 'img_name_1', this image (sometimes) belongs to a different label, let's say '5'.

My goal is to extract some random images and the most central (unique) image per cluster label, which is why I would like to match the labels of both. Or do you have a different idea how I could do this?
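A minimal sketch of that goal, assuming `cl.results` and `cl.results_unique` behave like dicts holding parallel `'labels'` and `'pathnames'` sequences (the keys used in this thread). The helper `pick_images_per_label` is hypothetical, not part of clustimage; the point it illustrates is joining the two results on the full pathname rather than on the file name.

```python
import random
from collections import defaultdict


def pick_images_per_label(results, results_unique, n_random=3, seed=None):
    """Return {label: {'center': pathname, 'random': [pathnames]}}.

    Hypothetical helper: assumes both arguments are dict-like with parallel
    'labels' and 'pathnames' sequences, as in cl.results / cl.results_unique.
    """
    rng = random.Random(seed)

    # Group every image path by its cluster label (full paths, not basenames).
    paths_by_label = defaultdict(list)
    for label, path in zip(results['labels'], results['pathnames']):
        paths_by_label[label].append(path)

    selection = {}
    for label, center in zip(results_unique['labels'], results_unique['pathnames']):
        members = paths_by_label.get(label, [])
        # Sanity check: the representative image must belong to its own cluster.
        if center not in members:
            raise ValueError(f"{center!r} is not labelled {label!r} in results")
        # Draw up to n_random other images from the same cluster.
        others = [p for p in members if p != center]
        k = min(n_random, len(others))
        selection[label] = {'center': center, 'random': rng.sample(others, k)}
    return selection
```

With a real model this would be called as `pick_images_per_label(cl.results, cl.results_unique)`; the `ValueError` fires exactly in the situation described above, where the two label sets disagree for the same image.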

Thank you very much!

Best,

Maximilian

Answer 1 · 2023-05-20T07:53:33.000Z

Thank you for your issue. I could not reproduce it. Could you demonstrate this with a small example?

from clustimage import Clustimage

cl = Clustimage()

# load example with flowers
pathnames = cl.import_example(data='flowers')

# Cluster flowers
cl.fit_transform(pathnames)

# Make plot
cl.clusteval.plot()
cl.clusteval.scatter(density=True, s=100, params_scatterd={'edgecolor': 'black'})


cl.results['labels']         # cluster label of every image
cl.results_unique['labels']  # label of the most central (unique) image of each cluster

Answer 2 · 2023-05-25T19:18:23.000Z

Thanks for the reply. I reran your example and everything looks fine, so the error is probably in my code and I have to look into it. I will close this for now and reopen the issue if I find something. Thanks for your help!

Answer 3 · 2023-05-25T19:51:21.000Z

Update: I found out that the keys "filenames" and "pathnames" are apparently not interchangeable. If I use "pathnames" for both results and results_unique, it works. If I use "filenames" for one and "pathnames" for the other, the labels do not match.
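One possible explanation, offered as an assumption rather than a confirmed diagnosis: a file name is only the last component of a path, so images in different folders can share the same name (for example `cats/001.jpg` and `dogs/001.jpg`). A lookup keyed on file names then silently keeps whichever entry was stored last, while a lookup keyed on full pathnames stays unambiguous. The helpers `label_lookup` and `duplicate_basenames` are hypothetical and use only the standard library.

```python
import os


def label_lookup(labels, pathnames, use_basename=False):
    """Map each image key to its label (hypothetical helper).

    With use_basename=True the key is only the file name, so images that
    share a name in different folders overwrite each other.
    """
    lookup = {}
    for label, path in zip(labels, pathnames):
        key = os.path.basename(path) if use_basename else path
        lookup[key] = label
    return lookup


def duplicate_basenames(pathnames):
    """Return the sorted file names that occur in more than one path."""
    seen, dups = set(), set()
    for path in pathnames:
        name = os.path.basename(path)
        if name in seen:
            dups.add(name)
        seen.add(name)
    return sorted(dups)
```

Running `duplicate_basenames(cl.results['pathnames'])` would show whether this explanation applies to a given data set: an empty list means file names are unique and the mismatch must come from somewhere else.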

Answer 4 · 2023-05-26T15:46:24.000Z

Again, I could not reproduce this error. Could you show it using one of these four data sets?

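import os
import numpy as np
from clustimage import Clustimage

cl = Clustimage()

# Load one of the four example data sets (each call replaces X)
X = cl.import_example(data='flowers')
X = cl.import_example(data='scenes')
X = cl.import_example(data='mnist')
X = cl.import_example(data='faces')

# Cluster the images
cl.fit_transform(X)

# Each filename equals the basename of the pathname at the same index
np.all(np.array(list(map(os.path.basename, cl.results['pathnames']))) == cl.results['filenames'])
# True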
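The same kind of check can be extended to the labels themselves. A minimal sketch, assuming `cl.results` and `cl.results_unique` expose parallel `'labels'` and `'pathnames'` sequences: it lists every representative image whose label in `results_unique` differs from the label of the same pathname in `results`. The function name `label_mismatches` is hypothetical.

```python
def label_mismatches(results, results_unique):
    """Return (pathname, label_in_results, label_in_unique) for every conflict.

    Hypothetical helper: both arguments are assumed to be dict-like with
    parallel 'labels' and 'pathnames' sequences. Matching on full pathnames
    avoids collisions between identical file names in different folders.
    """
    label_of = dict(zip(results['pathnames'], results['labels']))
    mismatches = []
    for path, label in zip(results_unique['pathnames'], results_unique['labels']):
        expected = label_of.get(path)  # None if the path is missing from results
        if expected != label:
            mismatches.append((path, expected, label))
    return mismatches
```

Calling `label_mismatches(cl.results, cl.results_unique)` on a consistent model should return an empty list; any entry it does return is a concrete example of the mix-up reported in this issue.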