erdogant/clustimage

Trainability

MalekBezzina opened this issue · 6 comments

Hello,

I would like to know whether the model learns from one run to the next when I run it (with different parameters) on similar datasets, like a classical NN whose weights are updated at each iteration.

For example, if I run the code with different parameters but only save the last pkl file, am I saving only the weights of the last run, or am I saving everything?

I hope I explained myself well.

The model does not update after a new run. At each initialization, the function clean_init is called to make sure that all previous results are removed and an entirely new model is trained. Regarding the save function, the attributes listed below are saved, and they contain only the results from the last run.

cl.results
cl.params
cl.pca
cl.params_pca
cl.params_hog
cl.params_hash
cl.results_faces
cl.results_unique
cl.distfit
cl.clusteval
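
To make the "only the last run is kept" behaviour concrete, here is a minimal sketch. ToyModel is a hypothetical stand-in written for this illustration, not clustimage's actual code; only the name clean_init is taken from the description above, and the "clustering" is a trivial placeholder.

```python
import pickle


class ToyModel:
    """Hypothetical stand-in for a model that resets itself on every run."""

    def __init__(self):
        self.results = {}
        self.params = {}

    def clean_init(self):
        # Remove everything left over from a previous run.
        self.results = {}
        self.params = {}

    def fit(self, samples, n_clusters):
        # Every run starts from a clean state; nothing is carried over.
        self.clean_init()
        self.params = {"n_clusters": n_clusters}
        # Placeholder "clustering": assign labels by index modulo n_clusters.
        self.results = {"labels": [i % n_clusters for i in range(len(samples))]}
        return self.results


model = ToyModel()
model.fit(["a", "b", "c", "d"], n_clusters=2)  # first run
model.fit(["a", "b", "c", "d"], n_clusters=3)  # second run replaces the first

# Saving after the second run stores the state of that run only.
blob = pickle.dumps({"results": model.results, "params": model.params})
restored = pickle.loads(blob)
```

After loading, restored holds the parameters and labels of the second run; nothing from the first run survives.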

So if I understood correctly, the model does not learn to cluster the images better with each run; it has no learning ability?

One last question: if I deactivate the clean_init function, will the parameters of each run accumulate, or how will it work?

That is correct. This is an unsupervised approach that does not learn from any target or response variable. However, I see where the confusion comes from, because the find function makes it possible to score new unseen images against those initially modeled (I tried not to use the word "predict"). This is done by mapping the new unseen image into the existing space and then either using the k-nearest neighbour approach or testing for significance using probability density fitting. The latter part does have a "learning" component.
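
As a rough illustration of the k-nearest neighbour step, here is a generic sketch using only the Python standard library. It assumes the new image has already been mapped into the same feature space as the modeled images; the function name knn_label, the toy coordinates, and the majority vote are assumptions made for this example and are not taken from the find function itself.

```python
import math
from collections import Counter


def knn_label(new_point, points, labels, k=3):
    """Return the majority label among the k modeled points closest to new_point.

    Hypothetical helper for illustration; not part of clustimage.
    """
    # Rank the modeled points by Euclidean distance to the new point.
    order = sorted(range(len(points)), key=lambda i: math.dist(new_point, points[i]))
    nearest = order[:k]
    # Majority vote over the labels of the nearest neighbours.
    votes = Counter(labels[i] for i in nearest)
    return votes.most_common(1)[0][0], nearest


# Toy 2-D coordinates standing in for already-mapped image features.
points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (4.8, 5.2)]
labels = [0, 0, 0, 1, 1, 1]

label, neighbours = knn_label((4.9, 5.0), points, labels, k=3)
```

Note that nothing in the modeled points or labels changes here: the new point is only compared against what was already fitted.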

If you deactivate or remove the clean_init function, the model still cannot learn, because there is no real learning process involved as there is with neural nets etc.

If you want to create a better or more comprehensive model, you need to re-run the model after adding the new samples to the dataset. This can be time-consuming if you add new samples iteratively.

I understand from your questions that you want to add new samples after fitting the model?

Oh, now everything is perfectly clear!
Yes, in fact I am trying to create a web app that clusters a dataset of test graph results, and it is a cumulative dataset.
That is why I don't want to re-run the model every time I get new data. I wanted it to learn the shapes and save the "labels+weights",
so that when I run it on the new dataset it can cluster the new samples without error (because it has seen the shapes before),
and then do a mapping with the find function to figure out the labels.
That is the goal. Do you think you can help me in some way?

The find function could perhaps be extended with another parameter, such as find(add_to_model=True). It would then be possible to extract the mapping, filenames, etc. from the prediction and add them to the initial results.

For example, add the columns of self.results['predict']['feat'] to self.results['feat'],
add self.results['predict']['FILENAME.png']['labels'] to self.results['labels'],
and so on.

So you first need to check what is stored in self.results, and then add the names, labels, mapping, etc. from self.results['predict'].
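
A minimal sketch of what such a merge could look like, written against a plain dict rather than the real object. Everything here is an assumption to be checked against what self.results actually contains: the helper name add_predictions_to_results, the 'filenames' key, the idea that the rows of results['predict']['feat'] are in the same order as the filename keys, and the use of a majority vote over the neighbour labels to pick one label per new image. Plain lists stand in for arrays.

```python
from collections import Counter


def add_predictions_to_results(results):
    """Hypothetical helper: fold results['predict'] into the top-level results.

    Assumed layout (not verified against the library):
      results['feat']                         -> list of feature rows
      results['filenames']                    -> list of filenames (assumed key)
      results['labels']                       -> list of cluster labels
      results['predict']['feat']              -> feature rows of the new images
      results['predict'][filename]['labels']  -> labels of the neighbours found
    """
    predict = results.get('predict', {})
    # Every key other than 'feat' is assumed to be the filename of a new image.
    new_names = [key for key in predict if key != 'feat']
    # Assumption: feature rows are stored in the same order as the filename keys.
    for name, row in zip(new_names, predict.get('feat', [])):
        neighbour_labels = predict[name]['labels']
        if not neighbour_labels:
            continue  # no neighbours found, so there is no label to assign
        # Assumption: take the most common neighbour label for the new image.
        label = Counter(neighbour_labels).most_common(1)[0][0]
        results['feat'].append(row)
        results['filenames'].append(name)
        results['labels'].append(label)
    return results


# Toy data mirroring the assumed layout.
results = {
    'feat': [[0.0, 0.0], [5.0, 5.0]],
    'filenames': ['a.png', 'b.png'],
    'labels': [0, 1],
    'predict': {
        'feat': [[4.9, 5.1]],
        'new.png': {'labels': [1, 1, 0]},
    },
}
add_predictions_to_results(results)
```

This only appends the new samples to the stored results. The clusters themselves are not re-estimated, so a full re-run is still needed if the new samples should influence the clustering.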

Thank you for your help!