miccunifi/ARNIQA

Inconsistency of visualization with different input samples.


Hi, I have tried using your approach to classify images by type of noise. The model converged after 120k training steps on 40k samples. However, when visualizing the validation data, I randomly select 10 samples and apply each type of noise to all of them, and the resulting cluster visualization changes significantly depending on the random seed. Even though each type of noise is perfectly clustered across the 10 samples, the relative positions of the different noise types vary.

I'm wondering whether you encountered the same issue when visualizing different types of distortion while changing the samples (image content) used for encoding. I tried both t-SNE and UMAP.

Below is my training curve.
[Image: training curve (Screenshot 2024-10-18 152209)]

Hi, AFAIK that is expected when using dimensionality reduction techniques such as t-SNE and UMAP, as they are stochastic and their output depends on the random seed. For instance, even in the simple examples reported in this guide, the overall structure of the clusters and their relative positions change from run to run.
However, the fact that every type of noise belongs to a different cluster suggests that the training of ARNIQA was successful. If I were you, I would try using (a lot) more than 10 samples to see what happens.

If you find this repo useful for your work, please consider leaving a star.

Hi, thanks for the quick reply. I have tried using 100 samples. The issue is that, for instance, with the samples drawn using seed 0 the Gaussian noise cluster appears at the top of the visualization, while with seed 42 it appears at the bottom. Also, brightness and spatial distortions lie close together for some sample sets, but for others they are far apart, with brightness lying close to blur instead.
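Part of a "top vs bottom" change like the one above can be a symmetry of the projection itself: a 2D t-SNE or UMAP layout carries no meaning in its absolute orientation, so a flip, rotation, shift or rescale between runs says nothing about the data. A minimal NumPy sketch of one way to factor that out is an orthogonal Procrustes fit. Because the two seeds select different images, the rows compared here are assumed to be the 2D centroid of each distortion cluster (one row per distortion type, same order in both runs), not individual points. The function name `procrustes_disparity` and the centroid values are hypothetical; if SciPy is available, `scipy.spatial.procrustes` computes the same kind of disparity.

```python
import numpy as np


def procrustes_disparity(a, b):
    """Align layout `b` onto layout `a` and return (aligned_b, disparity).

    `a` and `b` are (n_points, 2) arrays whose rows correspond one-to-one,
    e.g. the 2D centroid of each distortion type in two t-SNE/UMAP runs.
    The fit removes translation, uniform scale, rotation and reflection.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    if a.shape != b.shape:
        raise ValueError("layouts must have the same shape")
    # Remove translation by centering both layouts.
    a0 = a - a.mean(axis=0)
    b0 = b - b.mean(axis=0)
    # Remove scale by normalizing to unit Frobenius norm (assumes non-degenerate layouts).
    a0 /= np.linalg.norm(a0)
    b0 /= np.linalg.norm(b0)
    # Orthogonal Procrustes: the rotation/reflection r maximizing trace(r.T @ b0.T @ a0).
    u, s, vt = np.linalg.svd(b0.T @ a0)
    r = u @ vt
    # The optimal uniform scale is the sum of the singular values.
    aligned = (b0 @ r) * s.sum()
    # Sum of squared residuals, in [0, 1]; 0 means a pure similarity transform.
    disparity = float(np.sum((a0 - aligned) ** 2))
    return aligned, disparity


# Usage: hypothetical centroids of 6 distortion clusters from one run, and the
# same arrangement flipped vertically, rescaled and shifted ("top vs bottom").
rng = np.random.default_rng(0)
centroids_seed0 = rng.normal(size=(6, 2))
flip = np.array([[1.0, 0.0], [0.0, -1.0]])
centroids_seed42 = 3.0 * centroids_seed0 @ flip + np.array([10.0, -5.0])
_, d = procrustes_disparity(centroids_seed0, centroids_seed42)
print(f"disparity: {d:.6f}")  # ~0: the layouts differ only by a flip, scale and shift
```

A disparity near 0 means two runs differ only by such a transform; a large disparity means the arrangement of the clusters itself changed, which for t-SNE still says little about the true distances between distortion types.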

I confirm that this is mostly due to the visualization techniques. They are stochastic and highly dependent on their hyperparameters. UMAP should be better at preserving the global structure, but that is not guaranteed. With t-SNE, the distances between clusters are not meaningful. I suggest studying how t-SNE and UMAP work to understand which one is more suitable for your needs.
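Since distances between clusters in a t-SNE plot (and, to a lesser extent, a UMAP plot) are unreliable, one alternative is to measure how close the distortion types are directly in the encoder's feature space, where nothing depends on the projection's seed. The sketch below assumes you have already extracted the embeddings into a NumPy array with one distortion label per row (extracting them from ARNIQA is not shown), and it uses cosine distance between class centroids as an assumed metric, not one prescribed by the repo. `centroid_distance_matrix` is a hypothetical helper, and the usage data is synthetic.

```python
import numpy as np


def centroid_distance_matrix(features, labels):
    """Cosine distance between per-class mean embeddings.

    features: (n_samples, dim) array of encoder outputs (assumed given).
    labels:   (n_samples,) array of distortion-type labels.
    Returns (classes, dist) where dist[i, j] is the cosine distance between
    the centroids of classes[i] and classes[j].
    """
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels)
    # L2-normalize each embedding so only its direction matters (assumption).
    unit = features / np.linalg.norm(features, axis=1, keepdims=True)
    classes = np.unique(labels)  # sorted unique labels
    centroids = np.stack([unit[labels == c].mean(axis=0) for c in classes])
    centroids /= np.linalg.norm(centroids, axis=1, keepdims=True)
    # Cosine distance = 1 - cosine similarity of the normalized centroids.
    dist = 1.0 - centroids @ centroids.T
    return classes, dist


# Usage with synthetic stand-ins for real encoder outputs.
rng = np.random.default_rng(0)
dim = 128
directions = {name: rng.normal(size=dim) for name in ("blur", "brightness", "gaussian_noise")}
feats = np.concatenate([directions[n] + 0.3 * rng.normal(size=(100, dim)) for n in directions])
labs = np.repeat(list(directions), 100)
classes, dist = centroid_distance_matrix(feats, labs)
for name, row in zip(classes, dist):
    print(name, np.round(row, 3))
```

Computing this matrix for the seed-0 and seed-42 sample sets and comparing the two would show whether, for example, brightness really moves closer to blur, or whether only the 2D layout changed.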