luisoala/non-iid-ssdl

Question on the implementation of Minkowski-based distances

luochonghai opened this issue · 4 comments

While reading the paper 'MixMOOD: A systematic approach to class distribution mismatch in semi-supervised learning using deep dataset dissimilarity measures' together with the code released in this repo, I ran into a few questions.

  1. Here you calculate the distance in this way:
    [image: the paper's definition of the Minkowski-based distance, written with argmin]
    In your code, however, `min_dist` is calculated differently: https://github.com/peglegpete/mixmood/blob/c45f3b442547884214f929b3ec7e68a422e63aa9/utilities/dataset_distance_measurer.py#L170
    [image: screenshot of the `min_dist` computation in the code]
    It seems the formula should instead be written as:
    [image: the proposed corrected formula, min ||h_j - h_k||_p]

  2. In the paper you mention that:
    [image: excerpt from the paper describing the hyper-parameter C as a number of samples]
    I am confused about the meaning of C: since C is the number of samples, do you mean that you randomly choose C samples from τ? In the paper, the hyper-parameter C is set as:
    image
    However, the only '30' I can find is here: https://github.com/peglegpete/mixmood/blob/c45f3b442547884214f929b3ec7e68a422e63aa9/utilities/test_generator.py#L194
    So would you please explain what C means in an easier way? : )

  3. After reading the paper, I am interested in why the four metrics work well and why the density-based metrics perform better than the Minkowski-based ones (not experimentally, but from your theoretical analysis). I also wonder about the motivation behind designing these metrics _(:з」∠)_ Would you please share your opinion on these questions? : )

Many thx~

Dear @luochonghai,

Just a quick note that I received your question just now. I will get back to you with a detailed answer tomorrow (3 AM here :P). Thanks for your interest in our work!

Best,
Luis

Dear @luochonghai ,

Apologies for the delay, and thanks for your detailed comments.

  1. Argmin is indeed incorrect there, thank you for pointing that out. It should read min_k, i.e. the minimum with respect to the index k. We will update this in the paper. The notation you propose, min ||h_j - h_k||_p, is not entirely correct either: ||h_j - h_k||_p returns a scalar, and taking the minimum of a set of size one would always return that scalar. In fact, what we are doing is selecting h_k from H_{b,τ} such that the norm ||.||_p is minimized, i.e. min_{h_k ∈ H_{b,τ}} ||h_j - h_k||_p. A sketch of this computation follows below.
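
    To make this concrete, here is a minimal NumPy sketch of the intended computation (illustrative only: the function name and the arrays `H_a`, `H_b` are ours, not the repo's; see `dataset_distance_measurer.py` for the actual code):

    ```python
    import numpy as np

    def min_minkowski_distances(H_a, H_b, p=2):
        # For each feature vector h_j in H_a (shape (n, d)), compute
        # min over h_k in H_b (shape (m, d)) of ||h_j - h_k||_p.
        diffs = H_a[:, None, :] - H_b[None, :, :]                  # (n, m, d)
        dists = np.sum(np.abs(diffs) ** p, axis=-1) ** (1.0 / p)  # (n, m)
        return dists.min(axis=1)  # minimum over the index k, one scalar per h_j

    # Example: two small batches of 2-D features
    H_a = np.array([[0.0, 0.0], [1.0, 1.0]])
    H_b = np.array([[0.0, 1.0], [2.0, 2.0]])
    print(min_minkowski_distances(H_a, H_b, p=2))  # -> [1. 1.]
    ```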

  2. C is the number of batches used for the distance measuring. In our case it is 10, not 30 as wrongly stated in the paper; we are going to correct this. In the code it is the parameter num_batches = 10 of dataset_distance_tester_pdf(...), so C = 10. A sketch of this batching scheme follows below.
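
    As an illustration of the batching, here is a sketch under our own naming (not the repo's code; any per-batch dissimilarity can be plugged in as `distance_fn`):

    ```python
    import numpy as np

    def batch_averaged_distance(feats_a, feats_b, distance_fn,
                                num_batches=10, batch_size=60, seed=0):
        # Average a per-batch dissimilarity over C = num_batches randomly
        # drawn pairs of batches (C = 10 in our experiments).
        rng = np.random.default_rng(seed)
        scores = []
        for _ in range(num_batches):
            idx_a = rng.choice(len(feats_a), size=batch_size, replace=False)
            idx_b = rng.choice(len(feats_b), size=batch_size, replace=False)
            scores.append(distance_fn(feats_a[idx_a], feats_b[idx_b]))
        return float(np.mean(scores))
    ```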

  3. We wanted to explore two types of distance measures: between groups of samples and between empirical feature densities of samples (please note, as we point out in the paper, that our measures are not metrics in the formal sense; symmetry, for example, does not necessarily hold, among other things). Note that the problem of measuring distances between datasets in the absence of a probability measure is, to the best of our knowledge, not solved. We found these references helpful for understanding the existing approaches; a sketch of a simple density-based measure follows after them.
    [28] Facundo Mémoli. Gromov–Wasserstein distances and the metric approach to object matching. Foundations of Computational Mathematics, 11(4):417–487, 2011.
    [29] Nikolaj Tatti. Distances between data sets based on summary statistics. Journal of Machine Learning Research, 8(Jan):131–154, 2007.
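
    To illustrate the density-based idea, here is a simple sketch (our own simplification, not the paper's implementation): estimate per-dimension empirical feature densities with histograms and compare them with the Jensen-Shannon divergence, averaged over dimensions.

    ```python
    import numpy as np

    def js_density_distance(feats_a, feats_b, bins=50, eps=1e-12):
        # Histogram each feature dimension of both datasets over a shared
        # range, then average the Jensen-Shannon divergence across dimensions.
        def js(p, q):
            p = p / (p.sum() + eps)
            q = q / (q.sum() + eps)
            m = 0.5 * (p + q)
            kl = lambda x, y: np.sum(x * np.log((x + eps) / (y + eps)))
            return 0.5 * kl(p, m) + 0.5 * kl(q, m)

        scores = []
        for d in range(feats_a.shape[1]):
            lo = min(feats_a[:, d].min(), feats_b[:, d].min())
            hi = max(feats_a[:, d].max(), feats_b[:, d].max())
            pa, _ = np.histogram(feats_a[:, d], bins=bins, range=(lo, hi))
            pb, _ = np.histogram(feats_b[:, d], bins=bins, range=(lo, hi))
            scores.append(js(pa.astype(float), pb.astype(float)))
        return float(np.mean(scores))
    ```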

Finally, in terms of theoretical analysis we agree: this should be the next step, and we are planning to incorporate it in future work. If you are interested in collaborating on this, let me know and we can have a call soon.

Best,
Luis

Dear @luisoala ,
Thank you for your detailed answers! The problem of measuring differences between datasets is, in my opinion, a challenging but interesting topic. Since I have another topic to work on in the coming half year, I will follow your work closely, and I hope you find something interesting in this fascinating topic (ง •̀_•́)ง

Thanks, and best of luck to you, too (: