naver/posescript

I have a question regarding the metrics output after training the model.


Hello! I became interested in this research and ran the demo as follows:

```
bash generative_modifier/script_generative_modifier.sh 'train' pretrain 1
```

As a result, the following metrics were output:

```
BertScore: hashcode distilbert-base-uncased_L5_no-idf_version=0.3.12(hug_trans=4.41.2)
[val] Epoch: [2999] Stats: avg_likelihood: -0.643 kld_a: 1.609 kld_b: -0.528 fid: 0.017 v2v_elbo: 2.243 v2v_dist_avg: 0.213 v2v_dist_top1: 0.125 v2v_dist_top6: 0.15 jts_elbo: 1.791 jts_dist_avg: 0.292 jts_dist_top1: 0.151 jts_dist_top6: 0.191 rot_elbo: 1.259 rot_dist_avg: 9.255 rot_dist_top1: 6.998 rot_dist_top6: 7.696 ret_r1_prec: 89.457 ret_r2_prec: 94.436 ret_r3_prec: 96.597 ret_multimodality: 0.407 ...
```

Could you please explain what each of these metrics means?

Thank you, and I would appreciate it if you could check this at your convenience!

Hello,

I will refer to the paper below. The 🚮 emoji marks tentative metrics that were used at some point during development but eventually abandoned; they should thus be taken with a pinch of salt. These metrics are computed here and here.

  • avg_likelihood: the log-likelihood of the generated texts, averaged over the dataset (computation here; a minimal sketch follows this list).
  • 🚮 kld_a: Kullback–Leibler divergence (KLD) between the fusion distributions of the pose-editing model (parameterized by $(\mu_p, \Sigma_p)$ in Figure 4) obtained when using the generated text vs. the original text annotation (see the KLD sketch after this list).
  • 🚮 kld_b: difference between two KLDs, each obtained by comparing the pose distribution ($(\mu_b, \Sigma_b)$ in Figure 4) with the fusion distribution ($(\mu_p, \Sigma_p)$ in Figure 4) of the pose-editing model, when using the generated text vs. the original text annotation. Instead of comparing the effect of the text as kld_a does, kld_b was meant to measure the loss of alignment with the probabilistic view of pose B when using the generated text instead of the original one.
  • 🚮 fid: FID (as in Table 5) when using the generated text in the pose-editing model.
  • 🚮 (v2v|jts|rot)_elbo: ELBO for vertices, joints and rotations (as in Table 5), when using the generated text in the pose-editing model.
  • 🚮 (v2v|jts|rot)_dist_avg: reconstruction metrics from Table 6 (MPVE|MPJE|geodesic), averaged over 30 generated pose samples.
  • (v2v|jts|rot)_dist_top(1|6): reconstruction metrics from Table 6 (MPVE|MPJE|geodesic) using the "best" (closest to the GT pose) generated pose sample out of 30 (or, for 🚮 top6, the average over the 6 best samples); a sketch of this aggregation follows the list.
  • ret_r(1|2|3)_prec: the R-precision metrics (Table 6, first 3 columns); see the sketch after this list.
  • 🚮 ret_multimodality: (computation) originally proposed along with the R-precision metrics in TM2T [19], but based here on the cosine similarity instead of the Euclidean distance, since our evaluation retrieval model was trained with the former (sketch after this list).
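
To make some of the above more concrete, here are a few minimal sketches. They are illustrative only: the function names, shapes and defaults below are hypothetical, not the actual code of this repository (follow the "computation" links for that).

First, avg_likelihood, assuming it is derived from the text decoder's token log-probabilities, amounts to a mean per-token log-likelihood of the generated captions:

```python
import torch
import torch.nn.functional as F

def avg_log_likelihood(logits, tokens, pad_id=0):
    """Mean per-token log-likelihood of generated captions.
    logits: (batch, seq, vocab) decoder outputs;
    tokens: (batch, seq) generated token ids; pad positions are ignored."""
    logp = F.log_softmax(logits, dim=-1)
    token_logp = logp.gather(-1, tokens.unsqueeze(-1)).squeeze(-1)
    mask = (tokens != pad_id).float()
    return (token_logp * mask).sum() / mask.sum()

# Toy usage with random values.
print(avg_log_likelihood(torch.randn(2, 7, 100), torch.randint(1, 100, (2, 7))))
```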
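kld_a and kld_b reduce to KL divergences between Gaussian distributions. A sketch, assuming diagonal covariances parameterized as log-variances (the usual VAE convention; all names are illustrative):

```python
import torch

def kl_gaussians(mu1, logvar1, mu2, logvar2):
    """KL( N(mu1, diag(exp(logvar1))) || N(mu2, diag(exp(logvar2))) ),
    summed over the latent dimensions. All inputs: (batch, latent_dim)."""
    kl = 0.5 * (logvar2 - logvar1
                + (logvar1.exp() + (mu1 - mu2) ** 2) / logvar2.exp()
                - 1.0)
    return kl.sum(dim=-1)

# E.g. the fusion distribution obtained with the generated text vs. with
# the original annotation (random placeholders here).
mu_gen, lv_gen = torch.randn(4, 32), torch.randn(4, 32)
mu_ref, lv_ref = torch.randn(4, 32), torch.randn(4, 32)
print(kl_gaussians(mu_gen, lv_gen, mu_ref, lv_ref).mean())
```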
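The _dist_avg / _dist_top1 / _dist_top6 variants differ only in how the 30 per-sample reconstruction distances are aggregated. Assuming those distances (MPVE, MPJE or geodesic) have already been computed:

```python
import torch

def dist_stats(dists, k=6):
    """dists: (batch, n_samples) distances of each generated pose sample
    to the GT pose. Smaller is better, so sorting ascending puts the
    best sample first."""
    sorted_d, _ = dists.sort(dim=1)
    return {
        "dist_avg": dists.mean(dim=1),              # average over all samples
        "dist_top1": sorted_d[:, 0],                # closest sample to GT
        "dist_top6": sorted_d[:, :k].mean(dim=1),   # mean of the k best
    }

stats = dist_stats(torch.rand(8, 30))
print({name: v.mean().item() for name, v in stats.items()})
```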
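R-precision checks whether, for each generated pose, the matching text is retrieved among the top k candidates by the evaluation retrieval model. A sketch with cosine similarity over a batch of paired embeddings (batch size 32 as in the TM2T protocol; again, the function itself is hypothetical):

```python
import torch
import torch.nn.functional as F

def r_precision(pose_emb, text_emb, max_rank=3):
    """pose_emb, text_emb: (N, d) paired embeddings. For each pose, rank
    all N texts by cosine similarity and check whether the matching text
    lands among the top-k retrieved ones."""
    sim = F.normalize(pose_emb, dim=1) @ F.normalize(text_emb, dim=1).T
    order = sim.argsort(dim=1, descending=True)            # (N, N)
    gt = torch.arange(len(sim)).unsqueeze(1)
    gt_rank = (order == gt).float().argmax(dim=1)          # rank of the match
    return {f"ret_r{k}_prec": 100.0 * (gt_rank < k).float().mean().item()
            for k in range(1, max_rank + 1)}

print(r_precision(torch.randn(32, 64), torch.randn(32, 64)))
```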
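Finally, ret_multimodality measures the diversity of several generations for the same input, here as an average pairwise cosine distance between their embeddings (instead of the Euclidean distance used in TM2T):

```python
import torch
import torch.nn.functional as F

def multimodality(sample_emb):
    """sample_emb: (n_inputs, n_samples, d) embeddings of several
    generations per input. Returns the average pairwise cosine
    distance between samples of the same input."""
    e = F.normalize(sample_emb, dim=-1)
    sim = e @ e.transpose(1, 2)                            # (n_inputs, n, n)
    n = e.shape[1]
    off_diag = sim.sum(dim=(1, 2)) - sim.diagonal(dim1=1, dim2=2).sum(dim=1)
    mean_sim = off_diag / (n * (n - 1))                    # mean over pairs
    return (1.0 - mean_sim).mean()                         # cosine distance

print(multimodality(torch.randn(5, 10, 64)))
```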