bfshi/DGAM-Weakly-Supervised-Action-Localization

Clarifications of implementations


Hi, I read your paper, great job! I really liked your approach of using a generative model to remove the context frames.
I am new to the conditional VAE concept and would like some clarifications:

  1. In loss.py line 71, why do you subtract the attention from the mean of z?
  2. In the paper, in the paragraph after eq. 9 in section 3.3, you set the prior as a Gaussian and use r to make it dependent on lambda. Where in the code do you enforce this?
  3. Also, what are your thoughts on adding another condition, the action, along with lambda to generate X? Can a CVAE handle multiple conditions? If so, do you know of any literature on this?
bfshi commented

Hi, thanks for the interest!

  1. For a frame with attention=a, we use the Gaussian distribution N(a, I) as the prior. The KLD loss is the KL divergence between the prior N(a, I) and the posterior N(mean, var). This is equivalent to the KL divergence between N(0, I) and N(mean - a, var), which is why the attention is subtracted from the mean.

  2. The lambda is the attention. Making the prior depend on it is exactly what subtracting the attention from the mean does.

  3. Actually it's not another condition, but the latent code z. For regular VAE, z is the only input. For conditional VAE, we have both z and conditional variable (attention in our case) as input.
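
The equivalence in point 1 can be checked numerically. The sketch below is not the code from loss.py; it is a minimal illustration that assumes a diagonal-covariance posterior, the usual VAE direction KL(posterior || prior), and a scalar attention `a` broadcast across every latent dimension. The function name `kl_diag_gaussian_to_unit_var` and the sample values are hypothetical.

```python
import math


def kl_diag_gaussian_to_unit_var(mean, var, prior_mean):
    """Closed-form KL( N(mean, diag(var)) || N(prior_mean, I) ).

    Hypothetical helper for illustration; all arguments are equal-length
    lists of floats, one entry per latent dimension.
    """
    return 0.5 * sum(
        v + (m - p) ** 2 - 1.0 - math.log(v)
        for m, v, p in zip(mean, var, prior_mean)
    )


# Made-up posterior parameters for a 3-dimensional latent code.
mean = [0.3, -1.2, 0.8]
var = [0.5, 1.5, 0.9]
a = 0.7  # attention of the frame (assumed scalar, broadcast over dimensions)

# Option A: keep the posterior as is and shift the prior to N(a, I).
kl_shifted_prior = kl_diag_gaussian_to_unit_var(mean, var, [a] * len(mean))

# Option B: keep the standard prior N(0, I) and subtract a from the mean.
kl_shifted_mean = kl_diag_gaussian_to_unit_var(
    [m - a for m in mean], var, [0.0] * len(mean)
)

# Both options give the same value up to floating-point error.
assert math.isclose(kl_shifted_prior, kl_shifted_mean)
```

The two values agree because the closed-form KL depends on the two means only through their difference, so shifting the prior by `a` or shifting the posterior mean by `-a` gives the same term.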