ChenFengYe/motion-latent-diffusion

According to Figure 2, the VAE stage does not use any text input?

Closed this issue · 0 comments

layumi commented

According to Figure 2, the VAE stage does not use any text input?
So the text encoder is also not used in the first stage, correct?