According to Figure 2, the VAE stage does not use any text input?

Question

According to Figure 2, the VAE stage does not take any text input, so the text encoder is also unused in the first stage. Is that correct?