CaraJ7/CoMat

Where is Ltxt? How is the Mixed Latent Strategy involved in training?

lrzjason opened this issue · 3 comments

Figure 4 (Overview of CoMat) shows a loss called Ltxt, which involves both a GT prompt and a Text prompt.

In the caption below the figure, it becomes Li2t, which involves only the Text prompt.

The formula just before Section 5, which combines all the losses, also does not include Ltxt.

I double-checked the code: it generates only one image and does not use the 'Noisy GT'.

Also, self.caption_model() is called only once.

Was the Mixed Latent Strategy used in training at all?

Hi @lrzjason, the current version does not include the mixed latent strategy. We will update the codebase soon. Please stay tuned!

Thanks for pointing out the error in Fig. 4. The $\mathcal{L}_{txt}$ in the figure should in fact be $\mathcal{L}_{i2t}$, and $\mathcal{L}_{i2t}$ involves both the GT prompt and the Text prompt.

@CaraJ7 Thanks for the reply.