YifanXu74/Libra

[Question] VQGAN loss function

binbinsh opened this issue · 4 comments

Question

Thank you for your great work. I would like to try the VQGAN implementation in your repo. I found the loss definition in libra/models/libra/taming/models/vqgan.py:

        if lossconfig is not None:
            self.loss = instantiate_from_config(lossconfig)
        # self.quantize = VectorQuantizer(n_embed, embed_dim, beta=0.25,
        #                                 remap=remap, sane_index_shape=sane_index_shape)

I searched the repo but could not find the config file. Could you provide an example of the loss config? Thank you very much.
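
For reference, in the upstream taming-transformers codebase the loss passed to instantiate_from_config is usually configured like this (a sketch assuming Libra follows the same convention; the class path and hyperparameters below come from a standard taming-transformers config, not from this repo):

    # Hypothetical lossconfig in the upstream taming-transformers style;
    # the exact class path and hyperparameters for Libra may differ.
    lossconfig = {
        "target": "taming.modules.losses.vqperceptual.VQLPIPSWithDiscriminator",
        "params": {
            "disc_conditional": False,
            "disc_in_channels": 3,
            "disc_start": 250001,   # global step at which the GAN loss kicks in
            "disc_weight": 0.8,
            "codebook_weight": 1.0,
        },
    }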

Hi, the current code does not contain a training script for the VQGAN tokenizer. I will post instructions for training a VQGAN with LFQ later.

Hi @binbinsh,

Thank you for your excellent work. I would appreciate it if you could provide more details on how to train the image tokenizer. Here is my initial understanding:

Assuming we have a patch feature z with a dimension of 512 from the CLIP ViT, LFQ first projects it from 512 down to 18 dimensions (for simplicity, let's ignore the efficient implementation that uses two codebooks). Each dimension is then quantized to -1 or +1. Once we have the quantized feature, we can compute the corresponding index idx. The reconstructed feature is then obtained by projecting this 18-dimensional code back to 512 dimensions, which is fed into the VQGAN decoder to regenerate the original image. The trainable parameters are proj_in, proj_out, and the decoder.
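
To make this concrete, here is a rough PyTorch sketch of that round-trip (LFQSketch, proj_in, and proj_out are my own placeholder names, not necessarily the repo's actual code):

    import torch
    import torch.nn as nn

    class LFQSketch(nn.Module):
        """Hypothetical LFQ round-trip: 512 -> 18 -> sign-quantize -> 512."""

        def __init__(self, feat_dim=512, code_dim=18):
            super().__init__()
            self.proj_in = nn.Linear(feat_dim, code_dim)   # 512 -> 18
            self.proj_out = nn.Linear(code_dim, feat_dim)  # 18 -> 512
            # Powers of two that turn the 18 sign bits into an integer index.
            self.register_buffer("basis", 2 ** torch.arange(code_dim))

        def forward(self, z):
            h = self.proj_in(z)                            # (..., 18)
            # Quantize each dimension to -1 or +1.
            q = torch.where(h > 0, torch.ones_like(h), -torch.ones_like(h))
            q = h + (q - h).detach()                       # straight-through gradient
            bits = (q > 0).long()                          # {-1, +1} -> {0, 1}
            idx = (bits * self.basis).sum(dim=-1)          # index in [0, 2**18)
            z_rec = self.proj_out(q)                       # back to 512-d for the decoder
            return z_rec, idx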

Once the image tokenizer is trained, the idx sequence of an input image is what gets used for discrete autoregressive training.
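
In pseudo-usage, something like this (purely illustrative; lm here is just a stand-in for the autoregressive model, and a real one would be causal):

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    vocab = 2 ** 18                            # one id per 18-bit LFQ code
    lfq = LFQSketch().eval()                   # frozen tokenizer from the sketch above
    lm = nn.Sequential(nn.Embedding(vocab, 256), nn.Linear(256, vocab))

    clip_features = torch.randn(2, 16, 512)    # (batch, patches, feat_dim)
    with torch.no_grad():
        _, idx = lfq(clip_features)            # (batch, patches) discrete ids

    logits = lm(idx[:, :-1])                   # predict each next image token
    loss = F.cross_entropy(logits.reshape(-1, vocab), idx[:, 1:].reshape(-1))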

Am I misunderstanding or missing any important details in this process?

Thanks again!

Hi @haibo-qiu ,

From my understanding, this is how the process goes!

Much appreciated~