wilson1yan/VideoGPT-Paper

embedding size is not a multiple of 3?

Closed this issue · 4 comments

Hi, I'm trying to reproduce the results in the paper, but I'm having some trouble using the positional embedding with the provided transformer embedding sizes. The paper uses embedding sizes of 1024 and 512, but according to this line:

`assert embd_dim % n_dim == 0, f"{embd_dim} % {n_dim} != 0"`

the embedding size should be a multiple of `n_dim`, which is supposed to be 3, right? Am I missing something? Thanks!

I simplified some of the code and removed some parts related to using multiple codebooks, so the input to the transformer originally had 4 dimensions (T, H, W, # codebooks), hence the embedding sizes of 512 / 1024. It should work to just choose some multiple of 3 near 512 / 1024, and it should produce very similar results.
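To illustrate why the divisibility requirement exists, here is a minimal sketch of a broadcast/axial-style positional embedding that splits `embd_dim` evenly across the spatiotemporal axes. The class name and initialization details below are illustrative, not the repo's exact code:

```python
import torch
import torch.nn as nn

class AddAxialPosEmbed(nn.Module):
    """Sketch: embd_dim is split evenly across the n_dim spatiotemporal axes
    (e.g. T, H, W), which is why embd_dim % n_dim == 0 is required."""
    def __init__(self, shape, embd_dim):
        super().__init__()
        n_dim = len(shape)                      # 3 for (T, H, W)
        assert embd_dim % n_dim == 0, f"{embd_dim} % {n_dim} != 0"
        self.shape = shape
        self.chunk = embd_dim // n_dim          # channels allotted to each axis
        # one learned table per axis, each covering its chunk of the channels
        self.embs = nn.ParameterList(
            [nn.Parameter(torch.randn(d, self.chunk) * 0.01) for d in shape]
        )

    def forward(self, x):
        # x: (batch, T, H, W, embd_dim)
        pieces = []
        for i, emb in enumerate(self.embs):
            # reshape each axis table so it broadcasts over the other axes
            view = [1] * len(self.shape) + [self.chunk]
            view[i] = self.shape[i]
            pieces.append(emb.view(*view).expand(*self.shape, self.chunk))
        # concatenating the per-axis chunks recovers a full embd_dim vector
        return x + torch.cat(pieces, dim=-1).unsqueeze(0)

# Picking a multiple of 3 near 512, e.g. 510, satisfies the assert for (T, H, W) inputs.
pos = AddAxialPosEmbed(shape=(4, 16, 16), embd_dim=510)
out = pos(torch.zeros(2, 4, 16, 16, 510))   # -> (2, 4, 16, 16, 510)
```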

Interesting... I think that makes sense to me. A question unrelated to this thread but relevant to your response: what is the point of using more than one codebook, other than increasing the number of codes in the codebook?

In general, using multiple codebooks is easier to optimize and less prone to codebook collapse. It also allows combinatorially more expressivity in the codebook, with only linear scaling in the number of codes, compared to using a single codebook.
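To make the scaling argument concrete, here is a quick back-of-the-envelope calculation (the numbers are hypothetical, not from the paper): with `n_codebooks` codebooks of `codebook_size` codes each, the quantizer only learns `n_codebooks * codebook_size` embedding vectors, but it can emit `codebook_size ** n_codebooks` distinct composite codes per position:

```python
# Hypothetical numbers to illustrate the scaling argument, not values from the paper.
codebook_size = 1024   # codes per codebook
for n_codebooks in (1, 2, 4):
    stored = n_codebooks * codebook_size      # embedding vectors to learn (linear)
    distinct = codebook_size ** n_codebooks   # composite codes per position (combinatorial)
    print(f"{n_codebooks} codebook(s): {stored:>6} stored vectors, "
          f"{distinct:.2e} distinct composite codes")
# 1 codebook(s):   1024 stored vectors, 1.02e+03 distinct composite codes
# 2 codebook(s):   2048 stored vectors, 1.05e+06 distinct composite codes
# 4 codebook(s):   4096 stored vectors, 1.10e+12 distinct composite codes
```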

Oh, these are good points! Thank you for your patient reply! :)