xichenpan/ARLDM

Why resize token embeddings when the dataset is pororo or flintstones?

LiamTTT opened this issue · 5 comments

Hi!

I am reproducing this work, and I noticed that the token embedding is resized when training on the pororo or flintstones datasets.
My question is:

  1. Why is this done?
  2. Why resize to those particular numbers?

BTW, thanks for open-sourcing this work!
Looking forward to your reply :)

Hi, thanks for your interest. I'm not quite sure what "resized token embedding" means here. Could you please point to the corresponding code with a link?

Thanks for your comments! This is because we added some new tokens for the characters in these two datasets.
https://github.com/Flash-321/ARLDM/blob/34b30703a2caeeb2364bdfb161345027217785c6/config.yaml#L35
https://github.com/Flash-321/ARLDM/blob/34b30703a2caeeb2364bdfb161345027217785c6/config.yaml#L42
https://github.com/Flash-321/ARLDM/blob/34b30703a2caeeb2364bdfb161345027217785c6/datasets/flintstones.py#L36-L39
https://github.com/Flash-321/ARLDM/blob/34b30703a2caeeb2364bdfb161345027217785c6/datasets/pororo.py#L37-L40
As a result, the vocab size of the tokenizer changes, and we need to resize the token embeddings so that the embedding layer can still encode the sentences. The numbers in the config can be obtained by printing len(clip_tokenizer) and len(blip_tokenizer) after adding the new tokens.
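
In case it helps, here is a minimal sketch of the pattern (using Hugging Face transformers; the character tokens below are placeholders for illustration, the actual lists live in datasets/pororo.py and datasets/flintstones.py linked above):

```python
from transformers import CLIPTextModel, CLIPTokenizer

# Placeholder character tokens; the real lists are defined in the dataset files.
new_tokens = ["pororo", "loopy", "eddy"]

tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")
text_encoder = CLIPTextModel.from_pretrained("openai/clip-vit-large-patch14")

# Adding tokens grows the tokenizer's vocabulary.
num_added = tokenizer.add_tokens(new_tokens)
print(f"added {num_added} tokens, vocab size is now {len(tokenizer)}")

# The text encoder's embedding table must be resized to match, otherwise the
# new token ids would index past the end of the embedding matrix.
text_encoder.resize_token_embeddings(len(tokenizer))
```

The same add-then-resize step applies to the BLIP tokenizer and its text encoder; the printed len(tokenizer) values are what go into the config.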

Got it! Thanks!
This is fantastic work! Looking forward to more from you.

Thanks! Feel free to open an issue if you have any further questions!