prasunroy/tips

Which text encoding model are you using in this code?

crazySyaoran opened this issue · 5 comments

In your paper, it says

At first, we encode T_B into an embedded vector v_B either by many-hot encoding or using a pre-trained NLP model such as BERT, FastText, or Word2Vec

Can you tell me exactly which text encoding model you are using in your released code? Could you release the encoding model for custom images?

We use many-hot encoding in our code as it provides a straightforward way to encode the text for our application. However, BERT-encoded text also provides comparable results. We have tested with a pre-trained BERT model uncased_L-24_H-1024_A-16 from the following repository - https://github.com/google-research/bert#pre-trained-models.
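For illustration, here is a minimal many-hot encoding sketch in Python. The attribute vocabulary and the example tags below are hypothetical placeholders, not the exact attribute set used in the dataset annotations:

```python
# A minimal sketch of many-hot attribute encoding.
# The vocabulary below is hypothetical, not the actual annotation attribute list.
import numpy as np

ATTRIBUTES = ["male", "female", "short_hair", "long_hair",
              "shirt", "jacket", "trousers", "shorts"]  # hypothetical vocabulary

def many_hot_encode(description_tags, vocabulary=ATTRIBUTES):
    """Set a 1 at every index whose attribute appears in the description."""
    vector = np.zeros(len(vocabulary), dtype=np.float32)
    for tag in description_tags:
        if tag in vocabulary:
            vector[vocabulary.index(tag)] = 1.0
    return vector

# Example: a person described as a woman with long hair wearing a jacket.
v_b = many_hot_encode(["female", "long_hair", "jacket"])
print(v_b)  # [0. 1. 0. 1. 0. 1. 0. 0.]
```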

Thanks a lot, it was a great help. I will try it soon.

Hi, I noticed that the encoding length in encodings.csv is 84, while the output of BERT from the URL you provided is (61, 1024). I urgently need to reproduce your results from custom input text. Could you release the many-hot encoding model mentioned above? Or could you release code that matches BERT's encoding shape?

The many-hot encoding was manually collected during data annotation. So, we do not have a model to infer this encoding directly from the image. It needs to be done manually as an interactive user input. In the case of a frozen text encoder, such as BERT, you need to consider the output from the last hidden layer. The final hidden layer output can be projected to the target shape through another linear layer if required. In our experiments, we tested with BERT encoding of length 384. Also, note that for any specific encoding type and/or length, the stage-1 network needs to be retrained.
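As an illustration of the projection step, here is a minimal PyTorch sketch. The mean pooling and the single linear layer are assumptions for demonstration, not the exact stage-1 implementation; the layer would need to be trained together with the stage-1 network:

```python
# Sketch: projecting a frozen BERT output to a fixed-length text encoding.
# Mean pooling and the linear layer are illustrative assumptions.
import torch
import torch.nn as nn

hidden_size = 1024     # e.g. uncased_L-24_H-1024_A-16
target_length = 384    # encoding length mentioned above

projection = nn.Linear(hidden_size, target_length)

def encode(last_hidden_state):
    # last_hidden_state: (batch, seq_len, hidden_size) from the frozen BERT
    pooled = last_hidden_state.mean(dim=1)   # simple mean pooling over tokens
    return projection(pooled)                # (batch, target_length)

dummy = torch.randn(1, 61, hidden_size)      # e.g. a 61-token sequence
print(encode(dummy).shape)                   # torch.Size([1, 384])
```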

Check the following resources on text encoding with BERT.
[1] https://medium.com/future-vision/real-time-natural-language-understanding-with-bert-315aff964bfa
[2] https://github.com/NVIDIA/TensorRT/tree/main/demo/BERT

However, a more recent and currently recommended approach is to use Hugging Face Transformers.
https://github.com/huggingface/transformers
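For example, a sentence embedding from a pre-trained BERT can be obtained as follows. This is a minimal sketch: the model name (bert-large-uncased roughly corresponds to uncased_L-24_H-1024_A-16), the example text, and the mean pooling are illustrative choices, not the exact TIPS pipeline:

```python
# Minimal example of extracting a BERT text embedding with Hugging Face Transformers.
# Model choice, example text, and pooling strategy are only suggestions.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-large-uncased")
model = AutoModel.from_pretrained("bert-large-uncased")
model.eval()

text = "a woman with long hair wearing a dark jacket and trousers"
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

last_hidden_state = outputs.last_hidden_state   # (1, seq_len, 1024)
embedding = last_hidden_state.mean(dim=1)       # (1, 1024) sentence embedding
print(embedding.shape)
```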

A demo of TIPS with BERT encoding is (temporarily) available at
https://drive.google.com/file/d/1Jsms6hPKg6ESrJyRTdwkgKScIKow1RnU

Thanks for the reply, but I didn't find the BERT encoding of length 384 you mentioned above on Hugging Face. The BERT models I found on Hugging Face are:

|      | H=128             | H=256             | H=512                | H=768               |
|------|-------------------|-------------------|----------------------|---------------------|
| L=2  | 2/128 (BERT-Tiny) | 2/256             | 2/512                | 2/768               |
| L=4  | 4/128             | 4/256 (BERT-Mini) | 4/512 (BERT-Small)   | 4/768               |
| L=6  | 6/128             | 6/256             | 6/512                | 6/768               |
| L=8  | 8/128             | 8/256             | 8/512 (BERT-Medium)  | 8/768               |
| L=10 | 10/128            | 10/256            | 10/512               | 10/768              |
| L=12 | 12/128            | 12/256            | 12/512               | 12/768 (BERT-Base)  |

from https://huggingface.co/google/bert_uncased_L-2_H-768_A-12

Could you please tell me where I can get your pre-trained BERT encoding of length 384?