prasunroy/tips

Which text encoding model are you using in this code?

crazySyaoran opened this issue · 5 comments

In your paper, it says

At first, we encode T_B into an embedded vector v_B either by many-hot encoding or using a pre-trained NLP model such as BERT, FastText, or Word2Vec

Can you tell me exactly which text encoding model you are using in your released code? Could you release the encoding model for custom images?

We use many-hot encoding in our code as it provides a straightforward way to encode the text for our application. However, BERT-encoded text also provides comparable results. We have tested with a pre-trained BERT model uncased_L-24_H-1024_A-16 from the following repository - https://github.com/google-research/bert#pre-trained-models.
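For illustration, here is a minimal many-hot encoding sketch in Python. The attribute vocabulary and the example tags below are hypothetical placeholders, not the exact attribute set used in the dataset annotations:

```python
# A minimal sketch of many-hot attribute encoding.
# The vocabulary below is hypothetical, not the actual annotation attribute list.
import numpy as np

ATTRIBUTES = ["male", "female", "short_hair", "long_hair",
              "shirt", "jacket", "trousers", "shorts"]  # hypothetical vocabulary

def many_hot_encode(description_tags, vocabulary=ATTRIBUTES):
    """Set a 1 at every index whose attribute appears in the description."""
    vector = np.zeros(len(vocabulary), dtype=np.float32)
    for tag in description_tags:
        if tag in vocabulary:
            vector[vocabulary.index(tag)] = 1.0
    return vector

# Example: a person described as a woman with long hair wearing a jacket.
v_b = many_hot_encode(["female", "long_hair", "jacket"])
print(v_b)  # [0. 1. 0. 1. 0. 1. 0. 0.]
```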

Thanks a lot, it was a great help. I will try it soon.

Hi, I noticed that the encoding length in encodings.csv is 84, while the output of BERT from the URL you provided is (61, 1024). I urgently need to reproduce your results from custom input text. Could you release the many-hot encoding model mentioned above? Or could you release code that matches BERT's encoding shape?

The many-hot encoding was manually collected during data annotation. So, we do not have a model to infer this encoding directly from the image. It needs to be done manually as an interactive user input. In the case of a frozen text encoder, such as BERT, you need to consider the output from the last hidden layer. The final hidden layer output can be projected to the target shape through another linear layer if required. In our experiments, we tested with BERT encoding of length 384. Also, note that for any specific encoding type and/or length, the stage-1 network needs to be retrained.
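As an illustration of the projection step, here is a minimal PyTorch sketch. The mean pooling and the single linear layer are assumptions for demonstration, not the exact stage-1 implementation; the layer would need to be trained together with the stage-1 network:

```python
# Sketch: projecting a frozen BERT output to a fixed-length text encoding.
# Mean pooling and the linear layer are illustrative assumptions.
import torch
import torch.nn as nn

hidden_size = 1024     # e.g. uncased_L-24_H-1024_A-16
target_length = 384    # encoding length mentioned above

projection = nn.Linear(hidden_size, target_length)

def encode(last_hidden_state):
    # last_hidden_state: (batch, seq_len, hidden_size) from the frozen BERT
    pooled = last_hidden_state.mean(dim=1)   # simple mean pooling over tokens
    return projection(pooled)                # (batch, target_length)

dummy = torch.randn(1, 61, hidden_size)      # e.g. a 61-token sequence
print(encode(dummy).shape)                   # torch.Size([1, 384])
```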

Check the following resources on text encoding with BERT.
[1] https://medium.com/future-vision/real-time-natural-language-understanding-with-bert-315aff964bfa
[2] https://github.com/NVIDIA/TensorRT/tree/main/demo/BERT

However, a more recent and currently recommended approach is to use Hugging Face Transformers.
https://github.com/huggingface/transformers
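For example, a sentence embedding from a pre-trained BERT can be obtained as follows. This is a minimal sketch: the model name (bert-large-uncased roughly corresponds to uncased_L-24_H-1024_A-16), the example text, and the mean pooling are illustrative choices, not the exact TIPS pipeline:

```python
# Minimal example of extracting a BERT text embedding with Hugging Face Transformers.
# Model choice, example text, and pooling strategy are only suggestions.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-large-uncased")
model = AutoModel.from_pretrained("bert-large-uncased")
model.eval()

text = "a woman with long hair wearing a dark jacket and trousers"
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

last_hidden_state = outputs.last_hidden_state   # (1, seq_len, 1024)
embedding = last_hidden_state.mean(dim=1)       # (1, 1024) sentence embedding
print(embedding.shape)
```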

A demo of TIPS with BERT encoding is (temporarily) available at
https://drive.google.com/file/d/1Jsms6hPKg6ESrJyRTdwkgKScIKow1RnU

Thanks for the reply, but I didn't find the BERT encoding of length 384 you mentioned above on Hugging Face. The BERT models I found on Hugging Face are:

|      | H=128             | H=256             | H=512                | H=768               |
|------|-------------------|-------------------|----------------------|---------------------|
| L=2  | 2/128 (BERT-Tiny) | 2/256             | 2/512                | 2/768               |
| L=4  | 4/128             | 4/256 (BERT-Mini) | 4/512 (BERT-Small)   | 4/768               |
| L=6  | 6/128             | 6/256             | 6/512                | 6/768               |
| L=8  | 8/128             | 8/256             | 8/512 (BERT-Medium)  | 8/768               |
| L=10 | 10/128            | 10/256            | 10/512               | 10/768              |
| L=12 | 12/128            | 12/256            | 12/512               | 12/768 (BERT-Base)  |

from https://huggingface.co/google/bert_uncased_L-2_H-768_A-12

Could you please tell me where I can get your pre-trained BERT encoding of length 384?