How can I get the vocab from the model?
ersamo opened this issue · 5 comments
Thanks for sharing the code. I'm trying to get the vocab from this model but couldn't. With BERT I used:

```python
vocab = bert.get_tokenizer().get_vocab()
```

How can I get it from your model, please?
Thank you for your interest in our work. You can get the vocab of our model using the following script:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('cambridgeltl/tacl-bert-base-uncased')
vocab = tokenizer.get_vocab()
```

Hope this can help you :)
Thanks for replying, I got it. But how can I get `tokenizer.word_index.items()` from your model? I'm trying to get it with:

```python
def word_for_id(integer, tokenizer):
    for word, index in tokenizer.word_index.items():
        if index == integer:
            return word
    return None
```

but it didn't work.
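(Side note: `word_index` appears to be an attribute of the Keras `Tokenizer`, not of Hugging Face tokenizers, which is why the loop above fails. A minimal corrected sketch of the same lookup, assuming the dict comes from `get_vocab()` and shown here with a toy vocab so it runs standalone:)

```python
def word_for_id(integer, vocab):
    """Return the token whose ID matches `integer`, or None if absent."""
    for word, index in vocab.items():
        if index == integer:
            return word
    return None

# Toy stand-in for the dict returned by tokenizer.get_vocab():
# token string -> integer ID.
toy_vocab = {'[PAD]': 0, '[UNK]': 100, 'hello': 7592}

print(word_for_id(0, toy_vocab))     # -> [PAD]
print(word_for_id(7592, toy_vocab))  # -> hello
```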
Hi,
I think you can check the official Hugging Face documentation. The usage of our model should be the same as the original BERT model.
Thanks a lot for replying. I already checked it but couldn't find that method there. Can you please help? I need to use your model with this approach.
Hi,
Please try the following lines:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('cambridgeltl/tacl-bert-base-uncased')
vocab = tokenizer.get_vocab()

# Invert the vocab: map each token ID back to its token string.
id2word_dict = {}
for key, value in vocab.items():
    id2word_dict[value] = key

# input:  id2word_dict[0]
# output: '[PAD]'
```

For example, if you want to find the word that has an ID of 0, just use `id2word_dict[0]`; it should give you `'[PAD]'` as the output. Hope this helps :)
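(For what it's worth, Hugging Face tokenizers also expose a direct ID-to-token method, `tokenizer.convert_ids_to_tokens`, so the manual inversion is only needed if you want a plain dict. The same inversion as a one-line dict comprehension, sketched on a toy vocab so it runs without downloading the model:)

```python
# Toy stand-in for the dict returned by tokenizer.get_vocab():
# token string -> integer ID.
vocab = {'[PAD]': 0, '[UNK]': 100, 'hello': 7592}

# Invert token -> ID into ID -> token in one expression.
id2word_dict = {value: key for key, value in vocab.items()}

print(id2word_dict[0])  # -> [PAD]
```

With a real tokenizer, `tokenizer.convert_ids_to_tokens(0)` returns the same token without building the dict by hand.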