yxuansu/TaCL

How can I get the vocab from the model?

ersamo opened this issue · 5 comments

Thanks for sharing the code. I'm trying to get the vocab from this model but couldn't. With BERT I used the line below, but how can I get it from your model, please?

vocab = bert.get_tokenizer().get_vocab()

Thank you for your interest in our work. You can get the vocab of our model using the following script.

from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained('cambridgeltl/tacl-bert-base-uncased')
vocab = tokenizer.get_vocab()

Hope this can help you :)
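For reference, get_vocab() returns a plain Python dict mapping each token string to its integer ID, so you can check it directly (assuming the model reuses the standard bert-base-uncased vocabulary):

vocab['[PAD]']   # -> 0
len(vocab)       # -> 30522 for the bert-base-uncased vocabulary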

Thanks for replying, I got it. But how can I get tokenizer.word_index.items() from your model? I'm trying to get it with:

def word_for_id(integer, tokenizer):
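    # note: word_index is an attribute of the Keras Tokenizer,
    # not of Hugging Face tokenizers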
    for word, index in tokenizer.word_index.items():
        if index == integer:
            return word
    return None

but it didn't work.

Hi,

I think you can check the official Hugging Face documentation. The usage of our model should be the same as the original BERT model.
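For example, a minimal sketch of loading the checkpoint the standard BERT way; any of the usual BERT recipes from the Transformers docs should apply unchanged:

from transformers import AutoModel, AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained('cambridgeltl/tacl-bert-base-uncased')
model = AutoModel.from_pretrained('cambridgeltl/tacl-bert-base-uncased')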

Thanks a lot for replying. I already checked it but couldn't find the same approach there. Can you help, please? I need to use your model with this method.

Hi,

Please try the following lines:

from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained('cambridgeltl/tacl-bert-base-uncased')
vocab = tokenizer.get_vocab()
# invert the vocab: map each token ID back to its token string
id2word_dict = {}
for key, value in vocab.items():
    id2word_dict[value] = key

# input: id2word_dict[0]
# output: '[PAD]'

For example, if you want to find the word that has an ID of 0, just use id2word_dict[0]. It should give you '[PAD]' as the output. Hope this helps :)
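If you prefer a built-in, Hugging Face tokenizers also provide convert_ids_to_tokens, which does the same ID-to-token lookup without building the dict yourself:

tokenizer.convert_ids_to_tokens(0)   # -> '[PAD]'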