Batch-mode prediction
Closed
antoine-isnardy-danone commented
Hi,
Thank you for providing these tremendous resources. I'm currently trying to use the models that were uploaded to Hugging Face (e.g. this one).
Is it expected that tokenization/generation cannot be done in batch mode?
See the example below:
# Single string: tokenization works as expected
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
tokenizer = AutoTokenizer.from_pretrained("Helsinki-NLP/opus-mt-es-en")
inputs = tokenizer.encode("mango manzana y pera", return_tensors="pt")
inputs
tensor([[34090, 29312, 11, 306, 75, 0]])
# List of strings passed to encode: only special/unknown tokens come back
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
tokenizer = AutoTokenizer.from_pretrained("Helsinki-NLP/opus-mt-es-en")
inputs = tokenizer.encode(["mango manzana y pera"], return_tensors="pt")
inputs
tensor([[1, 0]])
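For what it's worth, the snippet below is a minimal sketch of how batched translation is usually done with the transformers API: calling the tokenizer itself on a list of sentences with padding enabled, rather than passing a list to tokenizer.encode. This is standard transformers usage with a reasonably recent version, not something confirmed for these models in this thread; the second sentence is made up purely for illustration.

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("Helsinki-NLP/opus-mt-es-en")
model = AutoModelForSeq2SeqLM.from_pretrained("Helsinki-NLP/opus-mt-es-en")

# Calling the tokenizer directly on a list returns padded input_ids plus
# an attention_mask, which generate() can consume as a batch.
batch = tokenizer(
    ["mango manzana y pera", "hola mundo"],  # second sentence is illustrative
    return_tensors="pt",
    padding=True,
)

generated = model.generate(**batch)
print(tokenizer.batch_decode(generated, skip_special_tokens=True))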
jorgtied commented
I am not sure how compatible the Hugging Face tokenizers are with the SentencePiece unigram models that we provide here for the models that have been converted to their interfaces. That would be a question to ask at huggingface. Good luck!