llama3 requires 2 eos_token_id's
jkbbwr commented
I looked into it. generation_config.json defines:
{
"_from_model_config": true,
"bos_token_id": 128000,
"eos_token_id": [128001, 128009],
"transformers_version": "4.40.0.dev0"
}
We have validation in lib/bumblebee/text/generation_config.ex:298
eos_token_id: {"eos_token_id", optional(number())},
This causes a failure, because eos_token_id in the config is a list of two numbers rather than a single number.
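To make the mismatch concrete, here is a minimal sketch of treating both shapes uniformly. The `EosNormalizer` module and its functions are hypothetical names for illustration only; they are not part of Bumblebee, and this is not how the library handles the option.

```elixir
defmodule EosNormalizer do
  # Hypothetical helper, NOT part of Bumblebee: accepts the value of
  # "eos_token_id" as found in generation_config.json and always
  # returns a list of token ids.
  def normalize(nil), do: []
  def normalize(id) when is_integer(id), do: [id]
  def normalize(ids) when is_list(ids), do: ids

  # Returns true when token_id matches any configured end-of-sequence id.
  def eos?(token_id, eos_token_id) do
    token_id in normalize(eos_token_id)
  end
end
```

With the Llama 3 config above, `EosNormalizer.eos?(128009, [128001, 128009])` would match the second id, while a single-number config such as `2` would still work unchanged.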
I figured that changing that to list(number()) would be too easy a fix for my use case, but I tried it anyway, and now I get a failure in the tokenizer:
** (ErlangError) Erlang error: "Could not decode field on position 1"
(tokenizers 0.4.0) Tokenizers.Native.encoding_pad(#Tokenizers.Encoding<[length: 8, ids: [3923, 374, 701, 19214, 9955, 3904, 30, 720]]>, 8096, [pad_id: nil, pad_token: "</s>", direction: :left])
(elixir 1.15.7) lib/enum.ex:1693: Enum."-map/2-lists^map/1-1-"/2
(bumblebee 0.5.3) lib/bumblebee/text/pre_trained_tokenizer.ex:305: Bumblebee.Text.PreTrainedTokenizer.apply/2
(nx 0.7.1) lib/nx.ex:4447: Nx.with_default_backend/2
(bumblebee 0.5.3) lib/bumblebee/text/text_generation.ex:89: anonymous fn/4 in Bumblebee.Text.TextGeneration.generation/4
(nx 0.7.1) lib/nx/serving.ex:1748: anonymous fn/3 in Nx.Serving.handle_preprocessing/2
(telemetry 1.2.1) /home/kibb/.cache/mix/installs/elixir-1.15.7-erts-14.2.3/cf4fca83fbed0caff4eeaa26133235c5/deps/telemetry/src/telemetry.erl:321: :telemetry.span/3
(nx 0.7.1) lib/nx/serving.ex:683: Nx.Serving.run/2
This is about the limit of my diagnosis right now.
Any advice or suggestions are welcome.
jonatanklosko commented