DeepSeek-Coder-V2 inference warnings
Qlalq opened this issue · 0 comments
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:10<00:00, 2.52s/it]
The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:100001 for open-end generation.
The attention mask is not set and cannot be inferred from input because pad token is same as eos token. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
The `seen_tokens` attribute is deprecated and will be removed in v4.41. Use the `cache_position` model input instead.
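The terminal prints the warnings above during inference, and the model fails even on examples taken from the training set (it was trained on input A with output B, but at inference input A produces output C). The training loss is shown in the figure. Have you run into and solved a similar problem before?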
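For reference, the first three warnings ask for an explicit `attention_mask` and `pad_token_id` in the call to `generate`. Below is a minimal sketch of one way to supply them, assuming the model and tokenizer are loaded through Hugging Face `transformers` and that reusing the EOS id as the pad id is acceptable, as the log itself does. The helper names `build_generate_kwargs` and `generate_reply` are hypothetical, not part of any library. Silencing the warnings this way may not fix the A/B/C mismatch on its own. A prompt format that differs between training and inference could produce the same symptom, and the post does not give enough detail to tell.

```python
def build_generate_kwargs(encoded, eos_token_id, max_new_tokens=128):
    """Collect the arguments the warnings ask for (hypothetical helper).

    `encoded` is the mapping returned by calling the tokenizer on a prompt;
    it is expected to contain both `input_ids` and `attention_mask`.
    """
    if "attention_mask" not in encoded:
        raise ValueError("tokenizer output has no attention_mask")
    return {
        "input_ids": encoded["input_ids"],
        # Pass the mask explicitly so generate() does not have to infer it.
        "attention_mask": encoded["attention_mask"],
        # Assumption: reuse EOS as the pad id, as the log above already does.
        "pad_token_id": eos_token_id,
        "max_new_tokens": max_new_tokens,
    }


def generate_reply(model, tokenizer, prompt, max_new_tokens=128):
    """Generate a completion for one prompt (hypothetical helper)."""
    # Calling the tokenizer (rather than tokenizer.encode) returns
    # input_ids together with the matching attention_mask.
    encoded = tokenizer(prompt, return_tensors="pt").to(model.device)
    kwargs = build_generate_kwargs(encoded, tokenizer.eos_token_id, max_new_tokens)
    output_ids = model.generate(**kwargs)
    # Keep only the newly generated tokens, dropping the echoed prompt.
    new_tokens = output_ids[0][encoded["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

With a loaded `model` and `tokenizer`, `generate_reply(model, tokenizer, prompt)` would be called with the prompt formatted exactly as it was during fine-tuning, so the training example can be compared against the inference output directly.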