Attempting to refactor your code raises IndexError: Target -1 is out of bounds
HarborZeng opened this issue · 6 comments
I refactored your code using torch==1.7.0 and transformers==3.5.1, and training fails in the update method at this line:
(lm_loss), *_ = model(input_ids, labels=lm_labels, token_type_ids=token_type_ids)
Switching the environment back to torch==1.4.0 and transformers==2.1.1 makes the problem go away, so it is presumably a version issue, but I don't know how to fix it.
Full output with the error:
Special tokens have been added in the vocabulary, make sure the associated word embedding are fine-tuned or trained.
['[CLS] [speaker1] 王 雁 盟 [speaker2] 1 9 9 6 年 , 台 湾 计 算 机 程 序 设 计 师 王 雁 盟 到 欧 洲 旅 游 , 在 布 拉 格 街 头 他 为 街 头 艺 人 的 手 风 琴 演 奏 所 着 迷 。 于 是 在 第 二 年 , 他 拜 巴 黎 手 风 琴 演 奏 家 d o m i n i q u e b o d i n 为 师 , 学 习 手 风 琴 演 奏 技 术 。 1 9 9 8 年 回 台 湾 , 在 街 头 拉 着 他 的 手 风 琴 游 荡 。 之 后 , 他 开 始 为 电 影 、 剧 团 演 出 等 伴 奏 手 风 琴 。 到 2 0 0 3 年 , 他 为 几 米 的 《 地 下 铁 一 个 音 乐 的 旅 程 》 音 乐 剧 作 曲 与 演 出 。 《 漂 浮 的 手 风 琴 》 是 他 自 己 制 作 、 作 曲 并 演 奏 的 第 一 个 专 辑 。 [SEP]', '[CLS] [speaker1] 大 话 西 游 之 月 光 宝 盒 主 演 [speaker2] 罗 家 英 [SEP] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD]']
['[CLS] [speaker1] [speaker1] [speaker1] [speaker1] [speaker2] [speaker2] [speaker2] [speaker2] [speaker2] [speaker2] [speaker2] [speaker2] [speaker2] [speaker2] [speaker2] [speaker2] [speaker2] [speaker2] [speaker2] [speaker2] [speaker2] [speaker2] [speaker2] [speaker2] [speaker2] [speaker2] [speaker2] [speaker2] [speaker2] [speaker2] [speaker2] [speaker2] [speaker2] [speaker2] [speaker2] [speaker2] [speaker2] [speaker2] [speaker2] [speaker2] [speaker2] [speaker2] [speaker2] [speaker2] [speaker2] [speaker2] [speaker2] [speaker2] [speaker2] [speaker2] [speaker2] [speaker2] [speaker2] [speaker2] [speaker2] [speaker2] [speaker2] [speaker2] [speaker2] [speaker2] [speaker2] [speaker2] [speaker2] [speaker2] [speaker2] [speaker2] [speaker2] [speaker2] [speaker2] [speaker2] [speaker2] [speaker2] [speaker2] [speaker2] [speaker2] [speaker2] [speaker2] [speaker2] [speaker2] [speaker2] [speaker2] [speaker2] [speaker2] [speaker2] [speaker2] [speaker2] [speaker2] [speaker2] [speaker2] [speaker2] [speaker2] [speaker2] [speaker2] [speaker2] [speaker2] [speaker2] [speaker2] [speaker2] [speaker2] [speaker2] [speaker2] [speaker2] [speaker2] [speaker2] [speaker2] [speaker2] [speaker2] [speaker2] [speaker2] [speaker2] [speaker2] [speaker2] [speaker2] [speaker2] [speaker2] [speaker2] [speaker2] [speaker2] [speaker2] [speaker2] [speaker2] [speaker2] [speaker2] [speaker2] [speaker2] [speaker2] [speaker2] [speaker2] [speaker2] [speaker2] [speaker2] [speaker2] [speaker2] [speaker2] [speaker2] [speaker2] [speaker2] [speaker2] [speaker2] [speaker2] [speaker2] [speaker2] [speaker2] [speaker2] [speaker2] [speaker2] [speaker2] [speaker2] [speaker2] [speaker2] [speaker2] [speaker2] [speaker2] [speaker2] [speaker2] [speaker2] [speaker2] [speaker2] [speaker2] [speaker2] [speaker2] [speaker2] [speaker2] [speaker2] [speaker2] [speaker2] [speaker2] [speaker2] [speaker2] [speaker2] [speaker2] [speaker2] [speaker2] [speaker2] [speaker2] [speaker2] [speaker2] [speaker2] [speaker2] [speaker2] [speaker2] [speaker2] [speaker2] [speaker2] [speaker2] [speaker2] [speaker2] [speaker2] [speaker2] [speaker2] [speaker2] [speaker2] [speaker2] [speaker2] [speaker2] [speaker2] [speaker2] [speaker2] [speaker2] [speaker2]', '[CLS] [speaker1] [speaker1] [speaker1] [speaker1] [speaker1] [speaker1] [speaker1] [speaker1] [speaker1] [speaker1] [speaker1] [speaker1] [speaker2] [speaker2] [speaker2] [speaker2] [speaker2] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD] [PAD]']
['[UNK] [UNK] [UNK] [UNK] [UNK] [UNK] 1 9 9 6 年 , 台 湾 计 算 机 程 序 设 计 师 王 雁 盟 到 欧 洲 旅 游 , 在 布 拉 格 街 头 他 为 街 头 艺 人 的 手 风 琴 演 奏 所 着 迷 。 于 是 在 第 二 年 , 他 拜 巴 黎 手 风 琴 演 奏 家 d o m i n i q u e b o d i n 为 师 , 学 习 手 风 琴 演 奏 技 术 。 1 9 9 8 年 回 台 湾 , 在 街 头 拉 着 他 的 手 风 琴 游 荡 。 之 后 , 他 开 始 为 电 影 、 剧 团 演 出 等 伴 奏 手 风 琴 。 到 2 0 0 3 年 , 他 为 几 米 的 《 地 下 铁 一 个 音 乐 的 旅 程 》 音 乐 剧 作 曲 与 演 出 。 《 漂 浮 的 手 风 琴 》 是 他 自 己 制 作 、 作 曲 并 演 奏 的 第 一 个 专 辑 。 [SEP]', '[UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] 罗 家 英 [SEP] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK]']
Current run is terminating due to exception: Target -1 is out of bounds..
Engine run is terminating due to exception: Target -1 is out of bounds..
Traceback (most recent call last):
  File "/t/main.py", line 40, in <module>
    trainer.run(train_dataloader, max_epochs=2)
  File "/lib/python3.7/site-packages/ignite/engine/engine.py", line 691, in run
    return self._internal_run()
  File "/lib/python3.7/site-packages/ignite/engine/engine.py", line 762, in _internal_run
    self._handle_exception(e)
  File "/lib/python3.7/site-packages/ignite/engine/engine.py", line 467, in _handle_exception
    raise e
  File "/lib/python3.7/site-packages/ignite/engine/engine.py", line 730, in _internal_run
    time_taken = self._run_once_on_dataset()
  File "/lib/python3.7/site-packages/ignite/engine/engine.py", line 828, in _run_once_on_dataset
    self._handle_exception(e)
  File "/lib/python3.7/site-packages/ignite/engine/engine.py", line 467, in _handle_exception
    raise e
  File "/lib/python3.7/site-packages/ignite/engine/engine.py", line 811, in _run_once_on_dataset
    self.state.output = self._process_function(self, self.state.batch)
  File "/home/kingsoft/gang/t/main.py", line 21, in update
    (lm_loss), *_ = model(input_ids, labels=lm_labels, token_type_ids=token_type_ids)
  File "/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/lib/python3.7/site-packages/transformers/modeling_openai.py", line 595, in forward
    loss = loss_fct(shift_logits.view(-1, shift_logits.size(-1)), shift_labels.view(-1))
  File "/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/lib/python3.7/site-packages/torch/nn/modules/loss.py", line 962, in forward
    ignore_index=self.ignore_index, reduction=self.reduction)
  File "/lib/python3.7/site-packages/torch/nn/functional.py", line 2468, in cross_entropy
    return nll_loss(log_softmax(input, 1), target, weight, None, ignore_index, None, reduction)
  File "/lib/python3.7/site-packages/torch/nn/functional.py", line 2264, in nll_loss
    ret = torch._C._nn.nll_loss(input, target, weight, _Reduction.get_enum(reduction), ignore_index)
IndexError: Target -1 is out of bounds.
Process finished with exit code 1
main.py
from transformers import BertTokenizer, OpenAIGPTLMHeadModel
from dataset import get_dataloader

tokenizer = BertTokenizer.from_pretrained("models/CDial-GPT_LCCC-large", do_lower_case=True)
train_dataloader = get_dataloader(tokenizer)
model = OpenAIGPTLMHeadModel.from_pretrained("models/CDial-GPT_LCCC-large")

from transformers import AdamW

optimizer = AdamW(model.parameters(), lr=1e-5)

import torch


def update(engine, batch):
    input_ids, token_type_ids, lm_labels = tuple(batch)
    print(tokenizer.batch_decode(input_ids))
    print(tokenizer.batch_decode(token_type_ids))
    print(tokenizer.batch_decode(lm_labels))
    model.train()
    (lm_loss), *_ = model(input_ids, labels=lm_labels, token_type_ids=token_type_ids)
    # Accumulate gradients over 64 iterations before each optimizer step.
    loss = lm_loss / 64
    loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
    if engine.state.iteration % 64 == 0:
        optimizer.step()
        optimizer.zero_grad()
    return loss.item(), optimizer.param_groups[0]['lr']


from ignite.engine import create_supervised_trainer
# from torch import nn
# trainer = create_supervised_trainer(model, optimizer, loss_fn=nn.NLLLoss())
from ignite.engine import Engine

trainer = Engine(update)
trainer.run(train_dataloader, max_epochs=2)
dataset.py
import os
from itertools import chain

import torch
from torch.utils.data import DataLoader
from torch.utils.data import Dataset
from torch.nn.utils.rnn import pad_sequence

SPECIAL_TOKENS = ["[CLS]", "[SEP]", "[speaker1]", "[speaker2]"]
MODEL_INPUTS = ["input_ids", "lm_labels", "token_type_ids"]


class WBDataset(Dataset):

    def __init__(self, data, tokenizer, max_history=15, batch_first=True, lm_labels=True):
        self.data = data
        self.tokenizer = tokenizer
        self.max_history = max_history
        self.pad = tokenizer.pad_token_id
        self.batch_first = batch_first
        self.lm_labels = lm_labels

    def __len__(self):
        return len(self.data)

    def __getitem__(self, index):
        history = self.data[index][-2 * self.max_history:-1]
        response = self.data[index][-1]
        return self.process(history, response)

    def process(self, history, response, with_eos=True):
        bos, eos, speaker1, speaker2 = self.tokenizer.convert_tokens_to_ids(SPECIAL_TOKENS)
        sequence = [[bos]] + history + [response + ([eos] if with_eos else [])]
        sequence = [sequence[0]] + [[speaker2 if i % 2 else speaker1] + s
                                    for i, s in enumerate(sequence[1:])]
        instance = {
            "input_ids": list(chain(*sequence)),
            "token_type_ids": [bos] + [speaker2 if i % 2 else speaker1 for i, s in enumerate(sequence[1:]) for _ in s],
            "lm_labels": ([-1] * sum(len(s) for s in sequence[:-1])) + [-1] + sequence[-1][1:]
        }
        return instance

    def collate(self, batch):
        input_ids = pad_sequence(
            [torch.tensor(instance["input_ids"][:512], dtype=torch.long) for instance in batch],
            batch_first=self.batch_first, padding_value=self.pad)
        token_type_ids = pad_sequence(
            [torch.tensor(instance["token_type_ids"][:512], dtype=torch.long) for instance in batch],
            batch_first=self.batch_first, padding_value=self.pad)
        labels = pad_sequence(
            [torch.tensor(instance["lm_labels"][:512], dtype=torch.long) for instance in batch],
            batch_first=self.batch_first, padding_value=-1)
        return input_ids, token_type_ids, labels


def get_dataset(tokenizer):
    dataset_cache = "dataset_cache_" + type(tokenizer).__name__
    if os.path.isfile(dataset_cache):
        dataset = torch.load(dataset_cache)
    else:
        import json
        dataset = {
            "train": json.load(open("data/corpus.json"))["conversations"]
        }

        def tokenize(obj):
            if isinstance(obj, str):
                return tokenizer.convert_tokens_to_ids(tokenizer.tokenize(obj))
            if isinstance(obj, dict):
                return dict((n, tokenize(o)) for n, o in obj.items())
            return list(tokenize(o) for o in obj)

        dataset = tokenize(dataset)
        torch.save(dataset, dataset_cache)
    return dataset


def get_dataloader(tokenizer):
    dataset = get_dataset(tokenizer)
    train_dataset = WBDataset(dataset["train"], tokenizer)
    train_loader = DataLoader(train_dataset, collate_fn=train_dataset.collate, batch_size=2, shuffle=True)
    return train_loader
My guess is that it's the label padding value. Take a look at how ignore_index is set in the two library versions, check which value we use when padding the labels, and change it to match your version.
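To make that hint concrete, here is a minimal sketch using nothing but stock PyTorch: CrossEntropyLoss only skips targets equal to its ignore_index, so a label of -1 crashes under the current default of -100 but is silently skipped once ignore_index=-1 is passed explicitly.

import torch
from torch.nn import CrossEntropyLoss

logits = torch.randn(3, 10)          # 3 positions, vocabulary of 10
labels = torch.tensor([4, -1, 7])    # -1 marks a masked position

# The default ignore_index is -100, so -1 is treated as a real class id:
try:
    CrossEntropyLoss()(logits, labels)
except IndexError as e:
    print(e)  # Target -1 is out of bounds.

# Skipping -1 explicitly (what transformers 2.1.1 did) works fine:
print(CrossEntropyLoss(ignore_index=-1)(logits, labels))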
Where would I add that? My refactored code doesn't define a loss function yet. From what I've looked up, ignore_index seems to belong to the loss function.
It is most likely because different versions of the transformers library implement the loss computation in GPT2LMHeadModel differently. I suggest comparing how the two versions compute the loss, paying particular attention to the value they assign to ignore_index.
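For reference, both versions compute the LM loss with the usual shifted-label pattern; the sketch below paraphrases modeling_openai.py, and the only relevant difference between v2.1.1 and v3.5.1 is the ignore_index handed to CrossEntropyLoss:

import torch
from torch.nn import CrossEntropyLoss

def lm_loss(lm_logits, labels, ignore_index):
    # Predict token t+1 from position t: drop the last logit and the
    # first label, then flatten both for a token-level cross entropy.
    shift_logits = lm_logits[..., :-1, :].contiguous()
    shift_labels = labels[..., 1:].contiguous()
    loss_fct = CrossEntropyLoss(ignore_index=ignore_index)
    return loss_fct(shift_logits.view(-1, shift_logits.size(-1)),
                    shift_labels.view(-1))

logits = torch.randn(2, 8, 100)                    # (batch, seq, vocab)
labels = torch.full((2, 8), -1, dtype=torch.long)  # history masked with -1
labels[:, 4:] = 3                                  # a few real targets
print(lm_loss(logits, labels, ignore_index=-1))    # v2.1.1 behaviour: fine
# lm_loss(logits, labels, ignore_index=-100)       # v3.5.1 default: IndexError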
Thanks for the pointer. Following it into the source, I found that the old version builds the loss with an explicit ignore index (https://github.com/huggingface/transformers/blob/v2.1.1/transformers/modeling_openai.py#L517):

loss_fct = CrossEntropyLoss(ignore_index=-1)

while the new version dropped ignore_index=-1 and falls back to the default (https://github.com/huggingface/transformers/blob/v3.5.1/src/transformers/modeling_openai.py#L594):

loss_fct = CrossEntropyLoss()

The CrossEntropyLoss source shows that default is ignore_index: int = -100. So I changed the second-to-last line of the collate function to padding_value=-100, and in process changed the labels to

"lm_labels": ([-100] * sum(len(s) for s in sequence[:-1])) + [-100] + sequence[-1][1:]

that is, replaced every label padding value with -100. With that the training runs successfully and I can continue refactoring.

I don't know whether the new version differs anywhere else when computing the loss, so please watch out for that while refactoring.
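A small sketch of a more version-robust variant of the same fix: instead of hard-coding -100, read the ignored index off the loss PyTorch actually constructs. The pad_labels helper is hypothetical, not part of the code above.

from torch.nn import CrossEntropyLoss

# The target value CrossEntropyLoss skips by default (-100 on recent PyTorch).
LABEL_PAD = CrossEntropyLoss().ignore_index

def pad_labels(labels, length):
    # Right-pad a label list to `length` with the ignored index.
    return labels + [LABEL_PAD] * (length - len(labels))

print(LABEL_PAD)              # -100
print(pad_labels([5, 9], 4))  # [5, 9, -100, -100]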