Problem when the Tokenizer and Model are set separately
rela0426 opened this issue · 2 comments
rela0426 commented
Dr. Lu, hello. I replaced the model's Tokenizer with XLMRobertaTokenizer and changed nothing else. Training throws an error every time it reaches step 500 (which is the first evaluation). Switching to other Tokenizers gives the same error. Can a T5 model only be used together with T5Tokenizer? I have also looked at your newly released UIE system, where the Chinese Tokenizer is adapted from BertTokenizer; I tried to follow that approach as well, but it also failed. I have been debugging for a week without a clue. The changed code and the error message are below. Where might the error come from, and where should I start troubleshooting? What should I pay attention to when the Tokenizer and the model do not come from the same checkpoint? Thank you, Dr. Lu!
Only the tokenizer-loading code below was changed:
tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base", bos_token=None, eos_token='</s>', unk_token='<unk>', pad_token='<pad>', cls_token=None, mask_token=None)
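For context, here is a minimal sketch of the alignment that generally matters when the tokenizer and the model come from different checkpoints. The vocabulary sizes and token ids in the comments come from the two configs printed in the log below; the resize/config calls are standard transformers API, not anything taken from this repo, so treat it as an assumption rather than a verified fix:

```python
# Minimal sketch (assumptions noted in comments): aligning a T5 model with a
# tokenizer from a different checkpoint (XLMRobertaTokenizer here).
from transformers import AutoTokenizer, T5ForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained(
    "xlm-roberta-base",
    bos_token=None, eos_token='</s>', unk_token='<unk>',
    pad_token='<pad>', cls_token=None, mask_token=None,
)
model = T5ForConditionalGeneration.from_pretrained("t5-small")

# t5-small ships with vocab_size=32128, while xlm-roberta-base has 250002
# entries, so the embedding matrix must be resized to the new tokenizer.
model.resize_token_embeddings(len(tokenizer))

# Generation also reads token ids from the config: T5 uses pad=0 / eos=1,
# while XLM-R's vocabulary uses pad=1 / eos=2, so keep them consistent.
model.config.pad_token_id = tokenizer.pad_token_id
model.config.eos_token_id = tokenizer.eos_token_id
# T5 conventionally starts decoding from the pad token.
model.config.decoder_start_token_id = tokenizer.pad_token_id
```

Even with the embeddings resized and the ids aligned, any code that hard-codes T5 token ids (such as the constrained decoder discussed below) still needs its own adaptation.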
Error message:
ModelArguments(model_name_or_path='t5-small', config_name=None, tokenizer_name=None, cache_dir=None, use_fast_tokenizer=False, model_revision='main', use_auth_token=False)
DataTrainingArguments(task='event', dataset_name=None, dataset_config_name=None, text_column=None, summary_column=None, train_file='data/text2tree/one_ie_ace2005_subtype/train.json', validation_file='data/text2tree/one_ie_ace2005_subtype/val.json', test_file='data/text2tree/one_ie_ace2005_subtype/test.json', overwrite_cache=False, preprocessing_num_workers=None, max_source_length=256, max_target_length=128, val_max_target_length=128, pad_to_max_length=False, max_train_samples=None, max_val_samples=None, max_test_samples=None, source_lang=None, target_lang=None, num_beams=None, ignore_pad_token_for_loss=True, source_prefix='event: ', decoding_format='tree', event_schema='data/text2tree/one_ie_ace2005_subtype/event.schema')
ConstraintSeq2SeqTrainingArguments(output_dir='models/CF_2022-05-20-14-30-29880_t5-small_tree_one_ie_ace2005_subtype_linear_lr1e-4_ls0_16_wu2000', overwrite_output_dir=False, do_train=True, do_eval=True, do_predict=True, evaluation_strategy=<IntervalStrategy.STEPS: 'steps'>, prediction_loss_only=False, per_device_train_batch_size=16, per_device_eval_batch_size=64, per_gpu_train_batch_size=None, per_gpu_eval_batch_size=None, gradient_accumulation_steps=1, eval_accumulation_steps=None, learning_rate=0.0001, weight_decay=0.0, adam_beta1=0.9, adam_beta2=0.999, adam_epsilon=1e-08, max_grad_norm=1.0, num_train_epochs=30.0, max_steps=-1, lr_scheduler_type=<SchedulerType.LINEAR: 'linear'>, warmup_ratio=0.0, warmup_steps=2000, logging_dir='models/CF_2022-05-20-14-30-29880_t5-small_tree_one_ie_ace2005_subtype_linear_lr1e-4_ls0_16_wu2000_log', logging_strategy=<IntervalStrategy.STEPS: 'steps'>, logging_first_step=False, logging_steps=500, save_strategy=<IntervalStrategy.STEPS: 'steps'>, save_steps=500, save_total_limit=1, no_cuda=False, seed=421, fp16=False, fp16_opt_level='O1', fp16_backend='auto', fp16_full_eval=False, local_rank=-1, tpu_num_cores=None, tpu_metrics_debug=False, debug=False, dataloader_drop_last=False, eval_steps=500, dataloader_num_workers=0, past_index=-1, run_name='models/CF_2022-05-20-14-30-29880_t5-small_tree_one_ie_ace2005_subtype_linear_lr1e-4_ls0_16_wu2000', disable_tqdm=False, remove_unused_columns=True, label_names=None, load_best_model_at_end=True, metric_for_best_model='eval_role-F1', greater_is_better=True, ignore_data_skip=False, sharded_ddp=[], deepspeed=None, label_smoothing_factor=0.0, adafactor=False, group_by_length=False, report_to=['tensorboard'], ddp_find_unused_parameters=None, dataloader_pin_memory=True, skip_memory_metrics=False, sortish_sampler=False, predict_with_generate=True, constraint_decoding=True, label_smoothing_sum=False)
05/20/2022 14:30:10 - WARNING - __main__ - Process rank: -1, device: cuda:0, n_gpu: 1, distributed training: False, 16-bits training: False
05/20/2022 14:30:10 - INFO - __main__ - Training/evaluation parameters ConstraintSeq2SeqTrainingArguments(output_dir='models/CF_2022-05-20-14-30-29880_t5-small_tree_one_ie_ace2005_subtype_linear_lr1e-4_ls0_16_wu2000', overwrite_output_dir=False, do_train=True, do_eval=True, do_predict=True, evaluation_strategy=<IntervalStrategy.STEPS: 'steps'>, prediction_loss_only=False, per_device_train_batch_size=16, per_device_eval_batch_size=64, per_gpu_train_batch_size=None, per_gpu_eval_batch_size=None, gradient_accumulation_steps=1, eval_accumulation_steps=None, learning_rate=0.0001, weight_decay=0.0, adam_beta1=0.9, adam_beta2=0.999, adam_epsilon=1e-08, max_grad_norm=1.0, num_train_epochs=30.0, max_steps=-1, lr_scheduler_type=<SchedulerType.LINEAR: 'linear'>, warmup_ratio=0.0, warmup_steps=2000, logging_dir='models/CF_2022-05-20-14-30-29880_t5-small_tree_one_ie_ace2005_subtype_linear_lr1e-4_ls0_16_wu2000_log', logging_strategy=<IntervalStrategy.STEPS: 'steps'>, logging_first_step=False, logging_steps=500, save_strategy=<IntervalStrategy.STEPS: 'steps'>, save_steps=500, save_total_limit=1, no_cuda=False, seed=421, fp16=False, fp16_opt_level='O1', fp16_backend='auto', fp16_full_eval=False, local_rank=-1, tpu_num_cores=None, tpu_metrics_debug=False, debug=False, dataloader_drop_last=False, eval_steps=500, dataloader_num_workers=0, past_index=-1, run_name='models/CF_2022-05-20-14-30-29880_t5-small_tree_one_ie_ace2005_subtype_linear_lr1e-4_ls0_16_wu2000', disable_tqdm=False, remove_unused_columns=True, label_names=None, load_best_model_at_end=True, metric_for_best_model='eval_role-F1', greater_is_better=True, ignore_data_skip=False, sharded_ddp=[], deepspeed=None, label_smoothing_factor=0.0, adafactor=False, group_by_length=False, report_to=['tensorboard'], ddp_find_unused_parameters=None, dataloader_pin_memory=True, skip_memory_metrics=False, sortish_sampler=False, predict_with_generate=True, constraint_decoding=True, label_smoothing_sum=False)
05/20/2022 14:30:11 - WARNING - datasets.builder - Using custom data configuration default-1e528a5b4868ef92
05/20/2022 14:30:11 - WARNING - datasets.builder - Reusing dataset json (/home/xiaoli/.cache/huggingface/datasets/json/default-1e528a5b4868ef92/0.0.0/ac0ca5f5289a6cf108e706efcf040422dbbfa8e658dee6a819f20d76bb84d26b)
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:00<00:00, 465.57it/s]
loading configuration file https://huggingface.co/t5-small/resolve/main/config.json from cache at /home/xiaoli/.cache/huggingface/transformers/fe501e8fd6425b8ec93df37767fcce78ce626e34cc5edc859c662350cf712e41.406701565c0afd9899544c1cb8b93185a76f00b31e5ce7f6e18bbaef02241985
Model config T5Config {
"architectures": [
"T5WithLMHeadModel"
],
"d_ff": 2048,
"d_kv": 64,
"d_model": 512,
"decoder_start_token_id": 0,
"dropout_rate": 0.1,
"eos_token_id": 1,
"feed_forward_proj": "relu",
"initializer_factor": 1.0,
"is_encoder_decoder": true,
"layer_norm_epsilon": 1e-06,
"model_type": "t5",
"n_positions": 512,
"num_decoder_layers": 6,
"num_heads": 8,
"num_layers": 6,
"output_past": true,
"pad_token_id": 0,
"relative_attention_num_buckets": 32,
"task_specific_params": {
"summarization": {
"early_stopping": true,
"length_penalty": 2.0,
"max_length": 200,
"min_length": 30,
"no_repeat_ngram_size": 3,
"num_beams": 4,
"prefix": "summarize: "
},
"translation_en_to_de": {
"early_stopping": true,
"max_length": 300,
"num_beams": 4,
"prefix": "translate English to German: "
},
"translation_en_to_fr": {
"early_stopping": true,
"max_length": 300,
"num_beams": 4,
"prefix": "translate English to French: "
},
"translation_en_to_ro": {
"early_stopping": true,
"max_length": 300,
"num_beams": 4,
"prefix": "translate English to Romanian: "
}
},
"transformers_version": "4.4.2",
"use_cache": true,
"vocab_size": 32128
}
loading configuration file https://huggingface.co/xlm-roberta-base/resolve/main/config.json from cache at /home/xiaoli/.cache/huggingface/transformers/87683eb92ea383b0475fecf99970e950a03c9ff5e51648d6eee56fb754612465.dfaaaedc7c1c475302398f09706cbb21e23951b73c6e2b3162c1c8a99bb3b62a
Model config XLMRobertaConfig {
"architectures": [
"XLMRobertaForMaskedLM"
],
"attention_probs_dropout_prob": 0.1,
"bos_token_id": 0,
"eos_token_id": 2,
"gradient_checkpointing": false,
"hidden_act": "gelu",
"hidden_dropout_prob": 0.1,
"hidden_size": 768,
"initializer_range": 0.02,
"intermediate_size": 3072,
"layer_norm_eps": 1e-05,
"max_position_embeddings": 514,
"model_type": "xlm-roberta",
"num_attention_heads": 12,
"num_hidden_layers": 12,
"output_past": true,
"pad_token_id": 1,
"position_embedding_type": "absolute",
"transformers_version": "4.4.2",
"type_vocab_size": 1,
"use_cache": true,
"vocab_size": 250002
}
loading file https://huggingface.co/xlm-roberta-base/resolve/main/sentencepiece.bpe.model from cache at /home/xiaoli/.cache/huggingface/transformers/9df9ae4442348b73950203b63d1b8ed2d18eba68921872aee0c3a9d05b9673c6.00628a9eeb8baf4080d44a0abe9fe8057893de20c7cb6e6423cddbf452f7d4d8
loading file https://huggingface.co/xlm-roberta-base/resolve/main/tokenizer.json from cache at /home/xiaoli/.cache/huggingface/transformers/daeda8d936162ca65fe6dd158ecce1d8cb56c17d89b78ab86be1558eaef1d76a.a984cf52fc87644bd4a2165f1e07e0ac880272c1e82d648b4674907056912bd7
loading file https://huggingface.co/xlm-roberta-base/resolve/main/added_tokens.json from cache at None
loading file https://huggingface.co/xlm-roberta-base/resolve/main/special_tokens_map.json from cache at None
loading file https://huggingface.co/xlm-roberta-base/resolve/main/tokenizer_config.json from cache at None
Using bos_token, but it is not set yet.
loading weights file https://huggingface.co/t5-small/resolve/main/pytorch_model.bin from cache at /home/xiaoli/.cache/huggingface/transformers/fee5a3a0ae379232608b6eed45d2d7a0d2966b9683728838412caccc41b4b0ed.ddacdc89ec88482db20c676f0861a336f3d0409f94748c209847b49529d73885
All model checkpoint weights were used when initializing T5ForConditionalGeneration.
All the weights of T5ForConditionalGeneration were initialized from the model checkpoint at t5-small.
If your task is similar to the task the model of the checkpoint was trained on, you can already use T5ForConditionalGeneration for predictions without further training.
Assigning ['<extra_id_0>', '<extra_id_1>'] to the additional_special_tokens key of the tokenizer
Using bos_token, but it is not set yet.
Using cls_token, but it is not set yet.
Using mask_token, but it is not set yet.
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 20/20 [00:03<00:00, 6.06ba/s]
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 5.26ba/s]
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 5.54ba/s]
***** Running training *****
Num examples = 19216
Num Epochs = 30
Instantaneous batch size per device = 16
Total train batch size (w. parallel, distributed & accumulation) = 16
Gradient Accumulation steps = 1
Total optimization steps = 36030
{'loss': 13.1629, 'learning_rate': 2.5e-05, 'epoch': 0.42}
1%|█▌ | 500/36030 [01:53<2:15:04, 4.38it/s]***** Running Evaluation *****
Num examples = 901
Batch size = 64
Traceback (most recent call last):
File "run_seq2seq.py", line 762, in <module>
main()
File "run_seq2seq.py", line 662, in main
train_result = trainer.train(resume_from_checkpoint=checkpoint)
File "/data/xiaoli/env/conda3/envs/text2event_test/lib/python3.8/site-packages/transformers/trainer.py", line 1105, in train
self._maybe_log_save_evaluate(tr_loss, model, trial, epoch)
File "/data/xiaoli/env/conda3/envs/text2event_test/lib/python3.8/site-packages/transformers/trainer.py", line 1198, in _maybe_log_save_evaluate
metrics = self.evaluate()
File "/data/xiaoli/env/conda3/envs/text2event_test/lib/python3.8/site-packages/transformers/trainer_seq2seq.py", line 74, in evaluate
return super().evaluate(eval_dataset, ignore_keys=ignore_keys, metric_key_prefix=metric_key_prefix)
File "/data/xiaoli/env/conda3/envs/text2event_test/lib/python3.8/site-packages/transformers/trainer.py", line 1667, in evaluate
output = self.prediction_loop(
File "/data/xiaoli/env/conda3/envs/text2event_test/lib/python3.8/site-packages/transformers/trainer.py", line 1805, in prediction_loop
loss, logits, labels = self.prediction_step(model, inputs, prediction_loss_only, ignore_keys=ignore_keys)
File "/data/xiaoli/Text2Event_test/Text2Event-main/seq2seq/constrained_seq2seq.py", line 158, in prediction_step
generated_tokens = self.model.generate(
File "/data/xiaoli/env/conda3/envs/text2event_test/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 26, in decorate_context
return func(*args, **kwargs)
File "/data/xiaoli/env/conda3/envs/text2event_test/lib/python3.8/site-packages/transformers/generation_utils.py", line 982, in generate
return self.greedy_search(
File "/data/xiaoli/env/conda3/envs/text2event_test/lib/python3.8/site-packages/transformers/generation_utils.py", line 1288, in greedy_search
next_tokens_scores = logits_processor(input_ids, next_token_logits)
File "/data/xiaoli/env/conda3/envs/text2event_test/lib/python3.8/site-packages/transformers/generation_logits_process.py", line 89, in __call__
scores = processor(input_ids, scores)
File "/data/xiaoli/env/conda3/envs/text2event_test/lib/python3.8/site-packages/transformers/generation_logits_process.py", line 460, in __call__
mask[batch_id * self._num_beams + beam_id, self._prefix_allowed_tokens_fn(batch_id, sent)] = 0
File "/data/xiaoli/Text2Event_test/Text2Event-main/seq2seq/constrained_seq2seq.py", line 137, in prefix_allowed_tokens_fn
return self.constraint_decoder.constraint_decoding(src_sentence=src_sentence,
File "/data/xiaoli/Text2Event_test/Text2Event-main/extraction/extract_constraint.py", line 90, in constraint_decoding
valid_token_ids = self.get_state_valid_tokens(
File "/data/xiaoli/Text2Event_test/Text2Event-main/extraction/extract_constraint.py", line 198, in get_state_valid_tokens
state, index = self.check_state(tgt_generated)
File "/data/xiaoli/Text2Event_test/Text2Event-main/extraction/extract_constraint.py", line 125, in check_state
last_special_index, last_special_token = special_index_token[-1]
IndexError: list index out of range
1%|█▌ | 500/36030 [01:53<2:14:07, 4.41it/s]
luyaojie commented
This should be a problem caused by constrained decoding: the constrained-decoding logic has to be written specifically for the Tokenizer.
You can add --wo_constraint_decoding to turn constrained decoding off.
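One reading of the traceback that is consistent with this (an assumption, not traced through the repo line by line): the constraint decoder tracks its state by looking for the structure-marker ids in the partially generated sequence, and with XLMRobertaTokenizer the markers assigned in the log above ('<extra_id_0>', '<extra_id_1>') either map to different ids or fall outside the ids t5-small can ever generate, so the list of marker positions stays empty and `special_index_token[-1]` fails. A minimal sketch with simplified, hypothetical names (not the actual code in extraction/extract_constraint.py):

```python
# Illustrative sketch only; names are stand-ins for the real constraint-decoder code.
from transformers import T5Tokenizer, XLMRobertaTokenizer

t5_tok = T5Tokenizer.from_pretrained("t5-small")
xlmr_tok = XLMRobertaTokenizer.from_pretrained("xlm-roberta-base")

# In the T5 vocabulary the markers are single sentinel tokens whose ids fall
# inside t5-small's 32128-way output head.
print(t5_tok.convert_tokens_to_ids(['<extra_id_0>', '<extra_id_1>']))   # e.g. [32099, 32098]

# xlm-roberta-base has no such sentinels. If they are added as new special
# tokens (as the "Assigning ... additional_special_tokens" log line suggests),
# their ids land beyond anything a 32128-way T5 head can generate.
xlmr_tok.add_special_tokens({'additional_special_tokens': ['<extra_id_0>', '<extra_id_1>']})
print(xlmr_tok.convert_tokens_to_ids(['<extra_id_0>', '<extra_id_1>']))  # e.g. [250002, 250003]

def check_state(tgt_generated, special_ids):
    """Rough shape of the failing step in the traceback (hypothetical helper)."""
    special_index_token = [
        (i, tok) for i, tok in enumerate(tgt_generated) if tok in special_ids
    ]
    # If the expected marker ids never appear in the generated sequence, this
    # list is empty and [-1] raises the IndexError shown in the traceback.
    last_special_index, last_special_token = special_index_token[-1]
    return last_special_index, last_special_token
```

In other words, the constrained-decoding state machine would have to be rewritten around the new tokenizer's vocabulary; until then, running with --wo_constraint_decoding simply skips this step.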
rela0426 commented
This should be a problem caused by constrained decoding: the constrained-decoding logic has to be written specifically for the Tokenizer.
You can add --wo_constraint_decoding to turn constrained decoding off.
Following your suggestion solved it perfectly, thank you!