finetune gpt2 with e2e dataset
Closed this issue · 3 comments
heya5 commented
Hello,
I have tried to use the command below to fine-tune gpt2-medium with the e2e dataset, but got some errors.
Could you please give me an example to train the model with TextBox?
python run_textbox.py --model=GPT2 --dataset=e2e --model_path=./PTMs/gpt2-medium/
When generating after training for one epoch, I got this warning:
A decoder-only architecture is being used, but right-padding was detected! For correct generation results, please set `padding_side='left'` when initializing the tokenizer.
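For context, this warning exists because a decoder-only model continues generation from the last position of its input, so that position must hold a real token rather than padding. A minimal stdlib-only sketch of the difference (the token IDs and `PAD` value are made up for illustration):

```python
# Why right-padding breaks decoder-only generation: the model continues
# from the LAST position of each sequence, which must be a real token.

PAD = 0
batch = [[5, 6, 7], [8, 9]]  # token-id sequences of unequal length
max_len = max(len(seq) for seq in batch)

right_padded = [seq + [PAD] * (max_len - len(seq)) for seq in batch]
left_padded = [[PAD] * (max_len - len(seq)) + seq for seq in batch]

# Under right padding, the shorter sequence ends in PAD, so generation
# would continue from a padding token:
print([seq[-1] for seq in right_padded])  # [7, 0] -- 0 is PAD: wrong
# Under left padding, every sequence ends in its true last token:
print([seq[-1] for seq in left_padded])   # [7, 9] -- correct
```

With a Hugging Face tokenizer this is what setting `padding_side='left'` achieves.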
After generation, the following error occurred:
26 Nov 01:02 ERROR Traceback (most recent call last):
File "/home/hy/TextBox/textbox/utils/dashboard.py", line 320, in new_experiment
yield True
File "/home/hy/TextBox/textbox/quick_start/experiment.py", line 130, in run
self._do_train_and_valid()
File "/home/hy/TextBox/textbox/quick_start/experiment.py", line 105, in _do_train_and_valid
self.valid_result = self.trainer.fit(train_data, valid_data)
File "/home/hy/TextBox/textbox/trainer/trainer.py", line 453, in fit
self.stopped |= self._valid(valid_data, 'epoch')
File "/home/hy/miniconda3/envs/textbox/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
File "/home/hy/TextBox/textbox/trainer/trainer.py", line 297, in _valid
valid_results = self.evaluate(valid_data, is_valid=True)
File "/home/hy/miniconda3/envs/textbox/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
File "/home/hy/TextBox/textbox/trainer/trainer.py", line 571, in evaluate
result = self.evaluator.evaluate(generate_corpus, reference_dataset)
File "/home/hy/TextBox/textbox/evaluator/base_evaluator.py", line 151, in evaluate
metric_result = evaluator.evaluate(generate_corpus, reference_corpus, avg=avg)
File "/home/hy/TextBox/textbox/evaluator/abstract_evaluator.py", line 31, in evaluate
metric_dict = self._calc_metrics_info(generate_corpus=generate_corpus, reference_corpus=reference_corpus)
File "/home/hy/TextBox/textbox/evaluator/bleu_evaluator.py", line 92, in _calc_metrics_info
reference_corpus = list(zip_longest(*reference_corpus))
TypeError: type object argument after * must be an iterable, not Corpus
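The `TypeError` arises because star-unpacking (`*reference_corpus`) requires an iterable, and the `Corpus` object apparently does not implement `__iter__`. A minimal reproduction with a hypothetical stand-in `Corpus` class (not TextBox's actual implementation):

```python
from itertools import zip_longest

# Hypothetical stand-in: a container that wraps reference lists
# but is NOT iterable, so *corpus fails at the call site.
class Corpus:
    def __init__(self, texts):
        self._texts = texts  # no __iter__ defined

corpus = Corpus([["ref a1", "ref a2"], ["ref b1"]])

try:
    list(zip_longest(*corpus))  # star-unpacking a non-iterable
except TypeError as e:
    print(e)  # the message names Corpus as the non-iterable type

# A fix on the evaluator side would unpack the underlying lists instead
# (or make Corpus iterable):
print(list(zip_longest(*corpus._texts)))
# [('ref a1', 'ref b1'), ('ref a2', None)]
```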
StevenTang1998 commented
We have fixed this bug. Thanks for reporting it!
You can pull the latest repository and run the command you mentioned above.
heya5 commented
Hello,
I don't see a commit or update related to this issue. The latest PR is
Merge pull request #289 from huyiwen/2.0.0
Fix: delete useless import
StevenTang1998 commented
Sorry, we forgot to merge the PR. You can try it now.