finetune gpt2 with e2e dataset

Question


Closed · 3 comments

Hello,

I tried to fine-tune gpt2-medium on the e2e dataset with the following command, but got some errors.
Could you please give me an example of how to train the model with TextBox?

python run_textbox.py --model=GPT2 --dataset=e2e --model_path=./PTMs/gpt2-medium/

When generating after training for one epoch, I got this warning:

A decoder-only architecture is being used, but right-padding was detected! For correct generation results, please set `padding_side='left'` when initializing the tokenizer.
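For context: this warning comes from Hugging Face `transformers` during generation. GPT-2 is decoder-only, so it produces each new token after the last position of every row in the batch. With right padding, shorter prompts end in pad tokens, and the model would continue from padding instead of from the prompt. The sketch below is a minimal pure-Python illustration of the two padding sides; `pad_batch` and the token ids are hypothetical and are not TextBox or `transformers` code.

```python
from typing import List, Tuple


def pad_batch(
    seqs: List[List[int]], pad_id: int, side: str = "left"
) -> Tuple[List[List[int]], List[List[int]]]:
    """Hypothetical helper: pad token-id lists to a common length and build attention masks."""
    if side not in ("left", "right"):
        raise ValueError("side must be 'left' or 'right'")
    max_len = max(len(s) for s in seqs)
    padded, masks = [], []
    for s in seqs:
        n_pad = max_len - len(s)
        pad = [pad_id] * n_pad
        mask_pad = [0] * n_pad  # 0 marks padding positions the model should ignore
        if side == "left":
            padded.append(pad + s)
            masks.append(mask_pad + [1] * len(s))
        else:
            padded.append(s + pad)
            masks.append([1] * len(s) + mask_pad)
    return padded, masks


batch = [[10, 11, 12], [20]]  # made-up token ids for two prompts of different length
left, left_mask = pad_batch(batch, pad_id=0, side="left")
right, _ = pad_batch(batch, pad_id=0, side="right")

# With left padding every row ends in a real prompt token, so generation
# continues directly after the prompt; with right padding the short row ends in pads.
print(left)   # [[10, 11, 12], [0, 0, 20]]
print(right)  # [[10, 11, 12], [20, 0, 0]]
```

In `transformers` itself, the warning's suggested fix is to pass `padding_side='left'` when the tokenizer is initialized; where TextBox creates its tokenizer is not shown in this thread.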

After generation finished, I got the following error:

26 Nov 01:02    ERROR Traceback (most recent call last):
  File "/home/hy/TextBox/textbox/utils/dashboard.py", line 320, in new_experiment
    yield True
  File "/home/hy/TextBox/textbox/quick_start/experiment.py", line 130, in run
    self._do_train_and_valid()
  File "/home/hy/TextBox/textbox/quick_start/experiment.py", line 105, in _do_train_and_valid
    self.valid_result = self.trainer.fit(train_data, valid_data)
  File "/home/hy/TextBox/textbox/trainer/trainer.py", line 453, in fit
    self.stopped |= self._valid(valid_data, 'epoch')
  File "/home/hy/miniconda3/envs/textbox/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/home/hy/TextBox/textbox/trainer/trainer.py", line 297, in _valid
    valid_results = self.evaluate(valid_data, is_valid=True)
  File "/home/hy/miniconda3/envs/textbox/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/home/hy/TextBox/textbox/trainer/trainer.py", line 571, in evaluate
    result = self.evaluator.evaluate(generate_corpus, reference_dataset)
  File "/home/hy/TextBox/textbox/evaluator/base_evaluator.py", line 151, in evaluate
    metric_result = evaluator.evaluate(generate_corpus, reference_corpus, avg=avg)
  File "/home/hy/TextBox/textbox/evaluator/abstract_evaluator.py", line 31, in evaluate
    metric_dict = self._calc_metrics_info(generate_corpus=generate_corpus, reference_corpus=reference_corpus)
  File "/home/hy/TextBox/textbox/evaluator/bleu_evaluator.py", line 92, in _calc_metrics_info
    reference_corpus = list(zip_longest(*reference_corpus))
TypeError: type object argument after * must be an iterable, not Corpus
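The last frame shows the immediate cause: `zip_longest(*reference_corpus)` requires `reference_corpus` to be iterable, but the evaluator received a `Corpus` object that apparently does not support iteration. The sketch below reproduces the same kind of `TypeError` with a hypothetical `Corpus` stand-in (not TextBox's actual class) and shows that unpacking works once the object is iterable or a plain list is passed. It is an illustration only, not necessarily the fix the maintainers applied.

```python
from itertools import zip_longest


class Corpus:
    """Hypothetical stand-in for a corpus wrapper that does NOT support iteration."""

    def __init__(self, references):
        # One list of reference strings per example.
        self.references = references


class IterableCorpus(Corpus):
    """Same hypothetical wrapper, but iterating over it yields the stored reference lists."""

    def __iter__(self):
        return iter(self.references)


refs = [["a cafe near the river", "a riverside cafe"], ["cheap food"]]

# Star-unpacking a non-iterable object raises TypeError, as in the traceback.
try:
    list(zip_longest(*Corpus(refs)))
except TypeError as exc:
    print("TypeError:", exc)

# Once the object is iterable, zip_longest transposes the per-example lists:
# the i-th tuple holds the i-th reference of every example, padded with None.
transposed = list(zip_longest(*IterableCorpus(refs)))
print(transposed)  # [('a cafe near the river', 'cheap food'), ('a riverside cafe', None)]
```

The exact wording of the error message differs between Python versions, so the example prints it rather than matching it. Grouping references by position like this is presumably what the BLEU evaluator intends before scoring.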

Answer 1 · 2022-11-27T06:23:09.000Z

We have fixed this bug. Thanks for reporting it!

You can pull the latest version of the repository and rerun the command you mentioned above.

Answer 2 · 2022-11-28T04:32:35.000Z

Hello,
I don't see any commit or update related to this issue. The latest PR is:

Merge pull request #289 from huyiwen/2.0.0
Fix: delete useless import

Answer 3 · 2022-11-28T05:55:52.000Z

Sorry, we forgot to merge the PR. You can try it now.