RUCAIBox/TextBox

finetune gpt2 with e2e dataset

Closed this issue · 3 comments

heya5 commented

Hello,

I tried to fine-tune gpt2-medium on the e2e dataset with the command below, but ran into some errors.
Could you please give me an example of how to train the model with TextBox?

python run_textbox.py --model=GPT2 --dataset=e2e --model_path=./PTMs/gpt2-medium/

When generation starts after one epoch of training, I get this warning:

A decoder-only architecture is being used, but right-padding was detected! For correct generation results, please set `padding_side='left'` when initializing the tokenizer.
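For context, here is a minimal pure-Python sketch of why the padding side matters for a decoder-only model. It does not use TextBox or the GPT-2 tokenizer; `pad_batch`, `PAD_ID`, and the token ids are hypothetical names and values chosen only for illustration.

```python
from typing import List

PAD_ID = 0  # hypothetical pad token id, chosen only for this illustration


def pad_batch(seqs: List[List[int]], pad_id: int = PAD_ID, side: str = "left") -> List[List[int]]:
    """Pad token-id sequences to a common length on the given side."""
    if side not in ("left", "right"):
        raise ValueError("side must be 'left' or 'right'")
    width = max(len(s) for s in seqs)
    padded = []
    for s in seqs:
        fill = [pad_id] * (width - len(s))
        padded.append(fill + s if side == "left" else s + fill)
    return padded


prompts = [[11, 12, 13], [21]]  # made-up token ids for two prompts

right = pad_batch(prompts, side="right")
left = pad_batch(prompts, side="left")

# Right padding: the shorter prompt ends in pad tokens, so a decoder-only
# model would continue generating from padding instead of from the prompt.
print(right)  # [[11, 12, 13], [21, 0, 0]]

# Left padding: every row ends with a real prompt token.
print(left)   # [[11, 12, 13], [0, 0, 21]]
```

In Hugging Face `transformers`, the warning's own suggestion is to set `padding_side='left'` on the tokenizer. Whether TextBox exposes this as a configuration option is not stated in this thread.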

After generation finishes, the following error is raised:

26 Nov 01:02    ERROR Traceback (most recent call last):
  File "/home/hy/TextBox/textbox/utils/dashboard.py", line 320, in new_experiment
    yield True
  File "/home/hy/TextBox/textbox/quick_start/experiment.py", line 130, in run
    self._do_train_and_valid()
  File "/home/hy/TextBox/textbox/quick_start/experiment.py", line 105, in _do_train_and_valid
    self.valid_result = self.trainer.fit(train_data, valid_data)
  File "/home/hy/TextBox/textbox/trainer/trainer.py", line 453, in fit
    self.stopped |= self._valid(valid_data, 'epoch')
  File "/home/hy/miniconda3/envs/textbox/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/home/hy/TextBox/textbox/trainer/trainer.py", line 297, in _valid
    valid_results = self.evaluate(valid_data, is_valid=True)
  File "/home/hy/miniconda3/envs/textbox/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/home/hy/TextBox/textbox/trainer/trainer.py", line 571, in evaluate
    result = self.evaluator.evaluate(generate_corpus, reference_dataset)
  File "/home/hy/TextBox/textbox/evaluator/base_evaluator.py", line 151, in evaluate
    metric_result = evaluator.evaluate(generate_corpus, reference_corpus, avg=avg)
  File "/home/hy/TextBox/textbox/evaluator/abstract_evaluator.py", line 31, in evaluate
    metric_dict = self._calc_metrics_info(generate_corpus=generate_corpus, reference_corpus=reference_corpus)
  File "/home/hy/TextBox/textbox/evaluator/bleu_evaluator.py", line 92, in _calc_metrics_info
    reference_corpus = list(zip_longest(*reference_corpus))
TypeError: type object argument after * must be an iterable, not Corpus
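The last frame of the traceback can be reproduced in isolation: unpacking with `*` requires an iterable, so `zip_longest(*reference_corpus)` fails when the object passed in defines no `__iter__`. The sketch below uses hypothetical stand-in classes, not TextBox's real `Corpus`, and the fix it shows is only one possible approach, not necessarily what the maintainers changed.

```python
from itertools import zip_longest


class Corpus:
    """Hypothetical stand-in: stores references but is not iterable."""

    def __init__(self, references):
        self.references = references


class IterableCorpus(Corpus):
    """Same data, but iteration is delegated to the stored list."""

    def __iter__(self):
        return iter(self.references)


# Two examples with a different number of reference texts each.
refs = [["ref a1", "ref a2"], ["ref b1"]]

# Unpacking a non-iterable object raises the TypeError from the traceback.
message = None
try:
    list(zip_longest(*Corpus(refs)))
except TypeError as err:
    message = str(err)
print(message)

# With __iter__ defined, the same call transposes the references and
# fills the missing slots with None.
transposed = list(zip_longest(*IterableCorpus(refs)))
print(transposed)  # [('ref a1', 'ref b1'), ('ref a2', None)]
```

The exact wording of the error message varies between Python versions, so the sketch prints it rather than asserting a fixed string.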

We have fixed this bug. Thanks for reporting it!

You can pull the latest version of the repository and rerun the command you mentioned above.

heya5 commented

Hello,
I don't see a commit or update related to this issue. The latest merged PR is:

Merge pull request #289 from huyiwen/2.0.0
Fix: delete useless import

Sorry, we forgot to merge the PR. You can try it now.