IsakZhang/ABSA-QUAD

Address issues that arise when training with this repository.

kisejin opened this issue · 4 comments

Hello, thanks for sharing your code.

Currently, when I run this repository I encounter several issues, and I have found workarounds that seem to work well. I am sharing them here in the hope that the author can confirm or correct them.

  1. The silence parameter of read_line_examples_from_file has no default value:
    Solve:
def read_line_examples_from_file(data_path, silence=False):
    ...
  2. hparams can no longer be assigned directly in the T5FineTuner model with Lightning >= 2.0.0 (see the module sketch after this list):
    Solve:
self.hparams.update(vars(hparams))
  3. Newer Lightning versions no longer support training_epoch_end and validation_epoch_end, so add the prefix on_ to these hook names (on_train_epoch_end, on_validation_epoch_end); the sketch after this list shows them in context.

  4. Lightning 2.0 performs the gradient step inside its own hooks, so I commented out the optimizer_step override because I observed issues with the optimizer closure.

  5. In Lightning 2.0 the epoch-end hooks no longer receive the step outputs, so on_validation_epoch_end cannot take an outputs argument. I therefore removed it and instead collect the step outputs on the module and aggregate them at the end of the epoch, as in the following snippet:

import torch
import lightning as L

class MyLightningModule(L.LightningModule):
    def __init__(self):
        super().__init__()
        self.validation_step_outputs = []

    def validation_step(self, batch, batch_idx):
        loss = ...
        self.validation_step_outputs.append(loss)
        return loss

    def on_validation_epoch_end(self):
        epoch_average = torch.stack(self.validation_step_outputs).mean()
        self.log("validation_epoch_average", epoch_average)
        self.validation_step_outputs.clear()  # free memory
  6. The gpus param no longer exists in the Lightning Trainer; instead, pass devices='auto' to detect the available GPUs automatically and set accelerator='gpu' (a usage sketch follows this list):
    Solve:
    train_params = dict(
        default_root_dir=args.output_dir,
        accumulate_grad_batches=args.gradient_accumulation_steps,
        devices='auto',
        gradient_clip_val=1.0,
        max_epochs=args.num_train_epochs,
        callbacks=[LoggingCallback()],
        accelerator='gpu'
    )
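For reference, here is a minimal sketch of how points 2 and 3 fit together in a Lightning >= 2.0 module. This is not the repository's exact code: the model_name_or_path hyperparameter name and the body of the hooks are assumptions about how T5FineTuner is set up.

import pytorch_lightning as pl
from transformers import T5ForConditionalGeneration, T5Tokenizer


class T5FineTuner(pl.LightningModule):
    def __init__(self, hparams):
        super().__init__()
        # Lightning >= 2.0 no longer allows `self.hparams = hparams`;
        # update the internal hparams dict instead (point 2).
        self.hparams.update(vars(hparams))
        # model_name_or_path is an assumed hyperparameter name.
        self.model = T5ForConditionalGeneration.from_pretrained(self.hparams.model_name_or_path)
        self.tokenizer = T5Tokenizer.from_pretrained(self.hparams.model_name_or_path)

    # training_epoch_end / validation_epoch_end were removed in Lightning 2.0;
    # the replacement hooks carry the on_ prefix and take no outputs argument (point 3).
    def on_train_epoch_end(self):
        pass

    def on_validation_epoch_end(self):
        pass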
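And a short usage sketch for point 6: building the trainer from the train_params dict above and launching training. The names model and args follow the repository's main script by assumption.

import pytorch_lightning as pl

trainer = pl.Trainer(**train_params)
trainer.fit(model)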

These are solutions I have pieced together from references online. If there are any errors, please bear with me, and feel free to provide additional feedback.

@kisejin That's nicely said.

I would just like to mention one more point:

    def optimizer_step(self, epoch, batch_idx, optimizer, optimizer_idx, second_order_closure=None):
        if self.trainer.use_tpu:
            xm.optimizer_step(optimizer)
        else:
            optimizer.step()
            optimizer.zero_grad()
            self.lr_scheduler.step()

The use_tpu attribute is causing an issue as well, so I removed that branch:

    def optimizer_step(self, epoch, batch_idx, optimizer, optimizer_idx, second_order_closure=None):
        optimizer.step()
        optimizer.zero_grad()
        self.lr_scheduler.step()
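An alternative sketch that avoids overriding optimizer_step at all: since Lightning >= 2.0 steps the optimizer itself, you can return the scheduler from configure_optimizers and let the trainer step both. This method would live inside the same LightningModule; the learning_rate and warmup_steps hyperparameter names are assumptions, and get_linear_schedule_with_warmup stands in for whatever warm-up schedule the original code builds.

import torch
from transformers import get_linear_schedule_with_warmup

# Goes inside the LightningModule, replacing both the optimizer_step override
# and the manual self.lr_scheduler bookkeeping.
def configure_optimizers(self):
    optimizer = torch.optim.AdamW(self.parameters(), lr=self.hparams.learning_rate)
    scheduler = get_linear_schedule_with_warmup(
        optimizer,
        num_warmup_steps=self.hparams.warmup_steps,  # assumed hparam name
        num_training_steps=self.trainer.estimated_stepping_batches,
    )
    # "interval": "step" makes Lightning call scheduler.step() after every batch,
    # matching the behaviour of the removed optimizer_step override.
    return {
        "optimizer": optimizer,
        "lr_scheduler": {"scheduler": scheduler, "interval": "step"},
    }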

@kisejin Thanks for sharing this with us.

However, I am not able to reproduce the performance reported in the paper.

On the res15 dataset, across 200 runs,
I have seen scores between 0.42XX and 0.46XX.

Is anyone else seeing a significant performance difference?

@ssoyaavv are you testing this on a CPU device?

@LawrenceMoruye I used a GPU for testing.
