pytorchbearer/torchbearer

ReduceLROnPlateau

FrancescoSaverioZuppichini opened this issue · 5 comments

Dear all,

first of all, I love this library.

ReduceLROnPlateau is not working when I call trial.evaluate.

...
    trial = Trial(cnn, optimizer, loss, metrics=['acc', 'loss'],
                  callbacks=[
                    #   CometCallback(experiment),
                      ReduceLROnPlateau(monitor='val_loss',
                                        factor=0.1, patience=5),
                    #   EarlyStopping(monitor='val_acc', patience=5, mode='max'),
                    #   CSVLogger('history.csv'),
                    #   ModelCheckpoint('best.pt', monitor='val_acc', mode='max')
    ]).to(device)
    trial.with_generators(train_generator=train_dl,
                          val_generator=val_dl, test_generator=test_dl)
    # history = trial.run(params['epochs'], verbose=1)
    preds = trial.evaluate(data_key=torchbearer.TEST_DATA)

error:

0/1(e): 100%|███████████████████████████████████| 1/1 [00:00<00:00,  2.18it/s, test_acc=0.4667, test_loss=0.6516]
Traceback (most recent call last):
  File "c:/Users/Francesco/Documents/PyTorch-Deep-Learning-Template/main.py", line 62, in <module>
    preds = trial.evaluate(data_key=torchbearer.TEST_DATA)
  File "C:\Users\Francesco\Anaconda3\envs\dl\lib\site-packages\torchbearer\trial.py", line 298, in wrapper
    res = func(self, *args, **kwargs)
  File "C:\Users\Francesco\Anaconda3\envs\dl\lib\site-packages\torchbearer\trial.py", line 133, in wrapper
    res = func(self, *args, **kwargs)
  File "C:\Users\Francesco\Anaconda3\envs\dl\lib\site-packages\torchbearer\trial.py", line 1131, in evaluate
    state[torchbearer.CALLBACK_LIST].on_end_epoch(state)
  File "C:\Users\Francesco\Anaconda3\envs\dl\lib\site-packages\torchbearer\callbacks\callbacks.py", line 221, in on_end_epoch
    self._for_list(lambda callback: callback.on_end_epoch(state))
  File "C:\Users\Francesco\Anaconda3\envs\dl\lib\site-packages\torchbearer\trial.py", line 105, in _for_list
    function(self.callback_list)
  File "C:\Users\Francesco\Anaconda3\envs\dl\lib\site-packages\torchbearer\callbacks\callbacks.py", line 221, in <lambda>
    self._for_list(lambda callback: callback.on_end_epoch(state))
  File "C:\Users\Francesco\Anaconda3\envs\dl\lib\site-packages\torchbearer\callbacks\callbacks.py", line 221, in on_end_epoch
    self._for_list(lambda callback: callback.on_end_epoch(state))
  File "C:\Users\Francesco\Anaconda3\envs\dl\lib\site-packages\torchbearer\callbacks\callbacks.py", line 66, in _for_list
    function(callback)
  File "C:\Users\Francesco\Anaconda3\envs\dl\lib\site-packages\torchbearer\callbacks\callbacks.py", line 221, in <lambda>
    self._for_list(lambda callback: callback.on_end_epoch(state))
  File "C:\Users\Francesco\Anaconda3\envs\dl\lib\site-packages\torchbearer\callbacks\torch_scheduler.py", line 32, in on_end_epoch
    self._scheduler.step(state[torchbearer.METRICS][self._monitor], epoch=state[torchbearer.EPOCH])
KeyError: 'val_loss'

You probably need to disable the callback when evaluating (or just check whether the monitored metric is in state['metrics']).
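
In the meantime, a possible stop-gap (just a sketch using the objects from the snippet above; eval_trial is a name I made up, not anything from torchbearer) is to run evaluation from a second Trial that has no scheduler callback attached, so nothing tries to look up 'val_loss' in the test metrics:

# Stop-gap sketch, not official torchbearer advice: evaluate from a separate
# Trial with no scheduler callback. cnn, optimizer, loss, test_dl and device
# are the same objects used in the snippet above.
import torchbearer
from torchbearer import Trial

eval_trial = Trial(cnn, optimizer, loss, metrics=['acc', 'loss']).to(device)
eval_trial.with_generators(test_generator=test_dl)
preds = eval_trial.evaluate(data_key=torchbearer.TEST_DATA)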

Thank you!

Best Regards,

Francesco Saverio

Hi Francesco,

Thanks for the issue.

I think it's probably best for now to just check whether the key we're monitoring is in the metrics dictionary and throw a warning if it isn't. This does mean that misspelling the monitor key at initialization (or a similar bug) would only raise a warning and effectively turn the scheduler off, but I suppose that's okay. If we work out a better solution we can come back to it in the future.
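
Roughly, the guard would look something like the sketch below. This is illustrative only: in the library the check would go into on_end_epoch in torch_scheduler.py, and _monitor and the step call are just the internals visible in the traceback, so don't read this as the final patch. The class name GuardedReduceLROnPlateau is made up for the example.

import warnings

import torchbearer
from torchbearer.callbacks import ReduceLROnPlateau


# Sketch of the proposed guard, shown here as a subclass rather than the
# in-library change. _monitor is the private attribute from the traceback.
class GuardedReduceLROnPlateau(ReduceLROnPlateau):
    def on_end_epoch(self, state):
        # Skip the step (with a warning) when the monitored key is missing,
        # e.g. when evaluate only produced test_* metrics.
        if self._monitor not in state[torchbearer.METRICS]:
            msg = "Monitor key '{}' not found in metrics; skipping the scheduler step"
            warnings.warn(msg.format(self._monitor))
            return
        super(GuardedReduceLROnPlateau, self).on_end_epoch(state)

In the meantime, something like this can also act as a drop-in replacement for the callback in your snippet.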

We could check what is under torchbearer.DATA and just turn the scheduler off when looking at test data. I'd rather not do that, though, since it reduces flexibility; someone might want to store a second validation set under the test data, for example.

@ethanwharris what do you think?

Thank you for the fast reply. I also think the best solution is to check whether the monitor key is in state['metrics']. I'm surprised that nobody else has run into this issue.

@MattPainter01 any news? Should I fork and create a pull request? Thank you

@MattPainter01
Hi Matt, I have tried uninstalling and reinstalling torchbearer, but I still get the same error. Maybe you need to push a new release to make the update available?

Thanks :)

You're right, we haven't done a new release in a while.
We'll try and push one today.