stfwn/mscai-dl2

Bug related to drawing from `Beta`

Closed this issue · 3 comments

stfwn commented
```
Traceback (most recent call last):
  File "/home/stefan/university/deep-learning-2/project/src/train.py", line 89, in <module>
    main()
  File "/home/stefan/university/deep-learning-2/project/src/train.py", line 84, in main
    trainer.fit(model, datamodule=datamodule)
  File "/home/stefan/.local/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 768, in fit
    self._call_and_handle_interrupt(
  File "/home/stefan/.local/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 721, in _call_and_handle_interrupt
    return trainer_fn(*args, **kwargs)
  File "/home/stefan/.local/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 809, in _fit_impl
    results = self._run(model, ckpt_path=self.ckpt_path)
  File "/home/stefan/.local/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1234, in _run
    results = self._run_stage()
  File "/home/stefan/.local/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1321, in _run_stage
    return self._run_train()
  File "/home/stefan/.local/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1351, in _run_train
    self.fit_loop.run()
  File "/home/stefan/.local/lib/python3.10/site-packages/pytorch_lightning/loops/base.py", line 204, in run
    self.advance(*args, **kwargs)
  File "/home/stefan/.local/lib/python3.10/site-packages/pytorch_lightning/loops/fit_loop.py", line 269, in advance
    self._outputs = self.epoch_loop.run(self._data_fetcher)
  File "/home/stefan/.local/lib/python3.10/site-packages/pytorch_lightning/loops/base.py", line 204, in run
    self.advance(*args, **kwargs)
  File "/home/stefan/.local/lib/python3.10/site-packages/pytorch_lightning/loops/epoch/training_epoch_loop.py", line 208, in advance
    batch_output = self.batch_loop.run(batch, batch_idx)
  File "/home/stefan/.local/lib/python3.10/site-packages/pytorch_lightning/loops/base.py", line 204, in run
    self.advance(*args, **kwargs)
  File "/home/stefan/.local/lib/python3.10/site-packages/pytorch_lightning/loops/batch/training_batch_loop.py", line 88, in advance
    outputs = self.optimizer_loop.run(split_batch, optimizers, batch_idx)
  File "/home/stefan/.local/lib/python3.10/site-packages/pytorch_lightning/loops/base.py", line 204, in run
    self.advance(*args, **kwargs)
  File "/home/stefan/.local/lib/python3.10/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 203, in advance
    result = self._run_optimization(
  File "/home/stefan/.local/lib/python3.10/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 256, in _run_optimization
    self._optimizer_step(optimizer, opt_idx, batch_idx, closure)
  File "/home/stefan/.local/lib/python3.10/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 369, in _optimizer_step
    self.trainer._call_lightning_module_hook(
  File "/home/stefan/.local/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1593, in _call_lightning_module_hook
    output = fn(*args, **kwargs)
  File "/home/stefan/.local/lib/python3.10/site-packages/pytorch_lightning/core/lightning.py", line 1625, in optimizer_step
    optimizer.step(closure=optimizer_closure)
  File "/home/stefan/.local/lib/python3.10/site-packages/pytorch_lightning/core/optimizer.py", line 168, in step
    step_output = self._strategy.optimizer_step(self._optimizer, self._optimizer_idx, closure, **kwargs)
  File "/home/stefan/.local/lib/python3.10/site-packages/pytorch_lightning/strategies/strategy.py", line 193, in optimizer_step
    return self.precision_plugin.optimizer_step(model, optimizer, opt_idx, closure, **kwargs)
  File "/home/stefan/.local/lib/python3.10/site-packages/pytorch_lightning/plugins/precision/precision_plugin.py", line 155, in optimizer_step
    return optimizer.step(closure=closure, **kwargs)
  File "/home/stefan/.local/lib/python3.10/site-packages/torch/optim/optimizer.py", line 88, in wrapper
    return func(*args, **kwargs)
  File "/home/stefan/.local/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/home/stefan/.local/lib/python3.10/site-packages/torch/optim/adam.py", line 100, in step
    loss = closure()
  File "/home/stefan/.local/lib/python3.10/site-packages/pytorch_lightning/plugins/precision/precision_plugin.py", line 140, in _wrap_closure
    closure_result = closure()
  File "/home/stefan/.local/lib/python3.10/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 148, in __call__
    self._result = self.closure(*args, **kwargs)
  File "/home/stefan/.local/lib/python3.10/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 134, in closure
    step_output = self._step_fn()
  File "/home/stefan/.local/lib/python3.10/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 427, in _training_step
    training_step_output = self.trainer._call_strategy_hook("training_step", *step_kwargs.values())
  File "/home/stefan/.local/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1763, in _call_strategy_hook
    output = fn(*args, **kwargs)
  File "/home/stefan/.local/lib/python3.10/site-packages/pytorch_lightning/strategies/strategy.py", line 333, in training_step
    return self.model.training_step(*args, **kwargs)
  File "/home/stefan/university/deep-learning-2/project/src/models.py", line 341, in training_step
    preds, p, halted_at, lambdas, (alphas, betas) = self(x)
  File "/home/stefan/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/stefan/university/deep-learning-2/project/src/models.py", line 292, in forward
    y_hat_n, state, lambda_n, beta_params = self.step_function(x, state)
  File "/home/stefan/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/stefan/university/deep-learning-2/project/src/models.py", line 775, in forward
    distribution = dist_beta.Beta(alpha, beta)
  File "/home/stefan/.local/lib/python3.10/site-packages/torch/distributions/beta.py", line 36, in __init__
    self._dirichlet = Dirichlet(concentration1_concentration0, validate_args=validate_args)
  File "/home/stefan/.local/lib/python3.10/site-packages/torch/distributions/dirichlet.py", line 52, in __init__
    super(Dirichlet, self).__init__(batch_shape, event_shape, validate_args=validate_args)
  File "/home/stefan/.local/lib/python3.10/site-packages/torch/distributions/distribution.py", line 55, in __init__
    raise ValueError(
ValueError: Expected parameter concentration (Tensor of shape (32, 2)) of distribution Dirichlet(concentration: torch.Size([32, 2])) to satisfy the constraint IndependentConstraint(GreaterThan(lower_bound=0.0), 1), but found invalid values:
tensor([[nan, nan],
        [nan, nan],
        [nan, nan],
        [nan, nan],
        [nan, nan],
        [nan, nan],
        [nan, nan],
        [nan, nan],
        [nan, nan],
        [nan, nan],
        [nan, nan],
        [nan, nan],
        [nan, nan],
        [nan, nan],
        [nan, nan],
        [nan, nan],
        [nan, nan],
        [nan, nan],
        [nan, nan],
        [nan, nan],
        [nan, nan],
        [nan, nan],
        [nan, nan],
        [nan, nan],
        [nan, nan],
        [nan, nan],
        [nan, nan],
        [nan, nan],
        [nan, nan],
        [nan, nan],
        [nan, nan],
        [nan, nan]], device='cuda:0', grad_fn=<StackBackward0>)
```

In my experience this happens when the gradients explode. To fix it we could try gradient clipping; however, it doesn't occur very often, so I'm not sure it's worth the effort right now.
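For what it's worth, the crash is easy to reproduce outside the model, since `torch.distributions.Beta` validates its parameters by default. A minimal sketch (the `check_beta_params` guard is just an idea, not something that's in our code):

```python
import torch
from torch.distributions import Beta

# Minimal repro: NaN concentrations fail Beta's default argument validation,
# raising the same ValueError as in the traceback above.
alpha = torch.full((32,), float("nan"))
beta = torch.full((32,), float("nan"))
try:
    Beta(alpha, beta)
except ValueError as err:
    print(err)  # "Expected parameter concentration ... found invalid values"


def check_beta_params(alpha: torch.Tensor, beta: torch.Tensor) -> None:
    """Hypothetical guard for forward(): fail early with a clearer message."""
    if not (torch.isfinite(alpha).all() and torch.isfinite(beta).all()):
        raise RuntimeError("Beta params are non-finite; gradients likely exploded")
```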

stfwn commented

Yeah, makes sense.

Gradient clipping is just one parameter to the `Trainer` (e.g. `gradient_clip_val=0.5`), but it's hard to know what value to set it to.
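For the record, it would look something like this; the 0.5 is a guess rather than a tuned value:

```python
import pytorch_lightning as pl

# Clip the gradient norm before each optimizer step; 0.5 is a placeholder.
trainer = pl.Trainer(
    gradient_clip_val=0.5,
    gradient_clip_algorithm="norm",  # the default; "value" clips elementwise
)
```

One way to pick the threshold would be to log gradient norms for a few runs first (I think `track_grad_norm=2` on the `Trainer` does this) and then clip a bit above the typical norm.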

Also, maybe we should change the title for professionalism's sake.