hmorimitsu/ptlflow

error in train with ptlflow_demo_train.ipynb

Cyrus1993 opened this issue · 2 comments

Hi, when I run this code in Colab, I get the following error. Please advise; I did not change anything.

!python train.py raft_small \
    --gpus 1 \
    --train_dataset overfit-sintel \
    --val_dataset none \
    --train_batch_size 1 \
    --max_epochs 100 \
    --lr 1e-3

ERROR: torch_scatter not found. CSV requires torch_scatter library to run. Check instructions at: https://github.com/rusty1s/pytorch_scatter
Global seed set to 1234
GPU available: True, used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
01/02/2022 08:49:01 - WARNING: --train_crop_size is not set. It will be set as (432, 1024).
01/02/2022 08:49:01 - INFO: Loading 1 samples from Sintel_clean dataset.
Traceback (most recent call last):
File "train.py", line 152, in
train(args)
File "train.py", line 111, in train
trainer.fit(model)
File "/usr/local/lib/python3.7/dist-packages/pytorch_lightning/trainer/trainer.py", line 732, in fit
self._fit_impl, model, train_dataloaders, val_dataloaders, datamodule, ckpt_path
File "/usr/local/lib/python3.7/dist-packages/pytorch_lightning/trainer/trainer.py", line 682, in _call_and_handle_interrupt
return trainer_fn(*args, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/pytorch_lightning/trainer/trainer.py", line 768, in _fit_impl
results = self._run(model, ckpt_path=ckpt_path)
File "/usr/local/lib/python3.7/dist-packages/pytorch_lightning/trainer/trainer.py", line 1155, in _run
self.strategy.setup(self)
File "/usr/local/lib/python3.7/dist-packages/pytorch_lightning/strategies/single_device.py", line 76, in setup
super().setup(trainer)
File "/usr/local/lib/python3.7/dist-packages/pytorch_lightning/strategies/strategy.py", line 118, in setup
self.setup_optimizers(trainer)
File "/usr/local/lib/python3.7/dist-packages/pytorch_lightning/strategies/strategy.py", line 108, in setup_optimizers
self.lightning_module
File "/usr/local/lib/python3.7/dist-packages/pytorch_lightning/core/optimizer.py", line 174, in _init_optimizers_and_lr_schedulers
optim_conf = model.trainer._call_lightning_module_hook("configure_optimizers", pl_module=model)
File "/usr/local/lib/python3.7/dist-packages/pytorch_lightning/trainer/trainer.py", line 1535, in _call_lightning_module_hook
output = fn(*args, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/ptlflow/models/base_model/base_model.py", line 358, in configure_optimizers
optimizer, self.args.lr, total_steps=self.args.max_steps, pct_start=0.05, cycle_momentum=False, anneal_strategy='linear')
File "/usr/local/lib/python3.7/dist-packages/torch/optim/lr_scheduler.py", line 1452, in init
raise ValueError("Expected positive integer total_steps, but got {}".format(total_steps))
ValueError: Expected positive integer total_steps, but got -1

Hi, thank you for reporting this error.

This was caused by a change in the default value of max_steps in newer versions of PyTorch Lightning. The code in this repo was already updated, but the version on PyPI was not.
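For context: newer PyTorch Lightning versions use -1 instead of None as the default for Trainer(max_steps=...), so a configure_optimizers that forwards self.args.max_steps directly into OneCycleLR as total_steps (as in the traceback above) now receives -1. Below is a minimal sketch of the kind of guard that avoids this, assuming an AdamW optimizer and a train_dataloader() method on the module; it is illustrative only, not the exact code shipped in the fixed release.

```python
import torch


def configure_optimizers(self):
    optimizer = torch.optim.AdamW(self.parameters(), lr=self.args.lr)

    # PyTorch Lightning >= 1.5 uses -1 (not None) to mean "no step limit",
    # so derive a positive total_steps before handing it to OneCycleLR.
    if self.args.max_steps is None or self.args.max_steps < 1:
        # Hypothetical fallback: estimate steps from epochs and loader length.
        total_steps = self.args.max_epochs * len(self.train_dataloader())
    else:
        total_steps = self.args.max_steps

    lr_scheduler = torch.optim.lr_scheduler.OneCycleLR(
        optimizer,
        self.args.lr,
        total_steps=total_steps,
        pct_start=0.05,
        cycle_momentum=False,
        anneal_strategy='linear',
    )

    # Step the scheduler every training step, since OneCycleLR is per-step.
    return {
        'optimizer': optimizer,
        'lr_scheduler': {'scheduler': lr_scheduler, 'interval': 'step'},
    }
```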

I just pushed a new version to PyPI, and the Colab example should work now. If it doesn't, please check whether the installed ptlflow version is 0.2.5. If it isn't, you can force the installation of the new version with pip install --upgrade ptlflow.
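If you want to double-check which version Colab actually installed before retrying, a quick way is something like the snippet below (pkg_resources is just one convenient option; any method of reading the installed package version works):

```python
import pkg_resources

# Print the installed ptlflow version; it should be 0.2.5 or newer.
print(pkg_resources.get_distribution("ptlflow").version)
```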

I hope that helps.

Hi, excellent. It works perfectly now.
Thank you so much for developing this awesome repository.