hmorimitsu/ptlflow

TypeError: unsupported operand type(s) for *: 'NoneType' and 'int'

coufin opened this issue · 35 comments

TypeError: unsupported operand type(s) for *: 'NoneType' and 'int'

/python3.8/site-packages/ptlflow/utils/callbacks/logger.py", line 411, in _compute_max_range
max_range = int(limit_batches * dataloader_length) - 1
TypeError: unsupported operand type(s) for *: 'NoneType' and 'int'

Hi, could you tell me which versions of pytorch and pytorch-lightning you are using?

torch version: '1.12.0+cu102'
lightning version: 1.6.4

Another problem occurs when I try to train with this command:
python3 train.py pwcnet --train_dataset kitti-training --val_dataset none --train_batch_size 1 --train_crop_size 512 128 --max_epochs 100 --lr 1e-3
the error is
/.local/lib/python3.10/site-packages/ptlflow/data/datasets.py", line 911, in __init__
assert len(img1_paths) == len(flow_paths), f'{len(img1_paths)} vs {len(flow_paths)}'
AssertionError: 0 vs 200

This means that the images are not found. You should check if the KITTI images are in the correct directories. For KITTI 2012, they should be at: <kitti_2012_root_dir>/training/colored_0/, and for KITTI 2015 it's: <kitti_2015_root_dir>/training/image_2/. The image names should also follow the KITTI standard, which is *_10.png and *_11.png.
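
If it helps, here is a quick sanity check you can run (a minimal sketch; the root path is an assumption, and I am assuming the KITTI 2015 layout with groundtruth under training/flow_occ):

# Hypothetical sanity check for the KITTI 2015 layout; adjust kitti_root to your setup.
from pathlib import Path

kitti_root = Path('/datasets/KITTI/2015')  # assumption: your KITTI 2015 root dir
img1_paths = sorted((kitti_root / 'training' / 'image_2').glob('*_10.png'))
flow_paths = sorted((kitti_root / 'training' / 'flow_occ').glob('*_10.png'))
print(len(img1_paths), 'first frames,', len(flow_paths), 'flow maps')
# The KITTI 2015 training set should yield 200 of each; 0 first frames
# reproduces the "0 vs 200" assertion above.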

/python3.8/site-packages/ptlflow/utils/callbacks/logger.py", line 411, in _compute_max_range
max_range = int(limit_batches * dataloader_length) - 1
TypeError: unsupported operand type(s) for *: 'NoneType' and 'int'

Thank you for reporting this problem.

It was caused by a change in a default argument value in pytorch-lightning. It is now fixed in the main branch.

To fix your code, please update to the latest version of this repo or downgrade your pytorch-lightning to 1.5.
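
For reference, the crash happens because newer pytorch-lightning versions pass limit_batches as None where older ones passed 1.0. The fix amounts to a guard along these lines (a sketch of the idea, not the exact patch):

def _compute_max_range(limit_batches, dataloader_length):
    # None now means "use all batches", so treat it as the old default of 1.0.
    if limit_batches is None:
        limit_batches = 1.0
    if isinstance(limit_batches, float):
        return int(limit_batches * dataloader_length) - 1
    return min(limit_batches, dataloader_length) - 1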

I want to know how to solve this error:
/.local/lib/python3.8/site-packages/ptlflow/models/base_model/base_model.py", line 337, in configure_optimizers
assert self.loss_fn is not None, f'Model {self.__class__.__name__} cannot be trained. It does not have loss function.'
AssertionError: Model LiteFlowNet cannot be trained. It does not have loss function.

This is currently a limitation of the code. Not all models can be trained.

In this case, this model does not have a loss function yet. This is because the original code was not in PyTorch and I did not convert the training part to my library. Making sure that the training is correct takes a long time, so I only added training when the original code also provided it in PyTorch.

Also, please note that I only tested the training using RAFT, and by default, the training hyperparameters also follow RAFT. So I do not guarantee that training other models will provide similar results to the original ones.
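
If you want to check in advance whether a given model can be trained, a sketch like this should work (it just repeats the same check the assertion performs; ptlflow.get_model is the regular model loader):

# Sketch: list which models have a loss function attached.
import ptlflow

for name in ['raft', 'pwcnet', 'liteflownet']:
    model = ptlflow.get_model(name)
    status = 'trainable' if model.loss_fn is not None else 'no loss function'
    print(f'{name}: {status}')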

And how do I solve this?
/.local/lib/python3.10/site-packages/torch/utils/data/sampler.py", line 107, in __init__
raise ValueError("num_samples should be a positive integer "
ValueError: num_samples should be a positive integer value, but got num_samples=0

It is working. Thank you

How can I calculate F1-score for the model?

F1 is returned as a metric when you use a model that predicts confidence, occlusion, or motion boundaries.

You can see the available metrics at: https://github.com/hmorimitsu/ptlflow/blob/main/ptlflow/utils/flow_metrics.py
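
For reference, F1 here is the standard binary F1 score over the predicted mask. A minimal sketch of the computation, assuming binarized prediction and groundtruth masks:

# Sketch: binary F1 over e.g. an occlusion mask.
import numpy as np

def f1_score(pred_mask, gt_mask):
    pred, gt = pred_mask.astype(bool), gt_mask.astype(bool)
    tp = np.sum(pred & gt)   # true positives
    fp = np.sum(pred & ~gt)  # false positives
    fn = np.sum(~pred & gt)  # false negatives
    return 2 * tp / max(2 * tp + fp + fn, 1)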

How can I see the metrics after training a model?

The metrics are logged by pytorch-lightning, which uses tensorboard for visualization by default.

If you want to see them in another way, you should change the base_model and maybe the train scripts.
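
For example, after training you can point tensorboard at the log directory that train.py created (the path below is a placeholder):

tensorboard --logdir <path_to_your_training_logs>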

Hello, I want to know what epe, px1, and px3 stand for in the results.

EPE (end-point-error): the average Euclidean distance between the prediction and the groundtruth
px1, px3: the percentage of predictions that are within 1 (or 3) pixels of the groundtruth
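
In code, for dense flow arrays of shape (H, W, 2), the two metrics boil down to something like this sketch:

# Sketch: EPE and pxN over dense flow fields of shape (H, W, 2).
import numpy as np

def epe(pred, gt):
    dist = np.linalg.norm(pred - gt, axis=-1)  # per-pixel Euclidean distance
    return float(dist.mean())

def px_n(pred, gt, n):
    dist = np.linalg.norm(pred - gt, axis=-1)
    return float((dist <= n).mean())  # fraction of pixels within n pixels of the groundtruth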

In TensorBoard I only see epe and outlier, but no F1. Where can I see it?

Are you using a model and a dataset that provide confidence, occlusion, or motion boundaries?

I use the PWCNet model, and the dataset is KITTI 2012.

Do you really mean F1 or FL (the metric from KITTI 2015)?

If FL, then it is the same as the outlier metric.

If you really want F1, then it is not available for PWCNet, as this model does not output confidence, occlusion, or motion boundaries predictions.

OK, but with the RAFT and FlowNetS models I also can't see F1. Is it not available for them either?
And what does FL mean for models?

Very few models have predictions compatible with F1. At the moment there is no list of models that support it; the only way to find out is to look at their code or read the respective papers.

For FL, I recommend you check the outlier results for the KITTI 2015 dataset.
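
For reference, KITTI 2015 counts a pixel as an outlier when its end-point error exceeds both 3 pixels and 5% of the groundtruth flow magnitude, so the Fl/outlier rate is roughly this sketch:

# Sketch: KITTI 2015 Fl (outlier rate) over valid pixels.
import numpy as np

def fl_outlier_rate(pred, gt, valid):
    err = np.linalg.norm(pred - gt, axis=-1)   # per-pixel EPE
    mag = np.linalg.norm(gt, axis=-1)          # groundtruth flow magnitude
    outlier = (err > 3.0) & (err > 0.05 * mag)
    return float(outlier[valid.astype(bool)].mean())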

OK, thank you.

When I run CRAFT I get this error:
RuntimeError: CUDA out of memory. Tried to allocate 286.00 MiB (GPU 0; 7.79 GiB total capacity; 6.15 GiB already allocated; 69.25 MiB free; 6.29 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.

This means CRAFT is too large for the GPU you are using. You either have to use smaller images, decrease the batch size, or run on a GPU with more memory.
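
For example, starting from the same kind of command as above, you could lower the crop size and keep the batch size at 1 (the values here are just illustrative):

python3 train.py craft --train_dataset kitti-training --val_dataset none --train_batch_size 1 --train_crop_size 256 96 --max_epochs 100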

I want to know where I can see the test metrics of these models.
I can only see training metrics like epe and px1 in TensorBoard.

When I train the dicl model, the error is this:
RuntimeError: The expanded size of the tensor (1) must match the existing size (0) at non-singleton dimension 3. Target sizes: [1, 32, 5, 1]. Tensor sizes: [32, 5, 0]

I want to know where I can see the test metrics of these models.
I can only see training metrics like epe and px1 in TensorBoard.

By default, validation is run at the end of every epoch of training. So you should see the validation metrics as well.

Did you configure the datasets.yml file to point to the validation datasets?

When I train the dicl model, the error is this:
RuntimeError: The expanded size of the tensor (1) must match the existing size (0) at non-singleton dimension 3. Target sizes: [1, 32, 5, 1]. Tensor sizes: [32, 5, 0]

Thanks for reporting. I'll take a look later to see what's wrong.

Can you tell me the difference between epe and loss?

EPE is just a value (the Euclidean distance), while the loss is the signal that drives the training. EPE could be used as the loss signal, but so could other values as well.
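
To make the distinction concrete, here is a sketch contrasting EPE as a reported metric with a different (hypothetical) training loss:

# Sketch: EPE as a metric vs. an L1 penalty as the loss signal.
# pred, gt: flow tensors of shape (B, 2, H, W).
import torch

def epe_metric(pred, gt):
    return torch.norm(pred - gt, dim=1).mean()  # a number to report

def l1_loss(pred, gt):
    return (pred - gt).abs().mean()  # a different signal to backpropagate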

I want to know what the loss signal is in the models, and if EPE is used as the loss signal, why are they different?

If I want to change the x-axis in TensorBoard from batch to epoch, what should I do?

Sorry, but I'm not sure. The logging is handled by PyTorch Lightning, so you can check their docs to find out how to change it.