OpenDriveLab/Openpilot-Deepdive

training error


Hi,

I was training on comma2k19 with two A6000 GPUs in a PC running CUDA 11.5 and Ubuntu 20.04, launching one process in each of two terminals:

PORT=23345 SLURM_PROCID=0 SLURM_NTASKS=2 python main.py
PORT=23346 SLURM_PROCID=1 SLURM_NTASKS=2 python main.py

The first terminal printed the error below after starting. I also tried with a single GPU and got the same error. How can I solve this? Thanks.

[1676912307.07] starting job... 0 of 2
[1676912608.11] DDP Initialized at localhost:23345 0 of 2
2023-02-20 09:03:28.404838: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
Comma2k19SequenceDataset: DEMO mode is on.
Traceback (most recent call last):
  File "main.py", line 246, in <module>
    main(rank=int(os.environ['SLURM_PROCID']), world_size=int(os.environ['SLURM_NTASKS']), args=args)
  File "main.py", line 119, in main
    train_dataloader, val_dataloader = get_dataloader(rank, world_size, args.batch_size, False, args.n_workers)
  File "main.py", line 69, in get_dataloader
    train_sampler = DistributedSampler(train, **dist_sampler_params)
TypeError: __init__() got an unexpected keyword argument 'drop_last'

I suspect you are using a very old version of PyTorch.

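As the error message says, the DistributedSampler in your version does not accept drop_last; in older PyTorch releases, drop_last is only a parameter of DataLoader. If you don't want to upgrade torch, you can simply remove this parameter from the arguments passed to DistributedSampler (dist_sampler_params).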

https://pytorch.org/docs/stable/data.html
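If you would rather not edit the parameter out by hand, one option is to filter the keyword arguments against the sampler's actual constructor signature before calling it, so the same code runs on both old and new torch. This is a minimal sketch of that idea, not the repository's own fix. Only `DistributedSampler` and the dict name `dist_sampler_params` come from the traceback; the helper `supported_kwargs`, the stand-in class `OldStyleSampler`, and the dict contents are hypothetical.

```python
import inspect
import warnings


def supported_kwargs(func, kwargs):
    """Return the subset of `kwargs` that `func` (a function or a class) accepts.

    If the target takes **kwargs, every entry is passed through unchanged.
    Dropped keys are reported with a warning so the change is not silent.
    """
    # For classes, inspect __init__ directly, so a generic __new__(*args, **kwargs)
    # inherited from a base class cannot mask the real constructor signature.
    target = func.__init__ if inspect.isclass(func) else func
    params = inspect.signature(target).parameters
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return dict(kwargs)
    kept = {k: v for k, v in kwargs.items() if k in params}
    dropped = sorted(set(kwargs) - set(kept))
    if dropped:
        name = getattr(func, "__name__", repr(func))
        warnings.warn(f"{name} does not accept {dropped}; dropping them")
    return kept


class OldStyleSampler:
    """Hypothetical stand-in for a sampler whose constructor has no `drop_last`
    (like the one in the traceback); lets the helper be demoed without torch."""

    def __init__(self, dataset, num_replicas=None, rank=None, shuffle=True):
        self.dataset = dataset


if __name__ == "__main__":
    # Hypothetical contents: the issue does not show what main.py puts in this dict.
    dist_sampler_params = {"num_replicas": 2, "rank": 0, "shuffle": True, "drop_last": True}
    print(supported_kwargs(OldStyleSampler, dist_sampler_params))

    try:
        from torch.utils.data.distributed import DistributedSampler
    except ImportError:
        pass  # torch is not installed; only the stand-in demo above can run
    else:
        # num_replicas and rank are given explicitly, so no process group is needed here.
        params = supported_kwargs(DistributedSampler, dist_sampler_params)
        sampler = DistributedSampler(list(range(10)), **params)
        print(len(sampler))
```

Note that dropping the argument changes behavior slightly: without drop_last, DistributedSampler pads the index list with repeated samples so that every rank gets the same number of samples, instead of discarding the tail. The DataLoader's own drop_last still controls whether a final incomplete batch is kept.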