dotchen/LearningByCheating

Unable to benchmark pre-trained image model: freezing on import

rohanb2018 opened this issue · 14 comments

Hello, thanks for providing the code for your paper!

I have been unable to run either the benchmark_agent.py code (with the pre-trained model-10.th), or the data_collector.py script. Both scripts seem to be freezing at some point in the import process, which I haven't been able to resolve. They don't throw errors (or print anything besides the pygame message) but seem to hang indefinitely.

I had to make a couple of modifications to the installed dependencies. The RTX GPU family does not seem to be compatible with CUDA 9.0 or below (or with any packages built against CUDA 9.0 or below), as per pytorch/pytorch#17543, and I ran into CUDA warnings and cuDNN errors when using the original dependencies (which were built with CUDA 8.0).

The relevant versions of dependencies on my system are as follows:

Package      Version
cudatoolkit  10.1.243
pytorch      py3.5_cuda10.1.243_cudnn7.6.3_0
cudnn        7.6.5

If anyone has guidance on how to debug/resolve this issue, I'd really appreciate it. Thanks so much!


My set-up:
GPU: GeForce RTX 2080 with Max-Q
CUDA version: 10.1

Hey @rohanb2018 ,
You are right about the incompatibility of CUDA 8.0 and RTX 2080.

Downgrading my dependencies to the following versions fixed the problem while I was testing it out.

cudatoolkit 10.0.130
cudnn 7.6.0
pytorch_1.0.0 py3.5_cuda10.0.130_cudnn7.4.1_1

Hope this helps

Thanks for your interest in our project!

If data_collector.py hangs, that probably points to something other than the pytorch installation (not 100% sure though). Could you paste the messages you got? Also, make sure you installed our .egg file and that the port number matches the CARLA instance you launched, and try changing the order in which the carla/pytorch related modules are imported.

Thanks for the response, guys!

@raks097: Thanks for the suggestion. Unfortunately, when trying to conda install the PyTorch version you suggested, I ran into a conda UnsatisfiableError with a long list of incompatible specifications. It's weird because all of the dependencies (and their compatible versions) for that version of PyTorch seem to be present in the environment, but conda still complains.

@dianchen96: Sure! The only console output I get from running either the agent benchmark script or the data collector script is just the pygame message:

pygame 1.9.4
Hello from the pygame community. https://www.pygame.org/contribute.html

In both cases I have to kill the running Python process because it doesn't respond to the usual keyboard interrupt.

I did confirm that I have your CARLA .egg file - just to make sure, I re-downloaded and installed it and the freezing issue persists. Also checked the port number and it seems fine.

I'll try playing around with the import order and see if that helps.
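For anyone debugging a similar silent hang, here is a minimal sketch for locating the blocking import, assuming the freeze happens while Python is importing a module. It uses the standard-library faulthandler module to dump every thread's stack if an import takes too long. The helper name import_with_watchdog and the 30-second default are hypothetical choices, not part of this repo.

```python
import faulthandler
import importlib


def import_with_watchdog(module_name, timeout_s=30.0, exit_on_timeout=False):
    """Import a module, dumping all thread stacks to stderr if it stalls.

    Hypothetical helper for illustration only. If the import has not
    finished after timeout_s seconds, faulthandler prints the Python
    traceback of every thread, which shows the import line that is stuck.
    With exit_on_timeout=True the process also exits after the dump,
    which helps when Ctrl-C is ignored.
    """
    faulthandler.dump_traceback_later(timeout_s, exit=exit_on_timeout)
    try:
        return importlib.import_module(module_name)
    finally:
        # The import returned (or raised), so the pending dump is not needed.
        faulthandler.cancel_dump_traceback_later()
```

Wrapping the suspect imports one at a time, for example import_with_watchdog("carla") followed by import_with_watchdog("torchvision"), should make the dump point at whichever import never returns.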

@rohanb2018

could be related to carla-simulator/carla#2132

Yeah this might be it! I tried import carla followed by import torchvision and it freezes up, also requiring me to kill the process like I had to do for the agent benchmark code. I guess I have to move the torchvision import to be earlier than the first instance of wherever carla is imported - will try it and see.

one way I got around this previously was just to remove the torchvision import,
since torchvision is only used for the ToTensor transform, and replace all the ToTensor with the following

myToTensor = lambda x: (torch.FloatTensor(x) / 255.0).transpose(0, 1).transpose(0, 2).contiguous()

which converts a numpy uint8 to a FloatTensor (taken from https://pytorch.org/docs/stable/_modules/torchvision/transforms/functional.html#to_tensor)

Great, I think I actually got the benchmark_agent.py example working now!

I ended up commenting out all of the torchvision imports inside the bird_view folder. This included replacing all of the ToTensor calls with the code that you mentioned. Additionally, I had to comment out instances of torchvision.utils inside the logger.py and saver.py files. I replaced the calls to tv_utils.make_grid in both of those files by just copying the source code from PyTorch (https://pytorch.org/docs/stable/_modules/torchvision/utils.html#make_grid).

I assume I'll have to do the same to the files inside the training folder since I see some references to torchvision there as well.

Thanks again for your help! Will let you know if I run into any other issues.

note - another easier way to get around this is to just find the first instance of

import carla

and simply add import torchvision
right before that

many libraries have this problem due to the way pytorch is compiled,
see pytorch/pytorch#19739 (comment)
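The import-order workaround above can be sketched as a small helper that loads modules in a fixed sequence. This is only an illustration under the assumption that loading torch and torchvision before carla is what avoids the hang; import_in_order and PREFERRED_ORDER are hypothetical names, and modules that are not installed are recorded as None rather than raising.

```python
import importlib

# Assumed order, taken from the workaround in this thread:
# torch and torchvision first, carla last.
PREFERRED_ORDER = ("torch", "torchvision", "carla")


def import_in_order(names=PREFERRED_ORDER):
    """Import each module in the given order and return a name -> module dict.

    Hypothetical helper for illustration. A module that cannot be
    imported is stored as None so the caller can see what is missing.
    """
    loaded = {}
    for name in names:
        try:
            loaded[name] = importlib.import_module(name)
        except ImportError:
            loaded[name] = None
    return loaded
```

Calling import_in_order() as the first statement of the entry script has the same effect as writing the three imports by hand in that order; plain import statements at the top of the file are simpler if nothing needs to be optional.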

Hi @bradyz , thanks for sharing the code! When I tried to run benchmark_agent.py, it freezes on import torchvision inside bird_view/utils/carla_utils.py. After I comment out the line, the code is able to run until it again freezes on import torch in bird_view/utils/bz_utils/saver.py. Do you have any insight on how to fix this?

Besides, according to carla-simulator/carla#2132 (comment), this issue might have been fixed with CARLA 0.9.9. Is the latest version by any chance a viable option?

I am testing the latest version (031308).

Update: benchmark_agent.py no longer freezes with CARLA 0.9.9.4 but will run into the following errors:

pygame 1.9.6
Hello from the pygame community. https://www.pygame.org/contribute.html
suite: FullTown02-v1
before run_benchmark
  0%|                                                                                                                         | 0/100 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "benchmark_agent.py", line 69, in run
    run_benchmark(agent_maker, env, benchmark_dir, seed, autopilot, resume, max_run=max_run, show=show)
  File "/home/peiyunh/code/lbc/benchmark/run_benchmark.py", line 243, in run_benchmark
    result, diagnostics = run_single(env, weather, start, target, agent_maker, seed, autopilot, show=show)
  File "/home/peiyunh/code/lbc/benchmark/run_benchmark.py", line 174, in run_single
    env.init(start=start, target=target, weather=cu.PRESET_WEATHERS[weather])
  File "/home/peiyunh/code/lbc/benchmark/goal_suite.py", line 44, in init
    super().init(**kwargs)
  File "/home/peiyunh/code/lbc/bird_view/utils/carla_utils.py", line 497, in init
    self.spawn_player()
  File "/home/peiyunh/code/lbc/bird_view/utils/carla_utils.py", line 528, in spawn_player
    self._player.start_dtcrowd()
AttributeError: 'Vehicle' object has no attribute 'start_dtcrowd'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "benchmark_agent.py", line 94, in <module>
    run(Path(args.model_path), args.port, args.suite, args.big_cam, args.seed, args.autopilot, args.resume, max_run=args.max_run, show=args.show)
  File "benchmark_agent.py", line 69, in run
    run_benchmark(agent_maker, env, benchmark_dir, seed, autopilot, resume, max_run=max_run, show=show)
  File "/home/peiyunh/code/lbc/bird_view/utils/carla_utils.py", line 737, in __exit__
    self.clean_up()
  File "/home/peiyunh/code/lbc/benchmark/goal_suite.py", line 86, in clean_up
    super().clean_up()
  File "/home/peiyunh/code/lbc/bird_view/utils/carla_utils.py", line 626, in clean_up
    self._player.stop_dtcrowd()
AttributeError: 'Vehicle' object has no attribute 'stop_dtcrowd'

At the very very top of benchmark_agent can you try

import torch
import torchvision

@peiyunh start_dtcrowd/stop_dtcrowd only come with our custom CARLA 0.9.6 egg for the pedestrian fix. If you would like to use this repo with CARLA 0.9.9 you need to modify some of the utilities code.
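One possible shape for that modification, as a minimal sketch: call start_dtcrowd/stop_dtcrowd only when the vehicle object actually has them. The helper call_if_available is a hypothetical name, not part of this repo, and skipping these calls means the pedestrian fix from the custom 0.9.6 egg is not active, so benchmark behavior on a stock CARLA build may differ.

```python
def call_if_available(obj, method_name):
    """Call obj.<method_name>() if it exists; return True if it was called.

    Hypothetical helper for illustration. On a stock CARLA build the
    Vehicle object has no start_dtcrowd/stop_dtcrowd, so the call is
    skipped instead of raising AttributeError.
    """
    method = getattr(obj, method_name, None)
    if callable(method):
        method()
        return True
    return False
```

In spawn_player and clean_up, the direct calls would then become call_if_available(self._player, "start_dtcrowd") and call_if_available(self._player, "stop_dtcrowd").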

Thanks for the replies @bradyz @dianchen96 !

At the very very top of benchmark_agent can you try

import torch
import torchvision

This works! I am able to run benchmark_agent now. Thanks a lot!

Hi, I ran into the same problem. How did you fix it? I added
import torch
import torchvision
as the first lines of benchmark_agent.py, but the problem still exists.
Thanks!

At the very very top of benchmark_agent can you try

import torch
import torchvision

Of all the suggestions here, this was the best one.
I tried it even with CUDA 12.5 and it really worked!
Thanks