Unable to benchmark pre-trained image model: freezing on import
rohanb2018 opened this issue · 14 comments
Hello, thanks for providing the code for your paper!
I have been unable to run either the benchmark_agent.py code (with the pre-trained model-10.th) or the data_collector.py script. Both scripts seem to freeze at some point in the import process, which I haven't been able to resolve. They don't throw errors (or print anything besides the pygame message) but seem to hang indefinitely.
I had to make a couple of modifications to the installed dependencies, because it seems that the RTX GPU family is not compatible with CUDA 9.0 or below (or with any packages that are built with CUDA 9.0 or below), as per pytorch/pytorch#17543, and because I ran into CUDA warnings and CUDNN errors when using the original dependencies (which were built with CUDA 8.0).
The relevant versions of dependencies on my system are as follows:
Package | Version |
---|---|
cudatoolkit | 10.1.243 |
pytorch | py3.5_cuda10.1.243_cudnn7.6.3_0 |
cudnn | 7.6.5 |
If anyone has guidance on how to debug/resolve this issue, I'd really appreciate it. Thanks so much!
My set-up:
GPU: GeForce RTX 2080 with Max-Q
CUDA version: 10.1
Hey @rohanb2018 ,
You are right about the incompatibility of CUDA 8.0 and RTX 2080.
Downgrading my dependencies to the following versions fixed the problem while I was testing it out.
cudatoolkit 10.0.130
cudnn 7.6.0
pytorch_1.0.0 py3.5_cuda10.0.130_cudnn7.4.1_1
Hope this helps
Thanks for your interest in our project!
If data_collector.py hangs, it probably points to issues other than the PyTorch installation (not 100% sure, though). Could you paste the messages you got? Also, make sure you installed our .egg file and that the port number matches the CARLA instance you launched, and try changing the order in which the carla- and pytorch-related imports happen.
Thanks for the response, guys!
@raks097: Thanks for the suggestion. Unfortunately, when I tried to conda install the PyTorch version you suggested, I ran into a conda UnsatisfiableError with a long list of incompatible specifications. It's weird, because all of the dependencies (and their compatible versions) for that version of PyTorch seem to be present in the environment, but conda still complains.
@dianchen96: Sure! The only console output I get from running either the agent benchmark script or the data collector script is just the pygame message:
pygame 1.9.4
Hello from the pygame community. https://www.pygame.org/contribute.html
In both cases I have to kill the running Python process because it doesn't respond to the usual keyboard interrupt.
I did confirm that I have your CARLA .egg file. Just to make sure, I re-downloaded and installed it, and the freezing issue persists. I also checked the port number and it seems fine.
I'll try playing around with the import order and see if that helps.
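For a script that hangs silently and ignores the keyboard interrupt, one generic way to see where it is stuck is Python's standard-library faulthandler module. The following is a minimal sketch, not something taken from this repository: the helper names arm_hang_watchdog and disarm_hang_watchdog and the 60-second default are assumptions made for illustration.

```python
import faulthandler
import sys


def arm_hang_watchdog(seconds=60, file=None):
    """Dump every thread's traceback and exit if still running after `seconds`."""
    # Hypothetical helper name. exit=True makes faulthandler terminate the
    # process after dumping, so a hung import does not have to be killed by hand.
    target = file if file is not None else sys.stderr
    faulthandler.dump_traceback_later(seconds, exit=True, file=target)


def disarm_hang_watchdog():
    """Cancel the pending dump once the suspect imports have finished."""
    faulthandler.cancel_dump_traceback_later()
```

Calling arm_hang_watchdog() as the very first statement of the script and disarm_hang_watchdog() right after the last import should print the line of the import that is stuck if the timeout expires.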
could be related to carla-simulator/carla#2132
Yeah, this might be it! I tried import carla followed by import torchvision and it freezes up, also requiring me to kill the process like I had to do for the agent benchmark code. I guess I have to move the torchvision import so that it comes before the first place carla is imported. I will try it and see.
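The experiment described above (import carla, then import torchvision) can be repeated without freezing the current shell by running each import order in a fresh interpreter with a timeout. This is a minimal sketch using only the standard library; the function name probe_import_order and the 60-second default are assumptions, and a "hang" result only means the imports did not finish within the timeout.

```python
import subprocess
import sys


def probe_import_order(modules, timeout=60):
    """Import `modules` in order in a fresh interpreter; return 'ok', 'error' or 'hang'."""
    code = "\n".join("import " + name for name in modules)
    try:
        result = subprocess.run(
            [sys.executable, "-c", code],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before re-raising, so nothing is left hanging.
        return "hang"
    # A non-zero exit code means an import raised (for example, module not installed).
    return "ok" if result.returncode == 0 else "error"
```

Comparing probe_import_order(["carla", "torchvision"]) with probe_import_order(["torchvision", "carla"]) in the affected environment would show whether the order alone makes the difference.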
One way I got around this previously was just to remove the torchvision import, since torchvision is only used for the ToTensor transform, and replace all the ToTensor calls with the following:
myToTensor = lambda x: (torch.FloatTensor(x) / 255.0).transpose(0, 1).transpose(0, 2).contiguous()
which converts a numpy uint8 image (height x width x channels, values 0-255) to a FloatTensor (channels x height x width, values 0-1) (taken from https://pytorch.org/docs/stable/_modules/torchvision/transforms/functional.html#to_tensor)
Great, I think I actually got the benchmark_agent.py example working now!
I ended up commenting out all of the torchvision imports inside the bird_view folder. This included replacing all of the ToTensor calls with the code that you mentioned. Additionally, I had to comment out the uses of torchvision.utils inside the logger.py and saver.py files. I replaced the calls to tv_utils.make_grid in both of those files by just copying the source code from PyTorch (https://pytorch.org/docs/stable/_modules/torchvision/utils.html#make_grid).
I assume I'll have to do the same for the files inside the training folder, since I see some references to torchvision there as well.
Thanks again for your help! Will let you know if I run into any other issues.
Note: another, easier way to get around this is to find the first instance of import carla and simply add import torchvision right before it.
Many libraries have this problem due to the way pytorch is compiled; see pytorch/pytorch#19739 (comment)
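To check whether a given file follows this ordering, the import statements can be inspected statically. The sketch below uses the standard-library ast module; the helper names first_import_lines and torch_precedes_carla are assumptions for illustration. It looks at one file only and does not follow imports made by other modules, so a passing check does not guarantee the order at runtime.

```python
import ast


def first_import_lines(source):
    """Map each top-level module name to the line of its first import statement."""
    first = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            names = [node.module]  # absolute "from x import y" only
        else:
            continue
        for name in names:
            root = name.split(".")[0]
            # ast.walk is not in line order, so keep the smallest line number.
            first[root] = min(first.get(root, node.lineno), node.lineno)
    return first


def torch_precedes_carla(source):
    """True if the file never imports carla, or imports torch and torchvision before it."""
    lines = first_import_lines(source)
    if "carla" not in lines:
        return True
    return all(
        mod in lines and lines[mod] < lines["carla"]
        for mod in ("torch", "torchvision")
    )
```

It could be applied to a script with torch_precedes_carla(open("benchmark_agent.py").read()).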
Hi @bradyz, thanks for sharing the code! When I tried to run benchmark_agent.py, it freezes on import torchvision inside bird_view/utils/carla_utils.py. After I comment out that line, the code runs until it freezes again on import torch in bird_view/utils/bz_utils/saver.py. Do you have any insight on how to fix this?
Besides, according to carla-simulator/carla#2132 (comment), this issue might have been fixed with CARLA 0.9.9. Is the latest version by any chance a viable option?
I am testing the latest version (031308).
Update: benchmark_agent.py no longer freezes with CARLA 0.9.9.4, but it runs into the following errors:
pygame 1.9.6
Hello from the pygame community. https://www.pygame.org/contribute.html
suite: FullTown02-v1
before run_benchmark
0%| | 0/100 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "benchmark_agent.py", line 69, in run
    run_benchmark(agent_maker, env, benchmark_dir, seed, autopilot, resume, max_run=max_run, show=show)
  File "/home/peiyunh/code/lbc/benchmark/run_benchmark.py", line 243, in run_benchmark
    result, diagnostics = run_single(env, weather, start, target, agent_maker, seed, autopilot, show=show)
  File "/home/peiyunh/code/lbc/benchmark/run_benchmark.py", line 174, in run_single
    env.init(start=start, target=target, weather=cu.PRESET_WEATHERS[weather])
  File "/home/peiyunh/code/lbc/benchmark/goal_suite.py", line 44, in init
    super().init(**kwargs)
  File "/home/peiyunh/code/lbc/bird_view/utils/carla_utils.py", line 497, in init
    self.spawn_player()
  File "/home/peiyunh/code/lbc/bird_view/utils/carla_utils.py", line 528, in spawn_player
    self._player.start_dtcrowd()
AttributeError: 'Vehicle' object has no attribute 'start_dtcrowd'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "benchmark_agent.py", line 94, in <module>
    run(Path(args.model_path), args.port, args.suite, args.big_cam, args.seed, args.autopilot, args.resume, max_run=args.max_run, show=args.show)
  File "benchmark_agent.py", line 69, in run
    run_benchmark(agent_maker, env, benchmark_dir, seed, autopilot, resume, max_run=max_run, show=show)
  File "/home/peiyunh/code/lbc/bird_view/utils/carla_utils.py", line 737, in __exit__
    self.clean_up()
  File "/home/peiyunh/code/lbc/benchmark/goal_suite.py", line 86, in clean_up
    super().clean_up()
  File "/home/peiyunh/code/lbc/bird_view/utils/carla_utils.py", line 626, in clean_up
    self._player.stop_dtcrowd()
AttributeError: 'Vehicle' object has no attribute 'stop_dtcrowd'
At the very very top of benchmark_agent can you try
import torch
import torchvision
@peiyunh start_dtcrowd/stop_dtcrowd only come with our custom CARLA 0.9.6 egg for the pedestrian fix. If you would like to use this repo with CARLA 0.9.9 you need to modify some of the utilities code.
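One possible shape for such a modification is to call the custom methods only when the actor actually provides them. This is a rough sketch under a big assumption: that skipping start_dtcrowd/stop_dtcrowd is acceptable on a stock CARLA build. It means the pedestrian fix is not applied, so benchmark results may differ from the paper's setup. The helper call_if_available and the two stand-in classes are hypothetical and are not part of the repository or of CARLA.

```python
def call_if_available(obj, method_name, *args, **kwargs):
    """Call obj.<method_name>(...) if it exists; return (called, result)."""
    method = getattr(obj, method_name, None)
    if callable(method):
        return True, method(*args, **kwargs)
    # The attribute is missing (e.g. stock CARLA build): skip instead of raising.
    return False, None


class StockVehicle:
    """Stand-in for a vehicle actor without the custom methods (hypothetical)."""


class PatchedVehicle:
    """Stand-in for a vehicle actor from the custom 0.9.6 egg (hypothetical)."""

    def start_dtcrowd(self):
        return "dtcrowd started"
```

In carla_utils.py, a line such as self._player.start_dtcrowd() would then become call_if_available(self._player, "start_dtcrowd"), and likewise for stop_dtcrowd.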
Thanks for the replies @bradyz @dianchen96 !
At the very very top of benchmark_agent can you try
import torch
import torchvision
This works! I am able to run benchmark_agent now. Thanks a lot!
Hi, I ran into the same problem. How did you fix it? I added
import torch
import torchvision
at the top of benchmark_agent.py, but the problem still exists.
Thanks!
At the very very top of benchmark_agent can you try
import torch
import torchvision
Of all the suggestions quoted here, this was the best one.
I tried it even with CUDA 12.5 and it really worked!
Thanks