NVIDIA A100-SXM4-40GB with CUDA capability sm_80 is not compatible with the current PyTorch installation.
andysingal opened this issue · 2 comments
andysingal commented
Describe the bug
Please share your notebook link so that we can reproduce the error
https://colab.research.google.com/drive/1Mw1K4QuCmnSp6YGFmqpmWtdECWBNVX5X?usp=sharing
ERROR:
/usr/local/lib/python3.10/dist-packages/torch/cuda/__init__.py:145: UserWarning:
NVIDIA A100-SXM4-40GB with CUDA capability sm_80 is not compatible with the current PyTorch installation.
The current PyTorch install supports CUDA capabilities sm_37 sm_50 sm_60 sm_70.
If you want to use the NVIDIA A100-SXM4-40GB GPU with PyTorch, please check the instructions at https://pytorch.org/get-started/locally/
warnings.warn(incompatible_device_warn.format(device_name, capability, " ".join(arch_list), device_name))
[WARNING] Trainer has no policies, not saving anything.
Traceback (most recent call last):
File "/usr/local/bin/mlagents-learn", line 33, in <module>
sys.exit(load_entry_point('mlagents', 'console_scripts', 'mlagents-learn')())
File "/content/ml-agents/ml-agents/mlagents/trainers/learn.py", line 264, in main
run_cli(parse_command_line())
File "/content/ml-agents/ml-agents/mlagents/trainers/learn.py", line 260, in run_cli
run_training(run_seed, options, num_areas)
File "/content/ml-agents/ml-agents/mlagents/trainers/learn.py", line 136, in run_training
tc.start_learning(env_manager)
File "/content/ml-agents/ml-agents-envs/mlagents_envs/timers.py", line 305, in wrapped
return func(*args, **kwargs)
File "/content/ml-agents/ml-agents/mlagents/trainers/trainer_controller.py", line 172, in start_learning
self._reset_env(env_manager)
File "/content/ml-agents/ml-agents-envs/mlagents_envs/timers.py", line 305, in wrapped
return func(*args, **kwargs)
File "/content/ml-agents/ml-agents/mlagents/trainers/trainer_controller.py", line 107, in _reset_env
self._register_new_behaviors(env_manager, env_manager.first_step_infos)
File "/content/ml-agents/ml-agents/mlagents/trainers/trainer_controller.py", line 267, in _register_new_behaviors
self._create_trainers_and_managers(env_manager, new_behavior_ids)
File "/content/ml-agents/ml-agents/mlagents/trainers/trainer_controller.py", line 165, in _create_trainers_and_managers
self._create_trainer_and_manager(env_manager, behavior_id)
File "/content/ml-agents/ml-agents/mlagents/trainers/trainer_controller.py", line 137, in _create_trainer_and_manager
policy = trainer.create_policy(
File "/content/ml-agents/ml-agents/mlagents/trainers/ppo/trainer.py", line 194, in create_policy
policy = TorchPolicy(
File "/content/ml-agents/ml-agents/mlagents/trainers/policy/torch_policy.py", line 41, in __init__
GlobalSteps()
File "/content/ml-agents/ml-agents/mlagents/trainers/torch_entities/networks.py", line 748, in __init__
torch.Tensor([0]).to(torch.int64), requires_grad=False
RuntimeError: CUDA error: no kernel image is available for execution on the device
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
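The warning above compares the GPU's compute capability (sm_80) with the architectures the installed PyTorch binary was compiled for. Below is a minimal sketch of that comparison, not PyTorch's own implementation. The helper names `parse_sm` and `is_capability_supported` are hypothetical. The rule used is a simplification and an assumption: a compiled `sm_XY` kernel is treated as usable only on a device with the same major version and an equal or higher minor version, and PTX entries such as `compute_80` are ignored.

```python
import re


def parse_sm(arch):
    """Return the integer after 'sm_' (e.g. 'sm_80' -> 80), or None for other entries."""
    match = re.match(r"sm_(\d+)", arch)
    return int(match.group(1)) if match else None


def is_capability_supported(capability, arch_list):
    """Simplified check: same major version, compiled minor <= device minor.

    Assumption: PTX entries ('compute_XY') are ignored, so this can report
    False on builds that could still JIT-compile for the device.
    """
    major, minor = capability
    for arch in arch_list:
        sm = parse_sm(arch)
        if sm is not None and sm // 10 == major and sm % 10 <= minor:
            return True
    return False


# Values taken from the warning in this issue: an A100 (sm_80) against a
# build that supports sm_37, sm_50, sm_60 and sm_70.
warned_arch_list = ["sm_37", "sm_50", "sm_60", "sm_70"]
print(is_capability_supported((8, 0), warned_arch_list))  # False

# Optional: run the same check against the local installation, if any.
try:
    import torch
except ImportError:
    torch = None

if torch is not None and torch.cuda.is_available():
    capability = torch.cuda.get_device_capability(0)
    arch_list = torch.cuda.get_arch_list()
    print("device:", torch.cuda.get_device_name(0), "capability:", capability)
    print("build:", torch.__version__, "CUDA:", torch.version.cuda)
    print("compiled for:", arch_list)
    print("supported:", is_capability_supported(capability, arch_list))
```

If the last line prints `False`, the usual remedy is to install a PyTorch build compiled for a CUDA version that includes the GPU's architecture, following the instructions linked in the warning.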
Material
- Did you use Google Colab? Yes
simoninithomas commented
Hey there 👋 I just checked the notebook, and it seems you no longer get this error? 🤔
Your model: https://huggingface.co/Andyrasika/ppo-Huggy
simoninithomas commented
Closing the issue for now 🤗