CUDA out of memory
Opened this issue · 2 comments
The device clearly has free memory, but it still reports out of memory.
I changed device = torch.device("cuda:1" if torch.cuda.is_available() else "cpu") in train.py to use the second GPU, but it still uses the first one.
(FastSpeech) gaoyiyao@deeplearning-Z10PE-D8-WS:/home_1/gaoyiyao/FastSpeech2-master$ python3 train.py -p config/LJSpeech/preprocess.yaml -m config/LJSpeech/model.yaml -t config/LJSpeech/train.yaml
Prepare training ...
Number of FastSpeech2 Parameters: 35159361
Traceback (most recent call last):
File "train.py", line 199, in
main(args, configs)
File "train.py", line 49, in main
vocoder = get_vocoder(model_config, device)
File "/home_1/gaoyiyao/FastSpeech2-master/utils/model.py", line 63, in get_vocoder
ckpt = torch.load("hifigan/generator_LJSpeech.pth.tar")
File "/home_1/gaoyiyao/anaconda3/envs/FastSpeech/lib/python3.8/site-packages/torch/serialization.py", line 595, in load
return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
File "/home_1/gaoyiyao/anaconda3/envs/FastSpeech/lib/python3.8/site-packages/torch/serialization.py", line 774, in _legacy_load
result = unpickler.load()
File "/home_1/gaoyiyao/anaconda3/envs/FastSpeech/lib/python3.8/site-packages/torch/serialization.py", line 730, in persistent_load
deserialized_objects[root_key] = restore_location(obj, location)
File "/home_1/gaoyiyao/anaconda3/envs/FastSpeech/lib/python3.8/site-packages/torch/serialization.py", line 175, in default_restore_location
result = fn(storage, location)
File "/home_1/gaoyiyao/anaconda3/envs/FastSpeech/lib/python3.8/site-packages/torch/serialization.py", line 155, in _cuda_deserialize
return storage_type(obj.size())
File "/home_1/gaoyiyao/anaconda3/envs/FastSpeech/lib/python3.8/site-packages/torch/cuda/init.py", line 462, in _lazy_new
return super(_CudaBase, cls).new(cls, *args, **kwargs)
RuntimeError: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 10.91 GiB total capacity; 24.69 MiB already allocated; 13.62 MiB free; 26.00 MiB reserved in total by PyTorch)
You have run out of VRAM. Either use a GPU with more memory or run with less intensive settings.
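Note that the traceback shows the failure happens while deserializing the HiFi-GAN checkpoint in get_vocoder, not during training itself. torch.load without a map_location restores CUDA tensors onto the GPU they were saved from (GPU 0 here), so changing the device variable in train.py has no effect on that call. A minimal sketch of the change in utils/model.py (the surrounding code is assumed from the traceback, not copied from the repo):

device = torch.device("cuda:1" if torch.cuda.is_available() else "cpu")
# Without map_location, torch.load re-creates the storages on the GPU the
# checkpoint was saved from (cuda:0), which is the device that is already full.
ckpt = torch.load("hifigan/generator_LJSpeech.pth.tar", map_location=device)

Alternatively, hide the busy GPU at the shell level so that cuda:0 inside the process maps to the physical second GPU, e.g. CUDA_VISIBLE_DEVICES=1 python3 train.py -p config/LJSpeech/preprocess.yaml -m config/LJSpeech/model.yaml -t config/LJSpeech/train.yaml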
Reduce the batch_size and train again.
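If memory is still tight once the job is running on a free GPU, lowering the batch size in config/LJSpeech/train.yaml is the usual next step. A sketch, assuming the key sits under the optimizer block as in the stock config:

optimizer:
  batch_size: 8   # reduce from the default (e.g. 16) until the OOM goes away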