The error when training
daixiaolei623 opened this issue · 4 comments
Thank you for your great work.
However, when I train with maskformer_swin_large_IN21k_384_bs16_160k_res640.yaml using the command:
`./train_net.py --num-gpus 2 --config-file configs/ade20k-150/swin/maskformer_swin_large_IN21k_384_bs16_160k_res640.yaml`
I got the following errors:
```
MaskFormer Training Script.
This script is a simplified version of the training script in detectron2/tools.
: No such file or directory
import-im6.q16: not authorized `copy' @ error/constitute.c/WriteImage/1037.
import-im6.q16: not authorized `itertools' @ error/constitute.c/WriteImage/1037.
import-im6.q16: not authorized `logging' @ error/constitute.c/WriteImage/1037.
import-im6.q16: not authorized `os' @ error/constitute.c/WriteImage/1037.
from: can't read /var/mail/collections
from: can't read /var/mail/typing
import-im6.q16: not authorized `torch' @ error/constitute.c/WriteImage/1037.
import-im6.q16: not authorized `comm' @ error/constitute.c/WriteImage/1037.
from: can't read /var/mail/detectron2.checkpoint
from: can't read /var/mail/detectron2.config
from: can't read /var/mail/detectron2.data
from: can't read /var/mail/detectron2.engine
./train_net.py: line 21: syntax error near unexpected token `('
./train_net.py: line 21: `from detectron2.evaluation import ('
```
Could you please tell me what the problem is and how to solve it?
Thank you very much!
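Add `python` in front of the command, i.e. run `python ./train_net.py` with the same arguments. Without it, the shell executes the file as a shell script, which is why `import` invokes ImageMagick's `import` (`import-im6.q16`) and `from` invokes the mail utility `from`.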
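The log suggests the file was not handed to a Python interpreter: when `./script` has no `#!` line naming Python as its very first line, the kernel refuses to exec it and the shell falls back to running it as a shell script. The sketch below is a hypothetical helper (`runs_as_python` is a made-up name, not part of MaskFormer) that checks a file's first line; the demo file mimics the docstring visible in the log, and the real contents of `train_net.py` are an assumption here.

```python
import os
import tempfile


def runs_as_python(path):
    """Return True if executing `./path` directly would start a Python interpreter.

    Only the first line matters: it must be a shebang ('#!') that mentions python.
    """
    with open(path, "rb") as f:
        first = f.readline()
    return first.startswith(b"#!") and b"python" in first


# Demo on two throwaway files.
with tempfile.TemporaryDirectory() as d:
    with_shebang = os.path.join(d, "a.py")
    without_shebang = os.path.join(d, "b.py")
    with open(with_shebang, "w") as f:
        f.write("#!/usr/bin/env python\nimport os\n")
    with open(without_shebang, "w") as f:
        # Starts with a docstring, like the one echoed in the error log.
        f.write('"""\nMaskFormer Training Script.\n"""\nimport os\n')
    ok_a = runs_as_python(with_shebang)
    ok_b = runs_as_python(without_shebang)

print(ok_a, ok_b)  # True False
```

Running `python ./train_net.py ...` sidesteps the question entirely, since the interpreter is chosen explicitly.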
@bowenc0221
Thank you.
However, after adding `python` and installing CUDA 11.1, I ran `python ./train_net.py --num-gpus 2 --config-file /home/dai/code/semantic_segmentation/27/MaskFormer-master/configs/ade20k-150/swin/maskformer_swin_large_IN21k_384_bs16_160k_res640.yaml` and got the following error:
```
Command Line Args: Namespace(config_file='/home/dai/code/semantic_segmentation/27/MaskFormer-master/configs/ade20k-150/swin/maskformer_swin_large_IN21k_384_bs16_160k_res640.yaml', dist_url='tcp://127.0.0.1:49152', eval_only=False, machine_rank=0, num_gpus=2, num_machines=1, opts=[], resume=False)
/home/dai/TOOL/anaconda3/envs/maskfromer/lib/python3.7/site-packages/torch/cuda/__init__.py:52: UserWarning: CUDA initialization: Unexpected error from cudaGetDeviceCount(). Did you run some cuda functions before calling NumCudaDevices() that might have already set an error? Error 101: invalid device ordinal (Triggered internally at /pytorch/c10/cuda/CUDAFunctions.cpp:109.)
  return torch._C._cuda_getDeviceCount() > 0
Traceback (most recent call last):
  File "./train_net.py", line 270, in <module>
    args=(args,),
  File "/home/dai/TOOL/anaconda3/envs/maskfromer/lib/python3.7/site-packages/detectron2/engine/launch.py", line 79, in launch
    daemon=False,
  File "/home/dai/TOOL/anaconda3/envs/maskfromer/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 230, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
  File "/home/dai/TOOL/anaconda3/envs/maskfromer/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 188, in start_processes
    while not context.join():
  File "/home/dai/TOOL/anaconda3/envs/maskfromer/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 150, in join
    raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedException:

-- Process 1 terminated with the following error:
Traceback (most recent call last):
  File "/home/dai/TOOL/anaconda3/envs/maskfromer/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 59, in _wrap
    fn(i, *args)
  File "/home/dai/TOOL/anaconda3/envs/maskfromer/lib/python3.7/site-packages/detectron2/engine/launch.py", line 95, in _distributed_worker
    assert torch.cuda.is_available(), "cuda is not available. Please check your installation."
AssertionError: cuda is not available. Please check your installation.
```
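The thread does not say how this error was fixed, so what follows is only a diagnostic sketch. `Error 101: invalid device ordinal` means CUDA was asked for a GPU index that does not exist; one common cause is a `CUDA_VISIBLE_DEVICES` value naming a missing GPU. Also, installing the CUDA 11.1 toolkit does not by itself change the CUDA runtime a prebuilt PyTorch binary uses; typically what matters is that the driver supports `torch.version.cuda`. The helpers `cuda_report` and `can_launch` are hypothetical names; they only call `torch.cuda.is_available()`, `torch.cuda.device_count()` and `torch.version.cuda`, and still run if torch is missing.

```python
import os

try:
    import torch  # optional: only needed for the live CUDA checks
except ImportError:
    torch = None


def cuda_report():
    """Collect the facts that usually explain 'cuda is not available'."""
    report = {
        "CUDA_VISIBLE_DEVICES": os.environ.get("CUDA_VISIBLE_DEVICES"),
        "torch_installed": torch is not None,
    }
    if torch is not None:
        report["torch_version"] = torch.__version__
        # CUDA version this PyTorch build targets (None for CPU-only builds).
        report["torch_built_with_cuda"] = torch.version.cuda
        report["cuda_available"] = torch.cuda.is_available()
        report["device_count"] = torch.cuda.device_count()
    return report


def can_launch(num_gpus, report):
    """True if CUDA works and at least `num_gpus` devices are visible."""
    return bool(report.get("cuda_available")) and report.get("device_count", 0) >= num_gpus


report = cuda_report()
print(report)
print("enough GPUs for --num-gpus 2:", can_launch(2, report))
```

With `--num-gpus 2`, each spawned worker needs its own device, so `device_count` should be at least 2.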
@bowenc0221
Thank you, I have solved the above error. However, my GPU is a 1080 Ti and runs out of memory, so I want to train on the CPU instead (my machine has 64 GB of memory).
Could you please tell me how to train it on the CPU?
Thank you.
You can try adding `MODEL.DEVICE 'cpu'` at the end of your command, but I have never tested it on CPU.
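This suggestion relies on detectron2-style scripts treating trailing `KEY VALUE` tokens as config overrides (typically applied with `cfg.merge_from_list(args.opts)`); the `opts=[]` field in the `Command Line Args` line above is where those tokens land. The sketch below is a simplified, hypothetical stand-in (`merge_opts` is not a detectron2 or yacs function) that shows how such a pair overrides a nested config, including how a bare word like `cpu` stays a string.

```python
import ast
import shlex


def merge_opts(cfg, opts):
    """Simplified stand-in for a yacs-style merge_from_list on nested dicts.

    `opts` is a flat list of KEY VALUE tokens, e.g. ["MODEL.DEVICE", "cpu"].
    """
    if len(opts) % 2 != 0:
        raise ValueError("opts must be KEY VALUE pairs")
    for key, raw in zip(opts[0::2], opts[1::2]):
        *parents, leaf = key.split(".")
        node = cfg
        for name in parents:
            node = node[name]  # unknown sections raise KeyError
        if leaf not in node:
            raise KeyError(key)  # unknown keys are rejected rather than created
        try:
            value = ast.literal_eval(raw)  # "1" -> 1, "[1, 2]" -> [1, 2]
        except (ValueError, SyntaxError):
            value = raw  # plain words such as cpu stay strings
        node[leaf] = value
    return cfg


# The shell strips the quotes in MODEL.DEVICE 'cpu', so the script receives cpu.
# "cfg.yaml" is a placeholder path for illustration.
argv = shlex.split("--num-gpus 1 --config-file cfg.yaml MODEL.DEVICE 'cpu'")
opts = argv[argv.index("MODEL.DEVICE"):]
cfg = merge_opts({"MODEL": {"DEVICE": "cuda"}}, opts)
print(cfg["MODEL"]["DEVICE"])  # cpu
```

Separately, the traceback above shows the CUDA assertion is raised in `_distributed_worker`; detectron2's `launch` goes through that path only when more than one process is requested, so a CPU attempt would presumably also need `--num-gpus 1` (the default) instead of `--num-gpus 2`.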