LVIS training bug: TypeError: can't pickle _thread.RLock objects
nemonameless opened this issue · 2 comments
nemonameless commented
Describe the bug
Training on the COCO dataset works, but training on LVIS fails with the error below.
Environment
- Please run python mmdet/utils/collect_env.py to collect necessary environment information and paste it here.
sys.platform: linux
Python: 3.6.10 |Anaconda, Inc.| (default, Mar 25 2020, 23:51:54) [GCC 7.3.0]
CUDA available: False
GCC: gcc (GCC) 5.2.0
PyTorch: 1.4.0
PyTorch compiling details: PyTorch built with:
- GCC 7.3
- Intel(R) Math Kernel Library Version 2019.0.4 Product Build 20190411 for Intel(R) 64 architecture applications
- Intel(R) MKL-DNN v0.21.1 (Git Hash 7d2fd500bc78936d1d648ca713b901012f470dbc)
- OpenMP 201511 (a.k.a. OpenMP 4.5)
- NNPACK is enabled
- Build settings: BLAS=MKL, BUILD_NAMEDTENSOR=OFF, BUILD_TYPE=Release, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -fopenmp -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -O2 -fPIC -Wno-narrowing -Wall -Wextra -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Wno-stringop-overflow, DISABLE_NUMA=1, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, USE_CUDA=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_STATIC_DISPATCH=OFF,
TorchVision: 0.5.0
OpenCV: 4.4.0
MMCV: 1.1.2
MMDetection: 2.4.0+
MMDetection Compiler: GCC 7.3
MMDetection CUDA Compiler: 10.1
Error traceback
If applicable, paste the error traceback here.
2020-11-14 20:28:12,249 - mmdet - INFO - workflow: [('train', 1)], max: 12 epochs
Traceback (most recent call last):
File "./tools/train.py", line 177, in <module>
main()
File "./tools/train.py", line 173, in main
meta=meta)
File "/data/cdp_algo_ceph_ssd/users/georgeni/causallvis/mmdet/apis/train.py", line 143, in train_detector
runner.run(data_loaders, cfg.workflow, cfg.total_epochs)
File "/data/anaconda3/envs/zxcheng/lib/python3.6/site-packages/mmcv/runner/epoch_based_runner.py", line 122, in run
epoch_runner(data_loaders[i], **kwargs)
File "/data/anaconda3/envs/zxcheng/lib/python3.6/site-packages/mmcv/runner/epoch_based_runner.py", line 27, in train
for i, data_batch in enumerate(self.data_loader):
File "/data/anaconda3/envs/zxcheng/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 279, in __iter__
return _MultiProcessingDataLoaderIter(self)
File "/data/anaconda3/envs/zxcheng/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 719, in __init__
w.start()
File "/data/anaconda3/envs/zxcheng/lib/python3.6/multiprocessing/process.py", line 105, in start
self._popen = self._Popen(self)
File "/data/anaconda3/envs/zxcheng/lib/python3.6/multiprocessing/context.py", line 223, in _Popen
return _default_context.get_context().Process._Popen(process_obj)
File "/data/anaconda3/envs/zxcheng/lib/python3.6/multiprocessing/context.py", line 284, in _Popen
return Popen(process_obj)
File "/data/anaconda3/envs/zxcheng/lib/python3.6/multiprocessing/popen_spawn_posix.py", line 32, in __init__
super().__init__(process_obj)
File "/data/anaconda3/envs/zxcheng/lib/python3.6/multiprocessing/popen_fork.py", line 19, in __init__
self._launch(process_obj)
File "/data/anaconda3/envs/zxcheng/lib/python3.6/multiprocessing/popen_spawn_posix.py", line 47, in _launch
reduction.dump(process_obj, fp)
File "/data/anaconda3/envs/zxcheng/lib/python3.6/multiprocessing/reduction.py", line 60, in dump
ForkingPickler(file, protocol).dump(obj)
TypeError: can't pickle _thread.RLock objects
^C^C^C^C^C^C^C^C^C^C^C^C^C
Traceback (most recent call last):
File "/data/anaconda3/envs/zxcheng/lib/python3.6/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/data/anaconda3/envs/zxcheng/lib/python3.6/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/data/anaconda3/envs/zxcheng/lib/python3.6/site-packages/torch/distributed/launch.py", line 263, in <module>
main()
File "/data/anaconda3/envs/zxcheng/lib/python3.6/site-packages/torch/distributed/launch.py", line 259, in main
cmd=cmd)
subprocess.CalledProcessError: Command '['/data/anaconda3/envs/zxcheng/bin/python', '-u', './tools/train.py', '--local_rank=1', 'configs/lvis/htcnosemlvis.py', '--launcher', 'pytorch', '--work-dir', 'work_bendilvis/lvis/htcnosemlvis', '--no-validate']' returned non-zero exit status 1.
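A note on reading the trace above: the popen_spawn_posix frames indicate the DataLoader workers are being started with the "spawn" method, which pickles the worker process object, and the dataset along with it. The trace does not show which attribute of the LVIS dataset holds the lock, so the following is only a minimal sketch of the failure mode and one possible workaround. The names DatasetWithLock, PicklableDataset and find_unpicklable are hypothetical and not part of MMDetection.

```python
import pickle
import threading


class DatasetWithLock:
    """Hypothetical stand-in for a dataset holding an unpicklable attribute."""

    def __init__(self):
        self.samples = [1, 2, 3]
        # Assumption: something like a logger or API handle owns an RLock.
        self._lock = threading.RLock()


class PicklableDataset(DatasetWithLock):
    """Same data, but the lock is dropped when pickling and rebuilt on load."""

    def __getstate__(self):
        state = self.__dict__.copy()
        state.pop("_lock", None)  # leave the lock out of the pickled state
        return state

    def __setstate__(self, state):
        self.__dict__.update(state)
        self._lock = threading.RLock()  # fresh lock in the worker process


def find_unpicklable(obj):
    """Return the names of top-level attributes of obj that fail to pickle."""
    bad = []
    for name, value in vars(obj).items():
        try:
            pickle.dumps(value)
        except Exception:
            bad.append(name)
    return bad


if __name__ == "__main__":
    try:
        pickle.dumps(DatasetWithLock())
    except TypeError as err:
        print("pickling failed:", err)

    print("unpicklable attributes:", find_unpicklable(DatasetWithLock()))

    restored = pickle.loads(pickle.dumps(PicklableDataset()))
    print("round trip ok:", restored.samples)
```

Running something like find_unpicklable on the real dataset object before building the DataLoader may help narrow down which attribute is responsible. Setting the number of workers to 0 avoids worker start-up altogether, at the cost of slower data loading.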
kemaloksuz commented
Were you able to solve this error?
sangtrx commented
I got the same error.