open-mmlab/mmdetection

OSError: [Errno 122] Disk quota exceeded when saving a checkpoint during training, even though the save directory still has free space

Wzh10032 opened this issue

12/09 06:49:23 - mmengine - INFO - Exp name: yolov3_d53_8xb8-ms-608-273e_coco_20241208_233948
12/09 06:49:23 - mmengine - INFO - Saving checkpoint at 14 epochs
Traceback (most recent call last):
  File "tools/train.py", line 122, in <module>
    main()
  File "tools/train.py", line 118, in main
    runner.train()
  File "/mnt/wzh/miniconda3/envs/openmmlab/lib/python3.8/site-packages/mmengine/runner/runner.py", line 1777, in train
    model = self.train_loop.run()  # type: ignore
  File "/mnt/wzh/miniconda3/envs/openmmlab/lib/python3.8/site-packages/mmengine/runner/loops.py", line 98, in run
    self.run_epoch()
  File "/mnt/wzh/miniconda3/envs/openmmlab/lib/python3.8/site-packages/mmengine/runner/loops.py", line 117, in run_epoch
    self.runner.call_hook('after_train_epoch')
  File "/mnt/wzh/miniconda3/envs/openmmlab/lib/python3.8/site-packages/mmengine/runner/runner.py", line 1839, in call_hook
    getattr(hook, fn_name)(self, **kwargs)
  File "/mnt/wzh/miniconda3/envs/openmmlab/lib/python3.8/site-packages/mmengine/hooks/checkpoint_hook.py", line 345, in after_train_epoch
    self._save_checkpoint(runner)
  File "/mnt/wzh/miniconda3/envs/openmmlab/lib/python3.8/site-packages/mmengine/hooks/checkpoint_hook.py", line 476, in _save_checkpoint
    self._save_checkpoint_with_step(runner, step, meta=meta)
  File "/mnt/wzh/miniconda3/envs/openmmlab/lib/python3.8/site-packages/mmengine/hooks/checkpoint_hook.py", line 443, in _save_checkpoint_with_step
    runner.save_checkpoint(
  File "/mnt/wzh/miniconda3/envs/openmmlab/lib/python3.8/site-packages/mmengine/dist/utils.py", line 427, in wrapper
    return func(*args, **kwargs)
  File "/mnt/wzh/miniconda3/envs/openmmlab/lib/python3.8/site-packages/mmengine/runner/runner.py", line 2271, in save_checkpoint
    save_checkpoint(
  File "/mnt/wzh/miniconda3/envs/openmmlab/lib/python3.8/site-packages/mmengine/runner/checkpoint.py", line 793, in save_checkpoint
    file_backend.put(f.getvalue(), filename)
  File "/mnt/wzh/miniconda3/envs/openmmlab/lib/python3.8/site-packages/mmengine/fileio/backends/local_backend.py", line 78, in put
    f.write(obj)
OSError: [Errno 122] Disk quota exceeded
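Errno 122 is not the generic "disk full" error. On Linux it is `EDQUOT`, which is returned when a quota (for example a per-user or per-group quota) has been exhausted, whereas a filesystem that is genuinely out of space raises `ENOSPC` (errno 28 on Linux). The following is a minimal sketch, using only the standard `errno` and `os` modules, that decodes the errno carried by an `OSError` like the one above. The helper name `describe_oserror` is hypothetical, and numeric errno values are platform-specific (the 122 mapping holds on Linux).

```python
import errno
import os


def describe_oserror(exc: OSError) -> str:
    """Return a short, human-readable explanation of an OSError's errno."""
    # errno.errorcode maps numeric codes to symbolic names (122 -> 'EDQUOT' on Linux).
    name = errno.errorcode.get(exc.errno, "UNKNOWN")
    if exc.errno == errno.EDQUOT:
        hint = "a disk quota was exceeded; df may still report free space on the filesystem"
    elif exc.errno == errno.ENOSPC:
        hint = "the filesystem itself has no space left"
    else:
        hint = "not a space or quota error"
    return f"[{name}] {os.strerror(exc.errno)}: {hint}"


if __name__ == "__main__":
    # Simulate the error from the traceback without touching the disk.
    err = OSError(errno.EDQUOT, os.strerror(errno.EDQUOT))
    print(describe_oserror(err))
```

Distinguishing the two codes matters here because a quota error can occur on a filesystem that still has free blocks.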

My save directory still has 46G of free space:
Size  Used  Avail  Use%  Mounted on
 12T   12T    46G  100%  /mnt
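The `Avail` column of `df` is a filesystem-wide figure derived from `statvfs`, and Python's `shutil.disk_usage` reports the same kind of numbers. Neither one generally reflects how much of a per-user quota remains, which is one way a 46G `Avail` value can coexist with `EDQUOT`. Below is a small sketch that prints df-style figures for a directory. The helpers `human` and `report_usage` are hypothetical names chosen for illustration.

```python
import shutil
import sys


def human(nbytes: int) -> str:
    """Format a byte count with binary units, roughly like `df -h`."""
    size = float(nbytes)
    units = ["B", "K", "M", "G", "T", "P"]
    for unit in units[:-1]:
        if size < 1024:
            return f"{size:.1f}{unit}"
        size /= 1024
    return f"{size:.1f}{units[-1]}"


def report_usage(path: str) -> dict:
    """Return filesystem-wide usage for the filesystem containing `path`."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    # df computes Use% as used / (used + available), rounded up to a whole
    # percent; one decimal place is kept here instead.
    denom = usage.used + usage.free
    pct = 100 * usage.used / denom if denom else 0.0
    return {
        "size": human(usage.total),
        "used": human(usage.used),
        "avail": human(usage.free),
        "use_pct": f"{pct:.1f}%",
    }


if __name__ == "__main__":
    path = sys.argv[1] if len(sys.argv) > 1 else "."
    print(path, report_usage(path))
```

Passing the save directory (for example `/mnt`) should roughly reproduce the `df` row above. If those numbers look healthy while writes still fail with Errno 122, that points to a quota rather than to the filesystem being full.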

Does the save method also write files to another disk? My /home disk is indeed full.
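According to the traceback, the failing call is `LocalBackend.put` writing the checkpoint bytes to `filename`. That suggests the quota is enforced on whichever filesystem that path actually resolves to. One way to check whether the save directory silently lands on another disk (for example, through a symlink into /home) is to resolve the path, compare device IDs with the home directory, and then attempt a small test write. The sketch below uses only the standard library. `locate` and `probe_write` are hypothetical helper names, and it is an assumption that a 1 MiB probe is enough to trigger the same quota check.

```python
import errno
import os
import sys
import tempfile


def locate(path: str) -> dict:
    """Resolve symlinks in `path` and report the device ID it lives on."""
    real = os.path.realpath(path)
    # Walk up to the nearest existing ancestor so a not-yet-created dir still works.
    existing = real
    while not os.path.exists(existing):
        parent = os.path.dirname(existing)
        if parent == existing:
            break
        existing = parent
    return {"real_path": real, "device": os.stat(existing).st_dev}


def probe_write(directory: str, nbytes: int = 1 << 20) -> str:
    """Try to write `nbytes` into `directory`; report quota/space failures."""
    try:
        with tempfile.NamedTemporaryFile(dir=directory) as tmp:
            tmp.write(b"\0" * nbytes)
            tmp.flush()
            # fsync makes errors that some filesystems defer until flush surface here.
            os.fsync(tmp.fileno())
        return "ok"
    except OSError as exc:
        if exc.errno in (errno.EDQUOT, errno.ENOSPC):
            return f"failed: {errno.errorcode[exc.errno]}"
        raise


if __name__ == "__main__":
    target = sys.argv[1] if len(sys.argv) > 1 else "."
    t, h = locate(target), locate(os.path.expanduser("~"))
    print("target:", t)
    print("home:  ", h)
    print("same filesystem as home:", t["device"] == h["device"])
    print("probe write:", probe_write(t["real_path"]))
```

Pass the actual work directory as the argument. If it reports the same device as the home directory, or if the probe fails with `EDQUOT`, the checkpoint is most likely being charged against the full /home quota.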