open-mmlab/mmengine

[Bug] Importing yapf in mmengine.config causes 'EOFError: Ran out of input' during distributed training


Prerequisite

Environment

None

Reproduces the problem - code sample

None

Reproduces the problem - command or script

Just run torchrun --nproc_per_node 8 mmengine_train.py config.py --launcher pytorch

Reproduces the problem - error message

2024-01-24 17:16,Traceback (most recent call last):
  File "mmengine_train.py", line 6, in <module>
    from mmengine.config import Config
  File "/usr/local/lib/python3.8/dist-packages/mmengine/__init__.py", line 3, in <module>
    from .config import *
  File "/usr/local/lib/python3.8/dist-packages/mmengine/config/__init__.py", line 2, in <module>
    from .config import Config, ConfigDict, DictAction, read_base
  File "/usr/local/lib/python3.8/dist-packages/mmengine/config/config.py", line 20, in <module>
    import yapf
  File "/usr/local/lib/python3.8/dist-packages/yapf/__init__.py", line 41, in <module>
    from yapf.yapflib import yapf_api
  File "/usr/local/lib/python3.8/dist-packages/yapf/yapflib/yapf_api.py", line 38, in <module>
    from yapf.pyparser import pyparser
  File "/usr/local/lib/python3.8/dist-packages/yapf/pyparser/pyparser.py", line 44, in <module>
    from yapf.yapflib import format_token
  File "/usr/local/lib/python3.8/dist-packages/yapf/yapflib/format_token.py", line 23, in <module>
    from yapf.pytree import pytree_utils
  File "/usr/local/lib/python3.8/dist-packages/yapf/pytree/pytree_utils.py", line 30, in <module>
    from yapf_third_party._ylib2to3 import pygram
  File "/usr/local/lib/python3.8/dist-packages/yapf_third_party/_ylib2to3/pygram.py", line 29, in <module>
    python_grammar = driver.load_grammar(_GRAMMAR_FILE)
  File "/usr/local/lib/python3.8/dist-packages/yapf_third_party/_ylib2to3/pgen2/driver.py", line 252, in load_grammar
    g.load(gp)
  File "/usr/local/lib/python3.8/dist-packages/yapf_third_party/_ylib2to3/pgen2/grammar.py", line 95, in load
    d = pickle.load(f)
EOFError: Ran out of input
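The last frame is pickle.load failing on the grammar cache file. The sketch below reproduces that exact message under one assumption (not confirmed by the traceback alone): a rank opened the cache file after another rank had created it but before any bytes were written. The helper name load_partially_written_pickle is hypothetical.

```python
import os
import pickle
import tempfile


def load_partially_written_pickle():
    """Mimic a reader opening a cache file that a writer created but has not filled yet."""
    fd, path = tempfile.mkstemp(suffix=".pickle")
    os.close(fd)  # the file now exists but is still empty
    try:
        with open(path, "rb") as fh:
            return pickle.load(fh)  # nothing to unpickle -> EOFError
    finally:
        os.unlink(path)


message = None
try:
    load_partially_written_pickle()
except EOFError as exc:
    message = str(exc)

print(message)  # Ran out of input
```

An empty or truncated pickle is indistinguishable from "end of file" to the unpickler, which is why the error mentions input rather than a corrupt cache.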

Additional information

No response

I think this is a yapf problem; it has also been reported in the yapf repository.
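If the cause is a reader seeing a half-written cache file, the usual remedy on the writer side is an atomic write: dump to a temporary file in the same directory, then rename it over the target. This is a generic sketch of that pattern, not yapf's code (its cache-writing logic may differ), and atomic_pickle_dump is a hypothetical name.

```python
import os
import pickle
import tempfile


def atomic_pickle_dump(obj, path):
    """Write obj so readers see either no file / the old file, or the complete new one."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file must be on the same filesystem for os.replace to be atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as fh:
            pickle.dump(obj, fh)
            fh.flush()
            os.fsync(fh.fileno())  # make sure the bytes hit disk before the rename
        os.replace(tmp_path, path)  # atomic rename on POSIX
    except BaseException:
        os.unlink(tmp_path)  # do not leave stray temp files behind on failure
        raise
```

Readers never observe a zero-length target, because the target path only ever points at a fully written file.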

I am getting this as well

In my case, the setup is:

"mmengine==0.10.2"
"mmcv==2.1.0"
"mmdet==3.3.0"

on Ubuntu.

Same problem with the older mmcv-full==1.3.0 and mmseg==0.11.0, using yapf==0.40.1.

Small update: pip install yapf==0.32 fixes this issue for me.
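Since the fix here is a version pin, it can help to confirm which yapf each launched job actually picks up (containers and virtualenvs often differ from the interactive shell). A small sketch using the standard importlib.metadata module (Python 3.8+); installed_version is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Log at startup so failure reports include the yapf version in use.
print("yapf", installed_version("yapf"))
```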

I used wait functions to stagger each rank's import so that the ranks import one by one. Crude, but it works:

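import os
import time

def wait_before_import_config():
    # Stagger ranks: local rank k sleeps k * 0.5 s before importing.
    t = int(os.environ.get('LOCAL_RANK', 0))
    time.sleep(t * 0.5)

def wait_after_import_config():
    # Sleep the remainder so every rank waits roughly WORLD_SIZE * 0.5 s in total.
    # max() guards against a negative sleep if WORLD_SIZE is unset but LOCAL_RANK is set.
    t = int(os.environ.get('WORLD_SIZE', 0)) - int(os.environ.get('LOCAL_RANK', 0))
    time.sleep(max(t, 0) * 0.5)

wait_before_import_config()
from mmengine.config import Config  # triggers `import yapf` and its grammar load
wait_after_import_config()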
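A variation on the same idea swaps the fixed sleeps for a POSIX file lock (fcntl.flock), so ranks wait exactly as long as the current importer needs instead of a guessed 0.5 s. This is a sketch, not part of mmengine or yapf: the helper node_import_lock and the lock-file name are hypothetical, it only works on POSIX systems, and like the sleep version it only serializes ranks on the same node, assuming yapf's grammar cache is node-local.

```python
import fcntl
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def node_import_lock(name="mmengine_import.lock"):
    """Hold an exclusive flock on a per-node file for the duration of the block.

    Hypothetical helper: the lock file name and location are illustrative only.
    """
    path = os.path.join(tempfile.gettempdir(), name)
    with open(path, "a") as fh:  # "a" creates the file without truncating it
        # Blocks until no other process on this node holds the lock.
        fcntl.flock(fh, fcntl.LOCK_EX)
        try:
            yield path
        finally:
            fcntl.flock(fh, fcntl.LOCK_UN)
```

Usage in the training script, replacing the two wait calls:

    with node_import_lock():
        from mmengine.config import Config

The first rank to get the lock writes the grammar cache; later ranks should then find a complete file.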

This has been happening a lot for us on older versions of mmcv.

@DeclK you are a hero!


Please check this issue; it may be helpful: google/yapf#1204

I ran into a similar problem when using mmengine and found a fix in this issue: google/yapf#1204