open-mmlab/mmdeploy

[Bug] Jetson NX: when I run deploy.py or check_env.py, it fails with: cannot import name 'ProcessGroup' from 'torch.distributed'

lijoe123 opened this issue · 12 comments

Checklist

  • 1. I have searched related issues but cannot get the expected help.
  • 2. I have read the FAQ documentation but cannot get the expected help.
  • 3. The bug has not been fixed in the latest version.

Describe the bug

Jetson NX: when I run deploy.py or check_env.py, it fails with: cannot import name 'ProcessGroup' from 'torch.distributed'

Reproduction

Run python tools/deploy.py or python tools/check_env.py.

Environment

I cannot run check_env.py, so the environment is listed below (output of conda list).
JetPack 5.1.1
# Name                    Version                   Build  Channel
_openmp_mutex             4.5                       2_gnu    conda-forge
addict                    2.4.0                     <pip>
aenum                     3.1.12                    <pip>
appdirs                   1.4.4                     <pip>
bzip2                     1.0.8                hf897c2e_4    conda-forge
ca-certificates           2018.03.07                    0    c4aarch64
certifi                   2023.5.7                  <pip>
charset-normalizer        3.1.0                     <pip>
click                     8.1.3                     <pip>
contourpy                 1.0.7                     <pip>
cycler                    0.11.0                    <pip>
Cython                    0.29.35                   <pip>
dill                      0.3.6                     <pip>
fonttools                 4.39.4                    <pip>
grpcio                    1.54.2                    <pip>
h5py                      3.8.0                     <pip>
idna                      3.4                       <pip>
importlib-resources       5.12.0                    <pip>
kiwisolver                1.4.4                     <pip>
ld_impl_linux-aarch64     2.39                 ha75b1e8_0    conda-forge
libffi                    3.4.2                h3557bc0_5    conda-forge
libgcc-ng                 12.2.0              h607ecd0_19    conda-forge
libgomp                   12.2.0              h607ecd0_19    conda-forge
libnsl                    2.0.0                hf897c2e_0    conda-forge
libsqlite                 3.40.0               hf9034f9_0    conda-forge
libuuid                   2.32.1            hf897c2e_1000    conda-forge
libzlib                   1.2.13               h4e544f5_4    conda-forge
Mako                      1.2.4                     <pip>
markdown-it-py            2.2.0                     <pip>
MarkupSafe                2.1.2                     <pip>
matplotlib                3.7.1                     <pip>
mdurl                     0.1.2                     <pip>
mmcv                      2.0.0                     <pip>
mmdeploy                  1.1.0                     <pip>
mmdet                     3.0.0                     <pip>
mmengine                  0.7.3                     <pip>
multiprocess              0.70.14                   <pip>
ncurses                   6.3                  headf329_1    conda-forge
ndindex                   1.7                       <pip>
numpy                     1.24.3                    <pip>
onnx                      1.14.0                    <pip>
opencv-python             4.7.0.72                  <pip>
openssl                   3.0.7                h4e544f5_0    conda-forge
packaging                 23.1                      <pip>
Pillow                    9.5.0                     <pip>
pip                       22.3.1             pyhd8ed1ab_0    conda-forge
platformdirs              3.5.1                     <pip>
prettytable               3.7.0                     <pip>
protobuf                  3.20.2                    <pip>
pycocotools               2.0.6                     <pip>
pycuda                    2022.2.2                  <pip>
Pygments                  2.15.1                    <pip>
pyparsing                 3.0.9                     <pip>
pyserial                  3.5                       <pip>
python                    3.8.13          h92ab765_0_cpython    conda-forge
python-dateutil           2.8.2                     <pip>
pytools                   2022.1.14                 <pip>
PyYAML                    6.0                       <pip>
readline                  8.1.2                h38e3740_0    conda-forge
requests                  2.31.0                    <pip>
rich                      13.3.5                    <pip>
scipy                     1.10.1                    <pip>
setuptools                65.5.1             pyhd8ed1ab_0    conda-forge
shapely                   2.0.1                     <pip>
six                       1.16.0                    <pip>
sqlite                    3.40.0               h69ca7e5_0    conda-forge
tensorrt                  8.5.2.2                   <pip>
termcolor                 2.3.0                     <pip>
terminaltables            3.1.10                    <pip>
tk                        8.6.12               hd8af866_0    conda-forge
tomli                     2.0.1                     <pip>
torch                     1.13.0a0+d0d6b1f2.nv22.09           <pip>
torchvision               0.14.0                    <pip>
typing_extensions         4.6.2                     <pip>
urllib3                   2.0.2                     <pip>
versioned-hdf5            1.3.13                    <pip>
wcwidth                   0.2.6                     <pip>
wheel                     0.38.4             pyhd8ed1ab_0    conda-forge
xz                        5.2.6                h9cdd2b7_0    conda-forge
yapf                      0.33.0                    <pip>
zipp                      3.15.0                    <pip>

Error traceback

No response

@lijoe123 Hi, the environment collection calls functions from mmcv. Since you have installed mmcv, could you try running python -c "from mmcv.utils import collect_env; print(collect_env())"?

Sorry, I get the same error.
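The same error also appears when the call goes through mmcv. That suggests it may originate in the installed PyTorch build rather than in mmdeploy or mmcv. This thread does not confirm the cause, but one plausible assumption is that the NVIDIA Jetson wheel (torch 1.13.0a0+d0d6b1f2.nv22.09) was built without distributed support. torch.distributed.is_available() reports exactly that. The following is a minimal diagnostic sketch:

```python
import importlib.util


def distributed_status() -> str:
    """Describe whether the installed PyTorch exposes torch.distributed.ProcessGroup."""
    # Avoid crashing if PyTorch is not installed in the current environment
    if importlib.util.find_spec("torch") is None:
        return "torch is not installed"

    import torch
    import torch.distributed as dist

    # is_available() returns False for builds compiled without distributed support
    if not dist.is_available():
        return f"torch {torch.__version__}: built without torch.distributed support"

    has_pg = hasattr(dist, "ProcessGroup")
    return f"torch {torch.__version__}: torch.distributed available, ProcessGroup present: {has_pg}"


if __name__ == "__main__":
    print(distributed_status())
```

If this reports that torch.distributed is unavailable, any package that does `from torch.distributed import ProcessGroup` at import time will fail the same way.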

Then maybe you could post the issue in the mmengine repository.

Hello, I have seen two issues reporting the same problem, #967 and #1891. Do you have a solution for it?

I asked the PyTorch maintainers, and they gave me this advice (#102596), but I don't know how to apply it. Could you give me some advice?

@lijoe123 If you only want to deploy the model quickly, you can git clone mmengine and comment out the unused import lines in mmengine.dist that throw the error.

# 1. clone the source code
git clone --depth 1 https://github.com/open-mmlab/mmengine.git
# 2. comment out the failing import lines in mmengine/dist/__init__.py
# 3. reinstall mmengine in editable mode
cd mmengine
pip install -e .

@RunningLeon Thank you for your reply. Actually, the line from torch.distributed import ProcessGroup is in dist.py. ProcessGroup is also used elsewhere in that file, so I don't know how to comment it out.
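When the imported name is still referenced later in the file, one option is to guard the import instead of deleting it. A placeholder class is then defined, so annotations such as Optional[ProcessGroup] still resolve. This is a hypothetical patch sketch, not mmengine's actual code. The flag DIST_AVAILABLE and the helper get_world_size are illustrative names. It assumes that on a single Jetson device no real collective operation is ever needed.

```python
from typing import Optional

try:
    # Raises ImportError on PyTorch builds compiled without distributed support
    from torch.distributed import ProcessGroup
    DIST_AVAILABLE = True
except ImportError:
    DIST_AVAILABLE = False

    class ProcessGroup:  # hypothetical placeholder, only used in type hints
        """Stand-in so annotations such as Optional[ProcessGroup] still resolve."""


def get_world_size(group: Optional[ProcessGroup] = None) -> int:
    """Hypothetical helper: fall back to a single process when distributed is unavailable."""
    if not DIST_AVAILABLE:
        return 1
    import torch.distributed as torch_dist

    # No process group has been initialized, so this is a single-process run
    if not torch_dist.is_initialized():
        return 1
    return torch_dist.get_world_size(group)
```

With a guard like this, importing the module no longer fails on such a build. Code paths that would need real collectives degrade to single-process behavior instead of crashing at import time.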

This issue is marked as stale because it has been marked as invalid or awaiting response for 7 days without any further response. It will be closed in 5 days if the stale label is not removed or if there is no further response.

This issue is closed because it has been stale for 5 days. Please open a new issue if you have similar issues or you have any new updates now.