Line 351 of /stitching_resnet_swim/train.py: "--local_rank" argument error
Xinrt opened this issue · 7 comments
env:
pytorch 2.0.0
pytorch-cuda 11.7
python 3.10.10
As the title says: should this argument be changed from "--local_rank" to "--local-rank"?
With "--local_rank", I get this error:
train.py: error: unrecognized arguments: --local-rank=5
(see the attachment for complete error output)
local_rank_error.txt
Hello,
To stitch a ResNet-18 with a ResNet-50 on ImageNet with 8 GPUs, I used the following command:
./distributed_train.sh 8 \
[path/to/imagenet] \
-b 128 \
--stitch_config configs/resnet18_resnet50.json \
--sched cosine \
--epochs 30 \
--lr 0.05 \
--amp --remode pixel \
--reprob 0.6 \
--aa rand-m9-mstd0.5-inc1 \
--resplit --split-bn -j 10 --dist-bn reduce
Here I replaced [path/to/imagenet] with my own path.
Hi @Xinrt, I found the issue. It seems you are using the latest PyTorch 2.0, in which the old launch API from previous versions is deprecated:
...conda_envs/torch121/lib/python3.9/site-packages/torch/distributed/launch.py:178: FutureWarning: The module torch.distributed.launch is deprecated
and will be removed in future. Use torchrun.
Note that --use_env is set by default in torchrun.
If your script expects `--local_rank` argument to be set, please
change it to read from `os.environ['LOCAL_RANK']` instead. See
https://pytorch.org/docs/stable/distributed.html#launch-utility for
further instructions
For now, you can fix this issue by downgrading your PyTorch version, for example to PyTorch 1.12 + CUDA 11.3, which should work for all the code available in this repo. I will try to find some time to make this repo compatible with the latest version of PyTorch.
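If downgrading is not an option, the warning above suggests another route: read the rank from `os.environ['LOCAL_RANK']`. Below is a minimal sketch of how a script could accept both flag spellings and fall back to the environment variable. The helper name `parse_local_rank` is hypothetical and this is not the actual code in train.py, just an illustration under those assumptions.

```python
import argparse
import os


def parse_local_rank(argv=None, environ=None):
    """Hypothetical helper: resolve the local rank from CLI args or the environment."""
    environ = os.environ if environ is None else environ
    parser = argparse.ArgumentParser()
    # Register both spellings on one option, so either
    # --local_rank=5 or --local-rank=5 (as in the error above) is accepted.
    parser.add_argument("--local_rank", "--local-rank",
                        dest="local_rank", type=int, default=None)
    # parse_known_args ignores the script's other options in this sketch.
    args, _unknown = parser.parse_known_args(argv)
    if args.local_rank is not None:
        return args.local_rank
    # Per the warning, torchrun exposes the rank as LOCAL_RANK instead of a flag.
    # Defaulting to 0 for a single-process run is an assumption of this sketch.
    return int(environ.get("LOCAL_RANK", 0))
```

In an existing parser, the same effect would come from adding the second option string to the current `add_argument` call and checking `LOCAL_RANK` when the flag is absent.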
Got it, thank you so much! I will try to use PyTorch 1.12
Hi @HubHop , I am working with @Xinrt
Do you guys use conda for env setup? Do you mind sharing the conda env file for this project?
I read the requirements.txt and the Requirements chapter of the README in /stitching_resnet_swim, but several dependencies still seemed to be missing when we ran your code.
Hi @xiangtianheng, preparing the Python env for this project is fairly easy. I just updated the README, where you can find how to create a conda env for your experiments. I have tested this env and it can run all the code in this repo.
Thanks for the info! I am able to run the code now with the dependency provided! Feel free to close this issue.