ImportError preventing training
DaDudeIan opened this issue · 2 comments
DaDudeIan commented
During execution of the `train_nyu.sh` script, an ImportError is raised while importing `VectorQuantizer2` from `taming.modules.vqvae.quantize`. The error prevents training from starting because model initialization fails.
(ecodepth) cv19f24@node13:~/EcoDepth/depth$ bash train_nyu.sh
master_addr is only used for static rdzv_backend and when rdzv_endpoint is not specified.
/home/cv19f24/.conda-2024.02/envs/ecodepth/lib/python3.11/site-packages/mmcv/__init__.py:20: UserWarning: On January 1, 2023, MMCV will release v2.0.0, in which it will remove components related to the training process and add a data transformation module. In addition, it will rename the package names mmcv to mmcv-lite and mmcv-full to mmcv. See https://github.com/open-mmlab/mmcv/blob/master/docs/en/compatibility.md for more details.
warnings.warn(
| distributed init (rank 0): env://, gpu 0
Max_depth = 10.0 meters for nyudepthv2!
<wandb logs>
model will be saved after every 200 steps
val will be done after every 200 steps
This experiment name is : 04151402_nyu_BS-16_lr-one_cycle_training_nyu
log_dir in main log_dir/04151402_nyu_BS-16_lr-one_cycle_training_nyu
Traceback (most recent call last):
File "/home/cv19f24/EcoDepth/depth/train.py", line 540, in <module>
main()
File "/home/cv19f24/EcoDepth/depth/train.py", line 462, in main
model = EcoDepth(args=args)
^^^^^^^^^^^^^^^^^^^
File "/home/cv19f24/EcoDepth/depth/models/model.py", line 166, in __init__
self.encoder = EcoDepthEncoder(out_dim=channels_in, dataset='nyu', args = args)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/cv19f24/EcoDepth/depth/models/model.py", line 56, in __init__
sd_model = instantiate_from_config(self.config.model)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/cv19f24/EcoDepth/depth/ldm/util.py", line 85, in instantiate_from_config
return get_obj_from_str(config["target"])(**config.get("params", dict()))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/cv19f24/EcoDepth/depth/ldm/util.py", line 93, in get_obj_from_str
return getattr(importlib.import_module(module, package=None), cls)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/cv19f24/.conda-2024.02/envs/ecodepth/lib/python3.11/importlib/__init__.py", line 126, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "<frozen importlib._bootstrap>", line 1206, in _gcd_import
File "<frozen importlib._bootstrap>", line 1178, in _find_and_load
File "<frozen importlib._bootstrap>", line 1149, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 690, in _load_unlocked
File "<frozen importlib._bootstrap_external>", line 940, in exec_module
File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
File "/home/cv19f24/EcoDepth/depth/ldm/models/diffusion/ddpm.py", line 25, in <module>
from ldm.models.autoencoder import VQModelInterface, IdentityFirstStage, AutoencoderKL
File "/home/cv19f24/EcoDepth/depth/ldm/models/autoencoder.py", line 6, in <module>
from taming.modules.vqvae.quantize import VectorQuantizer2 as VectorQuantizer
ImportError: cannot import name 'VectorQuantizer2' from 'taming.modules.vqvae.quantize' (/home/cv19f24/.conda-2024.02/envs/ecodepth/lib/python3.11/site-packages/taming/modules/vqvae/quantize.py)
<wandb logs>
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 2380373) of binary: /home/cv19f24/.conda-2024.02/envs/ecodepth/bin/python
Traceback (most recent call last):
File "<frozen runpy>", line 198, in _run_module_as_main
File "<frozen runpy>", line 88, in _run_code
File "/home/cv19f24/.conda-2024.02/envs/ecodepth/lib/python3.11/site-packages/torch/distributed/run.py", line 798, in <module>
main()
File "/home/cv19f24/.conda-2024.02/envs/ecodepth/lib/python3.11/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper
return f(*args, **kwargs)
^^^^^^^^^^^^^^^^^^
File "/home/cv19f24/.conda-2024.02/envs/ecodepth/lib/python3.11/site-packages/torch/distributed/run.py", line 794, in main
run(args)
File "/home/cv19f24/.conda-2024.02/envs/ecodepth/lib/python3.11/site-packages/torch/distributed/run.py", line 785, in run
elastic_launch(
File "/home/cv19f24/.conda-2024.02/envs/ecodepth/lib/python3.11/site-packages/torch/distributed/launcher/api.py", line 134, in __call__
return launch_agent(self._config, self._entrypoint, list(args))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/cv19f24/.conda-2024.02/envs/ecodepth/lib/python3.11/site-packages/torch/distributed/launcher/api.py", line 250, in launch_agent
raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
============================================================
train.py FAILED
------------------------------------------------------------
Failures:
<NO_OTHER_FAILURES>
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
time : 2024-04-15_14:02:25
host : node13
rank : 0 (local_rank: 0)
exitcode : 1 (pid: 2380373)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
============================================================
I have followed the installation instructions to the letter, so I don't know what to do. Thank you!
frankkim1108 commented
Hi, @DaDudeIan
I had the same error; installing this package solved it for me:
pip install taming-transformers-rom1504
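For anyone else hitting this, a quick way to check which quantizer class name your installed taming package actually exposes (a hedged sketch; the module path is taken from the traceback above, and `quantizer_names` is just an illustrative helper):

```python
import importlib

def quantizer_names(module_name="taming.modules.vqvae.quantize"):
    """Return whichever quantizer class names the given module exposes."""
    try:
        mod = importlib.import_module(module_name)
    except ImportError:
        # The module (or the taming package itself) is not installed.
        return []
    return [n for n in ("VectorQuantizer", "VectorQuantizer2") if hasattr(mod, n)]

# In a working environment this should include "VectorQuantizer2";
# if only "VectorQuantizer" shows up, your taming release renamed the class.
print(quantizer_names())
```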
DaDudeIan commented
Thank you, @frankkim1108!
But I found another fix: CompVis/stable-diffusion#72. Applying that fixed it for me 😊
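For reference, the fixes discussed there amount to tolerating the rename of `VectorQuantizer2`. One hedged way to express that (a sketch, not the repo's actual patch) is a fallback import for the line in `ldm/models/autoencoder.py`:

```python
def import_vector_quantizer():
    """Try the old class name first, then fall back to the renamed one."""
    try:
        # Original upstream name (still present in taming-transformers-rom1504).
        from taming.modules.vqvae.quantize import VectorQuantizer2 as VQ
    except ImportError:
        # Some taming releases expose only VectorQuantizer.
        from taming.modules.vqvae.quantize import VectorQuantizer as VQ
    return VQ
```

Both routes give the same class in the end; pinning `taming-transformers-rom1504` as suggested above avoids editing vendored code.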