kakaobrain/noc

[Assertion error] bash /scripts/train_cc3m_noc.sh

Closed this issue · 2 comments

INFO:root:####### overrides: ['distributed.num_nodes=4', 'distributed.num_proc_per_node=1', 'experiment.expr_name=cc3m_noc']
DEBUG:hydra.core.utils:Setting JobRuntime:name=UNKNOWN_NAME
DEBUG:hydra.core.utils:Setting JobRuntime:name=app
Traceback (most recent call last):
  File "/data/hyeokseung1208/NOC/train.py", line 66, in <module>
    main()
  File "/data/hyeokseung1208/NOC/train.py", line 22, in main
    cfg = init_hydra_config(mode="train")
  File "/data/hyeokseung1208/NOC/noc/utils/main_utils.py", line 48, in init_hydra_config
    cfg = infer_and_assert_hydra_config(cfg)
  File "/data/hyeokseung1208/NOC/noc/utils/main_utils.py", line 72, in infer_and_assert_hydra_config
    assert cfg.experiment.max_epochs is None
AssertionError

I edited config/default.yaml to set max_epochs: 10,
but I don't fully understand why the code is written like this.
Could you explain? 🥲

When both max_steps and max_epochs are given to the PyTorch Lightning Trainer, training stops as soon as either limit is reached, whichever comes first.
To prevent an experiment from behaving differently than intended, we implemented the check as above; i.e., it forces the config to use only one of max_steps and max_epochs.
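
To make the "whichever comes first" rule concrete, here is a minimal sketch in plain Python. It does not use PyTorch Lightning or the NOC code; the helpers `effective_stop_step` and `check_single_limit` are hypothetical names introduced only for illustration, and treating `None` as "no limit" is a simplifying assumption rather than the Trainer's actual defaults.

```python
from typing import Optional


def effective_stop_step(
    steps_per_epoch: int,
    max_steps: Optional[int] = None,
    max_epochs: Optional[int] = None,
) -> Optional[int]:
    """Return the step at which training would stop (hypothetical helper).

    Assumption: None means "no limit" for either argument.
    """
    limits = []
    if max_steps is not None:
        limits.append(max_steps)
    if max_epochs is not None:
        # Convert the epoch limit into an equivalent number of steps.
        limits.append(max_epochs * steps_per_epoch)
    # Whichever limit is reached first ends training.
    return min(limits) if limits else None


def check_single_limit(max_steps: Optional[int], max_epochs: Optional[int]) -> None:
    """Reject configs that set both limits, in the spirit of the assertion above."""
    if max_steps is not None and max_epochs is not None:
        raise ValueError("set only one of max_steps and max_epochs")


# With 1,000 steps per epoch, the epoch limit (10 * 1,000 = 10,000 steps)
# is reached before max_steps=50,000, so the run ends earlier than a
# reader of max_steps alone would expect.
print(effective_stop_step(1000, max_steps=50_000, max_epochs=10))  # 10000
```

A check like `check_single_limit` fails fast at config time, so a run cannot silently end at a limit the user did not intend.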

For details, refer to the official PyTorch Lightning documentation for the Trainer's max_steps and max_epochs arguments.

Well understood! Thank you and have a nice day :)