Training error: hydra error. Parameter lr_scheduler.total_num_update=null
odigab opened this issue · 5 comments
Hi,
I'm trying to train MUSS, but I got a Hydra error:
fairseq_prepare_and_train...
exp_dir=/scratch1/fer201/muss/muss-git/experiments/fairseq/local_1634083552326
fairseq-train /scratch1/fer201/muss/muss-git/resources/datasets/_9585ac127caca9d7160a28f1d8180050/fairseq_preprocessed_complex-simple --task translation --source-lang complex --target-lang simple --save-dir /scratch1/fer201/muss/muss-git/experiments/fairseq/local_1634083552326/checkpoints --optimizer adam --adam-betas '(0.9, 0.98)' --clip-norm 0.0 --criterion label_smoothed_cross_entropy --label-smoothing 0.1 --lr-scheduler polynomial_decay --lr 3e-05 --warmup-updates 500 --update-freq 128 --arch bart_large --dropout 0.1 --weight-decay 0.0 --clip-norm 0.1 --share-all-embeddings --no-epoch-checkpoints --save-interval 999999 --validate-interval 999999 --max-update 20000 --save-interval-updates 100 --keep-interval-updates 1 --patience 10 --batch-size 64 --seed 917 --distributed-world-size 1 --distributed-port 15798 --fp16 --restore-file /scratch1/fer201/muss/muss-git/resources/models/bart.large/model.pt --max-tokens 512 --truncate-source --layernorm-embedding --share-all-embeddings --share-decoder-input-output-embed --reset-optimizer --reset-dataloader --reset-meters --required-batch-size-multiple 1 --label-smoothing 0.1 --attention-dropout 0.1 --weight-decay 0.01 --optimizer 'adam' --adam-betas '(0.9, 0.999)' --adam-eps 1e-08 --clip-norm 0.1 --skip-invalid-size-inputs-valid-test --find-unused-parameters
fairseq_prepare_and_train failed after 4.45s.
Traceback (most recent call last):
File "/scratch1/fer201/muss/lib/python3.9/site-packages/hydra/_internal/config_loader_impl.py", line 513, in _apply_overrides_to_config
OmegaConf.update(cfg, key, value, merge=True)
File "/scratch1/fer201/muss/lib/python3.9/site-packages/omegaconf/omegaconf.py", line 613, in update
root.setattr(last_key, value)
File "/scratch1/fer201/muss/lib/python3.9/site-packages/omegaconf/dictconfig.py", line 285, in setattr
raise e
File "/scratch1/fer201/muss/lib/python3.9/site-packages/omegaconf/dictconfig.py", line 282, in setattr
self.__set_impl(key, value)
File "/scratch1/fer201/muss/lib/python3.9/site-packages/omegaconf/dictconfig.py", line 266, in __set_impl
self._set_item_impl(key, value)
File "/scratch1/fer201/muss/lib/python3.9/site-packages/omegaconf/basecontainer.py", line 398, in _set_item_impl
self._validate_set(key, value)
File "/scratch1/fer201/muss/lib/python3.9/site-packages/omegaconf/dictconfig.py", line 143, in _validate_set
self._validate_set_merge_impl(key, value, is_assign=True)
File "/scratch1/fer201/muss/lib/python3.9/site-packages/omegaconf/dictconfig.py", line 156, in _validate_set_merge_impl
self._format_and_raise(
File "/scratch1/fer201/muss/lib/python3.9/site-packages/omegaconf/base.py", line 95, in _format_and_raise
format_and_raise(
File "/scratch1/fer201/muss/lib/python3.9/site-packages/omegaconf/_utils.py", line 694, in format_and_raise
_raise(ex, cause)
File "/scratch1/fer201/muss/lib/python3.9/site-packages/omegaconf/_utils.py", line 610, in _raise
raise ex # set end OC_CAUSE=1 for full backtrace
omegaconf.errors.ValidationError: child 'lr_scheduler.total_num_update' is not Optional
full_key: lr_scheduler.total_num_update
reference_type=Optional[PolynomialDecayLRScheduleConfig]
object_type=PolynomialDecayLRScheduleConfig
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/scratch1/fer201/muss/muss-git/scripts/train_model.py", line 21, in <module>
result = fairseq_train_and_evaluate_with_parametrization(**kwargs)
File "/scratch1/fer201/muss/muss-git/muss/fairseq/main.py", line 224, in fairseq_train_and_evaluate_with_parametrization
exp_dir = print_running_time(fairseq_prepare_and_train)(dataset, **kwargs)
File "/scratch1/fer201/muss/muss-git/muss/utils/helpers.py", line 470, in wrapped_func
return func(*args, **kwargs)
File "/scratch1/fer201/muss/muss-git/muss/fairseq/main.py", line 74, in fairseq_prepare_and_train
fairseq_train(preprocessed_dir, exp_dir=exp_dir, **train_kwargs)
File "/scratch1/fer201/muss/muss-git/muss/utils/training.py", line 60, in wrapped_func
return func(*args, **kwargs)
File "/scratch1/fer201/muss/muss-git/muss/fairseq/base.py", line 127, in fairseq_train
train.cli_main()
File "/scratch1/fer201/muss/fairseq-git/fairseq_cli/train.py", line 496, in cli_main
cfg = convert_namespace_to_omegaconf(args)
File "/scratch1/fer201/muss/fairseq-git/fairseq/dataclass/utils.py", line 389, in convert_namespace_to_omegaconf
composed_cfg = compose("config", overrides=overrides, strict=False)
File "/scratch1/fer201/muss/lib/python3.9/site-packages/hydra/experimental/compose.py", line 31, in compose
cfg = gh.hydra.compose_config(
File "/scratch1/fer201/muss/lib/python3.9/site-packages/hydra/_internal/hydra.py", line 507, in compose_config
cfg = self.config_loader.load_configuration(
File "/scratch1/fer201/muss/lib/python3.9/site-packages/hydra/_internal/config_loader_impl.py", line 151, in load_configuration
return self._load_configuration(
File "/scratch1/fer201/muss/lib/python3.9/site-packages/hydra/_internal/config_loader_impl.py", line 277, in _load_configuration
ConfigLoaderImpl._apply_overrides_to_config(config_overrides, cfg)
File "/scratch1/fer201/muss/lib/python3.9/site-packages/hydra/_internal/config_loader_impl.py", line 520, in _apply_overrides_to_config
raise ConfigCompositionException(
hydra.errors.ConfigCompositionException: Error merging override lr_scheduler.total_num_update=null
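For anyone reading the traceback above: the override `lr_scheduler.total_num_update=null` tries to put `None` into a field that the scheduler config does not declare as `Optional`, and the validation step rejects it. The sketch below imitates that kind of check with the standard library only. It is not OmegaConf's or fairseq's actual code; `SchedulerConfig`, `optional_limit`, `accepts_none` and `apply_override` are hypothetical names, and the field types and defaults are assumptions made for illustration.

```python
from dataclasses import dataclass, fields
from typing import Optional, Union, get_args, get_origin, get_type_hints


@dataclass
class SchedulerConfig:
    # Hypothetical stand-in for a scheduler config; types and defaults are assumptions.
    total_num_update: float = 1000000.0     # not Optional, so None should be rejected
    optional_limit: Optional[float] = None  # Optional, so None is allowed


def accepts_none(config_cls, field_name):
    """Return True if the field is annotated as Optional[...]."""
    hint = get_type_hints(config_cls)[field_name]
    return get_origin(hint) is Union and type(None) in get_args(hint)


def apply_override(config, field_name, value):
    """Set a field, refusing None unless the field is declared Optional."""
    if field_name not in {f.name for f in fields(config)}:
        raise KeyError(field_name)
    if value is None and not accepts_none(type(config), field_name):
        raise ValueError(f"child '{field_name}' is not Optional")
    setattr(config, field_name, value)


if __name__ == "__main__":
    cfg = SchedulerConfig()
    apply_override(cfg, "optional_limit", None)        # accepted
    try:
        apply_override(cfg, "total_num_update", None)  # rejected, like the override above
    except ValueError as err:
        print(f"rejected: {err}")
```

If this reading is right, the failure comes from how the installed fairseq version turns unset arguments into overrides, not from the training data or the checkpoint.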
Thanks for raising this issue!
What version of fairseq are you using?
I'm using fairseq==0.10.1, which works for prediction on a compute cluster.
Would switching to fairseq==0.10.2 fix the issue?
Yes most likely! Keep me posted.
I installed fairseq from GitHub and got this version:
1.0.0a0+f34abcf
Same issue, though.
What about pip install fairseq==0.10.2?
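To confirm which fairseq the training environment actually picks up after reinstalling, something like the following might help. It is a minimal sketch using only the standard library; `SUGGESTED_PIN`, `installed_version` and `matches_pin` are hypothetical helper names, and 0.10.2 is simply the version suggested in this thread, not one verified here.

```python
from importlib import metadata

SUGGESTED_PIN = "0.10.2"  # version suggested in this thread; an assumption, not verified


def installed_version(package):
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def matches_pin(version, pin=SUGGESTED_PIN):
    """Compare a version string with the pin, ignoring any '+local' suffix."""
    if version is None:
        return False
    public = version.split("+", 1)[0]  # "1.0.0a0+f34abcf" -> "1.0.0a0"
    return public == pin


if __name__ == "__main__":
    found = installed_version("fairseq")
    print(f"fairseq installed: {found!r}, matches {SUGGESTED_PIN}: {matches_pin(found)}")
```

Run it with the same Python interpreter that launches `scripts/train_model.py`, since a source checkout such as `fairseq-git` can shadow the pip-installed package.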