KeyError when running training experiments
I'm attempting to run the experiments with bash scripts/run.all.sh and am running into the following error:
Training bert on skills
{'batch_sampler': {'batch_size': 32, 'max_tokens': 1024, 'sampling_smoothing': 1, 'sorting_keys': ['tokens'], 'type': 'dataset_buckets'}}
Traceback (most recent call last):
  File "/home/ec2-user/SkillSpan/machamp/train.py", line 56, in <module>
    trainer.train(name, args.parameters_config, args.dataset_configs, device, args.resume, args.retrain, args.seed, cmd)
  File "/home/ec2-user/SkillSpan/machamp/machamp/model/trainer.py", line 91, in train
    dataset_configs = myutils.merge_configs(dataset_config_paths, parameters_config)
  File "/home/ec2-user/SkillSpan/machamp/machamp/utils/myutils.py", line 66, in merge_configs
    full_task_config = copy.deepcopy(parameters_config['decoders']['default_decoder'])
KeyError: 'decoders'
It seems to be using the model-specific configs (e.g. spanbert.1.json) in 'SkillSpan/configs/Skills', where the decoders key is nested, unlike 'params.json', which has the expected key format.
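For anyone hitting the same traceback, one quick way to confirm the mismatch is to check whether a config exposes the top-level 'decoders' and 'default_decoder' keys that merge_configs reads. The snippet below is only a minimal sketch and not part of machamp: the helper names has_default_decoder and check_config_file are hypothetical, the two sample dictionaries are made up to illustrate "flat" versus "nested" (they are not the real contents of params.json or spanbert.1.json), and it assumes the config files are plain JSON.

```python
import json


def has_default_decoder(config: dict) -> bool:
    """Return True if config['decoders']['default_decoder'] can be read.

    Mirrors the lookup shown in the traceback. Hypothetical helper,
    not part of machamp.
    """
    decoders = config.get("decoders")
    return isinstance(decoders, dict) and "default_decoder" in decoders


def check_config_file(path: str) -> bool:
    """Load a config from disk (assumed to be plain JSON) and run the check."""
    with open(path, encoding="utf-8") as handle:
        return has_default_decoder(json.load(handle))


# Illustrative configs only; the real files may look different.
flat_config = {"decoders": {"default_decoder": {}}}
nested_config = {"model": {"decoders": {"default_decoder": {}}}}

print(has_default_decoder(flat_config))    # True
print(has_default_decoder(nested_config))  # False
```

Running check_config_file on the file passed as the parameters config should show whether it would trigger the KeyError before a full training run is started.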
Hi Sean,
I re-cloned everything and ran it on Python 3.6 with torch 1.7.0 and CUDA 11.0. My script is training and seems to be working fine.
Note that the machamp library is included here as a submodule pinned to a specific commit, so you should run the following:
$ git clone https://github.com/kris927b/SkillSpan.git
$ cd SkillSpan
$ git submodule update --init --recursive
Let me know how you ran into the issue.
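To double-check that the submodule really sits at the pinned commit, the output of git submodule status can be inspected: a leading '-' means the submodule is not initialized, and a leading '+' means the checked-out commit differs from the one recorded in the parent repository. Below is a rough sketch of that check. The function names and the sample line (including its all-ones SHA) are made up for illustration, and the repository path passed to submodule_status is whatever directory you cloned into.

```python
import subprocess

# Meaning of the one-character prefix printed by `git submodule status`.
STATES = {
    " ": "ok",
    "-": "not initialized",
    "+": "differs from pinned commit",
    "U": "merge conflict",
}


def parse_submodule_status(output: str) -> list:
    """Parse `git submodule status` output into (state, sha, path) tuples."""
    entries = []
    for line in output.splitlines():
        if not line.strip():
            continue
        flag = line[0] if line[0] in STATES else " "
        # Drop the prefix character; what remains is "<sha> <path> [(describe)]".
        fields = line.lstrip(" +-U").split()
        entries.append((STATES[flag], fields[0], fields[1]))
    return entries


def submodule_status(repo_dir: str) -> list:
    """Run `git submodule status --recursive` inside repo_dir and parse it."""
    result = subprocess.run(
        ["git", "submodule", "status", "--recursive"],
        cwd=repo_dir,
        stdout=subprocess.PIPE,
        universal_newlines=True,  # works on Python 3.6 as well
        check=True,
    )
    return parse_submodule_status(result.stdout)


# Made-up sample line: a submodule checked out at a commit other than the pinned one.
sample = "+" + "1" * 40 + " machamp (heads/master)\n"
print(parse_submodule_status(sample))
```

Calling submodule_status("SkillSpan") after the clone should report "ok" for machamp once git submodule update --init --recursive has been run.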
Thanks @jjzha, I think the specific commit was throwing me off, but following your command above I was able to run the training script successfully. Thank you for sharing and maintaining this excellent library and dataset!
Just for context: I ran into the issue by cloning machamp as a submodule from its latest commit.