KeyError when running training experiments

Question


Closed this issue 2 years ago · 3 comments

I'm attempting to run the experiments with bash scripts/run.all.sh and running into the following error:

Training bert on skills
{'batch_sampler': {'batch_size': 32, 'max_tokens': 1024, 'sampling_smoothing': 1, 'sorting_keys': ['tokens'], 'type': 'dataset_buckets'}}
Traceback (most recent call last):
  File "/home/ec2-user/SkillSpan/machamp/train.py", line 56, in <module>
    trainer.train(name, args.parameters_config, args.dataset_configs, device, args.resume, args.retrain, args.seed, cmd)
  File "/home/ec2-user/SkillSpan/machamp/machamp/model/trainer.py", line 91, in train
    dataset_configs = myutils.merge_configs(dataset_config_paths, parameters_config)
  File "/home/ec2-user/SkillSpan/machamp/machamp/utils/myutils.py", line 66, in merge_configs
    full_task_config = copy.deepcopy(parameters_config['decoders']['default_decoder'])
KeyError: 'decoders'

It seems to be using the model-specific configs (i.e., spanbert.1.json) in 'SkillSpan/configs/Skills', where the decoders key is nested, unlike 'params.json', which has the expected key format.
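If you hit the same KeyError, one way to confirm this diagnosis is to check whether the file passed as the parameters config has a top-level decoders key containing default_decoder. The failing line in merge_configs indexes exactly parameters_config['decoders']['default_decoder']. The sketch below is hypothetical: check_parameters_config, check_parameters_file and find_key_paths are not part of machamp. It only assumes the config is a JSON file, and the nested layout in the example is made up for illustration rather than copied from spanbert.1.json.

```python
import json
from typing import Any, Iterator, List, Tuple


def find_key_paths(obj: Any, key: str, path: Tuple[str, ...] = ()) -> Iterator[Tuple[str, ...]]:
    """Hypothetical helper: yield the path to every occurrence of `key`
    in a nested dict/list structure."""
    if isinstance(obj, dict):
        for k, v in obj.items():
            if k == key:
                yield path + (k,)
            yield from find_key_paths(v, key, path + (str(k),))
    elif isinstance(obj, list):
        for i, item in enumerate(obj):
            yield from find_key_paths(item, key, path + (str(i),))


def check_parameters_config(config: dict) -> List[str]:
    """Hypothetical helper: list the reasons why
    config['decoders']['default_decoder'] would raise a KeyError."""
    problems = []
    decoders = config.get("decoders")
    if decoders is None:
        # The key is missing at the top level; report where it is nested, if anywhere.
        nested = ["/".join(p) for p in find_key_paths(config, "decoders")]
        if nested:
            problems.append("no top-level 'decoders' key; found nested at: " + ", ".join(nested))
        else:
            problems.append("no 'decoders' key anywhere in the config")
    elif "default_decoder" not in decoders:
        problems.append("'decoders' has no 'default_decoder' entry")
    return problems


def check_parameters_file(path: str) -> List[str]:
    """Load a JSON config from disk and run the check above on it."""
    with open(path) as f:
        return check_parameters_config(json.load(f))


if __name__ == "__main__":
    # Made-up nested layout, for illustration only.
    example = {"model": {"decoders": {"default_decoder": {}}}}
    print(check_parameters_config(example))
    # → ["no top-level 'decoders' key; found nested at: model/decoders"]
```

An empty list means the file has the layout merge_configs expects. A "found nested at" message matches the situation described above, where a model-specific config ends up in the slot meant for a params.json-style file.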

Answer 1 · 2023-06-05T13:16:21.000Z

Hi Sean,

I re-cloned everything and ran it on Python 3.6 with torch 1.7.0 and CUDA 11.0. My script is training and seems to be working fine.
Note that the machamp library is included here as a git submodule pinned to a specific commit, so you should run the following:

$ git clone https://github.com/kris927b/SkillSpan.git
$ cd SkillSpan
$ git submodule update --init --recursive
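To confirm the submodule really ended up at the pinned commit (and not, say, at machamp's latest commit), git submodule status prefixes each line with "+" when the checked-out submodule commit differs from the one recorded in the superproject, "-" when the submodule is not initialized, and "U" on merge conflicts. A minimal sketch that parses that output follows. The helper names (parse_submodule_status, submodule_status, SubmoduleState) are hypothetical and not part of SkillSpan or machamp. It avoids capture_output/text so it also runs on the Python 3.6 setup mentioned above.

```python
import subprocess
from typing import List, NamedTuple


class SubmoduleState(NamedTuple):
    sha: str
    path: str
    status: str


# First character of each `git submodule status` line and what it means.
_PREFIXES = {
    " ": "ok",
    "-": "not initialized",
    "+": "differs from pinned commit",
    "U": "merge conflict",
}


def parse_submodule_status(output: str) -> List[SubmoduleState]:
    """Hypothetical helper: turn `git submodule status` output into records."""
    states = []
    for line in output.splitlines():
        if not line.strip():
            continue
        # Do not strip the line first: a leading space is the "ok" prefix.
        prefix, rest = line[0], line[1:]
        sha, path = rest.split()[:2]
        states.append(SubmoduleState(sha, path, _PREFIXES.get(prefix, "unknown")))
    return states


def submodule_status(repo_dir: str = ".") -> List[SubmoduleState]:
    """Run `git submodule status` in repo_dir and parse the result."""
    out = subprocess.run(
        ["git", "submodule", "status"],
        cwd=repo_dir,
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    ).stdout
    return parse_submodule_status(out)
```

Run from inside the SkillSpan checkout, any entry whose status is not "ok" means the submodule is not at the commit SkillSpan expects; rerunning git submodule update --init --recursive should reset it.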

Let me know how you ran into the issue.

Answer 2 · 2023-06-05T15:02:12.000Z

Thanks @jjzha! I think the pinned commit was what threw me off, but after following your commands above I was able to run the training script successfully. Thank you for sharing and maintaining this excellent library and dataset!

Answer 3 · 2023-06-05T15:17:13.000Z

Just for context: I ran into the issue by cloning machamp as a submodule at its latest commit.