RuntimeError: generator raised StopIteration
chrisspen opened this issue · 6 comments
Attempting to run the example with kur -v train speech.yml ends after about an hour with the error:
Traceback (most recent call last):
File "~/myproject/env/bin/kur", line 8, in <module>
sys.exit(main())
File "~/myproject/env/lib/python3.7/site-packages/kur/__main__.py", line 492, in main
sys.exit(args.func(args) or 0)
File "~/myproject/env/lib/python3.7/site-packages/kur/__main__.py", line 64, in train
func(step=args.step)
File "~/myproject/env/lib/python3.7/site-packages/kur/kurfile.py", line 434, in func
return trainer.train(**defaults)
File "~/myproject/env/lib/python3.7/site-packages/kur/model/executor.py", line 295, in train
**kwargs
File "~/myproject/env/lib/python3.7/site-packages/kur/model/executor.py", line 768, in wrapped_train
for num_batches, batch in parallelize(enumerate(provider)):
RuntimeError: generator raised StopIteration
and no weights file is output anywhere. Is this a bug or the expected behavior?
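For context, the message "generator raised StopIteration" is what Python 3.7 and later produce when a StopIteration escapes from inside a generator body (PEP 479). The sketch below reproduces that mechanism in isolation. The function names are hypothetical, and whether Kur's parallelize or the data provider leaks the exception in this way is an assumption; the traceback alone does not show it.

```python
def first_of_each(iterators):
    """Yield the first item of each iterator (buggy on purpose)."""
    for it in iterators:
        # next() raises StopIteration when `it` is empty. Inside a generator,
        # Python 3.7+ turns that into RuntimeError (PEP 479).
        yield next(it)


def first_of_each_safe(iterators):
    """Same idea, but an exhausted iterator ends the generator cleanly."""
    for it in iterators:
        try:
            yield next(it)
        except StopIteration:
            return


def make_inputs():
    # The second iterator is empty, so next() on it raises StopIteration.
    return [iter([1, 2]), iter([])]


def demo():
    try:
        list(first_of_each(make_inputs()))
    except RuntimeError as exc:
        message = str(exc)
    else:
        message = "no error"
    return message, list(first_of_each_safe(make_inputs()))
```

On Python 3.6 and earlier the buggy version would simply stop early without an error, which may be why code written for older interpreters only starts failing on 3.7.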
Chris,
The full spec is:
weights:
  initial: "/path/to/load/first/weights"
  last: "/path/to/save/when/exiting"
checkpoint:
  path: "/path/to/save/checkpoints"
  minutes: "{{ checkpoint_time }}"
  validation: "{{ batch size }}"
Are you using the default speech.yml in your tests?
Yes. I see no weights or checkpoint sections in your speech.yml file. Where are those defined?
When you look in the 'train' section, you see a 'weights' entry that pulls in the weights from the 'settings' section:
train:
  data:
    # A "speech_recognition" data supplier will create these data sources:
    #   utterance, utterance_length, transcript, transcript_length, duration
    - speech_recognition:
        <<: *data
        url: "https://kur.deepgram.com/data/lsdc-train.tar.gz"
        checksum: >-
          fc414bccf4de3964f895eaa9d0e245ea28810a94be3079b55505cf0eb1644f94
  weights: *weights
If you replace that with what I have above, it should save checkpoints and a 'last' set of weights when the model stops. Paths can be relative, so using names like 'last.model.kur' and 'checkpoint.model.kur' lets you start a new model by creating a new directory, copying in the yml files, and running kur there.
You also need to make any similar changes to the 'validate' section.
If I make those changes, then kur doesn't run at all and fails with the exception:
Traceback (most recent call last):
File "~/.env/bin/kur", line 8, in <module>
sys.exit(main())
File "~/.env/lib/python3.7/site-packages/kur/__main__.py", line 492, in main
sys.exit(args.func(args) or 0)
File "~/.env/lib/python3.7/site-packages/kur/__main__.py", line 62, in train
spec = parse_kurfile(args.kurfile, args.engine)
File "~/.env/lib/python3.7/site-packages/kur/__main__.py", line 48, in parse_kurfile
spec.parse()
File "~/.env/lib/python3.7/site-packages/kur/kurfile.py", line 129, in parse
self.engine, builtin['train'], stack, include_key=True)
File "~/.env/lib/python3.7/site-packages/kur/kurfile.py", line 960, in _parse_section
evaluated = engine.evaluate(self.data[key], recursive=True)
File "~/.env/lib/python3.7/site-packages/kur/engine/engine.py", line 228, in evaluate
for k, v in expression.items()}
File "~/.env/lib/python3.7/site-packages/kur/engine/engine.py", line 228, in <dictcomp>
for k, v in expression.items()}
File "~/.env/lib/python3.7/site-packages/kur/engine/engine.py", line 228, in evaluate
for k, v in expression.items()}
File "~/.env/lib/python3.7/site-packages/kur/engine/engine.py", line 228, in <dictcomp>
for k, v in expression.items()}
File "~/.env/lib/python3.7/site-packages/kur/engine/engine.py", line 208, in evaluate
new_expression = self._evaluate(expression)
File "~/.env/lib/python3.7/site-packages/kur/engine/jinja_engine.py", line 189, in _evaluate
result = self.env.from_string(expression).render(**self._scope)
File "~/.env/lib/python3.7/site-packages/jinja2/environment.py", line 880, in from_string
return cls.from_code(self, self.compile(source), globals, None)
File "~/.env/lib/python3.7/site-packages/jinja2/environment.py", line 591, in compile
self.handle_exception(exc_info, source_hint=source_hint)
File "~/.env/lib/python3.7/site-packages/jinja2/environment.py", line 780, in handle_exception
reraise(exc_type, exc_value, tb)
File "~/.env/lib/python3.7/site-packages/jinja2/_compat.py", line 37, in reraise
raise value.with_traceback(tb)
File "<unknown>", line 1, in template
File "~/.env/lib/python3.7/site-packages/jinja2/environment.py", line 497, in _parse
return Parser(self, source, name, encode_filename(filename)).parse()
File "~/.env/lib/python3.7/site-packages/jinja2/parser.py", line 901, in parse
result = nodes.Template(self.subparse(), lineno=1)
File "~/.env/lib/python3.7/site-packages/jinja2/parser.py", line 876, in subparse
self.stream.expect('variable_end')
File "~/.env/lib/python3.7/site-packages/jinja2/lexer.py", line 384, in expect
self.name, self.filename)
jinja2.exceptions.TemplateSyntaxError: expected token 'end of print statement', got 'size'
Why would the default speech.yml need modifications just to save the trained network? Shouldn't those settings be the default?
I suspect instead of:
validation: "{{ batch size }}"
you meant to type:
validation: "{{ batch_size }}"
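A typo like this can be caught before launching a long run. The following is a minimal sketch using only the standard library; bad_placeholders is a hypothetical helper, not part of Kur or Jinja2. It assumes every placeholder should be a single plain variable name, which is stricter than Jinja2 itself (Jinja2 also accepts filters and expressions).

```python
import re

# Matches the body of each {{ ... }} placeholder.
PLACEHOLDER = re.compile(r"\{\{(.*?)\}\}")
# A plain variable name such as batch_size.
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def bad_placeholders(text):
    """Return placeholder bodies that are not a single plain variable name."""
    bad = []
    for body in PLACEHOLDER.findall(text):
        name = body.strip()
        if not IDENTIFIER.fullmatch(name):
            bad.append(name)
    return bad
```

Running it over the text of the Kurfile would flag "batch size" and accept "batch_size".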
That lets it run for a short while, but then it still errors with:
Traceback (most recent call last):
File "~/.env/bin/kur", line 8, in <module>
sys.exit(main())
File "~/.env/lib/python3.7/site-packages/kur/__main__.py", line 492, in main
sys.exit(args.func(args) or 0)
File "~/.env/lib/python3.7/site-packages/kur/__main__.py", line 64, in train
func(step=args.step)
File "~/.env/lib/python3.7/site-packages/kur/kurfile.py", line 434, in func
return trainer.train(**defaults)
File "~/.env/lib/python3.7/site-packages/kur/model/executor.py", line 295, in train
**kwargs
File "~/.env/lib/python3.7/site-packages/kur/model/executor.py", line 564, in wrapped_train
checkpoint[k]))
ValueError: Expected "minutes" key in "checkpoint" to be an integer. Received:
Are you sure minutes: "{{ checkpoint_time }}" is the correct syntax and variable name?
Sorry about the typo there! I should have told you to define batch_size and checkpoint_time in the settings section. The checkpoint values can also be written as literals, but for clarity I would recommend defining them in settings:
batch_size: 16
checkpoint_time: 30
Of course, select whatever batch size and checkpoint interval (in minutes) you need here.
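The empty value after "Received:" in the ValueError above is consistent with an undefined template variable rendering as an empty string, which is Jinja2's default behavior. The sketch below imitates that with the standard library only. render and checkpoint_minutes are hypothetical helpers, and it is an assumption that Kur evaluates the field this way.

```python
import re

# Matches a placeholder holding a single variable name, e.g. {{ checkpoint_time }}.
PLACEHOLDER = re.compile(r"\{\{\s*([A-Za-z_][A-Za-z0-9_]*)\s*\}\}")


def render(template, scope):
    """Substitute placeholders from `scope`.

    Assumption: like Jinja2's default Undefined, a missing name renders
    as an empty string instead of raising an error.
    """
    return PLACEHOLDER.sub(lambda m: str(scope.get(m.group(1), "")), template)


def checkpoint_minutes(template, settings):
    """Render the 'minutes' field and require it to be an integer."""
    rendered = render(template, settings)
    try:
        return int(rendered)
    except ValueError:
        raise ValueError(
            'Expected an integer for "minutes". Received: %r' % rendered
        ) from None
```

With settings of {"batch_size": 16, "checkpoint_time": 30} the field resolves to 30; with checkpoint_time missing it renders to an empty string and the integer check fails, matching the error seen earlier in the thread.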