Error when continuing training
When an earlier run is found, I get the following error message:
Found earlier run, continuing training.
Traceback (most recent call last):
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.9_3.9.3568.0_x64__qbz5n2kfra8p0\lib\runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.9_3.9.3568.0_x64__qbz5n2kfra8p0\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "D:\Development\depRL\deprl\main.py", line 151, in <module>
    main()
  File "D:\Development\depRL\deprl\main.py", line 147, in main
    train(config)
  File "D:\Development\depRL\deprl\main.py", line 77, in train
    logger.initialize(
  File "D:\Development\depRL\deprl\vendor\tonic\utils\logger.py", line 272, in initialize
    current_logger = Logger(*args, **kwargs)
  File "D:\Development\depRL\deprl\vendor\tonic\utils\logger.py", line 79, in __init__
    create_resumed_results_path(config, env)
  File "D:\Development\depRL\deprl\vendor\tonic\utils\path_utils.py", line 10, in wrapper
    result = func(*args, **kwargs)
  File "D:\Development\depRL\deprl\vendor\tonic\utils\logger.py", line 56, in create_resumed_results_path
    folder = get_sorted_folders(folders[0][1])[-1]
  File "D:\Development\depRL\deprl\vendor\tonic\utils\logger.py", line 27, in get_sorted_folders
    sorted_folders = sorted(folders, key=get_datetime_key)
  File "D:\Development\depRL\deprl\vendor\tonic\utils\logger.py", line 18, in get_datetime_key
    date_time_str = s.split(".")[0] + s.split(".")[1]
IndexError: list index out of range
Hi tgeijten, what exactly did you run and when did this error occur?
Hi Pierre, here's an example of steps to reproduce the issue (Windows 10):
- Run:
python -m deprl.main scone_run_h0918.yaml
- Wait until some checkpoints are generated
- Cancel the optimization
- Run again:
python -m deprl.main scone_run_h0918.yaml
Any idea yet what could be causing this? If you point me at the right bit of code, I can have a look for myself 😁
Hey Thomas,
Some folder path isn't recognised correctly, and the string splitting then tries to index an element that doesn't exist.
I believe this error only happens on Windows, because of some difference in how folder paths are handled.
I can take a look at the Linux version to make sure it works. I'll try to get you a more precise update, which should help you fix it for Windows.
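For reference, the failure mode described above can be reproduced in isolation. This is only a sketch: the key function below copies the single expression visible in the traceback (logger.py, line 18), `try_key` is a hypothetical helper added for the demonstration, and both folder names are made up.

```python
def get_datetime_key(s):
    # Same expression as the line shown in the traceback.
    # It indexes element [1] of the split, so `s` must contain a ".".
    return s.split(".")[0] + s.split(".")[1]


def try_key(name):
    # Hypothetical helper: report the error instead of crashing.
    try:
        return get_datetime_key(name)
    except IndexError as exc:
        return f"IndexError: {exc}"


print(try_key("part_a.part_b"))  # made-up name containing a "."
print(try_key("no_dot_here"))    # made-up name without a "."
```

Any name without a "." splits into a single-element list, so indexing `[1]` raises the same `IndexError` as in the report.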
I'm relatively certain it's happening in this line:
depRL/deprl/vendor/tonic/utils/logger.py
Line 56 in 8193e84
Thanks for the update. Let me know if there's anything I can do to help testing on Windows.
Hey Thomas,
I pushed an update to the dev branch of the repo that should print the path being loaded.
Can you try again after installing from the dev branch?
I also ran GitHub Actions on Windows, but I can't run Hyfydy there, as I didn't install the license key on the GitHub test cluster.
My tests worked, so it might be related to the specific path of the reloaded Hyfydy experiment.
When running it again, can you search for a line that looks like this and report back:
Found earlier run, continuing training: Path is: ./tests/test_DEPRL\myoLeg
Thanks, this is what I get:
Found earlier run, continuing training: Path is: D:/Dropbox/Documents/SCONE/_output\sconerun_h0918_v1
I suppose this path doesn't include the date/time string, so the subsequent call to get_datetime_key() fails?
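One way to check this hypothesis, sketched under assumptions: the thread does not show which strings actually reach get_datetime_key(), so the helpers below (`last_component`, `has_two_dot_parts`, `sortable_folders`) are hypothetical and not part of deprl or tonic. They only test whether a name has the two "."-separated parts that the failing line indexes.

```python
from pathlib import PureWindowsPath


def last_component(path):
    # PureWindowsPath accepts both "/" and "\" as separators,
    # so the mixed-separator path from the log is parsed as one path.
    return PureWindowsPath(path).name


def has_two_dot_parts(name):
    # The failing line indexes split(".")[1], which needs >= 2 parts.
    return len(name.split(".")) >= 2


def sortable_folders(folders):
    # Hypothetical guard: keep only names the key function could handle.
    return [f for f in folders if has_two_dot_parts(last_component(f))]


# Path exactly as reported in the log output above.
reported = "D:/Dropbox/Documents/SCONE/_output\\sconerun_h0918_v1"
print(last_component(reported))
print(has_two_dot_parts(last_component(reported)))
```

If the folder names passed to the sort can lack a ".", filtering them first (or handling the short split inside the key function) would avoid the IndexError. Whether skipping such folders is the right behaviour for resuming a run is a design decision for the maintainers.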