martius-lab/depRL

Error when continuing training


When an earlier run is found, I get the following error message:

Found earlier run, continuing training.
Traceback (most recent call last):
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.9_3.9.3568.0_x64__qbz5n2kfra8p0\lib\runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.9_3.9.3568.0_x64__qbz5n2kfra8p0\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "D:\Development\depRL\deprl\main.py", line 151, in <module>
    main()
  File "D:\Development\depRL\deprl\main.py", line 147, in main
    train(config)
  File "D:\Development\depRL\deprl\main.py", line 77, in train
    logger.initialize(
  File "D:\Development\depRL\deprl\vendor\tonic\utils\logger.py", line 272, in initialize
    current_logger = Logger(*args, **kwargs)
  File "D:\Development\depRL\deprl\vendor\tonic\utils\logger.py", line 79, in __init__
    create_resumed_results_path(config, env)
  File "D:\Development\depRL\deprl\vendor\tonic\utils\path_utils.py", line 10, in wrapper
    result = func(*args, **kwargs)
  File "D:\Development\depRL\deprl\vendor\tonic\utils\logger.py", line 56, in create_resumed_results_path
    folder = get_sorted_folders(folders[0][1])[-1]
  File "D:\Development\depRL\deprl\vendor\tonic\utils\logger.py", line 27, in get_sorted_folders
    sorted_folders = sorted(folders, key=get_datetime_key)
  File "D:\Development\depRL\deprl\vendor\tonic\utils\logger.py", line 18, in get_datetime_key
    date_time_str = s.split(".")[0] + s.split(".")[1]
IndexError: list index out of range

Hi tgeijten, what exactly did you run and when did this error occur?

Hi Pierre, here's an example of steps to reproduce the issue (Windows 10):

  1. Run: python -m deprl.main scone_run_h0918.yaml
  2. Wait until some checkpoints are generated
  3. Cancel the optimization
  4. Run again: python -m deprl.main scone_run_h0918.yaml

Any idea yet what could be causing this? If you point me at the right bit of code, I can have a look for myself 😁

Hey Thomas,
Some folder path isn't recognised correctly, and then the string splitting tries to index an element that doesn't exist.
I believe this error only happens on Windows, because of some difference in how folder paths are handled.

I can take a look at the Linux version to make sure it works. I'll try to get you a more precise update, which should help you fix it for Windows.

I'm relatively certain it's happening in this line:

folder = get_sorted_folders(folders[0][1])[-1]
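
For reference, the expression that actually raises in the traceback is the split inside get_datetime_key, which that line calls via get_sorted_folders. Below is a minimal sketch of how it fails on a name without a "." in it. The folder names are made up for illustration, and the real function presumably does more than this one expression:

```python
def get_datetime_key(s):
    # Same expression as the failing line in the traceback (logger.py, line 18).
    # Returning it directly is a simplification for this sketch.
    return s.split(".")[0] + s.split(".")[1]


# Hypothetical folder names, not taken from a real run.
name_with_dot = "240101.120000"
name_without_dot = "sconerun_h0918_v1"

print(get_datetime_key(name_with_dot))

try:
    get_datetime_key(name_without_dot)
except IndexError as err:
    # split(".") returns a single-element list here, so index 1 does not exist.
    print("IndexError:", err)
```

So any entry in the folder list that has no "." is enough to trigger the error, regardless of platform.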

Thanks for the update. Let me know if there's anything I can do to help with testing on Windows.

Hey Thomas,
I pushed an update to the dev branch of the repo that should print the path that is being loaded.
Can you try again after installing from the dev branch?

I also ran GitHub Actions on Windows, but I can't run Hyfydy there, as I didn't install the license key on the GitHub test cluster.
My tests worked, so it might be related to the specific path of the reloaded Hyfydy experiment.

When running it again, can you search for a line that looks like this and report back:
Found earlier run, continuing training: Path is: ./tests/test_DEPRL\myoLeg
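
The mixed separators in that line are what Windows-style joining gives when the first part already uses forward slashes. A small sketch of how the same string is interpreted under Windows and POSIX rules, using the standard-library ntpath and posixpath modules so it runs on any platform. Whether depRL builds the path with os.path.join is an assumption here:

```python
import ntpath      # Windows path rules, importable on any OS
import posixpath   # POSIX path rules, importable on any OS

# Joining with Windows rules keeps the existing "/" and adds a "\".
joined = ntpath.join("./tests/test_DEPRL", "myoLeg")
print(joined)

# Windows rules treat both "/" and "\" as separators.
print(ntpath.basename(joined))

# POSIX rules treat "\" as an ordinary character in a name.
print(posixpath.basename(joined))
```

On Windows itself both separators are accepted, so the mixed form is not necessarily the problem, but code that splits the path on "/" by hand would see a different last component.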

Thanks, this is what I get:

Found earlier run, continuing training: Path is: D:/Dropbox/Documents/SCONE/_output\sconerun_h0918_v1

I suppose this path doesn't include the date/time string, so the subsequent call to get_datetime_key() fails?
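
If that is the cause, one possible workaround would be to skip names that cannot be keyed before sorting. A rough sketch, assuming timestamped folders always contain a "." between the date and time parts. The helper names and folder names are hypothetical, and this is not the actual depRL code:

```python
def has_datetime_parts(name):
    # The failing key expression needs at least two "."-separated parts.
    return len(name.split(".")) >= 2


def sort_timestamped(names):
    """Keep only names that can be keyed, sorted by their first two parts joined."""
    candidates = [n for n in names if has_datetime_parts(n)]
    return sorted(candidates, key=lambda n: "".join(n.split(".")[:2]))


# Made-up folder listing containing one entry without a date/time string.
names = ["240102.090000", "sconerun_h0918_v1", "240101.120000"]
print(sort_timestamped(names))
```

This only avoids the IndexError. If no timestamped folder is found at all, the caller would still need to handle an empty result before taking the last element.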