smallcloudai/refact

Finetune failed with "No train files provided"


GPU: 2080Ti
Steps to reproduce: start "Run finetune".
Result: finetune fails with RuntimeError("No train files provided").
Log:

stats: 7 good, 0 too large, 0 generated, 0 vendored
marking 7 files from 7_files.zip to which_set="train", to_db=False
total files 7
dedup...
after dedup 7 files
Reading /perm_storage/cfg/sources_filetypes.cfg
Will not overwrite '/tmp/unpacked-files/train_set.jsonl' because it is exactly the same as the current output
Will not overwrite '/tmp/unpacked-files/test_set.jsonl' because it is exactly the same as the current output
Will not overwrite '/tmp/unpacked-files/database_set.jsonl' because it is exactly the same as the current output
Loading status tracker...
Loading finetune configs...
Reading /perm_storage/cfg/finetune_filter.cfg
Reading /perm_storage/cfg/finetune.cfg
Loading file sets context...
Loading status tracker...
Loading finetune configs...
Reading /perm_storage/cfg/finetune.cfg
Calculating finetune optimal parameters
Retrieving dataset length per epoch, it may take a while...
Finetune is failed
Exception: No train files provided

fcfc48745c10 Caught exception:
Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/usr/local/lib/python3.10/dist-packages/refact_enterprise/finetune/finetune_train.py", line 13, in <module>
    main(models_mini_db)
  File "/usr/local/lib/python3.10/dist-packages/self_hosting_machinery/finetune/scripts/finetune_train.py", line 251, in main
    raise e
  File "/usr/local/lib/python3.10/dist-packages/self_hosting_machinery/finetune/scripts/finetune_train.py", line 225, in main
    finetune_cfg = copy.deepcopy(_build_finetune_config_by_heuristics(models_db))
  File "/usr/local/lib/python3.10/dist-packages/self_hosting_machinery/finetune/scripts/finetune_train.py", line 39, in _build_finetune_config_by_heuristics
    ds_len = get_ds_len_per_epoch(user_cfg['model_name'], cfg_builder)
  File "/usr/local/lib/python3.10/dist-packages/self_hosting_machinery/finetune/scripts/aux/dataset.py", line 51, in get_ds_len_per_epoch
    ds = create_train_dataloader(
  File "/usr/local/lib/python3.10/dist-packages/self_hosting_machinery/finetune/scripts/aux/dataset.py", line 87, in create_train_dataloader
    raise RuntimeError("No train files provided")
RuntimeError: No train files provided

Additional info:
Running "Run filter" fails with an out-of-memory (OOM) error. After changing the context size from 4096 to 2048 in refact.py, everything works.
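
To check whether the training set is really empty before the trainer starts, the record count of the JSONL files can be inspected directly. This is a minimal sketch: the paths are taken from the log above, but whether the trainer reads exactly these files (rather than a filtered copy) is an assumption, and `count_jsonl_records` is a hypothetical helper, not part of refact.

```python
import json
from pathlib import Path


def count_jsonl_records(path):
    """Return the number of non-empty lines in `path` that parse as JSON.

    Returns 0 if the file does not exist.
    """
    p = Path(path)
    if not p.is_file():
        return 0
    count = 0
    with p.open("r", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # ignore blank lines
            try:
                json.loads(line)
            except json.JSONDecodeError:
                continue  # ignore malformed lines
            count += 1
    return count


if __name__ == "__main__":
    # Directory and file names come from the log in this issue;
    # that the trainer consumes these exact files is an assumption.
    for name in ("train_set.jsonl", "test_set.jsonl"):
        path = Path("/tmp/unpacked-files") / name
        print(f"{path}: {count_jsonl_records(path)} records")
```

If the files do contain records but the error persists, the failure is more likely in the filtering step, for example the OOM described above.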

Another similar case:
Problems with libraries can cause the same error. For example, if you remove flash-attn and then run finetune, you get the "No train files provided" error and GPU filtering logs like this:

(Image: screenshot of the GPU filtering logs)
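
Because a missing library surfaces only as "No train files provided", it may help to verify up front that the required modules can be found. The sketch below assumes the package is imported as `flash_attn` (the import name of flash-attn); `missing_modules` is a hypothetical helper, and the list of modules to check is illustrative, not the project's actual dependency list.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names from `names` that cannot be found."""
    return [n for n in names if importlib.util.find_spec(n) is None]


if __name__ == "__main__":
    # "flash_attn" is the import name of the flash-attn package;
    # extend the list with other dependencies as needed.
    missing = missing_modules(["flash_attn"])
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All checked modules are available")
```

Run it inside the same container or environment that runs the finetune, since that is where the import has to succeed.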