mlfoundations/dclm

Getting path issue when trying to load language model

Closed this issue · 2 comments

When I run processing with the whole-page language enricher, which I enable by adding

    - func: detect_lang_whole_page_enricher
      model: fasttext

to my processing YAML, I get:

(process_local_chunk pid=16263, ip=10.1.17.85)     return FastText(modelpath=path)
(process_local_chunk pid=16263, ip=10.1.17.85)   File "/home/ubuntu/miniconda3/envs/rapids-22.06/lib/python3.9/site-packages/fasttext/FastText.py", line 93, in __init__
(process_local_chunk pid=16263, ip=10.1.17.85)     self.f.loadModel(model_path)
(process_local_chunk pid=16263, ip=10.1.17.85) ValueError: /tmp/ray/session_2024-07-29_22-57-54_316975_7231/runtime_resources/working_dir_files/_ray_pkg_c26c2eecee54cf21/baselines/mappers/enrichers/language_id_enrichment_models/lid.176.bin cannot be opened for loading!

I made sure to put the model in baselines/mappers/enrichers/language_id_enrichment_models/ before running the script, but it doesn't seem to be copied over to the Ray working directory at run time.

The loader expects the model at a specific path. It might be better to download the model if it's not there. I can put up a PR if that would be helpful.
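For reference, here is a rough sketch of the download-if-missing idea. It is not the current dclm code: `ensure_model` and `_download` are hypothetical names, and I'm assuming the public fastText URL for lid.176.bin is the source we'd want. The `fetch` argument is injectable only so the logic can be tested without network access.

```python
import os
import shutil
import tempfile
import urllib.request

# Public fastText language-ID model URL (assumed to be the desired source).
LID_MODEL_URL = "https://dl.fbaipublicfiles.com/fasttext/supervised-models/lid.176.bin"


def _download(url, dest):
    """Stream `url` to `dest` using only the standard library."""
    with urllib.request.urlopen(url) as response, open(dest, "wb") as out:
        shutil.copyfileobj(response, out)


def ensure_model(model_path, url=LID_MODEL_URL, fetch=_download):
    """Return `model_path`, downloading the file first if it is missing.

    Hypothetical helper, not part of dclm.
    """
    if os.path.isfile(model_path):
        return model_path

    target_dir = os.path.dirname(model_path) or "."
    os.makedirs(target_dir, exist_ok=True)

    # Download to a temp file in the same directory, then rename it into
    # place so other workers never see a partially written model.
    fd, tmp_path = tempfile.mkstemp(dir=target_dir)
    os.close(fd)
    try:
        fetch(url, tmp_path)
        os.replace(tmp_path, model_path)
    finally:
        # Only true if the download or rename failed.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
    return model_path
```

The enricher could then call something like `fasttext.load_model(ensure_model(path))` on each worker, so the model would not need to be shipped through the Ray working directory at all.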

Do you mind trying with the latest version of process.py that just got merged? It should let you run without having to use the Ray working_dir.

Marking as closed for now. Please reopen if the issue is not fixed by the latest version of the code.