Jyonn/ONCE

FileNotFoundError

Closed this issue · 6 comments

Hello,

I encountered an issue while running a command with a specific configuration for the repository. Below is the command I used:

python worker.py --embed config/embed/llama-token.yaml --model config/model/llama-naml.yaml --exp config/exp/llama-split.yaml --data config/data/mind-llama.yaml --version small --llm_ver 7b --hidden_size 64 --layer 0 --lora 0 --fast_eval 0 --embed_hidden_size 4096 --model config/model/llm/llama-nrms.yaml --max_news_batch_size 0

After executing the above command, I received the following error message:
FileNotFoundError: [Errno 2] No such file or directory: 'data/MIND-small/neg/meta.data.json'

I have already downloaded the MIND-small dataset as suggested, but it appears that there is no 'neg' directory within it. Could you please advise on how I might resolve this issue? I'm wondering if there's an additional step or configuration that I might be missing to generate or obtain the 'neg' directory within the MIND-small dataset.

Thank you for your assistance.

Hi,

Thanks for your attention to our work!
Please refer here for the solution: just remove the 'neg' line from the data configuration. Sorry for the inconvenience!
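For readers hitting the same error, a sketch of the fix. This assumes the 'neg' depot was listed under the user section's union (the configuration printed later in this thread shows only data/MIND-small/user remaining there); the exact layout of your mind-llama.yaml may differ.

```yaml
# config/data/mind-llama.yaml (sketch -- your file's layout may differ)
user:
  union:
    - data/MIND-small/user
    # - data/MIND-small/neg   # delete this entry; the neg depot is not shipped
```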

I removed the line referencing the 'neg' directory, and the initial FileNotFoundError was resolved. However, I've encountered a different error now. Here are the details:

loaded 94057 samples from data/MIND-small/user
[00:00:04] |Depots| Filter history with x in test phase, sample num: 1368829 -> 1328885
fpath
Traceback (most recent call last):
  File "/mnt/workspace/Legommenders/worker.py", line 488, in <module>
    worker = Worker(config=configuration)
  File "/mnt/workspace/Legommenders/worker.py", line 58, in __init__
    self.controller = Controller(
  File "/mnt/workspace/Legommenders/loader/controller.py", line 49, in __init__
    self.item_hub = DataHub(
  File "/mnt/workspace/Legommenders/loader/data_hub.py", line 16, in __init__
    self.depot = depot if isinstance(depot, UniDep) else DepotHub.get(depot)
  File "/mnt/workspace/Legommenders/loader/depot/depot_hub.py", line 14, in get
    depot = CachingDep(path, filter_cache=filter_cache)
  File "/mnt/workspace/Legommenders/loader/depot/caching_depot.py", line 17, in __init__
    super().__init__(store_dir, **kwargs)
  File "/opt/conda/envs/env/lib/python3.10/site-packages/UniTok/unidep.py", line 25, in __init__
    self.store_dir = os.path.expanduser(store_dir)
  File "/opt/conda/envs/env/lib/python3.10/posixpath.py", line 232, in expanduser
    print('fpath', path)
  File "/opt/conda/envs/env/lib/python3.10/site-packages/oba/oba.py", line 34, in __str__
    raise ValueError(f'Path {NoneObj.raw(self)} not exists')
ValueError: Path item.depot not exists
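An aside for readers: the shape of this traceback can be confusing, because the ValueError fires inside os.path.expanduser rather than at the config lookup. The sketch below is a minimal illustration of the mechanism (hypothetical classes, not the actual oba or UniTok source): oba-style config objects return a placeholder (NoneObj in oba) for missing keys, and the error only surfaces when something later stringifies that placeholder.

```python
# Hypothetical sketch of lazy config lookup, mirroring oba's deferred-error
# behaviour; not the real oba/UniTok implementation.

class MissingValue:
    """Placeholder returned for config keys that were never set."""

    def __init__(self, path):
        self._path = path

    def __getattr__(self, key):
        # Chained lookups keep extending the dotted path without failing.
        return MissingValue(f'{self._path}.{key}')

    def __str__(self):
        # Failure is deferred until the value is actually used.
        raise ValueError(f'Path {self._path} not exists')


class Config:
    def __init__(self, data):
        self._data = data

    def __getattr__(self, key):
        if key in self._data:
            return self._data[key]
        return MissingValue(key)


# The data config defines a `news` section, but the updated framework looks
# for `item` -- so `config.item.depot` silently yields a placeholder.
config = Config({'news': {'depot': 'data/MIND-small/news-llama'}})
store_dir = config.item.depot

try:
    print(f'store_dir: {store_dir}')  # formatting calls __str__ and raises
except ValueError as e:
    print(e)  # Path item.depot not exists
```

This is why renaming the config key (see the maintainer's reply below) makes the error disappear: the lookup then returns a real path string instead of a placeholder.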

Hi,

It seems the item configuration is not correct. Can you provide the entire configuration dict that is printed at the beginning of the output?

[00:00:00] |Worker| python worker.py --embed config/embed/llama-token.yaml --model config/model/llama-naml.yaml --exp config/exp/llama-split.yaml --data config/data/mind-llama.yaml --version small --llm_ver 7b --hidden_size 64 --layer 0 --lora 0 --fast_eval 0 --embed_hidden_size 4096 --model config/model/llm/llama-nrms.yaml
[00:00:00] |Worker| {
"embed": {
"name": "llama-token",
"embeddings": [
{
"vocab_name": "llama",
"vocab_type": "numpy",
"path": "data/llama-token.npy",
"frozen": true
}
]
},
"model": {
"name": "LLAMA-NRMS.D64.L0.Lora0",
"meta": {
"item": "Llama",
"user": "Attention",
"predictor": "Dot"
},
"config": {
"use_news_content": true,
"max_news_content_batch_size": 0,
"same_dim_transform": false,
"embed_hidden_size": 4096,
"hidden_size": 64,
"neg_count": 4,
"news_config": {
"llm_dir": "/home/data1/qijiong/llama-7b",
"layer_split": 0,
"lora": 0,
"weights_dir": "data/MIND-small-Llama/llama-7b-split"
},
"user_config": {
"num_attention_heads": 8,
"inputer_config": {
"use_cls_token": false,
"use_sep_token": false
}
}
}
},
"exp": {
"name": "test_llm_layer_split",
"dir": "saving/MIND-small-Llama/LLAMA-NRMS.D64.L0.Lora0/llama-token-test_llm_layer_split",
"log": "saving/MIND-small-Llama/LLAMA-NRMS.D64.L0.Lora0/llama-token-test_llm_layer_split/exp.log",
"mode": "test_llm_layer_split",
"store": {
"layers": [
31,
30,
29,
27
],
"dir": "data/MIND-small-Llama/llama-7b-split"
},
"load": {
"save_dir": null,
"model_only": true,
"strict": true,
"wait": false
},
"policy": {
"device": "gpu",
"batch_size": 64
}
},
"data": {
"name": "MIND-small-Llama",
"base_dir": "data/MIND-small",
"news": {
"filter_cache": true,
"depot": "data/MIND-small/news-llama",
"order": [
"title",
"cat"
],
"append": [
"nid"
]
},
"user": {
"filter_cache": true,
"depots": {
"train": {
"path": "data/MIND-small/train"
},
"dev": {
"path": "data/MIND-small/dev"
},
"test": {
"path": "data/MIND-small/test"
}
},
"filters": {
"history": [
"x"
]
},
"union": [
"data/MIND-small/user"
],
"candidate_col": "nid",
"clicks_col": "history",
"label_col": "click",
"neg_col": "neg",
"group_col": "imp",
"user_col": "uid",
"index_col": "index"
}
},
"version": "small",
"llm_ver": "7b",
"hidden_size": 64,
"layer": 0,
"lora": 0,
"fast_eval": 0,
"embed_hidden_size": 4096,
"warmup": 0,
"simple_dev": false,
"batch_size": 64,
"acc_batch": 1,
"lora_r": 32,
"lr": 0.0001,
"item_lr": 1e-05,
"mind_large_submission": false,
"epoch_batch": 0,
"max_item_batch_size": 0,
"page_size": 512,
"patience": 2,
"epoch_start": 0,
"frozen": true,
"load_path": null,
"rand": {},
"time": {},
"seed": 2023
}
[00:00:00] |GPU| choose 0 GPU with 32507 / 32510 MB
[00:00:00] |Controller| dataset type: news
[00:00:00] |Controller| build column map ...
[00:00:01] |CachingDep| load 1 filter caches on
UniDep (2.0): data/MIND-small/test

    Sample Size: 1368829
    Id Column: index
    Columns:
            index, vocab index (size 1368829)
            imp, vocab imp (size 36576)
            uid, vocab uid (size 94057)
            nid, vocab nid (size 65238)
            click, vocab click (size 2)

[00:00:03] |Depots| Filter history with x in test phase, sample num: 1368829 -> 1328885

Hi,

There are several configurations that you should modify.

  • The main reason for this error is that the item data path in mind-llama.yaml is not specified correctly. Due to the Legommenders framework update, please first replace the keys containing "news" with "item" in this file. Next, check whether the depot path "data/MIND-small/news-llama" is correct; the path is relative to the project root directory.
  • You should also change "llm_dir" in llama_split.yaml to your llama-7b path. Please ensure that the transformers library can load the pretrained LLaMA model from this path. Also, replace all keys containing "news" with "item" in this file.
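Putting the two bullet points together, the edits look roughly like this. This is a sketch based on the configuration dump above: the renamed keys follow the maintainer's instructions, and the llm_dir value is a placeholder to adapt to your machine.

```yaml
# config/data/mind-llama.yaml (sketch)
name: MIND-small-Llama
base_dir: data/MIND-small
item:                                  # was `news` before the framework update
  filter_cache: true
  depot: data/MIND-small/news-llama    # verify this path, relative to the project root
  order: [title, cat]
  append: [nid]
---
# model config (sketch)
config:
  item_config:                         # was `news_config`
    llm_dir: /path/to/llama-7b         # point at your local LLaMA-7B weights
```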

I will correct the configurations as soon as possible.

I have corrected the configuration faults caused by the framework update. Please run git pull to ensure Legommenders is up to date.

By the way, since this issue is related to model training with another framework, I recommend opening issues in the Legommenders repository instead. Thanks!