Sapphire-star/PFCR

Question about the dataset

Closed this issue · 12 comments

Do I only need to download the data from Baidu Netdisk, put it inside the code directory, and change the datapath path? Is that all?

Yes. You also need to set the corresponding path and dataset name in the config file, pretrain.yaml or finetune.yaml.
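
As a rough illustration of how the path and the dataset name combine: the index path printed in the log further down this thread (dataset/or-pantry/OP/OP.OPQ32,IVF1,PQ32x8.strict.index) suggests a layout of data_path/dataset/dataset.suffix. The helper below is hypothetical and not part of PFCR; it only rebuilds that one path from the two settings, assuming this layout holds.

```python
from pathlib import PurePosixPath


def expected_file(data_path: str, dataset: str, suffix: str) -> str:
    """Build <data_path>/<dataset>/<dataset>.<suffix> (assumed layout)."""
    return str(PurePosixPath(data_path) / dataset / f"{dataset}.{suffix}")


# Values taken from this thread: data_path from the config, "OP" and the
# index suffix from the logged index path.
index_path = expected_file("dataset/or-pantry/", "OP", "OPQ32,IVF1,PQ32x8.strict.index")
print(index_path)
```

If the printed path does not exist on disk, the path or dataset name in the YAML file is a likely place to look first.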

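I followed this approach, but at the code-preparation stage (python all_pq.py --gpu_id 0) I keep getting the following message: WARNING clustering 8367 points to 256 centroids: please provide at least 9984 training points. For both the pre-training and fine-tuning stages, the path I set is data_path: 'dataset/or-pantry/'. I can't tell what went wrong. Could you tell me?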

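This warning appears because the onlineretail dataset has relatively few items, so some clusters may not receive enough samples. It is a warning from the faiss library and does not prevent the code from running. You can also use the index files I have already processed, which are included with the dataset files on Baidu Netdisk. Index files with various numbers of digits are provided, so you can start pre-training or the other training stages directly.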
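
The numbers in the warning are consistent with faiss's k-means default of 39 minimum training points per centroid (min_points_per_centroid in its clustering parameters): 256 × 39 = 9984. A small sketch of that arithmetic, with a hypothetical helper name that is not part of faiss or PFCR:

```python
# faiss k-means default for min_points_per_centroid.
FAISS_MIN_POINTS_PER_CENTROID = 39


def min_training_points(n_centroids: int,
                        min_points_per_centroid: int = FAISS_MIN_POINTS_PER_CENTROID) -> int:
    """Number of training points below which faiss prints the warning."""
    return n_centroids * min_points_per_centroid


needed = min_training_points(256)
enough = 8367 >= needed  # 8367 is the point count reported in the warning
print(needed, enough)
```

Since the message is a warning rather than an error, clustering still proceeds with the points that are available.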

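I went on to run the joint pre-training code, "python fed_pretrain.py", and the output is below. The earlier steps seem to run normally, but then it fails with a ValueError. I asked GPT, but I can't tell whether this is a RecBole version problem (I installed recbole 1.2.0 because the lower version caused conflicts, so my version is higher than yours) or whether negative sampling was not specified (which is what GPT told me). Could you take another look at this for me? (qwq)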

29 May 19:49 INFO P
The number of users: 13102
Average actions of users: 8.691015952980688
The number of items: 4899
Average actions of items: 23.24642711310739
The number of inters: 113861
The sparsity of the dataset: 99.82260966283076%
Remain Fields: ['user_id', 'item_id_list', 'item_id', 'item_length']
29 May 19:49 INFO Index path: dataset/or-pantry/OP/OP.OPQ32,IVF1,PQ32x8.strict.index
29 May 19:49 INFO Loading filtered index mapping.
29 May 19:49 INFO Converting indexes.
Traceback (most recent call last):
  File "/data/liupei/PFCR-main/fed_pretrain.py", line 145, in <module>
    model, dataset = pretrain(args.d)
                     ^^^^^^^^^^^^^^^^
  File "/data/liupei/PFCR-main/fed_pretrain.py", line 115, in pretrain
    pretrain_data_A = TrainDataLoader(config_A, pretrain_dataset_A, None, shuffle=True)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/Anaconda/envs/py38/lib/python3.12/site-packages/recbole/data/dataloader/general_dataloader.py", line 41, in __init__
    self._set_neg_sample_args(
  File "/opt/Anaconda/envs/py38/lib/python3.12/site-packages/recbole/data/dataloader/abstract_dataloader.py", line 168, in _set_neg_sample_args
    raise ValueError(
ValueError: neg sampling by with dl_format [None] not been implemented.

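Your error is indeed a RecBole version problem. The way negative sampling is configured in recbole 1.2.0 differs considerably from 1.0.1, so I recommend using version 1.0.1.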
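
A quick way to confirm which RecBole version an environment actually has is to query the installed package metadata. This is a minimal sketch using only the standard library; the helper names are hypothetical, and "1.0.1" is simply the version recommended in this thread.

```python
from importlib import metadata

EXPECTED = "1.0.1"  # version recommended in this thread


def installed_version(package: str):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def matches_expected(installed, expected: str = EXPECTED) -> bool:
    """True only when a version is installed and equals the expected one."""
    return installed == expected


found = installed_version("recbole")
print(found, matches_expected(found))
```

Running this before fed_pretrain.py shows whether the environment still has 1.2.0 installed.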

I'd also like to ask: installing recbole==1.2.0 is easy, but installing recbole==1.0.1 seems to cause a long string of problems, shown below. How did you install it?

Installing build dependencies ... done
Getting requirements to build wheel ... done
ERROR: Exception:
Traceback (most recent call last):
File "/opt/Anaconda/envs/py38/lib/python3.12/site-packages/pip/_internal/cli/base_command.py", line 180, in exc_logging_wrapper
status = run_func(*args)
^^^^^^^^^^^^^^^
File "/opt/Anaconda/envs/py38/lib/python3.12/site-packages/pip/_internal/cli/req_command.py", line 245, in wrapper
return func(self, options, args)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/Anaconda/envs/py38/lib/python3.12/site-packages/pip/_internal/commands/install.py", line 377, in run
requirement_set = resolver.resolve(
^^^^^^^^^^^^^^^^^
File "/opt/Anaconda/envs/py38/lib/python3.12/site-packages/pip/_internal/resolution/resolvelib/resolver.py", line 95, in resolve
result = self._result = resolver.resolve(
^^^^^^^^^^^^^^^^^
File "/opt/Anaconda/envs/py38/lib/python3.12/site-packages/pip/_vendor/resolvelib/resolvers.py", line 546, in resolve
state = resolution.resolve(requirements, max_rounds=max_rounds)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/Anaconda/envs/py38/lib/python3.12/site-packages/pip/_vendor/resolvelib/resolvers.py", line 427, in resolve
failure_causes = self._attempt_to_pin_criterion(name)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/Anaconda/envs/py38/lib/python3.12/site-packages/pip/_vendor/resolvelib/resolvers.py", line 239, in _attempt_to_pin_criterion
criteria = self._get_updated_criteria(candidate)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/Anaconda/envs/py38/lib/python3.12/site-packages/pip/_vendor/resolvelib/resolvers.py", line 230, in _get_updated_criteria
self._add_to_criteria(criteria, requirement, parent=candidate)
File "/opt/Anaconda/envs/py38/lib/python3.12/site-packages/pip/_vendor/resolvelib/resolvers.py", line 173, in _add_to_criteria
if not criterion.candidates:
^^^^^^^^^^^^^^^^^^^^
File "/opt/Anaconda/envs/py38/lib/python3.12/site-packages/pip/_vendor/resolvelib/structs.py", line 156, in bool
return bool(self._sequence)
^^^^^^^^^^^^^^^^^^^^
File "/opt/Anaconda/envs/py38/lib/python3.12/site-packages/pip/_internal/resolution/resolvelib/found_candidates.py", line 155, in bool
return any(self)
^^^^^^^^^
File "/opt/Anaconda/envs/py38/lib/python3.12/site-packages/pip/_internal/resolution/resolvelib/found_candidates.py", line 143, in
return (c for c in iterator if id(c) not in self._incompatible_ids)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/Anaconda/envs/py38/lib/python3.12/site-packages/pip/_internal/resolution/resolvelib/found_candidates.py", line 47, in _iter_built
candidate = func()
^^^^^^
File "/opt/Anaconda/envs/py38/lib/python3.12/site-packages/pip/_internal/resolution/resolvelib/factory.py", line 182, in _make_candidate_from_link
base: Optional[BaseCandidate] = self._make_base_candidate_from_link(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/Anaconda/envs/py38/lib/python3.12/site-packages/pip/_internal/resolution/resolvelib/factory.py", line 228, in _make_base_candidate_from_link
self._link_candidate_cache[link] = LinkCandidate(
^^^^^^^^^^^^^^
File "/opt/Anaconda/envs/py38/lib/python3.12/site-packages/pip/_internal/resolution/resolvelib/candidates.py", line 290, in init
super().init(
File "/opt/Anaconda/envs/py38/lib/python3.12/site-packages/pip/_internal/resolution/resolvelib/candidates.py", line 156, in init
self.dist = self._prepare()
^^^^^^^^^^^^^^^
File "/opt/Anaconda/envs/py38/lib/python3.12/site-packages/pip/_internal/resolution/resolvelib/candidates.py", line 222, in _prepare
dist = self._prepare_distribution()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/Anaconda/envs/py38/lib/python3.12/site-packages/pip/_internal/resolution/resolvelib/candidates.py", line 301, in _prepare_distribution
return preparer.prepare_linked_requirement(self._ireq, parallel_builds=True)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/Anaconda/envs/py38/lib/python3.12/site-packages/pip/_internal/operations/prepare.py", line 525, in prepare_linked_requirement
return self._prepare_linked_requirement(req, parallel_builds)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/Anaconda/envs/py38/lib/python3.12/site-packages/pip/_internal/operations/prepare.py", line 640, in _prepare_linked_requirement
dist = _get_prepared_distribution(
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/Anaconda/envs/py38/lib/python3.12/site-packages/pip/_internal/operations/prepare.py", line 71, in _get_prepared_distribution
abstract_dist.prepare_distribution_metadata(
File "/opt/Anaconda/envs/py38/lib/python3.12/site-packages/pip/_internal/distributions/sdist.py", line 54, in prepare_distribution_metadata
self._install_build_reqs(finder)
File "/opt/Anaconda/envs/py38/lib/python3.12/site-packages/pip/_internal/distributions/sdist.py", line 124, in _install_build_reqs
build_reqs = self._get_build_requires_wheel()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/Anaconda/envs/py38/lib/python3.12/site-packages/pip/_internal/distributions/sdist.py", line 101, in _get_build_requires_wheel
return backend.get_requires_for_build_wheel()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/Anaconda/envs/py38/lib/python3.12/site-packages/pip/_internal/utils/misc.py", line 745, in get_requires_for_build_wheel
return super().get_requires_for_build_wheel(config_settings=cs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/Anaconda/envs/py38/lib/python3.12/site-packages/pip/_vendor/pyproject_hooks/_impl.py", line 166, in get_requires_for_build_wheel
return self._call_hook('get_requires_for_build_wheel', {
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/Anaconda/envs/py38/lib/python3.12/site-packages/pip/_vendor/pyproject_hooks/_impl.py", line 321, in _call_hook
raise BackendUnavailable(data.get('traceback', ''))
pip._vendor.pyproject_hooks._impl.BackendUnavailable: Traceback (most recent call last):
File "/opt/Anaconda/envs/py38/lib/python3.12/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 77, in _build_backend
obj = import_module(mod_path)
^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/Anaconda/envs/py38/lib/python3.12/importlib/init.py", line 90, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "", line 1387, in _gcd_import
File "", line 1360, in _find_and_load
File "", line 1310, in _find_and_load_unlocked
File "", line 488, in _call_with_frames_removed
File "", line 1387, in _gcd_import
File "", line 1360, in _find_and_load
File "", line 1331, in _find_and_load_unlocked
File "", line 935, in _load_unlocked
File "", line 995, in exec_module
File "", line 488, in _call_with_frames_removed
File "/tmp/pip-build-env-xicqr9rt/overlay/lib/python3.12/site-packages/setuptools/init.py", line 16, in
import setuptools.version
File "/tmp/pip-build-env-xicqr9rt/overlay/lib/python3.12/site-packages/setuptools/version.py", line 1, in
import pkg_resources
File "/tmp/pip-build-env-xicqr9rt/overlay/lib/python3.12/site-packages/pkg_resources/init.py", line 73, in
from pkg_resources.extern import appdirs
ImportError: cannot import name 'appdirs' from 'pkg_resources.extern' (/tmp/pip-build-env-xicqr9rt/overlay/lib/python3.12/site-packages/pkg_resources/extern/init.py)

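Also, I didn't quite understand your earlier reply about the pre-processed index files. You only provided one Baidu Netdisk link; which of the files there is the data you have already processed?
[Screenshot: Snipaste_2024-05-29_20-15-51]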

It is best to create a new environment first and install recbole before installing the other dependencies.
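
One detail visible in the pip traceback above: the environment is named py38, but its paths contain lib/python3.12, so the interpreter in use is Python 3.12. Whether that is what breaks the recbole==1.0.1 install is an assumption and is not confirmed in this thread, but it may be worth checking which interpreter a fresh environment actually runs. The helper below is hypothetical and only reads the version out of a path string.

```python
import re
import sys


def python_version_in_path(path: str):
    """Extract (major, minor) from a '.../pythonX.Y/...' path, or None."""
    match = re.search(r"python(\d+)\.(\d+)", path)
    return (int(match.group(1)), int(match.group(2))) if match else None


# Path prefix copied from the traceback in this thread.
traceback_path = "/opt/Anaconda/envs/py38/lib/python3.12/site-packages/pip"
from_traceback = python_version_in_path(traceback_path)

print(from_traceback)        # version embedded in the traceback path
print(sys.version_info[:2])  # version of the interpreter running this script
```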

These datasets correspond to the ones in the paper. They are all already processed, including the index files obtained after applying VQ.

The code mainly uses the .inter files and the .index files.
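
For orientation, a .inter file in RecBole's atomic-file format is tab-separated text whose header row names each column as field:type. The sketch below parses that header and the first few rows. The sample rows are made up for illustration (the field names are taken from the "Remain Fields" line in the log above), and read_inter is a hypothetical helper, not a RecBole or PFCR function.

```python
import csv
import io


def read_inter(handle, limit: int = 5):
    """Return ([(field, type), ...], first `limit` rows) from a .inter file handle."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    fields = [tuple(column.split(":", 1)) for column in header]
    rows = [row for _, row in zip(range(limit), reader)]
    return fields, rows


# Synthetic content standing in for a real file; values are invented.
sample = io.StringIO(
    "user_id:token\titem_id_list:token_seq\titem_id:token\n"
    "u1\ti3 i7\ti9\n"
    "u2\ti1\ti4\n"
)
fields, rows = read_inter(sample)
print(fields)
print(rows)
```

For a real file, pass a handle opened in text mode in place of the StringIO object.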

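How can I view a dataset file of this type: Pantry.feat1CLS?

I have looked through a lot of material but haven't found a way to view this file. If you know how, I'd appreciate it if you could tell me.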

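This is the semantic encoding file produced by preprocessing. I gave a link to the raw-data processing procedure in the readme; it is processed the same way as in UnisRec.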
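
If the file follows UniSRec-style preprocessing, it is probably a headerless binary dump of float32 values, one fixed-size embedding per item (768 dimensions for a BERT-base encoder). That layout is an assumption here and is not stated in this thread. Under that assumption, a minimal standard-library sketch for inspecting such a file looks like this; load_feat is a hypothetical helper, and the demo writes a tiny synthetic file rather than reading the real Pantry.feat1CLS.

```python
import os
import tempfile
from array import array


def load_feat(path: str, dim: int = 768):
    """Read a raw float32 file and split it into rows of `dim` values (assumed layout)."""
    values = array("f")  # 4-byte floats on common platforms
    n_bytes = os.path.getsize(path)
    if n_bytes % (values.itemsize * dim) != 0:
        raise ValueError(f"{n_bytes} bytes is not a whole number of {dim}-float rows")
    with open(path, "rb") as handle:
        values.fromfile(handle, n_bytes // values.itemsize)
    return [values[i:i + dim].tolist() for i in range(0, len(values), dim)]


# Demo on a synthetic 3 x 4 file so the example runs without the dataset.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.feat1CLS")
    with open(demo_path, "wb") as handle:
        array("f", [float(i) for i in range(12)]).tofile(handle)
    rows = load_feat(demo_path, dim=4)

print(len(rows), rows[1])
```

With NumPy installed, np.fromfile(path, dtype=np.float32).reshape(-1, 768) would do the same under the same assumption. If the size check raises an error on the real file, the assumed dimension or data type is likely wrong.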