BeyonderXX/InstructUIE

Problem with dataset download during training: should data_dir be the path to the downloaded ie_instruction?

vv521 opened this issue · 2 comments

vv521 commented

Traceback (most recent call last):
  File "src/run_uie.py", line 560, in <module>
    main()
  File "src/run_uie.py", line 296, in main
    raw_datasets = load_dataset(
  File "/root/miniconda3/envs/instruct-uie/lib/python3.8/site-packages/datasets/load.py", line 1694, in load_dataset
    builder_instance.download_and_prepare(
  File "/root/miniconda3/envs/instruct-uie/lib/python3.8/site-packages/datasets/builder.py", line 595, in download_and_prepare
    self._download_and_prepare(
  File "/root/miniconda3/envs/instruct-uie/lib/python3.8/site-packages/datasets/builder.py", line 683, in _download_and_prepare
    self._prepare_split(split_generator, **prepare_split_kwargs)
  File "/root/miniconda3/envs/instruct-uie/lib/python3.8/site-packages/datasets/builder.py", line 1075, in _prepare_split
    for key, record in utils.tqdm(
  File "/root/miniconda3/envs/instruct-uie/lib/python3.8/site-packages/tqdm/std.py", line 1182, in __iter__
    for obj in iterable:
  File "/root/.cache/huggingface/modules/datasets_modules/datasets/uie_dataset/f3e8d02f5ffb4e66435bbe181a28a4403a3ee701bbd8a110780c5e90beb86581/uie_dataset.py", line 661, in _generate_examples
    assert os.path.exists(ds_path)
AssertionError
[2023-09-06 10:52:42,767] [INFO] [launch.py:318:sigkill_handler] Killing subprocess 1069
[2023-09-06 10:52:42,767] [ERROR] [launch.py:324:sigkill_handler] ['/root/miniconda3/envs/instruct-uie/bin/python', '-u', 'src/run_uie.py', '--local_rank=0', '--do_train', '--do_predict', '--predict_with_generate', '--model_name_or_path', '/root/InstructUIE-master/model_cache/t5-base', '--data_dir', '/root/InstructUIE-master/data/IE_INSTRUCTIONS/RE/ADE_corpus/train.json', '--task_config_dir', '/root/InstructUIE-master/configs/multi_task_configs', '--instruction_file', '/root/InstructUIE-master/configs/instruction_config.json', '--instruction_strategy', 'single', '--output_dir', 'output/t5-re-single', '--input_record_file', 'flan-t5.record', '--per_device_train_batch_size', '8', '--per_device_eval_batch_size', '16', '--gradient_accumulation_steps', '8', '--learning_rate', '5e-03', '--num_train_epochs', '5', '--deepspeed', 'configs/ds_configs/stage0.config', '--run_name', 't5-base-mult-mi-experiment', '--max_source_length', '512', '--max_target_length', '50', '--generation_max_length', '50', '--max_num_instances_per_task', '10000', '--max_num_instances_per_eval_task', '200', '--add_task_name', 'False', '--add_dataset_name', 'False', '--num_examples', '0', '--overwrite_output_dir', '--overwrite_cache', '--lr_scheduler_type', 'constant', '--warmup_steps', '0', '--logging_strategy', 'steps', '--logging_steps', '500', '--evaluation_strategy', 'no', '--save_strategy', 'steps', '--save_steps', '2000'] exits with return code = 1

All of our datasets are downloaded in advance. This step only preprocesses them into the cache; nothing is downloaded here.
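The AssertionError above is raised by `assert os.path.exists(ds_path)`, and the logged `--data_dir` points at a single `train.json` file rather than a directory. Below is a minimal pre-flight sketch for checking paths before launching training. It assumes the loader expects files at `<data_dir>/<task>/<dataset>/<split>.json`; that layout is inferred only from the path in the log and has not been checked against `uie_dataset.py`. The function `find_missing_dataset_files` and its `datasets_by_task` argument are hypothetical names used for illustration, not part of InstructUIE.

```python
import os
from typing import Dict, List


def find_missing_dataset_files(
    data_dir: str,
    datasets_by_task: Dict[str, List[str]],
    split: str = "train",
) -> List[str]:
    """List expected <data_dir>/<task>/<dataset>/<split>.json paths that do not exist.

    The directory layout is an assumption inferred from the log in this thread.
    """
    # data_dir must be the dataset root directory, not an individual JSON file.
    if not os.path.isdir(data_dir):
        raise NotADirectoryError(f"data_dir should be a directory, got: {data_dir}")

    missing = []
    for task, dataset_names in datasets_by_task.items():
        for name in dataset_names:
            path = os.path.join(data_dir, task, name, f"{split}.json")
            if not os.path.exists(path):
                missing.append(path)
    return missing
```

Calling it with the `IE_INSTRUCTIONS` directory and the tasks and datasets you intend to train on, for example `find_missing_dataset_files("/root/InstructUIE-master/data/IE_INSTRUCTIONS", {"RE": ["ADE_corpus"]})`, returns the files that are absent under the assumed layout; an empty list means every expected file was found.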

vv521 commented

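Thank you for the clarification. I have one more question: during training, if I keep only the RE and NER datasets in IE_INSTRUCTION, the run fails, but if I keep all of the datasets, it runs fine. Why is that? I look forward to your reply.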