Can you provide your original dataset and the processing file for extracting triples from OPENIE?
Could you share your original dataset and the script you used to extract triples with OpenIE? I would like to learn more. Thank you very much.
Hello. We have provided the original files of the XSum dataset, as well as the triples extracted using OpenIE. Please refer to the "Preprocess Datasets" section of the README, which includes the relevant external links.
Thank you very much for your reply. When I generate summaries, though, the xsum-bin folder from the link you provided turns out to be missing the test.source file. Could you provide it? I would also like to learn how to extract triples with OpenIE from Python. Would it be convenient to share the scripts you used? Thank you very much!
We have now added the original test.source file to the Google Drive folder.
Please refer to https://github.com/philipperemy/stanford-openie-python for how to use OpenIE.
IMPORTANT: we do not provide a trained model. bart_rc.pt contains only the weights of the pre-trained model, which you need to fine-tune yourself.
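For a rough idea of what triple extraction with that wrapper looks like, here is a minimal sketch. It is not the preprocessing script used for this repository. It assumes the `stanford-openie` package, whose README shows `StanfordOpenIE(properties=...)` used as a context manager and `annotate(text)` returning dicts with `subject`, `relation` and `object` keys. The helper names `extract_triples` and `run_with_stanford_openie` are made up for illustration.

```python
from typing import Iterable, List, Tuple


def extract_triples(client, sentences: Iterable[str]) -> List[Tuple[str, str, str]]:
    """Collect (subject, relation, object) tuples for each sentence.

    `client` is any object with an `annotate(text)` method returning a list of
    dicts keyed by 'subject', 'relation' and 'object' (the shape shown in the
    stanford-openie-python README).
    """
    triples = []
    for sentence in sentences:
        for t in client.annotate(sentence):
            triples.append((t["subject"], t["relation"], t["object"]))
    return triples


def run_with_stanford_openie(sentences: Iterable[str]) -> List[Tuple[str, str, str]]:
    """Hypothetical helper: run the extraction against the real wrapper.

    Requires `pip install stanford-openie` and a Java runtime; the wrapper
    downloads Stanford CoreNLP the first time it is used.
    """
    from openie import StanfordOpenIE  # imported lazily so the module loads without it

    # Property taken from the wrapper's README example; tune as needed.
    properties = {"openie.affinity_probability_cap": 2 / 3}
    with StanfordOpenIE(properties=properties) as client:
        return extract_triples(client, sentences)
```

Calling `run_with_stanford_openie(["Barack Obama was born in Hawaii."])` should return a list of tuples, which you can then write out in whatever format your preprocessing expects.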
Our model is built on fairseq. If you want to learn more, you can refer to the documentation: https://fairseq.readthedocs.io/en/latest/?badge=latest.
Building your own model with fairseq can be challenging, and you might need to read some parts of the fairseq source code. For educational purposes, it is generally advisable to use HuggingFace's Transformers library to construct your model, as it tends to be more user-friendly.
Feel free to ask me any questions.
@haruhi-sudo Thank you very much! When training reached the validation step, the following error occurred. I tried several modifications but still could not solve it. Do you have any suggestions? Thank you.
-- Process 1 terminated with the following error:
Traceback (most recent call last):
File "/home/deep/.conda/envs/torch3.7/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 69, in _wrap
fn(i, *args)
File "/home/deep/.conda/envs/torch3.7/lib/python3.7/site-packages/fairseq/distributed/utils.py", line 328, in distributed_main
main(cfg, **kwargs)
File "/home/deep/PRGEN/train.py", line 181, in main
valid_losses, should_stop = train(cfg, trainer, task, epoch_itr)
File "/home/deep/.conda/envs/torch3.7/lib/python3.7/contextlib.py", line 74, in inner
return func(*args, **kwds)
File "/home/deep/PRGEN/train.py", line 314, in train
cfg, trainer, task, epoch_itr, valid_subsets, end_of_epoch,
File "/home/deep/PRGEN/train.py", line 404, in validate_and_save
valid_losses = validate(cfg, trainer, task, epoch_itr, valid_subsets)
File "/home/deep/PRGEN/train.py", line 476, in validate
trainer.valid_step(sample)
File "/home/deep/.conda/envs/torch3.7/lib/python3.7/contextlib.py", line 74, in inner
return func(*args, **kwds)
File "/home/deep/PRGEN/trainer.py", line 1037, in valid_step
sample, self.model, self.criterion, **extra_kwargs
File "/home/deep/PRGEN/src/task/faithful_summary_task.py", line 282, in valid_step
metrics = self._inference_with_bleu(self.sequence_generator, sample, model)
File "/home/deep/PRGEN/src/task/faithful_summary_task.py", line 332, in _inference_with_bleu
gen_out = self.inference_step(generator, [model], sample, prefix_tokens=None)
File "/home/deep/.conda/envs/torch3.7/lib/python3.7/site-packages/fairseq/tasks/fairseq_task.py", line 541, in inference_step
models, sample, prefix_tokens=prefix_tokens, constraints=constraints
File "/home/deep/.conda/envs/torch3.7/lib/python3.7/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
File "/home/deep/.conda/envs/torch3.7/lib/python3.7/site-packages/fairseq/sequence_generator.py", line 204, in generate
return self._generate(sample, **kwargs)
File "/home/deep/.conda/envs/torch3.7/lib/python3.7/site-packages/fairseq/sequence_generator.py", line 470, in _generate
assert step < max_len, f"{step} < {max_len}"
AssertionError: 60 < 60
This is strange, because I do not see this problem myself. My guess is that our fairseq versions differ.
You can try changing this line to default='{"beam":6, "lenpen":1.0, "max_len_b":60, "min_len":10, "no_repeat_ngram_size":3, "match_source_len":1}' and see whether that solves the problem.
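To sanity-check the generation arguments before retrying, the JSON string can be parsed and the nominal length cap computed. This is only a sketch: it uses the documented fairseq rule that the maximum output length is `max_len_a * x + max_len_b` for source length `x`, and it assumes the defaults `max_len_a=0` and `max_len_b=200`. It deliberately ignores `match_source_len`, which may change how the cap is derived depending on the fairseq version. The function name `nominal_max_len` is hypothetical.

```python
import json

# The argument string suggested above, copied verbatim.
GEN_ARGS = (
    '{"beam":6, "lenpen":1.0, "max_len_b":60, "min_len":10, '
    '"no_repeat_ngram_size":3, "match_source_len":1}'
)


def nominal_max_len(gen_args_json: str, src_len: int) -> int:
    """Return max_len_a * src_len + max_len_b for the given argument string.

    Assumption: missing keys fall back to fairseq's usual defaults
    (max_len_a=0, max_len_b=200). The effect of match_source_len is ignored.
    """
    args = json.loads(gen_args_json)
    max_len_a = args.get("max_len_a", 0)
    max_len_b = args.get("max_len_b", 200)
    return int(max_len_a * src_len + max_len_b)


if __name__ == "__main__":
    args = json.loads(GEN_ARGS)
    print("parsed generation args:", args)
    print("nominal cap for a 400-token source:", nominal_max_len(GEN_ARGS, 400))
```

With these arguments the nominal cap is 60 regardless of source length, which matches the `60 < 60` in the assertion message, so the generator is reaching the cap with unfinished hypotheses.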