Create preprocessed training files: metadata.json is missing ids in the train.txt, test.txt and val.txt

Question

shauryr opened this issue 4 years ago · 6 comments

When I run the following command:

python specter/data_utils/create_training_files.py \
--data-dir data/training \
--metadata data/training/metadata.json \
--outdir data/preprocessed/

the script finishes with: done getting triplets, success rate:0.00%

and my data-metrics.json looks like this:

{
  "train": 0,
  "val": 0,
  "test": 0
}

I debugged the code and found that a KeyError is raised where self.metadata is accessed.
It looks like the ids in train.txt, val.txt and test.txt are not present in the metadata.json file.
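To confirm which ids are affected before touching the preprocessing script, a quick standalone check can count, per split, the ids that have no entry in metadata.json. This is a minimal sketch that assumes metadata.json is a JSON object keyed by paper id and that each split file lists one paper id per line; the helper names (load_split_ids, find_missing_ids, report) are hypothetical and not part of the specter codebase.

```python
import json
from pathlib import Path


def load_split_ids(path):
    """Read one paper id per line from a split file, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]


def find_missing_ids(split_ids, metadata):
    """Return the ids from a split that have no key in the metadata dict."""
    return [pid for pid in split_ids if pid not in metadata]


def report(data_dir, metadata_path):
    """Print and return, per split, how many ids are absent from metadata.json.

    Assumes data_dir contains train.txt, val.txt and test.txt.
    """
    with open(metadata_path, encoding="utf-8") as f:
        metadata = json.load(f)
    counts = {}
    for split in ("train", "val", "test"):
        ids = load_split_ids(Path(data_dir) / f"{split}.txt")
        missing = find_missing_ids(ids, metadata)
        counts[split] = (len(missing), len(ids))
        print(f"{split}: {len(missing)}/{len(ids)} ids missing from metadata")
    return counts
```

For example, report("data/training", "data/training/metadata.json") prints one line per split. If every id in a split is reported missing, that would be consistent with the 0.00% success rate and the all-zero data-metrics.json shown above.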

Could you please help and share the correct metadata.json file?

Answer 1 · 2020-09-08T01:54:44.000Z

I have the same problem.
It seems that metadata.json requires a 'paper_id' field in addition to 'title' and 'abstract'.
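If the problem is missing fields rather than missing ids, a small check can list the entries that lack any of the three fields mentioned here. This sketch assumes metadata.json maps each paper id to a JSON object holding those fields; REQUIRED_FIELDS simply mirrors this comment, and the function names are hypothetical rather than part of specter.

```python
import json

# Fields this comment says each metadata entry needs (assumption from the thread).
REQUIRED_FIELDS = ("paper_id", "title", "abstract")


def entries_missing_fields(metadata, required=REQUIRED_FIELDS):
    """Map each paper id to the required fields its entry lacks; complete entries are omitted."""
    problems = {}
    for doc_id, entry in metadata.items():
        missing = [field for field in required if field not in entry]
        if missing:
            problems[doc_id] = missing
    return problems


def check_metadata_file(path):
    """Load a metadata.json file and run entries_missing_fields on it."""
    with open(path, encoding="utf-8") as f:
        return entries_missing_fields(json.load(f))
```

An empty result from check_metadata_file("data/training/metadata.json") would mean every entry has all three fields, which points back at mismatched ids rather than malformed entries.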

Answer 2 · 2020-09-25T00:12:59.000Z

The sample metadata file was updated and this should be fixed now. Let us know if you still have issues.

Answer 3 · 2020-10-27T10:40:25.000Z

I still have the same problem. Apparently, most paper_ids do not match the ones in metadata.json. For example:

2020-10-27 11:38:16,851,851 ERROR [create_training_files.py:358] '1a090df137014acab572aa5dc23449b270db64b4'
2020-10-27 11:38:16,852,852 INFO [create_training_files.py:362] done getting triplets, success rate:0.00%,total: 15

Answer 4 · 2020-12-04T17:19:47.000Z

@armancohan any updates here?

Answer 5 · 2021-04-25T22:35:32.000Z

The data.json file contains many ids that don't exist in metadata.json.
I put together a new data.json that works (attached as data.txt):
data.txt
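The attached file is not reproduced here, but the same idea can be sketched as a filter that drops every id absent from metadata.json. This assumes data.json maps each query paper id to an object of candidate paper ids, each with a {"count": n} value; that layout, and the names filter_data and write_filtered, are assumptions for illustration and not necessarily how the attached file was produced. Queries left with no candidates are dropped as well.

```python
import json


def filter_data(data, metadata):
    """Keep only query and candidate ids that have an entry in metadata.

    Assumed layout: {query_id: {candidate_id: {"count": n}, ...}, ...}
    """
    filtered = {}
    for query_id, candidates in data.items():
        if query_id not in metadata:
            continue  # no title/abstract for the query paper, so skip it
        kept = {cid: info for cid, info in candidates.items() if cid in metadata}
        if kept:  # drop queries whose candidates were all removed
            filtered[query_id] = kept
    return filtered


def write_filtered(data_path, metadata_path, out_path):
    """Write a filtered copy of data.json; return (queries before, queries after)."""
    with open(data_path, encoding="utf-8") as f:
        data = json.load(f)
    with open(metadata_path, encoding="utf-8") as f:
        metadata = json.load(f)
    filtered = filter_data(data, metadata)
    with open(out_path, "w", encoding="utf-8") as f:
        json.dump(filtered, f)
    return len(data), len(filtered)
```

Writing to a new file rather than overwriting data.json keeps the original for comparison. The ids listed in train.txt, val.txt and test.txt may need the same filtering so that they only reference queries that survive.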

Answer 6 · 2021-08-19T00:42:50.000Z

@yrrah, thanks for the solution. It works for me!