Create preprocessed training files: metadata.json is missing ids in the train.txt, test.txt and val.txt
shauryr opened this issue · 6 comments
When I run the following -
python specter/data_utils/create_training_files.py \
--data-dir data/training \
--metadata data/training/metadata.json \
--outdir data/preprocessed/
I get "done getting triplets, success rate:0.00%"
and my data-metrics.json looks like this -
{
"train": 0,
"val": 0,
"test": 0
}
I debugged the code and found that at line
there is a KeyError when self.metadata is accessed.
Looks like the ids in train.txt, val.txt and test.txt are not in the metadata.json file
Please help and share the correct metadata.json file
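For anyone who wants to confirm the mismatch before rerunning the script, here is a minimal sketch of a check. It assumes that train.txt, val.txt and test.txt each hold one paper id per line and that metadata.json is a dict keyed by paper id; neither layout is confirmed in this thread, and load_ids, missing_ids and report are hypothetical helper names, not part of specter.

```python
import json
from pathlib import Path


def load_ids(path):
    """Read one paper id per line, skipping blank lines (assumed file layout)."""
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]


def missing_ids(ids, metadata):
    """Return the ids that have no entry in the metadata mapping, in order."""
    return [pid for pid in ids if pid not in metadata]


def report(data_dir, metadata_path):
    """Print, per split, how many ids are absent from metadata.json."""
    with open(metadata_path, encoding="utf-8") as f:
        metadata = json.load(f)  # assumed: a dict keyed by paper id
    result = {}
    for split in ("train", "val", "test"):
        ids = load_ids(Path(data_dir) / f"{split}.txt")
        missing = missing_ids(ids, metadata)
        result[split] = missing
        print(f"{split}: {len(missing)} of {len(ids)} ids missing from metadata")
    return result
```

Calling report("data/training", "data/training/metadata.json") with the paths from the command above should show whether every id is missing or only some of them.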
I got the same problem.
It seems that metadata.json requires 'paper_id' in addition to 'title' and 'abstract'.
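A quick way to test that guess is to list the entries that lack one of those three fields. This is only a sketch: it assumes metadata.json loads into a dict whose values are per-paper dicts, and incomplete_entries is a hypothetical helper, not something from the specter code.

```python
REQUIRED_FIELDS = ("paper_id", "title", "abstract")


def incomplete_entries(metadata, required=REQUIRED_FIELDS):
    """Map each metadata key to the list of required fields it lacks."""
    problems = {}
    for key, entry in metadata.items():
        absent = [field for field in required if field not in entry]
        if absent:
            problems[key] = absent
    return problems
```

An empty result would mean the fields are all present and the problem is more likely the ids themselves.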
The sample metadata file was updated and this should be fixed now. Let us know if you still have issues.
I still have the same problem. Apparently, most paper_ids do not match. For example:
2020-10-27 11:38:16,851,851 ERROR [create_training_files.py:358] '1a090df137014acab572aa5dc23449b270db64b4'
2020-10-27 11:38:16,852,852 INFO [create_training_files.py:362] done getting triplets, success rate:0.00%,total: 15
@armancohan any updates here?