IndicoDataSolutions/RealKIE

About QA dataset (csv) format

Hello,

Thanks for such an interesting dataset.
I do not need the full set of scanned images, only the textual information.
Is there a way to get the OCR results and key-value pairs as CSV? I assume the output of dataset_scripts/get_dataset_qa.sh is what I am looking for.
The aws s3 sync download is much larger than I expected, so I am hoping there is a workaround.

Best regards,
Jinu

Apologies for the delayed response. You should be able to download the CSV files directly using the following link pattern:
https://s3.us-east-2.wasabisys.com/project-fruitfly/{dataset}/{split}.csv

Splits: train, test, val
Datasets: fcc_invoices, s1_pages, nda, charities, resource_contracts
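
In case it helps, here is a minimal Python sketch that expands the link pattern into all 15 dataset/split combinations and saves each CSV locally. The function names (`build_url`, `all_urls`, `download_all`) and the `realkie_csv` output folder are made up for illustration and are not part of the repo's scripts. It also assumes the bucket still serves these files publicly, and it makes no assumption about the columns inside the CSVs.

```python
from itertools import product
from pathlib import Path
from urllib.request import urlretrieve

# Link pattern and names taken from the reply above.
BASE_URL = "https://s3.us-east-2.wasabisys.com/project-fruitfly/{dataset}/{split}.csv"
DATASETS = ["fcc_invoices", "s1_pages", "nda", "charities", "resource_contracts"]
SPLITS = ["train", "test", "val"]


def build_url(dataset, split):
    """Fill the link pattern for one dataset/split pair."""
    return BASE_URL.format(dataset=dataset, split=split)


def all_urls():
    """Return (dataset, split, url) for every combination."""
    return [(d, s, build_url(d, s)) for d, s in product(DATASETS, SPLITS)]


def download_all(dest_dir="realkie_csv"):
    """Download every CSV into dest_dir/<dataset>/<split>.csv.

    dest_dir is a hypothetical local folder name; change it as needed.
    """
    dest = Path(dest_dir)
    for dataset, split, url in all_urls():
        target = dest / dataset / f"{split}.csv"
        target.parent.mkdir(parents=True, exist_ok=True)
        urlretrieve(url, str(target))  # raises an error if the URL is unavailable
        print(f"saved {target}")
```

Calling `download_all()` would fetch only the CSVs, without the scanned images. To grab a single file, pass the result of `build_url("nda", "train")` to any HTTP client.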