Comply with final HIPE data format
mromanello opened this issue · 2 comments
mromanello commented
Changes to implement:
- naming of files (e.g. `HIPE-2022-v1.0-ajmc-train-de.tsv`)
- move the dataset's version number to the document metadata, and remove it from the file name
- add namespaces to document metadata (TBC)
- change `EndOfLine` to `EndOfSentence` (because that's what it is)
- add `language` metadata
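
A rough sketch of the `EndOfLine` to `EndOfSentence` rename, in case it helps. It assumes things this issue does not confirm: that the flag lives in the last tab-separated column of each token row, that several flags in that column are joined with `|`, and that metadata lines start with `#`. The function name `rename_flag` is made up for illustration.

```python
def rename_flag(lines, old="EndOfLine", new="EndOfSentence"):
    """Rename a flag in the last tab-separated column of each token row.

    `lines` is a list of strings without trailing newlines.
    """
    out = []
    for line in lines:
        # Assumption: metadata lines start with '#'; they and blank
        # lines are passed through unchanged.
        if not line.strip() or line.startswith("#"):
            out.append(line)
            continue
        cols = line.split("\t")
        # Assumption: multiple flags in the last column are joined with '|'.
        flags = [new if flag == old else flag for flag in cols[-1].split("|")]
        cols[-1] = "|".join(flags)
        out.append("\t".join(cols))
    return out
```

Only exact flag matches are replaced, so a header row or a token that happens to contain the string `EndOfLine` in another column is left alone.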
mromanello commented
see memo here https://github.com/hipe-eval/HIPE-2022-internal/issues/5
mromanello commented
more metadata fields to add:
- `hipe2022:applicable_columns`
- `ajmc:license`
W.r.t. license: go for CC-BY or CC-BY-NC (tbd).
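
A minimal sketch of how the version could be pulled out of the file name and how the namespaced fields might be written out. The `# key = value` serialization, the helper names `split_version` and `metadata_lines`, and the placeholder values are all assumptions for illustration, not something decided in this thread.

```python
import re


def split_version(filename):
    """Return (filename_without_version, version).

    Looks for a '-vX.Y' component followed by another '-' component,
    as in 'HIPE-2022-v1.0-ajmc-train-de.tsv'. Returns (filename, None)
    if no such component is found.
    """
    match = re.search(r"-(v\d+(?:\.\d+)*)(?=-)", filename)
    if match is None:
        return filename, None
    stripped = filename[:match.start()] + filename[match.end():]
    return stripped, match.group(1)


def metadata_lines(fields):
    """Serialize a dict of metadata fields as comment lines.

    Assumption: one '# key = value' line per field, in insertion order.
    """
    return [f"# {key} = {value}" for key, value in fields.items()]


name, version = split_version("HIPE-2022-v1.0-ajmc-train-de.tsv")
header = metadata_lines({
    "hipe2022:applicable_columns": "TBD",  # placeholder value
    "ajmc:license": "TBD",                 # CC-BY or CC-BY-NC, still open
})
```

The extracted `version` would then go into the document metadata under whatever key is agreed on.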