clarin-eric/ParlaMint

distro(TEI.ana) to teitok format conversion

Closed this issue · 0 comments

  • add a script converting TEI.ana to teitok format to the ParlaMint repository
    • inputs for conversion:
      • Build/Distro/ParlaMint-XX.TEI.ana (root and component files)
      • Build/Distro/ParlaMint-XX.conllu (-meta.tsv and -meta-en.tsv)
    • modifications needed:
      • implement inlining of <media>
      • change full filenames to <filename>.tt.xml
      • improve lookup of the following and previous file (do not parse the root file multiple times)
  • run with make distro2teitok or make distro2teitok-XX
  • enqueue in slurm with make slurm_distro2teitok
  • store the result in Build/Teitok
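
A minimal Python sketch of the "previous and following file" item: parse the root file once, then look neighbours up in a dictionary. This is not the conversion script itself. It assumes that component files are listed as xi:include elements that are direct children of the root element, and that renaming to <filename>.tt.xml means replacing a trailing .ana.xml (or .xml) with .tt.xml. The function names component_hrefs, neighbour_index and teitok_name are hypothetical.

```python
import xml.etree.ElementTree as ET
from pathlib import PurePosixPath

# Clark-notation tag name for <xi:include>
XI_INCLUDE = "{http://www.w3.org/2001/XInclude}include"


def component_hrefs(root_xml):
    """Return component hrefs in document order from one parse of the root.

    Assumption: components are xi:include elements directly under the
    root element; includes nested deeper (e.g. in the header) are skipped.
    """
    root = ET.fromstring(root_xml)
    return [el.get("href") for el in root.findall(XI_INCLUDE) if el.get("href")]


def neighbour_index(hrefs):
    """Map each href to (previous, following); None at either end."""
    index = {}
    for i, href in enumerate(hrefs):
        prev_href = hrefs[i - 1] if i > 0 else None
        next_href = hrefs[i + 1] if i + 1 < len(hrefs) else None
        index[href] = (prev_href, next_href)
    return index


def teitok_name(href):
    """Assumed renaming rule: strip .ana.xml or .xml, then append .tt.xml."""
    path = PurePosixPath(href)
    name = path.name
    for suffix in (".ana.xml", ".xml"):
        if name.endswith(suffix):
            name = name[: -len(suffix)]
            break
    return str(path.with_name(name + ".tt.xml"))
```

The index is built once per corpus and passed to the per-file conversion, so each component gets its neighbours by dictionary lookup instead of re-reading the root file.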