OpusFilter fails to compress data when it is downloaded with moses preprocessing
thfrkielikone opened this issue · 3 comments
thfrkielikone commented
Running this:
steps:
  - type: opus_read
    parameters:
      corpus_name: OpenSubtitles
      source_language: fi
      target_language: en
      release: v2018
      preprocessing: moses
      src_output: opensubtitles.fi.gz
      tgt_output: opensubtitles.en.gz
      suppress_prompts: true
This results in files opensubtitles.fi.gz and opensubtitles.en.gz that are in fact uncompressed plain text, despite the .gz extension.
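One way to confirm the symptom is to check whether each output file starts with the gzip magic bytes (0x1f 0x8b). The following is a minimal sketch, not part of OpusFilter; the helper name `is_gzip` is hypothetical, and the file names are the ones from the configuration above.

```python
import os

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream (RFC 1952)


def is_gzip(path):
    """Return True if the file at `path` starts with the gzip magic bytes.

    Hypothetical helper for diagnosing this issue; it only inspects the
    header and does not validate the rest of the stream.
    """
    with open(path, "rb") as handle:
        return handle.read(2) == GZIP_MAGIC


if __name__ == "__main__":
    # Output names taken from the opus_read step in this report.
    for name in ("opensubtitles.fi.gz", "opensubtitles.en.gz"):
        if os.path.exists(name):
            status = "gzip" if is_gzip(name) else "NOT gzip (probably plain text)"
            print(f"{name}: {status}")
        else:
            print(f"{name}: file not found")
```

If the check reports that a file is not gzip, opening it with a plain text reader instead of a gzip reader should show the corpus lines directly.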
svirpioj commented
It seems there are also other issues with the integration with the latest OpusTools when using moses preprocessing; for example, setting output_directory makes the process fail completely. I'll look into this, but I think the problems are on the OpusTools side (ping @miau1).
svirpioj commented
I suggest using the raw or xml options for preprocessing until we get this fixed.
svirpioj commented
Fixed in 3.2.0. It is now recommended to download corpora using the moses preprocessing.