aalto-speech/morfessor

--output-newlines squeezes multiple newlines

flammie opened this issue · 2 comments

Command-line session:

flammie@saarkaany ~/Koodit/mt-development/complexity-stats (2145) [01:21:54] 
$ cat > kolme
yksi

kolme
flammie@saarkaany ~/Koodit/mt-development/complexity-stats (2146) [01:22:08] 
$ morfessor -l europarl-v7.fi-en.fi.morfessor --output-format-separator '> <' --output-newlines --output-format '{analysis} ' -T - < kolme
INFO:morfessor.io:Loading model from 'europarl-v7.fi-en.fi.morfessor'...
INFO:morfessor.io:Done.
No training data files specified.
Segmenting test data...
INFO:morfessor.io:Reading corpus from '-'...
yksi 
kolme 
INFO:morfessor.io:Done.

Done.

There should be an empty line between yksi and kolme. Preserving empty lines matters for machine translation pipelines, where tools commonly fail when the lines of parallel files no longer match up.
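The behaviour being requested is that the output stays line-aligned with the input: exactly one output line per input line, with an empty output line wherever the input had an empty line. A minimal Python sketch of that invariant follows. It is not Morfessor's own code or API; `segment_word`, `toy_segment`, and the `+` morph separator are hypothetical stand-ins for a trained model and its output format.

```python
from typing import Callable, Iterable, List


def segment_lines(
    lines: Iterable[str],
    segment_word: Callable[[str], List[str]],
    morph_sep: str = "+",
    word_sep: str = " ",
) -> List[str]:
    """Segment text line by line, emitting one output line per input line.

    `segment_word` is a hypothetical stand-in for a trained model's
    segmenter; it maps a word to its list of morphs.
    """
    out: List[str] = []
    for line in lines:
        words = line.split()
        # An empty (or whitespace-only) line has no words, so it yields ""
        # instead of being skipped; this keeps input and output aligned.
        out.append(word_sep.join(morph_sep.join(segment_word(w)) for w in words))
    return out


def toy_segment(word: str) -> List[str]:
    # Hypothetical segmenter: splits off the final letter of longer words.
    return [word[:-1], word[-1]] if len(word) > 3 else [word]


if __name__ == "__main__":
    print(segment_lines(["yksi", "", "kolme"], toy_segment))
    # prints ['yks+i', '', 'kolm+e']
```

Because every input line, empty or not, appends exactly one entry, `len(output) == len(input)` always holds, which is the property that line-aligned MT tooling relies on.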

That does look like a bug. I have put a temporary fix in https://github.com/phsmit/morfessor/tree/newline_fix . I'll test it later, and if it doesn't break anything the fix will be included in the next release.

This has been fixed in release 2.0.2.