--output-newlines squeezes multiple newlines
flammie opened this issue · 2 comments
flammie commented
Command-line.
flammie@saarkaany ~/Koodit/mt-development/complexity-stats (2145) [01:21:54]
$ cat > kolme
yksi

kolme
flammie@saarkaany ~/Koodit/mt-development/complexity-stats (2146) [01:22:08]
$ morfessor -l europarl-v7.fi-en.fi.morfessor --output-format-separator '> <' --output-newlines --output-format '{analysis} ' -T - < kolme
INFO:morfessor.io:Loading model from 'europarl-v7.fi-en.fi.morfessor'...
INFO:morfessor.io:Done.
No training data files specified.
Segmenting test data...
INFO:morfessor.io:Reading corpus from '-'...
yksi
kolme
INFO:morfessor.io:Done.
Done.
There should be an empty line between yksi and kolme. This is useful in machine translation pipelines, where tools commonly fail when the lines don't match.
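To illustrate the expected behaviour, here is a minimal Python sketch of line-preserving segmentation, in which every input line (including an empty one) produces exactly one output line. It does not use Morfessor's own code: `segment_lines` and `toy_segmenter` are hypothetical names, and the toy segmenter simply splits each word in half as a stand-in for a real model.

```python
def segment_lines(lines, segment_word, separator="> <"):
    """Return one output line per input line, keeping empty lines empty."""
    out = []
    for line in lines:
        words = line.split()
        # An empty input line has no words, so it becomes an empty output
        # line instead of being dropped; line counts stay aligned.
        out.append(" ".join(separator.join(segment_word(w)) for w in words))
    return out


def toy_segmenter(word):
    # Hypothetical stand-in for a trained model: split the word in half.
    mid = len(word) // 2
    return [word[:mid], word[mid:]] if mid else [word]


source = ["yksi", "", "kolme"]
result = segment_lines(source, toy_segmenter)

# The property that parallel-corpus tools rely on: same number of lines.
assert len(result) == len(source)
print("\n".join(result))
```

The point of the sketch is only the invariant checked by the final assertion: output line N always corresponds to input line N, which is what a line-aligned parallel corpus requires.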
psmit commented
That does look like a bug. I have a provisional fix in https://github.com/phsmit/morfessor/tree/newline_fix. I'll test it later, and if it doesn't break anything, the fix will be included in the next release.
svirpioj commented
This has been fixed in release 2.0.2.