aalto-speech/morfessor

Segmented output format

valentinmace opened this issue · 2 comments

I used morfessor-segment -L en.model test.data > test.morf

It works, but the resulting file test.morf has one word per line. Since my corpus has one sentence per line, I would like the output in the same format, but I cannot find how to achieve that.

Thanks in advance

Answer:

morfessor-segment -L en.model -o test.morf --output-format '{analysis} ' test.data --output-newlines

You may also want to add --output-format-separator '@@ ' to clearly mark where words are segmented.
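If you would rather do the same thing from Python, here is a minimal sketch of the idea behind those options: segment each word, join its morphs with the separator, and write one sentence per line. The names segment_sentences and toy_segment are hypothetical and not part of Morfessor; toy_segment is only a stand-in so the example runs without a trained model.

```python
def segment_sentences(lines, segment_word, separator="@@ "):
    """Yield one segmented sentence per input line.

    segment_word is any callable mapping a word to a list of morphs.
    """
    for line in lines:
        words = line.split()
        # Join the morphs of each word with the separator, then join
        # the words with single spaces so each sentence stays on one line.
        yield " ".join(separator.join(segment_word(w)) for w in words)


def toy_segment(word):
    # Hypothetical stand-in for a real model: split off a trailing
    # "ing" or "s" so the example needs no trained model.
    for suffix in ("ing", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 1:
            return [word[: -len(suffix)], suffix]
    return [word]


corpus = ["cats sleeping", "dog barks"]
result = list(segment_sentences(corpus, toy_segment))
for sentence in result:
    print(sentence)
```

With the morfessor Python package, segment_word could presumably wrap the model's viterbi_segment method (taking the first element of its return value), but check that against the library documentation before relying on it.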

Yes, the output format options are documented at https://morfessor.readthedocs.io/en/latest/cmdtools.html#data-format-command-line-options, but not so clearly. Thanks for providing the answer, too!