Tokenizer + tags
Opened this issue · 0 comments
vchahun commented
Tokenizing bitext with <p> tags fails:
echo "x ||| <p>x</p>" | ~/tools/cdec/corpus/tokenize-anything.sh
x |||<p> x</p>
Why these tags are in my corpus is another problem 😫
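For anyone wanting to check a tokenized corpus for this symptom, here is a minimal sketch. It assumes the two sides of a bitext line are delimited by exactly ` ||| ` with a space on each side. The helper name `has_intact_separator` is hypothetical and not part of cdec; it only detects the broken delimiter and does not fix the tokenizer.

```python
def has_intact_separator(line: str, sep: str = " ||| ") -> bool:
    """Return True if the line splits into exactly two fields on sep.

    Hypothetical helper (not part of cdec). Assumes a two-sided bitext
    line whose fields are separated by ' ||| ' with surrounding spaces.
    """
    return len(line.rstrip("\n").split(sep)) == 2


# Input line from the report: the separator is well formed.
source_line = "x ||| <p>x</p>"
# Output line from the report: the space after '|||' is gone.
tokenized_line = "x |||<p> x</p>"

print(has_intact_separator(source_line))
print(has_intact_separator(tokenized_line))
```

Running it over the tokenizer output line by line would flag every line where the delimiter got fused with a following tag.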