redpony/cdec

Tokenizer + tags


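Tokenizing bitext that contains <p> tags fails. The space after the ||| separator ends up after the opening tag, so the two sides are no longer cleanly delimited: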

echo "x ||| <p>x</p>" | ~/tools/cdec/corpus/tokenize-anything.sh
x |||<p> x</p>

Why these tags are in my corpus is another problem 😫
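
Until this is fixed, a quick way to spot affected lines in tokenizer output is to look for any ||| that is not surrounded by spaces. The snippet below is only a rough sketch: find_bad_separators is a hypothetical helper name, not something that ships with cdec, and it assumes the usual " ||| " convention with a single space on each side.

```shell
# Hypothetical helper, not part of cdec: print (with line numbers) every
# input line where "|||" is not flanked by a space on both sides.
find_bad_separators() {
  grep -nE '(^|[^ ])\|\|\||\|\|\|([^ ]|$)'
}

# First line is well formed, second line is the broken output from above.
printf '%s\n' 'x ||| <p>x</p>' 'x |||<p> x</p>' | find_bad_separators
# prints: 2:x |||<p> x</p>
```

In practice the tokenizer output would be piped into the function instead of the printf line. grep exits with status 1 when nothing matches, so a clean file produces no output.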