Long sentences are not being removed apparently
cgr71ii commented
Hi!
Either monofixer or bifixer should remove long sentences when the number of words is greater than 5000:
Line 195 in 1a91e3e
Line 215 in 1a91e3e
However, this does not seem to be working:
pip3 install bifixer==0.8.3
# monofixer
python -c "print('asd'); print(' '.join(['a']*6000)); print('asd')" \
| monofixer --scol 1 --ignore_duplicates -q - - es \
| wc -w
# 6002
python -c "print('asd'); print(' '.join(['a']*6000)); print('asd')" \
| monofixer --scol 1 --ignore_duplicates --ignore_long -q - - es \
| wc -w
# 6002
# bifixer
python -c "print('asd\tasd'); print('asd\t' + ' '.join(['a']*6000)); print('asd\tasd')" \
| bifixer --scol 1 --tcol 2 --ignore_duplicates -q - - en es \
| wc -w
# 6005
python -c "print('asd\tasd'); print('asd\t' + ' '.join(['a']*6000)); print('asd\tasd')" \
| bifixer --scol 1 --tcol 2 --ignore_long --ignore_duplicates -q - - en es \
| wc -w
# 6005
Am I doing something wrong?
Thank you!
mbanon commented
Long sentences are not removed; they are just ignored (not processed, but still written to the output).
The documentation is incorrect on this point; I'm fixing it.
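If long sentences really do need to be dropped, one option is a separate filtering step before or after the fixer. The following is only a minimal sketch and not part of bifixer or monofixer: the names `is_too_long` and `drop_long_lines` are hypothetical, the 5000-word threshold is simply the figure quoted in this issue, and words are assumed to be whitespace-separated tokens.

```python
MAX_WORDS = 5000  # threshold quoted in this issue; an assumption, not read from bifixer


def is_too_long(sentence, max_words=MAX_WORDS):
    """Return True when the sentence has more than max_words whitespace-separated tokens."""
    return len(sentence.split()) > max_words


def drop_long_lines(lines, columns=(0,), max_words=MAX_WORDS):
    """Yield only the tab-separated lines whose selected columns are all within the limit.

    `columns` holds 0-based column indexes (so --scol 1 --tcol 2 would map to (0, 1)).
    """
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # Columns that do not exist on a given line are skipped rather than treated as errors.
        selected = [fields[i] for i in columns if i < len(fields)]
        if any(is_too_long(text, max_words) for text in selected):
            continue  # drop the whole line if any selected column is too long
        yield line
```

Used as a pipe filter, something like `sys.stdout.writelines(drop_long_lines(sys.stdin, columns=(0, 1)))` would drop the 6000-word line from the bifixer example above and keep the two short ones.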
cgr71ii commented
Oh! Ok, thank you!