characters being replaced from the original string
Hi team,
we're using this to split text into sentences, but we found that some characters are replaced after splitting,
e.g. character codes 32 (space) and 160 (non-breaking space).
How can I keep the original characters? I need to compare the output against the original text.
hey yarnping - sure, I'm happy to help. You're right, it should never miss characters after splitting sentences.
Can you create an example of it failing?
thanks
thank you, here's the pic from Sublime Text
<script>
const text = "“I . . . maybe. I must say, the line between excellent career choice and critical life screwup is getting a bit blurry.”";
const doc = nlp(text);
const sentences = doc.sentences().out('array')
console.log(text);
console.log(sentences[0]);
</script>
hey yarnping, I think the unicode characters that were giving you trouble may be missing from your example text. This case works as expected for me:
nlp(`I . . . maybe. I must say, the line between `).debug()
maybe the github UI cleaned them up somehow? let me know if I can help reproduce this problem
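If copy-paste keeps normalizing the invisible characters, one option is to escape them before posting. Here's a rough sketch in plain JavaScript; `escapeNonAscii` is a hypothetical helper written for this thread, not part of the library, and the sample string is only an assumption about what the original text might contain:

```javascript
// Hypothetical helper (not a library API): rewrite every character outside
// printable ASCII (0x20-0x7E) as a \uXXXX escape, so that invisible
// characters such as a non-breaking space (code 160) survive copy/paste.
function escapeNonAscii(str) {
  return str.replace(/[^\x20-\x7E]/g, (ch) =>
    '\\u' + ch.charCodeAt(0).toString(16).padStart(4, '0')
  );
}

// Assumed sample: non-breaking spaces (\u00A0) between the dots.
const sample = 'I .\u00A0.\u00A0. maybe.';
console.log(escapeNonAscii(sample)); // I .\u00a0.\u00a0. maybe.
```

Pasting the escaped output into a template string or string literal should give back the exact original characters, which makes the failing case easier to reproduce.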
thanks
example.txt
sure, here's the example text