PreProcessing
pratikghanwat7 opened this issue · 2 comments
pratikghanwat7 commented
Hello,
Are you doing any kind of preprocessing on the input text, such as stopword removal, tokenization, lemmatization, or any other text cleaning?
dmmiller612 commented
In the basic setup, I am not doing any preprocessing. In some research over a year ago, I looked into it using different spaCy operations, but the results were largely inconclusive (I also didn't spend a lot of time on it). With the current library, this could be done with a custom SentenceHandler.
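For illustration, here is a minimal sketch of what such a preprocessing handler might look like. It is not the library's code: `PreprocessingSentenceHandler`, its `clean` method, and the tiny `STOPWORDS` set are hypothetical names, and the interface (a callable taking `body`, `min_length`, and `max_length` and returning a list of sentences) is an assumption about what a custom SentenceHandler is expected to provide. It uses only the standard library so it runs on its own; a real version would likely subclass the library's SentenceHandler and use spaCy for sentence splitting and lemmatization.

```python
import re
from typing import Iterable, List

# Small illustrative stopword list (an assumption, not the library's list).
STOPWORDS = {"a", "an", "the", "is", "are", "of", "to", "and", "in", "on"}


class PreprocessingSentenceHandler:
    """Hypothetical stand-in for a custom sentence handler.

    Splits text into sentences, filters them by raw character length,
    then lowercases each sentence and removes stopwords.
    """

    def __init__(self, stopwords: Iterable[str] = STOPWORDS):
        self.stopwords = set(stopwords)

    def clean(self, sentence: str) -> str:
        # Lowercase, keep alphanumeric tokens, and drop stopwords.
        tokens = re.findall(r"[A-Za-z0-9']+", sentence.lower())
        return " ".join(t for t in tokens if t not in self.stopwords)

    def process(self, body: str, min_length: int = 40, max_length: int = 600) -> List[str]:
        # Naive sentence split: break after '.', '!' or '?' followed by whitespace.
        sentences = re.split(r"(?<=[.!?])\s+", body.strip())
        # Filter on the raw sentence length before any cleaning is applied.
        kept = [s for s in sentences if min_length < len(s) < max_length]
        return [self.clean(s) for s in kept]

    def __call__(self, body: str, min_length: int = 40, max_length: int = 600) -> List[str]:
        return self.process(body, min_length, max_length)


handler = PreprocessingSentenceHandler()
cleaned = handler("The cat is on the mat. Dogs bark!", min_length=5, max_length=100)
```

One design caveat: because this sketch returns the cleaned sentences, any summary built from them would also contain cleaned text. A real handler would probably need to keep the original sentences alongside the cleaned ones. How such a handler is passed to the summarizer is not shown in this thread.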
kingabzpro commented
I have used your model. It works perfectly for small sentences, but it kind of breaks with larger documents. I wanted to give you more insight, but right now I am busy with a project. I will come back with a detailed explanation later.