dmmiller612/bert-extractive-summarizer

PreProcessing

pratikghanwat7 opened this issue · 2 comments

Hello,
Are you doing any kind of preprocessing on the input text, such as stopword removal, tokenization, lemmatization, or any other text cleaning?

In the basic setup, I am not doing any preprocessing. In some of my research over a year ago, I looked into that with different spaCy operations, but the results were largely inconclusive (I also didn't spend a lot of time on it). With the current library, this could be done with a custom SentenceHandler.
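To make the custom-handler idea concrete, here is a minimal, standalone sketch of what such a handler might look like. It does not import the library: `CleaningSentenceHandler`, its `clean` step, and the regex-based sentence split are all hypothetical, and the `process(body, min_length, max_length)` shape is an assumption about the interface the library's SentenceHandler exposes, so check it against the version you have installed.

```python
import re
from typing import List, Optional


class CleaningSentenceHandler:
    """Hypothetical stand-in for a custom sentence handler.

    Assumption: the summarizer only needs a `process` method that takes
    the raw text and returns a list of sentence strings.
    """

    def __init__(self, min_length: int = 40, max_length: int = 600):
        self.min_length = min_length
        self.max_length = max_length

    def clean(self, text: str) -> str:
        # Example cleaning step: drop URLs, then collapse whitespace.
        text = re.sub(r"https?://\S+", " ", text)
        return re.sub(r"\s+", " ", text).strip()

    def process(
        self,
        body: str,
        min_length: Optional[int] = None,
        max_length: Optional[int] = None,
    ) -> List[str]:
        lo = self.min_length if min_length is None else min_length
        hi = self.max_length if max_length is None else max_length
        # Naive split on sentence-ending punctuation; a spaCy pipeline
        # could be swapped in here instead.
        sentences = re.split(r"(?<=[.!?])\s+", self.clean(body))
        # Keep only sentences whose length falls strictly between the bounds.
        return [s for s in sentences if lo < len(s) < hi]


handler = CleaningSentenceHandler(min_length=10, max_length=100)
sentences = handler.process(
    "See https://example.com for details.   This is a second sentence!  Ok."
)
print(sentences)
```

If your installed version accepts a `sentence_handler` argument on `Summarizer`, a real handler would subclass the library's SentenceHandler and be passed in there; stopword removal or lemmatization would go in the `clean` step.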

I have used your model. It works perfectly for small sentences, but it kind of breaks with larger documents. I wanted to give you more insight, but right now I am busy with a project, so I will get back to you with a detailed explanation later.