Do we need to remove punctuation and stop words when processing the abstract text?

Question

Do we need to remove punctuation and stop words when processing the abstract text?

Issue closed 6 years ago · 1 comment

In the preprocessing.py script, I saw that you keep punctuation and some words that may be stop words in "tokenized_train_pairs". Should we filter these out, and will they affect the results?

Answer 1 · 2018-08-26T09:05:29.000Z

I don't really know how much the preprocessing would affect the results. The data is pretty noisy, so I only keep a few punctuation marks (",", ".", etc.).
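
If you want to check the effect yourself, here is a minimal sketch of a token filter you could run over the tokenized abstracts before training. This is not the code in preprocessing.py: `filter_tokens`, `STOP_WORDS`, and `KEEP_PUNCT` are hypothetical names, the stop-word list is a tiny placeholder, and the kept punctuation is only an assumption based on the "," and "." mentioned above.

```python
import string

# Hypothetical, deliberately small stop-word list for illustration only.
# Swap in whatever list suits your data.
STOP_WORDS = {"a", "an", "the", "of", "in", "and", "is", "are", "to", "for"}

# Punctuation tokens to keep (assumption: just "," and ".").
KEEP_PUNCT = {",", "."}


def filter_tokens(tokens, remove_stop_words=True, keep_punct=KEEP_PUNCT):
    """Drop punctuation-only tokens (except those in keep_punct) and,
    optionally, stop words from a list of string tokens."""
    kept = []
    for tok in tokens:
        # A token counts as punctuation if every character is ASCII punctuation.
        is_punct = bool(tok) and all(ch in string.punctuation for ch in tok)
        if is_punct and tok not in keep_punct:
            continue
        if remove_stop_words and tok.lower() in STOP_WORDS:
            continue
        kept.append(tok)
    return kept


tokens = ["The", "model", "(", "a", "baseline", ")", "is",
          "trained", ",", "then", "tested", "."]
print(filter_tokens(tokens))
# ['model', 'baseline', 'trained', ',', 'then', 'tested', '.']
```

Training once with `remove_stop_words=True` and once with `False` on the same split would be one way to see whether the filtering changes the results on this data.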