memray/seq2seq-keyphrase-pytorch

Do we need to remove punctuation and stop words when processing the abstract text?

Closed this issue · 1 comment

In preprocessing.py, I saw that you keep punctuation and some words that may be stop words in "tokenized_train_pairs". Should we filter these out, and would that affect the results?

I don't really know how much the preprocessing would affect the results. The data is pretty noisy, so I only keep a few punctuation marks (",", ".", etc.).
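The approach described in the answer can be sketched as a tokenizer that keeps only a small whitelist of punctuation and leaves stop words in by default. This is a minimal illustration under stated assumptions, not the code in preprocessing.py: the function name `tokenize`, the whitelist `KEPT_PUNCTUATION`, and the `STOP_WORDS` list are hypothetical, and the exact punctuation set the repo keeps is not stated beyond "," and ".". Stop word removal is an opt-in flag so the two settings can be compared.

```python
import re

# Hypothetical whitelist (assumption): the answer only says a few marks such
# as "," and "." are kept; the exact set used in preprocessing.py is unknown.
KEPT_PUNCTUATION = {",", "."}

# Small illustrative stop word list (assumption; not taken from the repo).
STOP_WORDS = {"a", "an", "the", "of", "and", "in", "on", "for", "to", "is"}

# A token is either a run of word characters or a single symbol that is
# neither a word character nor whitespace.
TOKEN_PATTERN = re.compile(r"\w+|[^\w\s]")
WORD_PATTERN = re.compile(r"\w+")


def tokenize(text, remove_stop_words=False):
    """Lowercase and split text, drop punctuation outside KEPT_PUNCTUATION,
    and drop stop words only if remove_stop_words is True."""
    result = []
    for tok in TOKEN_PATTERN.findall(text.lower()):
        is_word = WORD_PATTERN.fullmatch(tok) is not None
        if not is_word and tok not in KEPT_PUNCTUATION:
            continue  # discard punctuation that is not whitelisted
        if remove_stop_words and tok in STOP_WORDS:
            continue  # optional stop word filtering
        result.append(tok)
    return result


if __name__ == "__main__":
    sample = "The quality of service, in short."
    print(tokenize(sample))
    print(tokenize(sample, remove_stop_words=True))
```

Running the same data through both settings (`remove_stop_words=False` and `True`) and training on each is one way to check empirically how much this preprocessing choice affects the results.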