Do we need to remove punctuation and stop words when processing the abstract text?
Closed this issue · 1 comment
zhhengcs commented
In preprocessing.py, I saw that you keep punctuation and some words that may be stop words in "tokenized_train_pairs". Should we filter these out, and will they affect the results?
memray commented
I don't really know how much the preprocessing would affect the results. The data is pretty noisy, so I only keep a few punctuation marks (,. etc.).
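For anyone who wants to try the filtering themselves, here is a minimal sketch of what it could look like on one tokenized abstract. This is not the code in preprocessing.py: the function name `filter_tokens`, the `KEEP_PUNCT` set, and the tiny `STOPWORDS` list are all hypothetical and only chosen for illustration.

```python
import string

# Hypothetical settings; neither set is taken from preprocessing.py.
KEEP_PUNCT = {",", "."}
STOPWORDS = {"a", "an", "the", "of", "in", "and", "to", "is"}


def filter_tokens(tokens, drop_stopwords=False):
    """Drop punctuation-only tokens outside KEEP_PUNCT; optionally drop stop words."""
    kept = []
    for tok in tokens:
        # True for tokens made up only of ASCII punctuation (and for empty tokens,
        # which are therefore dropped as well).
        is_punct = all(ch in string.punctuation for ch in tok)
        if is_punct and tok not in KEEP_PUNCT:
            continue
        if drop_stopwords and tok.lower() in STOPWORDS:
            continue
        kept.append(tok)
    return kept


tokens = ["We", "propose", "a", "model", "(", "RNN", ")", ",",
          "and", "evaluate", "it", "."]
print(filter_tokens(tokens))
print(filter_tokens(tokens, drop_stopwords=True))
```

Whether dropping stop words helps is an empirical question, so it is probably safest to compare both settings on a validation set before changing the pipeline.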