Questions on 5 features
chesterkuo opened this issue · 1 comments
chesterkuo commented
This is awesome work for a QA-type solution.
I have one question about the 5 input features, especially TF.
tf = [0.1 * math.log(wikiwords.N * wikiwords.freq(w.lower()) + 10) for w in d_dict['words']]
What's the basic idea behind feeding this feature into doc_rnn? What is the theory?
intfloat commented
The intuition behind the TF feature is to let the model ignore unimportant words (usually frequent stop words such as "the" and "a").
As for the formula you mention, it is somewhat arbitrary. wikiwords.N * wikiwords.freq(w.lower()) is the absolute frequency of word w, 0.1 * math.log is to avoid extremely large numbers, and + 10 is just to make sure math.log does not throw exceptions (for example, on a word whose frequency is 0).
Other formulas might work as well.
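To make the three pieces of the formula concrete, here is a minimal sketch. It does not call wikiwords itself: N, REL_FREQ, freq and tf_feature are hypothetical stand-ins with made-up numbers, used only so the example runs on its own.

```python
import math

# Hypothetical stand-ins for wikiwords.N and wikiwords.freq (made-up numbers).
N = 1_000_000                       # assumed total token count of the corpus
REL_FREQ = {"the": 0.05, "a": 0.02, "rnn": 0.000002}

def freq(word):
    """Relative frequency of a word; 0.0 for words not in the table."""
    return REL_FREQ.get(word, 0.0)

def tf_feature(words):
    """Same shape as the formula in the question:
    N * freq(w)  -> absolute frequency of w
    + 10         -> keeps the argument of log positive, even for unseen words
    0.1 * log    -> squashes very large counts into a small range
    """
    return [0.1 * math.log(N * freq(w.lower()) + 10) for w in words]

features = tf_feature(["The", "RNN", "zyzzyva"])
for word, value in zip(["The", "RNN", "zyzzyva"], features):
    print(f"{word}: {value:.3f}")
```

With these assumed numbers, the stop word gets the largest value, the rare word a much smaller one, and the unseen word falls back to 0.1 * math.log(10), which is about 0.23 instead of an exception.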
Thank you.