kanyun-inc/commonsense-rc

Questions on 5 features

chesterkuo opened this issue · 1 comments

This is awesome work for a QA-type solution.

I have one question about the 5 input features, especially TF.

tf = [0.1 * math.log(wikiwords.N * wikiwords.freq(w.lower()) + 10) for w in d_dict['words']]

What is the basic idea behind feeding this feature to doc_rnn? What is the theory?

The intuition behind the TF feature is to let the model ignore unimportant words (usually frequent stop words such as "the" and "a").

As for the formula you mention, it is somewhat arbitrary. wikiwords.N * wikiwords.freq(w.lower()) is the absolute frequency of word w, 0.1 * math.log keeps the values from becoming extremely large, and + 10 just makes sure math.log does not throw an exception (for example, if the frequency is 0).

Other formulas might work as well.
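
To make the behaviour of the formula concrete, here is a minimal sketch. ToyWordStats is a hypothetical stand-in for wikiwords, assuming only what the formula itself uses: an attribute N (total token count) and a method freq(w) that returns a relative frequency, taken here to be 0 for unseen words. The counts are made up for illustration.

```python
import math


class ToyWordStats:
    """Hypothetical stand-in for wikiwords (assumed interface: N and freq)."""

    def __init__(self, counts):
        self.counts = counts
        self.N = sum(counts.values())  # total number of tokens

    def freq(self, word):
        # Relative frequency; assumed to be 0 for words never seen.
        return self.counts.get(word, 0) / self.N


def tf_feature(words, stats):
    # Same formula as in the question: scaled log of (absolute count + 10).
    return [0.1 * math.log(stats.N * stats.freq(w.lower()) + 10) for w in words]


# Made-up counts: one very frequent word and two rare ones.
stats = ToyWordStats({"the": 1000, "cat": 10, "sat": 5})
tf = tf_feature(["The", "cat", "zyzzyva"], stats)
print(tf)
```

Frequent words such as "the" get the largest value, rare words get smaller ones, and an unseen word falls back to 0.1 * log(10) instead of raising an error. The model can then learn to pay less attention to tokens with a high TF value.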

Thank you.