Check how expensive it is to calculate or estimate the combined df of features (the number of documents in which all features of an expression occur)
Closed this issue · 2 comments
patrickfrey commented
Check how expensive it is to calculate or estimate the combined df of features (the number of documents in which all features of an expression occur)
patrickfrey commented
Currently, the df of an expression feature is simply inherited from its rarest child expression (AND) or its most frequent child expression (OR). The iterators over the set of documents (as ranges) in which the feature occurs could perhaps be used to calculate or estimate a more accurate value.
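To make the gap concrete, here is a minimal Python sketch that compares the inherited df with the exact combined df on toy data. The function names and the representation of postings as sorted lists of document numbers are assumptions made for illustration only; this is not the project's actual API.

```python
from typing import List


def inherited_df(child_dfs: List[int], op: str) -> int:
    """Heuristic from the comment above: rarest child for AND, most frequent for OR."""
    if op == "AND":
        return min(child_dfs)
    if op == "OR":
        return max(child_dfs)
    raise ValueError(f"unsupported operator: {op}")


def exact_and_df(postings: List[List[int]]) -> int:
    """Exact combined df for AND: documents that appear in every posting list."""
    if not postings:
        return 0
    common = set(postings[0])
    for plist in postings[1:]:
        common &= set(plist)
    return len(common)


def exact_or_df(postings: List[List[int]]) -> int:
    """Exact combined df for OR: documents that appear in at least one posting list."""
    union = set()
    for plist in postings:
        union |= set(plist)
    return len(union)


# Hypothetical toy postings (document numbers) for two child features.
a = [1, 3, 5, 7, 9, 11]
b = [2, 3, 4, 9]

print(inherited_df([len(a), len(b)], "AND"), exact_and_df([a, b]))  # 4 2
print(inherited_df([len(a), len(b)], "OR"), exact_or_df([a, b]))    # 6 8
```

The inherited value is an upper bound for AND and a lower bound for OR, so it can be off by a large factor when the children overlap only a little.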
patrickfrey commented
Estimating the df is too expensive because it requires a statistically relevant number of random accesses. Random access kills.
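For illustration, a sampling estimator for the AND case could look like the sketch below. It is an assumption about what such an estimate might look like, not the code that was evaluated: it draws documents from the rarest posting list, probes every other list by binary search (standing in for an iterator skip), and counts the probes. The names `estimate_and_df` and `probes` are hypothetical.

```python
import random
from bisect import bisect_left
from typing import List, Tuple


def estimate_and_df(
    postings: List[List[int]], sample_size: int, rng: random.Random
) -> Tuple[float, int]:
    """Estimate the AND df by sampling the rarest list and probing the others.

    Returns (estimated df, number of random-access probes performed).
    Each posting list is assumed to be a sorted list of document numbers.
    """
    ordered = sorted(postings, key=len)
    rarest, others = ordered[0], ordered[1:]
    if not rarest:
        return 0.0, 0
    k = min(sample_size, len(rarest))
    sample = rng.sample(rarest, k)
    probes = 0
    hits = 0
    for doc in sample:
        in_all = True
        for plist in others:
            probes += 1  # one random access (skip) into another posting list
            i = bisect_left(plist, doc)
            if i == len(plist) or plist[i] != doc:
                in_all = False
                break
        if in_all:
            hits += 1
    # Scale the hit rate in the sample up to the size of the rarest list.
    return len(rarest) * hits / k, probes


a = [1, 3, 5, 7, 9, 11]
b = [2, 3, 4, 9]
estimate, probes = estimate_and_df([a, b], sample_size=4, rng=random.Random(0))
print(estimate, probes)
```

The number of probes grows with the sample size times the number of child features, and each probe lands at an unrelated position in a posting list, which is where the cost comes from.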