Can we speed up the conversion of Document to DocumentStorage?
Closed this issue · 2 comments
dmarklein commented
Can we speed up the conversion of Document to DocumentStorage?
dmarklein commented
This is questionable. This might have to wait until I reimplement this for the backend of the web app.
dmarklein commented
Ideas:
- move the two re.sub calls in filter_text (lines 114 and 116) so they run once on the entire text string at the beginning of the method
- instantiate cap_regex and punc_regex (lines 137 & 138) as member variables in init so they aren't recompiled on every method call
- use xrange instead of range throughout class
- add term occurrence counting to build_term_list() -- line 60:
    if term not in term_list:
        term_list[term] = {"tf": None, "count": 1}
    else:
        term_list[term]["count"] += 1
and then reference self.term_list[term]["count"] in self.calculate_term_frequency()
- Not really applicable to this class, but I need to replace as much "+" string concatenation as possible with "".join()
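A rough sketch of how the ideas above might fit together. The class name DocumentStorageSketch, both regex patterns, the render_terms() helper, and the tf definition (count divided by total terms) are assumptions for illustration only; the real class may differ on all of them. It is written for Python 3, where range is already lazy, so the xrange item has no counterpart here.

```python
import re


class DocumentStorageSketch:
    """Hypothetical stand-in for the class discussed above, not the real code."""

    def __init__(self):
        # Compile once here instead of on every filter_text() call.
        # Both patterns are assumed; the real ones are not shown in this thread.
        self.cap_regex = re.compile(r"[A-Z]")
        self.punc_regex = re.compile(r"[^\w\s]")
        self.term_list = {}

    def filter_text(self, text):
        # Run each substitution once on the whole string, then split into terms.
        text = self.cap_regex.sub(lambda m: m.group(0).lower(), text)
        text = self.punc_regex.sub("", text)
        return text.split()

    def build_term_list(self, text):
        # Count occurrences while the term list is being built.
        for term in self.filter_text(text):
            if term not in self.term_list:
                self.term_list[term] = {"tf": None, "count": 1}
            else:
                self.term_list[term]["count"] += 1

    def calculate_term_frequency(self):
        # Reuse the stored counts instead of rescanning the text.
        # tf = count / total terms is an assumed definition.
        total = sum(entry["count"] for entry in self.term_list.values())
        for entry in self.term_list.values():
            entry["tf"] = entry["count"] / total

    def render_terms(self):
        # Hypothetical helper: build the output with join() rather than "+".
        return ", ".join(sorted(self.term_list))
```

Usage would look something like calling build_term_list("The cat, the hat.") followed by calculate_term_frequency(), after which term_list["the"]["count"] holds the stored count that the frequency step reads.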