danielmklein/WordCloud

Can we speed up the conversion of Document to DocumentStorage?

Closed this issue · 2 comments

Whether this is worth doing now is questionable. It might have to wait until I reimplement this for the backend of the web app.

Ideas:

  • move the two re.sub calls in filter_text (lines 114 and 116) so they act once on the entire text string at the beginning of the method
  • instantiate cap_regex and punc_regex (lines 137 & 138) as member variables in __init__ so they aren't recompiled on every method call
  • use xrange instead of range throughout class
  • add term occurrence counting to build_term_list() -- line 60:
if term not in term_list:
    term_list[term] = {"tf": None, "count": 0}
# increment on every occurrence, including the first, so the count isn't off by one
term_list[term]["count"] += 1

and then reference self.term_list[term]["count"] in self.calculate_term_frequency()

  • Not really applicable to this class, but I need to replace as much "+" string concatenation as possible with "".join()
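
A rough sketch of how a few of these ideas might fit together. This is not the project's actual code: the class name DocumentStorageSketch, the punctuation pattern, the method signatures, and the join_terms helper are all assumptions made for illustration. Only punc_regex is shown; cap_regex would presumably be handled the same way.

```python
import re


class DocumentStorageSketch(object):
    """Hypothetical stand-in for the real class; names and patterns are assumed."""

    def __init__(self):
        # Compile once per instance instead of on every method call.
        # Placeholder pattern, not the project's actual regex.
        self.punc_regex = re.compile(r"[^\w\s]")
        self.term_list = {}

    def filter_text(self, text):
        # Run the substitution once on the whole string, then split into terms.
        cleaned = self.punc_regex.sub("", text)
        return cleaned.lower().split()

    def build_term_list(self, terms):
        # Count occurrences while the term list is being built.
        for term in terms:
            if term not in self.term_list:
                self.term_list[term] = {"tf": None, "count": 0}
            self.term_list[term]["count"] += 1

    def calculate_term_frequency(self, total_terms):
        # Reuse the stored counts rather than rescanning the text.
        for entry in self.term_list.values():
            entry["tf"] = entry["count"] / float(total_terms)


def join_terms(terms):
    # One join call instead of repeated "+" concatenation in a loop.
    return " ".join(terms)
```

With this layout, each term's count is available by the time calculate_term_frequency() runs, so the text only needs to be walked once.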