_NumbersModel.table_string_key degrades performance
Closed this issue · 3 comments
Thanks for maintaining this useful parser!
numbers-parser/src/numbers_parser/model.py, line 226 in 8dcb81d
When handling a gigantic table, the referenced line, which generates a dictionary on every call, severely degrades performance. The line `key = max(string_lookup.values()) + 1` looks very expensive as well.
```python
class TableStringsStore(object):
    def __init__(self, entries):
        self.entries = entries
        # Build the string -> key lookup once, instead of regenerating
        # the dictionary on every call
        self.lookup = {x.string: x.key for x in entries}
        self.max_value = max(self.lookup.values()) if self.lookup else 0

    def append_entry(self, entry):
        self.entries.append(entry)
        # Keep the lookup and the running maximum in sync incrementally
        self.lookup[entry.string] = entry.key
        self.max_value = max(self.max_value, entry.key)

    def clear_field_container(self):
        # clear_field_container here is the existing module-level helper
        # that empties the underlying field container
        clear_field_container(self.entries)
        self.lookup = {}
        self.max_value = 0
```
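For illustration, usage of this sketch might look like the following; `TableStringEntry` is a hypothetical stand-in for the real entry objects, which only need `.string` and `.key` attributes:

```python
from collections import namedtuple

TableStringEntry = namedtuple("TableStringEntry", ["key", "string"])

store = TableStringsStore([TableStringEntry(1, "alpha"), TableStringEntry(2, "beta")])

# O(1) lookup instead of rebuilding the dict on every call
assert store.lookup["beta"] == 2

# Appending keeps both the lookup and the running maximum in sync
store.append_entry(TableStringEntry(store.max_value + 1, "gamma"))
assert store.max_value == 3
```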
This is just a sketch, but incrementally updating the string lookup dict and the max value through a wrapper like the one above drastically improved performance when saving a file. Annotating `table_string_key()` with `@lru_cache` also looked effective; a sketch of that follows.
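A minimal sketch of the `@lru_cache` idea, not numbers-parser's actual code: the signature of `table_string_key()` and the `_strings` attribute are assumed here for illustration.

```python
from functools import lru_cache

class _NumbersModel:
    def __init__(self, strings):
        self._strings = strings  # hypothetical key -> string mapping

    @lru_cache(maxsize=None)
    def table_string_key(self, value):
        # Repeated calls with the same value hit the cache instead of
        # rescanning the string datalist every time
        return next(k for k, s in self._strings.items() if s == value)
```

One caveat: `lru_cache` on a method holds a reference to `self`, and the cache must be cleared (via `table_string_key.cache_clear()`) whenever new strings are added, or stale keys will be returned.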
Many thanks for spotting that. I've refactored how I manage datalists, which are used in various places, and the strings should now be cached efficiently. There are other performance issues I am aware of in saving files, but I've not done any profiling on that yet.
Do keep the feedback coming - I'm very happy to see someone using the save functions.
You should see the improvements in 3.9.3.
I'm very much looking forward to seeing the improvement in the upcoming release! Another piece of low-hanging fruit I found was to cache `iwafile.find_references` (called from `containers.ObjectStore.update_dirty_objects`); a sketch follows below. For me the saving feature is fascinating because I think Numbers is a nice tool for presenting tabular data with less clutter than Excel. I appreciate your effort, and keep up the good work!
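For illustration only, the memoization I had in mind is roughly the following; the import path, the `archive` argument, and the wrapper name are assumptions, not the library's API:

```python
# Hedged sketch: assumes find_references(archive) is deterministic while
# the archive is unchanged, so its result can be reused across the many
# calls made during update_dirty_objects()
from numbers_parser.iwafile import find_references  # import path assumed

_reference_cache = {}

def cached_find_references(archive):
    key = id(archive)
    if key not in _reference_cache:
        _reference_cache[key] = find_references(archive)
    return _reference_cache[key]

# The cache should be discarded once saving completes, since id() values
# can be reused after objects are garbage-collected
```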
Yes, `update_dirty_objects` is not overly smart. It is also overly greedy, computing new dependencies rather than tracking what's actually dirty.
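As a hypothetical shape for that idea (not the actual `ObjectStore`), explicit dirty tracking might look like:

```python
class DirtyTrackingStore:
    """Hypothetical sketch: track exactly which objects were modified,
    so saving only recomputes dependencies for those."""

    def __init__(self):
        self._objects = {}
        self._dirty = set()  # ids of objects modified since the last save

    def mark_dirty(self, obj_id):
        self._dirty.add(obj_id)

    def update_dirty_objects(self):
        # Re-serialise and refresh references only for touched objects,
        # instead of greedily recomputing dependencies for everything
        for obj_id in self._dirty:
            self._update_object(self._objects[obj_id])
        self._dirty.clear()

    def _update_object(self, obj):
        pass  # placeholder: re-serialise obj and update its references
```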