petl-developers/petl

Generator support in fromdicts requires large amounts of memory

arturponinski opened this issue · 2 comments

PR #569, which introduced generator support in fromdicts, has increased memory usage on our production instances.

Problem description

Per itertools.tee docs:

This itertool may require significant auxiliary storage (depending on how much temporary data needs to be stored). In general, if one iterator uses most or all of the data before another iterator starts, it is faster to use list() instead of tee().
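To illustrate the buffering the docs describe, here is a minimal sketch. It assumes CPython, where reference counting frees objects as soon as they become unreachable. `Tracked`, `rows`, and both helper functions are hypothetical names made up for this demonstration, not part of petl.

```python
import itertools


class Tracked:
    """Stand-in for a row; counts how many instances are currently alive."""
    live = 0

    def __init__(self, value):
        self.value = value
        Tracked.live += 1

    def __del__(self):
        # Runs immediately on CPython once the last reference is dropped.
        Tracked.live -= 1


def rows(n):
    """One-shot generator of n tracked rows."""
    for i in range(n):
        yield Tracked(i)


def live_after_first_pass_with_tee(n):
    """Count rows still held in memory after one tee branch is exhausted."""
    before = Tracked.live
    first, second = itertools.tee(rows(n))
    for _ in first:
        pass
    # `second` has not advanced, so tee must keep every row for it.
    return Tracked.live - before


def peak_live_single_pass(n):
    """Peak number of rows alive when the generator is consumed directly."""
    before = Tracked.live
    peak = 0
    for _ in rows(n):
        peak = max(peak, Tracked.live - before)
    return peak
```

With tee, every row produced during the first pass stays alive until the second iterator catches up, whereas a direct single pass only keeps a small constant number of rows alive at any time.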

The auxiliary storage used by tee() is most likely the cause. Because of this, generator support should:

  1. Be moved to a separate method, e.g. fromdictsgenerator
  2. Use a temporary file, similar to how SortView does
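A rough sketch of what the temporary-file approach could look like, assuming rows are picklable dicts. `SpooledDicts` is a hypothetical helper invented for illustration. It is not petl's API and not how SortView is actually implemented.

```python
import os
import pickle
import tempfile


class SpooledDicts:
    """Hypothetical helper: spool a one-shot iterable of dicts to a
    temporary file so it can be iterated more than once without keeping
    every row in memory."""

    def __init__(self, dicts):
        self._source = iter(dicts)
        self._path = None

    def _spool(self):
        # Consume the source exactly once, writing one pickle per row.
        with tempfile.NamedTemporaryFile(delete=False, suffix=".pkl") as f:
            for d in self._source:
                pickle.dump(d, f, protocol=pickle.HIGHEST_PROTOCOL)
            self._path = f.name

    def __iter__(self):
        if self._path is None:
            self._spool()
        # Each iteration opens its own handle, so passes do not interfere.
        with open(self._path, "rb") as f:
            while True:
                try:
                    yield pickle.load(f)
                except EOFError:
                    return

    def close(self):
        """Delete the temporary file; do not iterate after calling this."""
        if self._path is not None and os.path.exists(self._path):
            os.remove(self._path)


# Usage: two passes over a generator, e.g. one for the header, one for rows.
spooled = SpooledDicts({"id": i, "name": "row%d" % i} for i in range(3))
header = sorted({key for d in spooled for key in d})
data = [tuple(d[key] for key in header) for d in spooled]
spooled.close()
```

The trade-off is disk I/O and pickling cost in exchange for memory that stays roughly constant regardless of the number of rows.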

The problem description does not describe a "memory leak".
Perhaps something like "Generator support in fromdicts requires large amounts of memory" would be a more appropriate title?

Fair point, description updated