harvard-lil/perma

Use `iterator()` when iterating through large querysets in Celery tasks

rebeccacremona opened this issue · 2 comments

A while back, I found that `iterator()`, when used with `values_list`, was causing querysets to be evaluated twice: we saw the (at the time expensive and unoptimized) queries running on the database twice. So, we removed `iterator()`.

I can no longer reproduce that problem. Lots of things have changed in the meantime: Django upgrades, a migration from MySQL to Postgres, etc.

Let's put `iterator()` back and thereby use RAM more gently.

((I went so far as to spin up a MySQL container, wire it up to web, install fixtures, make a few dozen captures, and try to reproduce the problem there. Even there, I was not able to reproduce it. Just some mild gaslighting underway... I'm going to move on and not worry about it too much...))

Whoops, this is done already. 876f53f