mvantellingen/localshop

Improve the fetch_package task performance.

canassa opened this issue · 0 comments

The fetch_package task is responsible for fetching a package's metadata from the PyPI API and creating the package models in Localshop. It is called every time Localshop receives a pip install <package> command for a <package> that has not yet been mirrored on Localshop. The scheduler also calls it once a day to update the packages.

The problem is that this task can take a long time to complete, especially if the package has many releases (e.g. Celery). The bottleneck is not the API access but our own database: the task issues too many INSERTs and too many queries.

Previously, this code ran in the view itself, which caused pip to time out and retry the request (!). I "fixed" this by moving the code to a task, but the code still needs to be improved.

Some ideas:

  • Migrate some columns from the Release model to the Package model.
  • Try to use bulk inserts.
  • Split the code into two functions: one for inserting a brand new package into the system and one for updating an existing package in Localshop.
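
To make the last two ideas concrete, here is a rough sketch of what I have in mind. It is not the actual Localshop code: it uses the standard-library sqlite3 module with made-up `package` and `release` tables, and the function names `create_package` and `update_package` are hypothetical. In the real task the batching would presumably go through Django's `bulk_create` instead of `executemany`.

```python
import sqlite3


def make_db():
    # Hypothetical two-table schema, only for illustration.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE package (id INTEGER PRIMARY KEY, name TEXT UNIQUE)")
    conn.execute(
        "CREATE TABLE release ("
        "id INTEGER PRIMARY KEY, package_id INTEGER, version TEXT, "
        "UNIQUE (package_id, version))"
    )
    return conn


def create_package(conn, name, versions):
    """Insert a brand new package and all its releases in one batch."""
    with conn:  # single transaction: commit on success, rollback on error
        cur = conn.execute("INSERT INTO package (name) VALUES (?)", (name,))
        package_id = cur.lastrowid
        # No existence checks needed: the package is new, so every release is new.
        conn.executemany(
            "INSERT INTO release (package_id, version) VALUES (?, ?)",
            [(package_id, v) for v in versions],
        )
    return package_id


def update_package(conn, package_id, versions):
    """Insert only the releases that are missing; return the new versions."""
    # One query to load all known versions instead of one query per release.
    rows = conn.execute(
        "SELECT version FROM release WHERE package_id = ?", (package_id,)
    )
    existing = {row[0] for row in rows}
    # Assumes `versions` contains no duplicates.
    missing = [v for v in versions if v not in existing]
    with conn:
        conn.executemany(
            "INSERT INTO release (package_id, version) VALUES (?, ?)",
            [(package_id, v) for v in missing],
        )
    return missing
```

With this split, the first pip install of a package does one batched insert, and the daily update only touches the database for releases that appeared since the last run.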