thoth-station/package-releases-job

Adjust logic in package releases


Is your feature request related to a problem? Please describe.

It would be reasonable to adjust the logic in package-releases to optimize its runs (see recent cluster issues). There are two cases to consider, depending on the index configuration:

  • if the given index has the configuration option only_if_package_seen set to true, we do not need to iterate over all packages in the index listing:
    a) iterating makes the whole checking process slow, because each package release check requires queries to the database
    b) PyPI's /simple listing can take more than 24 hours to show new packages, which delays when those packages get registered
    c) we should instead reuse the "seen" packages from the database to optimize the retrieval

  • if the given index has the configuration option only_if_package_seen set to false, we can stick with the current logic
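
The two cases above could be sketched roughly as follows. This is only an illustration of the proposed branching, not the job's actual code: `packages_to_check`, `list_index_packages`, and `list_seen_packages` are hypothetical names standing in for whatever the job uses to read the index listing and the database.

```python
from typing import Callable, Iterable, List


def packages_to_check(
    only_if_package_seen: bool,
    list_index_packages: Callable[[], Iterable[str]],
    list_seen_packages: Callable[[], Iterable[str]],
) -> List[str]:
    """Pick the package names whose releases should be checked for one index.

    Both callables are hypothetical placeholders: one returns the names from
    the index listing (e.g. /simple), the other the names already "seen" in
    the database.
    """
    if only_if_package_seen:
        # Proposed path: start from the packages already recorded in the
        # database, so the index listing is not fetched at all and no
        # per-package "was this seen?" query is needed.
        names = list_seen_packages()
    else:
        # Current path: walk every package the index advertises.
        names = list_index_packages()
    # De-duplicate and sort so the result is stable between runs.
    return sorted(set(names))
```

With this shape, an index configured with only_if_package_seen set to true never triggers a fetch of the full listing, so the cost of a run scales with the number of seen packages rather than with the size of the index.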

Describe alternatives you've considered

Keep the job as is: it works, but it is very inefficient and takes hours to finish.