ofek/pypinfo

Non-normalised package name

jayvdb opened this issue · 4 comments

I would like to correlate the data with openSUSE package names, which use the 'real' name supplied in setup.py, i.e. the non-normalised name.

I've been doing a bit of research at hugovk/top-pypi-packages#4 and psincraian/pepy#128, and the raw data from BigQuery can include this, with a very small performance hit.

The query only needs to change from selecting file.project to selecting SUBSTR(MAX(file.filename), 1, LENGTH(file.project)), or more likely to selecting both.
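To make the proposal concrete, here is a rough sketch of what the changed query could look like, plus a pure-Python stand-in for the SUBSTR/MAX expression so the behaviour can be checked without spending quota. The helper names (filename_prefix, build_query), the column aliases, the 30-day window and the table name are my assumptions for illustration, not pypinfo's actual code.

```python
def filename_prefix(project, filenames):
    """Local stand-in for SUBSTR(MAX(file.filename), 1, LENGTH(file.project)).

    `project` is the normalised name; `filenames` are the filenames seen
    for that project. Hypothetical helper, for illustration only.
    """
    # MAX() over strings picks the lexicographically largest filename,
    # and SUBSTR(..., 1, n) keeps its first n characters.
    return max(filenames)[: len(project)]


def build_query(table, days=30):
    """Build a standard SQL query that returns both names.

    `table` is passed in because the dataset name is an assumption here.
    """
    return f"""
SELECT
  file.project AS normalised_name,
  SUBSTR(MAX(file.filename), 1, LENGTH(file.project)) AS filename_prefix,
  COUNT(*) AS download_count
FROM `{table}`
WHERE DATE(timestamp) >= DATE_SUB(CURRENT_DATE(), INTERVAL {days} DAY)
GROUP BY file.project
ORDER BY download_count DESC
""".strip()


if __name__ == "__main__":
    print(filename_prefix("pyyaml", ["PyYAML-5.1.tar.gz"]))
    # Assumed table name; adjust to whichever dataset is actually queried.
    print(build_query("bigquery-public-data.pypi.file_downloads"))
```

One caveat I'd expect with this approach: the prefix comes from whichever filename sorts last, so a project that ships both Flask-SQLAlchemy-2.4.0.tar.gz and a Flask_SQLAlchemy wheel would come back with the underscore spelling, and names where normalisation changes the length would be cut at the wrong place.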

Note this does depend on using standard SQL (#28).

Do we know the cost implications of those changes?

ofek commented

Hello there!

I'm unsure of the cost implications of this. That said, I'll approve whatever @hugovk thinks is best 🙂

I don't know the cost implications either. I guess the best way to find out is to test it.

If it costs more, I'd suggest adding a switch to include the change. I'm nearly always out of quota.
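A switch like that could be sketched roughly as below. The --raw-name flag, the column aliases and the helper names are hypothetical; this uses argparse only to keep the example self-contained and is not how pypinfo's CLI is actually wired up.

```python
import argparse

# Columns that are always selected (aliases are illustrative).
BASE_COLUMNS = [
    "file.project AS normalised_name",
    "COUNT(*) AS download_count",
]

# Extra column from the proposal; only added when the switch is on,
# so the default query stays as cheap as it is today.
RAW_NAME_COLUMN = (
    "SUBSTR(MAX(file.filename), 1, LENGTH(file.project)) AS filename_prefix"
)


def select_columns(include_raw_name):
    """Return the SELECT list, optionally including the filename prefix."""
    columns = list(BASE_COLUMNS)
    if include_raw_name:
        columns.insert(1, RAW_NAME_COLUMN)
    return columns


def make_parser():
    """Build a parser with a hypothetical opt-in flag."""
    parser = argparse.ArgumentParser(description="Sketch of an opt-in switch")
    parser.add_argument(
        "--raw-name",
        action="store_true",
        help="also select the non-normalised name taken from the filename",
    )
    return parser


if __name__ == "__main__":
    args = make_parser().parse_args()
    print("SELECT\n  " + ",\n  ".join(select_columns(args.raw_name)))
```

Keeping the extra column off by default means anyone close to their quota is unaffected unless they ask for it.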