Non-normalised package name
jayvdb opened this issue · 4 comments
I would like to correlate the data with openSUSE package names, which use the 'real' name supplied in setup.py, i.e. the non-normalised one.
I've been doing a bit of research at hugovk/top-pypi-packages#4 and psincraian/pepy#128, and the raw data from BigQuery can include this name, with a very small performance hit.
The query only needs to change from selecting file.project to substr(max(file.filename),1,LENGTH(file.project)), or more likely including both.
Note this does depend on using standard SQL (#28).
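To illustrate what that expression does, here is a minimal Python sketch of the same idea. The function names `normalise` and `raw_name` and the example filenames are made up for illustration, and I'm assuming file.project holds the PEP 503 normalised name.

```python
import re


def normalise(name: str) -> str:
    """PEP 503 normalisation: runs of '-', '_' and '.' become one '-', then lowercase."""
    return re.sub(r"[-_.]+", "-", name).lower()


def raw_name(project: str, filenames: list[str]) -> str:
    """Mirror substr(max(file.filename), 1, LENGTH(file.project)).

    Take the lexicographically largest filename and keep as many leading
    characters as the normalised project name has.
    """
    return max(filenames)[: len(project)]


# Hypothetical sdist filenames for one project.
files = ["Flask-SQLAlchemy-2.3.2.tar.gz", "Flask-SQLAlchemy-2.4.0.tar.gz"]
print(normalise("Flask-SQLAlchemy"))
print(raw_name("flask-sqlalchemy", files))
```

Two caveats with this approach: normalisation collapses runs of separators, so the normalised name can be shorter than the real one and the prefix would then be cut short; and wheel filenames replace '-' with '_' in the name, so a wheel may win the max() and the result may carry underscores.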
Do we know the cost implications of those changes?
Hello there!
I'm unsure of the cost implications of this, though I'll approve whatever @hugovk thinks is best 🙂
I don't know the cost implications; I guess the best way to find out is to test it.
If it costs more, I'd suggest adding a switch to include the change. I'm nearly always out of quota.
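A rough sketch of what such a switch could look like, so the extra column is only queried on request. Everything here is hypothetical: `build_query`, the `include_raw_name` flag, the column aliases and the table name are placeholders, not the project's actual code.

```python
def build_query(table: str, include_raw_name: bool = False) -> str:
    """Build a standard SQL download-count query, optionally with the raw name."""
    columns = ["file.project AS project"]
    if include_raw_name:
        # Opt-in column: the non-normalised name recovered from the filename.
        columns.append(
            "SUBSTR(MAX(file.filename), 1, LENGTH(file.project)) AS raw_name"
        )
    columns.append("COUNT(*) AS download_count")
    return (
        "SELECT " + ", ".join(columns)
        + f" FROM `{table}`"
        + " GROUP BY file.project"
    )


# Placeholder table name; substitute the real downloads table.
print(build_query("your-project.your_dataset.downloads", include_raw_name=True))
```

With the flag off the query is unchanged, so anyone short on quota pays nothing extra.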