aclark4life/vanity

Stats broken since January 2016

Themanwithoutaplan opened this issue · 43 comments

This is a minor niggle, but it looks like vanity hasn't been getting any updated statistics since about January. For example, vanity openpyxl confidently tells me that the package has never been downloaded.

@Themanwithoutaplan Yep, I think this is a PyPI issue.

Is there anything you can do about it? I like vanity. Outdated statistics, however, make it quite useless :-)

@SmokinCaterpillar I like it too! We need to ask @dstufft or someone from @pypa to help.

I think Donald is concentrating on getting Warehouse ready to replace PyPI. Things should be more reliable once that's done.

As part of Warehouse I've been working on a new stats pipeline that should both be way more robust and provide a lot more insight into downloads.

@Themanwithoutaplan @dstufft Any ETA on Warehouse? It might be worth fixing whatever has broken the stats one more time, just to get us through until then…

I think Warehouse is pretty close to being ready. Nobody likes touching the PyPI code and, given that it's been broken since January, I don't think another few days or weeks really matter.

Warehouse has a much clearer (and better) code base, which will hopefully make it easier to maintain, more reliable, and easier to add features to.

@Themanwithoutaplan Great! Nope, another few days or weeks don't really matter. Months, on the other hand …

They were talking about disabling the stats because they are distorted (mirror downloads get counted, and so on). Can anybody explain to me what Warehouse is?

@ryukinix Ah, thanks for the cross ref. Warehouse is: https://github.com/pypa/warehouse

Oh, no problem, and thank you for this nice tool! It's a little sad that it doesn't work right now, but it's not your fault. xD

Warehouse looks interesting! Do we have any estimate of when it will be running in production? It would be nice to have vanity working again.

@ryukinix According to @Themanwithoutaplan "pretty close to being ready" … and we should only have to live with broken stats "another few days or weeks". Practically speaking though, since it's a (much appreciated) volunteer effort, I would be happy if it happened sometime in 2016, period.

Just to be clear. PyPI isn't using this data yet but it will be.


On May 25, 2016, at 8:39 AM, Alex Clark notifications@github.com wrote:

https://mail.python.org/pipermail/distutils-sig/2016-May/028986.html


@dstufft Yeah, understood, thanks! Presumably some aggressive vanity user could start consuming it and then add support to vanity :-)

whistles and looks at his shoes.

Is this fixed? I'm seeing stats again …

[Screenshot, 2016-06-20 16:24:20]

Have you considered moving to the BigQuery dataset for the time being?

(As suggested here.)

Yep, suggested above too. Updating vanity to use the BigQuery data set is possibly a way to get old "missing" data back.

Is it safe yet to remove the "stats broken" message from vanity? If so, I'll close this and make a new release.

It seems stats are broken again.

requests-2.12.1-py2.py3-none-any.whl    2016-11-16       624953
              requests-2.12.2.tar.gz    2016-11-30            0
requests-2.12.2-py2.py3-none-any.whl    2016-11-30            0
              requests-2.12.3.tar.gz    2016-12-01            0
requests-2.12.3-py2.py3-none-any.whl    2016-12-01            0

@noxdafox I think they've been broken since January, or at least not working consistently…

Sorry, I've had a lot of higher-priority items. I would suggest using the BigQuery database instead of the API, although that doesn't (and can't, since some of that data simply doesn't exist anymore) give a cumulative count of downloads going back further than a certain date. Currently that date is early 2016, but once I am able to backfill the data it will reach back to January 2014.
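For anyone wanting to try the BigQuery route, here is a minimal sketch of the kind of "recent downloads per file" query vanity could run. It assumes the public table `bigquery-public-data.pypi.file_downloads` (a later home of this data, not necessarily the table available at the time of this thread) with its `file.project`, `file.filename` and `timestamp` columns, plus the `google-cloud-bigquery` client library. The function names `build_query` and `recent_downloads` are hypothetical, not part of vanity.

```python
"""Sketch: recent per-file download counts for one PyPI project via BigQuery.

Assumptions: the public table `bigquery-public-data.pypi.file_downloads`
and the google-cloud-bigquery client library. Function names are hypothetical.
"""
from __future__ import annotations

TABLE = "bigquery-public-data.pypi.file_downloads"  # assumed table name


def build_query(table: str = TABLE) -> str:
    """Return standard SQL counting downloads per file over the last @days days.

    The project name and window are bound as query parameters (@project,
    @days) instead of being pasted into the SQL string.
    """
    return (
        "SELECT file.filename AS filename, COUNT(*) AS downloads\n"
        f"FROM `{table}`\n"
        "WHERE file.project = @project\n"
        "  AND timestamp >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL @days DAY)\n"
        "GROUP BY filename\n"
        "ORDER BY downloads DESC"
    )


def recent_downloads(project: str, days: int = 30) -> list[tuple[str, int]]:
    """Run the query; needs google-cloud-bigquery and Google credentials."""
    from google.cloud import bigquery  # optional dependency, imported lazily

    client = bigquery.Client()  # picks up Application Default Credentials
    config = bigquery.QueryJobConfig(
        query_parameters=[
            bigquery.ScalarQueryParameter("project", "STRING", project),
            bigquery.ScalarQueryParameter("days", "INT64", days),
        ]
    )
    rows = client.query(build_query(), job_config=config).result()
    return [(row["filename"], row["downloads"]) for row in rows]
```

With credentials configured, something like `recent_downloads("requests", days=30)` would return `(filename, count)` pairs, much like the per-file listing vanity prints. Note that, as mentioned above, this only covers the window the dataset holds, not a true all-time cumulative count.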

@dstufft that would work for me. From a library developer's perspective I'm mainly interested in what's been happening recently: are people updating so I can kill old stuff?

@dstufft I'm reading there:

Queries are charged against your account, but you get 1TB free per month and cached queries won't count against it.

Does this mean vanity will either have to ship with someone's personal credentials or ask the user to fill in their own credentials in a local config?
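One possible answer to the credentials question, sketched under assumptions: each user supplies their own Google credentials through Application Default Credentials (for example the `GOOGLE_APPLICATION_CREDENTIALS` environment variable), so nothing personal ships with vanity, and a dry run reports how many bytes a query would scan before it is billed against the free allowance. The helper names below are hypothetical, and treating "1TB" as 10**12 bytes is an assumption; Google's pricing page is the authority on the exact limit.

```python
"""Sketch: estimate a query's cost against the monthly free allowance.

Hypothetical helpers; assumes google-cloud-bigquery and user-supplied
Application Default Credentials. "1TB" is taken here as 10**12 bytes.
"""
from __future__ import annotations

FREE_TIER_BYTES = 10**12  # assumption: "1TB free per month" as decimal bytes


def share_of_free_tier(bytes_processed: int, free_bytes: int = FREE_TIER_BYTES) -> float:
    """Fraction of the monthly free allowance that one query would consume."""
    if bytes_processed < 0:
        raise ValueError("bytes_processed must be non-negative")
    return bytes_processed / free_bytes


def estimate_bytes(sql: str, project_id: str | None = None) -> int:
    """Dry-run a query: BigQuery validates it and reports bytes it would scan.

    Credentials come from the user's own environment, not from the tool.
    """
    from google.cloud import bigquery  # optional dependency, imported lazily

    client = bigquery.Client(project=project_id)
    config = bigquery.QueryJobConfig(dry_run=True, use_query_cache=False)
    job = client.query(sql, job_config=config)  # a dry run scans no data
    return job.total_bytes_processed
```

A tool could call `estimate_bytes(...)` first and warn the user if `share_of_free_tier(...)` is large, since each user's queries count against their own account.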

Sounds like this is end of easy-to-get stats on Python projects then. Too bad.

Is there a download stats section planned for warehouse?

I don't believe possessing a Google account to be a significant barrier to entry to accessing statistics. It is certainly more of a barrier than completely unauthenticated, but not much IMO.

Warehouse will not get anything as powerful as raw access to the BigQuery table but I would like to add some "high value" metrics for projects that they can view.


Yes, that's what I meant: just a simple "download count in the last 30 days" or something along those lines. Something to brag about. 😉

Yeah, something like that, though it's fairly low on my list of priorities since (A) it's non-trivial to implement and (B) BigQuery is available.

I'm getting fairly reasonable numbers out of vanity again. Has something been silently fixed?

piem commented

hi there,

it seems not everything was fixed:

[Screenshot: vanity output for aubio]

At least one person downloaded aubio 0.4.4 (me :-)), and that was some time ago already.

cheers, piem

Is there any alternative to vanity?

@MartinPyka Not that I know of…

Again, I'm getting reasonable numbers for various projects. Has this been silently fixed?

This could be related to PyPI having switched to Warehouse, even though Warehouse is still not quite finished.

Going to try and tackle this one on Aug 5 at this event:

If anyone has any tips, please feel free to post them here (I know nothing about BigQuery going in.)

Hi Alex, I haven't worked with it myself, but it's essentially a JSON API. httparchive is switching to it, so you might get some idea of how it works from that code, though it's all JS. One example is here, if you look at the source: http://jsfiddle.net/rviscomi/1r6dpctd/

I think the biggest problem will be whether you need credentials to access the data. If so, you'll need to implement some kind of proxy somewhere. Based on the example above, this may no longer be the case for public datasets, e.g. wget https://storage.googleapis.com/http-archive-beta.appspot.com/bytesJsTimeseries.json.

Best of luck!

ofek commented

@MartinPyka This is what people are using now, if you still need an alternative: https://github.com/ofek/pypinfo

@ofek Nice! Good to know this project exists. (Although I do take some offense to your statement "this is what people are using now …" srsly?)

ofek commented

@aclark4life Sorry about that, I meant no offense! It was regarding BigQuery usage, not download stats in general.

@ofek No prob! Just finished installing and testing pypinfo, very nice …