eprints/irstats2

Authors of top downloaded works not appearing in the "top authors" list?


I would expect every author in the "most popular works" list (limited by date) to also appear in the "most popular authors" list (limited by the same date) whenever that author's most popular work has more downloads than the totals of other authors on the top authors list. That does not seem to be the case for our repository, so I am looking for advice: what could be going wrong?

For example, I have this:

Top EPrints (with date range)
http://spectrum.library.concordia.ca/cgi/stats/report/most_popular_eprints?range=&from=20141031&to=20151231
Top Authors (same date range)
http://spectrum.library.concordia.ca/cgi/stats/report/most_popular_authors?range=&from=20141031&to=20151231

The first item on the "top EPrints" list has 11,406 downloads.
However, the author of that item doesn't appear on the "top authors" list, even though the second entry in that table has only 6,887 downloads.
That doesn't make sense: the author of that 11K-download item should appear in the top authors table.
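The expectation rests on a simple invariant: an author's total is the sum of the downloads of all their works, so it can never be smaller than the downloads of any single one of those works. The Python sketch below models that check. Only the figures 11,406 and 6,887 come from the report above; the work ids, author names, the third row, and the helper names (`author_totals`, `top_authors`, `top_work_author_listed`) are hypothetical, and each work is simplified to a single author. It is not how IRStats2 computes its reports.

```python
from collections import defaultdict

# Hypothetical sample: (work_id, author, downloads). Only the two download
# figures from the report are real; ids, names and the third row are made up.
# Simplification: one author per work.
WORKS = [
    ("work-1", "Author A", 11406),
    ("work-2", "Author B", 6887),
    ("work-3", "Author C", 5000),
]


def author_totals(works):
    """Sum downloads per author across all of that author's works."""
    totals = defaultdict(int)
    for _work_id, author, downloads in works:
        totals[author] += downloads
    return totals


def top_authors(works, n):
    """The n authors with the most downloads, as (author, total), highest first."""
    totals = author_totals(works)
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:n]


def top_work_author_listed(works, n):
    """Check the expectation: the top work's author appears in the top n authors."""
    _work_id, author, _downloads = max(works, key=lambda w: w[2])
    return author in {name for name, _total in top_authors(works, n)}


print(top_authors(WORKS, 2))
print(top_work_author_listed(WORKS, 2))
```

With correct per-author aggregation, the author of the 11,406-download work always outranks an author whose total is 6,887, so a report where that does not happen points at how authors are being grouped rather than at the download counts.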

Perhaps I am doing (or understanding) something incorrectly? Thanks so much for your help!

Have you set the use_ids flag to 0 in your configuration?

Thank you! It was not set; I'm setting it now.
I found your post about that here:
http://www.eprints.org/tech.php/19363.html
Will the process_stats command in our crontab be enough to correct this, or will I have to run another command?
I will report back on whether that solves the problem.

I think you must recalculate from scratch, using the --setup option.

It looks like the regular process_stats command solved the problem: the top authors list now makes sense relative to the top works list.
So now I'm not sure whether I should run the --setup option.
In general, I would like to understand when running the --setup option is required.
I realize that is probably not an easy question to answer, but what does --setup do that the regular process_stats doesn't, which would require running it in this case?

It recreates all tables from scratch, including those that could otherwise be updated incrementally (you may not want to reprocess millions of 'access' records too often).

'setup' needs to be run after major configuration changes, which require a full reprocessing of the data.

Hope this clarifies a little.
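The difference between an incremental run and a rebuild from scratch can be illustrated with a toy model. In this Python sketch, an incremental run only aggregates access records newer than a saved cursor and adds them to the existing table, while a rebuild starts from an empty table and replays every record. Everything here is hypothetical: the `AUTHORS` data, the `author_key` rule (where, with ids enabled, authors without an id all collapse into a single `None` bucket), and the `process` helper are assumptions made for illustration, not IRStats2's actual code. In particular, the sketch does not claim to reproduce what process_stats does; in the case above, the regular run appears to have been enough.

```python
from collections import Counter

# Hypothetical author metadata per eprint: a display name and an optional id.
AUTHORS = {
    101: {"name": "Jane Doe", "id": None},  # author with no id recorded
    102: {"name": "John Roe", "id": "jroe"},
}


def author_key(eprint_id, use_ids):
    """Grouping key for an eprint's author (hypothetical rule)."""
    author = AUTHORS[eprint_id]
    return author["id"] if use_ids else author["name"]


def process(access_log, use_ids, table=None, since=0):
    """Count downloads per author for records with record_id > since.

    Passing an existing table and cursor models an incremental run;
    table=None and since=0 models a rebuild from scratch.
    Returns the table and the highest record_id seen.
    """
    table = Counter() if table is None else table
    last = since
    for record_id, eprint_id in access_log:
        if record_id > since:
            table[author_key(eprint_id, use_ids)] += 1
            last = max(last, record_id)
    return table, last


log = [(1, 101), (2, 101), (3, 102), (4, 101)]
table, cursor = process(log, use_ids=True)  # first run, ids enabled
log += [(5, 101), (6, 102)]  # new downloads arrive
# Setting changed, but only the new records are processed:
table, cursor = process(log, use_ids=False, table=table, since=cursor)
rebuilt, _ = process(log, use_ids=False)  # full reprocessing
print(dict(table))
print(dict(rebuilt))
```

After the incremental run, the table mixes keys from both settings (Jane Doe's earlier downloads stay in the `None` bucket and only her newest one is counted under her name), while the rebuild credits all four of her downloads to her. Any setting that changes how records are grouped behaves like this in the model, which is why such changes call for a full reprocessing.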

Thanks @sebastfr, I think I knew that, but what are examples of "major configuration changes" that require full reprocessing? Is switching the use_ids flag to 0 one of them?