adsabs/adsabs-dev-api

Metrics API

Closed this issue · 7 comments

Hello,

Is there any documentation (unofficial or otherwise) about the capabilities of the metrics API? Specifically, if one were interested in sorting astronomers by some field (e.g., normalised citations), would something like that be available through the metrics API?

At the moment, the only way I can see to do something like this would be to find highly cited papers (with a wildcard search, sorted by citations), get the names of the authors from the first X papers, then search for papers by those names in order to (reasonably) rank the top people publishing by their normalised citations. However, if one wanted to know, for example, the top 1000 astronomers as ranked by normalised citations, this becomes an expensive exercise.

So, I'm just wondering whether the metrics API will have any capabilities like this, or whether doing something like what I propose is the best way forward for the immediate future.

Hi Andy

Basically, the metrics API returns the same results as the old API; only
the format has been changed a bit. I will update the README for the metrics
API soon to document the format.

cheers
--Edwin

Edwin Henneken ehenneken@cfa.harvard.edu
NASA Astrophysics Data System IT Specialist
Harvard-Smithsonian Center for Astrophysics
http://adslabs.org
http://ads.harvard.edu
60 Garden St. MS 83, Cambridge, MA 02138 Room P-129

ORCID 0000-0003-4264-2450

Hey Edwin,

Thanks for that! From reading through https://github.com/adsabs/metrics_service (to refresh myself on the API), it seems easy to retrieve detailed metrics for given bibcodes. However, it seems to me that it might be more difficult to aggregate these by author in order to rank astronomers in the way I described.

For the given example (top ranked N authors by normalised citations) would you say that searching for highly-cited papers, then getting citation metrics for papers published by the authors of those papers, would currently be the most efficient way of compiling such a list?

the most efficient way would be to use functional queries - give me until later, i'll try to come up with an example...

do you want to run it against a list of bibcodes/authors? and normalized by the highest citation count?

Hi Andy

Any practical way of doing what you propose will be expensive. Essentially, it only really makes sense if you either have curated publication lists for those astronomers, or if searching by ORCID has been implemented and ORCIDs have been assigned. So, if you want lists sorted by a certain statistic or indicator, you first need to run a query to get all the papers for a given author and then generate the metrics overview for those records. We are looking into making metrics generation more scalable and flexible, but that's still under development.
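The two-step workflow described here (query for an author's papers, then request metrics for those records) might look roughly like the sketch below. It only builds the HTTP requests and does not send them. The base URL, the `/search/query` and `/metrics` paths, the bearer-token header, and the `{"bibcodes": [...]}` payload are assumptions based on the public ADS API documentation rather than anything stated in this thread, and the function names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

# Assumed base URL of the ADS API; check the current documentation.
API = "https://api.adsabs.harvard.edu/v1"


def author_search_request(author, token, rows=200):
    """Build (but do not send) a search request for one author's bibcodes."""
    params = urlencode({"q": f'author:"{author}"', "fl": "bibcode", "rows": rows})
    return Request(
        f"{API}/search/query?{params}",
        headers={"Authorization": f"Bearer {token}"},
    )


def metrics_request(bibcodes, token):
    """Build (but do not send) a metrics request for a list of bibcodes."""
    body = json.dumps({"bibcodes": list(bibcodes)}).encode("utf-8")
    return Request(
        f"{API}/metrics",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )
```

Each request object can be passed to `urllib.request.urlopen` with a real API token. Repeating this per author is what makes a top-1000 ranking expensive, and the name-ambiguity caveat applies to every `author:"..."` query.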

Note that different disciplines/fields have different citation rates and practices. Percentile-based indicators are usually better, and there is also the Tori index, which removes discipline-dependent rates by means of its double normalization.

Whatever way you pick to generate metrics for a given author, you always will have the potential name ambiguity problem (and with authors publishing in both astronomy and physics, chemistry or biology journals, this gets even worse).

Andy

With "normalized citation count", I assume you mean the sum of the citations to the papers by a given author, divided by the number of authors of the paper that was cited, correct? So, Edward Witten has a very big normalized citation count, while most people in big collaborations don't.
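As a small illustration of that definition, the sketch below credits each author with citations divided by the number of authors, summed over their papers. The record layout (`author` and `citation_count` keys) mirrors the search field names but is an assumption, and the sample records are made up.

```python
from collections import defaultdict


def normalized_citations(papers):
    """Return {author: sum over papers of citation_count / number of authors}."""
    totals = defaultdict(float)
    for paper in papers:
        authors = paper["author"]
        if not authors:
            continue  # skip records with no author list
        share = paper["citation_count"] / len(authors)
        for name in authors:
            totals[name] += share
    return dict(totals)


# Made-up records: one single-author paper and one four-author paper.
papers = [
    {"author": ["Solo, A."], "citation_count": 100},
    {
        "author": ["Solo, A.", "Team, B.", "Team, C.", "Team, D."],
        "citation_count": 200,
    },
]

scores = normalized_citations(papers)
# Highest normalized citation count first.
ranking = sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

Here the single-author paper contributes all 100 of its citations to one name, while the four-author paper contributes 50 to each of its authors, which is why members of large collaborations rank low on this measure.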

ok, i failed - results of functional queries cannot be faceted. i tried facet.pivot but it is too slow (and will time out for api users); we run solr 4.8 and there is potentially a solution in solr 5.0 - to compute stats for individual facets. However, these solutions are unrealistic because they are too slow - it would have to:

  1. search for astrophysics papers, i.e. topn(1000, database:astronomy, citation_count desc)
  2. facet the set by authors
  3. for every author do facet.pivot on citations
  4. compute the stat for result of 3

multivalued fields are very slow

however, the following could get you started - it returns authors of top cited papers in astronomy in year 2015, when you grab the facet (authors), you can then quickly collect metrics for these names (btw: metrics accepts a query, you don't need to search by bibcode only)

q=topn(1000%2C+database%3Aastronomy+AND+year%3A2015%2C+citation_count+desc)&sort=citation_count+desc&fl=bibcode%2Ccitation_count&wt=json&indent=true&facet=true&facet.field=author&facet.mincount=1

however, the usual caveats apply: name could belong to multiple people; the starting criteria are arbitrary (first 1000 papers)

it is not easy to compile the "scientific hitparade"
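A minimal sketch of building the query string above and reading the author facet out of the response. It assumes the response uses Solr's default flat facet layout (`[name, count, name, count, ...]` under `facet_counts.facet_fields.author`); the function names are hypothetical, and nothing here contacts the API.

```python
from urllib.parse import urlencode


def build_top_authors_query(year=2015, n_papers=1000):
    """Return the URL-encoded query string for the top-cited-papers search."""
    params = {
        "q": f"topn({n_papers}, database:astronomy AND year:{year}, citation_count desc)",
        "sort": "citation_count desc",
        "fl": "bibcode,citation_count",
        "wt": "json",
        "indent": "true",
        "facet": "true",
        "facet.field": "author",
        "facet.mincount": 1,
    }
    return urlencode(params)


def parse_author_facet(response):
    """Turn Solr's flat [name, count, ...] facet list into (name, count) pairs."""
    flat = response["facet_counts"]["facet_fields"]["author"]
    return list(zip(flat[0::2], flat[1::2]))
```

The resulting author names are the starting point for per-author metrics requests; the same caveats hold, since a facet value is a name string and not a person.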

Thanks for the ideas on how to deal with this problem! I came up with some code to do this kind of query, or at least to approximate what the distribution looks like at the top end. I used @romanchyla's idea of starting from the top-cited papers and then searching by author. My code is not the most efficient approach, but it certainly got the job done (and faster than I expected).

Happy to close this issue if you are. Thanks again!