Datafable/epu-index

Endpoint for highest ranking article for a day


Create an endpoint to retrieve the highest ranking article for a day. This will power the functionality described in #16. This functionality cannot be provided by the endpoint described in #25, which, if built, would return article information that is considered private and therefore requires authentication.

If no (highest ranking) article is available for a given day, an empty object should be returned (@bartaelterman: or what is the consensus here?)

URL

I propose: https://epu-index.herokuapp.com/api/highest-ranking-article

Options

format=json
date=yyyy-mm-dd (required)
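A client request for one day might be built as in the sketch below. It assumes the options are passed as query-string parameters on the proposed URL (the issue does not say how they are sent), and `build_request_url` is a hypothetical helper name.

```python
from datetime import datetime
from urllib.parse import urlencode

# URL proposed in this issue; assumption: options go in the query string.
BASE_URL = "https://epu-index.herokuapp.com/api/highest-ranking-article"


def build_request_url(date: str, fmt: str = "json") -> str:
    """Build the request URL for one day (hypothetical client helper)."""
    # Fail early on dates strptime cannot parse as year-month-day,
    # e.g. "2015-13-01" or "15/07/2015"; it raises ValueError.
    datetime.strptime(date, "%Y-%m-%d")
    query = urlencode({"format": fmt, "date": date})
    return f"{BASE_URL}?{query}"


print(build_request_url("2015-07-15"))
# → https://epu-index.herokuapp.com/api/highest-ranking-article?format=json&date=2015-07-15
```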

Returns

{
    "article_title": "Pluto en zijn grootste maan Charon zoals je ze nog nooit zag",
    "article_url": "http://www.demorgen.be/wetenschap/pluto-en-zijn-grootste-maan-charon-zoals-je-ze-nog-nooit-zag-a2390841/",
    "article_newspaper": "De morgen"
}

date, epu and score could optionally be returned as well.
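The lookup and response shape described above can be sketched in plain Python. This is not the project's implementation: the `Article` fields (`title`, `url`, `newspaper`, `published`, `score`) are hypothetical stand-ins for whatever the model stores, and every article is assumed to have a score.

```python
from datetime import date
from typing import TypedDict


class Article(TypedDict):
    # Hypothetical internal representation; field names are assumptions.
    title: str
    url: str
    newspaper: str
    published: date
    score: float


def highest_ranking_article(articles: list[Article], day: date) -> dict:
    """Return the response body for `day`: the best-scoring article, or {}."""
    candidates = [a for a in articles if a["published"] == day]
    if not candidates:
        # No article for that day: return an empty object, as proposed above.
        return {}
    best = max(candidates, key=lambda a: a["score"])
    # Map internal fields to the underscore-style response keys.
    return {
        "article_title": best["title"],
        "article_url": best["url"],
        "article_newspaper": best["newspaper"],
    }
```

Returning `{}` rather than an error status keeps the client side simple: it only has to check whether `article_title` is present.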

@bartaelterman, please review.

Updated return fields to use underscores to be more consistent with #45

Returning date seems redundant. epu and score actually mean the same thing; that value still needs to be added to the model, though. We may not have scores for articles published before 2013 (see #52).

  1. Do articles have scores? I thought they were ranked as positive/negative
  2. How far back in time are we able to show the highest ranking article?

  1. Yes, articles have individual scores ranging from minus infinity to infinity. If the score is > 0, the article is positive, otherwise it is negative.
  2. I have a file with articles from March 1994 until December 17, 2013. These articles were scraped and scored with the previous version of the software, and apparently the individual article scores were not saved (or at least, I don't have them). Note that this list contains only positive articles. I know this because, for instance, on January 8, 2000, the epu index was 0.5 (meaning 1 positive article out of 2 journals scraped). If I look at the articles I have, I indeed find only one article that day. I repeated this for a couple of other days and it seems to fit.
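The scoring rule and the January 8, 2000 arithmetic can be written out as a small sketch. The daily index here is only the ratio implied by that example (positive articles divided by journals scraped); the real EPU index may be computed or normalised differently, and the function names are hypothetical.

```python
def is_positive(score: float) -> bool:
    """An article counts as positive when its score is strictly above 0."""
    return score > 0


def epu_index(positive_articles: int, journals_scraped: int) -> float:
    """Assumed daily index: positive articles per journal scraped."""
    if journals_scraped <= 0:
        raise ValueError("journals_scraped must be positive")
    return positive_articles / journals_scraped


# Hypothetical scores for one day's articles across 2 journals.
scores_for_day = [1.7, -0.4]
positives = sum(is_positive(s) for s in scores_for_day)
print(epu_index(positives, 2))  # → 0.5
```

This also shows why only positive articles need to be kept to reproduce the historical index: negative articles never contribute to the numerator.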

We will start scraping articles from December 17, 2013 onwards.

We could score the old articles because we'll implement the scoring model anyway (see #51), but I am personally a bit wary about that (what happens if we come up with different results? We could spend a lot of time figuring out what went wrong). I marked #52 as a question, so I'll ask the user about this.

Implemented! Please test, and either report any errors or close the issue.

Only one remark:

If I add two articles published on the same day, one with an EPU score of 18 and one with no EPU score (null), then requesting the highest ranking article for that day returns the one with the empty score. It should return the other one.

Sorry, that was clearly a bug. It is now fixed by treating EPU=Null as EPU=0.
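The "NULL counts as 0" rule can be illustrated with plain Python values standing in for database rows. This is a hypothetical sketch, not the endpoint's actual code; `effective_score` and `pick_highest` are illustrative names.

```python
from typing import Optional


def effective_score(score: Optional[float]) -> float:
    """Treat a missing (NULL) EPU score as 0."""
    return 0.0 if score is None else score


def pick_highest(scores: list[Optional[float]]) -> int:
    """Return the index of the highest-scoring article (scores must be non-empty)."""
    # Plain max(scores) would raise TypeError in Python 3 when comparing
    # None with a number; mapping None to 0 lets a scored article win.
    return max(range(len(scores)), key=lambda i: effective_score(scores[i]))


# The reported case: one article without a score, one with score 18.
print(pick_highest([None, 18]))  # → 1
```

One consequence of this choice: an unscored article still outranks any article with a negative score. Excluding NULL scores entirely would avoid that, at the cost of returning an empty object on days where no article has been scored.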

Does that seem correct? Or should articles with EPU=Null be excluded from this endpoint entirely?

This is fine.