DEV Challenge 11 - semifinal

Monitoring system for changes in news texts

Task description: PDF file (in Ukrainian)

News feed: http://brovary-rada.gov.ua/documents/

To start the application, run: docker-compose up

Ports 8000 and 3306 must be free.

Run news parser

Optional parameter page_limit (int): the number of pages to scan, starting with the newest news. By default, all pages are scanned.

Example
  • Request
$ curl -X GET "http://0.0.0.0:8000/api/run_checker?page_limit=3"
  • Response

HTTP/1.1 200

{"status": "ok"}
  • Error Response

HTTP/1.1 500

{"status": "Parsing error. See logs output."}
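
A minimal Python sketch of the same request, assuming the service is reachable at the address used in the curl example. The helper names `build_run_checker_url` and `run_checker`, and the timeout value, are illustrative and not part of the project.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "http://0.0.0.0:8000"  # address taken from the curl example


def build_run_checker_url(page_limit=None):
    """Build the /api/run_checker URL; page_limit is omitted when None."""
    url = BASE_URL + "/api/run_checker"
    if page_limit is not None:
        url += "?" + urlencode({"page_limit": int(page_limit)})
    return url


def run_checker(page_limit=None, timeout=600):
    """Start the parser and return the decoded JSON body.

    urlopen raises urllib.error.HTTPError for the 500 error response.
    The timeout is an assumed value; a full scan may take a while.
    """
    with urlopen(build_run_checker_url(page_limit), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the containers running, `run_checker(page_limit=3)` should return `{"status": "ok"}` on success.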

Get news list

Optional parameters (for pagination):

  • limit (int): number of news items in the response (default: 20)

  • after (int): the element after which the next news items are shown

  • before (int): the element up to which news items are shown

Example
  • Request
$ curl -X GET "http://0.0.0.0:8000/api/articles/?limit=10"
  • Response

HTTP/1.1 200

{
  "paging": {
    "previous": "http://0.0.0.0:8000/api/articles/?limit=3&before=4",
    "cursors": {
      "after": 6,
      "before": 4
    },
    "next": "http://0.0.0.0:8000/api/articles/?limit=3&after=6"
  },
  "data": [
    {
      "status": "no changes | updated | deleted",
      "title": "Page title",
      "created_at": 1496241466,
      "updated_at": 1496241466,
      "content": "html content",
      "link": "http://brovary-rada.gov.ua/documents/27297.html",
      "id": 4
    }
  ]
}

To go to the next or previous page, use the URL in paging->next or paging->previous.

  • Error Response

HTTP/1.1 400

{"status": "Error. See logs output."}
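
The cursor-based paging can be followed from a client roughly as sketched here. The function names are hypothetical, and the stop condition (a missing "next" link or an empty "data" list on the last page) is an assumption, since the README does not say how the last page is signalled.

```python
import json
from urllib.request import urlopen


def fetch_json(url):
    """Fetch a URL and decode its JSON body (needs the running service)."""
    with urlopen(url) as resp:
        return json.loads(resp.read().decode("utf-8"))


def iter_articles(start_url, fetch=fetch_json, max_pages=100):
    """Yield articles page by page, following paging["next"].

    Assumption: the last page has no "next" link or an empty "data" list.
    max_pages guards against looping forever if that assumption is wrong.
    """
    url = start_url
    for _ in range(max_pages):
        if not url:
            break
        page = fetch(url)
        items = page.get("data", [])
        if not items:
            break
        yield from items
        url = page.get("paging", {}).get("next")
```

For example, `for article in iter_articles("http://0.0.0.0:8000/api/articles/?limit=10"): print(article["id"], article["status"])` would walk the list. Passing a different `fetch` callable makes the loop testable without the service.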

Get a list of changed news

Takes the same parameters and returns the same response as Get news list.

Get a history of news updates by ID

Takes the same parameters and returns the same response as Get news list.

Get deleted news list

Takes the same parameters and returns the same response as Get news list.

Get one news by ID

Example
  • Request
$ curl -X GET http://0.0.0.0:8000/api/articles/one/1
  • Response

HTTP/1.1 200

{
    "status": "no changes | updated | deleted",
    "title": "Page title",
    "created_at": 1496241466,
    "updated_at": 1496241466,
    "content": "html content",
    "link": "http://brovary-rada.gov.ua/documents/27297.html",
    "id": 1
}
  • Error Response

HTTP/1.1 400

{"status": "Error. See logs output."}

Technologies

  • Web server: Tornado

  • Database: MySQL

  • HTML parsing: Beautiful Soup (Python library)

Each saved news item has one of three statuses: no changes, updated, or deleted. Each time the parser is started (/api/run_checker), the checksum of the fetched content is compared with the saved one. If they differ, a new version of the document is saved and the parent's status changes to updated. When a document is found to have been deleted, its saved status changes to deleted.
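
The status decision described here could look roughly like the following sketch. SHA-256 is an assumption (the README does not name the checksum function), and `checksum` and `resolve_status` are hypothetical names, not the project's own.

```python
import hashlib


def checksum(content):
    """Return a hex digest of the article content.

    SHA-256 is an assumption; the project may use a different hash.
    """
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


def resolve_status(stored_checksum, fetched_content):
    """Decide the status of a stored article after a parser run.

    fetched_content is None when the article is no longer in the feed.
    """
    if fetched_content is None:
        return "deleted"
    if checksum(fetched_content) != stored_checksum:
        return "updated"  # caller would also save the new version
    return "no changes"
```

Comparing digests instead of full HTML bodies keeps the stored comparison value small and fixed in size.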

The parsing function is recursive: it runs until it has loaded the specified number of pages or, if page_limit is not specified, until it has scanned all the news.
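
A simplified sketch of such a recursive scan is given here. `load_page` is a hypothetical callable standing in for the real download and Beautiful Soup parsing step; it is assumed to return the list of news links on a listing page, or an empty list once the feed is exhausted.

```python
def scan_pages(load_page, page=1, page_limit=None):
    """Recursively collect news links, newest page first.

    load_page(page) is a hypothetical callable returning the links found
    on that listing page, or an empty list when there are no more pages.
    """
    if page_limit is not None and page > page_limit:
        return []  # the requested number of pages has been loaded
    links = load_page(page)
    if not links:
        return []  # no more news to scan
    return links + scan_pages(load_page, page + 1, page_limit)
```

One design note: Python limits recursion depth (1000 frames by default), so a feed with more listing pages than that would need a loop or a raised limit when page_limit is not set.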