Monitoring system for changes in news texts

Task description: PDF file (in Ukrainian)

News feed: http://brovary-rada.gov.ua/documents/

To start the application, run: docker-compose up

Ports 8000 and 3306 must be free.
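
Before running docker-compose up, it can help to confirm that both ports are free. The following is a minimal sketch, assuming the services are exposed on the local host; the helper name port_is_free is illustrative and not part of the project.

```python
import socket


def port_is_free(port, host="127.0.0.1"):
    """Return True if nothing is accepting connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(0.5)
        # connect_ex returns 0 when the connection succeeds, i.e. the port is taken.
        return sock.connect_ex((host, port)) != 0


if __name__ == "__main__":
    for port in (8000, 3306):
        state = "free" if port_is_free(port) else "in use"
        print(f"port {port}: {state}")
```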

Run news parser

GET http://0.0.0.0:8000/api/run_checker

Optional parameter: page_limit (int), the number of pages to scan, starting with the most recent news. By default, all pages are scanned.

Example

Request

$ curl -X GET \
    "http://0.0.0.0:8000/api/run_checker?page_limit=3"

Response

HTTP/1.1 200

{"status": "ok"}

Error Response

HTTP/1.1 500

{"status": "Parsing error. See logs output."}
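
The same request can be made from Python. This is a sketch using only the standard library; the function names and the timeout value are assumptions, while the host, port, path, and page_limit parameter come from the example above.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "http://0.0.0.0:8000"  # host and port as used in the examples above


def run_checker_url(page_limit=None):
    """Build the /api/run_checker URL, adding page_limit only when given."""
    url = f"{BASE_URL}/api/run_checker"
    if page_limit is not None:
        url += "?" + urlencode({"page_limit": int(page_limit)})
    return url


def run_checker(page_limit=None, timeout=600):
    """Trigger a parser run and return the decoded JSON body.

    The timeout is an assumed value; a full scan may take a while.
    """
    with urlopen(run_checker_url(page_limit), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Note that urlopen raises urllib.error.HTTPError for the 500 error response, so callers that want the error body need to catch that exception.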

Get news list

GET http://0.0.0.0:8000/api/articles/

Optional parameters (for pagination):

limit (int): number of news items in the response (default: 20)
after (int): cursor; the element from which the next news items are shown
before (int): cursor; the element up to which news items are shown

Example

Request

$ curl -X GET \
    "http://0.0.0.0:8000/api/articles/?limit=10"

Response

HTTP/1.1 200

{
  "paging": {
    "previous": "http://0.0.0.0:8000/api/articles/?limit=3&before=4",
    "cursors": {
      "after": 6,
      "before": 4
    },
    "next": "http://0.0.0.0:8000/api/articles/?limit=3&after=6"
  },
  "data": [
    {
      "status": "no changes | updated | deleted",
      "title": "Page title",
      "created_at": 1496241466,
      "updated_at": 1496241466,
      "content": "html content",
      "link": "http://brovary-rada.gov.ua/documents/27297.html",
      "id": 4
    }
  ]
}

To go to the next or previous page, request the URL given in paging->next or paging->previous.
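
Walking through all pages by following paging->next could look like the sketch below. How the API signals the last page is not stated here, so the loop stops on a missing next URL, an empty data list, or a repeated URL; the names fetch_json and iter_articles and the max_pages guard are illustrative assumptions.

```python
import json
from urllib.request import urlopen


def fetch_json(url):
    """Default fetcher: GET the URL and decode the JSON body."""
    with urlopen(url, timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))


def iter_articles(start_url, fetch=fetch_json, max_pages=100):
    """Yield articles page by page, following paging->next."""
    url = start_url
    seen = set()
    for _ in range(max_pages):
        # Stop on a missing next URL or on a URL we have already requested.
        if not url or url in seen:
            break
        seen.add(url)
        page = fetch(url)
        items = page.get("data", [])
        if not items:
            break
        yield from items
        url = page.get("paging", {}).get("next")
```

For example, `for article in iter_articles("http://0.0.0.0:8000/api/articles/?limit=10")` would iterate over every stored article while the application is running.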

Error Response

HTTP/1.1 400

{"status": "Error. See logs output."}

Get a list of changed news

GET http://0.0.0.0:8000/api/articles/updated/

Accepts the same parameters and returns the same response format as Get news list.

Get a history of news updates by ID

GET http://0.0.0.0:8000/api/articles/updated/history/:NEWS_ID

Accepts the same parameters and returns the same response format as Get news list.

Get deleted news list

GET http://0.0.0.0:8000/api/articles/deleted/

Accepts the same parameters and returns the same response format as Get news list.

Get one news by ID

GET http://0.0.0.0:8000/api/articles/one/:NEWS_ID

Example

Request

$ curl -X GET \
    "http://0.0.0.0:8000/api/articles/one/1"

Response

HTTP/1.1 200

{
    "status": "no changes | updated | deleted",
    "title": "Page title",
    "created_at": 1496241466,
    "updated_at": 1496241466,
    "content": "html content",
    "link": "http://brovary-rada.gov.ua/documents/27297.html",
    "id": 1
}

Error Response

HTTP/1.1 400

{"status": "Error. See logs output."}
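
A client might turn an article object like the one above into a readable summary. This is a sketch only: it assumes created_at and updated_at are Unix timestamps in seconds, which the sample values suggest but the text does not state, and the function name summarize_article is hypothetical.

```python
from datetime import datetime, timezone


def summarize_article(article):
    """Return a one-line summary of an article object from the API."""
    # Assumption: updated_at is a Unix timestamp in seconds.
    updated = datetime.fromtimestamp(article["updated_at"], tz=timezone.utc)
    return (
        f'#{article["id"]} [{article["status"]}] {article["title"]} '
        f"(updated {updated:%Y-%m-%d %H:%M} UTC)"
    )
```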

Technologies

Web server: Tornado
Database: MySQL
HTML parsing: Beautiful Soup (Python library)

Each saved news item has one of three statuses: no changes, updated, or deleted. On every parser run (/api/run_checker), the checksum of each article's content is compared with the stored one. If they differ, a new version of the document is saved and the parent's status changes to updated. When a document is found to have been deleted, its saved status changes to deleted.
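
The status decision described above can be sketched as follows. The hash algorithm is not named in this document, so SHA-256 is an assumption, and the function names are illustrative rather than taken from the project's code.

```python
import hashlib


def content_checksum(html):
    """Hex digest of the article content (SHA-256 is an assumption)."""
    return hashlib.sha256(html.encode("utf-8")).hexdigest()


def classify(stored_checksum, fetched_html):
    """Return the status for a stored article given freshly fetched content.

    fetched_html is None when the article is no longer available on the site.
    """
    if fetched_html is None:
        return "deleted"
    if content_checksum(fetched_html) != stored_checksum:
        return "updated"
    return "no changes"
```

Comparing checksums instead of full HTML keeps the comparison cheap, since only a short digest has to be stored and compared for each article.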

The parsing function is recursive: it runs until it has loaded the specified number of pages or, if page_limit is not specified, until it has scanned all the news.
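
A recursive scan with an optional page limit might be structured like this sketch. Here fetch_page is a hypothetical callable standing in for the real download-and-parse step, and it is assumed to return an empty list once there are no more listing pages.

```python
def scan_pages(fetch_page, page=1, page_limit=None, collected=None):
    """Recursively collect article links, starting from the newest page.

    fetch_page(n) is a hypothetical callable returning the links found on
    listing page n, or an empty list when the page does not exist.
    """
    if collected is None:
        collected = []
    # Stop once the requested number of pages has been loaded.
    if page_limit is not None and page > page_limit:
        return collected
    links = fetch_page(page)
    # Stop when the listing runs out of pages.
    if not links:
        return collected
    collected.extend(links)
    return scan_pages(fetch_page, page + 1, page_limit, collected)
```

Python's default recursion limit is about 1000 frames, so a feed with more listing pages than that would need a higher limit or an equivalent loop.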
