Warning!

This project is a work in progress (WIP).

Intro

This project aims to provide a platform for developing news ranking algorithms. I tried to maximize modularity so that modifications stay easy.

Currently it only keeps track of scientific news, but it is easy to add new news targets.

Check the TODO file for known problems.

Folder Structure


Running your own server

Installation

The front end uses Semantic UI.

The project ships with seed links by default. If you want to use your own links, edit link_list.csv, rss_list.csv and category_list.txt.

Make sure MongoDB and Python are installed.

Install the Python requirements:

$ pip install -r requirements.txt

Confirm the settings in config.py.
Run insert_links.py: provides the seed links (links to other URL feeds) for the crawlers; these are meant to be crawled regularly.
Run insert_categories.py: provides the possible category names.
Run cronjobs.py: sets up the cron jobs for the crawler and the ranker (check config.py for the path).
Run get_news.py: calls each crawler and collects the data.
Run rank_db.py: queries the collected news data and ranks it with the available rankers. The query covers a specific date range (check config.py for the date range).
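The date-range query that rank_db.py runs could look roughly like the sketch below. It is not the project's code: the setting name `RANK_DAYS` is hypothetical (check config.py for the real one), and it assumes the `date` field is stored as a timezone-aware UTC `datetime`. Only the `$gte` and `$lt` operators are standard MongoDB.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Hypothetical setting; the real name and value live in config.py.
RANK_DAYS = 3


def date_range_filter(days: int = RANK_DAYS, now: Optional[datetime] = None) -> dict:
    """Build a MongoDB filter selecting news whose `date` falls in the last `days` days."""
    end = now if now is not None else datetime.now(timezone.utc)
    start = end - timedelta(days=days)
    # $gte / $lt are standard MongoDB comparison operators.
    return {"date": {"$gte": start, "$lt": end}}
```

With pymongo the filter would be passed straight to `find`, e.g. `news_collection.find(date_range_filter())`, where `news_collection` stands for whatever collection holds the crawled news.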

Now you are ready to run the server!

$ python server.py

rank_db.py Explanation
get_news.py Explanation

Collected Data

News Data

Not every field the crawlers collect is required, and the required fields can be changed in config.py. After a crawler parses the data, validate.py checks that the specified keys exist.

Field Key	Required?	Comment
`title`	YES
`category`	YES
`url`	YES
`page_type`	NO	determines which crawler to use
`date`	NO	UTC format
`subtitle`	NO	description
`author`	NO
`domain`	NO	the URL's domain
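A minimal sketch of the existence check described above. `REQUIRED_FIELDS` is a hypothetical stand-in for the setting in config.py, and the function names are illustrative; the real validate.py may work differently.

```python
# Hypothetical stand-in for the required-field setting in config.py.
REQUIRED_FIELDS = ("title", "category", "url")


def is_valid(item: dict, required=REQUIRED_FIELDS) -> bool:
    """Return True if every required key is present in the parsed item."""
    return all(key in item for key in required)


def filter_valid(items):
    """Keep only the parsed items that contain all required keys."""
    return [item for item in items if is_valid(item)]
```

Optional fields such as `author` or `domain` are simply ignored by the check, so items with or without them both pass.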

Adding a new Ranker

The only requirement for a ranker package is that it accepts and returns a dict object.

Since news data keys can vary, your ranker has to check whether a field exists before using it. The ranker package should be located in the rank folder and its name should start with the prefix rank_.
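The sketch below shows what a ranker like the "Shortest Title" example could look like. The shape of the dict is an assumption (news id mapped to a news-item dict, returned reordered best-first), as are the path rank/rank_shortest_title/__init__.py and the function name `rank`; check the existing rankers for the actual contract.

```python
# Hypothetical file: rank/rank_shortest_title/__init__.py
# Assumption: input maps a news id to a news-item dict; output is a dict with
# the same kind of entries, reordered best-first.


def rank(news: dict) -> dict:
    """Rank news items so that shorter titles come first."""
    # Keys can vary between crawlers, so skip items that lack a usable title.
    usable = {k: v for k, v in news.items() if isinstance(v.get("title"), str)}
    ordered = sorted(usable.items(), key=lambda kv: len(kv[1]["title"]))
    # Python dicts keep insertion order, so the returned dict preserves the ranking.
    return dict(ordered)
```

Usage:

```python
news = {
    "a": {"title": "A long headline about physics", "category": "physics", "url": "https://example.com/a"},
    "b": {"title": "Short one", "category": "biology", "url": "https://example.com/b"},
    "c": {"category": "space", "url": "https://example.com/c"},  # no title
}
print(list(rank(news)))  # ['b', 'a']
```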

After you create your ranker, update the following files:

project_root/rank/__init__.py: links your package
project_root/static/ranker_data.js: add a JSON entry to link your ranker to the front end

Example ranker_data.js entry:

    {
        "text": "Shortest Title",
        "value": "shortest_title",
        "icon": "eye"
    }

Key	Purpose
`text`	what the user sees
`value`	must match the name used in your package's __init__.py
`icon`	any Semantic UI icon name
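One way to make the `value` string select a ranker is a lookup table, sketched below. The registry `RANKERS`, the helper `run_ranker` and the placeholder ranker are all hypothetical; the project's rank/__init__.py may wire rankers up differently.

```python
from typing import Callable, Dict

Ranker = Callable[[dict], dict]


def shortest_title(news: dict) -> dict:
    """Placeholder for the entry point of a rank_shortest_title package."""
    return news  # a real ranker would reorder the items here


# Hypothetical registry: keys must equal the `value` strings in ranker_data.js.
RANKERS: Dict[str, Ranker] = {
    "shortest_title": shortest_title,
}


def run_ranker(value: str, news: dict) -> dict:
    """Look up the ranker chosen in the front end and apply it to the news dict."""
    try:
        ranker = RANKERS[value]
    except KeyError:
        raise ValueError(f"unknown ranker: {value!r}") from None
    return ranker(news)
```

A mismatch between the front-end `value` and the registry key then fails loudly instead of silently returning unranked news.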

Adding a new link

If you haven't started the server yet, add the link to link_list.csv or rss_list.csv.

If the server is already running, use add_link.py in the utils folder. It handles category creation and inserts the link into the MongoDB collection crawl_target.
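For reference, the database side of adding a link could look like the sketch below. It is not add_link.py: it skips category creation, and the field names (`url`, `category`, `page_type`) are assumptions. `collection` is meant to be a pymongo `Collection` such as `MongoClient()[<db name>]["crawl_target"]`; any object with a compatible `update_one` works.

```python
from typing import Any, Optional


def add_crawl_target(collection: Any, url: str, category: str,
                     page_type: Optional[str] = None) -> None:
    """Insert or update one link in a crawl_target-like collection.

    Field names are assumptions, not the project's actual schema.
    """
    doc = {"url": url, "category": category}
    if page_type is not None:
        doc["page_type"] = page_type
    # Upsert keyed on the URL so adding the same link twice does not duplicate it.
    collection.update_one({"url": url}, {"$set": doc}, upsert=True)
```

Keying the upsert on `url` is a design choice for the sketch: re-running it with a new category updates the existing entry rather than creating a second one.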

License

MIT

ege-del/sayfa_dev