This tool analysis a database with over 12 million row and provides a set of usable results.
Following these instuctions will get you the Newspapwer Reporting Tool up and running.
pip8
-
Install VirtualBox
You can download it from here (Linux, Windows, OSX ) https://www.virtualbox.org/wiki/Download_Old_Builds_5_1 -
Install Vagrant
You can Download it from here (Linux, Windows, OSX)
https://www.vagrantup.com/downloads.html
to check if vagrant is succeffully installed, please run
$ vagrant --versoin
from the command line -
Download VM Configurations
$ git clone https://github.com/udacity/fullstack-nanodegree-vm.git # clone git repository
-
Download Reporting Tool Projct ( THIS )
$ git clone https://github.com/iYassr/ReportingTool.git
move folder 'Reporting Tool' into ' the cloned folder 'vagrant' - step 4 - -
Download the News Database
You can find it here https://d17h27t6h515a5.cloudfront.net/topher/2016/August/57b5f748_newsdata/newsdata.zip
unzip the file and move content to the cloned folder 'vagrant' - step 4 - -
Run Vagrant Instance and SSH to it
$ cd vagrant # cd to the cloned project folder
$ vagrant up # wait until finished, it might take more that few minitus
$ vagrant ssh # ssh to the already configured vm
$ cd /vagrant
- Import News Database Into Postgresql Server
$ psql -d news -f newsdata.sql # create tables and import data from newsdata.sql to news db
- Create Used Views Using Psql Shell
$ psql news # access news db from psql interactive shell
$ CREATE VIEW most_popular_articles as select articles.title, sum(stats.hits) from articles,
(select substr(log.path,10,50) as article_name, count(*) as hits from log
where path like '/article/%' group by path order by hits DESC) as stats
where articles.slug = stats.article_name group by articles.title
order by sum DESC LIMIT 3;
$ CREATE VIEW most_popular_authors as select authors.name, top_authors.sum from authors,
(select articles.author, sum(stats.hits) from articles,
(select substr(log.path,10,50) as article_name,
count(*) as hits from log where path like '/article/%'
group by path order by hits DESC) as stats
where articles.slug = stats.article_name group by articles.author
order by sum DESC) AS top_authors
where authors.id = top_authors.author;
$ CREATE VIEW most_errors as select to_char(date, 'FMMonth FMDD, YYYY'), err/total as ratio
from (select time::date as date,
count(*) as total,
sum((status != '200 OK')::int)::float as err
from log
group by date) as errors
where err/total > 0.01;
$ \q # to exit the shell
$ vagrant ssh # ssh to the already initilized vagrant instance
$ cd /vagrant/ReportingTool # cd into the projct folder
$ python3 main.py # run the program
python3 main.py
---------------------------------------
The 3 Most Popular Articles of All Time
Candidate is jerk, alleges rival --- 338647 Views
Bears love berries, alleges bear --- 253801 Views
Bad things gone, say good people --- 170098 Views
---------------------------------------
The Most Popular Authers of All Time
Ursula La Multa --- 507594 Views
Rudolf von Treppenwitz --- 423457 Views
Anonymous Contributor --- 170098 Views
Markoff Chaney --- 84557 Views
---------------------------------------
Days were Errors > 1%
July 17, 2016 --- 2.26268624680273 %
an effort was made to make this tool as simple as possible.
Function | Descreption |
---|---|
get_most_popular_articles() | returns curser with the 3 most populor articles as tuple (name, number of views) |
get_most_popular_authors() | returns curser with most popular authors as tuple (name, number of views) |
get_errors() | return curser with days with > %1 HTTP 404 Page not found Error (day, % of errors if > 1 ) |
to_print() | to print results of the query in a human-readable way |
db_connect() | to connect to database ( db, cursor ) |
get_most_popular_articles()
CREATE VIEW most_popular_articles as select articles.title, sum(stats.hits) from articles,
(select substr(log.path,10,50) as article_name, count(*) as hits from log
where path like '/article/%' group by path order by hits DESC) as stats
where articles.slug = stats.article_name group by articles.title
order by sum DESC LIMIT 3;
get_most_popular_authors()
CREATE VIEW most_popular_authors as select authors.name, top_authors.sum from authors,
(select articles.author, sum(stats.hits) from articles,
(select substr(log.path,10,50) as article_name,
count(*) as hits from log where path like '/article/%'
group by path order by hits DESC) as stats
where articles.slug = stats.article_name group by articles.author
order by sum DESC) AS top_authors
where authors.id = top_authors.author;
get_errors()
CREATE VIEW most_errors as select to_char(date, 'FMMonth FMDD, YYYY'), err/total * 100 as ratio
from (select time::date as date,
count(*) as total,
sum((status != '200 OK')::int)::float as err
from log
group by date) as errors
where err/total > 0.01;
iYassr
Thanks MiSK and Udacity for this amazing course.