Log Analysis Report Generator

Udacity Full Stack Developer Nanodegree Project 1

Overview

In this project, we execute complex SQL queries on a large database to extract useful and interesting statistics.

The database is part of a newspaper company's website - containing 3 tables:

articles - Contains the articles published in the newspaper.
authors - Contains the authors who have published above articles.
log - Contains a log of every HTTP request for each article recieved by the server and their status codes and timestamp.

The tool tries to answer these 3 questions:

What are the most popular three articles of all time?
Who are the most popular article authors of all time?
On which days did more than 1% of requests lead to errors?

Requirements

Setup

Install Vagrant and VirtualBox.
Clone or download fullstack-nanodegree-vm repository.
Clone or download current(fsnd-log-analysis) repository.
Download newsdata.zip and unzip to get the newsdata.sql file containing the database data.
Copy the contents of fsnd-log-analysis and file newsdata.sql into directory: fullstack-nanodegree-vm/vagrant/news (create directory news)
Bring up the Vagrant VM from inside fullstack-nanodegree-vm/vagrant using the following command (you may have to run this commands multiple times):

$ cd fullstack-nanodegree-vm/vagrant
$ vagrant up

The above step will bring up the VM and install all the requirments like Python3, PostgreSQL etc. Now launch the VM using:

$ vagrant ssh

Change directory into /vagrant/news & load the database data using:

$ cd /vagrant/news
$ psql -d news -f newsdata.sql

Check connection to database using:

$ psql -d news

news=> \dt;
          List of relations
 Schema |   Name   | Type  |  Owner
--------+----------+-------+---------
 public | articles | table | vagrant
 public | authors  | table | vagrant
 public | log      | table | vagrant
(3 rows)

news=> \q

Check if python package psycopg2 is installed:

$ pip3 list | grep psycopg2
psycopg2-binary     2.7.6.1

    ~ If not present, install it ~

$ pip3 install psycopg2-binary

Note: The database views required are created by the tool. No need to manually create them. If required look at the Database Views section below.

Running the tool

Ensure that steps from Setup are completed

From /vagrant/news directory inside VM run using:

$ python3 report_gen.py

Sample Output

Sample output can be viewed in output.txt

Database Views

Database Views required are auto-generated by the tool itself upon initial run.

The views it generates are:

-- Top Articles View
CREATE OR REPLACE VIEW top_articles AS
SELECT A.title, A.author, COUNT(*) AS views
FROM articles AS A, log AS L
WHERE L.path=CONCAT('/article/', A.slug)
GROUP BY A.title, A.author
ORDER BY views DESC;

-- Popular Authors View
CREATE OR REPLACE VIEW top_authors AS
SELECT U.name AS author, SUM(SUBQ.views) AS agg_views
FROM (SELECT A.title, A.author, COUNT(*) AS views
FROM articles AS A, log AS L
WHERE L.path=CONCAT('/article/', A.slug)
GROUP BY A.title, A.author) AS SUBQ, authors AS U
WHERE SUBQ.author=U.id
GROUP BY U.name
ORDER BY agg_views DESC;

-- Error Rate View
CREATE OR REPLACE VIEW error_log AS
SELECT SUBQ.day, SUBQ.error_rate
FROM (SELECT DATE(time) AS day,ROUND(100.0*SUM(CASE status WHEN '200 OK' THEN 0 ELSE 1 END)/COUNT(status), 2) AS error_rate
FROM log GROUP BY day) AS SUBQ
WHERE SUBQ.error_rate>1.0
ORDER BY error_rate DESC;

maneeshd/fsnd-log-analysis