Log Analysis Report Generator
Udacity Full Stack Developer Nanodegree Project 1
Overview
In this project, we execute complex SQL queries on a large database to extract useful and interesting statistics.
The database is part of a newspaper company's website - containing 3 tables:
-
articles
- Contains the articles published in the newspaper. -
authors
- Contains the authors who have published above articles. -
log
- Contains a log of every HTTP request for each article recieved by the server and their status codes and timestamp.
The tool tries to answer these 3 questions:
-
What are the most popular three articles of all time?
-
Who are the most popular article authors of all time?
-
On which days did more than 1% of requests lead to errors?
Requirements
Setup
-
Install
Vagrant
andVirtualBox
. -
Clone or download fullstack-nanodegree-vm repository.
-
Clone or download current(fsnd-log-analysis) repository.
-
Download newsdata.zip and unzip to get the
newsdata.sql
file containing the database data. -
Copy the contents of
fsnd-log-analysis
and filenewsdata.sql
into directory:fullstack-nanodegree-vm/vagrant/news
(create directory news) -
Bring up the Vagrant VM from inside
fullstack-nanodegree-vm/vagrant
using the following command (you may have to run this commands multiple times):
$ cd fullstack-nanodegree-vm/vagrant
$ vagrant up
- The above step will bring up the VM and install all the requirments like Python3, PostgreSQL etc. Now launch the VM using:
$ vagrant ssh
- Change directory into
/vagrant/news
& load the database data using:
$ cd /vagrant/news
$ psql -d news -f newsdata.sql
- Check connection to database using:
$ psql -d news
news=> \dt;
List of relations
Schema | Name | Type | Owner
--------+----------+-------+---------
public | articles | table | vagrant
public | authors | table | vagrant
public | log | table | vagrant
(3 rows)
news=> \q
- Check if python package
psycopg2
is installed:
$ pip3 list | grep psycopg2
psycopg2-binary 2.7.6.1
~ If not present, install it ~
$ pip3 install psycopg2-binary
Note: The database views required are created by the tool. No need to manually create them. If required look at the Database Views
section below.
Running the tool
Ensure that steps from Setup are completed
From /vagrant/news
directory inside VM run using:
$ python3 report_gen.py
Sample Output
Sample output can be viewed in output.txt
Database Views
Database Views required are auto-generated by the tool itself upon initial run.
The views it generates are:
-- Top Articles View
CREATE OR REPLACE VIEW top_articles AS
SELECT A.title, A.author, COUNT(*) AS views
FROM articles AS A, log AS L
WHERE L.path=CONCAT('/article/', A.slug)
GROUP BY A.title, A.author
ORDER BY views DESC;
-- Popular Authors View
CREATE OR REPLACE VIEW top_authors AS
SELECT U.name AS author, SUM(SUBQ.views) AS agg_views
FROM (SELECT A.title, A.author, COUNT(*) AS views
FROM articles AS A, log AS L
WHERE L.path=CONCAT('/article/', A.slug)
GROUP BY A.title, A.author) AS SUBQ, authors AS U
WHERE SUBQ.author=U.id
GROUP BY U.name
ORDER BY agg_views DESC;
-- Error Rate View
CREATE OR REPLACE VIEW error_log AS
SELECT SUBQ.day, SUBQ.error_rate
FROM (SELECT DATE(time) AS day,ROUND(100.0*SUM(CASE status WHEN '200 OK' THEN 0 ELSE 1 END)/COUNT(status), 2) AS error_rate
FROM log GROUP BY day) AS SUBQ
WHERE SUBQ.error_rate>1.0
ORDER BY error_rate DESC;