Log Analysis

  • An internal reporting tool that uses information from the database to discover which kinds of articles the site's readers like.
  • The database contains newspaper articles as well as the web server log for the site. The log has one row for each time a reader loaded a web page.
  • The tool takes no input from the user. Instead, it connects to the database, uses SQL queries to analyze the log data, and prints the answers to the three questions listed under Output or Results.
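The connect, query, and print flow described above can be sketched as follows. This is a minimal illustration, not the code in log-analysis.py: the helper name run_report and the sample rows are hypothetical, and an in-memory SQLite database stands in for PostgreSQL so the sketch runs anywhere. Against the real data, the connection would presumably come from a PostgreSQL driver, for example psycopg2.connect(dbname="news") (an assumption; the README does not name the driver).

```python
import sqlite3


def run_report(conn, title, query):
    """Run one query on a DB-API connection and return the report as text.

    Each result row is expected to have two columns: a label and a count.
    """
    cur = conn.cursor()
    cur.execute(query)
    rows = cur.fetchall()
    cur.close()
    lines = [title]
    for label, value in rows:
        lines.append("  {} - {}".format(label, value))
    return "\n".join(lines)


# Stand-in database with hypothetical rows; the real tool reads the
# "news" database loaded from newsdata.sql.
conn = sqlite3.connect(":memory:")
conn.execute("create table log (path text, status text)")
conn.executemany(
    "insert into log values (?, ?)",
    [
        ("/article/first-story", "200 OK"),
        ("/article/first-story", "200 OK"),
        ("/article/second-story", "200 OK"),
    ],
)

report = run_report(
    conn,
    "Requests per path:",
    "select path, count(*) as views from log "
    "group by path order by views desc",
)
print(report)
```

Because run_report only relies on the standard DB-API cursor methods, the same function should work unchanged with a PostgreSQL connection.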

Inspiration

  • This project was made for a Udacity Nanodegree.

Getting Started

  • Clone the repository by running the command: git clone https://github.com/yash2code/log-analysis-ud.git

  • This project runs inside a Linux-based virtual machine (VM).

  • Download the Data

    You will need to unzip this file after downloading it. The file inside is called newsdata.sql. Put this file into the vagrant directory, which is shared with your virtual machine.

  • To load the data, use the command psql -d news -f newsdata.sql

  • Create the views listed in the Views section below. The script depends on them.

  • Run python log-analysis.py

Version

  • Version 0.1

Views

  • author_view

     create view author_view as
     select authors.name, count(articles.author) as views from articles, log, authors
     where log.path = '/article/' || articles.slug and articles.author = authors.id
     group by authors.name order by views desc;
  • error_view

     create view error_view as
     select Date,Total,Error, (Error::float*100)/Total::float as Percent from
     (select time::timestamp::date as Date, count(status) as Total,
     sum(case when status = '404 NOT FOUND' then 1 else 0 end) as Error from log
     group by time::timestamp::date) as output
     where (Error::float*100)/Total::float > 1.0 order by Percent desc;
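To see how the author_view join attributes page loads to authors, the view can be tried on toy data. The sketch below is an illustration under stated assumptions: it uses an in-memory SQLite database instead of PostgreSQL, the tables contain only the columns the views reference, and every author name, slug, and log row is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Minimal stand-in schema: only the columns the views reference.
    create table authors (id integer primary key, name text);
    create table articles (author integer, slug text);
    create table log (path text, status text);

    insert into authors values (1, 'Author A'), (2, 'Author B');
    insert into articles values
        (1, 'first-story'), (1, 'second-story'), (2, 'third-story');
    insert into log values
        ('/article/first-story', '200 OK'),
        ('/article/first-story', '200 OK'),
        ('/article/second-story', '200 OK'),
        ('/article/third-story', '200 OK'),
        ('/article/no-such-story', '404 NOT FOUND'),
        ('/', '200 OK');

    -- Same definition as author_view above.
    create view author_view as
    select authors.name, count(articles.author) as views
    from articles, log, authors
    where log.path = '/article/' || articles.slug
      and articles.author = authors.id
    group by authors.name order by views desc;
""")

rows = conn.execute(
    "select name, views from author_view order by views desc"
).fetchall()
for name, views in rows:
    print(name, views)
```

Log rows whose path does not match '/article/' followed by a known slug (the home page and the mistyped article here) drop out of the join, so they are not counted for any author.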

Output or Results

  • What are the most popular three articles of all time?
           article       | views
     --------------------+--------
      candidate-is-jerk  | 338647
      bears-love-berries | 253801
      bad-things-gone    | 170098
     (3 rows)

  • Who are the most popular article authors of all time?
               name          | views
     ------------------------+--------
      Ursula La Multa        | 507594
      Rudolf von Treppenwitz | 423457
      Anonymous Contributor  | 170098
      Markoff Chaney         |  84557
     (4 rows)

  • On which days did more than 1% of requests lead to errors?
         date    | total | error |     percent
     ------------+-------+-------+------------------
      2016-07-17 | 55907 |  1265 | 2.26268624680273
     (1 row)
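The percent column is the number of 404 NOT FOUND responses on a day, times 100, divided by all requests on that day, and only days above 1% are kept. The sketch below mirrors that logic of error_view in plain Python as a cross-check. The function name error_days and the sample rows are hypothetical; only the figures 1265 and 55907 come from the result above.

```python
from collections import Counter


def error_days(rows, threshold=1.0):
    """Return (day, total, errors, percent) for days above the threshold.

    rows is an iterable of (day, status) pairs, one per logged request.
    Results are sorted by percent, highest first, like error_view.
    """
    totals = Counter()
    errors = Counter()
    for day, status in rows:
        totals[day] += 1
        if status == "404 NOT FOUND":
            errors[day] += 1
    flagged = []
    for day, total in totals.items():
        percent = errors[day] * 100.0 / total
        if percent > threshold:
            flagged.append((day, total, errors[day], percent))
    return sorted(flagged, key=lambda row: row[3], reverse=True)


# Hypothetical sample: day-1 has exactly 1% errors, day-2 has 5%.
sample = (
    [("day-1", "200 OK")] * 99 + [("day-1", "404 NOT FOUND")] * 1
    + [("day-2", "200 OK")] * 95 + [("day-2", "404 NOT FOUND")] * 5
)
flagged = error_days(sample)
print(flagged)

# Recompute the published figure for 2016-07-17 from its two counts.
published = 1265 * 100.0 / 55907
print(round(published, 4))
```

The comparison is strict, so a day with exactly 1% errors (day-1 in the sample) is not reported, which matches the `> 1.0` condition in the view.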