Logs Analysis Project, part of the Udacity Full Stack Web Developer Nanodegree.
The project is a reporting tool that uses information from a database containing newspaper articles and the web server log for a website. The reporting tool should answer the following questions:
- What are the three most popular articles of all time?
- Who are the most popular article authors of all time?
- On which days did more than 1% of requests lead to errors?
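As a rough illustration of how the answers to the first two questions could be presented, here is a minimal sketch. It assumes each query returns rows of `(label, count)` pairs; the helper names `top_n` and `format_report` are hypothetical and are not taken from news_logs_analysis.py.

```python
def top_n(rows, n=3):
    """Return the n rows with the highest counts, largest first."""
    return sorted(rows, key=lambda row: row[1], reverse=True)[:n]


def format_report(title, rows, unit="views"):
    """Build a plain-text report: a title line, then one line per row."""
    lines = [title]
    for label, count in rows:
        lines.append("  {} - {} {}".format(label, count, unit))
    return "\n".join(lines)
```

In practice the ranking would more likely be done in SQL with `ORDER BY ... DESC LIMIT 3`; the Python version only shows the intended shape of the result.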
The project code requires the following software:
- Python
- psycopg2
- PostgreSQL
- Vagrant with a Linux-based virtual machine (VM)
The project consists of the following files:
- news_logs_analysis.py: The Python program that connects to the PostgreSQL database, executes the SQL queries and displays the results.
- README.md: This README file.
- new_logs_output.txt: The text output of news_logs_analysis.py.
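The connect, execute and fetch cycle described for news_logs_analysis.py might look roughly like the sketch below. The function name `fetch_rows` and the optional `connect` parameter are assumptions made for illustration (the parameter lets the function be exercised without a live database); only `psycopg2.connect`, `cursor`, `execute`, `fetchall` and `close` are actual psycopg2 calls.

```python
def fetch_rows(query, dbname="news", connect=None):
    """Run one SQL query and return all result rows as a list of tuples."""
    if connect is None:
        # Imported lazily so the module loads even where psycopg2 is absent.
        import psycopg2
        connect = psycopg2.connect
    conn = connect(dbname=dbname)
    try:
        cur = conn.cursor()
        cur.execute(query)
        return cur.fetchall()
    finally:
        # Always release the connection, even if the query fails.
        conn.close()
```

Inside the VM, a call such as `fetch_rows("SELECT count(*) FROM log;")` would be expected to return a single row holding the number of log entries.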
This project makes use of Udacity's Linux-based virtual machine (VM) configuration, which includes all of the software needed to run the application.
- Download and install Vagrant.
- Download and install VirtualBox.
- Download fsnd-virtual-machine.zip and extract it to a directory of your choice.
- Obtain newsdata.sql (extracted from newsdata.zip, which is not provided here), download news_logs_analysis.py from the repository, and move both files to the vagrant directory within your VM.
- Run vagrant up to start up the VM.
- Run vagrant ssh to log into the VM.
- Run cd /vagrant to change to your vagrant directory.
- Run psql -d news -f newsdata.sql to load the data and create the tables.
- Run the two CREATE VIEW commands below.
- Run python news_logs_analysis.py to run the reporting tool.
This view lists each date together with the total number of requests (successful or failed) made to the website on that day:
CREATE VIEW total_request AS
SELECT time::date AS day, count(*) AS total_req
FROM log
GROUP BY time::date
ORDER BY time::date DESC;
This view lists each date together with the total number of failed requests (any status other than 200) made to the website on that day:
CREATE VIEW failed_request AS
SELECT time::date AS day, count(*) AS num_failed_req
FROM log
WHERE status NOT LIKE '%200%'
GROUP BY day
ORDER BY day;
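One way the two views could be combined to find the days on which more than 1% of requests failed is sketched below. The query text and the names `ERROR_DAYS_QUERY` and `error_days` are assumptions for illustration, not the code in news_logs_analysis.py; the Python function repeats the same calculation on already-fetched rows so the logic can be checked without a database.

```python
# Joins the two views on day; assumes both views exist in the news database.
ERROR_DAYS_QUERY = """
SELECT t.day,
       round(100.0 * f.num_failed_req / t.total_req, 2) AS error_pct
FROM total_request AS t
JOIN failed_request AS f ON t.day = f.day
WHERE 100.0 * f.num_failed_req / t.total_req > 1.0
ORDER BY t.day;
"""


def error_days(total_rows, failed_rows, threshold=1.0):
    """Return (day, error_pct) for days whose error percentage exceeds threshold.

    total_rows:  (day, total_req) pairs, as in the total_request view.
    failed_rows: (day, num_failed_req) pairs, as in the failed_request view.
    """
    failed = dict(failed_rows)
    result = []
    for day, total in total_rows:
        # Days missing from failed_rows had no failed requests.
        pct = 100.0 * failed.get(day, 0) / total
        if pct > threshold:
            result.append((day, round(pct, 2)))
    return sorted(result)
```

Multiplying by 100.0 before dividing keeps the arithmetic out of integer division, which in PostgreSQL would otherwise truncate the ratio of two counts to 0.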