A reporting tool that prints out reports (in plain text) based on the data in a database using python. The project is part of the Udacity Full Stack Web Developer Nanodegree.
Built with
- Python 3
- Postgresql
PreRequisites
- Python3
- Vagrant
- VirtualBox
- Postgresql
Setup
- Clone this repo
git clone https://github.com/oreilm49/logs-analysis
- Install vagrant
- Install VirtualBox
- Download and unzip sql database into the project directory
- Download the vagrant set up files. Copy the Vagrantfile into the project directory.
Run
- Launch Vagrant by & log in by running
> vagrant up > vagrant ssh
- Load database by running the following sql
psql -d news -f newsdata.sql
- Run database analysis
python reporting.py
Database
There are three tables in the database
- Authors: contains information about article authors.
- Articles: all information on each article inluding content, slug, title and a foreign key linking to the Authors table.
- Log: Info relating to each individual page request per article. Describes each request by URI path, method, timestamp, ip and the HTTP status.
There are also two views:
- articles_info: creates a relationship between each article, it's author and number of views.
CREATE VIEW articles_info AS SELECT articles.title AS article, authors.name AS author, count(log.path) AS views FROM articles, authors, log WHERE CONCAT('/article/',articles.slug) = log.path AND articles.author = authors.id AND log.path LIKE '%/article/%' GROUP BY articles.title, authors.name ORDER BY views DESC;
- day_requests: describes total number of requests per article page per day, and the total of each HTTP status.
CREATE VIEW day_requests AS SELECT time::date as day, count(*) FILTER (WHERE status = '200 OK') AS ok, count(*) FILTER (WHERE status = '404 NOT FOUND') AS bad, count(*) AS total FROM log GROUP BY day;
API
Three functions provide insights on the database:
- top3Articles() aptly named, returns the top three articles of all time.
- topAuthors() returns the list of article authors ranked by article views.
- notFoundRate() displays days where the % of page requests was above 1%.