This project builds a Python and SQL reporting tool that summarizes data from a large PostgreSQL database.
This project uses the same Linux-based virtual machine (VM) as the preceding lessons.
Vagrant is the software that configures the VM and lets you share files between your host computer and the VM's filesystem.
VirtualBox is the software that actually runs the virtual machine. The supported version of VirtualBox to install is version 5.1. Newer versions do not work with the current release of Vagrant.
Download this repo: https://github.com/udacity/fullstack-nanodegree-vm
Then, from your terminal, cd to the vagrant folder.
From the vagrant subdirectory run the command:
vagrant up
When vagrant up has finished running, run the command below to log in to the installed Linux VM:
vagrant ssh
Download the data here: https://d17h27t6h515a5.cloudfront.net/topher/2016/August/57b5f748_newsdata/newsdata.zip
Unzip it, put the newsdata.sql file into the vagrant folder, and clone this project inside the vagrant folder.
To load the data, cd into the vagrant directory and use the command:
psql -d news -f newsdata.sql
More information about the command:
- psql: the PostgreSQL command line program
- -d news: connect to the database news
- -f newsdata.sql: run the SQL statements in the file newsdata.sql
Running the above command will connect to the installed database server and execute the SQL commands in the downloaded file, creating tables and populating them with data.
Once the data has loaded into the database, connect to the database with the command psql -d news and explore the tables using the \dt and \d table_name commands and select statements.
The database includes 3 tables:
- Authors
- Articles
- Log
To run the program from vagrant, run python reportdb.py from the command line.
The objective of this project is to create a reporting tool that prints out reports based on the data in the database. The reporting tool is a simple Python program that uses the psycopg2 module to connect to the database.
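As a rough illustration of the connection step, here is a minimal sketch, not the actual contents of reportdb.py. The helper names connect_news and fetch_all are hypothetical. The database name news comes from the load command above. psycopg2 is imported inside connect_news so that fetch_all can be tried against any DB-API connection.

```python
from contextlib import closing


def connect_news(dbname="news"):
    """Open a connection to the news database (requires psycopg2)."""
    import psycopg2  # imported lazily; only needed for a real PostgreSQL connection
    return psycopg2.connect(dbname=dbname)


def fetch_all(conn, query, params=None):
    """Run a query on a DB-API connection and return all rows as a list.

    Hypothetical helper: conn can be any DB-API 2.0 connection.
    """
    with closing(conn.cursor()) as cur:
        if params is None:
            cur.execute(query)
        else:
            cur.execute(query, params)
        return cur.fetchall()
```

Inside the VM, a call such as fetch_all(connect_news(), "SELECT count(*) FROM log") would return the number of rows in the log table, assuming the data has been loaded.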
Here are the questions the reporting tool answers:
- What are the three most popular articles of all time? Which articles have been accessed the most? The program presents this information as a sorted list with the most popular article at the top.
- Who are the most popular article authors of all time? When summing up all of the articles each author has written, which authors get the most page views? The program presents this as a sorted list with the most popular author at the top.
- On which days did more than 1% of requests lead to errors? The log table includes a column status that indicates the HTTP status code that the news site sent to the user's browser.
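The third question comes down to a per-day ratio: the number of error requests divided by the total number of requests, keeping only the days where that ratio is above 0.01. The sketch below shows that arithmetic in plain Python and is not taken from reportdb.py. It assumes the rows have already been fetched as (day, status) pairs, that status values start with the numeric HTTP code, and that "error" means a 4xx or 5xx code. The function names are hypothetical, and the same aggregation could equally be done in SQL.

```python
from collections import Counter


def is_error(status):
    """Treat 4xx and 5xx codes as errors (an assumption about what 'error' means)."""
    code = str(status).strip()
    return code[:1] in ("4", "5")


def days_over_error_threshold(rows, threshold=0.01):
    """Return (day, error_rate) pairs where error_rate is strictly above threshold.

    rows is an iterable of (day, status) pairs, one pair per request.
    The result is sorted by day.
    """
    totals = Counter()
    errors = Counter()
    for day, status in rows:
        totals[day] += 1
        if is_error(status):
            errors[day] += 1

    result = []
    for day in sorted(totals):
        rate = errors[day] / totals[day]
        if rate > threshold:
            result.append((day, rate))
    return result
```

Note that the comparison is strict, so a day with exactly 1% errors is not reported, matching the "more than 1%" wording of the question.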