The goal is to create a reporting tool that prints out reports (in plain text) based on the data in the database. This reporting tool is a Python program using the psycopg2 module to connect to the database. In this project, we'll work with data that could have come from a real-world web application, with fields representing information that a web server would record, such as HTTP status codes and URL paths. The web server and the reporting tool both connect to the same database, allowing information to flow from the web server into the report.
- Install VirtualBox
- Setup the Vagrant
- Clone fullstack-nanodegree-vm repository.
- Download the data from here.
- After unzipping you'll find a newsdata.sql file inside.
- Copy the content of this repository, by cloning it from this link
- Copy newdata.sql
What are the most popular three articles of all time?
Which articles have been accessed the most? Present this information as a sorted list with the most popular article at the top.
Who are the most popular article authors of all time?
That is, when you sum up all of the articles each author has written, which authors get the most page views? Present this as a sorted list with the most popular author at the top.
On which days did more than 1% of requests lead to errors?
The log table includes a column status that indicates the HTTP status code that the news site sent to the user's browser.
- Go to the downloaded FSND-Virtual-Machine folder, install the vagrant inside the vagrant folder then run the following command:
$ vagrant up
- To connect to the vagrant use the following command:
$ vagrant ssh
- Change directory to /vagrant using the command:
$ cd /vagrant
- To exit the vagrant:
$ exit
- To load the database in our vagarant
psql -d news -f newsdata.sql
The database includes three tables:
- The authors table includes information about the authors of articles.
- The articles table includes the articles themselves.
- The log table includes one entry for each time a user has accessed the site.
-
Run
psql -d news
to connect to the database. -
Explore the tables using the
\dt
and\d table
commands andselect
statements.
- ar_view (for more information, visit the screenshot folder on this repository) repo link
create view ar_view as select title,author,count(*) as my_views from log,articles where
log.path like concat('%',articles.slug) group by articles.title,articles.author
order by my_views desc;
- l_view (for more information, visit the screenshot folder on this repository) repo link
create view l_view as select date(time),round(100.0*sum(case log.status when '200 OK'
then 0 else 1 end)/count(log.status),2) as "error" from log group by date(time)
order by "error" desc;
$ python LogAnalysis.py
Top 3 articles are :-
Article "Candidate is jerk, alleges rival" has 338647 views.
Article "Bears love berries, alleges bear" has 253801 views.
Article "Bad things gone, say good people" has 170098 views.
Most popular authors are :-
Author "Ursula La Multa" has 507594 views.
Author "Rudolf von Treppenwitz" has 423457 views.
Author "Anonymous Contributor" has 170098 views.
Author "Markoff Chaney" has 84557 views.
The days on which more than 1% of requests lead to errors :-
The day 2016-07-17 had 2.26% errors.