udacity-logs-analysis

Logs Analysis project for Udacity FSND


news.py

news.py is a command-line program that answers a few useful queries about the news database. It reports the top articles, the top authors, and the days on which more than 1% of requests to the news website led to an error.

It uses the psycopg2 library to run SQL queries against the news database.
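
As a rough illustration of that pattern, the sketch below opens a connection, runs one query, and returns the rows. The helper name `run_query`, its `connect` parameter, and the `dbname="news"` default are assumptions made for this example; news.py itself may be organized differently.

```python
def run_query(sql, params=None, connect=None):
    """Run one SQL statement and return every row it produces.

    `connect` is a hypothetical hook: any zero-argument callable that
    returns a DB-API connection. When it is omitted, psycopg2 is used
    and the database name is assumed to be "news".
    """
    if connect is None:
        import psycopg2  # imported lazily so this module loads without it
        conn = psycopg2.connect(dbname="news")
    else:
        conn = connect()
    try:
        cur = conn.cursor()
        cur.execute(sql, params)
        return cur.fetchall()
    finally:
        conn.close()  # release the connection even if the query fails
```

Inside the VM, calling `run_query("SELECT 1")` should return a list containing a single one-column row.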

Setting up

To run the program, you will need to install the provided virtual machine. It comes with PostgreSQL and an empty news database; you will need to import the actual data separately. Steps for getting set up on Ubuntu are explained below, but you can adapt them for whichever machine you're on.

Install Vagrant

$ sudo apt update
$ sudo apt install vagrant

Download the VM configuration

Download this file: news-virtual-machine.zip and unzip it. This will give you a directory called news-virtual-machine.

Open up a terminal and cd into the vagrant directory inside. Then run this command: vagrant up.

This will download an entire Ubuntu-based virtual machine from the network, which may take some time depending on your connection speed.

Connect to the virtual machine

Your virtual machine should be running now. To connect to it, run vagrant ssh.

Logging out and in

To log out of your virtual machine, press CTRL+D. To stop the running VM, run vagrant halt.

If you reboot your computer, or you have stopped your virtual machine, start it again by running vagrant up the same way as you did before. Note: This will not download the virtual machine again from the network.

Download the program

While you're connected to your virtual machine, cd into the vagrant directory.

$ cd /vagrant

Then clone the repository into this directory:

$ git clone https://github.com/davidaik/udacity-logs-analysis.git

If git is not installed, install it by running
$ sudo apt install git

This will give you a directory called udacity-logs-analysis which will contain the news.py file along with others.

Setting up the database

To get any useful output from the program, the database needs some data to work with. Download this file: newsdata.zip. Extract it and you will get a file called newsdata.sql. Copy this file into your udacity-logs-analysis directory.

Import the data into your database

Now, we need to import the data from the newsdata.sql file into your PostgreSQL database which comes with the given VM.

To do this, on your virtual machine, cd into the udacity-logs-analysis directory and run

$ psql -d news -f newsdata.sql

Now we're ready to use the program.

Python Dependencies

You will need to install the following Python modules on your VM.

psycopg2:
$ pip3 install psycopg2

psycopg2-binary: Install this to avoid the "The psycopg2 wheel package will be renamed" warning.
$ pip3 install psycopg2-binary

If you're still getting the warning, uninstall both psycopg2 and psycopg2-binary and run the following in order.

$ pip3 install psycopg2
$ pip3 install psycopg2-binary
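
To confirm that the module is visible to the interpreter you plan to use, a quick check along these lines may help. This is only a sketch; the helper name `has_module` is made up for the example and is not part of news.py.

```python
import importlib.util


def has_module(name):
    """Return True if the top-level module `name` can be imported here."""
    return importlib.util.find_spec(name) is not None


if has_module("psycopg2"):
    print("psycopg2 is available")
else:
    print("psycopg2 is missing; install it with pip3 as described above")
```

Run it with python3 so that it checks the same interpreter that pip3 installs into.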

Usage

The program is written for Python 3, but it runs with no problems under Python 2.

To use the program, cd into the directory containing the news.py file and execute it with the available query types.

The basic structure of commands for running the program is shown below.

$ python3 news.py QUERY_TYPE -n NUMBER_OF_ROWS

QUERY_TYPE is a required argument that can be replaced with any of the following:

top-articles : Print top articles
top-authors : Print top authors
one-percent-error-days : Print days on which >1% of requests led to an error

-n NUMBER_OF_ROWS is an optional argument, where NUMBER_OF_ROWS is a numerical value. Use it to set the number of rows returned by the query.
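
A command-line interface of this shape can be sketched with the standard argparse module. The function name `build_parser` and the attribute names `query_type` and `rows` are assumptions for illustration; the actual parsing code in news.py may differ.

```python
import argparse

# The three query types listed above.
QUERY_TYPES = ("top-articles", "top-authors", "one-percent-error-days")


def build_parser():
    """Build a parser for: news.py QUERY_TYPE [-n NUMBER_OF_ROWS]."""
    parser = argparse.ArgumentParser(
        prog="news.py", description="Query the news database."
    )
    # Required positional argument, restricted to the known query types.
    parser.add_argument(
        "query_type", metavar="QUERY_TYPE", choices=QUERY_TYPES,
        help="one of: " + ", ".join(QUERY_TYPES),
    )
    # Optional row limit; None means no limit was requested.
    parser.add_argument(
        "-n", dest="rows", metavar="NUMBER_OF_ROWS", type=int, default=None,
        help="number of rows to print",
    )
    return parser


args = build_parser().parse_args(["top-articles", "-n", "3"])
print(args.query_type, args.rows)  # prints: top-articles 3
```

With `choices` set, argparse rejects an unknown QUERY_TYPE with a usage message, and `type=int` rejects a non-numeric value for -n.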

1. Print top articles in the database

$ python3 news.py top-articles -n 3

Example output:

$ python3 news.py top-articles -n 3
Candidate is jerk, alleges rival - 338647 views
Bears love berries, alleges bear - 253801 views
Bad things gone, say good people - 170098 views
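
For readers curious how such a ranking could be produced, here is a minimal sketch. The table and column names (`articles.title`, `articles.slug`, `log.path`) and the `/article/<slug>` path format are assumptions about the news database that this README does not state, and the query actually used by news.py may differ.

```python
# ASSUMED schema (not confirmed by this README):
#   articles(title, slug) and log(path), with log.path = '/article/<slug>'.
TOP_ARTICLES_SQL = """
    SELECT articles.title, count(*) AS views
    FROM articles
    JOIN log ON log.path = '/article/' || articles.slug
    GROUP BY articles.title
    ORDER BY views DESC
    LIMIT %s
"""


def format_views(rows):
    """Turn (name, views) rows into 'name - N views' lines."""
    return ["{} - {} views".format(name, views) for name, views in rows]


# Formatting one row taken from the example output above.
for line in format_views([("Candidate is jerk, alleges rival", 338647)]):
    print(line)  # prints: Candidate is jerk, alleges rival - 338647 views
```

The `%s` placeholder is meant to be filled by psycopg2 with the value given to -n, for example `cursor.execute(TOP_ARTICLES_SQL, (3,))`.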

2. Print top authors

$ python3 news.py top-authors -n 3

Example output:

$ python3 news.py top-authors -n 3
Ursula La Multa - 507594 views
Rudolf von Treppenwitz - 423457 views
Anonymous Contributor - 170098 views

3. Print days on which >1% of page requests led to an error

$ python3 news.py one-percent-error-days -n 3

Example output:

$ python3 news.py one-percent-error-days -n 3
July 17, 2016 - 2.262686% errors
July 1, 2016 - 1.001734% errors
July 25, 2016 - 3.953257% errors
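
The percentage itself is simply errors divided by total requests for each day, times 100. The sketch below works through that arithmetic in plain Python on made-up sample data; the function names and the `(day, is_error)` input shape are assumptions for illustration, and news.py most likely does the equivalent work in SQL.

```python
from collections import Counter
from datetime import date


def error_days(requests, threshold=1.0):
    """Return (day, percent) pairs where more than `threshold` percent
    of that day's requests were errors, ordered by day.

    `requests` is an iterable of (day, is_error) pairs.
    """
    total = Counter()
    errors = Counter()
    for day, is_error in requests:
        total[day] += 1
        if is_error:
            errors[day] += 1
    result = []
    for day in sorted(total):
        percent = 100.0 * errors[day] / total[day]
        if percent > threshold:  # strictly more than the threshold
            result.append((day, percent))
    return result


def format_day(day, percent):
    """Format one result like the example output above."""
    return "{:%B} {}, {} - {:.6f}% errors".format(day, day.day, day.year, percent)


# Made-up sample: 3 errors out of 100 requests on one day, none on the next.
sample = [(date(2016, 7, 17), i < 3) for i in range(100)]
sample += [(date(2016, 7, 18), False)] * 100
for day, percent in error_days(sample):
    print(format_day(day, percent))  # prints: July 17, 2016 - 3.000000% errors
```

A day with exactly 1% errors is not reported, since the comparison is strictly greater than the threshold.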

License

MIT License

Copyright (c) 2019 David Heisnam

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.