Logs Analysis

this project for collecting some information from the news database news database has 3 tables articles author logs

prerequisite

  • python 3
  • VirtualBox
  • Vagrant
  • PostgreSQL

Enviroment setup

download the db from here https://d17h27t6h515a5.cloudfront.net/topher/2016/August/57b5f748_newsdata/newsdata.zip then open it by the command

psql -d news -f newsdata.sql

Data Preparation

insert the view to the db by opening the pgadmin then views and create new view and insert the view sql code

running the code

using vagrant up and vagrant ssh to run and login to the VM,create the views and then run the python file to display results

views added to the database

to help in the third query

CREATE OR REPLACE VIEW public."timeStatus" AS
SELECT time::date , 
(100*SUM(CASE WHEN status LIKE '404%' then 1 else 0 end) /COUNT(status))AS res
FROM log

GROUP BY time::date
ORDER BY res DESC;

What are the most popular three articles of all time?

SELECT title,Count(path) FROM log,Articles 
        where  SUBSTRING(path,10,50)=slug 
        GROUP BY title ORDER BY Count(path) DESC  LIMIT 3

Who are the most popular article authors of all time?

SELECT name,COUNT(path)  FROM log,Articles
           INNER JOIN authors ON articles.author=authors.id
           where SUBSTRING(path,10,50)=slug GROUP BY name ORDER BY COUNT(path) DESC

On which days did more than 1% of requests lead to errors?

timeStatus is a view

SELECT * FROM public."timeStatus"
           where res>1