This program analyzes the log details from the given data from logs provided. This is an internal reporting tool that will use information from the database to discover what kind of articles the site's readers like.
- log_analysis.py
- Vagrantfile
- log_analysis_output.png
- Vagrant (Download)
- Virtualbox (Download)
- log_analysis.py is the python file that calculates and displayes the required output. This contains the sql code to fetch the necessary information from databse,python code to populate and display the desired output.
- Vagrantfile is the file that has the configuration to install and start the Virtual Machine. Using this file we can start the VM and databases.
- log_analysis_output.png is the image file that displays the output after the python code is run.
- Download all the files to your local machine from repository.
- Extract the files into a folder(make sure all required files are in same folder).
- Open a command prompt on your computer and run
vagrant up
to start the VM. - Once the VM is up, run
vagrant ssh
to login to the VM. - Once you get into the VM command line, navigate to the shared folder from you local machine
/vagrant
. This will have below 4 files.- catalog
- forum
- tournament
- Vagrantfile
- Dowload the data set from here and copy the sql file into the shared folder(
/vagrant
). - Once the
newsdata.sql
is copied into shared folder, run below command to import Database.- Command:
psql -d news -f newsdata.sql
- Command:
- Run
psql -d news
to login to the databsenews
. - Run
\dt
to explore or view the tables. - Copy the python code to your shared folder and execute as below command.
- Command:
python log_analysis.py
- Command:
- If the code runs without errors, below output is displayed.
vagrant@vagrant:/vagrant$ python log_analysis.py
==================================================
Most Popular 3 Articles:
Candidate is jerk, alleges rival ==> 338647 views
Bears love berries, alleges bear ==> 253801 views
Bad things gone, say good people ==> 170098 views
==================================================
Most Popular Article Authors of All Time:
Ursula La Multa ==> 507594 views
Rudolf von Treppenwitz ==> 423457 views
Anonymous Contributor ==> 170098 views
Markoff Chaney ==> 84557 views
==================================================
More than 1% of Requests Errors:
2016-07-17 ==> 2.26% errors
To fetch the top 3 articles of all time
, below is the view created.
create or replace view popular_articles as
select title, count(title) as views from articles,log
where log.path = concat('/article/',articles.slug) group by title
order by views desc limit 3;
To fetch the Most Popular Article Authors of All Time
, below is the view created.
create or replace view popular_authors as
select authors.name,count(articles.author) as views from articles, log, authors where log.path = concat('/article/',articles.slug) and articles.author = authors.id group by authors.name order by views desc;
To fetch the More than 1% of Requests Errors
, below is the view created.
create or replace view log_status as
select Date,Total,Error,(Error::float*100)/Total::float as Percent from
(select time::timestamp::date as Date, count(status) as Total, sum(case when status = '404 NOT FOUND' then 1 else 0 end) as Error from log
group by time::timestamp::date) as result
where (Error::float*100)/Total::float > 1.0 order by Percent desc;
Please feel free to make changes to the code and contribute to make the code work better.