
Log-Analyser

About

A simple Ruby library to read and parse web server log files and aggregate pageview data.

TL;DR

Install the log-analyser gem and instantiate its PageviewsLogAggregator class with the path to a logfile. Then:
- the method all returns the pageview count;
- the method unique returns the unique pageview count.

Installation

Gem

To use log-analyser in your application, add this line to your Gemfile:

gem 'log-analyser'

Or install it yourself as:

$ gem install log-analyser

Gem Usage

#!/usr/bin/env ruby

require 'pageviews_log_aggregator'

# Path to your web server log file
file_path = 'resources/webserver.log'
log_aggregator = LogAnalyser::PageviewsLogAggregator.new(file_path)

# Print each page with its total pageview count
puts "\nAll pageviews"
log_aggregator.all.each do |key, value|
  puts "#{key&.to_s&.ljust(28, '.')} | #{value}"
end

# Print each page with its unique pageview count
puts "\nUnique pageviews"
log_aggregator.unique.each do |key, value|
  puts "#{key&.to_s&.ljust(28, '.')} | #{value}"
end

Project

Install the Ruby version specified in .ruby-version, then clone the project and install Bundler:

$ git clone git@github.com:DMazzei/log-analyser.git
$ cd log-analyser
$ gem install bundler

Setup

Run the initial setup:

$ bin/setup

If you need to reinstall dependencies or similar:

$ bundle install

Usage

Run ./bin/parse_pageview_file.rb, passing a logfile path with --file; it prints the pageview count for each page, ordered from most to least viewed.
Run it with --help to see more options.

An example log can be found in the 📁 resources folder:

$ ./bin/parse_pageview_file.rb --file 'resources/webserver.log'
|--------------------------------------------------|
| All pageviews                                    |
|--------------------------------------------------|
| /about/2.................... | 90                |
| /contact.................... | 89                |
| /index...................... | 82                |
| /about...................... | 81                |
| /help_page/1................ | 80                |
| /home....................... | 78                |
|--------------------------------------------------|
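To make the layout of this output explicit, here is a minimal sketch that reproduces it. It assumes, from the sample above, a box 50 characters wide between the outer pipes and page names padded with dots to 28 characters; render_table and the other helpers are hypothetical and not necessarily how the script renders its output.

```ruby
# Assumed from the sample output: 50 characters between the outer pipes,
# page names padded with dots to 28 characters.
INNER_WIDTH = 50
PAGE_WIDTH = 28

def border
  "|#{'-' * INNER_WIDTH}|"
end

# Left-align text inside the box, with one leading space.
def boxed(text)
  padded = " #{text}".ljust(INNER_WIDTH)
  "|#{padded}|"
end

# Build the whole table: title between borders, one row per page, closing border.
def render_table(title, counts)
  rows = counts.map { |page, count| boxed("#{page.to_s.ljust(PAGE_WIDTH, '.')} | #{count}") }
  [border, boxed(title), border, *rows, border].join("\n")
end

puts render_table('All pageviews', { '/about/2' => 90, '/contact' => 89 })
```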

The -u or --unique option will also display the unique pageview count:

$ ./bin/parse_pageview_file.rb --file 'resources/webserver.log' -u

A specific page can be filtered with -p or --page:

$ ./bin/parse_pageview_file.rb --file 'resources/webserver.log' -p '/index'
|--------------------------------------------------|
| View count for page: /index                      |
|--------------------------------------------------|
| All pageviews                                    |
|--------------------------------------------------|
| /index...................... | 82                |
|--------------------------------------------------|
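The --file, -u/--unique and -p/--page options could be parsed with Ruby's standard OptionParser, roughly as in the sketch below. This is not the actual contents of bin/parse_pageview_file.rb: parse_options and the option hash keys are hypothetical, and only the flags documented above are covered.

```ruby
require 'optparse'

# Parse the documented flags into a plain hash (hypothetical helper).
def parse_options(argv)
  options = { unique: false }
  parser = OptionParser.new do |opts|
    opts.banner = 'Usage: parse_pageview_file.rb --file PATH [options]'
    opts.on('--file PATH', 'Path to the log file') { |path| options[:file] = path }
    opts.on('-u', '--unique', 'Also display unique pageviews') { options[:unique] = true }
    opts.on('-p', '--page PAGE', 'Only show the count for PAGE') { |page| options[:page] = page }
  end
  # parse (unlike parse!) works on a copy, leaving the caller's array untouched
  parser.parse(argv)
  raise OptionParser::MissingArgument, '--file' unless options[:file]

  options
end

p parse_options(['--file', 'resources/webserver.log', '-p', '/index'])
```

OptionParser also answers --help on its own by printing the generated summary, which is one reason to prefer it over hand-rolled ARGV handling.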

Logs and Pageviews

Definitions

📄 A pageview is defined as a view of a page on your site that is being tracked by the Analytics tracking code. If a user clicks reload after reaching the page, this is counted as an additional pageview. If a user navigates to a different page and then returns to the original page, a second pageview is recorded as well.

📃 A unique pageview, as seen in the Content Overview report, aggregates pageviews that are generated by the same user during the same session. A unique pageview represents the number of sessions during which that page was viewed one or more times.
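A plain log has no session data, so the sketch below assumes that a unique pageview is approximated by counting each identifier (e.g. IP address) at most once per page, while an ordinary pageview counts every entry. Entries are [page, identifier] pairs; all_counts, unique_counts and by_views are hypothetical names, not the gem's API.

```ruby
require 'set'

# Every entry counts, so repeated visits by the same visitor add up.
def all_counts(entries)
  entries.each_with_object(Hash.new(0)) { |(page, _id), counts| counts[page] += 1 }
end

# Each identifier is counted at most once per page (assumption: the
# identifier stands in for a session, since the log carries none).
def unique_counts(entries)
  visitors = Hash.new { |hash, page| hash[page] = Set.new }
  entries.each { |page, id| visitors[page] << id }
  visitors.transform_values(&:size)
end

# Order pages from most to least viewed, breaking ties by page name.
def by_views(counts)
  counts.sort_by { |page, count| [-count, page] }.to_h
end

entries = [
  ['/home', '184.123.665.067'],
  ['/home', '184.123.665.067'],
  ['/contact', '184.123.665.067'],
  ['/home', '126.318.035.038']
]

p by_views(all_counts(entries))    # /home: 3, /contact: 1
p by_views(unique_counts(entries)) # /home: 2, /contact: 1
```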

Log Formatting

The library parses text files containing one entry per line, in the format: /page_name identifier.

A space must separate the page name (first column) from the user identifier (e.g. IP address):

/help_page/1 126.318.035.038
/contact 184.123.665.067
/home 184.123.665.067
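A minimal sketch of parsing one line in this format. It assumes that blank lines, lines without exactly two fields, and page names not starting with a slash should be skipped; parse_line is a hypothetical helper, not part of the gem's documented API.

```ruby
# Parse "/page_name identifier" into [page, identifier].
# Returns nil for blank or malformed lines (assumption: they are skipped).
def parse_line(line)
  page, identifier, *rest = line.split
  return nil if page.nil? || identifier.nil? || !rest.empty?
  return nil unless page.start_with?('/')

  [page, identifier]
end

log = <<~LOG
  /help_page/1 126.318.035.038
  /contact 184.123.665.067

  /home 184.123.665.067
LOG

# filter_map drops the nil results of skipped lines
entries = log.each_line.filter_map { |line| parse_line(line) }
entries.each { |page, id| puts "#{page} -> #{id}" }
```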

Development

To get started with the project:

$ git clone git@github.com:DMazzei/log-analyser.git
$ cd log-analyser
$ gem install bundler
$ bundle install

And the world is your oyster...

You can also run $ bundle exec console for an interactive prompt that will allow you to experiment.

To install this gem onto your local machine, run $ bundle exec rake install. To release a new version, update the version number in version.rb, and then run $ bundle exec rake release, which will create a git tag for the version, push git commits and tags, and push the .gem file to rubygems.org.

Linter (rubocop)

RuboCop is used as a static code analyser to enforce code formatting (as well as some best practices).

Use $ bundle exec rake rubocop to run the checks.

Test coverage

Use $ bundle exec rspec or $ bundle exec rake spec:all to run all the tests.

✅ To run only unit tests:

$ bundle exec rake spec:unit

✅ To run only integration tests:

$ bundle exec rake spec:integration

Test coverage is handled by RSpec, SimpleCov and Coveralls. Status and coverage history can be checked on Coveralls.

Deployment

Opening a pull request triggers a CI workflow in CircleCI.
This workflow builds the library, runs RuboCop and RSpec to validate integrity and code quality, and finally generates and pushes a feature gem that can be used for development and testing.

Once a PR passes all checks and requirements on GitHub and has been reviewed and approved, it can be merged. Merging into master triggers the deployment workflow on CircleCI, which builds and tags a new gem version and pushes it to rubygems.org.

⚠️ To merge changes into master, the version in version.rb must be bumped; otherwise the deployment will fail!
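A minimal sketch of the kind of guard that rejects an un-bumped version, using RubyGems' Gem::Version to compare version strings. The VERSION value, the released placeholder and the version_bumped? helper are illustrative assumptions; the real CI check may work differently.

```ruby
require 'rubygems'

# Hypothetical contents of version.rb (the real version number will differ).
module LogAnalyser
  VERSION = '0.2.0'
end

# True only if the new version is strictly greater than the released one.
# Gem::Version compares segment by segment, so '0.10.0' > '0.9.0'.
def version_bumped?(new_version, released_version)
  Gem::Version.new(new_version) > Gem::Version.new(released_version)
end

released = '0.1.0' # placeholder for the latest version published on rubygems.org
abort "Bump LogAnalyser::VERSION (still #{released})" unless version_bumped?(LogAnalyser::VERSION, released)
puts "OK: #{released} -> #{LogAnalyser::VERSION}"
```

Comparing Gem::Version objects instead of raw strings avoids the trap where '0.10.0' sorts before '0.9.0' lexically.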

Contributing

Bug reports and pull requests are welcome on GitHub at https://github.com/DMazzei/log-analyser.

Next Steps

  • Revisit a trade-off faced during development, choosing between:
    • aggregating data while reading the file, preserving memory (e.g. using Set);
    • loading the data into memory and deferring aggregation and counting, gaining flexibility and performance;
  • Extend the accepted logfile format;
  • Add more options for sorting and filtering;
  • Automate the library version bump.
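The memory-preserving option in that trade-off (aggregating while reading the file, with a Set per page) could look roughly like the sketch below. File.foreach reads one line at a time, so only a counter and a Set of distinct identifiers per page stay in memory. stream_counts is a hypothetical name, and this is not a description of how the gem currently works.

```ruby
require 'set'
require 'tempfile'

# Single pass over the file: keeps one counter and one Set of
# identifiers per page instead of every log line.
def stream_counts(path)
  all = Hash.new(0)
  visitors = Hash.new { |hash, page| hash[page] = Set.new }

  File.foreach(path) do |line|
    page, identifier = line.split
    next if page.nil? || identifier.nil? # skip blank or incomplete lines

    all[page] += 1
    visitors[page] << identifier
  end

  [all, visitors.transform_values(&:size)]
end

# Usage with a small temporary log file.
Tempfile.create('webserver.log') do |file|
  file.write("/home 1.1.1.1\n/home 1.1.1.1\n/about 2.2.2.2\n")
  file.flush
  all, unique = stream_counts(file.path)
  puts "all: #{all}, unique: #{unique}"
end
```

The Sets still grow with the number of distinct visitors per page, so this bounds memory by distinct (page, identifier) pairs rather than by file size.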

License

The gem is available as open source under the terms of the MIT License.