The folder structure of the project is as follows:
lib -> Ruby code, i.e. the Parser class
logs -> log files
spec -> RSpec files, i.e. the unit test cases
Gemfile -> lists all the gems required to run the project
I have used RSpec to unit test the class.
- rspec version: 3.9
- ruby version: ruby-2.6.6
Pass the log file path when initialising the Parser class, like: parser = Parser.new("./logs/webserver.log")
The class exposes the following public methods:
- list_views: takes a sort order ('ASC' or 'DSC') and returns an array of hashes, for example: [{"/home"=>78}, {"/help_page/1"=>80}, {"/about"=>81}, {"/index"=>82}, {"/contact"=>89}, {"/about/2"=>90}]. If you pass anything other than "ASC" as the sort order, the result is sorted in DSC order.
- uniq_views: same input/output as list_views
Extra bit, commented out since it is not in the requirements:
- most_views_by_ip
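The sort behaviour described for list_views can be sketched as follows. This is only an illustration and not the project's actual implementation: sort_counts is a hypothetical helper, and the counts hash is made-up sample data.

```ruby
# Hypothetical helper: turns a { path => count } hash into an array of
# single-pair hashes ordered by count.
def sort_counts(counts, order = 'DSC')
  sorted = counts.sort_by { |_path, count| count } # ascending by count
  sorted.reverse! unless order == 'ASC'            # anything else means descending
  sorted.map { |path, count| { path => count } }
end

# Made-up sample data.
counts = { '/home' => 78, '/contact' => 89, '/about' => 81 }

p sort_counts(counts, 'ASC')
# => [{"/home"=>78}, {"/about"=>81}, {"/contact"=>89}]
p sort_counts(counts, 'anything')
# => [{"/contact"=>89}, {"/about"=>81}, {"/home"=>78}]
```

Treating every value other than 'ASC' as descending keeps the method from raising on unexpected input, at the cost of silently accepting typos in the sort argument.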
- The file is read line by line, which keeps memory usage much lower: once a line has been processed it becomes eligible for garbage collection, so the number of objects freed is quite high and the whole file is never held in memory at once. (refer: https://tjay.dev/howto-working-efficiently-with-large-files-in-ruby/)
- I have used the Hash data structure to parse the log data, because it is much faster for retrieving data than arrays and linked lists. A sorted array can find a particular value in O(log n) with binary search, whereas a Hash can check whether it contains a particular key in O(1) on average.
- The parser object is built in such a manner that all the information is preserved, so if we later want to see which page was hit by which IP the most, we have that information.
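The design points above (line-by-line reading, Hash-based counting, keeping per-IP detail) can be combined in a small sketch. It assumes each log line has the form "<path> <ip>", which this README does not state. parse_log and top_ip_for are hypothetical names, not the Parser class's real methods, and the log content is made up.

```ruby
require 'tempfile'

# Hypothetical sketch: builds { path => { ip => hits } } while reading one
# line at a time, so the whole file is never held in memory.
# Assumes each line looks like "<path> <ip>".
def parse_log(file_path)
  visits = Hash.new { |hash, path| hash[path] = Hash.new(0) }
  File.foreach(file_path) do |line|
    path, ip = line.split
    next if path.nil? || ip.nil? # skip blank or malformed lines
    visits[path][ip] += 1
  end
  visits
end

# Hypothetical helper: the IP that hit the given page most often.
def top_ip_for(visits, path)
  return nil unless visits.key?(path)

  visits[path].max_by { |_ip, hits| hits }&.first
end

# Made-up sample data, written to a temp file for the demo.
log = Tempfile.new('webserver')
log.write("/home 1.1.1.1\n/home 2.2.2.2\n/home 1.1.1.1\n/about 2.2.2.2\n")
log.close

visits = parse_log(log.path)
total_views  = visits.transform_values { |ips| ips.values.sum } # all hits per page
unique_views = visits.transform_values(&:size)                  # distinct IPs per page

p total_views                    # => {"/home"=>3, "/about"=>1}
p unique_views                   # => {"/home"=>2, "/about"=>1}
p top_ip_for(visits, '/home')    # => "1.1.1.1"
log.unlink
```

Because the nested Hash keeps a count per page and per IP, total views, unique views, and the most active IP can all be derived from the same structure without reading the file again.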
Scope for improvement:
- We can add further reporting tools on top of the existing framework, e.g. gems like ReportBuilder, which would give graphical views: https://github.com/rajatthareja/ReportBuilder/
- We can use a buffer in case of really long input.