Sessions Analytics

You’re in charge of implementing a new analytics “sessions” view. You’re given a set of data that consists of individual web page visits, along with a visitorId which is generated by a tracking cookie that uniquely identifies each visitor. From this data we need to generate a list of sessions for each visitor.

The data set looks like this:

    {
      "events": [
        {
          "url": "/pages/a-big-river",
          "visitorId": "d1177368-2310-11e8-9e2a-9b860a0d9039",
          "timestamp": 1512754583000
        },
        {
          "url": "/pages/a-small-dog",
          "visitorId": "d1177368-2310-11e8-9e2a-9b860a0d9039",
          "timestamp": 1512754631000
        },
        {
          "url": "/pages/a-big-talk",
          "visitorId": "f877b96c-9969-4abc-bbe2-54b17d030f8b",
          "timestamp": 1512709065294
        },
        {
          "url": "/pages/a-sad-story",
          "visitorId": "f877b96c-9969-4abc-bbe2-54b17d030f8b",
          "timestamp": 1512711000000
        },
        {
          "url": "/pages/a-big-river",
          "visitorId": "d1177368-2310-11e8-9e2a-9b860a0d9039",
          "timestamp": 1512754436000
        },
        {
          "url": "/pages/a-sad-story",
          "visitorId": "f877b96c-9969-4abc-bbe2-54b17d030f8b",
          "timestamp": 1512709024000
        }
      ]
    }

Given this input data, we want to group the incoming events into sessions. A session is defined as a group of events from a single visitor with no more than 10 minutes between each consecutive event. A visitor can have multiple sessions. Given the example input above, we would expect output that looks like this:

    {
      "sessionsByUser": {
        "f877b96c-9969-4abc-bbe2-54b17d030f8b": [
          {
            "duration": 41294,
            "pages": [
              "/pages/a-sad-story",
              "/pages/a-big-talk"
            ],
            "startTime": 1512709024000
          },
          {
            "duration": 0,
            "pages": [
              "/pages/a-sad-story"
            ],
            "startTime": 1512711000000
          }
        ],
        "d1177368-2310-11e8-9e2a-9b860a0d9039": [
          {
            "duration": 195000,
            "pages": [
              "/pages/a-big-river",
              "/pages/a-big-river",
              "/pages/a-small-dog"
            ],
            "startTime": 1512754436000
          }
        ]
      }
    }

Notes

  • Timestamps are in milliseconds.
  • Events may not be given in chronological order.
  • The visitors in sessionsByUser can be in any order.
  • For each visitor, sessions must be in chronological order.
  • For each session, the URLs should be sorted in chronological order.
  • For a session with only one event, the duration should be zero.
  • Each event in a session (except the first event) must have occurred within 10 minutes of the preceding event in the session. This means that there can be more than 10 minutes between the first and the last event in the session.
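The rules above can be sketched in Ruby roughly as follows. This is a minimal illustration, not the analyze method shipped in this repository: the method name build_sessions and the max_gap_ms parameter are hypothetical, and the input is assumed to be the already-parsed JSON hash with string keys.

```ruby
# Hypothetical helper (not the repository's analyze method).
# data is the parsed input hash, e.g. { "events" => [ { "url" => ..., ... } ] }.
# max_gap_ms is an assumed parameter: the largest gap, in milliseconds,
# allowed between consecutive events of one session (10 minutes by default).
def build_sessions(data, max_gap_ms: 10 * 60 * 1000)
  sessions_by_user = {}

  data["events"].group_by { |event| event["visitorId"] }.each do |visitor_id, events|
    # Events may arrive out of order, so sort each visitor's events by time.
    sorted = events.sort_by { |event| event["timestamp"] }

    # Start a new session whenever the gap to the previous event exceeds the limit.
    chunks = sorted.slice_when do |previous, current|
      current["timestamp"] - previous["timestamp"] > max_gap_ms
    end

    sessions_by_user[visitor_id] = chunks.map do |chunk|
      {
        # A single-event session has the same first and last event, so duration is 0.
        "duration"  => chunk.last["timestamp"] - chunk.first["timestamp"],
        "pages"     => chunk.map { |event| event["url"] },
        "startTime" => chunk.first["timestamp"]
      }
    end
  end

  { "sessionsByUser" => sessions_by_user }
end
```

Because each visitor's events are sorted before they are split, the sessions and the pages inside them come out in chronological order without a second pass. Feeding it the result of JSON.parse on the sample input should yield the structure shown above, with visitors in whatever order they first appear in the input.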

How to Run

Run the command ruby runner.rb from your console.

Note: The JSON file must be placed in the input folder. The sample file from the problem has been included for your convenience.

Input File:

  • sessions_data.json

Testing

Tests are written with RSpec.

  • rspec spec/<filename>.rb - run an individual test file
  • rspec - run all tests

Test Files:

  • file_reader_spec.rb - tests for reading the file
  • events_transformer_spec.rb - tests for file input being parsed correctly
  • sessions_analyzer_spec.rb - tests for printing the output

Improvements

  • The analyze method in sessions analyzer can be improved to be more performant