datatable-exercise: A Ruby repository from coderaven

Object Properties Log Explorer Exercise¶ ↑

There’s a Parallel Branch¶ ↑

UPDATE: Added a sidekiq based background job to make the csv importing faster for bulky csv importing. (See parallel_processing branch)

The parallel branch involves usage of SideKiq for Job queueing, Google Drive for S3 Substitute on uploading files and handling them later

The process:

Upload -> Saved to GDrive (public folder) -> SideKiq queues a File Open to GDrive Path -> From File Open Parsing Sidekiq queues each processing of row of csv to ensure only valid data is recorded in the database.

Description¶ ↑

We have a CSV (Comma Separated Values) file of N-rows for the following Objects:

ObjectA: {property1, property2, property3}
ObjectB: {property1, property2, property3}
.
..
...

object_id | object_type | timestamp | object_changes
:-------: | :---------: | :--------: | :------------
 1        |  ObjectA    |  412351252 | {property1: "value1", property3: "value2"}
 1        |  ObjectB    |  456662343 | {property1: “another value1"}
 1        |  ObjectA    |  467765765 | {property1: "altered value1", property2: “random value2"}
 2        |  ObjectA    |  451232123 | {property2: “some value2"}
...       |  ...        |  ...       | ...

The CSV columns are:

- **object_id:** is a unique identifier per-object type.
- **object_type:** denotes the object type.
- **timestamp:** 
- **object_changes:** the properties changed for specified object at **timestamp**.

The program should run behind a simple http interface, accept a file upload (the csv file in this case) and then allow users to query the system about states of objects at a specific point of time .e.g. what were the properties associated with ObjectA holding Id=100 On May 4th, 2016 12:00:00AM UTC ?

For simplicity, assume a moderate number of classes/type (no more than 3) and moderate/acceptable number of properties per-class (with simple-serializable data types).

Ruby version

ruby 2.3.1

System dependencies (For Tests and Database Drivers - Only tested and deveoped on Mac OSX)

$ bundle install

$ brew install mongodb

$ brew services start mongodb

$ npm install phantomjs
Configuration

Mongoid configuration can be done through a mongoid.yml that specifies your options and clients. The simplest configuration is as follows, which sets the default client to “localhost:27017” and provides a single database in that client named “mongoid”.

development:
clients:
  default:
    database: mongoid
    hosts:
      - localhost:27017

Rails Applications

I have already configured the mongoid.yml but if you want to specify, you can generate a config file by executing the generator and then editing myapp/config/mongoid.yml to your heart’s desire. Mongoid will then handle everything else from there.

$ rails g mongoid:config

Database creation / initialization

Since Mongodb is a document based database - it does not need any migrations whatsoever so you can just immediately proceed.

In a clean / empty state, you can upload your csv file on the running app (e.g. localhost:3000) or by running a premade database seeder (contains 4 default given records and 100 randomly generated records):

$ rake db:seed

How to run the test suite

I used Rspec in conjuction with Capybara, PhantomJs, FactoryGirl, and Guard. I have already deleted any tests from Rails generator. So just run:

$ rake

or for the automated testing:

$ bundle exec run guard
(then press enter on the guard command to execute all)

Services (job queues, cache servers, search engines, etc.)

I have used the SmarterCSV gem to make csv importing faster, but reading its documentation - it can still make the uploading faster by using job queues. Currently this is still under development.

Deployment instructions

Note that the deployed environment I am using is Heroku’s free-tier which only has 25mb of Ram and shared CPU allocation. This might become really slow on csv with 100,000+ datasets. But it only is slow on the upload part - I have architect the system to do ajax / lazy loading to fasten up.

$ heroku create
$ heroku addons:create mongolab:sandbox

then on your mongoid.yml:

production:
clients:
  default:
   # The standard MongoDB connection URI allows for easy
   # replica set connection setup.
   # Use environment variables or a config file to keep your
   # credentials safe e.g. <%= ENV['MONGODB_URI'] %>.
   uri: <%= ENV['MONGODB_URI'] %>

   options:
     # The default timeout is 5, which is the time in seconds
     # for a connection to time out.
     # We recommend 15 because it allows for plenty of time
     # in most operating environments.
     connect_timeout: 15

coderaven/datatable-exercise

Object Properties Log Explorer Exercise¶ ↑

There’s a Parallel Branch¶ ↑

Description¶ ↑