NoSQL Project: Simulation of the 2016 US Election Vote Count

This project simulates the vote count of the 2016 US presidential election as a data stream. During the simulation, the system must tolerate a database failure.

Architecture

The chosen architecture is as follows:

  • An AWS cluster of 5 micro instances
  • 2 NAT instances (a primary and a secondary), which host the software and the HTTP site
  • 3 instances forming a MongoDB replica set
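With three members, the replica set still has a majority after losing one instance, so it can elect a new primary. A minimal sketch of how the Python backend on the NAT instances might connect, assuming the members listen on the default port 27017 and the set is named `rs0`; the host addresses, the set name and the helper names are hypothetical. Listing every member as a seed lets the driver find the current primary even when one seed is down.

```python
def replica_set_uri(hosts: list[str], replica_set: str, port: int = 27017) -> str:
    """Build a MongoDB connection string that seeds the driver with every member."""
    if not hosts:
        raise ValueError("at least one replica set member is required")
    seeds = ",".join(f"{host}:{port}" for host in hosts)
    return f"mongodb://{seeds}/?replicaSet={replica_set}"


def connect(uri: str):
    """Open a client; PyMongo is imported lazily so the URI helper runs without it."""
    from pymongo import MongoClient

    # primaryPreferred lets reads fall back to a secondary while a new
    # primary is being elected; the 5 s timeout bounds server selection.
    return MongoClient(uri, serverSelectionTimeoutMS=5000,
                       readPreference="primaryPreferred")


# Hypothetical private addresses of the three MongoDB instances.
MONGO_HOSTS = ["10.0.1.11", "10.0.1.12", "10.0.1.13"]
URI = replica_set_uri(MONGO_HOSTS, "rs0")
```

`connect(URI)` returns a client that keeps tracking the set's topology, so application code does not need to know which instance is currently primary.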

ETL

The raw data are CSV files, one per state, with one line per vote containing the timestamp, the state and the candidate. Since every vote in a given file shares the same timestamp, we proceeded as follows:

  • Import the data into MongoDB with a bash script invoking mongoimport
  • Aggregate the data by timestamp, state and candidate, and store the result in another collection
  • Dump the first collection
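Because each vote is one CSV line, the aggregation step amounts to counting lines per (timestamp, state, candidate). A minimal sketch, assuming the CSV header is `timestamp,state,candidate` and the aggregated collection is called `results` (both hypothetical): `votes_pipeline` builds a MongoDB aggregation pipeline that groups, counts and writes with `$out`, and `count_votes` computes the same result in plain Python, which is handy for checking the pipeline on a small file.

```python
from collections import Counter
from typing import Iterable


def votes_pipeline(target: str = "results") -> list[dict]:
    """Aggregation pipeline: one output document per (timestamp, state, candidate)."""
    return [
        # Count the raw vote documents in each group.
        {"$group": {
            "_id": {"timestamp": "$timestamp", "state": "$state",
                    "candidate": "$candidate"},
            "votes": {"$sum": 1},
        }},
        # Flatten the compound _id into top-level fields.
        {"$project": {
            "_id": 0,
            "timestamp": "$_id.timestamp",
            "state": "$_id.state",
            "candidate": "$_id.candidate",
            "votes": 1,
        }},
        # Write the result into the (hypothetical) target collection.
        {"$out": target},
    ]


def count_votes(rows: Iterable[dict]) -> list[dict]:
    """In-memory equivalent of the pipeline, e.g. for rows from csv.DictReader."""
    counts = Counter((r["timestamp"], r["state"], r["candidate"]) for r in rows)
    return [
        {"timestamp": t, "state": s, "candidate": c, "votes": n}
        for (t, s, c), n in sorted(counts.items())
    ]
```

With PyMongo, passing the pipeline to `Collection.aggregate` on the raw collection runs it on the server; `$out` replaces the target collection each time the pipeline runs.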

Data retrieval and display

The backend is written in Python; it uses PyMongo to connect to the database and export the aggregated data as JSON. The frontend is written in HTML/CSS/JS (d3.js). The system works as follows:

  • Connect to the database and try to export the data, handling fault tolerance if one of the replica set members fails
  • Export the data as JSON
  • Periodically load the data into index.html via AJAX
  • Refresh the map, the number of electoral votes per candidate and the estimated winner with the latest data
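The export loop has to survive the failure of a replica set member. A minimal sketch, assuming the aggregated documents carry `state`, `candidate` and `votes` fields, the database and collection are named `election` and `results`, and the caller supplies the electoral votes per state (all hypothetical): `with_retry` re-runs a read while PyMongo raises `AutoReconnect` (for example during a primary election), `export_results` serialises the documents for the AJAX page, and `estimate_electors` gives each state's electoral votes to its current leader. Winner-take-all is a simplification: Maine and Nebraska split their votes by district, and ties are not handled.

```python
import json
import time
from collections import Counter, defaultdict
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")


def with_retry(operation: Callable[[], T],
               retry_on: tuple[type[BaseException], ...],
               attempts: int = 5,
               delay_s: float = 1.0) -> T:
    """Run `operation`, retrying on the given errors with a linear back-off."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except retry_on:
            if attempt == attempts:
                raise
            # Give the remaining members time to elect a new primary.
            time.sleep(delay_s * attempt)
    raise AssertionError("unreachable")


def export_results(docs: Iterable[dict]) -> str:
    """Serialise aggregated documents to JSON, sorted for stable output."""
    rows = sorted(docs, key=lambda d: (d["state"], d["candidate"]))
    # default=str covers values json cannot encode natively (e.g. datetimes).
    return json.dumps(rows, default=str)


def estimate_electors(docs: Iterable[dict],
                      electors_per_state: dict[str, int]) -> dict[str, int]:
    """Winner-take-all tally: each state's electors go to its current leader."""
    by_state: dict[str, Counter] = defaultdict(Counter)
    for d in docs:
        by_state[d["state"]][d["candidate"]] += d["votes"]
    tally: Counter = Counter()
    for state, counts in by_state.items():
        leader, _ = counts.most_common(1)[0]
        tally[leader] += electors_per_state.get(state, 0)
    return dict(tally)


def fetch_and_export(uri: str, db_name: str = "election",
                     coll_name: str = "results") -> str:
    """Read the aggregated collection through the replica set and return JSON."""
    from pymongo import MongoClient
    from pymongo.errors import AutoReconnect

    with MongoClient(uri, serverSelectionTimeoutMS=5000) as client:
        coll = client[db_name][coll_name]
        # Exclude _id so the ObjectId does not end up in the JSON.
        docs = with_retry(lambda: list(coll.find({}, {"_id": 0})),
                          retry_on=(AutoReconnect,))
    return export_results(docs)
```

A candidate can be projected as the winner once their tally reaches 270 of the 538 electoral votes.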