arunkpatra/athena

Programming Challenge


Why?

  1. The immediate goal is to gain some preliminary insights from the data. The data consists of transactional data plus some reference data, and it is essentially written once but read many times for analytical purposes.

How?

  1. It makes sense to load the transactional data (and the reference data as well) into S3, and then use a variety of tools to explore it.
  2. We start with Amazon Redshift: we copy the data from S3 into Redshift and do some exploratory data analysis (EDA). Later on, we will try Redshift Spectrum, which queries the data in place in S3, instead of copying it into Redshift.
  3. We will expose the results of the Redshift queries through a thin REST API layer.
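The load step above can be sketched as SQL generated from Java. This is a minimal sketch, assuming headered CSV files in S3 and an IAM role that Redshift can assume; the class `RedshiftLoadSql`, its methods, and every table, bucket, schema, and role name are hypothetical placeholders, not part of this project. The first method builds a Redshift `COPY` statement; the second builds the `CREATE EXTERNAL SCHEMA` statement that the later Spectrum step would need.

```java
import java.util.regex.Pattern;

/**
 * Hypothetical helper that builds SQL for getting S3 data into (or in front of) Redshift.
 * All table, schema, bucket, and IAM role names passed in are placeholders.
 */
final class RedshiftLoadSql {

    // Accept only plain identifiers so table/schema names cannot smuggle in extra SQL.
    private static final Pattern IDENTIFIER = Pattern.compile("[A-Za-z_][A-Za-z0-9_]*");

    private RedshiftLoadSql() {}

    private static String requireIdentifier(String name) {
        if (!IDENTIFIER.matcher(name).matches()) {
            throw new IllegalArgumentException("Invalid identifier: " + name);
        }
        return name;
    }

    // Single quotes inside a SQL string literal are escaped by doubling them.
    private static String quote(String literal) {
        return "'" + literal.replace("'", "''") + "'";
    }

    /** COPY headered CSV data from an S3 object or prefix into an existing Redshift table. */
    static String copyFromS3(String table, String s3Uri, String iamRoleArn) {
        if (!s3Uri.startsWith("s3://")) {
            throw new IllegalArgumentException("Expected an s3:// URI: " + s3Uri);
        }
        return "COPY " + requireIdentifier(table)
                + " FROM " + quote(s3Uri)
                + " IAM_ROLE " + quote(iamRoleArn)
                + " FORMAT AS CSV IGNOREHEADER 1;";
    }

    /** Later step: an external schema so Spectrum can query the S3 data in place. */
    static String createSpectrumSchema(String schema, String catalogDb, String iamRoleArn) {
        return "CREATE EXTERNAL SCHEMA IF NOT EXISTS " + requireIdentifier(schema)
                + " FROM DATA CATALOG DATABASE " + quote(catalogDb)
                + " IAM_ROLE " + quote(iamRoleArn)
                + " CREATE EXTERNAL DATABASE IF NOT EXISTS;";
    }
}
```

The generated strings would then be run over a JDBC connection to the cluster, e.g. with `java.sql.Statement.execute(sql)`. Generating the SQL in one place keeps the load step repeatable when moving from `COPY` to Spectrum.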

What?

  1. Model card data, customer data, merchant data and transaction log data.
  2. Do EDA to get some meaningful insights.
  3. Expose insights via REST APIs built on the Spring Boot stack.
  4. Time permitting, build a ReactJS UI.
  5. The overarching objective is to have a fully working end-to-end model that can be demonstrated. This demonstration should exhibit sound engineering practices, architectural maturity, design, logical thinking and coding capabilities.
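The modelling and EDA items above can be sketched in plain Java (records need Java 16+). This is a minimal sketch under stated assumptions: the issue does not specify any columns, so the record fields and the class `SpendInsights` are hypothetical. In the project itself the aggregations would presumably run as Redshift `GROUP BY` queries; the in-memory version only shows the shape of an insight the REST layer could return.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical, minimal shapes for the four data sets; real columns are not given in the issue.
record Customer(String customerId, String name) {}
record Card(String cardNumber, String customerId) {}
record Merchant(String merchantId, String name, String category) {}
record Transaction(String cardNumber, String merchantId, long amountCents) {}

final class SpendInsights {
    private SpendInsights() {}

    /** Total spend per merchant category, joining transactions to merchant reference data. */
    static Map<String, Long> spendByCategory(List<Transaction> txns, List<Merchant> merchants) {
        Map<String, String> categoryById = merchants.stream()
                .collect(Collectors.toMap(Merchant::merchantId, Merchant::category));
        return txns.stream()
                // Transactions whose merchant is missing from the reference data are skipped.
                .filter(t -> categoryById.containsKey(t.merchantId()))
                .collect(Collectors.groupingBy(
                        t -> categoryById.get(t.merchantId()),
                        TreeMap::new, // sorted keys give a stable response order
                        Collectors.summingLong(Transaction::amountCents)));
    }

    /** Total spend per customer, resolving each transaction's card to its owning customer. */
    static Map<String, Long> spendByCustomer(List<Transaction> txns, List<Card> cards) {
        Map<String, String> ownerByCard = cards.stream()
                .collect(Collectors.toMap(Card::cardNumber, Card::customerId));
        return txns.stream()
                // Transactions on unknown cards are skipped.
                .filter(t -> ownerByCard.containsKey(t.cardNumber()))
                .collect(Collectors.groupingBy(
                        t -> ownerByCard.get(t.cardNumber()),
                        TreeMap::new,
                        Collectors.summingLong(Transaction::amountCents)));
    }
}
```

A Spring Boot `@RestController` method could delegate to a service like this and return the map, which Spring serializes to JSON. Amounts are kept in cents as `long` to avoid floating-point rounding in sums.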

Exclusions

  1. The model in this challenge is not expected to work at massive scale. The analytical queries can be tuned progressively for scale at a later time.

What next?

See #18

Closing this for now.