Please fork this repository and send the link to anup.saund@gmail.com
This test has been designed to give candidates an opportunity to demonstrate their knowledge of ETL, SQL and Data Warehousing.
Feel free to use any language or technology you are most comfortable with.
Save SQL files to SQL
, exports to EXPORT
and scripts to SCRIPTS
.
-
Design and create a SQL schema to store the
amazon-cell-phone
files. -
Write a script to load the
.csv
data into the tables created in step1
. -
Write a script to export the table data into the
EXPORT
folder in.csv
format. -
Write a script to search Amazon for an ASIN and then update the fields in the
items
table. -
Download GSMARENA database from https://www.kaggle.com/arwinneil/gsmarena-phone-dataset
-
Write an ETL script to:
- Find a match on
items.title
with a GSMArena Model. - Save
4
additonal meta fields intoitems
from the GSMArena data. - Transform the
network_technology
field into a;
delimited string and save this into a new field initems
- Find a match on
-
Write a SQL statement which to display the following;
ASIN, TITLE, Count of reviews from
reviews
table. -
Using the data from step 7, generate a Pie chart in Microsoft Excel/gSheets to show the top 10 products with the highest number of reviews. Export file to -
Exports
-
Given you had 4 months of development time, how would you go about setting up an automation for step
6
that runs hourly, 7 days a week. (Include technology arangement, alternatives, reasons why and pricing) -
Several stakeholders love the report built in step 7. How would you go about making this available for them to run on demand?