Fake Shop dbt

Welcome to Fake Shop. We sell products in various categories to customers. We also like to send emails to these customers.

This dbt project will contain all our data models to help the business make decisions and understand the performance of the company.

However the analytics engineer left before we could finish the project and we need your help to answer some key questions.

The data for the Fake Shop was generated using the python script you can find in data_generator folder. You will need to install the faker python package if you want to run this script yourself. Running the script yourself will allow you to generate more records, however there is a sample of the csv files that are generated by the script in the seeds folder of the project.

A few staging models have been created in the models folder. There are some issues with these models that will need to be resolved before you can continue to help us build out the other data models.

We make use of snowflake as our datawarehouse. If you want to run things yourself you can setup your own warehouse for free for 30 days here https://signup.snowflake.com/. Alternativley use a database that you are comfortable working in.

You can choose to run this project in dbt cloud (Sign up for a free account here https://discover.getdbt.com/free-account/) or you can run it locally. (There are guides on dbt's website on how to do this.)

If you would prefer to run dbt locally instead of in dbt cloud, using Deep Channel (https://www.deepchannel.com/) makes it a lot easier (gives the local dev process a bit of a dbt cloud experience). Alternatively feel free to use any IDE you prefer.

Before you begin

A link to a public Git repository with your final solution must be provided within 48 hours of receipt of the test. (let us know if you need more time) Fork this repo. To help understand how you approach the problem, we will assess your use of source control and how you build to the final solution, checking what is committed along each step (hint: frequent push)

Actions:

  1. Read the yml files in the seeds folder to get a feel for the data.
  2. Fix any errors with the staging models and ensure that they all follow the same conventions.
  3. Create the necessary models (in the appropriate folder in the models folder) to answer the following questions. In the documentation (yml files) for each of the models explain why you chose to model the data in that way. Make sure you include relevant tests for each of the models you create. Create sql files in the analyses folder if a model is not neccesary to answer the question.
  • Which month is the busiest for orders?
  • Which month do we make the most money?
  • Who are our top 10 most loyal customers?
  • Which categories do we sell the most products in?
  • Have we sent any emails to someone who is unsubscribed?
  • What is the least popular product?
  • What product do we issue the most refunds for?
  • The fact email table should contain the last email_id for the email that was sent previously in a column called last_sent_email_id. Use a window function to acheive this.
  1. We have no way of currently tracking when a customer subscribes or unsubscribes, can you create a model that would track this?
  2. Present your findings in an easy to understand manner (you can use any tool that would make this easiest, markup files, google slides, docs, diagrams, graphs, miro boards etc)

Bonus Question:

  1. Oh no, it seems as if we haven't actually been discounting products when they are on sale.
  • Can you find an issue in our fake_shop_data_generator.py file and correct it?
  • How would we track when products go in and out of sale, and be able to test that this doesn't happen again in the future?
  1. Are there any other interesting insights you can share?

Finally

Email the URL of your repo to the NOTHS people team. Please ensure it is publicly accessible.

Helpful Tips

If you struggle completing the task or have concerns over certain aspects that is okay – just highlight it to us when you submit your task for evaluation.

Explain what you couldn't get working and steps you took to solve the problem. Whilst we want to see completed tasks it is just as important for us to see how you approached an issue and attempted to find a solution.

Do not overthink your solution. Keep it simple and use what you know. Write tests for your models.