Promo-Abuser

Promo Abuser Data Engineering Task for Noaat.

Apache License 2.0

Noaat task

Introduction

In a big digital world where stores give out special codes for discounts, there's a problem. Some customers are trying to trick the system. They make lots of accounts to use the same discount promo codes over and over again.

But don't worry! There are smart folks called data engineers. They're like digital detectives. They're on a big mission to stop this cheating and make things fair again.

Dataset Explanation

You will be given 3 datasets: 10,000 users, 1,000 promo codes, and the promo code usages.

IMPORTANT NOTE

Think twice before you implement a solution. Don't go for a shiny solution; you may find that the right one is simple and straightforward.

Some information about the datasets

The dataset was built entirely with the Python Faker library. It is not real data, although it is structured like real data. Some aspects of promo code usage and user registration were taken into consideration when generating it. User attributes like phone, device, location, and IP were also carefully generated to make the dataset more realistic. (That could be a hint, or not ;))

Your task as a data engineer is to do the following:

  1. Propose an ETL pipeline to transfer data from a PostgreSQL table to a data warehouse. Consider utilizing diagrams to illustrate the pipeline.
  2. Utilize an Analytics tool to develop a dashboard showcasing:
    • The top 10 users with the highest promo code usage.
    • The top 10 promo codes with the most usage.
    • A timeline of promo code usage.
    • A timeline of user registrations.
  3. Develop an algorithmic solution or AI model to predict whether a user is engaging in promo code abuse.
  4. Deploy the algorithmic solution or AI model as an API.
  5. Create straightforward documentation detailing how to utilize the API.
  6. Prepare simple documentation outlining the deployment process of the API.
  7. Craft easy-to-follow instructions on how to access the dashboard.
  8. Provide commentary on intriguing discoveries from the data, your problem-solving approach, and any assumptions made throughout the project.
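As a rough illustration of how simple the dashboard aggregations and a first abuse heuristic can be, here is a minimal Python sketch. It is not the expected solution. The field names (`user_id`, `device_id`, `promo_code`), the sample rows, and the `min_accounts` threshold are all assumptions made for the example; the real datasets may use different columns, and a device is only one of the attributes that could link accounts.

```python
from collections import Counter, defaultdict

# Hypothetical sample rows: the field names are assumptions, not the real schema.
users = [
    {"user_id": 1, "device_id": "dev-a"},
    {"user_id": 2, "device_id": "dev-a"},
    {"user_id": 3, "device_id": "dev-a"},
    {"user_id": 4, "device_id": "dev-b"},
]
usages = [
    {"user_id": 1, "promo_code": "SAVE10"},
    {"user_id": 2, "promo_code": "SAVE10"},
    {"user_id": 3, "promo_code": "SAVE10"},
    {"user_id": 4, "promo_code": "SAVE10"},
    {"user_id": 4, "promo_code": "WELCOME"},
]


def top_n(records, key, n=10):
    """Return the n most frequent values of `key` as (value, count) pairs."""
    return Counter(r[key] for r in records).most_common(n)


def flag_shared_device_abuse(users, usages, min_accounts=3):
    """Flag users when at least `min_accounts` accounts on one device
    redeemed the same promo code. The threshold is an arbitrary assumption."""
    device_of = {u["user_id"]: u["device_id"] for u in users}
    accounts = defaultdict(set)  # (device_id, promo_code) -> set of user_ids
    for row in usages:
        device = device_of.get(row["user_id"])
        if device is not None:  # skip usages from unknown users
            accounts[(device, row["promo_code"])].add(row["user_id"])
    flagged = set()
    for ids in accounts.values():
        if len(ids) >= min_accounts:
            flagged |= ids
    return flagged


top_codes = top_n(usages, "promo_code")
top_users = top_n(usages, "user_id")
suspects = flag_shared_device_abuse(users, usages)
```

The same grouping idea could be repeated for phone, IP, or location; whether any of those is a meaningful signal in this dataset is for you to find out.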

Additionally, consider the following questions when preparing your submission:

  1. How will you handle data validation and cleaning during the ETL process?
  2. What scalability considerations are important when designing the ETL pipeline?
  3. What metrics will you use to evaluate the performance of the algorithmic solution or AI model?
  4. How will you ensure the security of the deployed API?
  5. What steps will you take to monitor the performance and reliability of the deployed API?
  6. How will you handle potential errors or failures in the ETL pipeline or API deployment?
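For the metrics question, one common starting point for a binary "abuser / not abuser" prediction is precision, recall, and F1. The sketch below computes them in plain Python. It assumes you have some labeled sample to compare against, which the task does not promise; the `y_true` and `y_pred` lists are made-up values for illustration only.

```python
def precision_recall_f1(y_true, y_pred):
    """Compute precision, recall, and F1 for binary labels (1 = abuser)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)        # abusers correctly flagged
    fp = sum(1 for t, p in pairs if not t and p)    # honest users wrongly flagged
    fn = sum(1 for t, p in pairs if t and not p)    # abusers that were missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Made-up labels and predictions, for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
```

Accuracy alone tends to be misleading when abusers are a small share of users, which is why precision and recall are shown here; which metric matters most depends on the cost of wrongly blocking an honest user versus missing an abuser.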

Deliverables

Your submission should include the following (please provide a link to a GitHub repository, adding a README.md file with the following information):

  1. A detailed explanation of the ETL pipeline, including diagrams.
  2. A dashboard developed using an Analytics tool.
  3. An algorithmic solution or AI model that predicts promo code abuse for a new user of the application.
  4. An API deployed to predict promo code abuse.
  5. Documentation on how to utilize the API.
  6. Documentation on the deployment process of the API.

Timeframe

You are expected to complete this task within an average of 4-5 hours and a maximum of 6 hours. However, feel free to allocate your time as you see fit within this timeframe. Also, please provide a brief explanation of how you spent your time.

Note

We understand that the time frame is limited, and we do not expect a perfect solution. We are more interested in understanding your thought process and how you approach the problem. We also want to evaluate which areas of the task you prioritize and how you manage your time.

Evaluation Criteria

Your submission will be evaluated based on the following criteria:

  1. The quality of the ETL pipeline and its documentation.
  2. The quality of the dashboard and its documentation.
  3. The quality of the algorithmic solution or AI model and its documentation.
  4. The quality of the API and its documentation.
  5. The quality of the deployment documentation.
  6. The quality of the commentary on the data, problem-solving approach, and assumptions made throughout the project.

Submission

Please submit your work by providing a link to a GitHub repository containing the required deliverables. Ensure that the repository is accessible, add me as a collaborator, and provide any necessary instructions for reviewing your work.

Always Remember

  • Make it work, then make it better!
  • Progress Over Perfection (It's okay to leave //TODOs for future improvements).

Good Luck! 🚀