Data Engineering Interview Task

Your task is to create a project that loads and transforms the data from the provided dataset. This is not strictly a data analysis task.

PLEASE DO NOT SPEND MORE THAN 3 HOURS OF YOUR TIME

Task

  1. Prepare a basic structure for your project. You might want to include:
    1. Package requirements using your favourite package manager (pip, poetry etc.)
    2. .gitignore
    3. README
  2. Write code that will:
    1. Load the provided CSV files
    2. Create the following aggregations:
      1. Age distribution
      2. Annual income distribution
      3. Annual income correlated with age
    3. Save the results in a binary format of your choice.
    4. Save the results in a serialization format of your choice.
  3. Create unit tests using your favourite testing library.
  4. Create a Dockerfile that will run the tests inside the container.
  5. Create a public git repository and provide us with a link.

Dataset

Link to Mall Customers Dataset on Kaggle

The dataset is provided in the data folder as a zipped CSV file.
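
Because the file is zipped, it can be read without unpacking it to disk first. The following is a minimal sketch using only the Python standard library; the function name `load_zipped_csv` is hypothetical, and it assumes the archive contains at least one UTF-8 encoded `.csv` file with a header row.

```python
import csv
import io
import zipfile


def load_zipped_csv(zip_path):
    """Return the rows of the first CSV file in the archive as a list of dicts."""
    with zipfile.ZipFile(zip_path) as archive:
        csv_names = [n for n in archive.namelist() if n.lower().endswith(".csv")]
        if not csv_names:
            raise ValueError(f"no CSV file found in {zip_path}")
        with archive.open(csv_names[0]) as raw:
            # Wrap the binary stream so csv can read it as text (UTF-8 assumed).
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            return list(csv.DictReader(text))
```

All values come back as strings, so numeric columns need converting before any aggregation.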

Additional Questions

If you have any questions regarding the tasks, please do not hesitate to contact us.