Data-Engineering-Challenge

In this challenge, I solved a data engineering problem.

Primary Language: Jupyter Notebook

The challenge mainly involved complex feature engineering on a large dataset. I used the following stack to solve it:

  • PySpark 2.4 (for all feature engineering)
  • Docker (for launching a Spark cluster in local mode)

The notebook in this repository is a good reference point if you want to see how pandas and PySpark are used together to create complex features.