DE-Kaggle-Airflow-Azure

Goal

This project is based on a video tutorial by Darshil Parmar.

The goal of this project is to build a basic ETL data pipeline for Kaggle data using Python, an Azure Virtual Machine, Azure Storage, and Apache Airflow as the pipeline orchestration tool.


What Did I Learn?

  • The original tutorial was built on AWS; I implemented the whole project on Azure instead.
  • Hands-on experience with Python, Azure, Azure Storage Accounts, virtual machines (VMs), and Airflow.
  • Building an end-to-end pipeline orchestrated by Airflow.
  • Challenges overcome:
    • Setting up the VM and fixing errors while downloading dependencies.
    • The Kaggle API downloads files to a local directory, but I wanted them to go directly to Azure Blob Storage. This is where I learned about Python's tempfile library, which lets you download to a temporary location and then upload from there.
    • While installing Airflow on the Ubuntu VM, Airflow raised errors, so I had to create a virtual environment (venv) and install Airflow inside it.
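The Kaggle-to-Blob workaround described above can be sketched as follows. This is a minimal sketch, not the project's actual code: it assumes the `kaggle` and `azure-storage-blob` packages, a configured `kaggle.json` credentials file, and an Azure Storage connection string, and the helper names `stage_and_upload`, `kaggle_downloader`, and `azure_blob_uploader` are hypothetical. The download and upload steps are passed in as functions so the temporary-directory logic can be checked without cloud access.

```python
import os
import tempfile
from typing import BinaryIO, Callable, List

# Pluggable steps (hypothetical names): a downloader fills a directory,
# an uploader receives (blob_name, open binary file).
Downloader = Callable[[str], None]
Uploader = Callable[[str, BinaryIO], None]


def stage_and_upload(download: Downloader, upload: Uploader) -> List[str]:
    """Download into a temporary directory, upload every file, then clean up."""
    uploaded: List[str] = []
    # TemporaryDirectory deletes the folder and its contents when the block exits,
    # so the downloaded files do not persist on the VM's disk.
    with tempfile.TemporaryDirectory() as tmp_dir:
        download(tmp_dir)
        for root, _dirs, files in os.walk(tmp_dir):
            for filename in sorted(files):
                local_path = os.path.join(root, filename)
                # Keep any sub-folder structure in the blob name, with "/" separators.
                blob_name = os.path.relpath(local_path, tmp_dir).replace(os.sep, "/")
                with open(local_path, "rb") as fh:
                    upload(blob_name, fh)
                uploaded.append(blob_name)
    return uploaded


def kaggle_downloader(dataset: str) -> Downloader:
    """Downloader for a Kaggle dataset slug such as "owner/dataset" (assumes kaggle.json)."""
    def _download(target_dir: str) -> None:
        # Imported lazily: importing the kaggle package may try to authenticate immediately.
        from kaggle.api.kaggle_api_extended import KaggleApi
        api = KaggleApi()
        api.authenticate()
        api.dataset_download_files(dataset, path=target_dir, unzip=True)
    return _download


def azure_blob_uploader(connection_string: str, container: str) -> Uploader:
    """Uploader that writes each file into an Azure Blob Storage container."""
    from azure.storage.blob import BlobServiceClient  # pip install azure-storage-blob
    container_client = BlobServiceClient.from_connection_string(
        connection_string
    ).get_container_client(container)

    def _upload(blob_name: str, data: BinaryIO) -> None:
        # overwrite=True replaces a blob with the same name from an earlier run.
        container_client.upload_blob(name=blob_name, data=data, overwrite=True)
    return _upload
```

On the VM this could be wired up as `stage_and_upload(kaggle_downloader("owner/dataset"), azure_blob_uploader(conn_str, "raw"))`, where the dataset slug, `conn_str`, and the container name `raw` are placeholders. Passing the steps in as functions keeps the temp-file handling independent of the Kaggle and Azure SDKs.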
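The venv workaround for installing Airflow can also be scripted from Python. This is a sketch under assumptions: the helper names `create_venv` and `airflow_install_command`, the environment location, and the example Airflow version are hypothetical, and the constraints-file URL follows the pattern that Airflow's installation documentation recommends for pip installs. The sketch only builds the install command; running it is left to the caller.

```python
import sys
import venv
from pathlib import Path
from typing import List


def create_venv(env_dir: Path, with_pip: bool = True) -> Path:
    """Create a virtual environment and return the path to its Python interpreter."""
    # with_pip=True bootstraps pip via ensurepip; on Ubuntu this may require
    # the python3-venv system package.
    venv.create(env_dir, with_pip=with_pip)
    if sys.platform == "win32":
        return env_dir / "Scripts" / "python.exe"
    return env_dir / "bin" / "python"


def airflow_install_command(python: Path, airflow_version: str) -> List[str]:
    """Build a pip command that installs Airflow pinned by its constraints file."""
    # The venv uses the same interpreter version as the one running this script.
    py_version = f"{sys.version_info.major}.{sys.version_info.minor}"
    constraints = (
        "https://raw.githubusercontent.com/apache/airflow/"
        f"constraints-{airflow_version}/constraints-{py_version}.txt"
    )
    return [
        str(python), "-m", "pip", "install",
        f"apache-airflow=={airflow_version}",
        "--constraint", constraints,
    ]
```

On the VM this could be run as `subprocess.run(airflow_install_command(create_venv(Path.home() / "airflow-venv"), "2.9.3"), check=True)`, where the `airflow-venv` path and version `2.9.3` are only examples. Installing into the venv's own interpreter keeps Airflow's pinned dependencies away from the system Python packages.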

Tools Used

  • Python
  • Jupyter
  • VS Code
  • Azure Virtual Machine (VM)
  • Azure Storage Account


Data Architecture

[Architecture diagram]