Data-Cleaning-Pipeline-ETL

This project extracts data from Azure Data Lake Storage Gen2, transforms it, and then loads it into an Azure SQL database.

Click here to see the dataset.

Problem Statement

The task was to extract two CSV files, each containing more than 40,000 rows, from Azure Data Lake Storage Gen2, combine them into a single table with a SQL join, perform cleaning steps such as removing null values and unnecessary columns, and then load the result into an Azure SQL database.
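
The actual work is done by an Azure Data Factory pipeline, but the same extract-join-clean-load sequence can be sketched in Python with pandas for illustration. The file names, join key, dropped columns, and connection string below are hypothetical placeholders, not the project's real configuration.

```python
# Illustrative sketch of the ETL steps; all names below are hypothetical.
import pandas as pd
from sqlalchemy import create_engine

# Extract: read the two CSV files (in the real pipeline these sit in
# Azure Data Lake Storage Gen2).
customers = pd.read_csv("customers.csv")   # 40k+ rows
orders = pd.read_csv("orders.csv")         # 40k+ rows

# Transform: join the two tables on a shared key (the equivalent of the
# SQL join in the pipeline), then clean the combined result.
combined = customers.merge(orders, on="customer_id", how="inner")
combined = combined.dropna()                                  # remove null values
combined = combined.drop(columns=["unused_col_1", "unused_col_2"])  # drop unneeded columns

# Load: write the cleaned table to an Azure SQL database.
engine = create_engine(
    "mssql+pyodbc://user:password@server.database.windows.net/dbname"
    "?driver=ODBC+Driver+18+for+SQL+Server"
)
combined.to_sql("cleaned_data", engine, if_exists="replace", index=False)
```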

The JSON Files

The First Portfolio Project.json file contains information about the ADF pipeline, including the pipeline name, description, and the resources that make up the pipeline. The manifest.json file describes the dependencies and structure of the pipeline's ARM template in Azure Data Factory.
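
One quick way to see what the template defines is to inspect it programmatically. The snippet below is a generic sketch that assumes the standard ARM template layout (a top-level "resources" array whose entries may carry "dependsOn" lists); it is not tied to the exact contents of this repository's files.

```python
# Generic inspection of an exported ADF ARM template; assumes the standard
# ARM layout with a top-level "resources" array and optional "dependsOn" lists.
import json

with open("First Portfolio Project.json", encoding="utf-8") as f:
    template = json.load(f)

# List every resource defined by the template along with its dependencies.
for resource in template.get("resources", []):
    name = resource.get("name", "<unnamed>")
    rtype = resource.get("type", "<unknown type>")
    print(f"{rtype}: {name}")
    for dep in resource.get("dependsOn", []):
        print(f"    depends on {dep}")
```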

Workflow

(Workflow diagram of the ETL pipeline)

Pipeline Structure

(Screenshot of the pipeline structure in Azure Data Factory)

Data at the Destination (SQL database)

(Screenshot of the cleaned data in the Azure SQL database)