- Adwords API - Google Adwords API
- Azure Data Lake Store - NoSQL Data Store for Adwords Dummy Data + Potential Other Sources
- Azure Data Factory - Data Pipelining tool for SQL & NoSQL Integration
- Azure Data Lake Analytics - Data Transformation Tool
- PowerBI - Visualization tool
The purpose of this mini project was to demonstrate integrating multiple data sources into an Azure Data Lake pipeline and visualizing the result in PowerBI.
- Data lakes handle unstructured and semi-structured data very well.
- Data does not need to conform to a set schema, so it can be read easily and quickly.
- Because data in a lake is loosely structured, work can be very agile and changes can be made on the fly.
- Integrating new data sources is very easy: data does not need to be transformed prior to entering the lake.
- Created different dummy data sources with:
- AWS S3 - Simple Storage Service - Adwords Dummy Data
- Local Python API: connects my Adwords account to Azure Data Lake
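A minimal sketch of how a set of dummy TSV files like this could be generated locally before uploading them to S3. The column names, value ranges, and file naming are hypothetical placeholders, not the actual AdWords report fields used in this project.

```python
import csv
import random
from pathlib import Path

# Hypothetical column names; real AdWords report fields may differ.
COLUMNS = ["day", "campaign", "impressions", "clicks", "cost"]


def write_dummy_tsv(path, rows, seed):
    """Write one TSV file of random AdWords-like rows."""
    rng = random.Random(seed)  # seeded so each file is reproducible
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f, delimiter="\t", lineterminator="\n")
        writer.writerow(COLUMNS)
        for i in range(rows):
            impressions = rng.randint(100, 10_000)
            clicks = rng.randint(0, impressions)
            writer.writerow([
                (i % 28) + 1,                      # placeholder day of month
                f"campaign_{rng.randint(1, 5)}",   # placeholder campaign name
                impressions,
                clicks,
                round(clicks * rng.uniform(0.1, 2.0), 2),
            ])


def write_dummy_dataset(out_dir, n_files=75, rows_per_file=100):
    """Write n_files dummy TSV files into out_dir and return their paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for n in range(n_files):
        path = out / f"adwords_dummy_{n:03d}.tsv"
        write_dummy_tsv(path, rows_per_file, seed=n)
        paths.append(path)
    return paths
```

Calling `write_dummy_dataset("dummy_data")` would produce 75 files in a local `dummy_data` folder, which could then be uploaded to an S3 bucket with whatever tool you prefer.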
- Create an Azure Data Lake Store (when you first create it, there's no data in it).
- Create a data pipeline and a job to run batch integration jobs from S3 to ADLS.
- Here I'm transferring all 75 TSV files containing dummy Adwords data into ADLS.
- What the Dataset looks like
- Once data has been loaded into ADLS, it has to be transformed before it can be visualized or used by data scientists and analysts.
- We need to recombine all 75 TSV files into a single TSV in order to continue with our work.
- To transform the data, we need to write a U-SQL script (a combination of SQL and .NET code).
- This helps clear out some of the missing/null values.
- It also sets the correct encoding (Unicode UTF-8).
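The project does this step with a U-SQL script in Azure Data Lake Analytics. For readers who want to see the logic locally, here is a rough Python analogue, not the original script: it merges a folder of TSV files into one UTF-8 TSV and skips rows with missing values. The function name, the list of null markers, and the choice to drop (rather than fill) incomplete rows are all assumptions made for illustration.

```python
import csv
from pathlib import Path


def merge_tsv_files(src_dir, dest_path, null_markers=("", "NULL", "null", "--")):
    """Merge every *.tsv in src_dir into one UTF-8 TSV.

    Rows containing a missing value (per null_markers, an assumed list)
    or the wrong number of columns are skipped. Returns the number of
    data rows written.
    """
    files = sorted(Path(src_dir).glob("*.tsv"))
    header = None
    kept = 0
    with open(dest_path, "w", newline="", encoding="utf-8") as out:
        writer = csv.writer(out, delimiter="\t", lineterminator="\n")
        for path in files:
            # utf-8-sig also strips a byte-order mark if one is present
            with open(path, newline="", encoding="utf-8-sig") as f:
                reader = csv.reader(f, delimiter="\t")
                file_header = next(reader, None)
                if file_header is None:
                    continue  # empty file, nothing to merge
                if header is None:
                    header = file_header
                    writer.writerow(header)  # write the header only once
                elif file_header != header:
                    raise ValueError(f"{path.name}: header differs from first file")
                for row in reader:
                    if len(row) != len(header):
                        continue  # malformed row
                    if any(cell.strip() in null_markers for cell in row):
                        continue  # row has a missing/null value
                    writer.writerow(row)
                    kept += 1
    return kept
```

Write the merged file to a different folder than `src_dir`; otherwise a second run would pick up the previous output as one of its inputs.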
- We weren't allowed to use our Developer Token until it was approved by Google, so we couldn't pull data from our production account.
- Some of the data is not transformed 100% correctly, but for the purpose of demonstration, this is a quick way to create a data pipeline that lets analysts and data scientists visualize data sources quickly and responsively.