This repository contains Cloud Data Engineering training materials developed by Myla Ram Reddy.
Please contact Renuka for training and the DP-203: Data Engineering on Microsoft Azure exam @ 8374899166 (WhatsApp).
- Install Anaconda
- understand the Markdown language
- How to write Python code in plain Notepad
- How to write Python code in Spyder
- How to write Python code in Visual Studio Code
- How to write Python code in Jupyter / JupyterLab
- Different Python Objects
- int
- float
- complex
- str
- bool
- range
- Data Structures
- list
- Dict
- Tuple
- Set
- Mutable Vs Immutable
- Read items of str / list / Dict / Tuple / Set / range, etc. (see the sketch after this list)
- index
- slice
- fancy
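A minimal Python sketch of the indexing, slicing, and (NumPy) fancy-indexing topics above; the sample values are illustrative only:

```python
import numpy as np

s = "data engineering"
items = [10, 20, 30, 40, 50]

print(s[0], items[-1])      # indexing: first character, last element
print(s[0:4], items[1:4])   # slicing: 'data', [20, 30, 40]

# "Fancy" indexing (NumPy): pick arbitrary positions at once
arr = np.array(items)
print(arr[[0, 2, 4]])       # [10 30 50]
```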
- Operators (see the sketch after this list)
- Comparison (>, <, >=, <=, ...)
- Logical/bool (and/or/not)
- NumPy logical (logical_and/logical_or/logical_not)
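A small illustration of comparison, Python logical, and NumPy logical operators (sample values only):

```python
import numpy as np

a, b = 10, 20
print(a > b, a <= b)                 # comparison operators -> False True

# Python logical operators work on single booleans
print(a > 5 and b > 15, not a > b)   # True True

# NumPy logical functions work element-wise on arrays
x = np.array([1, 5, 10])
print(np.logical_and(x > 2, x < 8))  # [False  True False]
```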
- Control Flows (see the sketch after this list)
- input
- if elif elif ... else
- while loop
- break
- continue
- for loop
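A short sketch combining input(), if/elif/else, a while loop with break, and a for loop with continue:

```python
# if / elif / else driven by input()
marks = int(input("Enter marks: "))

if marks >= 75:
    print("Distinction")
elif marks >= 50:
    print("Pass")
else:
    print("Fail")

n = 0
while True:          # while loop stopped with break
    n += 1
    if n == 3:
        break

for i in range(5):   # for loop skipping even numbers with continue
    if i % 2 == 0:
        continue
    print(i)         # prints 1, 3
```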
- System_Defined_Functions
- create functions
- function parameters (see the sketch after this list)
- mandatory parameters
- optional parameters
- flexible parameters
- key-value flexible parameters
- LEGB_scope_of_objects_of_functions
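A minimal sketch of mandatory, optional, flexible (*args), and key-value flexible (**kwargs) parameters; the function and argument names are made up for illustration:

```python
def order_report(customer, city="Hyderabad", *items, **charges):
    """customer is mandatory, city is optional,
    *items collects flexible positional values, **charges collects key-value values."""
    total = sum(charges.values())
    print(customer, city, items, total)

order_report("Ravi")                                   # only the mandatory parameter
order_report("Anu", "Bengaluru", "laptop", "mouse",    # flexible positional values
             shipping=150, gift_wrap=50)               # flexible key-value values
```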
- Methods
- Modules
- User_defined_packages
- system_defined_packages
- Iterables & Iterators
- Lambda_Functions
- Syntax Errors and Exceptions
- List comprehensions (example below)
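A quick illustration of a lambda function and a list comprehension:

```python
nums = [1, 2, 3, 4, 5]

# Lambda function: a small anonymous function
square = lambda x: x * x
print(square(4))                        # 16

# List comprehension: build a list in one expression
even_squares = [x * x for x in nums if x % 2 == 0]
print(even_squares)                     # [4, 16]
```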
- OOPs_Introduction_Classes_Objects_Attributes_Methods
- OOPs_Inheritance_and_MRO
- OOPs_Encapsulation
- OOPs_Polymorphism
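A minimal sketch tying the OOP topics above together (classes, attributes, methods, inheritance/MRO, encapsulation, polymorphism); the class names are illustrative:

```python
class Employee:                       # class with attributes and a method
    def __init__(self, name, salary):
        self.name = name
        self._salary = salary         # encapsulation: "protected" by convention

    def pay(self):
        return self._salary

class Manager(Employee):              # inheritance; MRO is Manager -> Employee -> object
    def pay(self):                    # polymorphism: overrides Employee.pay
        return super().pay() + 10000

for emp in (Employee("Ravi", 50000), Manager("Anu", 70000)):
    print(emp.name, emp.pay())        # same call, different behaviour

print(Manager.__mro__)                # method resolution order
```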
- BigData Introduction
- What is BigData
- BigData properties
- When to choose BigData
- BigData VM Installation
- Oracle VirtualBox installation
- Cloudera VM installation
- WinSCP Installation
- PuTTY Installation
- Linux commands
- Working with folders
- create folder
- remove folder with files
- remove folder without files
- understanding VI editor
- working with Files
- create a file
- copy file
- move file
- remove file
- cat command
- understanding permissions
- grep command
- find command
- ... etc
- HDFS
- mkdir command
- put command
- get command
- copyFromLocal command
- copyToLocal command
- rm Command
- merge command
- ... etc
- Hive
- Hive Metastore
- Hive Managed Tables
- Hive External Tables
- Hive Operations
- Hadoop file formats and their types
- Different ways to connect to Hive
- Partitioning
- Bucketing
- Sqoop
- Sqoop Introduction
- sqoop list-tables
- Sqoop Eval
- Sqoop Import
- Sqoop Export
- Import All Tables
- Import table from MySQL to Hive
- PySpark (see the sketch after this list)
- Spark Introduction
- Spark Architecture
- Spark Environment Setup (optional)
- Spark RDD with Python
- Spark RDD with Scala
- Spark DF
- Spark SQL
- Spark Structured Streaming
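A minimal PySpark sketch showing a SparkSession, a small DataFrame, and the same data queried through Spark SQL (sample data only):

```python
from pyspark.sql import SparkSession

# On Databricks a SparkSession already exists as `spark`; getOrCreate() reuses it
spark = SparkSession.builder.appName("intro").getOrCreate()

df = spark.createDataFrame([("HYD", 100), ("BLR", 250)], ["city", "orders"])

# DataFrame API and Spark SQL are two views of the same engine
df.filter(df.orders > 150).show()

df.createOrReplaceTempView("orders_by_city")
spark.sql("SELECT city, orders FROM orders_by_city ORDER BY orders DESC").show()
```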
- Introduction
- ETL Introduction
- ELT Introduction
- Different ETL Tools
- Azure Data Factory Introduction
- Azure Data Factory - Important Concepts in ADF
- ADF Architecture
- Create Azure Free Account with credit card
- Create Azure Free Account without credit card
- Storage Account (see the Python SDK sketch after this list)
- Introduction
- What is subscription
- What is resource group
- create resource group
- Create Storage Account
- Differences among LRS/GRS/ZRS/GZRS
- Difference between Hot and Cool Tiers
- Create Data Lake Gen 2
- Create Containers
- Create Folders
- Upload Files
- Overwrite Files
- Download Files
- Edit Files
- Preview Files in different formats
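A minimal sketch of uploading and downloading a blob with the azure-storage-blob Python SDK; the connection string, container, and file names are placeholders:

```python
from azure.storage.blob import BlobServiceClient

# Connection string from the storage account's Access keys blade (placeholder here)
conn_str = "DefaultEndpointsProtocol=https;AccountName=<account>;AccountKey=<key>;EndpointSuffix=core.windows.net"

service = BlobServiceClient.from_connection_string(conn_str)
container = service.get_container_client("raw")          # assumes a container named "raw"

# Upload a local file (overwrite if the blob already exists)
with open("sales.csv", "rb") as data:
    container.upload_blob(name="input/sales.csv", data=data, overwrite=True)

# Download it back
downloaded = container.download_blob("input/sales.csv").readall()
print(len(downloaded), "bytes downloaded")
```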
- Azure SQL Database
- Create SQL Database
- Create SQL Server
- Create Username and password
- Allow Azure resources and selected IPs access
- Create tables and insert data
- Query Tables
- Install SSMS
- Access Azure SQL Database using SSMS
- Linked Service
- Create Linked Service to BLOB
- Create Linked Service to Azure SQL Database
- Create Linked Service to MSFT SQL Server
- Create Linked Service to Batch Account
- .... etc
- Test Linked Service Connection
- Integration Run Times
- What is Integration Run Time
- Types of IRs
- Azure integration runtime
- Self-hosted integration runtime
- Azure-SQL Server Integration Services (SSIS) integration runtime
- Install Self-Hosted IR
- Configuration of Self-Hosted IR
- DataSets
- Create Source Datasets
- Create Sink Datasets
- Preview data
- Create Lookup datasets
- Understand and preview data
- BLOB to BLOB Pipeline
- Create Pipeline
- Map source Dataset
- Map Sink Dataset
- Debug
- Trigger
- Understand output of run steps
- Understand Json log in each step
- Azure Storage Account Integration with ADF
- Copy multiple files from blob to blob
- Filter activity - Dynamic Copy Activity
- Get File Names from Folder Dynamically
- Copy Activity Behavior in ADF
- Copy Activity Performance Tuning in ADF
- Get Count of files from folder in ADF
- Validate copied data between source and sink in ADF
- Azure SQL Database integration with ADF
- Azure SQL Databases - Introduction - Relational databases in Azure
- Overwrite and Append Modes in Copy Activity in ADF
- Incremental Load (see the sketch after this list)
- What is full load
- What is incremental load
- types of incremental loads
- Incrementally load data from Azure SQL Database to Azure Blob storage
- Incrementally load data from multiple tables in SQL Server to a database in Azure SQL Database
- Incrementally copy new and changed files based on LastModifiedDate
- Incrementally copy new files based on time partitioned file name
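A hypothetical watermark-based incremental load sketched in PySpark; the server, table, and column names (e.g. ModifiedDate) are placeholders, and in practice ADF's Lookup/Copy activities or a control table would manage the watermark:

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("incremental-load").getOrCreate()

# Last successfully loaded watermark; in practice read from a control table or file
last_watermark = "2024-01-01 00:00:00"

jdbc_url = "jdbc:sqlserver://<server>.database.windows.net:1433;database=<db>"  # placeholder

# Pull only rows changed since the last watermark (assumes a ModifiedDate column)
changed = (spark.read.format("jdbc")
           .option("url", jdbc_url)
           .option("dbtable", f"(SELECT * FROM dbo.Orders WHERE ModifiedDate > '{last_watermark}') src")
           .option("user", "<user>").option("password", "<password>")
           .load())

# Append the delta to the sink and record the new watermark for the next run
changed.write.mode("append").parquet("/mnt/blob/orders_incremental/")
new_watermark = changed.agg(F.max("ModifiedDate")).first()[0]
```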
- Logic Apps
- Send Succeeded mail of ADF pipeline with run stats
- Send Failed mail of ADF pipeline with error message
- Branching and chaining activities
- Azure Devops
- Create organization
- create project
- create Git main branch
- connect Git to ADF
- create a branch in ADF
- publish ADF work in Git branch
- delete git branches
- understand commit in git
- understand and debug merge conflicts
- DBFS (Databricks File System)
- What is DBFS
- Navigate around DBFS
- Understanding path of DBFS
- Compute (creating clusters)
- what is cluster
- create cluster
- map cluster to notebook
- Workspace (Creating notebooks and working with notebooks)
- Understand workspace
- create folders
- organize content in the workspace
- Spark Introduction
- Spark Architecture
- Creating RDDs (Resilient Distributed Datasets) - see the sketch after this list
- what is RDD
- create RDD
- Query RDD
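A minimal RDD sketch (parallelize, transform, collect); the sample data is illustrative:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("rdd-demo").getOrCreate()
sc = spark.sparkContext

# Create an RDD from a local Python list
nums = sc.parallelize([1, 2, 3, 4, 5])

# Query/transform it: transformations are lazy, actions trigger execution
squares = nums.map(lambda x: x * x)
evens = squares.filter(lambda x: x % 2 == 0)
print(evens.collect())   # [4, 16]
```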
- Creating DataFrame (see the sketch after this list)
- what is DF
- create DF
- add columns to DF
- drop columns from DF
- query required data from DF
- .. etc
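A minimal DataFrame sketch covering create, add/drop columns, and querying required data (sample rows only):

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("df-demo").getOrCreate()

df = spark.createDataFrame(
    [(1, "Ravi", 45000.0), (2, "Anu", 52000.0)],
    ["id", "name", "salary"])

df = df.withColumn("bonus", F.col("salary") * 0.10)                 # add a column
df = df.drop("bonus")                                               # drop a column
df.select("name", "salary").filter(F.col("salary") > 50000).show()  # query required data
```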
- Reading and writing data from semi-structured formats (see the sketch after this list)
- Reading JSON Files SingleLine/ MultiLine / Complex
- Reading XML Files
- Reading CSV / TSV Files
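A sketch of reading single-line/multi-line JSON and CSV/TSV files; the paths are placeholders, and XML needs the external spark-xml package:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("semi-structured").getOrCreate()

# JSON: single-line records by default; multiLine=True for pretty-printed/multi-line records
single = spark.read.json("/data/events_singleline.json")
multi  = spark.read.option("multiLine", True).json("/data/events_multiline.json")

# CSV / TSV: header and schema inference are optional
csv = spark.read.option("header", True).option("inferSchema", True).csv("/data/sales.csv")
tsv = spark.read.option("header", True).option("sep", "\t").csv("/data/sales.tsv")

# XML requires the external spark-xml package on the cluster, e.g.:
# spark.read.format("xml").option("rowTag", "record").load("/data/records.xml")
```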
- Reading and writing data from structured formats
- Reading data from MySQL / SQL Server / Oracle, etc. (see the sketch after this list)
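A sketch of reading a relational table over JDBC; the URL, credentials, and table name are placeholders, and the matching JDBC driver JAR must be available on the cluster:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("jdbc-read").getOrCreate()

df = (spark.read.format("jdbc")
      .option("url", "jdbc:mysql://<host>:3306/<database>")   # placeholder connection details
      .option("dbtable", "customers")
      .option("user", "<user>")
      .option("password", "<password>")
      .load())

df.show()
# Writing back works the same way through df.write.format("jdbc") with equivalent options
```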
- Reading and writing data from Big Data formats (see the sketch after this list)
- Parquet
- ORC
- AVRO
- ... etc
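A sketch of writing and reading Parquet, ORC, and Avro; the paths are placeholders and Avro requires the external spark-avro module:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("file-formats").getOrCreate()
df = spark.read.option("header", True).csv("/data/sales.csv")   # placeholder source

# Parquet and ORC are built in
df.write.mode("overwrite").parquet("/out/sales_parquet")
df.write.mode("overwrite").orc("/out/sales_orc")

# Avro ships as an external module (spark-avro); the format name is "avro"
df.write.format("avro").mode("overwrite").save("/out/sales_avro")

parquet_df = spark.read.parquet("/out/sales_parquet")
parquet_df.show()
```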
- Reading and writing data from AWS S3
- Reading and writing data from Azure Blob
- PySpark Joins
- PySpark Union / UnionAll
- Scopes
- Delta Lake (see the sketch at the end of this list)
- ACID Transactions
- Delta Live Tables
- COPY INTO
- Auto Loader
- Convert Parquet or Iceberg data to Delta Lake
- Scheduling jobs
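A minimal Delta Lake sketch (write, read, and convert existing Parquet data in place); it assumes a Databricks or delta-enabled Spark environment and the paths are placeholders:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("delta-demo").getOrCreate()

df = spark.read.option("header", True).csv("/data/sales.csv")   # placeholder source

# Write as a Delta table (ACID transactions, time travel); path is illustrative
df.write.format("delta").mode("overwrite").save("/mnt/delta/sales")

# Read it back
sales = spark.read.format("delta").load("/mnt/delta/sales")
sales.show()

# Convert an existing Parquet directory to Delta Lake in place
spark.sql("CONVERT TO DELTA parquet.`/mnt/raw/sales_parquet`")
```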