/Cloud-Data-Engineering

Azure Data Factory


Cloud Data Engineering Course Content

For training, contact +91 8374899166


This repository contains Cloud Data Engineering training materials developed by Myla Ram Reddy.

Please contact Renuka for training and for Exam DP-203: Data Engineering on Microsoft Azure at 8374899166 (WhatsApp).


Python Basic Level

  1. Install Anaconda
  2. Understand Markdown
  3. How to write Python code in Notepad
  4. How to write Python code in Spyder
  5. How to write Python code in Visual Studio Code
  6. How to write Python code in Jupyter / JupyterLab
  7. Different Python Objects
  8. int
  9. float
  10. complex
  11. str
  12. bool
  13. range
  14. Data Structures
  15. list
  16. dict
  17. tuple
  18. set
  19. Mutable vs. immutable
  20. Read items of str / list / dict / tuple / set / range, etc.
  21. index
  22. slice
  23. fancy indexing
  24. Operators
  25. Comparison (>, <, >=, <=, ...)
  26. Logical/bool (and / or / not)
  27. NumPy logical (logical_and / logical_or / logical_not)
  28. Control Flows
  29. input
  30. if elif elif ... else
  31. while loop
  32. break
  33. continue
  34. for loop
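
As a rough illustration of several topics in the list above (index, slice, mutable vs. immutable, and control flow), here is a minimal sketch using only built-in types. The variable names are made up for this example, and NumPy-style fancy indexing is left out so the block needs nothing beyond the standard library.

```python
# Indexing and slicing work the same way on str, list, tuple and range.
text = "engineering"
items = [10, 20, 30, 40, 50]

first_char = text[0]          # index: a single element
last_item = items[-1]         # a negative index counts from the end
middle = items[1:4]           # slice: start is inclusive, stop is exclusive
every_other = items[::2]      # slice with a step of 2

# Lists are mutable: appending through an alias changes the same object.
alias = items
alias.append(60)

# Tuples are immutable: item assignment raises TypeError.
point = (1, 2)
try:
    point[0] = 99
    tuple_changed = True
except TypeError:
    tuple_changed = False

# Control flow: a for loop with continue and break, plus if / elif / else.
labels = []
for n in range(10):
    if n == 0:
        continue              # skip zero
    elif n > 5:
        break                 # stop once n exceeds 5
    elif n % 2 == 0:
        labels.append(f"{n}:even")
    else:
        labels.append(f"{n}:odd")
```

Because `alias` and `items` refer to the same list, the appended value is visible through both names, which is the practical meaning of "mutable" here.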

Advanced Python

  1. System_Defined_Functions
  2. Create functions
  3. Function parameters
  4. Mandatory parameters
  5. Optional parameters
  6. Flexible parameters
  7. Key-value flexible parameters
  8. LEGB_scope_of_objects_of_functions
  9. Methods
  10. Modules
  11. User_defined_packages
  12. system_defined_packages
  13. Iterables & Iterators
  14. Lambda_Functions
  15. Syntax Errors and Exceptions
  16. List comprehensions
  17. OOPs_Introduction_Classes_Objects_Attributes_Methods
  18. OOPs_Inheritance_and_MRO
  19. OOPs_Encapsulation
  20. OOPs_Polymorphism
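
The parameter kinds, LEGB scope, lambda functions, list comprehensions, and MRO named in the list above can be sketched in a few lines. This assumes that "flexible" and "key-value flexible" parameters refer to `*args` and `**kwargs`; the function and class names (`describe`, `Base`, `Left`, `Right`, `Child`) are hypothetical and not taken from the course notebooks.

```python
def describe(name, role="engineer", *skills, **details):
    """name is mandatory, role is optional, *skills and **details are flexible."""
    parts = [f"{name} ({role})"]
    if skills:
        parts.append("skills: " + ", ".join(skills))
    for key in sorted(details):
        parts.append(f"{key}={details[key]}")
    return "; ".join(parts)


only_mandatory = describe("Asha")
with_extras = describe("Asha", "analyst", "SQL", "Spark", city="Pune", years=3)

# LEGB: a name is looked up in Local, Enclosing, Global, then Built-in scope.
level = "global"


def outer():
    level = "enclosing"

    def inner():
        return level          # not local to inner, so the enclosing value is used

    return inner()


# A lambda function used inside a list comprehension with a filter.
square = lambda x: x * x
even_squares = [square(n) for n in range(6) if n % 2 == 0]


# Multiple inheritance: the MRO decides which who() is called on Child.
class Base:
    def who(self):
        return "Base"


class Left(Base):
    def who(self):
        return "Left"


class Right(Base):
    def who(self):
        return "Right"


class Child(Left, Right):
    pass


mro_names = [cls.__name__ for cls in Child.__mro__]
```

Inspecting `Child.__mro__` is a quick way to see why `Child().who()` resolves to the `Left` implementation before `Right` or `Base`.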

Big Data

  1. Big Data Introduction
  • What is Big Data
  • Big Data properties
  • When to choose Big Data
  2. Big Data VM Installation
  • Oracle VirtualBox installation
  • Cloudera VM installation
  • WinSCP installation
  • PuTTY installation
  3. Linux commands
  • Working with folders
  • Create a folder
  • Remove a folder with files
  • Remove a folder without files
  • Understanding the vi editor
  • Working with files
  • Create a file
  • Copy a file
  • Move a file
  • Remove a file
  • cat command
  • Understanding permissions
  • grep command
  • find command
  • ... etc
  4. HDFS
  • mkdir command
  • put command
  • get command
  • copyFromLocal command
  • copyToLocal command
  • rm command
  • getmerge command
  • ... etc
  5. Hive
  • Hive Metastore
  • Hive managed tables
  • Hive external tables
  • Hive operations
  • Hadoop file formats and their types
  • Different ways to connect to Hive
  • Partitioning
  • Bucketing
  6. Sqoop
  • Sqoop introduction
  • Sqoop list-tables
  • Sqoop eval
  • Sqoop import
  • Sqoop export
  • Import all tables
  • Import a table from MySQL to Hive
  7. PySpark
  • Spark Introduction
  • Spark Architecture
  • Spark Environment Setup (optional)
  • Spark RDD with Python
  • Spark RDD with Scala
  • Spark DF
  • Spark SQL
  • Spark Structured Streaming
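
To make the Hive partitioning topic above a little more concrete: Hive stores each partition of a partitioned table as a subdirectory named `column=value` under the table directory. The sketch below imitates that layout with plain Python files instead of Hive or HDFS. The helper `write_partitioned`, the file name `part-00000.csv`, and the sample `sales` rows are all assumptions made up for this illustration.

```python
import csv
import tempfile
from collections import defaultdict
from pathlib import Path


def write_partitioned(rows, root, partition_col):
    """Write rows into <root>/<partition_col>=<value>/part-00000.csv."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[partition_col]].append(row)

    written = []
    for value, group in sorted(groups.items()):
        part_dir = Path(root) / f"{partition_col}={value}"
        part_dir.mkdir(parents=True, exist_ok=True)
        # The partition value lives in the directory name, not in the file.
        fields = [f for f in group[0] if f != partition_col]
        out_file = part_dir / "part-00000.csv"
        with out_file.open("w", newline="") as handle:
            writer = csv.DictWriter(handle, fieldnames=fields, extrasaction="ignore")
            writer.writeheader()
            writer.writerows(group)
        written.append(out_file.relative_to(root).as_posix())
    return written


# Hypothetical sample data, partitioned by country.
sales = [
    {"order_id": 1, "amount": 250, "country": "IN"},
    {"order_id": 2, "amount": 120, "country": "US"},
    {"order_id": 3, "amount": 75, "country": "IN"},
]

with tempfile.TemporaryDirectory() as tmp:
    paths = write_partitioned(sales, tmp, "country")
```

A query that filters on the partition column only needs to read the matching directory, which is the main reason to partition on columns that are frequently used in filters.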

ADF (Azure Data Factory)

  1. Introduction
  • ETL introduction
  • ELT introduction
  • Different ETL tools
  • Azure Data Factory introduction
  • Azure Data Factory - important concepts in ADF
  • ADF architecture
  • Create an Azure free account with a credit card
  • Create an Azure free account without a credit card
  2. Storage Account
  • Introduction
  • What is a subscription
  • What is a resource group
  • Create a resource group
  • Create a storage account
  • Differences among LRS/GRS/ZRS/GZRS
  • Difference between Hot and Cool tiers
  • Create Data Lake Gen2
  • Create containers
  • Create folders
  • Upload files
  • Overwrite files
  • Download files
  • Edit files
  • Preview files in different formats
  3. Azure SQL Database
  • Create a SQL database
  • Create a SQL server
  • Create a username and password
  • Allow access from Azure resources and selected IPs
  • Create tables and insert data
  • Query tables
  • Install SSMS
  • Access Azure SQL Database using SSMS
  4. Linked Service
  • Create a linked service to Blob storage
  • Create a linked service to Azure SQL Database
  • Create a linked service to Microsoft SQL Server
  • Create a linked service to a Batch account
  • ... etc
  • Test the linked service connection
  5. Integration Runtimes
  • What is an integration runtime
  • Types of IRs
  • Azure integration runtime
  • Self-hosted integration runtime
  • Azure-SSIS (SQL Server Integration Services) integration runtime
  • Install a self-hosted IR
  • Configure a self-hosted IR
  6. Datasets
  • Create source datasets
  • Create sink datasets
  • Preview data
  • Create lookup datasets
  • Understand and preview data
  7. Blob-to-Blob Pipeline
  • Create a pipeline
  • Map the source dataset
  • Map the sink dataset
  • Debug
  • Trigger
  • Understand the output of run steps
  • Understand the JSON log in each step
  8. Azure Storage Account integration with ADF
  • Copy multiple files from Blob to Blob
  • Filter activity - dynamic Copy activity
  • Get file names from a folder dynamically
  • Copy activity behavior in ADF
  • Copy activity performance tuning in ADF
  • Get the count of files in a folder in ADF
  • Validate copied data between source and sink in ADF
  9. Azure SQL Database integration with ADF
  • Azure SQL Databases - introduction - relational databases in Azure
  • Overwrite and append modes in the Copy activity in ADF
  10. Incremental Load
  • What is a full load
  • What is an incremental load
  • Types of incremental loads
  • Incrementally load data from Azure SQL Database to Azure Blob storage
  • Incrementally load data from multiple tables in SQL Server to a database in Azure SQL Database
  • Incrementally copy new and changed files based on LastModifiedDate
  • Incrementally copy new files based on time-partitioned file name
  11. Logic Apps
  • Send a success email for an ADF pipeline with run stats
  • Send a failure email for an ADF pipeline with the error message
  • Branching and chaining activities
  12. Azure DevOps
  • Create an organization
  • Create a project
  • Create the Git main branch
  • Configure Git for ADF
  • Create a branch in ADF
  • Publish ADF work in a Git branch
  • Delete Git branches
  • Understand commits in Git
  • Understand and debug merge conflicts
Databricks

  1. DBFS (Databricks File System)
  • What is DBFS
  • Navigate around DBFS
  • Understanding DBFS paths
  2. Compute (creating clusters)
  • What is a cluster
  • Create a cluster
  • Map a cluster to a notebook
  3. Workspace (creating and working with notebooks)
  • Understand the workspace
  • Create folders
  • Organize content in the workspace
  4. Spark Introduction
  5. Spark Architecture
  6. Creating RDDs (Resilient Distributed Datasets)
  • What is an RDD
  • Create an RDD
  • Query an RDD
  7. Creating DataFrames
  • What is a DataFrame (DF)
  • Create a DF
  • Add columns to a DF
  • Drop columns from a DF
  • Query required data from a DF
  • ... etc
  8. Reading and writing data in semi-structured formats
  9. Reading JSON files: single-line / multi-line / complex
  10. Reading XML files
  11. Reading CSV / TSV files
  12. Reading and writing data in structured formats
  13. Reading data from MySQL / SQL Server / Oracle, etc.
  14. Reading and writing data in big data formats
  • Parquet
  • ORC
  • Avro
  • ... etc
  15. Reading and writing data from AWS S3
  16. Reading and writing data from Azure Blob storage
  17. PySpark joins
  18. PySpark union / unionAll
  19. Scopes
  20. Delta Lake
  21. ACID transactions
  22. Delta Live Tables
  23. COPY INTO
  24. Auto Loader
  25. Convert Parquet or Iceberg data to Delta Lake
  26. Scheduling jobs
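
The single-line / multi-line / complex distinction for JSON files in the list above can be shown without a cluster. By default Spark's JSON reader expects one JSON object per line and offers a `multiLine` option for records that span lines. The sketch below mimics the three cases with Python's standard `json` module rather than Spark; the helper names (`read_single_line`, `read_multi_line`, `flatten`) and the sample data are made up for this example.

```python
import json

# Single-line JSON (JSON Lines): one complete record per line.
json_lines = '{"id": 1, "name": "a"}\n{"id": 2, "name": "b"}\n'

# Multi-line JSON: one document whose records span several lines.
multi_line = """[
  {"id": 1, "name": "a"},
  {"id": 2,
   "name": "b"}
]"""


def read_single_line(text):
    """Parse each non-empty line as its own JSON record."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def read_multi_line(text):
    """Parse the whole text as one JSON document."""
    data = json.loads(text)
    return data if isinstance(data, list) else [data]


def flatten(record, prefix=""):
    """Flatten nested objects into dotted column names (the 'complex' case)."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}."))
        else:
            flat[name] = value
    return flat


single_records = read_single_line(json_lines)
multi_records = read_multi_line(multi_line)

# A line-by-line reader cannot parse the multi-line document.
try:
    read_single_line(multi_line)
    line_reader_ok = True
except json.JSONDecodeError:
    line_reader_ok = False

nested = {"id": 1, "address": {"city": "Pune", "geo": {"lat": 18.5}}}
flat_record = flatten(nested)
```

The failed line-by-line parse of the multi-line document is the plain-Python analogue of why a multi-line file needs a different reader setting than a single-line one.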

Sandbox

https://docs.microsoft.com/en-us/learn/modules/create-azure-storage-account/5-exercise-create-a-storage-account?ns-enrollment-type=learningpath&ns-enrollment-id=learn.store-data-in-azure