- A cloud-based data integration service that allows you to orchestrate and automate data movement and data transformation
- Microsoft's main data engineering tool
- PaaS solution
- Data integration service
- Orchestration and automation
- Data transformation
- ETL vs ELT
- ETL
- Transform before loading
- Designed for reliability
- ELT
- Transform after loading (on DW)
- Designed for agility
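The ETL vs ELT distinction above comes down to where the transform step runs. Here is a minimal sketch of that ordering in plain Python, assuming in-memory lists stand in for the source system and the data warehouse. The names `transform`, `run_etl`, and `run_elt` are hypothetical and are not part of ADF.

```python
# Toy illustration only: lists stand in for a source system and a warehouse.

def transform(row):
    """Normalize one record: tidy the name and cast the amount to float."""
    return {"name": row["name"].strip().title(), "amount": float(row["amount"])}


def run_etl(source_rows):
    """ETL: transform in the pipeline, then load only the cleaned rows."""
    warehouse = []
    cleaned = [transform(r) for r in source_rows]  # transform before loading
    warehouse.extend(cleaned)                      # load
    return warehouse


def run_elt(source_rows):
    """ELT: load raw rows first, then transform inside the warehouse."""
    staging = list(source_rows)                    # load raw data as-is
    curated = [transform(r) for r in staging]      # transform after loading
    return staging, curated


raw = [{"name": "  alice ", "amount": "10.5"}, {"name": "BOB", "amount": "3"}]
etl_result = run_etl(raw)
staging, curated = run_elt(raw)
```

Both paths end with the same curated rows. The difference is that ELT also keeps the raw data in the warehouse, which is what makes later re-transformation cheap.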
- ETL
- ADF has event-based triggers
- For example, when a file is added to Blob Storage
- ADF can run SSIS packages
- Two versions
- ADF v2 is the current and improved version
- Build options
- Portal
- PowerShell
- .NET
- Python
- REST
- ARM
- Portal
- Highly integrated
- DevOps
- Key Vault
- Monitor
- Automation
- Not a data store
- Activity: Copy, Move, Transform, etc.
- Pipeline: Logical grouping of activities - unit of work
- Integration Runtime: Compute infrastructure that executes the activities within a pipeline
- Trigger: The power switch that determines when a pipeline run starts
- Linked Service: Where the data lives. A connection to a source such as Blob Storage, SQL Server, Amazon S3, Salesforce, etc.
- Dataset: A named reference to the data that Azure Data Factory ingests or processes, such as a table, blob, or file
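One way to see how the components above relate is to model them as plain data classes. This is a conceptual sketch only, not the ADF SDK or its JSON schema. Every class, field, and instance name below is hypothetical and chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class LinkedService:
    """Connection to a store or compute target (like a connection string)."""
    name: str
    kind: str  # free-text label, e.g. "Blob Storage"


@dataclass
class Dataset:
    """Points at a specific structure (table, file) inside a linked service."""
    name: str
    linked_service: LinkedService
    location: str


@dataclass
class Activity:
    """A single step that reads input datasets and writes output datasets."""
    name: str
    kind: str
    inputs: List[Dataset] = field(default_factory=list)
    outputs: List[Dataset] = field(default_factory=list)


@dataclass
class Pipeline:
    """Logical grouping of activities that forms one unit of work."""
    name: str
    activities: List[Activity] = field(default_factory=list)

    def describe(self):
        lines = []
        for a in self.activities:
            src = ", ".join(d.name for d in a.inputs)
            dst = ", ".join(d.name for d in a.outputs)
            lines.append(f"{a.kind} '{a.name}': {src} -> {dst}")
        return lines


blob = LinkedService("blob_ls", "Blob Storage")
sql = LinkedService("sql_ls", "SQL Server")
src = Dataset("raw_orders_csv", blob, "input/orders.csv")
dst = Dataset("orders_table", sql, "dbo.Orders")
pipeline = Pipeline("load_orders", [Activity("copy_orders", "Copy", [src], [dst])])
summary = pipeline.describe()
```

The chain to notice is that an activity never talks to a store directly: it references datasets, and each dataset references a linked service.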
- Three types of activities:
- Data Movement
- Data Transformation
- Control
- Data Flow execution
- Data Movement execution
- Dispatch of Activities
- SSIS Package execution
- Linked Services are like connection strings
- Two types:
- Data Stores
- External Compute
- Datasets are about data structure
- SQL Database/Table
- Blob Storage/File
- Schedule: Periodic, forward looking only (next schedule time), runs on dates configured, fire and forget
- Tumbling Window: Time-sliced - run every X number of hours, supports backfill, runs at fixed interval, retry, concurrency, system variables
- Event-based: Fired on Blob creation or deletion
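The time-slicing behind a tumbling window trigger can be sketched in a few lines of Python. This is an illustration of the idea only, not how ADF implements it: the function `tumbling_windows` and its parameters are hypothetical. It assumes fixed-size, contiguous, non-overlapping windows, which is what makes backfilling past periods straightforward.

```python
from datetime import datetime, timedelta


def tumbling_windows(start, end, interval):
    """Yield (window_start, window_end) pairs that tile [start, end).

    Windows are fixed-size and non-overlapping; a partial trailing
    window is not emitted.
    """
    current = start
    while current + interval <= end:
        yield current, current + interval
        current += interval


# Backfill example: slice a past 12-hour range into 4-hour windows.
windows = list(
    tumbling_windows(
        datetime(2024, 1, 1, 0),
        datetime(2024, 1, 1, 12),
        timedelta(hours=4),
    )
)
```

Each pair plays the role of the window start and end times that a run would receive, so every slice can be processed, retried, or re-run independently. A schedule trigger, by contrast, only fires at future times and carries no such slice.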
- More than just copy - serialization, compression, column mapping
- 83 connectors and counting
- 6 different formats
- Enterprise features - incremental copy, resume from last failed run, etc.
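Column mapping and incremental copy can be illustrated together. The sketch below assumes a watermark pattern, where only rows modified after the last recorded watermark are copied. This is a common way to do incremental loads, not a description of ADF internals. The function `incremental_copy`, the `modified` field, and the sample rows are all hypothetical.

```python
def incremental_copy(source_rows, sink, watermark, column_mapping):
    """Copy rows changed since `watermark`, renaming columns on the way.

    Returns the new watermark (highest `modified` value copied), or the
    old one if nothing was new.
    """
    new_rows = [r for r in source_rows if r["modified"] > watermark]
    for row in new_rows:
        # Column mapping: keep only mapped columns, under their sink names.
        sink.append({dst: row[src] for src, dst in column_mapping.items()})
    return max((r["modified"] for r in new_rows), default=watermark)


source = [
    {"id": 1, "modified": 1, "val": "a"},
    {"id": 2, "modified": 5, "val": "b"},
    {"id": 3, "modified": 9, "val": "c"},
]
mapping = {"id": "order_id", "val": "value"}
sink = []

wm = incremental_copy(source, sink, 3, mapping)    # copies ids 2 and 3
wm2 = incremental_copy(source, sink, wm, mapping)  # nothing new to copy
```

Persisting the returned watermark between runs is what keeps the second run from copying the same rows again.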