The repo simulates query load, optimizes performance, and offers practical guidance for building data warehouses with Azure Synapse Analytics. 🚀
The project follows the following architecture,
- data: The datasets, fact, and dimension tables for the data warehouse
- Queries_Part4.sql: SQL code for querying and analyzing the created data warehouse
- setup.json: ARM (Azure Resource Manager) template. It is a block of code that defines the infrastructure and configuration for the project
- setup.ps1: PowerShell script to provision the Azure Synapse Workspace along with the tables of the data warehouse (configured in setup.sql)
- setup.sql: SQL script for creating the tables of the data warehouse
- table_creation_codes.sql: SQL script to create the fact and dimension tables with dedicated SQL pool
- An active Azure subscription
- Knowledge of Azure Data Fundamentals, Good to begin from here
- Clone the Repository:
git clone https://github.com/tahhnik/Designing-Large-Scale-Data-Warehouse-with-Azure-Synapse-Analytics.git
Clone the repository inside Azure workspace through PowerShell on Azure Portal
git clone https://github.com/tahhnik/Designing-Large-Scale-Data-Warehouse-with-Azure-Synapse-Analytics.git
-
Explore the Directories: Navigate into each directory to find detailed automation scripts, SQL codes for queries, and configurations.
-
Follow the Blog: Implementation details and insights are documented in the associated series of blog posts in Medium.
⚡If you want to skip the initial processes like data modeling, and schema design and directly jump onto building the warehouse on Azure Synapse Analytics, follow the Blog Three,
⚡If you want to skip the building warehouse processes on Azure Synapse Analytics and directly jump onto querying the tables with SQL (T-SQL in this context), follow the Blog Four
- Azure Synapse Analytics: Enterprise analytics service for data warehouses and big data systems.
- Azure Portal: Unified console to manage Azure resources.
- Azure Stream Analytics: Real-time stream processing engine.
- Azure Machine Learning: Cloud service for ML project lifecycle.
- Azure Data Lake Storage Gen2: Scalable storage for data lakes.
- Power BI: Business analytics service for data visualization.
- Azure Function Apps: Serverless applications for event-driven scenarios.
- Azure Cosmos DB: Globally distributed, multi-model database.
To implement this project, follow the step-by-step guide in our detailed blog post. Learn how each tool plays a crucial role in creating and scaling a data warehouse on Azure.
Blog One: From Design to Deployment: Data Warehousing with Azure Synapse Analytics (Part 1: Architecture)
![Final Draft](https://private-user-images.githubusercontent.com/25973761/324171688-a0ef0dbe-8486-40f3-b817-0a4068f5e432.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MTU3NDUzNDMsIm5iZiI6MTcxNTc0NTA0MywicGF0aCI6Ii8yNTk3Mzc2MS8zMjQxNzE2ODgtYTBlZjBkYmUtODQ4Ni00MGYzLWI4MTctMGE0MDY4ZjVlNDMyLnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNDA1MTUlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjQwNTE1VDAzNTA0M1omWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPTg1ZjQ0ODI3ZmMwZjc1M2MzZDY0OTM3YThmOTEyYmQ0MzA5NzgzYzEwMGRhNTZmYjY1NDQ3ZDU1YzBkNjdhOTEmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0JmFjdG9yX2lkPTAma2V5X2lkPTAmcmVwb19pZD0wIn0._xkIUAGg3Ix63t56hA66Z5XLqvAowCyis8QWSEangrU)
Components:
- The architecture of the data warehouse
- The details of the data pipeline
- Brief Discussions of the tools and processes used
Blog Two: From Design to Deployment: Data Warehousing with Azure Synapse Analytics (Part 2: Data Modeling and Schema Design)
![Final Draft (3)](https://private-user-images.githubusercontent.com/25973761/324171701-21347ad3-03d5-47de-a624-672ece59e261.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MTU3NDUzNDMsIm5iZiI6MTcxNTc0NTA0MywicGF0aCI6Ii8yNTk3Mzc2MS8zMjQxNzE3MDEtMjEzNDdhZDMtMDNkNS00N2RlLWE2MjQtNjcyZWNlNTllMjYxLnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNDA1MTUlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjQwNTE1VDAzNTA0M1omWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPTU5ODhiNTY5MDc4OGQwMTZhNTE1YmMwMDQxYzc5NDgzYzZjZDQ5ZjA0MWU1M2VmOTVkNWQ2ZTlkZjI4ZjNlZGEmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0JmFjdG9yX2lkPTAma2V5X2lkPTAmcmVwb19pZD0wIn0.v4qUs6zTgzMccYgwGFRfQTRqYqD83Honnc159WTW6Ic)
Components:
-
Exploring the attributes of each logical entity in the context of retail companies
-
The details of the schema and developing the snowflake schema
-
Brief Discussions of the data model
Blog Three: From Design to Deployment: Data Warehousing with Azure Synapse Analytics (Part 3: Design and Creation)
![Final Draft (3)](https://private-user-images.githubusercontent.com/25973761/324162415-12131ee8-2585-4080-b3f3-2fa9ed5929f2.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MTU3NDUzNDMsIm5iZiI6MTcxNTc0NTA0MywicGF0aCI6Ii8yNTk3Mzc2MS8zMjQxNjI0MTUtMTIxMzFlZTgtMjU4NS00MDgwLWIzZjMtMmZhOWVkNTkyOWYyLnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNDA1MTUlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjQwNTE1VDAzNTA0M1omWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPTAyNGE2ZTdkYmYxYTIwODdjNTQ0ZWRjNDc3ODM5NTliYzhlYTgxYmI5MGQ4OWJkZGU3MDlmYjg3ODE1YzI5NTkmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0JmFjdG9yX2lkPTAma2V5X2lkPTAmcmVwb19pZD0wIn0.ymMo-zJgN2w6vEEw_jS1se_a0POS-8FcB3yJf9qThTQ)
Components:
-
Provisioning the Azure Synapse Analytics Workspace with UI and ARM templates
-
Provisioning dedicated SQL pool within Azure Synapse Analytics Workspace
-
Creating the SQL database and tables (facts and dimensions)
-
Loading data into the tables
Blog Four: From Design to Deployment: Data Warehousing with Azure Synapse Analytics (Part 3: Querying the Data Warehouse) (is being written at this moment)
![Final Draft (3)](https://private-user-images.githubusercontent.com/25973761/324183789-7ec03419-2e5a-41ea-a72a-796eb0256366.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MTU3NDUzNDMsIm5iZiI6MTcxNTc0NTA0MywicGF0aCI6Ii8yNTk3Mzc2MS8zMjQxODM3ODktN2VjMDM0MTktMmU1YS00MWVhLWE3MmEtNzk2ZWIwMjU2MzY2LnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNDA1MTUlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjQwNTE1VDAzNTA0M1omWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPTU0YWMxM2NmYWQxMDhlZGM4NGNmODhiMzRlY2FhZDY0ODcwYTQ5YWZiMWIyZmUyNDFlYTE1YWJhMjE5ZTMyYmQmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0JmFjdG9yX2lkPTAma2V5X2lkPTAmcmVwb19pZD0wIn0.q31w1zxitBj6NW-AI24RjbvvLOREcCAlH-uRR3RwxTo)
Components:
- Querying the data warehouse
- Showcasing the analytical capabilities of Azure Synapse Analytics
My humble gratitude to my friends and family who are the constant support of my works and endeavors