This dataset simulates real-time agricultural sensor data, providing a rich source of information to design and construct a robust data engineering solution. The primary objective of this project is to extract, transform, and load (ETL) the raw sensor data into structured fact and dimension tables in any data warehouse.
Load the raw data from all source tables (SensorDataRaw, WeatherDataRaw, SoilDataRaw, CropDataRaw, PestDataRaw, IrrigationDataRaw, LocationDataRaw) into staging tables within the Data Warehouse Schema. Check code here.
Create the schema for the data warehouse. Cleanse and preprocess the data in the staging tables. Handle missing values, data type conversions, and data quality checks. Transform the data into a format suitable for fact and dimension Tables. Check code here.
Based on the cleaned and transformed data, create dimension Tables for attributes such as Location, Time, Crop Type, Pest Type, and Irrigation Method. Populate these dimension tables. Check code here.
Identify fact Tables based on key performance indicators (KPIs) or metrics of you think would be useful for the data consumers. Create and populate the initial fact tables using the cleaned and transformed data. Check code here.
I also include scripts for exporting the facts table from a SQL query to a csv file directily from the database using the COPY
command.
You can find the scripts in Crop Facts, Check code here.
Define primary keys for dimension Tables. Define foreign keys in fact Tables to establish relationships with dimension Tables. Activities here was also covered after task two was completed. Check code here.
Implement appropriate indexing strategies to optimize query performance. Analyze the query patterns and optimize the data model accordingly. Perform data quality checks and address any anomalies. Check code here.
Document the data model, including schema structure, table relationships, and data dictionary. Conduct final testing to ensure the accuracy and consistency of the data. Prepare the data warehouse for querying by data analysts and scientists. Check code here.
Because of the size of this files, I will provide link to download the files below.
The facts table below contain 100 rows sample due to the cost of the download. Full table data is available on request.
Sub Dimension Files such as Irrigation Method, Crop Type Link.
Crop and Soil Dimension Tables.
Irrigation and Sensor Dimension Tables.
Pest and Weather Dimension Tables.
Sub Dimension Files such as Irrigation Method, Crop Type Link Location and Time Dimension Tables.