This project involves extracting data from a CSV file, transforming it, and loading it into a staging table and eventually into a production database. The goal is to demonstrate proficiency with SQL Server Integration Services (SSIS) by handling various data formats and potential errors that may arise during the ETL process.
- Source: CSV file located at
data.csv
. - Data Flow Task: Reads the CSV file and loads the data into a staging table after running a script for date format inconsistency and converting the data types with a derived column transformation.
- Execute SQL Task: Removes records with NULL UserID's, full duplicates, and non-existant Staging ID's in the staging table.
- Conditional Split: Moves suspicious records (with NULL values or other issues) into the ErrorRecords table, and good records forward towards production.
- OLE DB Production Destination: After merge joining the staging and production table on UserID, applied a conditional split to update existing records and add new records to the production table.
-
Suspicious Data Handling:
- Records identified as suspicious (e.g., duplicates, NULL values, special characters, impossible dates, incorrect emails, etc.) were moved from the staging table to the
ErrorRecords
table for further evaluation.
- Records identified as suspicious (e.g., duplicates, NULL values, special characters, impossible dates, incorrect emails, etc.) were moved from the staging table to the
-
Production Table Updates:
- Complete and valid records from the staging table were successfully transferred to the production table.
- Existing records in the production table were updated with new information.
- New records were added to the production table.