This project provides sample datasets and scripts that demonstrate how to manage Slowly Changing Dimensions (SCDs) with Apache Hive's ACID MERGE capabilities. ACID MERGE applies all updates atomically, ensures that readers see either all of the updates or none of them, and handles failure scenarios, so application developers do not have to build these guarantees themselves.
Also included is data that simulates a full data dump from a source system, followed by another data dump taken later.
The objective is to merge the data using different slowly changing dimension strategies.
These examples cover Type 1, Type 2 and Type 3 updates.
- Hortonworks Data Platform (HDP) 2.6 or later
- OR Apache Hive 2.2 or later
- Clone this repository onto your Hadoop cluster
- Run load_data.sh to stage data into HDFS
- From Hive CLI or beeline, run hive_type1_scd.sql, hive_type2_scd.sql and hive_type3_scd.sql
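
The last step above can be scripted. The following is a minimal sketch and is not part of this repository: the function name run_scd_scripts, the HIVE_RUNNER variable, and the JDBC URL jdbc:hive2://localhost:10000 are assumptions for illustration, so adjust them for your cluster. It assumes load_data.sh has already staged the data and that the function is called from the repository directory.

```shell
# Hypothetical helper: run the three SCD scripts in order.
# HIVE_RUNNER is an assumed variable name; the default JDBC URL is an
# assumption and must match your HiveServer2 host and port.
run_scd_scripts() {
  runner="${HIVE_RUNNER:-beeline -u jdbc:hive2://localhost:10000 -f}"
  for script in hive_type1_scd.sql hive_type2_scd.sql hive_type3_scd.sql; do
    # $runner is left unquoted on purpose so the command and its
    # flags are split into separate words before the script name.
    $runner "$script" || return 1
  done
}
```

Calling run_scd_scripts runs each script through beeline and stops at the first failure. Setting HIVE_RUNNER="hive -f" before the call would use the Hive CLI instead.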