This demo offers an example of moving workloads from EMR to CDP. It automates the following:
- Creating an EMR cluster, uploading the Worldwide Bank data, and setting up the Hive testbench (via NiFi flow)
- Downloading Hive on Tez application logs from EMR and uploading them to WXM (via NiFi flow)
- Creating a CDP cluster and setting up the Hive testbench (via NiFi flow)
- Downloading EMR Glue information and generating Hive DDL in CDP
- AWS CLI: Configure the AWS CLI with your credentials and region
- CDP CLI: Configure the CDP CLI with your credentials and region
- WXM:
  - Docker
  - Uploader and WXM files (uploader.tar and altus.json), see wiki
- Local NiFi instance
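Before running the flows, it can help to confirm that the command-line prerequisites are on your PATH. The sketch below is an illustration only and is not part of this repository: `check_tool` is a hypothetical helper, and it assumes the tools are installed under their usual executable names (`aws`, `cdp`, `docker`). Credentials and region are typically set interactively with `aws configure` and `cdp configure`.

```shell
#!/bin/sh
# Hypothetical helper (not part of this repository): report whether a
# command is available on the PATH. Returns non-zero when it is missing.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found: $1"
    else
        echo "missing: $1"
        return 1
    fi
}

# Assumed executable names for the prerequisites listed above.
# "|| true" keeps the loop going so every tool gets reported.
for tool in aws cdp docker; do
    check_tool "$tool" || true
done
```

Any tool reported as missing needs to be installed and configured before the NiFi flows can call it.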
- Clone this repository:
git clone https://github.com/paulvid/emr_to_cdp.git
- Add WXM files to local path:
cp [local_path]/altus.json [local_path_to_clone]/scripts/wxm/
cp [local_path]/uploader.tar [local_path_to_clone]/scripts/wxm/
- Upload the Worldwide Bank data to your S3 bucket (follow the same directory structure)
- Add the template to your NiFi instance
- Set up the variables with your data
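The S3 upload step could be done with `aws s3 sync`, which copies a directory tree recursively and so keeps the local layout intact. The following is a minimal sketch under stated assumptions: the bucket name, the local `./worldwidebank` directory, the `worldwidebank` destination prefix, and the `build_sync_cmd` helper are all hypothetical placeholders, not values taken from this repository.

```shell
#!/bin/sh
# Placeholder values (assumptions): replace with your own bucket and path.
BUCKET="${BUCKET:-my-example-bucket}"
DATA_DIR="${DATA_DIR:-./worldwidebank}"

# Hypothetical helper: print the sync command that would upload the data.
# The "worldwidebank" destination prefix is an assumption; adjust it to
# match the directory structure the flow expects.
build_sync_cmd() {
    echo "aws s3 sync $1 s3://$2/worldwidebank"
}

# Print the command for review instead of running it directly.
build_sync_cmd "$DATA_DIR" "$BUCKET"
```

Printing the command first leaves room to check the bucket and prefix. `aws s3 sync` also accepts `--dryrun` to list the files it would transfer without uploading them.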
Paul Vidal - Initial work - LinkedIn