Connect to it and create the schema objects and load the data into the MySQL database.
MySQL Script Location: https://github.com/nmukerje/glue/blob/master/scripts/salesdb.sql
$> mysql -f -u mysqldb -h <mysqldb-host>.rds.amazonaws.com -p mysqldb < salesdb.sql
Redshift
Launch a Redshift instance in the same VPC.
Create the Schema objects
Redshift script location [https://github.com/nmukerje/glue/blob/master/scripts/redshift-schema.sql]
Use SQLWorkbench/J to execute the above script in Redshift. Redshift JDBC driver is here:[http://docs.aws.amazon.com/redshift/latest/mgmt/configure-jdbc-connection.html]
Steps
Create Database connections to the MySQL Database.
* Crawler name : demo-mysql
* IAM role : <Glue Default Role>
* Data store : JDBC
* Connection : demo-mysql
* Include path : salesdb/%
* Frequency : Run on demand
* Database : Add Database -> demo-mysql
Create Crawler to the Redshift Database
* Crawler name : demo-mysql
* IAM role : <Glue Default Role>
* Data store : JDBC
* Connection : demo-mysql
* Include path : salesdb/%
* Frequency : Run on demand
* Database : Add Database -> demo-mysql
Create Crawler to the Redshift Database.
* Crawler name : demo-redshift
* IAM role : <Glue Default Role>
* Data store : JDBC
* Connection : demo-redshift
* Include path : redshiftdb/public/%
* Frequency : Run on demand
* Database : Add Database -> demo-redshift
Run Crawlers on the MySQL and Redshift Databases.
Crawlers should create 7 tables for MySQL and 5 tables for Redshift in the respective databases.
Open the Zeppelin notebooks below:
GlueNotebook2-MySQL2Redshift.json : Demonstrates a MySQL to Redshift load
Uses Glue DynamicFrames
Uses joins to denormalize MySQL tables
Uses User Defined Functions for computed columns
GlueNotebook3-MySQL2Redshift-Incr : Demonstrates a MySQL to Redshift incremental load by timestamp
Uses external checkpointing to DynamoDB
Uses Spark SQL for transforms and computed columns
Uses Spark Redshift package for inserting data to Redshift
Vacuums Redshift table as a post load statement
GlueNotebook5-Redshift2S3.json : Demonstrates a Redshift to S3 unload
Uses Spark Redshift package UNLOAD from Redshift
Writes data to parquet in S3 partitioned by chosen columns