MsSqlRdsToS3Template is a template to connect to an Amazon RDS for SQL Server instance and export data to a file in an S3 bucket. To run this template you need to upload the sqljdbc4.jar JDBC driver to an S3 bucket. The driver can be found here: https://www.microsoft.com/en-us/download/details.aspx?displaylang=en&id=11774
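For example, assuming you have downloaded the driver locally and own a bucket named my-bucket (both the local path and the bucket name below are illustrative), you can upload the jar with the AWS CLI:

$> aws s3 cp sqljdbc4.jar s3://my-bucket/drivers/sqljdbc4.jar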
DynamoDBTableToRedshiftTemplate loads data from a DynamoDB table to a Redshift table.
MultipleDependencies is an example where Action1 and Action2 can run in parallel, but Action3 must wait for the completion of both actions.
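In the pipeline definition JSON, this kind of dependency is expressed with the dependsOn field. The following is a minimal, illustrative sketch (the object names and commands are made up, not taken from the sample, and a complete definition would also need a schedule and a resource to run on):

{
  "objects": [
    { "id": "Action1", "name": "Action1", "type": "ShellCommandActivity", "command": "echo action1" },
    { "id": "Action2", "name": "Action2", "type": "ShellCommandActivity", "command": "echo action2" },
    { "id": "Action3", "name": "Action3", "type": "ShellCommandActivity", "command": "echo action3",
      "dependsOn": [ { "ref": "Action1" }, { "ref": "Action2" } ] }
  ]
}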
SupportDW_Create is an example of creating a Support Data Warehouse from historical files saved in an S3 bucket. The process loads the D_Calendar, D_Priorities, Products, Analysts, Cases and Logs tables in parallel. At the next stage, the D_Products dimensional table is created with the product hierarchy flattened in the load SQL script, and the D_Analysts dimensional table is created with the analysts' historical data loaded as a slowly changing dimension (type 2). The F_Cases fact table is created and loaded at the last stage, once the dimensional data are ready (surrogate keys are used) and the logs are available to calculate the time a case spent in each status.
SupportDW_Update is similar to SupportDW_Create, but at the first stage the data are loaded from application tables in MS SQL and the logs data are loaded from a DynamoDB table.
- If you have never used AWS Data Pipeline before, first create the AWS IAM roles required to run the samples using the AWS CLI.
$> aws datapipeline create-default-roles
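This command creates the DataPipelineDefaultRole and DataPipelineDefaultResourceRole IAM roles in your account if they do not already exist.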
- Create a pipeline and obtain its pipelineId by calling the aws datapipeline create-pipeline command.
$> aws datapipeline create-pipeline --name MsSqlRdsToS3Template --unique-id MsSqlRdsToS3Template
You will receive a pipelineId like this.
{
"pipelineId": "df-078827623PVY9KS3XNLM"
}
- Download the MsSqlRdsToS3Template.json sample pipeline definition and adjust the parameter values in the file to match your environment. Alternatively, you can provide your parameter values in the aws datapipeline put-pipeline-definition command (see the example below).
- Upload and validate your pipeline definition by calling the aws datapipeline put-pipeline-definition command.
$> aws datapipeline put-pipeline-definition --pipeline-id df-078827623PVY9KS3XNLM --pipeline-definition file://MsSqlRdsToS3Template.json
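If you prefer to pass parameter values on the command line instead of editing the file, put-pipeline-definition also accepts a --parameter-values option. The parameter name myRDSInstanceId below is illustrative; it must match a parameter declared in the template:

$> aws datapipeline put-pipeline-definition --pipeline-id df-078827623PVY9KS3XNLM --pipeline-definition file://MsSqlRdsToS3Template.json --parameter-values myRDSInstanceId=my-sql-server-instance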
If your pipeline definition is valid, you will receive a message like this. Otherwise, correct the file and repeat the command.
{
"validationErrors": [],
"errored": false,
"validationWarnings": []
}
- Activate the pipeline by calling the aws datapipeline activate-pipeline command.
This will cause the pipeline to start running.
$> aws datapipeline activate-pipeline --pipeline-id df-078827623PVY9KS3XNLM
This command produces no output.
- Check the status of your pipeline by calling the aws datapipeline list-runs command.
$> aws datapipeline list-runs --pipeline-id df-078827623PVY9KS3XNLM
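You can also filter the listing by run status, for example to show only runs that are currently executing:

$> aws datapipeline list-runs --pipeline-id df-078827623PVY9KS3XNLM --status running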