AWS Concurrent Data Orchestration Pipeline EMR Livy
This code demonstrates the architecture featured on the AWS Big Data blog (https://aws.amazon.com/blogs/big-data/ ) which creates a concurrent data pipeline by using Amazon EMR and Apache Livy. This pipeline is orchestrated by Apache Airflow.
Description of the project folders
cloudformation
This folder contains the cloudformation template that spins up the Airflow infrastructure.
dags/airflowlib
This folder contains reusable code for Amazon EMR and Apache Livy.
dags/transform
This folder contains sample transformation scala code which transforms the movielens data files from csv to parquet.
dags/movielens_dag.py
This script contains the code for the DAG definition. It basically defines the Airflow pipeline.
License
This library is licensed under the Apache 2.0 License.