This project automates the entire flow: a Java program uses the GCP client libraries to move the necessary data to Cloud Storage (uploading the input files, JARs, templates, etc.), and a workflow template then runs a series of Spark jobs on Cloud Dataproc.
Stackdriver is used for email notifications, monitoring, and logs.
The project is split into multiple modules:-
- Java program - creates the bucket and uploads the files
- Conversion - transforms file formats such as CSV and TXT to Parquet for faster big data processing
- Pre Processing - brings every dataset to a specific type (e.g. person-specific or area-specific)
- Merging - after preprocessing, links the datasets together based on a common attribute
- Model - generates the points for each person based on some condition
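The Merging module links datasets on a shared attribute; in the real pipeline this presumably happens as a Spark join on Dataproc, but the linking idea can be illustrated in plain Java (the attribute name and sample data below are hypothetical):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Minimal sketch of the Merging step: link two pre-processed datasets
// on a shared attribute (here "personId"). The real project would do
// this at scale with Spark; field names here are placeholders.
public class MergeSketch {
    static List<Map<String, String>> linkOn(String key,
            List<Map<String, String>> left, List<Map<String, String>> right) {
        // Index the right-hand dataset by the linking attribute
        Map<String, Map<String, String>> index = new HashMap<>();
        for (Map<String, String> row : right) {
            index.put(row.get(key), row);
        }
        // Keep only left-hand rows that have a match, merging the columns
        List<Map<String, String>> merged = new ArrayList<>();
        for (Map<String, String> row : left) {
            Map<String, String> match = index.get(row.get(key));
            if (match != null) {
                Map<String, String> combined = new HashMap<>(row);
                combined.putAll(match);
                merged.add(combined);
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        List<Map<String, String>> persons = List.of(
                Map.of("personId", "1", "name", "Ann"),
                Map.of("personId", "2", "name", "Bob"));
        List<Map<String, String>> points = List.of(
                Map.of("personId", "1", "points", "42"));
        List<Map<String, String>> merged = linkOn("personId", persons, points);
        System.out.println(merged.size());
        System.out.println(merged.get(0).get("points"));
    }
}
```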
- Go to the APIs and Services page of the GCP Console and click on Credentials
- Create credentials with service account keys
- Select JSON and a new service account
- Provide the role of Project Owner
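Once the JSON key is downloaded, the GCP client libraries in the Java program can find it through the standard GOOGLE_APPLICATION_CREDENTIALS environment variable; the path below is only a placeholder:

```shell
# Point the GCP client libraries at the downloaded service-account key.
# The path is a placeholder for wherever the JSON key was saved.
export GOOGLE_APPLICATION_CREDENTIALS="$HOME/keys/cloud-migration-demo.json"
```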
- Build the Conversion, Linking, Pre Processing, and Model Maven projects locally
- Copy the input files and JARs to the specific folders in the main project under src/main/resources/filesToUpload
- The main project already has all the necessary folders created
- Run the main project with the required args (input, JARs, template location in src/main/resources, etc.)
- Verify the bucket is created and the necessary files are uploaded to it
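The bucket-creation and upload step can be sketched with the google-cloud-storage client library; the bucket name and file path below are placeholders, and credentials are resolved from GOOGLE_APPLICATION_CREDENTIALS (running this requires a real GCP project, so it is illustrative only):

```java
import com.google.cloud.storage.BlobInfo;
import com.google.cloud.storage.BucketInfo;
import com.google.cloud.storage.Storage;
import com.google.cloud.storage.StorageOptions;
import java.nio.file.Files;
import java.nio.file.Paths;

// Sketch of the main project's upload step. Bucket name and file
// paths are placeholders, not this project's actual values.
public class UploadSketch {
    public static void main(String[] args) throws Exception {
        // Credentials come from the GOOGLE_APPLICATION_CREDENTIALS key file
        Storage storage = StorageOptions.getDefaultInstance().getService();

        // Create the bucket that will hold inputs, JARs, and the template
        storage.create(BucketInfo.of("cloud-migration-demo-bucket"));

        // Upload one file; the real program would walk every folder
        // under src/main/resources/filesToUpload
        byte[] content = Files.readAllBytes(
                Paths.get("src/main/resources/filesToUpload/jars/conversion.jar"));
        BlobInfo blob = BlobInfo.newBuilder(
                "cloud-migration-demo-bucket", "jars/conversion.jar").build();
        storage.create(blob, content);
    }
}
```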
- Open Cloud Shell in GCP
- Create a YAML file and copy the template content into it
- Run the following commands to create the template and import the YAML definition into it (or instantiate a workflow directly from the file), and to delete the template when done:

gcloud dataproc workflow-templates create cloud-migration-demo-template
gcloud dataproc workflow-templates import cloud-migration-demo-template --source cloud-migration-demo.yaml
gcloud dataproc workflow-templates instantiate-from-file --file cloud-migration-demo.yaml
gcloud dataproc workflow-templates delete cloud-migration-demo-template
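The imported YAML describes the cluster to run on and the ordered series of Spark jobs. A minimal sketch of what cloud-migration-demo.yaml might contain (cluster name, zone, bucket, and JAR paths are placeholders, not this project's actual values):

```yaml
# Hypothetical sketch of cloud-migration-demo.yaml
placement:
  managedCluster:
    clusterName: cloud-migration-demo-cluster
    config:
      gceClusterConfig:
        zoneUri: us-central1-a
jobs:
  - stepId: conversion
    sparkJob:
      mainJarFileUri: gs://cloud-migration-demo-bucket/jars/conversion.jar
  - stepId: preprocessing
    prerequisiteStepIds:
      - conversion
    sparkJob:
      mainJarFileUri: gs://cloud-migration-demo-bucket/jars/preprocessing.jar
  - stepId: merging
    prerequisiteStepIds:
      - preprocessing
    sparkJob:
      mainJarFileUri: gs://cloud-migration-demo-bucket/jars/merging.jar
  - stepId: model
    prerequisiteStepIds:
      - merging
    sparkJob:
      mainJarFileUri: gs://cloud-migration-demo-bucket/jars/model.jar
```

The prerequisiteStepIds entries make the jobs run as a chain, so each Spark job starts only after the previous stage has finished.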