This is an example of a big data PySpark pipeline deployment.
-
Build the Docker Compose images
docker-compose build
-
Run Docker Compose
docker-compose up
-
Open Airflow at localhost:8080, go to Admin > Connections, and create a new connection (a scripted alternative is sketched after this list):
- Conn Id: s3_default
- Conn Type: S3
- Login: your-access-key
- Password: your-secret-key
- Extra: {"region_name":"us-east-1"}
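The same connection can also be created from code instead of the UI. The snippet below is a minimal sketch, assuming it is run inside the Airflow container (the `webserver` service name in the exec command is a guess, adjust it to your compose file) and that the placeholder credentials are replaced with real ones:

```python
# create_s3_connection.py - run inside the Airflow container, e.g.:
#   docker-compose exec webserver python create_s3_connection.py
from airflow import settings
from airflow.models import Connection

# Same values as in the Admin > Connections form above.
conn = Connection(
    conn_id="s3_default",
    conn_type="s3",
    login="your-access-key",        # AWS access key id
    password="your-secret-key",     # AWS secret access key
    extra='{"region_name": "us-east-1"}',
)

session = settings.Session()
# Only add the connection if it does not already exist.
if not session.query(Connection).filter(Connection.conn_id == conn.conn_id).first():
    session.add(conn)
    session.commit()
session.close()
```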
-
Assuming you've already spun up your EMR cluster, go to the 'titanic_training_emr' DAG, switch it on, and trigger a run (a sketch of the kind of EMR step submission such a DAG performs follows below)
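For reference, a DAG like 'titanic_training_emr' typically adds a spark-submit step to the running EMR cluster and waits for it to complete. The sketch below only illustrates that pattern with the Amazon provider's EmrAddStepsOperator and EmrStepSensor (provider 3+ import paths); the cluster id, the S3 script path, the DAG id and the connection id are placeholders, not values taken from this repository:

```python
from datetime import datetime

from airflow import DAG
from airflow.providers.amazon.aws.operators.emr import EmrAddStepsOperator
from airflow.providers.amazon.aws.sensors.emr import EmrStepSensor

# Placeholder values -- replace with your own cluster id and script location.
CLUSTER_ID = "j-XXXXXXXXXXXXX"
SPARK_STEP = [
    {
        "Name": "titanic_training",
        "ActionOnFailure": "CONTINUE",
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": ["spark-submit", "s3://your-bucket/scripts/titanic_training.py"],
        },
    }
]

with DAG(
    dag_id="titanic_training_emr_example",
    start_date=datetime(2023, 1, 1),
    schedule_interval=None,  # triggered manually from the UI
    catchup=False,
) as dag:
    # Submit the Spark step to the already-running EMR cluster.
    add_step = EmrAddStepsOperator(
        task_id="add_training_step",
        job_flow_id=CLUSTER_ID,
        aws_conn_id="s3_default",  # the connection created above
        steps=SPARK_STEP,
    )

    # Block until the submitted step has finished on the cluster.
    watch_step = EmrStepSensor(
        task_id="watch_training_step",
        job_flow_id=CLUSTER_ID,
        step_id="{{ task_instance.xcom_pull(task_ids='add_training_step', key='return_value')[0] }}",
        aws_conn_id="s3_default",
    )

    add_step >> watch_step
```

The sensor reads the step id that EmrAddStepsOperator pushes to XCom, which is the usual way to make the DAG wait for the Spark job to finish on EMR.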
-
Then, go to the 'titanic_prediction_emr' DAG, switch it on, and trigger a run
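If you prefer to unpause and trigger the DAGs without the UI, something along these lines works against Airflow 2's stable REST API; the basic-auth setup and the default admin credentials (airflow/airflow) are assumptions, adjust them to your installation:

```python
import requests

BASE_URL = "http://localhost:8080/api/v1"
AUTH = ("airflow", "airflow")  # assumed default credentials; change as needed


def unpause_and_trigger(dag_id: str) -> None:
    """Unpause a DAG and start a new run, mirroring the UI toggle and trigger button."""
    requests.patch(f"{BASE_URL}/dags/{dag_id}", json={"is_paused": False}, auth=AUTH).raise_for_status()
    requests.post(f"{BASE_URL}/dags/{dag_id}/dagRuns", json={}, auth=AUTH).raise_for_status()


# Run training first; trigger prediction only after the training run has finished.
unpause_and_trigger("titanic_training_emr")
# unpause_and_trigger("titanic_prediction_emr")
```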
-
Stop the Docker Compose services and free resources
docker-compose down