Big Data Processing Using Spark & Airflow

Spark

  • Deployed Spark using Docker
  • Created a SparkSession, registered views & ran SQL queries on flight departure-delay data

Airflow

  • Deployed Airflow in a Docker container
  • Created a workflow as a directed acyclic graph (DAG) of tasks that executes a simple Python function
  • Inspected the task logs to confirm the DAG ran successfully