
Examples of how to submit Spark jobs through Apache Livy.

Requires an AWS EMR cluster with Apache Livy and Apache Spark running, and access to S3 buckets where files can be stored and Spark job output can be written. The code in this repository has been tested on an AWS EMR cluster with jobs that run from 30 minutes to 24 hours.


This is the main piece of code, which creates, monitors, and deletes Livy sessions.
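The create, monitor, delete cycle described above can be sketched against Livy's REST API roughly as follows. This is a minimal sketch, not the code in this repository: `LIVY_URL`, `http_json`, and the three function names are hypothetical, and the URL assumes Livy is listening on its default port 8998. The `request` parameter exists only so the HTTP layer can be swapped out.

```python
import json
import time
import urllib.request

# Assumption: Livy runs on its default port on the EMR master node.
LIVY_URL = "http://localhost:8998"


def http_json(method, url, payload=None):
    """Send a JSON request and return the decoded JSON response (None if empty)."""
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    req = urllib.request.Request(
        url, data=data, method=method,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = resp.read()
    return json.loads(body) if body else None


def create_session(kind="pyspark", request=http_json):
    """POST /sessions starts an interactive session; return its id."""
    return request("POST", f"{LIVY_URL}/sessions", {"kind": kind})["id"]


def wait_until_idle(session_id, request=http_json, poll_seconds=5, max_polls=120):
    """Poll GET /sessions/{id} until the session is idle, failed, or we give up."""
    for _ in range(max_polls):
        state = request("GET", f"{LIVY_URL}/sessions/{session_id}")["state"]
        if state == "idle":
            return state
        if state in ("error", "dead", "killed", "shutting_down"):
            raise RuntimeError(f"Livy session {session_id} ended in state {state!r}")
        time.sleep(poll_seconds)
    raise TimeoutError(f"Livy session {session_id} did not become idle")


def delete_session(session_id, request=http_json):
    """DELETE /sessions/{id} shuts the session down and frees cluster resources."""
    request("DELETE", f"{LIVY_URL}/sessions/{session_id}")
```

For jobs that run for many hours, a longer `poll_seconds` and a larger `max_polls` would be needed; the defaults here are placeholders.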


This code creates a Presto connection and then executes queries.


Stores user credentials.


Shows an example of how to connect to Livy and submit a job.


This is the PySpark code that is called in pyspark_submitjob.py and submitted to Spark through Livy. The file is stored in an S3 bucket, and the output of this job is also stored in a bucket.
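One way to submit a PySpark file that lives in S3 is Livy's batch endpoint (`POST /batches`), which accepts a JSON body with fields such as `file`, `args`, `conf`, and `name`. The sketch below only builds that body and assumes the batch route is used; the repository may instead run code through an interactive session. The helper name, bucket, and paths are hypothetical.

```python
def build_batch_payload(file_uri, args=(), conf=None, name=None):
    """Build the JSON body for Livy's POST /batches endpoint.

    file_uri: location of the PySpark script, e.g. an s3:// path.
    args:     command-line arguments passed to the script (sent as strings).
    conf:     optional Spark configuration properties.
    name:     optional name for the batch.
    """
    payload = {"file": file_uri, "args": [str(a) for a in args]}
    if conf:
        payload["conf"] = dict(conf)
    if name:
        payload["name"] = name
    return payload


# Hypothetical bucket and paths, for illustration only.
payload = build_batch_payload(
    "s3://my-bucket/jobs/pyspark_job.py",
    args=["s3://my-bucket/output/"],
    conf={"spark.executor.memory": "4g"},
    name="example-job",
)
```

The resulting dictionary would be sent as JSON to `<livy-url>/batches`, and the batch state can then be polled with `GET /batches/{id}/state`.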


Example of an Airflow DAG that shows an ETL flow: it connects to Presto, where a query is executed and a date is returned. The date is passed to another task via XCom, where it is passed as an argument to a Spark job that is then submitted to Livy.
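The XCom hand-off described above might look like the two task callables below. This is a sketch under assumptions: the task id, XCom key, S3 path, and the hard-coded date are all hypothetical, and the Presto query is replaced by a placeholder. Only `ti.xcom_push` and `ti.xcom_pull`, which are methods of Airflow's task instance, are taken from Airflow itself.

```python
def fetch_latest_date(ti):
    """First task: run the Presto query and publish the resulting date via XCom."""
    # Placeholder: a real task would execute SQL against Presto and read one row.
    latest_date = "2020-01-31"  # hypothetical value
    ti.xcom_push(key="latest_date", value=latest_date)


def build_spark_submission(ti):
    """Second task: pull the date from XCom and pass it to the Spark job as an argument."""
    latest_date = ti.xcom_pull(task_ids="fetch_latest_date", key="latest_date")
    # Hypothetical script location; this dict is the body for Livy's POST /batches.
    return {"file": "s3://my-bucket/jobs/airflow_job.py", "args": [latest_date]}


# Wiring inside a DAG would look roughly like this (assumes Airflow 2.x, where
# PythonOperator passes `ti` to callables that declare it as a parameter):
#   fetch = PythonOperator(task_id="fetch_latest_date", python_callable=fetch_latest_date)
#   submit = PythonOperator(task_id="submit_spark_job", python_callable=build_spark_submission)
#   fetch >> submit
```

Keeping the callables free of Airflow imports makes them easy to exercise with a stand-in task instance before deploying the DAG.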


This is a PySpark job that is submitted to Livy via the Airflow DAG.