EMR Serverless

What's included

This repo supplements the YouTube video on EMR Serverless.

Set up

  1. Create EMR Notebook Role
  • Open IAM and create the IAM role for the EMR notebook using the EMR notebook role JSON (this tutorial names it emr-notebook-role-tutorial)
  • Attach the AmazonElasticMapReduceEditorsRole policy
  • Attach the AmazonS3FullAccess policy
  2. Create EMR Serverless Execution Role
  3. Create S3 Bucket
  • Open the S3 console
  • Create an S3 bucket to use for the demo
  4. Create Folders in the S3 Bucket
  • Create a scripts folder
  • Create a customers folder (we upload a CSV to this one)
  • Create a query-results folder
  • Upload the files to the matching folders
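The execution role step is the one the list leaves unspecified. As a minimal sketch, assuming the standard EMR Serverless setup, the role needs a trust policy that lets the `emr-serverless.amazonaws.com` service principal assume it; its permissions policies (for example, S3 access to your demo bucket) are attached separately and depend on what your jobs read and write. The snippet also builds the `s3://` URIs for the three demo folders. The bucket name `my-emr-serverless-demo-bucket` is a hypothetical placeholder, and S3 "folders" are simply key prefixes.

```python
import json
from typing import List

# Hypothetical bucket name: replace it with the bucket you created for the demo.
BUCKET = "my-emr-serverless-demo-bucket"

# The three demo folders. In S3 a "folder" is just a key prefix ending in "/".
FOLDERS = ["scripts/", "customers/", "query-results/"]


def execution_role_trust_policy() -> dict:
    """Trust policy allowing EMR Serverless to assume the execution role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "emr-serverless.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


def folder_uris(bucket: str, folders: List[str]) -> List[str]:
    """Build the s3:// URI for each demo folder prefix."""
    return [f"s3://{bucket}/{prefix}" for prefix in folders]


if __name__ == "__main__":
    print(json.dumps(execution_role_trust_policy(), indent=2))
    for uri in folder_uris(BUCKET, FOLDERS):
        print(uri)
```

To apply it with boto3 you could pass `json.dumps(execution_role_trust_policy())` as `AssumeRolePolicyDocument` to `iam.create_role`; the IAM console's custom trust policy editor accepts the same JSON.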

Main Tutorial

Studio Setup

  1. Navigate to EMR home from the AWS Console and select EMR Studio from the left-hand menu.

  2. Select Get Started

  3. Select Create Studio

  4. Insert Studio name

  5. Under Networking and Security select your default VPC and 3 public subnets.

  6. Select the EMR Studio role emr-notebook-role-tutorial created during the Set up stage.

  7. Select the S3 bucket created during the Set up stage (use your own bucket name).

  8. Select the Studio access URL
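If you prefer scripting the Studio over clicking through the console, the choices above map roughly onto the parameters of the EMR `CreateStudio` API. The sketch below only assembles that request. The VPC, subnet, and security-group IDs and the role ARN are hypothetical, `AuthMode="IAM"` is an assumption (the steps above don't say), and the console can create the two security groups for you. Check the API reference for which fields your SDK version requires.

```python
from typing import List


def build_create_studio_request(
    name: str,
    vpc_id: str,
    subnet_ids: List[str],
    service_role: str,
    workspace_sg: str,
    engine_sg: str,
    bucket: str,
) -> dict:
    """Map the console choices from the Studio steps onto CreateStudio parameters."""
    if not subnet_ids:
        raise ValueError("at least one subnet is required")
    if len(set(subnet_ids)) != len(subnet_ids):
        raise ValueError("subnet IDs must be unique")
    return {
        "Name": name,                              # Studio name
        "AuthMode": "IAM",                         # assumption: IAM authentication
        "VpcId": vpc_id,                           # default VPC
        "SubnetIds": list(subnet_ids),             # e.g. three public subnets
        "ServiceRole": service_role,               # emr-notebook-role-tutorial ARN
        "WorkspaceSecurityGroupId": workspace_sg,  # hypothetical ID
        "EngineSecurityGroupId": engine_sg,        # hypothetical ID
        "DefaultS3Location": f"s3://{bucket}/",    # the demo bucket
    }


if __name__ == "__main__":
    # All IDs below are hypothetical placeholders.
    request = build_create_studio_request(
        name="emr-serverless-demo-studio",
        vpc_id="vpc-0123456789abcdef0",
        subnet_ids=["subnet-aaaa", "subnet-bbbb", "subnet-cccc"],
        service_role="arn:aws:iam::123456789012:role/emr-notebook-role-tutorial",
        workspace_sg="sg-0000000000000000a",
        engine_sg="sg-0000000000000000b",
        bucket="my-emr-serverless-demo-bucket",
    )
    print(request)
```

The resulting dict could be passed as `boto3.client("emr").create_studio(**request)`; the response includes the Studio URL that the final step opens.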

Spark App Setup

  1. Select Applications under Serverless from the left-hand menu.

  2. Select Create application from the top right.

  3. Enter a name for the application. Leave the type as Spark and click Create application.

  4. Click into the application via its name.

  5. Click Submit job.

  6. Name the job and select the service role (the EMR Serverless execution role) created in the Set up steps.

  7. Click Submit job.

  8. The job status will go from pending -> running -> success.
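The same Spark submission can be expressed through the EMR Serverless `StartJobRun` API. This is a sketch under stated assumptions: the script path `scripts/my_spark_job.py` and the bucket name are hypothetical, and `wait_for_job` takes any zero-argument function that returns the run state, so it can be exercised without AWS. The console's pending -> running -> success progression corresponds to API states such as `PENDING`, `SCHEDULED`, `RUNNING`, and `SUCCESS`; `FAILED` and `CANCELLED` are the other terminal states.

```python
import time
from typing import Callable, List, Optional

# Hypothetical placeholders: use your own bucket and the script you uploaded to scripts/.
BUCKET = "my-emr-serverless-demo-bucket"
ENTRY_POINT = f"s3://{BUCKET}/scripts/my_spark_job.py"

# Job run states after which the state no longer changes.
TERMINAL_STATES = {"SUCCESS", "FAILED", "CANCELLED"}


def spark_job_driver(entry_point: str, args: Optional[List[str]] = None) -> dict:
    """Build a jobDriver payload of the sparkSubmit kind."""
    driver: dict = {"entryPoint": entry_point}
    if args:
        driver["entryPointArguments"] = list(args)
    return {"sparkSubmit": driver}


def wait_for_job(
    get_state: Callable[[], str],
    poll_seconds: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll get_state() until the run reaches a terminal state, then return it."""
    while True:
        state = get_state()
        if state in TERMINAL_STATES:
            return state
        sleep(poll_seconds)


if __name__ == "__main__":
    print(spark_job_driver(ENTRY_POINT))
    # Simulated state sequence standing in for real get_job_run responses.
    fake_states = iter(["PENDING", "SCHEDULED", "RUNNING", "SUCCESS"])
    print(wait_for_job(lambda: next(fake_states), sleep=lambda _: None))  # SUCCESS
```

Against a real application you might create `client = boto3.client("emr-serverless")`, call `client.start_job_run(applicationId=app_id, executionRoleArn=role_arn, jobDriver=spark_job_driver(ENTRY_POINT))`, and poll with `lambda: client.get_job_run(applicationId=app_id, jobRunId=run_id)["jobRun"]["state"]`. Injecting `sleep` keeps the loop testable without waiting.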

Hive App Setup

  1. Select Create application from Applications.

  2. Enter a name and select Hive as the application type.

  3. Open the Hive application.

  4. Click Submit job.

  5. Name the Hive job, select the Hive script (change the bucket name in the script), and select the service role.

  6. Copy and paste the Hive config (change the bucket name in the JSON).

  7. Submit the job and monitor it. The job status will go from pending -> running -> success.

  8. Navigate to Glue databases and click emrdb.

  9. Look at the table that was created.

  10. Bonus: query the data with Athena using the created table.
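The Hive job can be described the same way. The `hive-site` properties below (`hive.exec.scratchdir`, `hive.metastore.warehouse.dir`) are an assumption modeled on a typical EMR Serverless Hive setup; the repo's own Hive JSON may set different values, and the script key `scripts/create_table.sql` is hypothetical, so treat this as a sketch. `swap_bucket` covers the "change the bucket name in the JSON" step by rewriting every `s3://<old>/` prefix. The table shows up under Glue databases because EMR Serverless uses the AWS Glue Data Catalog as its metastore by default.

```python
import json

# Placeholder used in this sketch; the repo's own Hive JSON may use a different one.
PLACEHOLDER_BUCKET = "YOUR-BUCKET-NAME"


def hive_job_driver(bucket: str, script_key: str) -> dict:
    """jobDriver payload for a Hive job; the query file is read from S3."""
    return {"hive": {"query": f"s3://{bucket}/{script_key}"}}


def hive_config_overrides(bucket: str) -> dict:
    """configurationOverrides with an assumed hive-site classification."""
    return {
        "applicationConfiguration": [
            {
                "classification": "hive-site",
                "properties": {
                    "hive.exec.scratchdir": f"s3://{bucket}/hive/scratch",
                    "hive.metastore.warehouse.dir": f"s3://{bucket}/hive/warehouse",
                },
            }
        ]
    }


def swap_bucket(config_json: str, old_bucket: str, new_bucket: str) -> str:
    """Replace every s3://<old_bucket>/ reference in a pasted JSON config."""
    return config_json.replace(f"s3://{old_bucket}/", f"s3://{new_bucket}/")


if __name__ == "__main__":
    template = json.dumps(hive_config_overrides(PLACEHOLDER_BUCKET), indent=2)
    print(swap_bucket(template, PLACEHOLDER_BUCKET, "my-emr-serverless-demo-bucket"))
    print(hive_job_driver("my-emr-serverless-demo-bucket", "scripts/create_table.sql"))
```

With boto3, these two dicts would go into `start_job_run` as `jobDriver` and `configurationOverrides`, alongside the application ID and execution role ARN.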

Creators

Johnny Chivers

Enjoy 🤘