Yelp/mrjob

support docker on EMR 6.x AMIs


EMR now has explicit support for Docker on 6.x AMIs. mrjob should integrate with this.

Probably should add the following options:

  • docker_image: name of the Docker image to run tasks in
  • docker_registry: hostname of the Docker registry (defaults to centos, a public repository on Docker Hub)
  • docker_client_config: path of the Docker client config file on HDFS, needed to log into ECR
  • docker_mounts: a list of volumes to mount inside the Docker container. Defaults to ['/etc/passwd:/etc/passwd:ro']
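
To make the proposed defaults concrete, here is a minimal sketch of the four options as a plain Python dataclass. `DockerOpts` is a hypothetical name used only for illustration, not part of mrjob; the field names and defaults are taken from the list above.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DockerOpts:
    """Hypothetical holder for the proposed options (not an mrjob class)."""

    # name of the Docker image; None means "don't use Docker"
    docker_image: Optional[str] = None
    # registry to trust; the proposal defaults this to 'centos'
    docker_registry: str = 'centos'
    # path of the Docker client config on HDFS (only needed for ECR logins)
    docker_client_config: Optional[str] = None
    # volumes to mount, each in 'source:dest:mode' form
    docker_mounts: List[str] = field(
        default_factory=lambda: ['/etc/passwd:/etc/passwd:ro'])
```

`default_factory` is used so that each instance gets its own list of mounts rather than sharing one mutable default.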

Magic mrjob can provide:

  • adding the right lines to emr_configurations
  • adding the right variables to cmdenv
  • copying the client config into HDFS
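
A rough sketch of what the first two items might produce. `docker_emr_settings` is a hypothetical helper, not existing mrjob code. The `container-executor`/`docker` classification and the `YARN_CONTAINER_RUNTIME_*` variables follow the EMR and Hadoop YARN Docker documentation as I understand it. Two assumptions: `image` is already the full image name, and `local` is listed as a trusted registry alongside the chosen one. Copying the client config into HDFS is left out.

```python
def docker_emr_settings(image, registry='centos', client_config=None,
                        mounts=None):
    """Return (emr_configurations, cmdenv) for running tasks in Docker.

    Hypothetical helper for illustration; not part of mrjob.
    """
    if mounts is None:
        mounts = ['/etc/passwd:/etc/passwd:ro']

    # assumption: trust 'local' in addition to the chosen registry
    trusted = ','.join(['local', registry])

    emr_configurations = [{
        'Classification': 'container-executor',
        'Configurations': [{
            'Classification': 'docker',
            'Properties': {
                'docker.trusted.registries': trusted,
                'docker.privileged-containers.registries': trusted,
            },
        }],
    }]

    # environment variables that tell YARN to launch containers in Docker
    cmdenv = {
        'YARN_CONTAINER_RUNTIME_TYPE': 'docker',
        'YARN_CONTAINER_RUNTIME_DOCKER_IMAGE': image,
        'YARN_CONTAINER_RUNTIME_DOCKER_MOUNTS': ','.join(mounts),
    }
    if client_config:
        # only needed when the registry requires a login (e.g. ECR)
        cmdenv['YARN_CONTAINER_RUNTIME_DOCKER_CLIENT_CONFIG'] = client_config

    return emr_configurations, cmdenv
```

The two return values are meant to be merged into whatever the user already set for emr_configurations and cmdenv, rather than replacing them.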

We'll need to bootstrap mrjob in setup if we want to be able to access it inside the Docker container.