Create a Python Environment with All Packages Installed for EMR

Prerequisite

Docker Desktop for Mac

Create dev-packages.tar.gz

  • Add a package: edit pyproject.toml OR run uv add <package>

  • Build the package on your local machine

# PY in [py37, py311]
# DOCKER_DEFAULT_PLATFORM in [linux/amd64, linux/arm64]

# EMR 6.x
make build-image   PY=py37 DOCKER_DEFAULT_PLATFORM=linux/amd64
make build-package PY=py37 DOCKER_DEFAULT_PLATFORM=linux/amd64

# EMR 7.x
make build-image   PY=py311 DOCKER_DEFAULT_PLATFORM=linux/amd64
make build-package PY=py311 DOCKER_DEFAULT_PLATFORM=linux/amd64

  • On EMR, as user hadoop, extract the archive

# EMR 6.x
tar zxvf dev-packages-py37-x86_64.tar.gz

# EMR 7.x
tar zxvf dev-packages-py311-x86_64.tar.gz
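
The EMR version decides both the PY tag and the archive name, so the two can be derived rather than typed by hand. The following is a minimal sketch, not part of the project: the function names py_tag_for_emr and archive_name are hypothetical, and the dev-packages-<PY>-<arch>.tar.gz pattern is assumed from the tar commands above.

```shell
# Hypothetical helpers; neither function exists in the project's Makefile.

# Map an EMR major version to the PY tag used by the make targets.
py_tag_for_emr() {
  case "$1" in
    6) echo "py37" ;;
    7) echo "py311" ;;
    *) echo "unsupported EMR major version: $1" >&2; return 1 ;;
  esac
}

# Build the archive name from a PY tag and an architecture.
# Assumption: archives follow dev-packages-<PY>-<arch>.tar.gz.
# The architecture defaults to the current machine (uname -m).
archive_name() {
  echo "dev-packages-$1-${2:-$(uname -m)}.tar.gz"
}
```

On an EMR 7.x node this could be used as tar zxvf "$(archive_name "$(py_tag_for_emr 7)")". Whether archives built for linux/arm64 use the same architecture string that uname -m reports on the node is not confirmed here, so pass the architecture explicitly if in doubt.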