/infrastructure-AWS-EMR

Scripts to setup Amazon EMR cluster

Primary LanguageJupyter NotebookApache License 2.0Apache-2.0

Amazon EMR with Dask

Scripts and documentation on how to deploy an Amazon EMR cluster with Dask for running the high spatial resolution modelling can be found here.

  • spark-emr-hello-world is a quick-start example on setting up an Amazon EMR cluster with Spark using AWS CLI and running a test job.
  • dask-emr-hello-world is a quick-start example on setting up an Amazon EMR cluster with Dask using AWS CLI and running a test job.
  • conda-pack Setting up an Amazon EMR cluster with pre-installed packages
  • dask-ecs Setting up Dask with Amazon ECS and Docker
  • dask-emr-terraform Terraform module for Amazon EMR with Dask
  • cgc-test contains a test analysis notebook and illustrates how to set up the environment.
aws ce get-cost-and-usage --time-period Start=2021-03-01,End=2021-03-31 --metrics "BlendedCost" "UnblendedCost" "UsageQuantity" --granularity MONTHLY

Example pricing:

  • The price for an on-demand m5.xlarge EC2 instance is 0.192 USD/hour. There is an additional EMR price of 0.048 USD/hour. For a 3-node EMR cluster, the total price is 0.72 USD/hour or 17.28 USD/day.
  • The price for an on-demand m5.24xlarge EC2 instance is 4.608 USD/hour. There is an additional EMR price of 0.27 USD/hour. For a 3-node EMR cluster, the total price is 14.63 USD/hour or 351.22 USD/day.