AWS Batch workshop for HPC

This workshop assumes that you are running in the AWS Oregon Region (us-west-2).

Workshop setup

Prepare the Docker image

  • Download the workshop example code:

    • git clone https://github.com/ubik76/aws-batch-ws
    • cd aws-batch-ws
  • Make the shell scripts executable

    • chmod a+x *.sh
  • Create the Docker image

    • docker build -t awsbatch/fetch_and_run .
  • Create an ECR repository

    • aws ecr create-repository --repository-name fetch-and-run
    • ECR_REPOSITORY_URI=$(aws ecr describe-repositories --repository-names fetch-and-run --output text --query 'repositories[0].[repositoryUri]')
  • Push the Docker image to the repository:

    • $(aws ecr get-login --no-include-email --region us-west-2)
    • docker tag awsbatch/fetch_and_run:latest $ECR_REPOSITORY_URI
    • docker push $ECR_REPOSITORY_URI
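    To verify the push succeeded, you can list the images now stored in the repository:

    • aws ecr describe-images --repository-name fetch-and-run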
  • Configure IAM role

    • In the IAM console, choose Roles, Create New Role.
    • Under type of trusted entity, choose AWS service then Elastic Container Service.
    • For use case, select Elastic Container Service Task, and choose Next: Permissions.
    • On the Attach Policy page, type “AmazonS3FullAccess” into the Filter field and select the check box for that policy. Then choose Next: Tags.
    • Add the Tag, for example: Key=Name; Value=workshop
    • Enter a name for your new role, for example: batchJobRole, and choose Create Role. You see the details of the new role.
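    If you prefer the command line, a sketch of the equivalent role creation is below; the trust policy lets ECS tasks assume the role, and the managed AmazonS3FullAccess policy is then attached:

    # Create the role with a trust policy for ECS tasks
    aws iam create-role --role-name batchJobRole \
        --assume-role-policy-document '{
          "Version": "2012-10-17",
          "Statement": [{
            "Effect": "Allow",
            "Principal": { "Service": "ecs-tasks.amazonaws.com" },
            "Action": "sts:AssumeRole"
          }]
        }'
    # Attach the S3 access policy used in the console steps above
    aws iam attach-role-policy --role-name batchJobRole \
        --policy-arn arn:aws:iam::aws:policy/AmazonS3FullAccess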
  • Create a simple job script and upload to S3

    • Replace <bucket> in the commands below with the name of the S3 bucket you created earlier
    • aws s3 cp myjob.sh s3://<bucket>/myjob.sh
    • aws s3 cp myjobarray.sh s3://<bucket>/myjobarray.sh
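    For reference, a minimal myjob.sh could look like the sketch below (the single argument is a sleep duration in seconds, matching the myjob.sh 60 command used later in this workshop); the cloned repository already contains working versions of both scripts:

    #!/bin/bash
    # Print some context, sleep for the requested number of seconds, then exit.
    date
    echo "Args: $@"
    echo "jobId: $AWS_BATCH_JOB_ID"   # set by AWS Batch at runtime
    sleep "${1:-60}"
    date
    echo "done"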

Configure AWS Batch

Create Your Compute Environment

A compute environment can be seen as a computational cluster. It can consist of one or several instance types, or simply the number of vCPUs you want it to provide.

To create a compute environment we will follow these steps:

  • Select Managed Compute Environment (CE), to let AWS Batch manage the auto-scaling of EC2 resources for you.
  • Name your Compute Environment.
  • Let Batch create a new service Role so it can manage resources on your behalf.
  • Let Batch create a new instance role to allow instances to call AWS APIs on your behalf.
  • Select your EC2 key-pair (or none in our case)

Once done, scroll down to configure the rest of the CE (use the default values for the other parameters):

  • Provisioning model: SPOT
  • Maximum Price: 100 (as a percentage of the On-Demand price)
  • Allowed Instance Types: optimal
  • Allocation Strategy: SPOT_CAPACITY_OPTIMIZED
  • Add a tag with the key "Name" and, as its value, choose a name for the instances Batch will create
  • Then click on Create to build your new Compute Environment.
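The console steps above map to a single API call. A rough CLI equivalent is sketched below; the name test-ce, the maxvCpus value, the ecsInstanceRole name, and the subnet/security-group placeholders are assumptions to replace with your own values:

    aws batch create-compute-environment \
        --compute-environment-name test-ce \
        --type MANAGED \
        --compute-resources 'type=SPOT,bidPercentage=100,allocationStrategy=SPOT_CAPACITY_OPTIMIZED,minvCpus=0,maxvCpus=256,desiredvCpus=0,instanceTypes=optimal,instanceRole=ecsInstanceRole,subnets=<subnet-id>,securityGroupIds=<sg-id>'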

Set up a Job Queue

Now we will set up a Job Queue. This is where you will submit your jobs; they will be dispatched to the Compute Environment(s) of your choosing in order of priority.

  • Choose a name for your queue, for example "test-queue".
  • Define a priority (1-500). This defines the priority of a Job Queue when a Compute Environment is shared across Job Queues (for example, a production Job Queue with a priority of 500 and an R&D Job Queue with a priority of 250).
  • Select the Compute Environment created previously.
  • Then create your Job Queue.
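The CLI equivalent is a single call that references the compute environment by name or ARN, for example (assuming the compute environment from the previous step is named test-ce):

    aws batch create-job-queue \
        --job-queue-name test-queue \
        --state ENABLED \
        --priority 250 \
        --compute-environment-order order=1,computeEnvironment=test-ce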

Set up a Job Definition

Go to the Job Definitions screen and create a new one.

  • Select a job definition name, for example "test-def"

  • Input 5 for the number of attempts before declaring a job as failed.

  • Input 100 for the execution timeout in seconds, i.e. the maximum time each attempt is allowed to run.

  • Add the job role defined previously so the ECS task can access the output S3 bucket on your behalf.

  • Add the container image using the repositoryUri generated when you created the ECR repository. If in doubt, you can get the URI by running the command below in your terminal:

    aws ecr describe-repositories --repository-names fetch-and-run --output text --query 'repositories[0].[repositoryUri]'

  • For vCPUs, enter 1. For Memory, enter 500 (MiB).

  • For User, enter “nobody”.

  • Environment variables

    This tells the application running in your container where to export data. Use the variable name EXPORT_S3_BUCKET_URL; the value corresponds to the bucket you created previously.

    You also have to set BATCH_FILE_S3_URL to point to your script, for example: BATCH_FILE_S3_URL=s3://batch-workshop-87d7dd41/myjobarray.sh

    Finally, you have to specify the type of file: BATCH_FILE_TYPE=script

  • Choose Create job definition.
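If you would rather script this step, a sketch of the equivalent call is below; the image URI, account ID, and bucket name are placeholders to replace with your own values:

    aws batch register-job-definition \
        --job-definition-name test-def \
        --type container \
        --retry-strategy attempts=5 \
        --timeout attemptDurationSeconds=100 \
        --container-properties '{"image": "<repositoryUri>", "vcpus": 1, "memory": 500, "user": "nobody", "jobRoleArn": "arn:aws:iam::<account-id>:role/batchJobRole", "environment": [{"name": "BATCH_FILE_TYPE", "value": "script"}, {"name": "BATCH_FILE_S3_URL", "value": "s3://<bucket>/myjob.sh"}, {"name": "EXPORT_S3_BUCKET_URL", "value": "s3://<bucket>"}]}'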

Describe your environment

Now that we have configured Batch, let's take a look at what we have with the following commands:

  • aws batch describe-compute-environments
  • aws batch describe-job-queues
  • aws batch describe-job-definitions
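Before submitting jobs, it is worth confirming that the compute environment reports state ENABLED and status VALID, for example:

  • aws batch describe-compute-environments --query 'computeEnvironments[].[computeEnvironmentName,state,status]' --output table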

Run your first job

  • In the AWS Batch console, choose Jobs, Submit Job.

  • Enter a name for the job, for example: script_test.

  • Choose the latest job definition.

  • For Job Queue, choose the queue you have defined before, for example: test-queue.

  • For Command, enter myjob.sh 60.

Then try submitting an array job from the command line:

  • aws batch submit-job --job-name my-job --job-queue test-queue --array-properties size=10 --job-definition test-def --container-overrides 'vcpus=1,memory=500,command=["myjobarray.sh","10"]'
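After submission, you can follow the job (and, for an array job, each of its child jobs) through the states SUBMITTED, PENDING, RUNNABLE, STARTING, RUNNING, and finally SUCCEEDED or FAILED; submit-job returns the job ID to use here:

  • aws batch list-jobs --job-queue test-queue --job-status RUNNING
  • aws batch describe-jobs --jobs <job-id>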