This workshop assumes that you run in the AWS Oregon Region (us-west-2)
- Log in to the Event Engine
- Start the Cloud9 IDE
- Create an S3 Bucket:
BUCKET_POSTFIX=$(uuidgen --random | cut -d'-' -f1)
aws s3 mb s3://batch-workshop-${BUCKET_POSTFIX}
- Take note of the bucket name you have just created
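If you want to double-check the bucket before moving on, a quick sanity check (assuming BUCKET_POSTFIX is still set in your current shell):
# Print the full bucket name so you can copy it for later steps
echo "batch-workshop-${BUCKET_POSTFIX}"
# Confirm the new bucket shows up in your account
aws s3 ls | grep batch-workshop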
- Download the workshop example code:
git clone https://github.com/ubik76/aws-batch-ws
cd aws-batch-ws
- Make the shell scripts executable:
chmod a+x *.sh
- Create the Docker image:
docker build -t awsbatch/fetch_and_run .
- Create an ECR repository:
aws ecr create-repository --repository-name fetch-and-run
ECR_REPOSITORY_URI=$(aws ecr describe-repositories --repository-names fetch-and-run --output text --query 'repositories[0].[repositoryUri]')
- Push the Docker image to the repository:
$(aws ecr get-login --no-include-email --region us-west-2)
docker tag awsbatch/fetch_and_run:latest $ECR_REPOSITORY_URI
docker push $ECR_REPOSITORY_URI
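Optionally, you can verify that the image landed in the repository by listing its image tags (a sanity check, not a required step):
# Confirm the pushed image is visible in ECR
aws ecr describe-images --repository-name fetch-and-run --query 'imageDetails[].imageTags'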
- Configure the IAM role
- In the IAM console, choose Roles, Create New Role.
- Under type of trusted entity, choose AWS service then Elastic Container Service.
- For use case, select Elastic Container Service Task, and choose Next: Permissions.
- On the Attach Policy page, type “AmazonS3FullAccess” into the Filter field and then select the check box for that policy. Then, choose Next: Tags.
- Add the Tag, for example: Key=Name; Value=workshop
- Enter a name for your new role, for example: batchJobRole, and choose Create Role. You will see the details of the new role. (A CLI equivalent of these console steps is sketched below.)
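If you prefer the command line over the console, a rough CLI equivalent of the steps above looks like the following sketch (the role name batchJobRole and the trust policy file name are only examples; adjust them to your setup):
# Trust policy letting ECS tasks assume the role
cat > ecs-tasks-trust-policy.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "Service": "ecs-tasks.amazonaws.com" },
      "Action": "sts:AssumeRole"
    }
  ]
}
EOF
# Create the role, tag it, and attach the S3 policy
aws iam create-role --role-name batchJobRole \
  --assume-role-policy-document file://ecs-tasks-trust-policy.json \
  --tags Key=Name,Value=workshop
aws iam attach-role-policy --role-name batchJobRole \
  --policy-arn arn:aws:iam::aws:policy/AmazonS3FullAccess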
- Create a simple job script and upload it to S3
- Replace <bucket> in the commands below with the S3 bucket name you created earlier:
aws s3 cp myjob.sh s3://<bucket>/myjob.sh
aws s3 cp myjobarray.sh s3://<bucket>/myjobarray.sh
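The myjob.sh and myjobarray.sh scripts come with the repository you cloned, so you do not need to write them yourself. For context, a minimal job script of this kind could look roughly like the hypothetical sketch below (it just logs some details and sleeps for the number of seconds passed as the first argument):
#!/bin/bash
# Hypothetical example job script; the real scripts live in the aws-batch-ws repository
date
echo "Running on $(hostname) with arguments: $@"
sleep "${1:-30}"
echo "Job finished"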
A compute environment can be seen as a computational cluster. It can consist of one or several instance types, or simply the number of vCPUs you would like it to contain.
To create a compute environment, we will follow these steps:
- Select Managed Compute Environment (CE), to let AWS Batch manage the auto-scaling of EC2 resources for you.
- Name your Compute Environment.
- Let Batch create a new service Role so it can manage resources on your behalf.
- Let Batch create a new instance role to allow instances to call AWS APIs on your behalf.
- Select your EC2 key-pair (or none in our case)
Once done, scroll down to configure the rest of the CE (use the default values for the other parameters):
- Provisioning model: SPOT
- Maximum Price: 100
- Allowed Instance Types: optimal
- Allocation Strategy: SPOT_CAPACITY_OPTIMIZED
- Add a tag called "Name" and as a value choose a name for your instances created with Batch
- Then click on Create to build your new Compute Environment. (For reference, an equivalent CLI call is sketched below.)
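Roughly the same Compute Environment could be created from the CLI. In the sketch below, the CE name, subnet ID, security group ID, and role names/ARNs are placeholders you would replace with values from your own account:
aws batch create-compute-environment \
  --compute-environment-name workshop-ce \
  --type MANAGED \
  --state ENABLED \
  --service-role arn:aws:iam::<account-id>:role/AWSBatchServiceRole \
  --compute-resources '{
    "type": "SPOT",
    "allocationStrategy": "SPOT_CAPACITY_OPTIMIZED",
    "bidPercentage": 100,
    "minvCpus": 0,
    "maxvCpus": 256,
    "desiredvCpus": 0,
    "instanceTypes": ["optimal"],
    "subnets": ["subnet-xxxxxxxx"],
    "securityGroupIds": ["sg-xxxxxxxx"],
    "instanceRole": "ecsInstanceRole",
    "tags": {"Name": "batch-workshop-instance"}
  }'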
Now we will set up a Job Queue. This is where you will submit your jobs; they will be dispatched to the Compute Environment(s) of your choosing in order of priority.
- Choose a name for your queue, for example "test-queue"
- Define a priority (1-500). This defines the priority of a Job Queue when a Compute Environment is shared across Job Queues (for example, a Production Job Queue with a priority of 500 and an R&D Job Queue with a priority of 250).
- Select the Compute Environment created previously.
- Then create your Job Queue. (A CLI equivalent is sketched below.)
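The CLI equivalent is roughly the following (the queue name, priority, and CE name are the example values used above; adjust them if yours differ):
aws batch create-job-queue \
  --job-queue-name test-queue \
  --state ENABLED \
  --priority 1 \
  --compute-environment-order order=1,computeEnvironment=workshop-ce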
Go to the Job Definition screen and create a new one.
- Select a job definition name, for example "test-def"
- Input 5 for the number of attempts before declaring a job as failed.
- Input 100 for the time between attempts in seconds.
- Add the job role previously defined so that the ECS task can access the output S3 bucket on your behalf.
- Add the container image with the repositoryUri generated when creating the ECR repository. If in doubt, you can get the URI by running the command below in your terminal:
aws ecr describe-repositories --repository-names fetch-and-run --output text --query 'repositories[0].[repositoryUri]'
- For vCPUs, enter 1. For Memory, enter 500.
- For User, enter “nobody”.
- Environment Variables
The EXPORT_S3_BUCKET_URL variable tells the application running in your container where to export data; its value is the bucket you created previously.
You also have to point BATCH_FILE_S3_URL at your script, for example: BATCH_FILE_S3_URL=s3://batch-workshop-87d7dd41/myjobarray.sh
Finally, you have to specify the type of file: BATCH_FILE_TYPE=script
- Choose Create job definition. (A CLI sketch of the same definition follows below.)
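For reference, the same job definition could be registered from the CLI along the lines of the sketch below. The image URI, account ID, and bucket name are placeholders, and mapping the 100-second value to the execution timeout is an assumption; double-check the values against what you entered in the console:
aws batch register-job-definition \
  --job-definition-name test-def \
  --type container \
  --retry-strategy attempts=5 \
  --timeout attemptDurationSeconds=100 \
  --container-properties '{
    "image": "<repositoryUri>",
    "vcpus": 1,
    "memory": 500,
    "user": "nobody",
    "jobRoleArn": "arn:aws:iam::<account-id>:role/batchJobRole",
    "environment": [
      {"name": "BATCH_FILE_TYPE", "value": "script"},
      {"name": "BATCH_FILE_S3_URL", "value": "s3://batch-workshop-<postfix>/myjob.sh"},
      {"name": "EXPORT_S3_BUCKET_URL", "value": "s3://batch-workshop-<postfix>"}
    ]
  }'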
Now that we have configured Batch, let's take a look at what we have with the following commands:
aws batch describe-compute-environments
aws batch describe-job-queues
aws batch describe-job-definitions
- In the AWS Batch console, choose Jobs, Submit Job.
- Enter a name for the job, for example: script_test.
- Choose the latest job definition.
- For Job Queue, choose the queue you have defined before, for example: test-queue.
- For Command, enter myjob.sh 60.
To submit an array job from the command line instead, you can run:
aws batch submit-job --job-name my-job --job-queue myqueue --array-properties size=10 --job-definition mydef --container-overrides '{"vcpus": 1, "memory": 50, "command": ["myjobarray.sh", "10"]}'
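Once submitted, you can follow the job from the CLI as well; replace the queue name and job ID below with your own values:
# List jobs in the queue by state
aws batch list-jobs --job-queue test-queue --job-status RUNNABLE
aws batch list-jobs --job-queue test-queue --job-status RUNNING
# Inspect a specific job (the job ID is returned by submit-job)
aws batch describe-jobs --jobs <job-id>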