- The instance type we are currently testing with is m4.xlarge
- The disk where work is done must be encrypted:
  - When launching the instance, add a volume and ensure "Encrypted" is selected.
  - When setting up the workflow, ensure the directories being used are located on that disk.
- Security rules used on this testing box: ssh from 206.108.127.0/24
sudo mkfs.ext4 /dev/xvdb
sudo mount /dev/xvdb /mnt
cd /mnt
sudo chmod 777 .
mkdir datastore
mkdir workflows
sudo ln -s /mnt/datastore /datastore
sudo ln -s /mnt/workflows /workflows
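Once the symlinks exist, it can be worth confirming that /datastore and /workflows really resolve onto the volume mounted at /mnt. The following is a minimal Python sketch, not part of the workflow; the helper name `is_on_mount` is hypothetical. It only checks where a path lives, not whether the volume itself is encrypted.

```python
import os


def is_on_mount(path, mount_point="/mnt"):
    """Return True if path, after resolving symlinks, lies under mount_point."""
    real_path = os.path.realpath(path)
    real_mount = os.path.realpath(mount_point)
    # commonpath compares whole path components, so /mnt2 does not match /mnt.
    return os.path.commonpath([real_path, real_mount]) == real_mount


if __name__ == "__main__":
    for directory in ("/datastore", "/workflows"):
        status = "ok" if is_on_mount(directory) else "NOT on /mnt"
        print(directory, status)
```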
curl -sSL https://get.docker.com/ | sudo sh
sudo usermod -aG docker ubuntu
# log out then back in!
exit
Progress is tracked by the workflow moving .json files between folders in a Git repository. Each .json file represents a job. The workflow looks for jobs in the folder queued-jobs
and runs git commands to move each job to the next folder (pushing the change to the repo) according to the success or failure of a given step for that job.
A further description of the Git Order System, and the home of the tracking folders, is at https://github.com/ICGC-TCGA-PanCancer/s3-transfer-operations
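To illustrate the idea of moving a job file between tracking folders, here is a rough Python sketch that only builds the git commands for one move. It is an assumption about the general shape of the process, not the workflow's actual code: the function name `plan_job_move` and any destination folder name are hypothetical, and only queued-jobs is taken from the description above.

```python
import os


def plan_job_move(job_file, src_folder, dst_folder, message=None):
    """Build the git commands that move one job file between tracking folders.

    Returns a list of argv lists; nothing is executed here.
    """
    src = os.path.join(src_folder, job_file)
    dst = os.path.join(dst_folder, job_file)
    if message is None:
        message = "Move {} from {} to {}".format(job_file, src_folder, dst_folder)
    return [
        ["git", "mv", src, dst],           # move the job file to the next folder
        ["git", "commit", "-m", message],  # record the state change
        ["git", "push"],                   # publish it to the shared repo
    ]
```

Each returned command could then be run with `subprocess.check_call(cmd, cwd=repo_dir)` from inside a clone of the tracking repository, where `repo_dir` is a placeholder for that clone's path.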
The workflow uses the Collaboratory CLI to upload to the backend storage: https://github.com/CancerCollaboratory/cli
We store a tar file in s3://oicr.docker.private.images that contains the following files:
a) github.pem (An SSH key for committing to GitHub in an automated fashion)
b) client.jks (A keystore used by the Collaboratory CLI tool)
c) s3cfg (An s3cmd config file used to aggregate Collaboratory log files.)
d) token (A text file containing an auth token for the Collaboratory tool.)
Get this file from the S3 bucket:
Install s3cmd and configure with your credentials (interactive).
sudo apt-get install s3cmd
s3cmd --configure
Download "store-and-forward1.1.tar.gz" and unpack
cd /home/ubuntu
s3cmd get s3://oicr.docker.private.images/store-and-forward1.1.tar.gz
mkdir /home/ubuntu/.gnos/
tar xvzf store-and-forward1.1.tar.gz
mv /home/ubuntu/store-and-forward/* /home/ubuntu/.gnos/
Copy your gnos pem key to /home/ubuntu/.gnos/gnos.pem
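Before launching anything, a quick check that the expected files ended up in /home/ubuntu/.gnos can save a failed run. The sketch below is illustrative only; it assumes the tar file unpacks to files with exactly the names listed above (plus gnos.pem, which is copied by hand), and the helper name `missing_gnos_files` is hypothetical.

```python
import os

# File names taken from the list above, plus the manually copied gnos.pem.
EXPECTED_FILES = ("github.pem", "client.jks", "s3cfg", "token", "gnos.pem")


def missing_gnos_files(gnos_dir="/home/ubuntu/.gnos"):
    """Return the expected file names that are absent from gnos_dir."""
    return [name for name in EXPECTED_FILES
            if not os.path.isfile(os.path.join(gnos_dir, name))]
```

An empty list from `missing_gnos_files()` means every expected file is present.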
sudo apt-get install openjdk-7-jre-headless
cd /workflows
wget https://seqwaremaven.oicr.on.ca/artifactory/seqware-release/com/github/seqware/seqware-distribution/1.1.1/seqware-distribution-1.1.1-full.jar
s3cmd get s3://oicr.workflow.bundles/released-bundles/Workflow_Bundle_StoreAndForward_1.0.9_SeqWare_1.1.0.zip
java -cp seqware-distribution-1.1.1-full.jar net.sourceforge.seqware.pipeline.tools.UnZip --input-zip Workflow_Bundle_StoreAndForward_1.0.9_SeqWare_1.1.0.zip --output-dir /workflows/Workflow_Bundle_StoreAndForward_1.0.9_SeqWare_1.1.0
docker run -h master -it -v /var/run/docker.sock:/var/run/docker.sock -v /home/ubuntu/.gnos:/home/ubuntu/.gnos -v /datastore:/datastore -v /workflows:/workflows -v <your local ini file>:/workflow.ini seqware/seqware_whitestar_pancancer:1.1.1 bash -c "seqware bundle launch --ini /workflow.ini --dir /workflows/Workflow_Bundle_StoreAndForward_1.0.9_SeqWare_1.1.0/ --engine whitestar --no-metadata"
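The docker command above is long enough that it is easy to mistype. As a sketch, it can be assembled piece by piece in Python, which also makes the volume mounts easier to read. All values come from the command above; the function name `build_docker_command` and its parameters are hypothetical. The ini path is made absolute because docker expects an absolute host path for a bind mount.

```python
import os


def build_docker_command(
        ini_path,
        bundle_dir="/workflows/Workflow_Bundle_StoreAndForward_1.0.9_SeqWare_1.1.0/",
        image="seqware/seqware_whitestar_pancancer:1.1.1"):
    """Return the docker run argv list for launching the workflow bundle."""
    launch = ("seqware bundle launch --ini /workflow.ini "
              "--dir {} --engine whitestar --no-metadata").format(bundle_dir)
    volumes = [
        "/var/run/docker.sock:/var/run/docker.sock",
        "/home/ubuntu/.gnos:/home/ubuntu/.gnos",
        "/datastore:/datastore",
        "/workflows:/workflows",
        # The local ini file is mounted at a fixed path inside the container.
        "{}:/workflow.ini".format(os.path.abspath(ini_path)),
    ]
    command = ["docker", "run", "-h", "master", "-it"]
    for volume in volumes:
        command += ["-v", volume]
    command += [image, "bash", "-c", launch]
    return command
```

The result can be passed to `subprocess.call(...)` from an interactive terminal, since the command keeps the `-it` flags.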
Get a copy of template.ini, found at https://github.com/ICGC-TCGA-PanCancer/s3-transfer-operations/blob/master/scripts/template.ini, and set the field collabToken to the access token that Vitalii provided to you.
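Editing the token by hand works fine, but the change can also be scripted. The sketch below assumes that template.ini holds the field as a plain `collabToken=...` line, which has not been verified against the actual file; the helper name `set_ini_field` is hypothetical.

```python
import re


def set_ini_field(ini_text, key, value):
    """Return ini_text with the value of `key=` replaced by value.

    Assumes a simple key=value line; raises KeyError if the key is absent.
    """
    pattern = re.compile(
        r"^([ \t]*{}[ \t]*=).*$".format(re.escape(key)), re.MULTILINE)
    # A function replacement keeps backslashes in the token from being
    # interpreted as regex escapes.
    new_text, count = pattern.subn(lambda m: m.group(1) + value, ini_text)
    if count == 0:
        raise KeyError(key)
    return new_text
```

For example, read template.ini into a string, call `set_ini_field(text, "collabToken", your_token)`, and write the result back out.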
Generate the .ini file using json2ini.py, found at https://github.com/ICGC-TCGA-PanCancer/s3-transfer-operations/blob/master/scripts/json2ini.py
The script requires pystache:
sudo apt-get install python-pip
sudo pip install pystache
Select a .json file from https://github.com/ICGC-TCGA-PanCancer/s3-transfer-operations/tree/master/testing/queued-jobs
and run json2ini.py on it:
python json2ini.py [input json file] [template file] [output folder]
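If ini files are needed for several jobs, the same invocation can be repeated for each .json file in a folder. This is a sketch under that assumption: it only builds the command lines, using the argument order shown above, and the helper name `json2ini_commands` is hypothetical.

```python
import glob
import os


def json2ini_commands(jobs_dir, template_file, output_dir, script="json2ini.py"):
    """Build one json2ini.py command per .json file in jobs_dir."""
    commands = []
    for json_file in sorted(glob.glob(os.path.join(jobs_dir, "*.json"))):
        # Argument order: input json file, template file, output folder.
        commands.append(["python", script, json_file, template_file, output_dir])
    return commands
```

Each entry can be run with `subprocess.check_call(cmd)` from the folder that contains json2ini.py.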