/topmed-workflows

a place for topmed workflows

Primary language: WDL. License: BSD 3-Clause "New" or "Revised" License (BSD-3-Clause).

Topmed Workflows

About

The original pipelines were assembled and written by Hyun Min Kang (hmkang@umich.edu) and Adrian Tan (atks@umich.edu) at the Abecasis Lab at the University of Michigan.

See the variant calling pipeline and alignment pipeline repositories.

Installing dependencies on your local system

1. Cloud SDK (gcloud, gsutil)

If you are on Debian or Ubuntu, follow the installation instructions for the Cloud SDK. When you run gcloud init, the installer asks you to log in: respond with Y, open the provided URL, copy the code, and paste it at the prompt. It then asks which cloud project you want to use; enter your GCP Project ID. I picked us-west1-b as the zone.

Configuration and credentials file

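# Add the Cloud SDK package source for this distribution release
export CLOUD_SDK_REPO="cloud-sdk-$(lsb_release -c -s)"
echo "deb http://packages.cloud.google.com/apt $CLOUD_SDK_REPO main" | sudo tee -a /etc/apt/sources.list.d/google-cloud-sdk.list
# Import the Google Cloud public key
curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key add -
# Install the SDK, then authenticate
sudo apt-get update && sudo apt-get install google-cloud-sdk
gcloud auth login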

After that, run gcloud auth application-default --help and follow the instructions. Briefly, run:

gcloud iam service-accounts create <pick-a-username>
gcloud iam service-accounts keys create key.json --iam-account=<the-username-you-just-picked>@<your-project-id>.iam.gserviceaccount.com

That should print something like:

created key [<some long integer>] of type [json] as [key.json] for [<the-username-you-just-picked>@<your-project-id>.iam.gserviceaccount.com]

To verify, open the Google Cloud Platform console and look under IAM & Admin, Service Accounts, or run gcloud iam service-accounts list. The account you just created should be in the list.

Next create an environment variable that points to the file key.json:

export GOOGLE_APPLICATION_CREDENTIALS=key.json

Providing credentials to your application

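To run workflows on data stored in Google Cloud Storage, set the environment variable GOOGLE_APPLICATION_CREDENTIALS to the path of the credentials file. An absolute path is safest, because a relative path such as key.json only resolves from the directory that contains the file.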
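As a minimal sketch of the step above, the following shell function exports the variable as an absolute path and refuses to do so when the key file is not readable. The function name set_gcp_credentials is hypothetical and not part of this repository or of the Cloud SDK.

```shell
# Hypothetical helper (not part of the Cloud SDK): export
# GOOGLE_APPLICATION_CREDENTIALS as an absolute path to a key file.
set_gcp_credentials() {
    key_file="$1"
    # Refuse to export a path that does not point at a readable file.
    if [ ! -r "$key_file" ]; then
        echo "credentials file not readable: $key_file" >&2
        return 1
    fi
    # Resolve the containing directory so the value survives a later 'cd'.
    key_dir="$(cd "$(dirname "$key_file")" && pwd)"
    GOOGLE_APPLICATION_CREDENTIALS="$key_dir/$(basename "$key_file")"
    export GOOGLE_APPLICATION_CREDENTIALS
}

# Usage, from the directory where key.json was created:
#   set_gcp_credentials key.json
#   echo "$GOOGLE_APPLICATION_CREDENTIALS"
```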

2. Broad's execution engine cromwell

cromwell is distributed as a Java JAR file and requires a Java Runtime Environment (JRE). Follow the instructions here for a complete installation.

3. Dockstore

Dockstore also requires a Java Runtime Environment (JRE). Find installation instructions for Dockstore here (you need to be logged in to Dockstore).
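Because both cromwell and Dockstore depend on Java, it can help to confirm that the required commands are on your PATH before going further. The function below is a small sketch for that check; the name require_cmd is hypothetical.

```shell
# Hypothetical helper: report whether a command is available on PATH.
require_cmd() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found: $1"
    else
        echo "missing: $1" >&2
        return 1
    fi
}

# Usage:
#   require_cmd java && java -version
#   require_cmd gcloud
#   require_cmd gsutil
```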

Running workflows

Provisioning reference files

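To copy the contents of a Google Cloud Storage bucket to your local system (or a VM), use the command below. The -u option names the project to bill for the request, which is required for Requester Pays buckets.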

gsutil -u [PROJECT_ID] cp gs://[BUCKET_NAME]/[OBJECT_NAME] [OBJECT_DESTINATION]
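When several reference files are needed, the same command can be wrapped in a loop. The sketch below assumes all objects live in one bucket; the function name fetch_references and the DRY_RUN switch are hypothetical conveniences, and the project, bucket, and object names in the usage lines are placeholders.

```shell
# Hypothetical helper: copy several objects from one bucket into a local
# directory. Set DRY_RUN=1 to print the gsutil commands instead of running them.
fetch_references() {
    project_id="$1"
    bucket="$2"
    dest="$3"
    shift 3
    mkdir -p "$dest"
    for object in "$@"; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "gsutil -u $project_id cp gs://$bucket/$object $dest/"
        else
            gsutil -u "$project_id" cp "gs://$bucket/$object" "$dest/"
        fi
    done
}

# Usage (placeholder names):
#   DRY_RUN=1
#   fetch_references my-project my-bucket ./refs ref.fa ref.fa.fai
```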

Checker workflows

A WDL and a JSON file for testing the checker workflows are in the test_data directory. Before running a checker, adjust all paths in the JSON file to match the paths on your system. The checker has been tested with cromwell-31.jar. To run the checker workflow for the WDL aligner, navigate to the respective directory (it usually has checker in its name) and run

java -Dconfig.file=<location_to_file> -jar ~/bin/<cromwell_version>.jar run <checker-workflow>.wdl -i <checker-workflow>.json
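Adjusting the paths in the JSON file can be done by hand, or, if all paths share a common prefix, with a substitution along these lines. This is only a sketch: the function name rewrite_json_paths and the example prefixes are hypothetical, and the prefixes are passed to sed as a pattern and replacement, so they must not contain the characters |, &, or backslash (a dot in the old prefix matches any character).

```shell
# Hypothetical helper: replace one path prefix with another throughout a
# JSON inputs file and write the result to a new file.
# '|' is the sed delimiter, so neither prefix may contain '|'.
rewrite_json_paths() {
    old_prefix="$1"
    new_prefix="$2"
    json_in="$3"
    json_out="$4"
    sed "s|$old_prefix|$new_prefix|g" "$json_in" > "$json_out"
}

# Usage (placeholder prefixes and file names):
#   rewrite_json_paths /original/prefix "$HOME/topmed-data" \
#       <checker-workflow>.json my-inputs.json
```

Writing to a separate output file keeps the original JSON intact, so the substitution can be repeated with a different prefix if the first attempt misses some paths.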