- Install Python SDK and Command Line Tools
pip3 install dxpy
eval "$(register-python-argcomplete dx|sed 's/-o default//')"
- For macOS with zsh, enable tab completion by running the following commands:
autoload -Uz compinit && compinit
autoload bashcompinit && bashcompinit
eval "$(register-python-argcomplete dx|sed 's/-o default//')"
The dataset contains WGS results from 200,025 samples, sequenced with 2×151 bp paired-end reads.
- The sample metadata files can only be accessed from JupyterLab; the `dx cat` command is not allowed to view those files from a local terminal.
- The strWAS paper:
- Remove potential DNA contamination.
- Remove withdrawn participants, indicated by non-positive IDs in the sample file as well as by IDs in email communications.
- 487,279 individuals remained at this step
- Using the QC file, subset the non-withdrawn individuals, keeping only the White-British population.
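
The withdrawal filter described in the strWAS steps can be sketched in Python. This is only an illustration, assuming the sample IDs are available as integers; the function name `drop_withdrawn` and the `emailed_withdrawals` argument are hypothetical and are not taken from the paper's code.

```python
from typing import Iterable, List, Set


def drop_withdrawn(sample_ids: Iterable[int], emailed_withdrawals: Set[int]) -> List[int]:
    """Keep IDs that are positive and not listed in a withdrawal email."""
    return [
        sid for sid in sample_ids
        if sid > 0 and sid not in emailed_withdrawals
    ]


# Made-up IDs for illustration only.
example_ids = [1000012, -1, 1000034, 0, 1000056]
kept = drop_withdrawn(example_ids, emailed_withdrawals={1000034})
print(kept)
```

Non-positive IDs are dropped first, then any ID that appears in the separately communicated withdrawal list.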
- The Nature paper on 150,119 UKBiobank samples:
- 13 individuals were sequenced in duplicate.
- 11 individuals withdrew consent between the time of sequencing and the time of analysis.
- 135 individuals do not have microarray data.
- Install the tools for local access to DNAnexus.
- Prepare files for running HipSTR
- Get the names of files stored in DNAnexus: `dx find data --name "<file_name_pattern>" --path "<path_want_to_check>" > filename_list.txt`.
- UKBiobank uses the GRCh38 reference genome but does not specify the exact version. Based on the header of the CRAM files, this one was picked: GRCh38.
- Use `dx upload -r <local_directory> --destination <cloud_path>` to upload a directory (`-r` uploads directories recursively). The command may not work with large files; in that case, use a WDL task to download the file, or use the Upload Agent recommended by DNAnexus (not tried yet).
- DNAnexus uses AWS for compute jobs. By default, DNAnexus appears to use spot instances with small memory (needs confirmation; the documentation could not be found).
- To compile a WDL file into a workflow, use `java -jar dxCompiler-2.10.4.jar compile <your_wdl_file> -project <ukb_project_id> -folder <directory_to_storage_on_DNAnexus>`. This outputs a string "workflow-xxxx" that is needed for running the workflow.
  - If `<directory_to_storage_on_DNAnexus>` does not exist, it is created for you. For example, if the /test/ folder does not exist, a /test/ folder is created under the root of your project and the "workflow-xxxx" and other files are put there.
  - Use `dx ls --brief <directory_to_storage_on_DNAnexus>` to look up the workflow ID if you forgot it; use the `--long` flag to display the full path and file ID.
  - Use `--streamFiles [all, none, perfile(default)]` to mount data instead of downloading it. `perfile` needs a `parameter_meta` section in the WDL file.
- To run the workflow, use `dx run <workflow-xxxx> -y -f input.json --destination <path_to_storage>`.
  - To generate an `input.json` file, use `dx run <workflow-xx>` in interactive mode to get a template for `input.json`. Alternatively, `java -jar dxCompiler.jar compile <your_wdl_file> -project project-xxxx -folder <directory_to_storage_on_DNAnexus> -inputs input.json` converts a Cromwell-format JSON input file into the DNAnexus format (`input.dx.json`) during compilation.
  - To get an array of inputs, use `awk '{print "{ \n","\"$dnanexus_link\":",$NF, "\n},"}' random_100_cram_ab_15G.txt | tr '()' '""' | less`.
  - If `--destination` is not specified, `dx run` writes results to the root directory by default.
  - If `<path_to_storage>` does not exist, it is created for you. To create `<path_to_storage>` manually, use `dx mkdir -p <path_to_storage>`.
  - Add `--name <job_name>` to specify the job name; if it is not specified, the workflow name is used as the job name.
  - Use `--head-job-on-demand` or set `--priority` to choose on-demand (`high`) or spot (`low`) instances. Or use `dx run app-xxxx --extra-args '{"executionPolicy":{"spotOnly":true}}'`.
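
The array of `$dnanexus_link` objects produced by the `awk` one-liner can also be built in Python, which gives valid JSON without the trailing-comma cleanup. This is a minimal sketch, assuming each line of the listing ends with the file ID in parentheses (the same assumption the `awk` command makes); the function name, the sample lines, and the input key `stage-common.crams` are placeholders.

```python
import json
from typing import Dict, Iterable, List


def links_from_listing(lines: Iterable[str]) -> List[Dict[str, str]]:
    """Turn listing lines into DNAnexus link objects.

    Assumes the last whitespace-separated field of each line is the
    file ID wrapped in parentheses, e.g. "(file-xxxx)".
    """
    links = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        file_id = fields[-1].strip("()")
        links.append({"$dnanexus_link": file_id})
    return links


# Made-up listing lines for illustration only.
listing = [
    "closed 2023-01-01 00:00:00 15.2 GB /crams/a.cram (file-aaaa)",
    "closed 2023-01-01 00:00:00 15.4 GB /crams/b.cram (file-bbbb)",
]
# "stage-common.crams" is a placeholder; use the input name from your template.
print(json.dumps({"stage-common.crams": links_from_listing(listing)}, indent=2))
```

In practice the lines would be read from a file such as `random_100_cram_ab_15G.txt` with `open(...)` instead of the hard-coded list.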
- How to set up a batch run
- Executions include both analyses and jobs (may be wrong):
  - Analyses are executions of workflows and consist of one or more app(let)s.
  - `dx find executions` returns the 10 most recent executions in the current project.
  - `dx find analyses` returns top-level analyses, not any of the jobs.
- How to check the job status using the "Analysis ID: analysis-GGGfFFjJv7B1FFF291FPfFx5" that `dx run` prints.
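
One possible way to check the status from a script is to parse the JSON printed by `dx describe <execution_id> --json`. The sketch below is not a confirmed recipe: it assumes the JSON contains a top-level `state` field and that `dx` is installed and logged in. The helper names are hypothetical.

```python
import json
import subprocess

# States after which an execution is assumed not to change any more.
TERMINAL_STATES = {"done", "failed", "terminated"}


def parse_state(describe_json: str) -> str:
    """Extract the 'state' field from `dx describe --json` output."""
    return json.loads(describe_json)["state"]


def is_finished(state: str) -> bool:
    """True if the execution has reached a terminal state."""
    return state in TERMINAL_STATES


def describe_state(execution_id: str) -> str:
    """Call the dx CLI and return the execution state (needs dx on PATH)."""
    result = subprocess.run(
        ["dx", "describe", execution_id, "--json"],
        check=True,
        capture_output=True,
        text=True,
    )
    return parse_state(result.stdout)
```

For example, `is_finished(describe_state("analysis-xxxx"))` could be polled in a loop until it returns `True`.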
- If the Docker image is built on a MacBook with an M1 CPU, use `docker buildx build --platform linux/amd64`; otherwise it cannot run on DNAnexus.
- For test runs, redirecting stdout or stderr to a file is not recommended: DNAnexus would not be able to report the error/info messages if the job failed, which makes debugging very difficult. Once the test run works, however, it is better to redirect stdout and stderr to an output file, because the online log is sometimes not fully displayed.
- If a tool needs other files that are not specified in its options, make sure to include those files in the `Input`.
- To store Docker images on DNAnexus, use the `dx-docker` command:
  - Cache Docker images:
    - Using `docker save` and `docker load`: `docker save` creates a tarball of the image that can be stored on DNAnexus, and the app then runs `docker load`.
  - Issues:
    - Not sure whether this is the correct command to store images.
    - The command depends on the `docker2aci` package, which is archived and is troublesome to install.
- To launch a JupyterLab session, select the `JupyterLab` tab from the `TOOLS` menu and click `New JupyterLab` in the top right corner. Specify the project, instance type and other run settings, then start the session. After the session has started, click the `Open` button to open JupyterLab in the browser.
- There are two types of notebooks: Local vs DNAnexus
  - The main difference is that files (including ipynb files and data) in a DNAnexus notebook are kept after JupyterLab is closed, while files in a Local notebook are lost.
- To access data in your project from a notebook:
  - For reading files multiple times, use `dx download` to download them to the current instance.
  - For reading a file's content once, or only a small fraction of it, read the file from the `/mnt/project` folder, which dynamically fetches the content from the DNAnexus platform.
  - Note: not sure about the difference, but the documentation mentions that the `/mnt/project/` directory involves more API calls.
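
As an illustration of the "read only a small fraction" case, the sketch below reads just the first few lines of a text file, so that only the beginning of a file under `/mnt/project` has to be read. The helper name `read_head` is hypothetical and the path in the usage note is a placeholder.

```python
from itertools import islice
from typing import List


def read_head(path: str, n_lines: int = 5) -> List[str]:
    """Return at most the first n_lines lines of a text file."""
    with open(path, "r") as handle:
        # islice stops after n_lines, so the rest of the file is not iterated.
        return [line.rstrip("\n") for line in islice(handle, n_lines)]
```

In a notebook this might be called as `read_head("/mnt/project/<path_to_file>", 3)`. For files that are read repeatedly, downloading them once with `dx download` is probably the better choice.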
- Efficient way to get input?
- How to troubleshoot?
- How to avoid running duplicate jobs?
- dxWDL: provides extra documentation about dxWDL files.
- WDL: provides the specification for WDL.
- dxCompiler: provides more documentation about dxCompiler, such as `-extras` and `parameter_meta`. See also the DNAnexus website.
- Other information from DNAnexus, such as billing, the dx command, and the DNAnexus website.
- Check the number of cores in the instance.
- Where to change the execution name when using `dx run`.
- Check SSH.