How to run job arrays based on a function of TASK_ID?
I initially asked this question in a private email to the author, whose response I would like to share here in case anyone else is interested.
Here is my question:
I have a comma-separated file named my.txt that lists the trait index pairs for my analyses.
My first bivariate GREML analysis should be specified as --reml-bivar 1 2, the second as --reml-bivar 3 1, ..., and the 6th as --reml-bivar 4 3. I used the bash script below to submit the jobs, without success. I can tell the issue is related to the handling of TASK_ID in the following two lines:
trait1_L1=$(awk -F ',' -v task=$((TASK_ID)) 'NR==task {print $1}' /storage/***/my.txt)
trait2_L1=$(awk -F ',' -v task=$((TASK_ID)) 'NR==task {print $2}' /storage/***/my.txt)
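To make the extraction concrete, here is a minimal sketch of what those two awk calls do. The contents of my.txt were not reproduced in this thread, so the rows below are purely illustrative:

# Hypothetical my.txt (values are illustrative only):
#   2,1
#   1,3
# NR==task selects the row whose line number equals the array task id,
# and $1/$2 are the first and second comma-separated fields of that row.
TASK_ID=2
awk -F ',' -v task="$TASK_ID" 'NR==task {print $1}' my.txt   # first field of row 2
awk -F ',' -v task="$TASK_ID" 'NR==task {print $2}' my.txt   # second field of row 2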
I did something similar in a Slurm script, which worked, but the same approach did not work with qsubshcom.
Here is the unsuccessful qsubshcom script:
#!/bin/bash
script_dir=$(dirname $(readlink -f $0))
logs_dir=${script_dir}/../../logs
results_dir=${script_dir}/../../results
grm_dir=/storage/***/grm
cd ${logs_dir}
# These two lines run on the login node at submission time, where TASK_ID
# is unset, so both variables come back empty (see the author's response below).
trait1_L1=$(awk -F ',' -v task=$((TASK_ID)) 'NR==task {print $1}' /storage/***/my.txt)
trait2_L1=$(awk -F ',' -v task=$((TASK_ID)) 'NR==task {print $2}' /storage/***/my.txt)
command1="gcta \
--reml-bivar ${trait2_L1} ${trait1_L1} \
--reml-bivar-lrt-rg 0 \
--grm ${grm_dir}/GRM_mafgt0.5 \
--pheno ${results_dir}/my.phen \
--out ${results_dir}/$(echo "rg_T${trait2_L1}T${trait1_L1}")"
qsubshcom "$command1" 1 10G myjob 23:00:00 "-queue=***, -array=1-10"
Here is the successful Slurm script:
#!/bin/bash
#SBATCH --job-name=myjob
#SBATCH --output=/storage/***/%x_%a.out
#SBATCH --error=/storage/***/%x_%a.err
#SBATCH --chdir=/storage/***
#SBATCH --array=1-10
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --mem=10G
trait1_L1=$(awk -F ',' -v task=$((SLURM_ARRAY_TASK_ID)) 'NR==task {print $1}' my.txt)
echo $trait1_L1
trait2_L1=$(awk -F ',' -v task=$((SLURM_ARRAY_TASK_ID)) 'NR==task {print $2}' my.txt)
echo $trait2_L1
outName="$(echo "rg_lev1_c${trait2_L1}VSc${trait1_L1}")"
gcta \
--reml-bivar ${trait2_L1} ${trait1_L1} \
--reml-bivar-lrt-rg 0 \
--grm ${grm_dir}/GRM_mafgt0.5 \
--pheno ${pheno_dir}/my.phen \
--out ${out_dir}/${outName}
How could I modify the qsubshcom script to get the jobs running?
Author response:
TASK_ID is a variable that exists only on the remote worker node, so it cannot be resolved to its correct value on your local submission machine.
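In other words, the failing script runs its awk lines, and expands the double-quoted $command1 string, on the login node at submission time, when TASK_ID is still unset, so empty values are baked into the submitted command. (The Slurm version works because the batch script itself executes on the worker node, where SLURM_ARRAY_TASK_ID is already set.) A minimal, standalone bash sketch of the quoting behavior, nothing qsubshcom-specific:

#!/bin/bash
# TASK_ID is not set here, just as on the login node.
cmd_now="echo task=${TASK_ID}"    # double quotes: expanded immediately
cmd_later='echo task=${TASK_ID}'  # single quotes: kept literal
echo "$cmd_now"     # prints: echo task=
echo "$cmd_later"   # prints: echo task=${TASK_ID}

Wrapping the work in a separate script file, as below, sidesteps the problem entirely: the file is executed on the worker node, where TASK_ID is defined.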
Here is my revised script, following the author's suggestion. Create a file named test.sh containing the following:
#!/bin/bash
script_dir=$(dirname $(readlink -f $0))
logs_dir=${script_dir}/../../logs/rgStrModEur
results_dir=${script_dir}/../../results/rgStrModEur
grm_dir=/storage/***/grm
cd ${logs_dir}
# TASK_ID is defined here because test.sh runs on the worker node.
trait1_L1=$(awk -F ',' -v task=$((TASK_ID)) 'NR==task {print $1}' /storage/***/my.txt)
trait2_L1=$(awk -F ',' -v task=$((TASK_ID)) 'NR==task {print $2}' /storage/***/my.txt)
gcta \
--reml-bivar ${trait2_L1} ${trait1_L1} \
--reml-bivar-lrt-rg 0 \
--grm ${grm_dir}/GRM_mafgt0.5 \
--pheno ${results_dir}/my.phen \
--out ${results_dir}/rg_T${trait2_L1}T${trait1_L1}
Then run the following:
qsubshcom "bash test.sh" 1 10G test 23:00:00 "-queue=*** -array=1-10"
My revised script worked fine. However, the job_reports folder and the qsub log file are created in the directory I submitted the jobs from, rather than in the desired logs_dir. How can I make the logs_dir defined above hold the job_reports folder and the qsub log file when I run
qsubshcom "bash test.sh" 1 10G test 23:00:00 "-queue=*** -array=1-10"
manually in the shell?
Hi @kcstringer,
Thanks for posting here. I insist on using issues instead of email: 1. they are publicly available; 2. an email address expires when I change jobs (my UQ email will expire next week).
Glad to hear you have almost solved the issue. The cluster log is usually not important, so I write it to a fixed job_reports folder, along with a qsub.$(date).log that saves all of your commands. I do it this way because it makes it easier for users to find what has been run, and the logs are all in job_reports. It often happens to me that I try multiple commands and forget which one I ran last; I just look into the latest qsub.$(date).log, and the whole history is there.
I will not add another flag to customize this, as the script is used across several groups and the change might annoy users. I hope this won't bother you much.
Regards,
Zhili
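Since job_reports and the qsub log land in whatever directory you submit from, a simple workaround consistent with that behavior (a sketch, not an official qsubshcom option; the path to test.sh is a placeholder) is to change into logs_dir before submitting:

# Submit from logs_dir so job_reports/ and qsub.*.log are created there;
# use an absolute path to test.sh because of the directory change.
cd /storage/***/logs && qsubshcom "bash /path/to/test.sh" 1 10G test 23:00:00 "-queue=*** -array=1-10"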
Thank you. This question is resolved and can be closed.