Issue finding busco when running TransPi with SLURM / sbatch
infinity01 opened this issue · 24 comments
Hello!
I was able to successfully run TransPi when using SLURM's srun (interactively), but I'm having trouble getting busco to run when submitting the job with sbatch.
The error:
.command.sh: line 4: busco: command not found
The command it is trying to run is:
#!/bin/bash -ue
echo -e "\n-- Starting BUSCO --\n"
busco -i Cprol_R.Trinity.fa -o Cprol_R.Trinity.bus4 -l /cm/shared/apps/TransPi//DBs/busco_db/metazoa_odb10 -m tran -c 8 --offline
echo -e "\n-- DONE with BUSCO --\n"
cp Cprol_R.Trinity.bus4/short_summary.*.Cprol_R.Trinity.bus4.txt .
cp Cprol_R.Trinity.bus4/run_*/full_table.tsv full_table_Cprol_R.Trinity.bus4.tsv
Do you have any suggestions on variables to set in the SLURM script so that it is able to find busco properly?
Thank you so much!
Hello @infinity01,
If you want to deploy TransPi using SLURM you need to configure it with the specifications of your system. In the nextflow.config file that is generated for you in the precheck, you will see a section called profiles (L254-L286). There you can add a profile for your system. Currently, I have one there as an example for my local system (see here).
Nextflow will handle this for you. So if you want to use only SLURM as the job scheduler and do not have any other requirements then you can have that section like this:
mySlurm {
    process {
        executor='slurm'
    }
}
This will use the CPU and RAM info from the process labels in L195-L229. If you need to specify other requirements to SLURM (e.g. queue, partition, etc.) you can do so using clusterOptions:
mySlurm {
    process {
        executor='slurm'
        clusterOptions='--partition=big_node --qos=low'
    }
}
This will make the entire TransPi run (i.e. all processes) be submitted through SLURM. Nextflow will handle job submission for you, so there is no need to use sbatch. Just add -profile mySlurm when calling TransPi.
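For example, a call could look like this (the path to TransPi.nf and the read pattern below are placeholders, not values from your system):

nextflow run /path/to/TransPi/TransPi.nf --all --maxReadLen 100 --reads 'reads_R[1,2].fastq.gz' -profile mySlurm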
Lastly, are you running TransPi using containers (docker or singularity) or using the TransPi conda environment created by the precheck?
Let me know if you have any other doubts.
Best,
Ramon
Thank you so much for the clarification Ramon! I submitted it to our slurm partition successfully.
One concern is that if I accidentally sign out of my SSH session, it will stop the job from running (that's why I was trying to use sbatch). Would that happen in this case?
We are using the conda environment (myconda) by the way.
Hi Ramon,
For some reason it's still erroring out on the busco command not found. Any thoughts?
It worked before when I submitted the job with SLURM's srun, so I'm not sure if there's any difference.
Is the busco command a binary?
I'm not able to find it in the TransPi conda environment's bin folder: .../anaconda/2020.02/envs/TransPi/bin/
Thanks again!
Error executing process > 'busco4_tri (Cprol_R)'
Caused by:
Process `busco4_tri (Cprol_R)` terminated with an error exit status (127)
Command executed:
echo -e "\n-- Starting BUSCO --\n"
busco -i Cprol_R.Trinity.fa -o Cprol_R.Trinity.bus4 -l /cm/shared/apps/TransPi//DBs/busco_db/metazoa_odb10 -m tran -c 8 --offline
echo -e "\n-- DONE with BUSCO --\n"
cp Cprol_R.Trinity.bus4/short_summary.*.Cprol_R.Trinity.bus4.txt .
cp Cprol_R.Trinity.bus4/run_*/full_table.tsv full_table_Cprol_R.Trinity.bus4.tsv
Command exit status:
127
Command output:
-- Starting BUSCO --
Command error:
.command.sh: line 4: busco: command not found
Work dir:
/TransPi_files/Cprol/work/c6/4ec6e5b8c066bd0e71a251cb093249
Tip: you can try to figure out what's wrong by changing to the process work dir and showing the script file named `.command.sh`
Hello,
You should see a conda env for busco4 when you type conda info -e. If it is not there, then it was not created successfully. To quickly solve this, run the following:
conda create -n busco4 -c conda-forge -c bioconda busco=4.1.4=py_0 -y
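After it installs, a quick sanity check (outside of TransPi, just to confirm the env is visible and busco runs) could be:

conda info -e
conda run -n busco4 busco --version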
Let me know if this solves the issue.
Best,
Ramon
Unfortunately I'm still getting the same error (not able to find the busco command) after installing the busco4 conda env.
Is it supposed to change conda environments halfway through?
Very odd. Can I see the output of conda info -e? You are running using the TransPi conda env (--myConda), right? What is the entire command you use when calling the pipeline?
The entire command is:
nextflow run /cm/shared/apps/TransPi/TransPi.nf --all --maxReadLen 100 --k 25,41,57,67 --reads '/TransPi_files/Cprol/Cprol_R[1,2].fastq.gz' --profile conda --myConda -profile RDAC -resume
The conda envs are:
$ conda info -e
# conda environments:
#
base /cm/shared/compilers/anaconda/2020.02
TransPi * /cm/shared/compilers/anaconda/2020.02/envs/TransPi
busco4 /cm/shared/compilers/anaconda/2020.02/envs/busco4
I think I know what the issue is. You need to have the busco4 env info in the nextflow.config. Either rerun the precheck so it can generate a new nextflow.config, or add the PATH of the busco4 conda env to line 67 of the nextflow.config. It should look like this:
cenv="/cm/shared/compilers/anaconda/2020.02/envs/busco4"
Let me know if this solves the issue.
Sorry, I forgot to mention that I updated that last night, but it still gave me that error.
//busco4 conda env
cenv="/cm/shared/compilers/anaconda/2020.02/envs/busco4"
Maybe run the precheck again just in case?
Yes, rerun the precheck and try again.
Hmm, that still didn't work.
I'm thinking of maybe adding the bin folders of both conda envs to the PATH environment variable?
I never had this issue before. So the new nextflow.config has the PATH of cenv? Try logging out and having all the conda envs deactivated before calling TransPi.
In the meantime, you can add the bin directories to the PATH env so you can continue working. But this is very odd, since nextflow will take the PATH of the busco4 conda env and activate it automatically. I'll do more tests and see if I can find the issue.
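As a temporary workaround, something along these lines should work (the prefixes below are copied from your conda info -e output; adjust if yours differ):

export PATH=/cm/shared/compilers/anaconda/2020.02/envs/TransPi/bin:/cm/shared/compilers/anaconda/2020.02/envs/busco4/bin:$PATH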
Now it's saying:
BUSCO must be installed before it is run. Please enter 'python setup.py install (--user)'. See the user guide for more information.
I'm thinking of just reinstalling everything at this point...
Command executed:
echo -e "\n-- Starting BUSCO --\n"
busco -i Cprol_R.Trinity.fa -o Cprol_R.Trinity.bus4 -l /cm/shared/apps/TransPi/DBs/busco_db/eukaryota_odb10 -m tran -c 8 --offline
echo -e "\n-- DONE with BUSCO --\n"
cp Cprol_R.Trinity.bus4/short_summary.*.Cprol_R.Trinity.bus4.txt .
cp Cprol_R.Trinity.bus4/run_*/full_table.tsv full_table_Cprol_R.Trinity.bus4.tsv
Command exit status:
1
Command output:
-- Starting BUSCO --
BUSCO must be installed before it is run. Please enter 'python setup.py install (--user)'. See the user guide for more information.
-- DONE with BUSCO --
Command error:
cp: cannot stat ‘Cprol_R.Trinity.bus4/short_summary.*.Cprol_R.Trinity.bus4.txt’: No such file or directory
I am testing locally and it is working fine for me. Let's try that: reinstall the tools.
1- Erase the conda envs
conda remove -n TransPi -y --all
conda remove -n busco4 -y --all
conda clean -y --all
2- Rerun the precheck.
Remember to take out the bin directories from the PATH env and source the .bashrc. Do you have the config file in the same directory as the main script? It seems like the config is not working properly or nextflow cannot use it properly. Let me know how it goes.
Hi Ramon,
I installed and re-ran TransPi from scratch and I'm still getting the busco command not found error during the "busco4_tri" step.
However, if I change conda environments to busco4, I am able to find it with the "which busco" command.
Are you supposed to load any conda environments prior to executing TransPi with nextflow?
Right now I am loading the TransPi conda environment prior to executing it.
I confirmed the conda environments for TransPi and busco4 are installed with the "conda info -e" command.
Also both busco4 and TransPi environments are set correctly in the nextflow.config:
// PATH to conda installation from precheck. Leave blank if precheck was not used or you will use containers
myCondaInstall="/cm/shared/compilers/anaconda/2020.02/envs/TransPi"
//busco4 conda env
cenv="/cm/shared/compilers/anaconda/2020.02/envs/busco4"
I am using SLURM on a clustered setup, so it is executing each step of TransPi on a different node.
Any other thoughts?
Thanks again!
Something went wrong. Check error message below and/or log files.
Error executing process > 'busco4_tri (Cprol_R)'
Caused by:
Process `busco4_tri (Cprol_R)` terminated with an error exit status (127)
Command executed:
echo -e "\n-- Starting BUSCO --\n"
busco -i Cprol_R.Trinity.fa -o Cprol_R.Trinity.bus4 -l /cm/shared/apps/TransPi/DBs/busco_db/eukaryota_odb10 -m tran -c 20 --offline
echo -e "\n-- DONE with BUSCO --\n"
cp Cprol_R.Trinity.bus4/short_summary.*.Cprol_R.Trinity.bus4.txt .
cp Cprol_R.Trinity.bus4/run_*/full_table.tsv full_table_Cprol_R.Trinity.bus4.tsv
Command exit status:
127
Command output:
-- Starting BUSCO --
Command error:
.command.sh: line 4: busco: command not found
Work dir:
/TransPi/TransPi_files/Cprol/work/e5/3630925441d6d889d73349bb00b04a
Tip: you can replicate the issue by changing to the process work dir and entering the command `bash .command.run`
Hello @infinity01,
I did some tests on a university cluster and in a virtual machine, and it is working fine for me. I do not have any issues with busco.
Nextflow handles the activation of the conda environments for you, no need to activate them before calling the pipeline. Can you try deactivating the TransPi environment before running the pipeline?
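In other words, something like this before launching (a minimal sketch; replace the options with whatever you normally pass):

conda deactivate
nextflow run /cm/shared/apps/TransPi/TransPi.nf <your usual options>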
@infinity01 I just noticed that you are using --profile instead of -profile (only one dash is required since it is a nextflow built-in option). Since you have the TransPi conda env activated, the other programs run fine (they are available in the PATH). But since busco4 is a separate conda environment (due to version conflicts), nextflow cannot find it. Apologies I did not see this before. Can you try using -profile? Let me know.
nextflow run /cm/shared/apps/TransPi/TransPi.nf --all --maxReadLen 100 --k 25,41,57,67 --reads '/TransPi_files/Cprol/Cprol_R[1,2].fastq.gz' -profile conda,RDAC --myConda -resume
Also, you can provide multiple profiles by separating them with commas. See the example above.
Ah, good find! It always gets confusing which options use one or two hyphens.
I resubmitted it with -resume, but for some reason it's rerunning the entire thing again.
I'll just let it do its thing and let you know tomorrow.
Thank you so much!
Hi Ramon,
It is working as expected. Thank you so much again for your time.
Great! I'll close the issue now. If anything comes up, let me know.