Home directory does not have enough space for ngless
liuxianghui opened this issue · 8 comments
My home directory on the HPC cluster has only 5 GB of space. I set up my conda installation directory in my scratch directory, which has enough space, and installing ngless through conda worked without problems. However, when I tried to run ngless gut-demo.ngl, I got an error: creating ~/.local/share/ngless/data/References/hg19 failed with createDirectory: permission denied (Disk quota exceeded).
How should I set things up so that the test dataset is not downloaded to my home directory?
Hi @liuxianghui, you can try replacing ~/.local/share/ngless with a symbolic link to a location where you have a bigger quota.
Alternatively, you can use the user-directory or user-data-directory configuration setting to directly specify where to store these data files. See https://ngless.embl.de/configuration.html
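The symbolic-link suggestion could be sketched roughly as follows. This is only an illustration: relocate_dir is a hypothetical helper name, and the scratch path in the example call is an assumption that has to be replaced with a location where you actually have quota.

```shell
# Sketch: move a directory to a filesystem with more space and leave a
# symbolic link at the old location. relocate_dir is a hypothetical helper.
relocate_dir() {
    src=$1      # directory that currently lives under the small quota
    target=$2   # new location on a filesystem with more space
    mkdir -p "$(dirname "$src")" "$(dirname "$target")"
    if [ -L "$src" ]; then
        echo "$src is already a symbolic link; nothing to do"
        return 0
    fi
    if [ -d "$src" ]; then
        if [ -e "$target" ]; then
            echo "refusing to overwrite existing $target" >&2
            return 1
        fi
        mv "$src" "$target"     # keep anything that was already downloaded
    else
        mkdir -p "$target"      # nothing downloaded yet: start empty
    fi
    ln -s "$target" "$src"
}

# Example call (the scratch path is an assumption, adjust it first):
# relocate_dir "$HOME/.local/share/ngless" "/scratch/$USER/ngless"
```

After this, ngless should keep using ~/.local/share/ngless as before, while the files themselves live in the target location.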
I tried to set this in the configuration file, but it does not seem to work. Could you kindly show an example configuration file?
Here is the configuration file:
Here is the error message:
[Tue 03-11-2020 18:57]: Script OK. Starting interpretation...
[Tue 03-11-2020 18:57] Line 8: lock1: Obtained lock file: 'ngless-locks/2d8743f8/SAMN05615096.short.lock'
[Tue 03-11-2020 18:57] Line 8: Writing stats to 'ngless-stats/2d8743f8/SAMN05615096.short'
[Tue 03-11-2020 18:57] Line 10: load_mocat_sample found paired-end sample 'SAMN05615096.short/SRR4052021.pair.1.fq.gz' - 'SAMN05615096.short/SRR4052021.pair.2.fq.gz' with singles file 'SAMN05615096.short/SRR4052021.single.fq.gz'
[Tue 03-11-2020 18:57] Line 17: Starting download from https://ngless.embl.de/resources/References/1.1/hg19.tar.gz
[Tue 03-11-2020 19:15] Line 17: Reference download completed!g19.tar.gz[========================================] 100.0%
[Tue 03-11-2020 19:15] Line 17: Starting mapping to /gpfs0/scratch/xianghui/software/ngless/index/gpfs0/scratch/xianghui/software/ngless/References/hg19/Sequence/BWAIndex/reference.fa.gz
[Tue 03-11-2020 19:17] Line 17: Success
[Tue 03-11-2020 19:17] Line 17: Mapped readset stats (/gpfs0/scratch/xianghui/software/ngless/index/gpfs0/scratch/xianghui/software/ngless/References/hg19/Sequence/BWAIndex/reference.fa.gz):
[Tue 03-11-2020 19:17] Line 17: Total reads: 367525
[Tue 03-11-2020 19:17] Line 17: Total reads aligned: 9 [0.00%]
[Tue 03-11-2020 19:17] Line 17: Total reads Unique map: 9 [0.00%]
[Tue 03-11-2020 19:17] Line 17: Total reads Non-Unique map: 0 [0.00%]
[Tue 03-11-2020 19:17] Line 24: Heuristic for FastQ encoding determination for file "/tmp/reads_block_selected_mapped_reference.1.fq76828-6.gz" cannot be 100% confident. Guessing 33 offset (Sanger encoding, used by newer Illumina machines).
[Tue 03-11-2020 19:17] Line 27: Start BWA index creation for /gpfs0/scratch/xianghui/software/ngless/index/gpfs0/scratch/xianghui/software/ngless/Modules/igc.ngm/1.0/igc.fna
Exiting after fatal error:
An unhandled erorr occurred (this should not happen)!
If you can reproduce this issue, please run your script
with the --trace flag and report a bug (including the script and the trace) at
https://github.com/ngless-toolkit/ngless/issues
The error message was: /gpfs0/scratch/xianghui/software/ngless/index/gpfs0/scratch/xianghui/software/ngless/Modules/igc.ngm/1.0/igc.fna: getFileStatus: does not exist (No such file or directory)
The output is a bit confusing. I don't see any mention of trying to download IGC.
Are you by any chance running multiple parallel jobs in this case?
I'm wondering if the igc download was started in another process and was not complete before this process tried to index the igc.fna file that didn't yet exist.
Please give an example of how the configuration should be set.
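As a rough illustration only, a configuration file using the user-directory setting mentioned earlier in this thread might be generated like this. The key = "value" syntax, the ~/.config/ngless.conf location in the example call, and the scratch path are all assumptions that should be checked against https://ngless.embl.de/configuration.html; write_ngless_conf is a hypothetical helper, not part of ngless.

```shell
# Sketch: write a one-line NGLess configuration file.
# ASSUMPTIONS: the key = "value" syntax and the file location used in the
# example call below; verify both against the configuration documentation.
write_ngless_conf() {
    conf=$1       # path of the configuration file to write
    data_dir=$2   # hypothetical directory with enough quota
    mkdir -p "$(dirname "$conf")"
    cat > "$conf" <<EOF
user-directory = "$data_dir"
EOF
}

# Example call (both paths are assumptions):
# write_ngless_conf "$HOME/.config/ngless.conf" "/scratch/$USER/ngless"
```

Only user-directory is set here; user-data-directory is the other setting named above and could be used instead if just the data files need to move.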
I created the symbolic link as suggested. It seems to be working, with some errors. Here is the output message; please help confirm whether it is OK.
(base) [xianghui@ln-0001 gut-short]$ ngless gut-demo.ngl
NGLess v1.1.0 (C) NGLess authors
https://ngless.embl.de/
When publishing results from this script, please cite the following references:
- Coelho, L.P., Alves, R., Monteiro, P., Huerta-Cepas, J., Freitas, A.T., and Bork, P.,
NG-meta-profiler: fast processing of metagenomes using NGLess, a domain-specific language. in
Microbiome 7:84 (2019). DOI: http://doi.org/10.1186/s40168-019-0684-8
- An integrated catalog of reference genes in the human gut microbiome Junhua Li, Huijue Jia, Xianghang
Cai, Huanzi Zhong, Qiang Feng, Shinichi Sunagawa, Manimozhiyan Arumugam, Jens Roat Kultima, Edi
Prifti, Trine Nielsen, Agnieszka Sierakowska Juncker, Chaysavanh Manichanh, Bing Chen, Wenwei
Zhang, Florence Levenez, Juan Wang, Xun Xu, Liang Xiao, Suisha Liang, Dongya Zhang, Zhaoxi Zhang,
Weineng Chen, Hailong Zhao, Jumana Yousuf Al-Aama, Sherif Edris, Huanming Yang, Jian Wang, Torben
Hansen, Henrik Bjørn Nielsen, Søren Brunak, Karsten Kristiansen, Francisco Guarner, Oluf Pedersen,
Joel Doré, S Dusko Ehrlich, MetaHIT Consortium, Peer Bork & Jun Wang Nature Biotechnology 32, 834–841
(2014) doi:10.1038/nbt.2942
- Li, H., 2013. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. arXiv
preprint arXiv:1303.3997.
- MOCAT2: a metagenomic assembly, annotation and profiling framework. Kultima JR, Coelho LP, Forslund
K, Huerta-Cepas J, Li S, Driessen M, et al. (2016) Bioinformatics (2016)
doi:10.1093/bioinformatics/btw183
- MOCAT: A Metagenomics Assembly and Gene Prediction Toolkit. Kultima JR, Sunagawa S, Li J, Chen W, Chen
H, Mende DR, et al. (2012) PLoS ONE 7(10): e47656. doi:10.1371/journal.pone.0047656
- Metagenomic species profiling using universal phylogenetic marker genes by Shinichi Sunagawa,
Daniel R Mende, Georg Zeller, Fernando Izquierdo-Carrasco, Simon A Berger, Jens Roat Kultima, Luis
Pedro Coelho, Manimozhiyan Arumugam, Julien Tap, Henrik Bjørn Nielsen, Simon Rasmussen, Søren
Brunak, Oluf Pedersen, Francisco Guarner, Willem M de Vos, Jun Wang, Junhua Li, Joël Doré, S Dusko
Ehrlich, Alexandros Stamatakis, and Peer Bork Nature Methods 10, 1196-1199 (2013)
doi:10.1038/nmeth.2693
[Wed 04-11-2020 18:35]: Script OK. Starting interpretation...
[Wed 04-11-2020 18:35] Line 8: lock1: Obtained lock file: 'ngless-locks/2d8743f8/SAMN05615098.short.lock'
[Wed 04-11-2020 18:35] Line 8: Writing stats to 'ngless-stats/2d8743f8/SAMN05615098.short'
[Wed 04-11-2020 18:35] Line 10: load_mocat_sample found paired-end sample 'SAMN05615098.short/SRR4052033.pair.1.fq.gz' - 'SAMN05615098.short/SRR4052033.pair.2.fq.gz' with singles file 'SAMN05615098.short/SRR4052033.single.fq.gz'
[Wed 04-11-2020 18:35] Line 17: Starting download from https://ngless.embl.de/resources/References/1.1/hg19.tar.gz
[Wed 04-11-2020 18:50] Line 17: Reference download completed!g19.tar.gz[========================================] 100.0%
[Wed 04-11-2020 18:50] Line 17: Start BWA index creation for /home/xianghui/.local/share/ngless/data/References/hg19/Sequence/BWAIndex/reference.fa.gz
[Wed 04-11-2020 19:51] Line 17: Success
[Wed 04-11-2020 19:51] Line 17: Starting mapping to /home/xianghui/.local/share/ngless/data/References/hg19/Sequence/BWAIndex/reference.fa.gz
[Wed 04-11-2020 19:53] Line 17: Success
[Wed 04-11-2020 19:53] Line 17: Mapped readset stats (/home/xianghui/.local/share/ngless/data/References/hg19/Sequence/BWAIndex/reference.fa.gz):
[Wed 04-11-2020 19:53] Line 17: Total reads: 392346
[Wed 04-11-2020 19:53] Line 17: Total reads aligned: 8 [0.00%]
[Wed 04-11-2020 19:53] Line 17: Total reads Unique map: 8 [0.00%]
[Wed 04-11-2020 19:53] Line 17: Total reads Non-Unique map: 0 [0.00%]
[Wed 04-11-2020 19:54] Line 24: Heuristic for FastQ encoding determination for file "/tmp/reads_block_selected_mapped_reference.1.fq23103-6.gz" cannot be 100% confident. Guessing 33 offset (Sanger encoding, used by newer Illumina machines).
[Wed 04-11-2020 19:54] Line 27: Start BWA index creation for /home/xianghui/.local/share/ngless/data/Modules/igc.ngm/1.0/./IGC.fna.gz
[Thu 05-11-2020 00:01] Line 27: Success
[Thu 05-11-2020 00:01] Line 27: Starting mapping to /home/xianghui/.local/share/ngless/data/Modules/igc.ngm/1.0/./IGC.fna.gz
[Thu 05-11-2020 00:06] Line 27: Success
[Thu 05-11-2020 00:06] Line 27: Mapped readset stats (/home/xianghui/.local/share/ngless/data/Modules/igc.ngm/1.0/./IGC.fna.gz):
[Thu 05-11-2020 00:06] Line 27: Total reads: 392344
[Thu 05-11-2020 00:06] Line 27: Total reads aligned: 366416 [93.39%]
[Thu 05-11-2020 00:06] Line 27: Total reads Unique map: 220532 [56.21%]
[Thu 05-11-2020 00:06] Line 27: Total reads Non-Unique map: 145884 [37.18%]
[Thu 05-11-2020 00:06] Line 29: Annotate with reference: "igc"
[Thu 05-11-2020 00:06] Line 29: Loading map file /home/xianghui/.local/share/ngless/data/Modules/igc.ngm/1.0/./IGC_catalog-v1.0.0.emapper.annotations-v2.tsv
[Thu 05-11-2020 00:07] Line 38: Start BWA index creation for /home/xianghui/.local/share/ngless/data/Modules/motus.ngm/0.1/data/mOTU.v1.padded
[Thu 05-11-2020 00:08] Line 38: Success
[Thu 05-11-2020 00:08] Line 38: Starting mapping to /home/xianghui/.local/share/ngless/data/Modules/motus.ngm/0.1/data/mOTU.v1.padded
[Thu 05-11-2020 00:09] Line 38: Success
[Thu 05-11-2020 00:09] Line 38: Mapped readset stats (/home/xianghui/.local/share/ngless/data/Modules/motus.ngm/0.1/data/mOTU.v1.padded):
[Thu 05-11-2020 00:09] Line 38: Total reads: 392344
[Thu 05-11-2020 00:09] Line 38: Total reads aligned: 2047 [0.52%]
[Thu 05-11-2020 00:09] Line 38: Total reads Unique map: 1442 [0.37%]
[Thu 05-11-2020 00:09] Line 38: Total reads Non-Unique map: 605 [0.15%]
[Thu 05-11-2020 00:09] Line 40: Annotate with reference: "motus"
[Thu 05-11-2020 00:09]: Interpretation finished.
[Thu 05-11-2020 00:09]: The collect() call at line 43 could not be executed as there are partial results missing.
When you use the parallel module and the collect() function,
you typically need to run ngless multiple times (once per sample)!
For more information, see https://ngless.embl.de/stdlib.html#parallel-module
[Thu 05-11-2020 00:09]: The collect() call at line 33 could not be executed as there are partial results missing.
When you use the parallel module and the collect() function,
you typically need to run ngless multiple times (once per sample)!
For more information, see https://ngless.embl.de/stdlib.html#parallel-module
Hi @liuxianghui, this is a successful execution.
The message about collect() is simply stating that, since you provided a file with multiple samples, the final result can only be produced once all samples are complete.
Please check the link provided in the output, https://ngless.embl.de/stdlib.html#parallel-module, and in particular the Important notice on that page.
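The "once per sample" advice from the log message can be sketched as a small loop. This is a minimal illustration, not an official recipe: run_n_times is a hypothetical helper, and the sample count in the example call is an assumption that must match the number of samples in your sample list.

```shell
# Sketch: run the same command a fixed number of times, stopping at the
# first failure. run_n_times is a hypothetical helper.
run_n_times() {
    n=$1          # how many times to run (one run per sample)
    shift
    i=1
    while [ "$i" -le "$n" ]; do
        "$@" || return 1
        i=$((i + 1))
    done
}

# Example call (3 is an assumed sample count, adjust it first):
# run_n_times 3 ngless gut-demo.ngl
```

In the two logs above, each run obtained a lock on a different sample, so repeated runs should work through the samples one at a time; the collect() step can then produce the final result once all of them are done.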