jtamames/SqueezeMeta

Error: Stopping in STEP10 -> 10.mapsamples.pl. File Sample1/intermediate/10.Sample1.mapcount is empty!


Hi - I'm running SqueezeMeta without the assembly step, as I did the assembly externally. I'm running into an error at step 10. Syslog, script, and stdout file are attached. Any idea what is causing this error? Thanks!

Syslog: syslog.zip

This is my script:

#!/bin/bash

#SBATCH --job-name=SqueezeMeta_test_contigs
#SBATCH --nodes=1
#SBATCH --ntasks=128
#SBATCH --cpus-per-task=1
#SBATCH --mem=200G
#SBATCH --output=squeezemeta_test-%j.out
#SBATCH --error=squeezemeta_test-%j.err

# Load Conda environment
source activate SqueezeMeta

# Set environmental parameters
project_dir="MTG"
dir_data="${project_dir}/data"
dir_docs="${project_dir}/docs"
dir_results="${project_dir}/results"
dir_contigs="${dir_results}/megahit_assemblies/final_assemblies"
dir_test="${dir_results}/squeezemeta_test"
dir_clean_reads="${dir_data}/unmapped_bovine"

# Ensure directories exist
mkdir -p ${dir_test}/raw

# Sample file path
sample_file="${dir_test}/samplefile.txt"

# Contig file path
contig_file="${dir_test}/contigs/RMIC-3a.contigs.fa"


# Run SqueezeMeta
SqueezeMeta.pl -m sequential -s $sample_file -f $dir_clean_reads --extassembly $contig_file --nobins -t 128 

And this is the stdout file:

SqueezeMeta v1.6.4, July 2024 - (c) J. Tamames, F. Puente-Sánchez CNB-CSIC, Madrid, SPAIN

Please cite: Tamames & Puente-Sanchez, Frontiers in Microbiology 9, 3349 (2019). doi: https://doi.org/10.3389/fmicb.2018.03349

Run started Tue Jul 16 16:28:47 2024 in sequential mode
67 metagenomes found: Sample1 Sample2 Sample3 Sample4 Sample5 Sample6 Sample7 Sample8 Sample9 Sample10 Sample11 Sample12 Sample13 Sample14 Sample15 Sample16 Sample17 Sample18 Sample19
Sample20 Sample21 Sample22 Sample23 Sample24 Sample25 Sample26 Sample27 Sample28 Sample29 Sample30 Sample31 Sample32 Sample33 Sample34 Sample35 Sample36 Sample37 Sample38 Sample39 Sample40 Sample41 Sample42 Sample43 Sample44 Sample45 Sample46 Sample47 Sample48 Sample49 Sample50 Sample51 Sample52 Sample53 Sample54 Sample55 Sample56 Sample57 Sample58 Sample59 Sample60 Sample61 Sample62 Sample63 Sample64 Sample65 Sample66 Sample67

--- SAMPLE Sample1 ---
Now creating directories
Reading configuration from /data/home/AGR.GC.CA/oharaeo/projects/GR2301/MTG/results/squeezemeta_test/Sample1/SqueezeMeta_conf.pl
Running trimmomatic (Bolger et al 2014, Bioinformatics 30(15):2114-20) for quality filtering
Parameters:
[0 seconds]: STEP1 -> RUNNING ASSEMBLY: 01.run_all_assemblies.pl (megahit)
External assembly provided: /home/AGR.GC.CA/oharaeo/projects/GR2301/MTG/results/squeezemeta_test/contigs/RMIC-3a.contigs.fa. Overriding assembly
Renaming contigs in /data/home/AGR.GC.CA/oharaeo/projects/GR2301/MTG/results/squeezemeta_test/Sample1/results/01.Sample1.fasta
Running prinseq for contig statistics: /data/home/AGR.GC.CA/oharaeo/miniconda3/envs/functional_env/envs/SqueezeMeta/SqueezeMeta/bin/prinseq-lite.pl -fasta /data/home/AGR.GC.CA/oharaeo/projects/GR2301/MTG/results/squeezemeta_test/Sample1/results/01.Sample1.fasta -stats_len -stats_info -stats_assembly > /data/home/AGR.GC.CA/oharaeo/projects/GR2301/MTG/results/squeezemeta_test/Sample1/intermediate/01.Sample1.stats
Counting length of contigs
Contigs stored in /data/home/AGR.GC.CA/oharaeo/projects/GR2301/MTG/results/squeezemeta_test/Sample1/results/01.Sample1.fasta
Number of contigs: 409547
[5 seconds]: STEP2 -> RNA PREDICTION: 02.rnas.pl
Running barrnap (Seeman 2014, Bioinformatics 30, 2068-9) for predicting RNAs: Bacteria Archaea Eukaryote Mitochondrial
Running RDP classifier (Wang et al 2007, Appl Environ Microbiol 73, 5261-7)
Running Aragorn (Laslett & Canback 2004, Nucleic Acids Res 31, 11-16) for tRNA/tmRNA prediction
[2 minutes, 23 seconds]: STEP3 -> ORF PREDICTION: 03.run_prodigal.pl
Running prodigal (Hyatt et al 2010, BMC Bioinformatics 11: 119) for predicting ORFs
ORFs predicted: 607997
[20 minutes, 5 seconds]: STEP4 -> HOMOLOGY SEARCHES: 04.rundiamond.pl
Setting block size for Diamond
AVAILABLE (free) RAM memory: 2986.41 Gb
We will set Diamond block size to 16 (Gb RAM/8, Max 16).
You can override this setting using the -b option when starting the project, or changing
the $blocksize variable in SqueezeMeta_conf.pl
Working with taxonomy database in /data/home/AGR.GC.CA/oharaeo/databases/SqueezeMeta/db/nr.dmnd
taxa COGS Running Diamond (Buchfink et al 2015, Nat Methods 12, 59-60) for KEGG

[1 hours, 55 minutes, 3 seconds]: STEP5 -> HMMER/PFAM: 05.run_hmmer.pl
Running HMMER3 (Eddy 2009, Genome Inform 23, 205-11) for Pfam
[9 hours, 3 minutes, 34 seconds]: STEP6 -> TAXONOMIC ASSIGNMENT: 06.lca.pl
Splitting Diamond file
Starting multithread LCA in 124 threads
Creating /data/home/AGR.GC.CA/oharaeo/projects/GR2301/MTG/results/squeezemeta_test/Sample1/results/06.Sample1.fun3.tax.wranks file
Creating /data/home/AGR.GC.CA/oharaeo/projects/GR2301/MTG/results/squeezemeta_test/Sample1/results/06.Sample1.fun3.tax.noidfilter.wranks file
[9 hours, 10 minutes, 40 seconds]: STEP7 -> FUNCTIONAL ASSIGNMENT: 07.fun3assign.pl
Functional assignment for COGS KEGG PFAM
[9 hours, 11 minutes, 12 seconds]: STEP9 -> CONTIG TAX ASSIGNMENT: 09.summarycontigs3.pl
Reading /data/home/AGR.GC.CA/oharaeo/projects/GR2301/MTG/results/squeezemeta_test/Sample1/results/06.Sample1.fun3.tax.wranks
Writing output to /data/home/AGR.GC.CA/oharaeo/projects/GR2301/MTG/results/squeezemeta_test/Sample1/intermediate/09.Sample1.contiglog
Reading /data/home/AGR.GC.CA/oharaeo/projects/GR2301/MTG/results/squeezemeta_test/Sample1/results/06.Sample1.fun3.tax.noidfilter.wranks
Writing output to /data/home/AGR.GC.CA/oharaeo/projects/GR2301/MTG/results/squeezemeta_test/Sample1/intermediate/09.Sample1.contiglog.noidfilter
[9 hours, 12 minutes, 29 seconds]: STEP10 -> MAPPING READS: 10.mapsamples.pl
Reading samples from /data/home/AGR.GC.CA/oharaeo/projects/GR2301/MTG/results/squeezemeta_test/Sample1/data/00.Sample1.samples
Metagenomes found: 1
Reading contig length from /data/home/AGR.GC.CA/oharaeo/projects/GR2301/MTG/results/squeezemeta_test/Sample1/intermediate/01.Sample1.lon
Reading orf info from /data/home/AGR.GC.CA/oharaeo/projects/GR2301/MTG/results/squeezemeta_test/Sample1/results/03.Sample1.gff
Mapping with Bowtie2 (Langmead and Salzberg 2012, Nat Methods 9(4), 357-9)
Creating reference from contigs
Working with sample 1: Sample1
Getting raw reads
Aligning to reference with bowtie
Calculating contig coverage
Counting with sqm_counter: Opening 124 threads
25228 reads counted
26259 reads counted
28036 reads counted
27126 reads counted
36335 reads counted
29219 reads counted
34291 reads counted
26223 reads counted
25687 reads counted
37082 reads counted
21675 reads counted
23531 reads counted
27702 reads counted
24775 reads counted
27307 reads counted
22681 reads counted
28441 reads counted
26875 reads counted
29350 reads counted
36139 reads counted
32997 reads counted
22705 reads counted
26072 reads counted
28050 reads counted
23498 reads counted
21423 reads counted
27253 reads counted
23065 reads counted
25459 reads counted
25878 reads counted
31673 reads counted
22222 reads counted
25560 reads counted
28341 reads counted
24313 reads counted
31043 reads counted
24193 reads counted
32221 reads counted
24018 reads counted
26001 reads counted
31307 reads counted
23334 reads counted
25134 reads counted
23601 reads counted
26191 reads counted
21305 reads counted
27817 reads counted
24601 reads counted
23746 reads counted
24871 reads counted
31767 reads counted
25782 reads counted
27296 reads counted
27081 reads counted
26253 reads counted
28504 reads counted
20810 reads counted
28770 reads counted
28387 reads counted
25936 reads counted
30211 reads counted
25427 reads counted
43584 reads counted
28567 reads counted
24499 reads counted
29799 reads counted
26652 reads counted
25957 reads counted
23996 reads counted
30291 reads counted
23566 reads counted
23915 reads counted
27949 reads counted
25814 reads counted
28938 reads counted
27531 reads counted
28020 reads counted
22849 reads counted
33552 reads counted
22303 reads counted
27374 reads counted
36699 reads counted
30604 reads counted
28441 reads counted
35902 reads counted
33837 reads counted
24466 reads counted
24746 reads counted
27012 reads counted
28201 reads counted
27393 reads counted
34170 reads counted
28336 reads counted
24031 reads counted
23124 reads counted
25152 reads counted
23676 reads counted
23121 reads counted
30124 reads counted
28373 reads counted
25016 reads counted
25777 reads counted
27217 reads counted
25022 reads counted
25788 reads counted
22217 reads counted
30334 reads counted
29822 reads counted
23516 reads counted
24071 reads counted
30637 reads counted
28960 reads counted
27310 reads counted
29613 reads counted
24669 reads counted
38495 reads counted
31187 reads counted
24151 reads counted
31372 reads counted
27601 reads counted
24331 reads counted
42881 reads counted
25484 reads counted
23418 reads counted
Output in /data/home/AGR.GC.CA/oharaeo/projects/GR2301/MTG/results/squeezemeta_test/Sample1/intermediate/10.Sample1.mapcount
Stopping in STEP10 -> 10.mapsamples.pl. File /data/home/AGR.GC.CA/oharaeo/projects/GR2301/MTG/results/squeezemeta_test/Sample1/intermediate/10.Sample1.mapcount is empty!

If you don't know what went wrong or want further advice, please look for similar issues in https://github.com/jtamames/SqueezeMeta/issues
Feel free to open a new issue if you don't find the answer there. Please add a brief description of the problem and upload the /data/home/AGR.GC.CA/oharaeo/projects/GR2301/MTG/results/squeezemeta_test/Sample1/syslog file (zip it first)

I am running into the same issue, despite an attempted fix. The error indicates that the 'mapcount' file is empty, but it is not; it seems to be correctly formatted:

[image attachment, presumably a screenshot of the mapcount file]

Any idea what is causing this? Note the slightly changed script: the test sample name changed from Sample1 to "RMIC-3A".
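A quick way to double-check that the file really has content is sketched below. The helper name `check_nonempty` is hypothetical, and it only tests what the shell considers empty (`test -s`), which may differ from the check SqueezeMeta itself performs.

```shell
# Hypothetical helper: report whether a file exists and has content.
# This mirrors the shell's notion of "empty", not necessarily SqueezeMeta's own check.
check_nonempty() {
    if [ -s "$1" ]; then
        # grep -c '' counts every line, including a last line without a newline.
        echo "non-empty: $(grep -c '' "$1") lines"
    else
        echo "missing or empty"
        return 1
    fi
}
```

For example, `check_nonempty RMIC-3A/intermediate/10.RMIC-3A.mapcount`, run from the directory that holds the project folder.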

This was my fault, but should be fixed in 1.6.4.
If you don't want to update, just restart from step 11; the results should actually be fine already, regardless of SqueezeMeta complaining.

Thanks! I added the --restart -step 11 flags to the command, but I'm encountering a different error:

Can't open samples file (-s) in /data/00..samples. Please check that it is the correct file.

There is a file there, in that directory. Is this a related issue? I've had this error before when trying to restart jobs.

You need to restart the individual sample that failed (so add -p Sample1 to the call).
Since this would be cumbersome with many samples, you can also create a for loop in which, for each sample:

  1. You create a subset of your samples file, containing only the fastq files for that sample
  2. You run SqueezeMeta for that sample using the coassembly mode: SqueezeMeta.pl -m coassembly -s subsetted_samples_file.tsv -p sample_name -f /path/to/raw/fastqs

This will give a similar result to the sequential mode, but will avoid the bug in step 10. If working on a cluster, you could also launch each sample as a separate job, which will give you faster results.
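The two steps above could be sketched roughly as follows. This assumes the samples file is tab-separated with the sample name in the first column; `run_each_sample`, `DRY_RUN` and the `.samples.tsv` suffix are hypothetical names chosen for the example, not part of SqueezeMeta.

```shell
# Sketch of the per-sample loop. Assumes a tab-separated samples file with the
# sample name in column 1. run_each_sample and DRY_RUN are hypothetical names.
run_each_sample() {
    samples_file="$1"   # full samples file (the one normally passed to -s)
    fastq_dir="$2"      # directory with the raw fastq files (-f)
    subset_dir="$3"     # where the per-sample subset files are written
    mkdir -p "$subset_dir"

    cut -f1 "$samples_file" | sort -u | while IFS= read -r sample; do
        [ -n "$sample" ] || continue
        subset="$subset_dir/$sample.samples.tsv"
        # 1. Keep only the rows that belong to this sample.
        awk -F'\t' -v s="$sample" '$1 == s' "$samples_file" > "$subset"
        # 2. Print the coassembly call (default), or run it when DRY_RUN=0.
        if [ "${DRY_RUN:-1}" = 1 ]; then
            echo "SqueezeMeta.pl -m coassembly -s $subset -p $sample -f $fastq_dir"
        else
            # </dev/null keeps the command from consuming the loop's input.
            SqueezeMeta.pl -m coassembly -s "$subset" -p "$sample" -f "$fastq_dir" < /dev/null
        fi
    done
}
```

Calling `run_each_sample samplefile.txt /path/to/raw/fastqs subsets` prints one command per sample; `DRY_RUN=0` runs them instead. Any other options from the original call (for example `--extassembly` or `--nobins`) would need to be added by hand, and on a cluster each printed line could go into its own job script.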

I noticed that you seem to be already using v1.6.4; can you confirm this? The bug is supposed to be fixed in that version. If it is not, I'll look into it when I get back from holidays.

Thanks - adding the sample name worked. This is just a test run on a single sample so it wasn't a problem. I'll look into different jobs for each sample on the full run.
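For reference, a restart call with the sample name included might be wrapped like this. `restart_sample` and `SQM_BIN` are hypothetical names for the sketch, and it assumes the project directory sits in the current working directory.

```shell
# Hypothetical wrapper around the restart call discussed in this thread.
# SQM_BIN is an assumed override (defaults to SqueezeMeta.pl) so the call can be previewed.
restart_sample() {
    sample="$1"   # project name, e.g. Sample1
    step="$2"     # step to restart from, e.g. 11
    # Assumption: the project directory is in the current working directory.
    if [ ! -d "$sample" ]; then
        echo "no project directory '$sample' in $(pwd)" >&2
        return 1
    fi
    ${SQM_BIN:-SqueezeMeta.pl} --restart -p "$sample" -step "$step"
}
```

`SQM_BIN=echo restart_sample Sample1 11` only prints the arguments; without `SQM_BIN` it runs `SqueezeMeta.pl --restart -p Sample1 -step 11`.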

And yes, confirmed that this is v1.6.4.

Hi! I have the same issue with coassembly mode using v1.6.4. Leaving a reply here in case there is a follow-up.

This should now be fixed in 1.6.5; let me know if you still experience problems.

I'm still encountering the same error using version 1.6.5. This is the stdout file:

SqueezeMeta v1.6.5, August 2024 - (c) J. Tamames, F. Puente-Sánchez CNB-CSIC, Madrid, SPAIN

Please cite: Tamames & Puente-Sánchez, Frontiers in Microbiology 9, 3349 (2019). doi: https://doi.org/10.3389/fmicb.2018.03349

Run started Tue Aug 13 09:52:43 2024 in sequential mode
1 metagenomes found: RMIC-3A

--- SAMPLE RMIC-3A ---
Contig tax file /data/home/AGR.GC.CA/oharaeo/projects/GR2301/MTG/results/squeezemeta_test/RMIC-3A/intermediate/09.RMIC-3A.contiglog already found, skipping step 9
[0 seconds]: STEP10 -> MAPPING READS: 10.mapsamples.pl
Reading samples from /data/home/AGR.GC.CA/oharaeo/projects/GR2301/MTG/results/squeezemeta_test/RMIC-3A/data/00.RMIC-3A.samples
Metagenomes found: 1
Reading contig length from /data/home/AGR.GC.CA/oharaeo/projects/GR2301/MTG/results/squeezemeta_test/RMIC-3A/intermediate/01.RMIC-3A.lon
Reading orf info from /data/home/AGR.GC.CA/oharaeo/projects/GR2301/MTG/results/squeezemeta_test/RMIC-3A/results/03.RMIC-3A.gff
Mapping with Bowtie2 (Langmead and Salzberg 2012, Nat Methods 9(4), 357-9)
Creating reference from contigs
Working with sample 1: RMIC-3A
Getting raw reads
Aligning to reference with bowtie
BAM file already found in /data/home/AGR.GC.CA/oharaeo/projects/GR2301/MTG/results/squeezemeta_test/RMIC-3A/data/bam/RMIC-3A.RMIC-3A.bam, skipping
Calculating contig coverage
Counting with sqm_counter: Opening 124 threads
124299 reads counted
113444 reads counted
115428 reads counted
129580 reads counted
124751 reads counted
116770 reads counted
146233 reads counted
104168 reads counted
118634 reads counted
132573 reads counted
102945 reads counted
137322 reads counted
138487 reads counted
121774 reads counted
101183 reads counted
118859 reads counted
111903 reads counted
129460 reads counted
110807 reads counted
115813 reads counted
122988 reads counted
101488 reads counted
134693 reads counted
102212 reads counted
107920 reads counted
121081 reads counted
101667 reads counted
301689 reads counted
136553 reads counted
103343 reads counted
118448 reads counted
112259 reads counted
107742 reads counted
126454 reads counted
106349 reads counted
143754 reads counted
135517 reads counted
113362 reads counted
130643 reads counted
107889 reads counted
133304 reads counted
103032 reads counted
114337 reads counted
125945 reads counted
99468 reads counted
112034 reads counted
129694 reads counted
122959 reads counted
138583 reads counted
138960 reads counted
118876 reads counted
140740 reads counted
117713 reads counted
118488 reads counted
112253 reads counted
127211 reads counted
119312 reads counted
117287 reads counted
117435 reads counted
139843 reads counted
136703 reads counted
128153 reads counted
111639 reads counted
118074 reads counted
117192 reads counted
113633 reads counted
114666 reads counted
134177 reads counted
105133 reads counted
115982 reads counted
154114 reads counted
130382 reads counted
135572 reads counted
120261 reads counted
114874 reads counted
106310 reads counted
139945 reads counted
128657 reads counted
125180 reads counted
128275 reads counted
100298 reads counted
129319 reads counted
107963 reads counted
153402 reads counted
119957 reads counted
102118 reads counted
112499 reads counted
130106 reads counted
122172 reads counted
122409 reads counted
142998 reads counted
169731 reads counted
139736 reads counted
116870 reads counted
115515 reads counted
124093 reads counted
127733 reads counted
138398 reads counted
127281 reads counted
111055 reads counted
157144 reads counted
172411 reads counted
108871 reads counted
124514 reads counted
103204 reads counted
174357 reads counted
137474 reads counted
107233 reads counted
132074 reads counted
117031 reads counted
109012 reads counted
122804 reads counted
136922 reads counted
121473 reads counted
122620 reads counted
113755 reads counted
112720 reads counted
112854 reads counted
138251 reads counted
119585 reads counted
124009 reads counted
122411 reads counted
109912 reads counted
119932 reads counted
Output in /data/home/AGR.GC.CA/oharaeo/projects/GR2301/MTG/results/squeezemeta_test/RMIC-3A/intermediate/10.RMIC-3A.mapcount
Stopping in STEP10 -> 10.mapsamples.pl. File /data/home/AGR.GC.CA/oharaeo/projects/GR2301/MTG/results/squeezemeta_test/RMIC-3A/results/10.RMIC-3A.mappingstat is empty!

If you don't know what went wrong or want further advice, please look for similar issues in https://github.com/jtamames/SqueezeMeta/issues
Feel free to open a new issue if you don't find the answer there. Please add a brief description of the problem and upload the /data/home/AGR.GC.CA/oharaeo/projects/GR2301/MTG/results/squeezemeta_test/RMIC-3A/syslog file (zip it first)

Can you share /data/home/AGR.GC.CA/oharaeo/projects/GR2301/MTG/results/squeezemeta_test/RMIC-3A/results/10.RMIC-3A.mappingstat with me here?

Thanks for the quick reply!
This is the mappingstat file:

Sample Total reads Mapped reads Mapping perc Total bases
RMIC-3A 31785036 14746863 46.40 4693972972

There should be a # symbol before the word "Sample", is it there?
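One way to answer this without relying on a copy-paste is sketched below; `has_hash_header` is a hypothetical helper that only tests whether the first line of the file begins with `#`.

```shell
# Hypothetical check: does the first line of the file start with '#'?
has_hash_header() {
    head -n 1 "$1" | grep -q '^#'
}
```

For example: `has_hash_header RMIC-3A/results/10.RMIC-3A.mappingstat && echo "header has #" || echo "header lacks #"`.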

Please upload the actual file

Thanks for all the responses. I was running into some issues following the update, and decided to do a complete clean re-install of SqueezeMeta v1.6.5. I've re-run the same script as above to test, and the stdout file shows the error below:

stdout file:

SqueezeMeta v1.6.5, August 2024 - (c) J. Tamames, F. Puente-Sánchez CNB-CSIC, Madrid, SPAIN

Please cite: Tamames & Puente-Sánchez, Frontiers in Microbiology 9, 3349 (2019). doi: https://doi.org/10.3389/fmicb.2018.03349

Run started Wed Aug 14 09:34:21 2024 in sequential mode
1 metagenomes found: RMIC-3A

--- SAMPLE RMIC-3A ---
Can't open conf file /data/home/AGR.GC.CA/oharaeo/miniconda3/envs/functional_env/envs/SqueezeMeta/SqueezeMeta/scripts/SqueezeMeta_conf.pl

Any idea on this - should I open it as a fresh issue?

For clarity, this is the relevant part of the command:

dir_data="${project_dir}/data"
dir_docs="${project_dir}/docs"
dir_results="${project_dir}/results"
dir_contigs="${dir_results}/megahit_assemblies/final_assemblies"
dir_test="${dir_results}/squeezemeta_test"
dir_clean_reads="${dir_data}/unmapped_bovine"

# Sample file path
sample_file="${dir_test}/samplefile.txt"

# Contig file path
contig_file="${dir_test}/contigs/RMIC-3a.contigs.fa"

# Run SqueezeMeta
SqueezeMeta.pl -m sequential -s $sample_file -f $dir_clean_reads --extassembly $contig_file --nobins -t 124

Yes - apologies! I figured that out, and the test run has completed successfully! Thank you for your patience and help!