liulab-dfci/MAESTRO

error reading bam file

taylor-shi opened this issue · 5 comments

Hello. I tried running the MAESTRO pipeline on a scRNA-seq dataset of 3,000 granulocyte-sorted PBMCs obtained from the 10X Genomics website. I was able to run the MAESTRO scrna-init command successfully and performed a dry run with no errors. However, when I try to run the workflow, I get the following error:

[W::bam_hdr_read] EOF marker is absent. The input is probably truncated
[E::bgzf_read] Read block operation failed with error 6 after 20 of 155 bytes

I would guess from the error that the BAM file was very large and somehow became truncated, but I don't know how to fix this. Any guidance would be appreciated. Thank you in advance!
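One way to confirm whether a BAM file really is truncated is samtools quickcheck, which reports files whose header is unreadable or whose BGZF EOF block is missing (the path below is just a placeholder):

# verify the BAM header is readable and the BGZF EOF block is present
samtools quickcheck -v path/to/suspect.bam \
    && echo "BAM looks intact" \
    || echo "BAM is truncated or corrupt"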

It seems the BAM file you downloaded is not complete. Can you re-download it and run it again?

I thought the BAM file was created by the MAESTRO pipeline itself. There are no BAM files used to initialize the scRNA-seq workflow, correct? Here is a larger section of the error log, in case it is helpful.

rule scrna_map:
input: /n/stat115/2021/HW5/references/Refdata_scRNA_MAESTRO_GRCh38_1.2.2/GRCh38_STAR_2.7.6a, /n/stat115/2021/HW5/references/whitelist/rna/737K-arc-v1.txt, /n/stat115/2021/HW5/pbmc_granulocyte_sorted_3k/gex
output: Result/STAR/scrna_pbmc_granulocyte_sorted_3kAligned.sortedByCoord.out.bam, Result/STAR/scrna_pbmc_granulocyte_sorted_3kAligned.sortedByCoord.out.bam.bai, Result/STAR/scrna_pbmc_granulocyte_sorted_3kSolo.out/Gene/raw/matrix.mtx, Result/STAR/scrna_pbmc_granulocyte_sorted_3kSolo.out/Gene/raw/features.tsv, Result/STAR/scrna_pbmc_granulocyte_sorted_3kSolo.out/Gene/raw/barcodes.tsv
log: Result/Log/scrna_pbmc_granulocyte_sorted_3k_STAR.log
jobid: 4
benchmark: Result/Benchmark/scrna_pbmc_granulocyte_sorted_3k_STAR.benchmark
threads: 12

[W::bam_hdr_read] EOF marker is absent. The input is probably truncated
[E::bgzf_read] Read block operation failed with error 6 after 62 of 155 bytes
samtools index: failed to create index for "Result/STAR/scrna_pbmc_granulocyte_sorted_3kAligned.sortedByCoord.out.bam"
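If it helps narrow things down, the failing step can be reproduced outside the pipeline using the paths from the rule output above, along these lines:

# how far STAR got, according to the MAESTRO log for this rule
tail Result/Log/scrna_pbmc_granulocyte_sorted_3k_STAR.log

# check the STAR output BAM and retry the indexing step that fails in the pipeline
samtools quickcheck -v Result/STAR/scrna_pbmc_granulocyte_sorted_3kAligned.sortedByCoord.out.bam
samtools index Result/STAR/scrna_pbmc_granulocyte_sorted_3kAligned.sortedByCoord.out.bam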

Any further guidance would be appreciated. Thank you very much!

Hi, so you started with FASTQ files? If so, can you delete the MAESTRO-generated BAM file and re-run it?
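Something along these lines, assuming the default Result/ layout from your rule output and a standard Snakemake invocation (adjust the core count to your setup):

# remove the truncated BAM (and its index, if one was created)
rm -f Result/STAR/scrna_pbmc_granulocyte_sorted_3kAligned.sortedByCoord.out.bam \
      Result/STAR/scrna_pbmc_granulocyte_sorted_3kAligned.sortedByCoord.out.bam.bai

# re-run the workflow; --rerun-incomplete tells Snakemake to redo jobs whose
# output was only partially written
snakemake --cores 12 --rerun-incomplete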

Hi, yes, I tried doing this multiple times and still get the same error. I also tried updating samtools to the latest version after activating the MAESTRO environment (following the advice of another forum), but that did not work.

I also have scATAC-seq data that I want to run through the ATAC-specific MAESTRO pipeline. That workflow also starts with FASTQ files, and I ran into the same error when I tried to run it. So the error does not seem to be specific to a particular dataset.

I was able to resolve the issue: my disk was running out of space because of unrelated files. Thanks for your responsiveness!
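For anyone who hits the same symptom: a BAM that comes out truncated right after STAR finishes can simply mean the filesystem filled up mid-write, and a quick check looks like this (directory names are just the ones from this run):

# free space on the filesystem holding the working directory
df -h .

# how much the pipeline output itself is taking up
du -sh Result/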