gaolabtools/scNanoGPS

matrix_SNV.raw.vcf.gz unzipped in Step 5

Closed this issue · 3 comments

Hi,
first of all, thank you for developing scNanoGPS. I am trying to do SNV calling from the raw fastq files of our sequencing. I was running into an issue in the 5th step (5.3. single cell SNV profile). This is the traceback of the error:

Merge Longshot result...
Merging part 10 of 10 ...
Generating final matrix...
Filtering by prevalence: 0.01 ...
Traceback (most recent call last):
File "/hpc/pmc_holstege/oscar/scNanoGPS/snvcalling/../reporter_SNV.py", line 540, in
merge_longshot(CB_list, options)
File "/hpc/pmc_holstege/oscar/scNanoGPS/snvcalling/../reporter_SNV.py", line 136, in merge_longshot
filter_by_prevalence(options)
File "/hpc/pmc_holstege/oscar/scNanoGPS/snvcalling/../reporter_SNV.py", line 62, in filter_by_prevalence
fh = bgzf.open(os.path.join(options.o_dir, options.o_pref) + '.raw.vcf.gz', 'rt')
File "/hpc/pmc_holstege/oscar/anaconda3/envs/scNanoGPS/lib/python3.9/site-packages/Bio/bgzf.py", line 273, in open
return BgzfReader(filename, mode)
File "/hpc/pmc_holstege/oscar/anaconda3/envs/scNanoGPS/lib/python3.9/site-packages/Bio/bgzf.py", line 617, in init
self._load_block(handle.tell())
File "/hpc/pmc_holstege/oscar/anaconda3/envs/scNanoGPS/lib/python3.9/site-packages/Bio/bgzf.py", line 644, in _load_block
block_size, self._buffer = _load_bgzf_block(handle, self._text)
File "/hpc/pmc_holstege/oscar/anaconda3/envs/scNanoGPS/lib/python3.9/site-packages/Bio/bgzf.py", line 444, in _load_bgzf_block
raise ValueError(
ValueError: A BGZF (e.g. a BAM file) block should start with b'\x1f\x8b\x08\x04', not b'##fi'; handle.tell() now says 4

Looking into the scNanoGPS_res dir, I see that both matrix_SNV.raw.vcf.gz and matrix_SNV.filtered.vcf.gz have been created, but when I do 'file matrix_SNV.raw.vcf.gz' in the terminal this is the output: 'matrix_SNV.raw.vcf.gz: Variant Call Format (VCF) version 4.2, ASCII text, with very long lines'; and matrix_SNV.filtered.vcf.gz is empty. My intuition is that this file is not being zipped and is causing the problem, but I am not really sure how it works.

Do you have any suggestions on what might be causing the problem? Thank you a lot!

Hi,

I agree with your intuition.
It looks like the vcf is failed to be compressed correctly in a certain step.

Please try the following command step-by-step:
(1) Make sure each one longshot result file is compressed

cat `ls tmp/longshot.output.*.vcf.gz | head -n 1` | head -n 1

Expected result should be in binary which could not be understand.
But when using zcat then it should be readable.

(2) Check vcf merging output is compressed

ls tmp/longshot.output.*.vcf.gz | split -l 500 - splitted_vcf_list_
bcftools merge -0 -l splitted_vcf_list_aa -Oz -o splitted_vcf_list_aa.merge.vcf.gz
tabix -p vcf splitted_vcf_list_aa.merge.vcf.gz

From here, please check the file "splitted_vcf_list_aa.merge.vcf.gz" is compressed by using "zcat".

ls splitted_vcf_list_aa.merge.vcf.gz > final_vcf_list
bcftools merge -0 -l final_vcf_list -Oz -o test.vcf.gz

From here, please check the file "test.vcf.gz" is compressed.

Please keep me updated of the trial results.
Thanks.

Besides,
I've updated reporter_SNV.py line 126 to further guarantee the merged vcf file is compressed.
Please download reporter_SNV.py again and run it again to see whether this fix the problem or not.

Regards,
Cheng-Kai

Hi Cheng-Kai,
thanks for the fast reply. The "test.vcf.gz" does work, there was a problem with my installation of bcftools (a dependency was not correctly installed). After doing ldd $(whereis bcftools) I had this problem:
libcrypto.so.1.0.0 => not found
I added my base libcrypto.so.1.0.0 to my path and it works now.
Now I am running into another problem similar to #32. I tested with the --ref_genome example/GRCh38_chr22.fa argument and with my own data, but I run into the same problem. Maybe we can close this issue then?

I'm closing this issue.
I'll answer #32.