gaolabtools/scNanoGPS

error message at the end of "5.4 Generate final summary table", low java memory.

Closed this issue · 5 comments

Hello Cheng-Kai,

I have reached the "5.4 Generate final summary table" step! (I skipped the 5.3 SNV profile step for now.)
Unfortunately, I got another error message, and the run did not get as far as "summary.txt"...

Below is the stdout after running reporter_summary.py.
The error message appears at the end of the process.
Please see the bottom of the stdout below.

It looks like the Java memory allocated in the qualimap step is too low.
Is there any solution to this error message? For example, can I assign more Java memory to qualimap?

Parsing scanner log ...
Done.

Computing read length ...
Done.

Calculating quality score ...
Done.


Merging all bam file for qualimap...

100 of 100 files...
[samfaipath] build FASTA index...
[E::fai_build3] Cannot index files compressed with gzip, please use bgzip
[samfaipath] fail to build FASTA index.
[E::sam_parse1] missing SAM header
[W::sam_read1] Parse error at line 1
[main_samview] truncated file.
Done.

/home/minoruyano/opt/local/scNanoGPS/reporter_summary.py:216: DtypeWarning: Columns (1) have mixed types. Specify dtype option on import or set low_memory=False.
  expr_df = pd.read_csv(options.exp_tb, header = 0, sep = '\t', skiprows =1, compression = options.compression)
Calling Qualimap...
Java memory size is set to 1200M
Launching application...

OpenJDK 64-Bit Server VM warning: Ignoring option MaxPermSize; support was removed in 8.0
QualiMap v.2.3
Built on 2023-05-19 16:57

Selected tool: rnaseq
Wed Apr 17 16:34:34 JST 2024            WARNING Output folder already exists, the results will be saved there

Initializing regions from /home/minoruyano/reference_genome_and_annotations/Homo_sapiens.GRCh38.111.gtf...

Initialized 100000 regions...
Initialized 200000 regions...
Initialized 300000 regions...
Initialized 400000 regions...
Initialized 500000 regions...
Initialized 600000 regions...
Initialized 700000 regions...
Initialized 800000 regions...
Initialized 900000 regions...
Initialized 1000000 regions...

WARNING: out of memory!
Qualimap allows to set RAM size using special argument: --java-mem-size
Check more details using --help command or read the manual.
Traceback (most recent call last):
  File "/home/minoruyano/opt/local/scNanoGPS/reporter_summary.py", line 242, in <module>
    fh = open(os.path.join(options.tmp_dir, "master_rnaseq_qc", "rnaseq_qc_results.txt"), "rt")
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
FileNotFoundError: [Errno 2] No such file or directory: 'tmp_barcode09_GRCh38dnatoplevel/master_rnaseq_qc/rnaseq_qc_results.txt'

Best regards,
Minoru

Hi Minoru,

The error is due to insufficient memory when running qualimap with the default Java settings.
Please check this to increase the memory usage:
https://stackoverflow.com/questions/1565388/increase-heap-size-in-java
I've updated and added a new parameter in "reporter_summary.py".
You may try adding the parameter when running reporter_summary.py:

--qualimap_param "--java-mem-size=300G"
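
As a rough guide for choosing the value, the sketch below formats a `--java-mem-size` argument from a RAM budget. The helper name `java_mem_arg` and the 50% default fraction are assumptions made for illustration and are not part of scNanoGPS or qualimap; only the `--java-mem-size` flag and the `300G`-style value come from the messages above.

```python
def java_mem_arg(total_ram_bytes: int, fraction: float = 0.5) -> str:
    """Format a qualimap --java-mem-size argument from a RAM budget.

    Hypothetical helper: the name and the default fraction are
    illustrative assumptions, not part of scNanoGPS or qualimap.
    """
    if total_ram_bytes <= 0 or not 0 < fraction <= 1:
        raise ValueError("need positive RAM and a fraction in (0, 1]")
    gib = int(total_ram_bytes * fraction) // (1024 ** 3)
    # Fall back to at least 1G so the value is never "0G".
    return f"--java-mem-size={max(gib, 1)}G"


if __name__ == "__main__":
    # Example: a machine with 64 GiB of RAM, half of it given to qualimap.
    print(java_mem_arg(64 * 1024 ** 3))
```

The resulting string is what would go inside the quotes of `--qualimap_param`.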

Thanks.

Regards,
Cheng-Kai

Hi Cheng-Kai,

Thank you for the quick response and the update to "reporter_summary.py".
With the update, the run got past the out-of-memory error!!

But unfortunately, I got another error, shown below, during BAM file analysis, after the regions were initialized and the transcripts were constructed successfully.



Parsing scanner log ...
Done.

Computing read length ...
Done.

Calculating quality score ...
Done.


Merging all bam file for qualimap...

100 of 100 files...
[samfaipath] build FASTA index...
[E::fai_build3] Cannot index files compressed with gzip, please use bgzip
[samfaipath] fail to build FASTA index.
[E::sam_parse1] missing SAM header
[W::sam_read1] Parse error at line 1
[main_samview] truncated file.
Done.

/home/minoruyano/opt/local/scNanoGPS/reporter_summary.py:219: DtypeWarning: Columns (1) have mixed types. Specify dtype option on import or set low_memory=False.
  expr_df = pd.read_csv(options.exp_tb, header = 0, sep = '\t', skiprows =1, compression = options.compression)
Calling Qualimap...
Java memory size is set to 300G
Launching application...

OpenJDK 64-Bit Server VM warning: Ignoring option MaxPermSize; support was removed in 8.0
QualiMap v.2.3
Built on 2023-05-19 16:57

Selected tool: rnaseq
Thu Apr 18 08:29:48 JST 2024            WARNING Output folder already exists, the results will be saved there

Initializing regions from /home/minoruyano/reference_genome_and_annotations/Homo_sapiens.GRCh38.111.gtf...

Initialized 100000 regions...
Initialized 200000 regions...
Initialized 300000 regions...
Initialized 400000 regions...
Initialized 500000 regions...
Initialized 600000 regions...
Initialized 700000 regions...
Initialized 800000 regions...
Initialized 900000 regions...
Initialized 1000000 regions...
Initialized 1100000 regions...
Initialized 1200000 regions...
Initialized 1300000 regions...
Initialized 1400000 regions...
Initialized 1500000 regions...
Initialized 1600000 regions...
Initialized 1700000 regions...
Initialized 1800000 regions...
Initialized 1900000 regions...
Initialized 2000000 regions...
Initialized 2100000 regions...
Initialized 2200000 regions...
Initialized 2300000 regions...
Initialized 2400000 regions...
Initialized 2500000 regions...
Initialized 2600000 regions...
Initialized 2700000 regions...
Initialized 2800000 regions...
Initialized 2900000 regions...
Initialized 3000000 regions...
Initialized 3100000 regions...
Initialized 3200000 regions...
Initialized 3300000 regions...
Initialized 3400000 regions...

Initialized 3424897 regions it total

Starting constructing transcripts for RNA-seq stats...
Finished constructing transcripts

Starting BAM file analysis

Failed to run rnaseq
java.lang.RuntimeException: BAM file is empty or no read alignments are located in exonic regions.
        at org.bioinfo.ngs.qc.qualimap.process.ComputeCountsTask.run(ComputeCountsTask.java:559)
        at org.bioinfo.ngs.qc.qualimap.process.RNASeqQCAnalysis.run(RNASeqQCAnalysis.java:68)
        at org.bioinfo.ngs.qc.qualimap.main.RnaSeqQcTool.execute(RnaSeqQcTool.java:221)
        at org.bioinfo.ngs.qc.qualimap.main.NgsSmartTool.run(NgsSmartTool.java:190)
        at org.bioinfo.ngs.qc.qualimap.main.NgsSmartMain.main(NgsSmartMain.java:113)

Traceback (most recent call last):
  File "/home/minoruyano/opt/local/scNanoGPS/reporter_summary.py", line 245, in <module>
    fh = open(os.path.join(options.tmp_dir, "master_rnaseq_qc", "rnaseq_qc_results.txt"), "rt")
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
FileNotFoundError: [Errno 2] No such file or directory: 'tmp_barcode09_GRCh38dnatoplevel/master_rnaseq_qc/rnaseq_qc_results.txt' 

When I checked the file sizes under the tmp folder, I did not see any zero-size BAM files there.
In addition, the "master*" files and folder are as follows.

$ls -l master*
-rw-r--r-- 1 minoruyano fgin   92 Apr 18 08:29 tmp_barcode09_GRCh38dnatoplevel/master.bam
-rw-r--r-- 1 minoruyano fgin   16 Apr 18 08:29 tmp_barcode09_GRCh38dnatoplevel/master.bam.bai

tmp_barcode09_GRCh38dnatoplevel/master_rnaseq_qc:
total 0
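
A BAM file with no reads is not zero bytes, because it still holds a small header and the BGZF end-of-file block, so looking only for size-zero files can miss it. Below is a minimal sketch for flagging suspiciously small BAM files. The function name `find_tiny_bams` and the 1 KiB threshold are assumptions chosen for illustration.

```python
from pathlib import Path


def find_tiny_bams(directory: str, max_bytes: int = 1024) -> list[tuple[str, int]]:
    """Return (name, size) for *.bam files of at most max_bytes, sorted by name."""
    hits = []
    for bam in sorted(Path(directory).glob("*.bam")):
        size = bam.stat().st_size
        if size <= max_bytes:
            hits.append((bam.name, size))
    return hits


if __name__ == "__main__":
    # The directory name is taken from the listing above.
    for name, size in find_tiny_bams("tmp_barcode09_GRCh38dnatoplevel"):
        print(f"{name}: {size} bytes, probably contains no reads")
```

With this threshold, the 92-byte master.bam listed above would be reported.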

I am sorry for asking you repeatedly, but I feel that I am close to completing the scNanoGPS workflow.
I hope that you can help me once more.

Best regards,
Minoru

Hi Minoru,

The error message shows that qualimap could not find bgzip while preparing its files:

[samfaipath] build FASTA index...
[E::fai_build3] Cannot index files compressed with gzip, please use bgzip
[samfaipath] fail to build FASTA index.
[E::sam_parse1] missing SAM header
[W::sam_read1] Parse error at line 1
[main_samview] truncated file.

Please make sure bgzip is installed in your environment.
You can try the following command:

which bgzip

Regards,
Cheng-Kai

Hi Cheng-Kai,

Thank you for the comment.
I checked for bgzip and found that it appears to be installed, as shown below.
If so, is there any solution for this error?

Best regards,
Minoru

$ which bgzip
/home/minoruyano/anaconda3/bin/bgzip

$ bgzip -h

Usage:   bgzip [options] [file] ...

Options: -c      write on standard output, keep original files unchanged
         -d      decompress
         -f      overwrite files without asking
         -b INT  decompress at virtual file pointer INT
         -s INT  decompress INT bytes in the uncompressed file
         -h      give this help

Hi Cheng-Kai,

This problem is solved!!
It turned out to be a very common, basic, and easy one...
I am sorry to have troubled you with it.

The cause was that the reference genome file was compressed with gzip, not bgzip.
I recompressed Homo_sapiens.GRCh38.dna.toplevel.fa.gz with bgzip as shown below, and then the reporter_summary.py process completed!!

$ ls -l Homo_sapiens.GRCh38.dna.toplevel.fa.gz
-rw-r--r-- 1 minoruyano fgin  936478864 Apr 12 15:47 Homo_sapiens.GRCh38.dna.toplevel.fa.gz

$ gzip -d cp_Homo_sapiens.GRCh38.dna.toplevel.fa.gz
$ bgzip cp_Homo_sapiens.GRCh38.dna.toplevel.fa

$ ls -l Homo_sapiens.GRCh38.dna.toplevel.fa.gz*
-rw-r--r-- 1 minoruyano fgin 950902225 Apr 26 14:35 Homo_sapiens.GRCh38.dna.toplevel.fa.gz
-rw-r--r-- 1 minoruyano fgin     26844 Apr 26 14:38 Homo_sapiens.GRCh38.dna.toplevel.fa.gz.fai
-rw-r--r-- 1 minoruyano fgin    817016 Apr 26 14:38 Homo_sapiens.GRCh38.dna.toplevel.fa.gz.gzi
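
For anyone scripting the same fix, here is a minimal sketch that builds the equivalent commands. The function names `recompress_commands` and `recompress` are hypothetical, and the final `samtools faidx` step is an assumption about how to create the `.fai` and `.gzi` index files explicitly rather than leaving it to a later tool. The commands change the file in place, so it is safer to run them on a copy.

```python
import subprocess


def recompress_commands(gz_path: str) -> list[list[str]]:
    """Build the commands that turn a gzip-compressed FASTA into a bgzip one."""
    if not gz_path.endswith(".gz"):
        raise ValueError("expected a path ending in .gz")
    plain = gz_path[: -len(".gz")]
    return [
        ["gzip", "-d", gz_path],         # decompress in place -> plain FASTA
        ["bgzip", plain],                # recompress as BGZF -> same .gz name
        ["samtools", "faidx", gz_path],  # assumption: pre-build .fai and .gzi
    ]


def recompress(gz_path: str, dry_run: bool = True) -> None:
    """Print the commands, and run them only when dry_run is False."""
    for argv in recompress_commands(gz_path):
        print(" ".join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)  # stop at the first failure
```

Calling `recompress("cp_Homo_sapiens.GRCh38.dna.toplevel.fa.gz")` with the default `dry_run=True` only prints the three commands.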

Before I found that this was a reference genome data issue, I had suspected that the CB.fa.gz files, which were made in the assigner process, were at fault, based on the message "[E::fai_build3] Cannot index files compressed with gzip, please use bgzip".
However, by modifying reporter_summary.py and checking at which lines the problem occurred, I found that the error message was produced by code lines 189-192, shown below:

cmd = options.samtools + " view -Sb " + os.path.join(options.tmp_dir, "master.sam") + \
      " -T " + options.ref_genome + \
      " -o " + os.path.join(options.tmp_dir, "master.unsorted.bam")
os.system(cmd)

When I saw "options.ref_genome", I realized that the issue came not from the CB.fa.gz files but from the reference genome file.
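
Because the result of `os.system` is not checked there, the script carries on after samtools fails, which is why the log still prints "Done." right after the samtools errors. Below is a minimal sketch of running a command so that a non-zero exit status stops the script. `run_checked` is a hypothetical helper, not the code in reporter_summary.py, and the failing child process is only a stand-in for a failing samtools call.

```python
import subprocess
import sys


def run_checked(argv: list[str]) -> None:
    """Run a command and raise CalledProcessError if it exits non-zero."""
    subprocess.run(argv, check=True)


if __name__ == "__main__":
    # Stand-in for a failing samtools call: a child process that exits with 1.
    try:
        run_checked([sys.executable, "-c", "import sys; sys.exit(1)"])
    except subprocess.CalledProcessError as err:
        print(f"command failed with exit status {err.returncode}")
```

For the samtools step, the argument list would hold the same pieces as the quoted command: the samtools path, "view", "-Sb", the SAM path, "-T", the reference genome, "-o", and the output BAM path.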

In addition, I now understand why the example data was processed without this error while I encountered it in my project: the example data uses the "ch22.fa.gz" file, which is correctly compressed with bgzip.
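
Since both kinds of file end in ".gz", the name alone does not tell them apart. A bgzip (BGZF) file is a gzip file whose header carries an extra "BC" subfield, so the first bytes are enough to check. The sketch below is a minimal check under that assumption; the function name `is_bgzf` is hypothetical and it inspects only the first block.

```python
import struct


def is_bgzf(path: str) -> bool:
    """Return True if the file starts with a BGZF block (as written by bgzip)."""
    with open(path, "rb") as fh:
        header = fh.read(12)
        if len(header) < 12:
            return False
        magic1, magic2, method, flags = header[0], header[1], header[2], header[3]
        # gzip magic bytes, deflate method, and the FEXTRA flag bit must be set.
        if (magic1, magic2, method) != (0x1F, 0x8B, 8) or not flags & 0x04:
            return False
        (xlen,) = struct.unpack("<H", header[10:12])
        extra = fh.read(xlen)
    # Walk the extra subfields looking for the 'BC' subfield of length 2.
    pos = 0
    while pos + 4 <= len(extra):
        si1, si2 = extra[pos], extra[pos + 1]
        (slen,) = struct.unpack("<H", extra[pos + 2:pos + 4])
        if (si1, si2, slen) == (ord("B"), ord("C"), 2):
            return True
        pos += 4 + slen
    return False
```

Running `is_bgzf` on the reference genome before starting the pipeline would return False for a plain gzip file, which is the case that triggers the "please use bgzip" message.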

Now that this issue is solved, the process runs to the end with the message "Generating summary... Done."
Next, I will look into all the data from the scNanoGPS process!!

Best regards,
Minoru YANO