Error when running miRge3.0: ValueError in summarize function
Hello! When running miRge3.0, I encountered an error during the "Summarizing and tabulating results" step. The traceback indicates that pandas cannot set a DataFrame with multiple columns to the single column miRNA_cbind.
The command to run the job was as follows:
#!/bin/bash
#SBATCH --job-name=miRge3_job
#SBATCH --output=miRge3_output.txt
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --time=72:00:00
#SBATCH --mem=16G
eval "$(micromamba shell hook --shell bash)"
micromamba activate mirna
miRge3.0 --samples some1.fastq,some2.fastq --mir-DB miRBase --organism-name fruitfly --outDir miRge3_output --libraries-path miRge3_Lib
However, the following error occurs during execution of the summarize function:
Alignment in progress ...
Alignment completed in 6255.9694 second(s)
Summarizing and tabulating results...
Traceback (most recent call last):
File "/home/shashank/micromamba/envs/mirna/bin/miRge3.0", line 10, in <module>
sys.exit(main())
File "/home/shashank/.local/lib/python3.10/site-packages/mirge/__main__.py", line 166, in main
summarize(args, workDir, ref_db, base_names, pdMapped, sampleReadCounts, trimmedReadCounts, trimmedReadCountsUnique)
File "/home/shashank/.local/lib/python3.10/site-packages/mirge/libs/summary.py", line 744, in summarize
subpdMapped['miRNA_cbind'] = subpdMapped[['exact miRNA', 'isomiR miRNA']].apply(lambda x: ''.join(x), axis = 1)
File "/home/shashank/micromamba/envs/mirna/lib/python3.10/site-packages/pandas/core/frame.py", line 3940, in __setitem__
self._set_item_frame_value(key, value)
File "/home/shashank/micromamba/envs/mirna/lib/python3.10/site-packages/pandas/core/frame.py", line 4094, in _set_item_frame_value
raise ValueError(
ValueError: Cannot set a DataFrame with multiple columns to the single column miRNA_cbind
Could you please let me know how to tackle this?
Thank you!
Edit: I used micromamba create mirna mirge3 to install mirge3.
Hi @shashankpritam,
This is probably due to an old version of numpy being installed. Can you let me know which numpy version is installed?
Thank you,
Arun.
Hello @arunhpatil, thank you for your prompt response.
Here's the numpy info:
Name: numpy
Version: 1.23.2
Location: /home/shashank/.local/lib/python3.10/site-packages
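One detail worth checking here: in the traceback, mirge itself loads from ~/.local/lib/python3.10 while pandas loads from the micromamba environment, and this numpy also sits in ~/.local. A minimal sketch to confirm which numpy and pandas the interpreter actually imports, and from where, assuming it is run with the same python that miRge3.0 uses (the one in the activated mirna environment):

```python
import importlib


def describe(packages=("numpy", "pandas")):
    """Return {package: (version, file path)} for the running interpreter."""
    info = {}
    for name in packages:
        mod = importlib.import_module(name)  # import exactly as miRge3.0 would
        info[name] = (mod.__version__, mod.__file__)
    return info


for name, (version, path) in describe().items():
    # A path under ~/.local means the user site-packages shadows the env copy.
    print(f"{name}: {version} ({path})")
```

If the paths point to two different locations, packages from the user site-packages are being mixed with those from the environment.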
This version seems fine. Can you send the full some1.fastq file, or a subset of it, for testing? Also, can you try with another file? Maybe use the test file available here. Let me know how it goes.
Thank you,
Arun
Hi! Yes, the test file works as explained in the quick start docs.
Here's the data I have been running miRge over:
@SRR21820251.1 1 length=99
NGAGATTTTATTTCGGCAGATCATGACGATATCGCAAACTCGTGAACCTTACCACGCGTCGTGTGTGTACGTGTGTGTATATATAGCATACGTAAACTA
+SRR21820251.1 1 length=99
#FFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFF,FFFFF
@SRR21820251.2 2 length=100
NTTTCTTTGGACCTGTCTTACCTTGAGGATCATATGGAAGCATAATCTTCACTTTGATCCCAAGAACACCTTGTCGAAGAAGCACATGACGGGTTGCAGT
+SRR21820251.2 2 length=100
#FFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFF,FFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
@SRR21820251.3 3 length=100
NGCATGTATTTCATTATACTTTTCACTTGTATCAAAGGTAGCTCTTCGTTCAGCGACGAACATTTTCCTGCTTCCATTAACATCTTCATTTGTAGCCTGT
umassemb$ head SRR21820251_2.fastq
@SRR21820251.1 1 length=100
ATGTACTGACGTCTGGCTAAGGAGAGAAAGGGTGAGAGCGTATACAATCAATTAGTTTTTTAGTTTACGTATGCTATATATACACACACGTACACACACG
+SRR21820251.1 1 length=100
FFFFFFFFFFFFFFFFFFFFFFF:FFFFFFF,FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFF:FF:FFFF:FFFFFFFFFFFFF
@SRR21820251.2 2 length=100
TGTTGTTAGCGGTAAATTGCGTGGACAAAGAGCAAAATCAATGAAATTCGTTGATGGATTGATGATTCACTCTGGTGAACCCACCAATGAATATGTTAAT
+SRR21820251.2 2 length=100
FFF:FFFFFFFF,:FFFFFF:FFFFFFFFFFFFFFFFFFFFF,FFFFFFF,:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF,FFFFFFFFFFFFF
@SRR21820251.3 3 length=100
CAACAGTTGTCGGCAGGTTCTTACGGATGTTCTACACATAGACTTATACCAAGGAAACAGGCTACAAATGAAGATGTTAATGGAAGCAGGAAAATGTTCG
umassemb$ head out_SRR21820251_1.fastq
@SRR21820251.1 1 length=99
NGAGATTTTATTTCGGCAGATCATGACGATATCGCAAACTCGTGAACCTTACCACGCGTCGTGTGTGTACGTGTGTGTATATATAGCATACGTAAACTA
+SRR21820251.1 1 length=99
#FFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFF,FFFFF
@SRR21820251.2 2 length=100
NTTTCTTTGGACCTGTCTTACCTTGAGGATCATATGGAAGCATAATCTTCACTTTGATCCCAAGAACACCTTGTCGAAGAAGCACATGACGGGTTGCAGT
+SRR21820251.2 2 length=100
#FFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFF,FFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
@SRR21820251.3 3 length=100
NGCATGTATTTCATTATACTTTTCACTTGTATCAAAGGTAGCTCTTCGTTCAGCGACGAACATTTTCCTGCTTCCATTAACATCTTCATTTGTAGCCTGT
umassemb$ head out_SRR21820251_2.fastq
@SRR21820251.1 1 length=100
ATGTACTGACGTCTGGCTAAGGAGAGAAAGGGTGAGAGCGTATACAATCAATTAGTTTTTTAGTTTACGTATGCTATATATACACACACGTACACACACG
+SRR21820251.1 1 length=100
FFFFFFFFFFFFFFFFFFFFFFF:FFFFFFF,FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFF:FF:FFFF:FFFFFFFFFFFFF
@SRR21820251.2 2 length=100
TGTTGTTAGCGGTAAATTGCGTGGACAAAGAGCAAAATCAATGAAATTCGTTGATGGATTGATGATTCACTCTGGTGAACCCACCAATGAATATGTTAAT
+SRR21820251.2 2 length=100
FFF:FFFFFFFF,:FFFFFF:FFFFFFFFFFFFFFFFFFFFF,FFFFFFF,:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF,FFFFFFFFFFFFF
@SRR21820251.3 3 length=100
CAACAGTTGTCGGCAGGTTCTTACGGATGTTCTACACATAGACTTATACCAAGGAAACAGGCTACAAATGAAGATGTTAATGGAAGCAGGAAAATGTTCG
The data are from NCBI: RNA-seq of Megachile rotundata.
This could be an issue related to the data.
Note: the out_ files are QC'd FASTQ files produced by running fastp over the originals.
Thank you,
Shashank
Hi @shashankpritam,
Yes. Accession SRR21820251 is paired-end RNA-seq data. I downloaded SRR21820251 to check whether it contains any miRNAs, and a grep search for the most abundant miRNA, let-7a, found no matches, which indicates that there are no miRNAs in the data. miRge3.0 does not support paired-end reads; they need to be processed as single-end reads (analyzing each end separately). This is because miRNAs are short sequences of ~25 nucleotides, so paired-end sequencing is not ideal for them.
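The let-7a check described above can be sketched in Python as a count of reads whose sequence contains the mature let-7a sequence. The probe below is human let-7a-5p written as DNA; using it as a "any miRNA signal at all?" test is an assumption (let-7 is highly conserved, but any abundant miRNA would serve), and the script name and single-file usage are illustrative only.

```python
import gzip
import sys

# Mature let-7a-5p as DNA (UGAGGUAGUAGGUUGUAUAGUU in RNA).
LET7A = "TGAGGTAGTAGGTTGTATAGTT"


def count_reads_containing(lines, probe=LET7A):
    """Count FASTQ records whose sequence line contains `probe`.

    `lines` is any iterable of FASTQ text lines (4 lines per record).
    Only the forward strand is checked; a reverse-complement match on
    read 2 of a pair would not be counted.
    """
    total = hits = 0
    for i, line in enumerate(lines):
        if i % 4 == 1:  # the sequence is the 2nd line of each record
            total += 1
            if probe in line.strip().upper():
                hits += 1
    return hits, total


if __name__ == "__main__" and len(sys.argv) > 1:
    path = sys.argv[1]
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        hits, total = count_reads_containing(handle)
    print(f"{hits} of {total} reads contain let-7a")
```

For a small RNA library one would expect a noticeable fraction of hits; zero hits across millions of reads points to a library without miRNAs.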
Can you run the following accession through your pipeline? It is from the same source but is miRNA-sequencing data:
SRR21851723
The absence of miRNA reads is what caused the error. I hope this helps.
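The link between "no miRNA reads" and this particular ValueError can be reproduced with pandas alone. The sketch assumes, without checking the miRge3.0 source beyond the line in the traceback, that subpdMapped ends up with zero rows when nothing maps to a miRNA. On an empty frame, DataFrame.apply(..., axis=1) has no rows to inspect, so pandas tries the function on a dummy all-NaN row to infer the output shape; ''.join() raises on NaN, pandas falls back to returning the two-column DataFrame, and the column assignment fails. Passing result_type="reduce" is shown as one possible guard, not as the maintainers' fix.

```python
import pandas as pd

# Column names from the traceback; zero rows mimic a sample with no miRNA hits.
sub = pd.DataFrame(columns=["exact miRNA", "isomiR miRNA"])

# Without rows, apply() cannot infer a reduction and returns a DataFrame copy.
probe = sub[["exact miRNA", "isomiR miRNA"]].apply(lambda x: "".join(x), axis=1)
print(type(probe).__name__, list(probe.columns))

caught = None
try:
    sub["miRNA_cbind"] = probe  # same assignment as summary.py line 744
except ValueError as err:
    caught = err
    print("ValueError:", err)

# result_type="reduce" forces a Series, which is empty but assignable.
sub["miRNA_cbind"] = sub[["exact miRNA", "isomiR miRNA"]].apply(
    lambda x: "".join(x), axis=1, result_type="reduce"
)
print(sub.shape)
```

With at least one row of strings, both forms return a Series, which is why the error only surfaces on data without miRNA reads.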
Thank you,
Arun.
Thank you very much @arunhpatil! Yes, it works. I got the following output files with SRR21851723:
annotation.report.csv annotation.report.html index_data.js mapped.csv miR.Counts.csv miRge3_visualization.html miR.RPM.csv run.log unmapped.csv
Thanks again for the help and guidance,
Shashank