Error generating Kallisto index
Upendra19993 opened this issue · 3 comments
Hi,
I want to run SQANTI3 on my own dataset, but to get familiar with the tool I first tried it with the example dataset you provide. The quality control step (sqanti3_qc.py) fails with an error; the full output is below.
The command I used is: sqanti3_qc.py UHR_chr22.gtf gencode.v38.basic_chr22.gtf GRCh38.p13_chr22.fasta -o UHR_chr22 -d SQANTI3_output --short_reads UHR_chr22_short_reads.fofn --cpus 4 --report both
The job progress and error messages are as follows:
(base) [uqwwijes@bun025 SQANTI3_reinstallation_2]$ sqanti3_qc.py UHR_chr22.gtf gencode.v38.basic_chr22.gtf GRCh38.p13_chr22.fasta -o UHR_chr22 -d SQANTI3_output --short_reads UHR_chr22_short_reads.fofn --cpus 4 --report both
Rscript (R) version 4.3.1 (2023-06-16)
ERROR: genome fasta /scratch/project_mnt/S0030/upendra/Sqanti3/Exampla_data/SQANTI3_reinstallation_2/GRCh38.p13_chr22.fasta doesn't exist. Abort!
(base) [uqwwijes@bun025 SQANTI3_reinstallation_2]$ cd ..
(base) [uqwwijes@bun025 Exampla_data]$ sqanti3_qc.py UHR_chr22.gtf gencode.v38.basic_chr22.gtf GRCh38.p13_chr22.fasta -o UHR_chr22 -d SQANTI3_output --short_reads UHR_chr22_short_reads.fofn --cpus 4 --report both
Rscript (R) version 4.3.1 (2023-06-16)
Write arguments to /scratch/project_mnt/S0030/upendra/Sqanti3/Exampla_data/SQANTI3_output/UHR_chr22.params.txt...
**** Running SQANTI3...
**** Parsing provided files....
Reading genome fasta /scratch/project_mnt/S0030/upendra/Sqanti3/Exampla_data/GRCh38.p13_chr22.fasta....
Skipping aligning of sequences because GTF file was provided.
Indels will be not calculated since you ran SQANTI3 without alignment step (SQANTI3 with gtf format as transcriptome input).
**** Predicting ORF sequences...
**** Parsing Reference Transcriptome....
**** Parsing Isoforms....
**** Running STAR for calculating Short-Read Coverage.
START running STAR...
Running indexing...
/sw/local/rocky8/noarch/qcif/software/miniconda3/envs/sqanti3_5.2/bin/STAR-avx2 --runThreadN 4 --runMode genomeGenerate --genomeDir /scratch/project_mnt/S0030/upendra/Sqanti3/Exampla_data/SQANTI3_output/STAR_index/ --genomeFastaFiles /scratch/project_mnt/S0030/upendra/Sqanti3/Exampla_data/GRCh38.p13_chr22.fasta --outTmpDir /scratch/project_mnt/S0030/upendra/Sqanti3/Exampla_data/SQANTI3_output/STAR_index//_STARtmp/
STAR version: 2.7.11b compiled: 2024-01-29T15:15:38+0000 :/opt/conda/conda-bld/star_1706541070242/work/source
Feb 27 12:20:52 ..... started STAR run
Feb 27 12:20:52 ... starting to generate Genome files
!!!!! WARNING: --genomeSAindexNbases 14 is too large for the genome size=50818468, which may cause seg-fault at the mapping step. Re-run genome generation with recommended --genomeSAindexNbases 11
Feb 27 12:20:53 ... starting to sort Suffix Array. This may take a long time...
Feb 27 12:20:53 ... sorting Suffix Array chunks and saving them to disk...
Feb 27 12:21:02 ... loading chunks from disk, packing SA...
Feb 27 12:21:02 ... finished generating suffix array
Feb 27 12:21:02 ... generating Suffix Array index
Feb 27 12:21:11 ... completed Suffix Array index
Feb 27 12:21:11 ... writing Genome to disk ...
Feb 27 12:21:11 ... writing Suffix Array to disk ...
Feb 27 12:21:11 ... writing SAindex to disk
Feb 27 12:21:11 ..... finished successfully
Indexing done.
Mapping for UHR_Rep1_chr22.R1 : in progress...
Mapping for UHR_Rep1_chr22.R1 : done.
/sw/local/rocky8/noarch/qcif/software/miniconda3/envs/sqanti3_5.2/bin/STAR-avx2 --runThreadN 4 --genomeDir /scratch/project_mnt/S0030/upendra/Sqanti3/Exampla_data/SQANTI3_output/STAR_index/ --readFilesIn /scratch/project/qaafi-cnafs/upendra/Sqanti3/Example/UHR_Rep1_chr22.R1.fastq /scratch/project/qaafi-cnafs/upendra/Sqanti3/Example/UHR_Rep1_chr22.R2.fastq --outFileNamePrefix /scratch/project_mnt/S0030/upendra/Sqanti3/Exampla_data/SQANTI3_output/STAR_mapping/UHR_Rep1_chr22.R1 --alignSJoverhangMin 8 --alignSJDBoverhangMin 1 --outFilterType BySJout --outSAMunmapped Within --outFilterMultimapNmax 20 --outFilterMismatchNoverLmax 0.04 --outFilterMismatchNmax 999 --alignIntronMin 20 --alignIntronMax 1000000 --alignMatesGapMax 1000000 --sjdbScore 1 --genomeLoad NoSharedMemory --outSAMtype BAM SortedByCoordinate --twopassMode Basic
STAR version: 2.7.11b compiled: 2024-01-29T15:15:38+0000 :/opt/conda/conda-bld/star_1706541070242/work/source
Feb 27 12:21:12 ..... started STAR run
Feb 27 12:21:12 ..... loading genome
Feb 27 12:21:12 ..... started 1st pass mapping
Feb 27 12:21:36 ..... finished 1st pass mapping
Feb 27 12:21:36 ..... inserting junctions into the genome indices
Feb 27 12:21:44 ..... started mapping
Feb 27 12:22:09 ..... finished mapping
Feb 27 12:22:09 ..... started sorting BAM
Feb 27 12:22:09 ..... finished successfully
Mapping for UHR_Rep2_chr22.R1 : in progress...
Mapping for UHR_Rep2_chr22.R1 : done.
/sw/local/rocky8/noarch/qcif/software/miniconda3/envs/sqanti3_5.2/bin/STAR-avx2 --runThreadN 4 --genomeDir /scratch/project_mnt/S0030/upendra/Sqanti3/Exampla_data/SQANTI3_output/STAR_index/ --readFilesIn /scratch/project/qaafi-cnafs/upendra/Sqanti3/Example/UHR_Rep2_chr22.R1.fastq /scratch/project/qaafi-cnafs/upendra/Sqanti3/Example/UHR_Rep2_chr22.R2.fastq --outFileNamePrefix /scratch/project_mnt/S0030/upendra/Sqanti3/Exampla_data/SQANTI3_output/STAR_mapping/UHR_Rep2_chr22.R1 --alignSJoverhangMin 8 --alignSJDBoverhangMin 1 --outFilterType BySJout --outSAMunmapped Within --outFilterMultimapNmax 20 --outFilterMismatchNoverLmax 0.04 --outFilterMismatchNmax 999 --alignIntronMin 20 --alignIntronMax 1000000 --alignMatesGapMax 1000000 --sjdbScore 1 --genomeLoad NoSharedMemory --outSAMtype BAM SortedByCoordinate --twopassMode Basic
STAR version: 2.7.11b compiled: 2024-01-29T15:15:38+0000 :/opt/conda/conda-bld/star_1706541070242/work/source
Feb 27 12:22:10 ..... started STAR run
Feb 27 12:22:10 ..... loading genome
Feb 27 12:22:10 ..... started 1st pass mapping
Feb 27 12:22:28 ..... finished 1st pass mapping
Feb 27 12:22:28 ..... inserting junctions into the genome indices
Feb 27 12:22:36 ..... started mapping
Feb 27 12:22:55 ..... finished mapping
Feb 27 12:22:55 ..... started sorting BAM
Feb 27 12:22:55 ..... finished successfully
Input pattern: /scratch/project_mnt/S0030/upendra/Sqanti3/Exampla_data/SQANTI3_output/STAR_mapping/.
The following files found and to be read as junctions:
/scratch/project_mnt/S0030/upendra/Sqanti3/Exampla_data/SQANTI3_output/STAR_mapping/UHR_Rep2_chr22.R1SJ.out.tab
/scratch/project_mnt/S0030/upendra/Sqanti3/Exampla_data/SQANTI3_output/STAR_mapping/UHR_Rep1_chr22.R1SJ.out.tab
6762 junctions read. 2 junctions added to both strands because no strand information from STAR.
Running calculation of TSS ratio
BAM files identified: ['/scratch/project_mnt/S0030/upendra/Sqanti3/Exampla_data/SQANTI3_output/STAR_mapping//UHR_Rep1_chr22.R1Aligned.sortedByCoord.out.bam', '/scratch/project_mnt/S0030/upendra/Sqanti3/Exampla_data/SQANTI3_output/STAR_mapping//UHR_Rep2_chr22.R1Aligned.sortedByCoord.out.bam']
Temp files removed.
**** Performing Classification of Isoforms....
Number of classified isoforms: 3925
**** RT-switching computation....
Full-length read abundance files not provided.
**** Adding TSS ratio data... ****
**** Running Kallisto to calculate isoform expressions.
Running kallisto index /scratch/project_mnt/S0030/upendra/Sqanti3/Exampla_data/SQANTI3_output/kallisto_output/kallisto_corrected_fasta.idx using as reference /scratch/project_mnt/S0030/upendra/Sqanti3/Exampla_data/SQANTI3_output/UHR_chr22_corrected.fasta
Running Kallisto quantification for UHR_Rep1_chr22.R1 sample
Error: kallisto index file not found /scratch/project_mnt/S0030/upendra/Sqanti3/Exampla_data/SQANTI3_output/kallisto_output/kallisto_corrected_fasta.idx
Usage: kallisto quant [arguments] FASTQ-files
Required arguments:
-i, --index=STRING Filename for the kallisto index to be used for
quantification
-o, --output-dir=STRING Directory to write output to
Optional arguments:
-b, --bootstrap-samples=INT Number of bootstrap samples (default: 0)
--seed=INT Seed for the bootstrap sampling (default: 42)
--plaintext Output plaintext instead of HDF5
--single Quantify single-end reads
--single-overhang Include reads where unobserved rest of fragment is
predicted to lie outside a transcript
--fr-stranded Strand specific reads, first read forward
--rf-stranded Strand specific reads, first read reverse
-l, --fragment-length=DOUBLE Estimated average fragment length
-s, --sd=DOUBLE Estimated standard deviation of fragment length
(default: -l, -s values are estimated from paired
end data, but are required when using --single)
-t, --threads=INT Number of threads to use (default: 1)
--verbose Print out progress information every 1M proccessed reads
Running Kallisto quantification for UHR_Rep2_chr22.R1 sample
Error: kallisto index file not found /scratch/project_mnt/S0030/upendra/Sqanti3/Exampla_data/SQANTI3_output/kallisto_output/kallisto_corrected_fasta.idx
Usage: kallisto quant [arguments] FASTQ-files
Required arguments:
-i, --index=STRING Filename for the kallisto index to be used for
quantification
-o, --output-dir=STRING Directory to write output to
Optional arguments:
-b, --bootstrap-samples=INT Number of bootstrap samples (default: 0)
--seed=INT Seed for the bootstrap sampling (default: 42)
--plaintext Output plaintext instead of HDF5
--single Quantify single-end reads
--single-overhang Include reads where unobserved rest of fragment is
predicted to lie outside a transcript
--fr-stranded Strand specific reads, first read forward
--rf-stranded Strand specific reads, first read reverse
-l, --fragment-length=DOUBLE Estimated average fragment length
-s, --sd=DOUBLE Estimated standard deviation of fragment length
(default: -l, -s values are estimated from paired
end data, but are required when using --single)
-t, --threads=INT Number of threads to use (default: 1)
--verbose Print out progress information every 1M proccessed reads
Traceback (most recent call last):
File "/sw/local/rocky8/noarch/qcif/software/SQANTI3-5.2/sqanti3_qc.py", line 2542, in <module>
main()
File "/sw/local/rocky8/noarch/qcif/software/SQANTI3-5.2/sqanti3_qc.py", line 2525, in main
run(args)
File "/sw/local/rocky8/noarch/qcif/software/SQANTI3-5.2/sqanti3_qc.py", line 1978, in run
exp_dict = expression_parser(expression_files)
File "/sw/local/rocky8/noarch/qcif/software/SQANTI3-5.2/sqanti3_qc.py", line 806, in expression_parser
reader = DictReader(open(exp_file), delimiter='\t')
FileNotFoundError: [Errno 2] No such file or directory: '/scratch/project_mnt/S0030/upendra/Sqanti3/Exampla_data/SQANTI3_output/kallisto_output/UHR_Rep1_chr22.R1/abundance.tsv'
Kindly let me know what I can do to fix this issue.
Many thanks,
Upendra.
Hi @Upendra19993, that's unfortunate. It looks like your run stopped because the kallisto index needed for quantification is missing. This is the root error that makes everything downstream fail, including the final FileNotFoundError for abundance.tsv:
Error: kallisto index file not found /scratch/project_mnt/S0030/upendra/Sqanti3/Exampla_data/SQANTI3_output/kallisto_output/kallisto_corrected_fasta.idx
Do you have kallisto installed? Can you try running kallisto index manually on the corrected FASTA file (UHR_chr22_corrected.fasta)?
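One way to test the indexing step outside SQANTI3 is to run it by hand and confirm that an index file is actually written before any quantification is attempted. The sketch below is only an illustration: the function names `build_index_cmd`, `index_is_usable`, and `build_kallisto_index` are hypothetical helpers, not part of SQANTI3, and it assumes `kallisto` is on the PATH and accepts the usual `kallisto index -i <index> <fasta>` form.

```python
import os
import subprocess


def build_index_cmd(fasta_path, index_path):
    """Return the argument list for `kallisto index -i <index> <fasta>`."""
    return ["kallisto", "index", "-i", index_path, fasta_path]


def index_is_usable(index_path):
    """True only if the index file exists and is not empty."""
    return os.path.isfile(index_path) and os.path.getsize(index_path) > 0


def build_kallisto_index(fasta_path, index_path):
    """Run kallisto index and fail early if no index file was written."""
    # Make sure the output directory exists before kallisto writes to it.
    os.makedirs(os.path.dirname(os.path.abspath(index_path)), exist_ok=True)
    result = subprocess.run(
        build_index_cmd(fasta_path, index_path),
        capture_output=True,
        text=True,
    )
    # A non-zero exit code or a missing/empty file both count as failure.
    if result.returncode != 0 or not index_is_usable(index_path):
        raise RuntimeError(
            "kallisto index did not produce " + index_path + "\n" + result.stderr
        )
    return index_path
```

With the files from the log, this could be called as `build_kallisto_index("SQANTI3_output/UHR_chr22_corrected.fasta", "SQANTI3_output/kallisto_output/kallisto_corrected_fasta.idx")`. If kallisto is not installed at all, `subprocess.run` raises `FileNotFoundError`, which answers the first question directly.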
I think there is a problem with the latest version of kallisto.
I had a similar problem with kallisto v0.50.1. I downgraded with: conda install "bioconda::kallisto<0.50.1"
After that, I could build the kallisto index.
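To see whether an environment has the version reported here, the installed version can be read from `kallisto version` and compared against 0.50.1. This is a rough sketch under stated assumptions: the helper names are hypothetical, the output is assumed to contain a plain `X.Y.Z` number (for example `kallisto, version 0.48.0`), and the threshold simply mirrors the conda constraint above. Whether other releases behave the same way is not established in this thread.

```python
import re
import subprocess

# Assumption: the release reported as problematic in this thread.
FIRST_PROBLEM_VERSION = (0, 50, 1)


def parse_kallisto_version(text):
    """Extract (major, minor, patch) from text such as 'kallisto, version 0.48.0'."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError("no version number found in: " + repr(text))
    return tuple(int(part) for part in match.groups())


def reported_as_problematic(version):
    """True if the version is at or above the one reported in this thread."""
    return version >= FIRST_PROBLEM_VERSION


def installed_kallisto_version():
    """Ask the kallisto found on PATH for its version."""
    out = subprocess.run(["kallisto", "version"], capture_output=True, text=True)
    return parse_kallisto_version(out.stdout + out.stderr)
```

Calling `reported_as_problematic(installed_kallisto_version())` inside the SQANTI3 conda environment then indicates whether the downgrade is worth trying.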
Hi Carolinamonzó and eprdz,
I did have kallisto installed but still ran into this issue. I then tried a different version (kallisto 0.48.0), and it worked: I got the results without any errors. Thank you both!
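A side note on the STAR warning in the log above, which is separate from the kallisto failure: for small genomes the STAR manual recommends scaling `--genomeSAindexNbases` down to min(14, log2(GenomeLength)/2 - 1). The sketch below only reproduces that arithmetic for the chr22 example; the function name is hypothetical, and how to pass the value through SQANTI3 is not covered in this thread.

```python
import math


def recommended_sa_index_nbases(genome_length):
    """Compute min(14, log2(genome_length) / 2 - 1), rounded down."""
    return min(14, int(math.log2(genome_length) / 2 - 1))


# Genome size reported in the STAR warning for the chr22 example.
print(recommended_sa_index_nbases(50818468))
```

For the genome size in the warning (50818468) this gives 11, the same value STAR suggests.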