sqanti3_rescue.py: TypeError: stat: path should be string, bytes, os.PathLike or integer, not NoneType

Question

ChiaraCaprioli opened this issue 10 months ago · 35 comments

Hello,

Thank you for this great tool.
I am trying to run sqanti3_rescue.py

python $PATH_TOOLS/SQANTI3-5.2/sqanti3_rescue.py ml \
$PBS_O_WORKDIR/${sample}/isoform_annotated.filtered_MLresult_classification.txt \
--isoforms $PBS_O_WORKDIR/${sample}/isoform_annotated.filtered_corrected.fasta \
--gtf $PBS_O_WORKDIR/${sample}/isoform_annotated.filtered.filtered.gtf \
-g $PBS_O_WORKDIR/benchmarking/gtf/gencode.v45.annotation.gtf \
-k $PBS_O_WORKDIR/ref/gencode.v45.annotation_classification.txt \
--mode full \
-e all \
-o sqanti3_ml_rescue_output \
-d $PBS_O_WORKDIR/${sample} \
-r $PBS_O_WORKDIR/${sample}/randomforest.RData \
-j 0.7

and I am encountering the following error:

Rscript (R) version 4.3.1 (2023-06-16)
0.12.7
Traceback (most recent call last):
  File "/hpcnfs/data/PGP/ccaprioli/tools/SQANTI3-5.2/sqanti3_rescue.py", line 660, in <module>
    main()
  File "/hpcnfs/data/PGP/ccaprioli/tools/SQANTI3-5.2/sqanti3_rescue.py", line 517, in main
    if not os.path.isfile(args.refGenome):
  File "/hpcnfs/home/ieo4874/.conda/envs/SQANTI3.env/lib/python3.8/genericpath.py", line 30, in isfile
    st = os.stat(path)
TypeError: stat: path should be string, bytes, os.PathLike or integer, not NoneType

Do you have any suggestion on how to solve this?
Thank you,

C

Answer 1 · 2024-02-23T09:02:36.000Z

Hi,

Provide the full path to reference genome FASTA with the -f argument.

Alejandro.
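
For readers who land here with the same traceback, here is a minimal sketch of why it appears. It assumes the options are declared with argparse and are optional, so an option that is not passed stays None; the parser below is a hypothetical stand-in that models only two options, not the real sqanti3_rescue.py parser. On the Python versions shown in this thread, handing that None to os.path.isfile ends in the TypeError above.

```python
import argparse
import os


def build_parser():
    # Hypothetical stand-in for the real parser: only two options are modelled.
    parser = argparse.ArgumentParser(prog="rescue_sketch")
    parser.add_argument("-f", "--refGenome")   # optional, so the default is None
    parser.add_argument("-k", "--refClassif")  # optional, so the default is None
    return parser


def describe(path):
    """Classify an option value without letting None reach os.path.isfile."""
    if path is None:
        return "not provided"
    return "found" if os.path.isfile(path) else "file does not exist"


# Simulate a call that passes -k but forgets -f.
args = build_parser().parse_args(["-k", "ref_classification.txt"])
print("refGenome:", describe(args.refGenome))
print("refClassif:", describe(args.refClassif))
```

The attribute named in the failing `os.path.isfile(args.<name>)` line of the traceback tells you which option was left out.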

Answer 2 · 2024-02-23T15:18:13.000Z

Hi Alejandro,
I get the same error with the rules mode, despite giving the full path. My command is as follows:

sqanti3_rescue.py rules \
--isoforms ${OUTDIR}/corrected.fasta \
--gtf ${OUTDIR}/filtered/filtered.gtf \
--refGTF $REF_GTF \
--refGenome $REF_FA \
--refClassif ${OUTDIR}/classification.txt \
--mode full \
-o ds \
-d ${OUTDIR}/rescued \
${OUTDIR}/filtered/RulesFilter_result_classification.txt

I've also run the command directly on the command line, using absolute paths, but I get the same error. Any insights into what I might be missing?

Thanks

Answer 3 · 2024-02-24T16:47:33.000Z

Hi @sonalhenson,

If your error looks like this:

  File "/home/apadepe/lr_pipelines/SQANTI3/sqanti3_rescue.py", line 660, in <module>
    main()
  File "/home/apadepe/lr_pipelines/SQANTI3/sqanti3_rescue.py", line 549, in main
    if not os.path.isfile(args.json):
  File "/home/apadepe/.conda/envs/sq3/lib/python3.10/genericpath.py", line 30, in isfile
    st = os.stat(path)
TypeError: stat: path should be string, bytes, os.PathLike or integer, not NoneType

It is because you are missing the -j argument, which takes the path to the rules filter in JSON format. If you used the default rules, you can find this file in utilities/filter/filter_default.json

Hope this fixes your problem,
Alejandro.

Answer 4 · 2024-02-26T15:42:49.000Z

Hi @alexpan00,
That was exactly the error and your solution resolved it.

I very much appreciate your rapid assistance.

All best
Sonal

Answer 5 · 2024-05-19T16:43:02.000Z

Hi @alexpan00,

I'm having the same problem:

sqanti3_rescue.py ml MLfilter_output/${SP}_MLresult_classification.txt \
   -j 0.7 --isoforms $SP.SQANTI3qc_corrected.fasta \
   --gtf MLfilter_output/$SP.filtered.gtf \
   -g $GTF \
   --mode full \
   -f $ASSEMBLY \
   -o MLrescue_output \
   -r MLfilter_output/randomforest.RData

Traceback (most recent call last):
  File "/user/work/tk19812/software/SQANTI3-5.2.1/sqanti3_rescue.py", line 660, in <module>
    main()
  File "/user/work/tk19812/software/SQANTI3-5.2.1/sqanti3_rescue.py", line 521, in main
    if not os.path.isfile(args.refClassif):
  File "/user/work/tk19812/scWorkshop/miniforge3/envs/SQANTI3.env/lib/python3.10/genericpath.py", line 30, in isfile
    st = os.stat(path)
TypeError: stat: path should be string, bytes, os.PathLike or integer, not NoneType

I don't think I have to use a filter_default.json with the ml option.
Cheers
F

Answer 6 · 2024-05-19T17:24:00.000Z

hi @francicco,

you are missing the --refClassif parameter in your call to the rescue script.

Alejandro

Answer 7 · 2024-05-19T19:12:06.000Z

Hi @alexpan00,

thank you! How do I generate it? sqanti3_qc.py takes the isoforms (in FASTA/FASTQ or GTF format) and the reference annotation. How do I run sqanti3_qc.py to generate the refClassif file?

Cheers
F

Answer 8 · 2024-05-19T19:36:42.000Z

I tried one way... not sure if it was the best way, then I gave the classification file to sqanti3_rescue.py, and I've got this...

Rscript (R) version 4.3.1 (2023-06-16)
0.12.7
Output directory not defined. All the outputs will be stored at /user/work/tk19812/HeliconiniiProject/scRNA-IsoSeq/IsoQuant2.4.Hmel.PCGs/HmelIsoSeq/MLfilter_output directory

Automatic rescue run via the following command:

/user/work/tk19812/scWorkshop/miniforge3/envs/SQANTI3.env/bin/Rscript /user/work/tk19812/software/SQANTI3-5.2.1/utilities/rescue/automatic_rescue.R -c /user/work/tk19812/HeliconiniiProject/scRNA-IsoSeq/IsoQuant2.4.Hmel.PCGs/HmelIsoSeq/MLfilter_output/Hmel_MLresult_classification.txt -o MLrescue_output -d /user/work/tk19812/HeliconiniiProject/scRNA-IsoSeq/IsoQuant2.4.Hmel.PCGs/HmelIsoSeq/MLfilter_output -u /user/work/tk19812/software/SQANTI3-5.2.1/utilities   -g /user/work/tk19812/HeliconiniiProject/HeliconGenomeAlignmentAnnotation/UPDATEannotations/Hmel.v3.2.annotation.CAT.gtf -e all -m full

Loading required package: magrittr

---------------------------------------------------------------

		INITIATING SQANTI3 RESCUE...


---------------------------------------------------------------

	--mode full:

		Full rescue mode selected!


		Automatic rescue activated for artifact FSM transcripts.

		Additional rescue steps will be performed for ISM, NIC and NNC artifacts.


---------------------------------------------------------------

	READING FILTER CLASSIFICATION FILE...

Rows: 244753 Columns: 53
── Column specification ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Delimiter: "\t"
chr (16): isoform, chrom, strand, structural_category, associated_gene, asso...
dbl (21): length, exons, ref_length, ref_exons, diff_to_TSS, diff_to_TTS, di...
lgl (16): RTS_stage, FL, n_indels, n_indels_junc, bite, iso_exp, gene_exp, r...

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

---------------------------------------------------------------

---------------------------------------------------------------

	PERFORMING AUTOMATIC RESCUE...


---------------------------------------------------------------

	***NOTE: you have set -e all:

		All mono-exonic artifact transcripts will be considered for rescue.

	Rescuing references associated to mono-exon FSM...

	Including mono-exon ISM as rescue candidates...

	Finding FSM-supported reference transcripts lost after filtering...
Error in `dplyr::filter()`:
ℹ In argument: `isoform %in% classif_ism_fsm$isoform`.
Caused by error:
! object 'isoform' not found
Backtrace:
     ▆
  1. ├─rescue %>% ...
  2. ├─dplyr::filter(., isoform %in% classif_ism_fsm$isoform)
  3. ├─dplyr:::filter.data.frame(., isoform %in% classif_ism_fsm$isoform)
  4. │ └─dplyr:::filter_rows(.data, dots, by)
  5. │   └─dplyr:::filter_eval(...)
  6. │     ├─base::withCallingHandlers(...)
  7. │     └─mask$eval_all_filter(dots, env_filter)
  8. │       └─dplyr (local) eval()
  9. ├─isoform %in% classif_ism_fsm$isoform
 10. └─base::.handleSimpleError(...)
 11.   └─dplyr (local) h(simpleError(msg, call))
 12.     └─rlang::abort(message, class = error_class, parent = parent, call = error_call)
Execution halted
Traceback (most recent call last):
  File "/user/work/tk19812/software/SQANTI3-5.2.1/sqanti3_rescue.py", line 660, in <module>
    main()
  File "/user/work/tk19812/software/SQANTI3-5.2.1/sqanti3_rescue.py", line 557, in main
    auto_result = run_automatic_rescue(args)
  File "/user/work/tk19812/software/SQANTI3-5.2.1/sqanti3_rescue.py", line 59, in run_automatic_rescue
    if subprocess.check_call(auto_cmd, shell = True) != 0:
  File "/user/work/tk19812/scWorkshop/miniforge3/envs/SQANTI3.env/lib/python3.10/subprocess.py", line 369, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '/user/work/tk19812/scWorkshop/miniforge3/envs/SQANTI3.env/bin/Rscript /user/work/tk19812/software/SQANTI3-5.2.1/utilities/rescue/automatic_rescue.R -c /user/work/tk19812/HeliconiniiProject/scRNA-IsoSeq/IsoQuant2.4.Hmel.PCGs/HmelIsoSeq/MLfilter_output/Hmel_MLresult_classification.txt -o MLrescue_output -d /user/work/tk19812/HeliconiniiProject/scRNA-IsoSeq/IsoQuant2.4.Hmel.PCGs/HmelIsoSeq/MLfilter_output -u /user/work/tk19812/software/SQANTI3-5.2.1/utilities   -g /user/work/tk19812/HeliconiniiProject/HeliconGenomeAlignmentAnnotation/UPDATEannotations/Hmel.v3.2.annotation.CAT.gtf -e all -m full' returned non-zero exit status 1.

Not sure what happened...
Thank you for your help
Cheers
F

Answer 9 · 2024-05-20T08:18:35.000Z

Hi @francicco ,

You generate the reference classification by running the sqanti3_qc script with your reference GTF as both the isoforms and the reference. The idea is that you use the same orthogonal data (if you included any) that you used when running QC on your transcriptome.

You can find more information in this discussion and in the wiki.

Alejandro

Answer 10 · 2024-05-20T08:26:44.000Z

OK, I did it right then! But I still get that error during rescue...
and I don't know why.
F

Answer 11 · 2024-05-20T15:39:08.000Z

I've found the bug! The classification file from sqanti3_filter.py has the column name Isoform instead of isoform.
I edited it and now it runs.

I'll let you know if I find any other bug.

Cheers
F
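
The manual edit described above can be sketched as a small script. This is only an illustration of the workaround reported in this thread, assuming the file is tab-separated and that the only problem is the capitalised `Isoform` header; the function name and the demo file names are hypothetical.

```python
import os
import tempfile


def normalize_isoform_header(in_path, out_path):
    """Copy a tab-separated file, renaming a header column 'Isoform' to 'isoform'.

    Returns True if the header was changed.
    """
    with open(in_path) as src, open(out_path, "w") as dst:
        header = src.readline().rstrip("\n").split("\t")
        renamed = ["isoform" if name == "Isoform" else name for name in header]
        dst.write("\t".join(renamed) + "\n")
        for line in src:  # data rows are copied unchanged
            dst.write(line)
    return renamed != header


# Tiny demonstration on a made-up table (not real SQANTI3 output).
workdir = tempfile.mkdtemp()
original = os.path.join(workdir, "demo_MLresult_classification.txt")
fixed = os.path.join(workdir, "demo_MLresult_classification.fixed.txt")
with open(original, "w") as handle:
    handle.write("Isoform\tchrom\tstructural_category\n")
    handle.write("PB.1.1\tchr1\tfull-splice_match\n")

changed = normalize_isoform_header(original, fixed)
with open(fixed) as handle:
    fixed_header = handle.readline().rstrip("\n")
print(changed, fixed_header)
```

Writing to a new file rather than editing in place keeps the original filter output available for comparison.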

Answer 12 · 2024-10-16T07:28:18.000Z

Hi @alexpan00, I have a similar question.
I am trying to run sqanti3_rescue.py rules after symlinking (ln -s) some of the files passed as parameters:

/opt/software/SQANTI3-5.2.2/sqanti3_rescue.py rules \
/home/dell/Public/01_genome/sp/03.Iso_seq/ccs_bam/SQANTI/filter/rules/sp_RulesFilter_result_classification.txt \
--isoforms sp_corrected.fasta \
--gtf sp.rules.filtered.gtf \
-g sp_std.gtf -f spsm.fasta \
--refClassif sp_classification.txt \
--mode full \
-j /opt/software/SQANTI3-5.2.2/utilities/filter/filter_default.json \
-d rules

and I am encountering the following error:

/opt/software/SQANTI3-5.2.2/sqanti3_rescue.py:17: DeprecationWarning: Use shutil.which instead of find_executable
  Rscript_path = distutils.spawn.find_executable('Rscript')
/opt/software/SQANTI3-5.2.2/sqanti3_rescue.py:18: DeprecationWarning: Use shutil.which instead of find_executable
  gffread_path = distutils.spawn.find_executable('gffread')
/opt/software/SQANTI3-5.2.2/sqanti3_rescue.py:19: DeprecationWarning: Use shutil.which instead of find_executable
  python_path = distutils.spawn.find_executable('python')
Rscript (R) version 4.3.3 (2024-02-29)
0.12.7
Traceback (most recent call last):
  File "/opt/software/SQANTI3-5.2.2/sqanti3_rescue.py", line 660, in <module>
    main()
  File "/opt/software/SQANTI3-5.2.2/sqanti3_rescue.py", line 534, in main
    args.output=args.sqanti_filter_classif[args.sqanti_filter_classif.rfind("/")+1:args.sqanti_filter_classif("_classification.txt")]
                                                                                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: 'str' object is not callable

ChatGPT suggested changing the second args.sqanti_filter_classif(...) on that line to args.sqanti_filter_classif.find(...).
That was right, and the problem was solved.
I just wanted to share my situation, although I don't know whether it is helpful for improving the software.
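
A minimal sketch of the corrected expression, mirroring the slice from line 534 of the traceback with `.find(...)` in place of the call on the string itself. The function name is hypothetical and this is not necessarily how the fix looks upstream.

```python
def output_prefix(classif_path):
    """Return the file name up to, but not including, '_classification.txt'."""
    start = classif_path.rfind("/") + 1              # first character after the last slash
    end = classif_path.find("_classification.txt")  # .find(...) instead of calling the string
    return classif_path[start:end]


print(output_prefix("filter/rules/sp_RulesFilter_result_classification.txt"))
```

Note that `str.find` returns -1 when the suffix is absent, in which case the slice silently drops the last character, so this expression only behaves well for paths that end in `_classification.txt`. Passing -o explicitly should sidestep the default-prefix code path, as the earlier commands in this thread that set -o did not report this error.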

Answer 13 · 2024-10-16T08:10:52.000Z

Hi @Xueliang24,

Thanks for sharing your experience and solution. It will certainly help improve the software and prevent this kind of error.

Alejandro

Answer 14 · 2024-10-16T08:28:02.000Z

But I ran into another problem when I ran sqanti3_rescue.py ml

/opt/software/SQANTI3-5.2.2/sqanti3_rescue.py rules  \
/home/dell/Public/01_genome/sp/03.Iso_seq/ccs_bam/SQANTI/filter/rules/sp_RulesFilter_result_classification.txt \
--isoforms sp_corrected.fasta \
--gtf sp.rules.filtered.gtf \
-g sp_std.gtf \
-f spchrsm.fasta \
-k sp_classification.txt \
--mode full \
-j 0.7 \
-d rules

and I am encountering the following error:

 Running random forest classifier on reference transcriptome...

Error in predict.randomForest(modelFit, newdata, type = "prob") :
  missing values in newdata
Calls: predict ... probFunction -> <Anonymous> -> predict -> predict.randomForest
Execution halted
Traceback (most recent call last):
  File "/opt/software/SQANTI3-5.2.2/sqanti3_rescue.py", line 660, in <module>
    main()
  File "/opt/software/SQANTI3-5.2.2/sqanti3_rescue.py", line 582, in main
    rescued = run_ML_rescue(args)
              ^^^^^^^^^^^^^^^^^^^
  File "/opt/software/SQANTI3-5.2.2/sqanti3_rescue.py", line 304, in run_ML_rescue
    if subprocess.check_call(refML_cmd, shell = True) != 0:
       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/dell/anaconda3/envs/SQANTI3.env/lib/python3.11/subprocess.py", line 413, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '/home/dell/anaconda3/envs/SQANTI3.env/bin/Rscript /opt/software/SQANTI3-5.2.2/utilities/rescue/run_randomforest_on_reference.R -c sp_classification.txt -o sp_MLresult -d ml -r /home/dell/Public/01_genome/sp/03.Iso_seq/ccs_bam/SQANTI/filter/ml/randomforest.RData' returned non-zero exit status 1.

The sp_classification.txt file and the -j value were the same ones I used for QC and the ML filter. I also checked sp_classification.txt with any() in R.
I would now like to debug the R script /opt/software/SQANTI3-5.2.2/utilities/rescue/run_randomforest_on_reference.R.
Could you give me some advice?

Answer 15 · 2024-10-16T08:51:09.000Z

Sorry, I think you have pasted the command for the rules rescue. Just before the "Running random forest classifier on reference transcriptome" that you have pasted you should have a "Column-level NA check:" message. Are there any colnames after this message?

Answer 16 · 2024-10-16T09:01:45.000Z

I want to tell you the debug result:

Error in `[.data.frame`(classification, , model_cols) :
  undefined columns selected
Calls: [ -> [.data.frame

Does this mean that there is a problem with the column names of the classification.txt file generated by QC? I'm using the same classification.txt file for -k in both rules and ml.

Answer 17 · 2024-10-16T09:07:34.000Z

Here is the information you asked for, printed before the "Running random forest classifier" message:

Loading required package: magrittr

        Validating columns used in prediction...

        Column-level NA check:
               length                 exons             RTS_stage
                FALSE                 FALSE                 FALSE
       min_sample_cov               min_cov                sd_cov
                FALSE                 FALSE                 FALSE
                   FL                  bite             FSM_class
                FALSE                 FALSE                 FALSE
               coding         predicted_NMD perc_A_downstream_TTS
                FALSE                 FALSE                 FALSE
            ratio_TSS
                 TRUE

        Column type check:
               length                 exons             RTS_stage
            "integer"             "integer"             "logical"
       min_sample_cov               min_cov                sd_cov
            "integer"             "integer"             "numeric"
                   FL                  bite             FSM_class
            "integer"              "factor"              "factor"
               coding         predicted_NMD perc_A_downstream_TTS
             "factor"              "factor"             "numeric"
            ratio_TSS
            "numeric"

        Running random forest classifier on reference transcriptome...

Answer 18 · 2024-10-16T09:19:44.000Z

Thanks, as you can see the ratio_TSS column has NA values. As a quick fix, my suggestion would be that you replace those NA values with 1 in the reference classification before you run the rescue script. This is what the first part of the script is supposed to do, but I am not sure why it is not working for you. If you could share the sp_classification.txt file with me, so I can easily reproduce the error that would be very helpful.
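
The quick fix suggested above could be sketched as follows. It assumes a tab-separated reference classification in which missing values are written as `NA` (or left empty) and the column is named `ratio_TSS`, as in the log; the function name and the demo rows are made up for illustration.

```python
import os
import tempfile


def fill_missing_ratio_tss(in_path, out_path, column="ratio_TSS", fill="1"):
    """Copy a tab-separated classification file, replacing NA or empty values
    in one column. Returns the number of values replaced."""
    replaced = 0
    with open(in_path) as src, open(out_path, "w") as dst:
        header = src.readline().rstrip("\n").split("\t")
        idx = header.index(column)  # raises ValueError if the column is absent
        dst.write("\t".join(header) + "\n")
        for line in src:
            fields = line.rstrip("\n").split("\t")
            if fields[idx] in ("NA", ""):
                fields[idx] = fill
                replaced += 1
            dst.write("\t".join(fields) + "\n")
    return replaced


# Made-up three-row table for illustration only.
workdir = tempfile.mkdtemp()
ref_in = os.path.join(workdir, "ref_classification.txt")
ref_out = os.path.join(workdir, "ref_classification.noNA.txt")
with open(ref_in, "w") as handle:
    handle.write("isoform\tchrom\tratio_TSS\n")
    handle.write("tx1\tchr1\t2.5\n")
    handle.write("tx2\tcontig_7\tNA\n")
    handle.write("tx3\tcontig_9\tNA\n")

n_replaced = fill_missing_ratio_tss(ref_in, ref_out)
with open(ref_out) as handle:
    rows = [line.rstrip("\n").split("\t") for line in handle]
print(n_replaced, [row[2] for row in rows[1:]])
```

The patched copy would then be the file passed with -k, leaving the original QC output untouched.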

Answer 19 · 2024-10-16T09:55:38.000Z

Yes, some rows in the ratio_TSS column have NA values. I have shared part of my classification.txt with you. I also found that the NA values only exist at the contig level, not at the chromosome level.
part_classification.txt

Thanks!

Answer 20 · 2024-10-21T08:43:14.000Z

Hello @Xueliang24 and sorry for the delay in the answer,

First of all, the classification file provided to the SQANTI3 rescue with the -k argument should be generated by running the reference annotation against itself with the sqanti3_qc script, including the same orthogonal data (Illumina short reads, CAGE, etc.). The file that you provided seems to be the one that you generated for your transcriptome against the reference, before the filtering step.

On the other hand, it is normal for the classification to have NA values in the ratio_TSS column. If the contigs are too small, there may not be enough bases before the TSS of the gene to calculate the ratio. However, the rescue script should handle the NA values. In your case, the script that is crashing is SQANTI3/utilities/rescue/run_randomforest_on_reference.R. I have run that script up to the part where the NA values are replaced, and it worked for me.

Alejandro.

Answer 21 · 2024-10-21T11:27:39.000Z

Thanks for your response!

Answer 22 · 2024-10-24T19:09:52.000Z

I have also encountered the following problems. I have checked the code and found no problems

/xxx/SQANTI3-5.2.2/sqanti3_rescue.py:17: DeprecationWarning: Use shutil.which instead of find_executable
  Rscript_path = distutils.spawn.find_executable('Rscript')
/xxx/SQANTI3-5.2.2/sqanti3_rescue.py:18: DeprecationWarning: Use shutil.which instead of find_executable
  gffread_path = distutils.spawn.find_executable('gffread')
/xxx/SQANTI3-5.2.2/sqanti3_rescue.py:19: DeprecationWarning: Use shutil.which instead of find_executable
  python_path = distutils.spawn.find_executable('python')
Traceback (most recent call last):
  File "/xxx/SQANTI3-5.2.2/sqanti3_rescue.py", line 660, in <module>
    main()
  File "/xxx/SQANTI3-5.2.2/sqanti3_rescue.py", line 539, in main
    if not os.path.isfile(args.randomforest):
  File "/home/xxx/anaconda3/envs/SQANTI3.env/lib/python3.8/genericpath.py", line 30, in isfile
    st = os.stat(path)
TypeError: stat: path should be string, bytes, os.PathLike or integer, not NoneType

Here is my code, thank you

python /xxx/SQANTI3-5.2.2/sqanti3_rescue.py ml \
--isoforms /xxx/Sample_corrected.fasta \
--gtf /xxx/Sample.filtered.gtf \
--refGTF /xxx/gencode.v46.annotation.gtf \
--refGenome /xxx/GCA_000001405.15_GRCh38_no_alt_analysis_set.fa \
--refClassif /xxx/GCA_000001405.15_GRCh38_no_alt_classification.txt \
--output Sample \
--dir /xxx/rescue_ml \
--threshold 0.7 \
/xxx/Sample_MLresult_classification.txt

Answer 23 · 2024-10-25T02:48:52.000Z

Hello, you may be missing the -r parameter. You should provide the randomforest.RData file produced by the sqanti3_filter ml step.
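
Since several posts in this thread boil down to one missing option, a pre-flight check along these lines can report every gap at once. This is a sketch only: the mode-to-option table is an assumption pieced together from the tracebacks in this thread (`args.refGenome`, `args.refClassif`, `args.json`, `args.randomforest`), not a statement of what the script officially requires, and the helper name is hypothetical.

```python
import argparse

# Assumed mapping, inferred from the tracebacks in this thread.
REQUIRED_BY_MODE = {
    "ml": ["refGenome", "refClassif", "randomforest"],
    "rules": ["refGenome", "refClassif", "json"],
}


def missing_options(mode, args):
    """Return the names of required options that are absent or still None."""
    return [name for name in REQUIRED_BY_MODE[mode]
            if getattr(args, name, None) is None]


# Simulated arguments for an ml run that forgot -r.
args = argparse.Namespace(
    refGenome="genome.fa",
    refClassif="ref_classification.txt",
    randomforest=None,
)
print(missing_options("ml", args))
```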

Answer 24 · 2024-10-27T06:11:15.000Z

Thanks, it works!

Answer 25 · 2024-11-14T02:43:21.000Z

When running SQANTI3 rescue, the -k (--refClassif) parameter requires the SQANTI3 QC result file for the reference. Which of the following three ways of producing this file is right? The code is as follows:

Based on the transcript sequences:
python /PATH/SQANTI3-5.2.2/sqanti3_qc.py \
/PATH/GRCh38/genecode/gencode.v47.transcripts.fa \
/PATH/GRCh38/genecode/gencode.v46.annotation.gtf \
/PATH/GRCh38/GCA_000001405.15_GRCh38_no_alt/GCA_000001405.15_GRCh38_no_alt_analysis_set.fa \
--CAGE_peak /PATH/polyA_info/human.refTSS_v3.1.hg38.sorted.bed \
--polyA_motif_list /PATH/polyA_info/mouse_and_human.polyA_motif.txt \
-o GCA_000001405.15_GRCh38_no_alt \
-d /PATH/SQANTi3_ref/GRCh38 \
--fasta \
--force_id_ignore \
--cpus 20 --report html

Based on the transcript annotation file:
python /PATH/SQANTI3-5.2.2/sqanti3_qc.py \
/PATH/GRCh38/genecode/gencode.v47.transcripts.gtf \
/PATH/GRCh38/genecode/gencode.v46.annotation.gtf \
/PATH/GRCh38/GCA_000001405.15_GRCh38_no_alt/GCA_000001405.15_GRCh38_no_alt_analysis_set.fa \
--CAGE_peak /PATH/polyA_info/human.refTSS_v3.1.hg38.sorted.bed \
--polyA_motif_list /PATH/polyA_info/mouse_and_human.polyA_motif.txt \
-o GCA_000001405.15_GRCh38_no_alt \
-d /PATH/SQANTi3_ref/GRCh38 \
--fasta \
--force_id_ignore \
--cpus 20 --report html

Based on the genome annotation file:
python /PATH/SQANTI3-5.2.2/sqanti3_qc.py \
/PATH/GRCh38/genecode/gencode.v46.annotation.gtf \
/PATH/GRCh38/genecode/gencode.v46.annotation.gtf \
/PATH/GRCh38/GCA_000001405.15_GRCh38_no_alt/GCA_000001405.15_GRCh38_no_alt_analysis_set.fa \
--CAGE_peak /PATH/polyA_info/human.refTSS_v3.1.hg38.sorted.bed \
--polyA_motif_list /PATH/polyA_info/mouse_and_human.polyA_motif.txt \
-o GCA_000001405.15_GRCh38_no_alt \
-d /PATH/SQANTi3_ref/GRCh38 \
--fasta \
--force_id_ignore \
--cpus 20 --report html

Which one is correct?

Answer 26 · 2024-11-14T04:15:18.000Z

The script needs the transcript annotation file produced from the Iso-Seq data.
After all, it targets full-length transcriptome data.

Answer 27 · 2024-11-14T05:55:24.000Z

Thank you! Do you mean that each sample needs its own refClassif file? When creating the refClassif file, is the input the sample's transcript GTF file produced by SQANTI3 QC?

Answer 28 · 2024-11-14T06:16:17.000Z

If you have Iso-Seq data from many samples, you get many BAM files, one for each sample's subreads BAM. These are then merged at the isoseq refine step, with code like this:

#many samples
 # Combine inputs
ls UHRR.IsoSeqX*bam > all.fofn
cat all.fofn

UHRR.IsoSeqX_bc01_5p--IsoSeqX_3p.bam
UHRR.IsoSeqX_bc02_5p--IsoSeqX_3p.bam
UHRR.IsoSeqX_bc03_5p--IsoSeqX_3p.bam
UHRR.IsoSeqX_bc04_5p--IsoSeqX_3p.bam
UHRR.IsoSeqX_bc05_5p--IsoSeqX_3p.bam
UHRR.IsoSeqX_bc06_5p--IsoSeqX_3p.bam
UHRR.IsoSeqX_bc07_5p--IsoSeqX_3p.bam
UHRR.IsoSeqX_bc08_5p--IsoSeqX_3p.bam
UHRR.IsoSeqX_bc09_5p--IsoSeqX_3p.bam
UHRR.IsoSeqX_bc10_5p--IsoSeqX_3p.bam
UHRR.IsoSeqX_bc11_5p--IsoSeqX_3p.bam
UHRR.IsoSeqX_bc12_5p--IsoSeqX_3p.bam

# Remove poly(A) tails and concatemer
$ isoseq refine all.fofn IsoSeq_v2_primers_12.fasta UHRR.flnc.bam --require-polya
#--require-polya parameter depends on your sequencing method

Answer 29 · 2024-11-14T08:24:31.000Z

isoseq refine
Do you mean that if I test different samples separately and want to understand the overall transcript characteristics of these samples, I need to combine the flnc.bam files of these samples into one file, and then run isoseq cluster, pbmm2, isoseq collapse and SQANTI3?

Answer 30 · 2024-11-14T08:51:27.000Z

yes, you can get it from https://github.com/PacificBiosciences/IsoSeq/blob/master/isoseq-clustering.md

Answer 31 · 2024-11-14T09:01:29.000Z

Thank you. I understand that this process looks at the global transcriptome of these samples. If I want to look at the transcript characteristics of each sample separately, should I run the Iso-Seq workflow and SQANTI3 separately for each sample? And if each sample is run separately, does SQANTI3 rescue need a separate refClassif file for each sample?

Answer 32 · 2024-11-14T10:08:10.000Z

Maybe.

Answer 33 · 2024-11-14T10:23:17.000Z

You need to provide, both as isoforms and as reference, the annotation that you used as the reference when running your long-read samples. Additionally, you should provide the same orthogonal data (short reads, CAGE, polyA, etc.). So the third option is the right one.

Answer 34 · 2024-11-14T12:18:43.000Z

Thank you. Like this?

python /PATH/SQANTI3-5.2.2/sqanti3_qc.py \
/PATH/sample_A/sqanti/sample_A.GRCh38_corrected.gtf \
/PATH/GRCh38/genecode/gencode.v46.annotation.gtf \
/PATH/GRCh38/GCA_000001405.15_GRCh38_no_alt/GCA_000001405.15_GRCh38_no_alt_analysis_set.fa \
--CAGE_peak /PATH/polyA_info/human.refTSS_v3.1.hg38.sorted.bed \
--polyA_motif_list /PATH/polyA_info/mouse_and_human.polyA_motif.txt \
-o GCA_000001405.15_GRCh38_no_alt \
-d /PATH/SQANTi3_ref/GRCh38 \
--fasta \
--force_id_ignore \
--cpus 20 --report html

sample_A.GRCh38_corrected.gtf is produced by the SQANTI3 QC step; its input file is sample_A.collapsed.gff

Answer 35 · 2024-11-15T09:43:57.000Z

No, like what you had in the third option, with gencode.v46.annotation.gtf as both input and reference. And you shouldn't use the --fasta flag since you are starting from a GTF file.
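
To make that concrete, here is a small sketch that assembles such a call: the same reference GTF in both the isoforms and annotation positions, the orthogonal-data flags carried over from the commands quoted earlier in the thread, and no --fasta. The helper name and the output prefix are hypothetical, and the `/PATH/...` placeholders are kept exactly as posted.

```python
def reference_qc_command(qc_script, ref_gtf, genome_fasta, prefix, out_dir, orthogonal=()):
    """Build the argument list for a reference-against-itself sqanti3_qc.py run."""
    # The reference GTF appears twice: once as the isoforms, once as the annotation.
    cmd = ["python", qc_script, ref_gtf, ref_gtf, genome_fasta]
    cmd.extend(orthogonal)  # same orthogonal data as used for the sample run
    cmd.extend(["-o", prefix, "-d", out_dir])
    return cmd


cmd = reference_qc_command(
    "/PATH/SQANTI3-5.2.2/sqanti3_qc.py",
    "/PATH/GRCh38/genecode/gencode.v46.annotation.gtf",
    "/PATH/GRCh38/GCA_000001405.15_GRCh38_no_alt/GCA_000001405.15_GRCh38_no_alt_analysis_set.fa",
    prefix="gencode.v46.reference",  # hypothetical output prefix
    out_dir="/PATH/SQANTi3_ref/GRCh38",
    orthogonal=[
        "--CAGE_peak", "/PATH/polyA_info/human.refTSS_v3.1.hg38.sorted.bed",
        "--polyA_motif_list", "/PATH/polyA_info/mouse_and_human.polyA_motif.txt",
    ],
)
print(" ".join(cmd))
```

The printed command can be pasted into a shell, or the list can be handed to `subprocess.run(cmd, check=True)`. The classification file this run writes is the one to pass to the rescue script with -k.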