ConesaLab/SQANTI3

sqanti3_rescue.py: TypeError: stat: path should be string, bytes, os.PathLike or integer, not NoneType

ChiaraCaprioli opened this issue · 35 comments

Hello,

Thank you for this great tool.
I am trying to run sqanti3_rescue.py

python $PATH_TOOLS/SQANTI3-5.2/sqanti3_rescue.py ml \
$PBS_O_WORKDIR/${sample}/isoform_annotated.filtered_MLresult_classification.txt \
--isoforms $PBS_O_WORKDIR/${sample}/isoform_annotated.filtered_corrected.fasta \
--gtf $PBS_O_WORKDIR/${sample}/isoform_annotated.filtered.filtered.gtf \
-g $PBS_O_WORKDIR/benchmarking/gtf/gencode.v45.annotation.gtf \
-k $PBS_O_WORKDIR/ref/gencode.v45.annotation_classification.txt \ 
--mode full \ 
-e all \
-o sqanti3_ml_rescue_output \
-d $PBS_O_WORKDIR/${sample} \
-r $PBS_O_WORKDIR/${sample}/randomforest.RData \
-j 0.7 

and I am encountering the following error:

Rscript (R) version 4.3.1 (2023-06-16)
0.12.7
Traceback (most recent call last):
  File "/hpcnfs/data/PGP/ccaprioli/tools/SQANTI3-5.2/sqanti3_rescue.py", line 660, in <module>
    main()
  File "/hpcnfs/data/PGP/ccaprioli/tools/SQANTI3-5.2/sqanti3_rescue.py", line 517, in main
    if not os.path.isfile(args.refGenome):
  File "/hpcnfs/home/ieo4874/.conda/envs/SQANTI3.env/lib/python3.8/genericpath.py", line 30, in isfile
    st = os.stat(path)
TypeError: stat: path should be string, bytes, os.PathLike or integer, not NoneType

Do you have any suggestion on how to solve this?
Thank you,

C

Hi,

Provide the full path to reference genome FASTA with the -f argument.

Alejandro.
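
The traceback above comes from calling os.path.isfile() on an option that was never supplied and is therefore None. A minimal sketch of a guard that names the missing option before any file check runs; the helper name `missing_path_args` and the attribute-to-flag mapping are illustrative assumptions, not SQANTI3's actual code:

```python
import argparse
import os


def missing_path_args(args, required):
    """Return a list of problems for path-like arguments.

    `required` maps an attribute name on `args` to the flag shown to the user.
    Hypothetical helper for illustration; not part of SQANTI3.
    """
    problems = []
    for attr, flag in required.items():
        value = getattr(args, attr, None)
        if value is None:
            # This is the case that makes os.path.isfile() raise TypeError.
            problems.append(f"{flag} was not provided")
        elif not os.path.isfile(value):
            problems.append(f"{flag} does not point to a file: {value}")
    return problems


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("-f", "--refGenome")
    parser.add_argument("-k", "--refClassif")
    args = parser.parse_args([])  # simulate a call with neither option given
    wanted = {"refGenome": "-f/--refGenome", "refClassif": "-k/--refClassif"}
    for problem in missing_path_args(args, wanted):
        print("ERROR:", problem)
```

Checking for None first turns the opaque "not NoneType" message into one that says which flag to add.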

Hi Alejandro,
I get the same error with the rules mode, despite giving the full path. My command is as follows:

sqanti3_rescue.py rules
--isoforms ${OUTDIR}/corrected.fasta
--gtf ${OUTDIR}/filtered/filtered.gtf
--refGTF $REF_GTF
--refGenome $REF_FA
--refClassif ${OUTDIR}/classification.txt
--mode full
-o ds
-d ${OUTDIR}/rescued
${OUTDIR}/filtered/RulesFilter_result_classification.txt

I've also run the command directly on the command line, using absolute paths, but I get the same error. Any insights into what I might be missing?

Thanks

Hi @sonalhenson,

If your error looks like this:

  File "/home/apadepe/lr_pipelines/SQANTI3/sqanti3_rescue.py", line 660, in <module>
    main()
  File "/home/apadepe/lr_pipelines/SQANTI3/sqanti3_rescue.py", line 549, in main
    if not os.path.isfile(args.json):
  File "/home/apadepe/.conda/envs/sq3/lib/python3.10/genericpath.py", line 30, in isfile
    st = os.stat(path)
TypeError: stat: path should be string, bytes, os.PathLike or integer, not NoneType

It is because you are missing the -j argument. This is the path to the rules filter in json format. If you used the default rules, you can find this file in utilities/filter/filter_default.json

Hope this fixes your problem,
Alejandro.
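
In this thread, -j carries a probability threshold (0.7) in ml mode and the path to the rules JSON in rules mode. A sketch of how the value could be sanity-checked before launching the rescue; `check_j_value` is a hypothetical helper, and the 0 to 1 range check is an assumption about what a sensible threshold looks like, not documented SQANTI3 behaviour:

```python
import json
import os


def check_j_value(mode, value):
    """Validate the -j value for the given rescue mode (illustrative only)."""
    if mode == "rules":
        if value is None:
            raise ValueError("-j (path to the rules JSON) was not provided")
        if not os.path.isfile(value):
            raise ValueError(f"rules JSON not found: {value}")
        with open(value) as handle:
            json.load(handle)  # fails early if the file is not valid JSON
        return value
    if mode == "ml":
        if value is None:
            # Whether ml mode has a default threshold is not established here.
            return None
        threshold = float(value)  # e.g. "0.7"
        if not 0.0 <= threshold <= 1.0:  # assumed range for a probability
            raise ValueError(f"threshold out of range: {threshold}")
        return threshold
    raise ValueError(f"unknown mode: {mode}")
```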

Hi @alexpan00,
That was exactly the error and your solution resolved it.

I much appreciate your very rapid assistance.

All best
Sonal

Hi @alexpan00,

I'm having the same problem:

sqanti3_rescue.py ml MLfilter_output/${SP}_MLresult_classification.txt \
   -j 0.7 --isoforms $SP.SQANTI3qc_corrected.fasta \
   --gtf MLfilter_output/$SP.filtered.gtf \
   -g $GTF \
   --mode full \
   -f $ASSEMBLY \
   -o MLrescue_output \
   -r MLfilter_output/randomforest.RData
Traceback (most recent call last):
  File "/user/work/tk19812/software/SQANTI3-5.2.1/sqanti3_rescue.py", line 660, in <module>
    main()
  File "/user/work/tk19812/software/SQANTI3-5.2.1/sqanti3_rescue.py", line 521, in main
    if not os.path.isfile(args.refClassif):
  File "/user/work/tk19812/scWorkshop/miniforge3/envs/SQANTI3.env/lib/python3.10/genericpath.py", line 30, in isfile
    st = os.stat(path)
TypeError: stat: path should be string, bytes, os.PathLike or integer, not NoneType

I don't think I have to use a filter_default.json with the ml option.
Cheers
F

hi @francicco,

you are missing the --refClassif parameter in your call to the rescue script.

Alejandro

Hi @alexpan00,

thank you! How do I generate it? sqanti3_qc.py takes the isoforms (in FASTA/FASTQ or GTF format) and the reference annotation. How do I run sqanti3_qc.py to generate the refClassif file?

Cheers
F

I tried one way... not sure if it was the best way, then I gave the classification file to sqanti3_rescue.py, and I've got this...

Rscript (R) version 4.3.1 (2023-06-16)
0.12.7
Output directory not defined. All the outputs will be stored at /user/work/tk19812/HeliconiniiProject/scRNA-IsoSeq/IsoQuant2.4.Hmel.PCGs/HmelIsoSeq/MLfilter_output directory

Automatic rescue run via the following command:

/user/work/tk19812/scWorkshop/miniforge3/envs/SQANTI3.env/bin/Rscript /user/work/tk19812/software/SQANTI3-5.2.1/utilities/rescue/automatic_rescue.R -c /user/work/tk19812/HeliconiniiProject/scRNA-IsoSeq/IsoQuant2.4.Hmel.PCGs/HmelIsoSeq/MLfilter_output/Hmel_MLresult_classification.txt -o MLrescue_output -d /user/work/tk19812/HeliconiniiProject/scRNA-IsoSeq/IsoQuant2.4.Hmel.PCGs/HmelIsoSeq/MLfilter_output -u /user/work/tk19812/software/SQANTI3-5.2.1/utilities   -g /user/work/tk19812/HeliconiniiProject/HeliconGenomeAlignmentAnnotation/UPDATEannotations/Hmel.v3.2.annotation.CAT.gtf -e all -m full

Loading required package: magrittr

---------------------------------------------------------------

		INITIATING SQANTI3 RESCUE...


---------------------------------------------------------------

	--mode full:

		Full rescue mode selected!


		Automatic rescue activated for artifact FSM transcripts.

		Additional rescue steps will be performed for ISM, NIC and NNC artifacts.


---------------------------------------------------------------

	READING FILTER CLASSIFICATION FILE...

Rows: 244753 Columns: 53
── Column specification ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Delimiter: "\t"
chr (16): isoform, chrom, strand, structural_category, associated_gene, asso...
dbl (21): length, exons, ref_length, ref_exons, diff_to_TSS, diff_to_TTS, di...
lgl (16): RTS_stage, FL, n_indels, n_indels_junc, bite, iso_exp, gene_exp, r...

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

---------------------------------------------------------------

---------------------------------------------------------------

	PERFORMING AUTOMATIC RESCUE...


---------------------------------------------------------------

	***NOTE: you have set -e all:

		All mono-exonic artifact transcripts will be considered for rescue.

	Rescuing references associated to mono-exon FSM...

	Including mono-exon ISM as rescue candidates...

	Finding FSM-supported reference transcripts lost after filtering...
Error in `dplyr::filter()`:
ℹ In argument: `isoform %in% classif_ism_fsm$isoform`.
Caused by error:
! object 'isoform' not found
Backtrace:
     ▆
  1. ├─rescue %>% ...
  2. ├─dplyr::filter(., isoform %in% classif_ism_fsm$isoform)
  3. ├─dplyr:::filter.data.frame(., isoform %in% classif_ism_fsm$isoform)
  4. │ └─dplyr:::filter_rows(.data, dots, by)
  5. │   └─dplyr:::filter_eval(...)
  6. │     ├─base::withCallingHandlers(...)
  7. │     └─mask$eval_all_filter(dots, env_filter)
  8. │       └─dplyr (local) eval()
  9. ├─isoform %in% classif_ism_fsm$isoform
 10. └─base::.handleSimpleError(...)
 11.   └─dplyr (local) h(simpleError(msg, call))
 12.     └─rlang::abort(message, class = error_class, parent = parent, call = error_call)
Execution halted
Traceback (most recent call last):
  File "/user/work/tk19812/software/SQANTI3-5.2.1/sqanti3_rescue.py", line 660, in <module>
    main()
  File "/user/work/tk19812/software/SQANTI3-5.2.1/sqanti3_rescue.py", line 557, in main
    auto_result = run_automatic_rescue(args)
  File "/user/work/tk19812/software/SQANTI3-5.2.1/sqanti3_rescue.py", line 59, in run_automatic_rescue
    if subprocess.check_call(auto_cmd, shell = True) != 0:
  File "/user/work/tk19812/scWorkshop/miniforge3/envs/SQANTI3.env/lib/python3.10/subprocess.py", line 369, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '/user/work/tk19812/scWorkshop/miniforge3/envs/SQANTI3.env/bin/Rscript /user/work/tk19812/software/SQANTI3-5.2.1/utilities/rescue/automatic_rescue.R -c /user/work/tk19812/HeliconiniiProject/scRNA-IsoSeq/IsoQuant2.4.Hmel.PCGs/HmelIsoSeq/MLfilter_output/Hmel_MLresult_classification.txt -o MLrescue_output -d /user/work/tk19812/HeliconiniiProject/scRNA-IsoSeq/IsoQuant2.4.Hmel.PCGs/HmelIsoSeq/MLfilter_output -u /user/work/tk19812/software/SQANTI3-5.2.1/utilities   -g /user/work/tk19812/HeliconiniiProject/HeliconGenomeAlignmentAnnotation/UPDATEannotations/Hmel.v3.2.annotation.CAT.gtf -e all -m full' returned non-zero exit status 1.

Not sure what happened...
Thank you for your help
Cheers
F

Hi @francicco ,

You generate the reference classification by running the sqanti3_qc script with your reference GTF as both the isoforms and the reference. The idea is that you use the same orthogonal data (if you included any) that you used when running your transcriptome.

You can find more information in this discussion and in the wiki.

Alejandro

Ok, I did right then! But I still have that error during rescue...
and I don't know why
F

I've found the bug! The classification file from sqanti3_filter.py has Isoform instead of isoform as a column name.
I edited it and now it runs.

I'll let you know if I find any other bug.

Cheers
F
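
The manual edit described above (renaming the Isoform header to isoform) can be sketched as a small script. The function name `fix_isoform_header` is hypothetical, and it assumes a tab-separated file whose first line is the header:

```python
def fix_isoform_header(src_path, dst_path):
    """Copy a tab-separated classification file, renaming an 'Isoform'
    header column to 'isoform'. Only the header line is changed."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        header = src.readline().rstrip("\n").split("\t")
        header = ["isoform" if name == "Isoform" else name for name in header]
        dst.write("\t".join(header) + "\n")
        for line in src:  # data rows are copied through untouched
            dst.write(line)
```

Writing to a new file rather than editing in place keeps the original filter output available for comparison.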

Hi @alexpan00, I had a similar problem.
I am trying to run sqanti3_rescue.py rules after symlinking (ln -s) some of the files passed as parameters:

/opt/software/SQANTI3-5.2.2/sqanti3_rescue.py rules \
/home/dell/Public/01_genome/sp/03.Iso_seq/ccs_bam/SQANTI/filter/rules/sp_RulesFilter_result_classification.txt \
--isoforms sp_corrected.fasta \
--gtf sp.rules.filtered.gtf \
-g sp_std.gtf -f spsm.fasta \
--refClassif sp_classification.txt \
--mode full \
-j /opt/software/SQANTI3-5.2.2/utilities/filter/filter_default.json \
-d rules

and I am encountering the following error:

/opt/software/SQANTI3-5.2.2/sqanti3_rescue.py:17: DeprecationWarning: Use shutil.which instead of find_executable
  Rscript_path = distutils.spawn.find_executable('Rscript')
/opt/software/SQANTI3-5.2.2/sqanti3_rescue.py:18: DeprecationWarning: Use shutil.which instead of find_executable
  gffread_path = distutils.spawn.find_executable('gffread')
/opt/software/SQANTI3-5.2.2/sqanti3_rescue.py:19: DeprecationWarning: Use shutil.which instead of find_executable
  python_path = distutils.spawn.find_executable('python')
Rscript (R) version 4.3.3 (2024-02-29)
0.12.7
Traceback (most recent call last):
  File "/opt/software/SQANTI3-5.2.2/sqanti3_rescue.py", line 660, in <module>
    main()
  File "/opt/software/SQANTI3-5.2.2/sqanti3_rescue.py", line 534, in main
    args.output=args.sqanti_filter_classif[args.sqanti_filter_classif.rfind("/")+1:args.sqanti_filter_classif("_classification.txt")]
                                                                                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: 'str' object is not callable

ChatGPT suggested changing args.sqanti_filter_classif("_classification.txt") to args.sqanti_filter_classif.find("_classification.txt").
That was right, and the problem was solved.
I just wanted to share my situation, although I don't know whether it is helpful for improving this software.

Hi @Xueliang24,

Thanks for sharing your experience and solution. It will certainly help improve the software and prevent this kind of error.

Alejandro
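
For reference, the slicing in the failing line derives a default output prefix from the classification file name. A sketch of the corrected expression next to a basename-based alternative; both function names are hypothetical, and neither is claimed to be the fix that was merged into SQANTI3:

```python
import os


def default_prefix(classif_path, suffix="_classification.txt"):
    """Same slicing idea as the line in the traceback, with .find() restored."""
    start = classif_path.rfind("/") + 1
    end = classif_path.find(suffix)
    # Note: if the suffix is absent, find() returns -1 and the slice
    # silently drops the last character instead of failing.
    return classif_path[start:end]


def default_prefix_basename(classif_path, suffix="_classification.txt"):
    """Alternative that leaves the name intact when the suffix is missing."""
    name = os.path.basename(classif_path)
    return name[: -len(suffix)] if name.endswith(suffix) else name


if __name__ == "__main__":
    path = "filter/rules/sp_RulesFilter_result_classification.txt"
    print(default_prefix(path))
    print(default_prefix_basename(path))
```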

But I ran into another problem when I ran sqanti3_rescue.py ml:

/opt/software/SQANTI3-5.2.2/sqanti3_rescue.py rules  \
/home/dell/Public/01_genome/sp/03.Iso_seq/ccs_bam/SQANTI/filter/rules/sp_RulesFilter_result_classification.txt \
--isoforms sp_corrected.fasta \
--gtf sp.rules.filtered.gtf \
-g sp_std.gtf \
-f spchrsm.fasta \
-k sp_classification.txt \
--mode full \
-j 0.7 \
-d rules

and I am encountering the following error:

 Running random forest classifier on reference transcriptome...

Error in predict.randomForest(modelFit, newdata, type = "prob") :
  missing values in newdata
Calls: predict ... probFunction -> <Anonymous> -> predict -> predict.randomForest
Execution halted
Traceback (most recent call last):
  File "/opt/software/SQANTI3-5.2.2/sqanti3_rescue.py", line 660, in <module>
    main()
  File "/opt/software/SQANTI3-5.2.2/sqanti3_rescue.py", line 582, in main
    rescued = run_ML_rescue(args)
              ^^^^^^^^^^^^^^^^^^^
  File "/opt/software/SQANTI3-5.2.2/sqanti3_rescue.py", line 304, in run_ML_rescue
    if subprocess.check_call(refML_cmd, shell = True) != 0:
       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/dell/anaconda3/envs/SQANTI3.env/lib/python3.11/subprocess.py", line 413, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '/home/dell/anaconda3/envs/SQANTI3.env/bin/Rscript /opt/software/SQANTI3-5.2.2/utilities/rescue/run_randomforest_on_reference.R -c sp_classification.txt -o sp_MLresult -d ml -r /home/dell/Public/01_genome/sp/03.Iso_seq/ccs_bam/SQANTI/filter/ml/randomforest.RData' returned non-zero exit status 1.

The sp_classification.txt file and the -j value were the same as those used for QC and the ML filter. I also checked sp_classification.txt with any() in R.
Now I want to debug the R script /opt/software/SQANTI3-5.2.2/utilities/rescue/run_randomforest_on_reference.R.
Could you give me some advice?

Sorry, I think you have pasted the command for the rules rescue. Just before the "Running random forest classifier on reference transcriptome" that you have pasted you should have a "Column-level NA check:" message. Are there any colnames after this message?

I want to tell you the debug result:

Error in `[.data.frame`(classification, , model_cols) :
  undefined columns selected
Calls: [ -> [.data.frame

Does this mean that there is a problem with the column names of the classification.txt file generated by QC? I'm using the same classification.txt file for -k in both rules and ml.

Here is the information you asked for, from running the random forest classifier:

Loading required package: magrittr

        Validating columns used in prediction...

        Column-level NA check:
               length                 exons             RTS_stage
                FALSE                 FALSE                 FALSE
       min_sample_cov               min_cov                sd_cov
                FALSE                 FALSE                 FALSE
                   FL                  bite             FSM_class
                FALSE                 FALSE                 FALSE
               coding         predicted_NMD perc_A_downstream_TTS
                FALSE                 FALSE                 FALSE
            ratio_TSS
                 TRUE

        Column type check:
               length                 exons             RTS_stage
            "integer"             "integer"             "logical"
       min_sample_cov               min_cov                sd_cov
            "integer"             "integer"             "numeric"
                   FL                  bite             FSM_class
            "integer"              "factor"              "factor"
               coding         predicted_NMD perc_A_downstream_TTS
             "factor"              "factor"             "numeric"
            ratio_TSS
            "numeric"

        Running random forest classifier on reference transcriptome...

Thanks, as you can see the ratio_TSS column has NA values. As a quick fix, my suggestion would be that you replace those NA values with 1 in the reference classification before you run the rescue script. This is what the first part of the script is supposed to do, but I am not sure why it is not working for you. If you could share the sp_classification.txt file with me, so I can easily reproduce the error that would be very helpful.
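
The quick fix suggested above (replacing NA values in ratio_TSS with 1 before running the rescue) could be sketched as follows. The helper name `fill_ratio_tss` is hypothetical, and the sketch assumes a tab-separated file with a header line in which missing values appear as NA or as empty fields:

```python
def fill_ratio_tss(src_path, dst_path, column="ratio_TSS", fill="1"):
    """Write a copy of a classification file in which 'NA' or empty values
    in `column` are replaced by `fill`. Returns the number of replacements."""
    replaced = 0
    with open(src_path) as src, open(dst_path, "w") as dst:
        header = src.readline().rstrip("\n").split("\t")
        if column not in header:
            raise KeyError(f"column not found: {column}")
        idx = header.index(column)
        dst.write("\t".join(header) + "\n")
        for line in src:
            fields = line.rstrip("\n").split("\t")
            if len(fields) > idx and fields[idx] in ("NA", ""):
                fields[idx] = fill
                replaced += 1
            dst.write("\t".join(fields) + "\n")
    return replaced
```

The returned count gives a quick check of how many reference transcripts were affected, for example to confirm they are the contig-level ones.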

Yes, some rows in the ratio_TSS column have NA values. I have shared part of my classification.txt with you. I also found that the NA values only occur on contig-level sequences, not on chromosome-level ones.
part_classification.txt

Thanks!

Hello @Xueliang24 and sorry for the delay in the answer,

First of all, the classification file provided to the SQANTI3 rescue with the -k argument should be generated by running the reference annotation against itself with the sqanti3_qc script, including the same orthogonal data (e.g. Illumina short reads, CAGE, ...). The file that you provided seems to be the one that you generated for your transcriptome against the reference, before the filtering step.

On the other hand, it is normal for the classification to have NA values in the ratio_TSS column. If the contigs are too small, there may not be enough bases before the TSS of the gene to calculate the ratio. However, the rescue script should handle the NA values; in your case, the script that is crashing is SQANTI3/utilities/rescue/run_randomforest_on_reference.R. I have run the script up to the part where the NA values are replaced and it worked for me.

Alejandro.

Thanks for your response!

I have also encountered the following problems. I have checked the code and found no problems

/xxx/SQANTI3-5.2.2/sqanti3_rescue.py:17: DeprecationWarning: Use shutil.which instead of find_executable
  Rscript_path = distutils.spawn.find_executable('Rscript')
/xxx/SQANTI3-5.2.2/sqanti3_rescue.py:18: DeprecationWarning: Use shutil.which instead of find_executable
  gffread_path = distutils.spawn.find_executable('gffread')
/xxx/SQANTI3-5.2.2/sqanti3_rescue.py:19: DeprecationWarning: Use shutil.which instead of find_executable
  python_path = distutils.spawn.find_executable('python')
Traceback (most recent call last):
  File "/xxx/SQANTI3-5.2.2/sqanti3_rescue.py", line 660, in <module>
    main()
  File "/xxx/SQANTI3-5.2.2/sqanti3_rescue.py", line 539, in main
    if not os.path.isfile(args.randomforest):
  File "/home/xxx/anaconda3/envs/SQANTI3.env/lib/python3.8/genericpath.py", line 30, in isfile
    st = os.stat(path)
TypeError: stat: path should be string, bytes, os.PathLike or integer, not NoneType

Here is my code, thank you

python /xxx/SQANTI3-5.2.2/sqanti3_rescue.py ml --isoforms /xxx/Sample_corrected.fasta --gtf /xxx/Sample.filtered.gtf --refGTF /xxx/gencode.v46.annotation.gtf --refGenome /xxx/GCA_000001405.15_GRCh38_no_alt_analysis_set.fa --refClassif /xxx/GCA_000001405.15_GRCh38_no_alt_classification.txt --output Sample --dir /xxx/rescue_ml --threshold 0.7 /xxx/Sample_MLresult_classification.txt

Hello, you may be missing the -r parameter. You should provide the randomforest.RData file produced by the sqanti3_filter ml step.

Thanks,it works!!!!

When running sqanti3_rescue.py, the -k (--refClassif) parameter requires the SQANTI3 QC classification file of the reference. Which of the following three ways of producing this file is right? The code is as follows:

according to the transcript sequence
python /PATH/SQANTI3-5.2.2/sqanti3_qc.py \
/PATH/GRCh38/genecode/gencode.v47.transcripts.fa \
/PATH/GRCh38/genecode/gencode.v46.annotation.gtf \
/PATH/GRCh38/GCA_000001405.15_GRCh38_no_alt/GCA_000001405.15_GRCh38_no_alt_analysis_set.fa \
--CAGE_peak /PATH/polyA_info/human.refTSS_v3.1.hg38.sorted.bed \
--polyA_motif_list /PATH/polyA_info/mouse_and_human.polyA_motif.txt \
-o GCA_000001405.15_GRCh38_no_alt \
-d /PATH/SQANTi3_ref/GRCh38 \
--fasta \
--force_id_ignore \
--cpus 20 --report html

according to the transcript annotation file
python /PATH/SQANTI3-5.2.2/sqanti3_qc.py \
/PATH/GRCh38/genecode/gencode.v47.transcripts.gtf \
/PATH/GRCh38/genecode/gencode.v46.annotation.gtf \
/PATH/GRCh38/GCA_000001405.15_GRCh38_no_alt/GCA_000001405.15_GRCh38_no_alt_analysis_set.fa \
--CAGE_peak /PATH/polyA_info/human.refTSS_v3.1.hg38.sorted.bed \
--polyA_motif_list /PATH/polyA_info/mouse_and_human.polyA_motif.txt \
-o GCA_000001405.15_GRCh38_no_alt \
-d /PATH/SQANTi3_ref/GRCh38 \
--fasta \
--force_id_ignore \
--cpus 20 --report html

according to the genome annotation file
python /PATH/SQANTI3-5.2.2/sqanti3_qc.py \
/PATH/GRCh38/genecode/gencode.v46.annotation.gtf \
/PATH/GRCh38/genecode/gencode.v46.annotation.gtf \
/PATH/GRCh38/GCA_000001405.15_GRCh38_no_alt/GCA_000001405.15_GRCh38_no_alt_analysis_set.fa \
--CAGE_peak /PATH/polyA_info/human.refTSS_v3.1.hg38.sorted.bed \
--polyA_motif_list /PATH/polyA_info/mouse_and_human.polyA_motif.txt \
-o GCA_000001405.15_GRCh38_no_alt \
-d /PATH/SQANTi3_ref/GRCh38 \
--fasta \
--force_id_ignore \
--cpus 20 --report html

Which one is correct?

The script needs the transcript annotation file produced from the Iso-Seq data.
After all, it targets full-length transcriptome data.

Thank you! Do you mean that each sample needs its own refClassif file? And when creating the refClassif file, is the input the sample's transcript GTF file produced by SQANTI3 QC?

If you have Iso-Seq data from many samples, you get one BAM file per sample, each corresponding to that sample's subreads BAM. These are then merged in the isoseq refine step, with code like:

#many samples
 # Combine inputs
ls UHRR.IsoSeqX*bam > all.fofn
cat all.fofn

UHRR.IsoSeqX_bc01_5p--IsoSeqX_3p.bam
UHRR.IsoSeqX_bc02_5p--IsoSeqX_3p.bam
UHRR.IsoSeqX_bc03_5p--IsoSeqX_3p.bam
UHRR.IsoSeqX_bc04_5p--IsoSeqX_3p.bam
UHRR.IsoSeqX_bc05_5p--IsoSeqX_3p.bam
UHRR.IsoSeqX_bc06_5p--IsoSeqX_3p.bam
UHRR.IsoSeqX_bc07_5p--IsoSeqX_3p.bam
UHRR.IsoSeqX_bc08_5p--IsoSeqX_3p.bam
UHRR.IsoSeqX_bc09_5p--IsoSeqX_3p.bam
UHRR.IsoSeqX_bc10_5p--IsoSeqX_3p.bam
UHRR.IsoSeqX_bc11_5p--IsoSeqX_3p.bam
UHRR.IsoSeqX_bc12_5p--IsoSeqX_3p.bam

# Remove poly(A) tails and concatemer
$ isoseq refine all.fofn IsoSeq_v2_primers_12.fasta UHRR.flnc.bam --require-polya
#--require-polya parameter depends on your sequencing method

Regarding isoseq refine: do you mean that if I sequenced different samples separately and want to understand the overall transcript characteristics of these samples, I need to combine the flnc.bam files of these samples into one file, and then run the subsequent isoseq cluster, pbmm2, isoseq collapse and SQANTI3 steps?

yes, you can get it from https://github.com/PacificBiosciences/IsoSeq/blob/master/isoseq-clustering.md

Thank you. I understand that this process looks at the global transcriptome of these samples. If I want to look at the transcript characteristics of each sample separately, should I run the Iso-Seq workflow and SQANTI3 separately for each sample? If each sample is run separately, does SQANTI3 rescue require a separate reference classification file for each sample?

Maybe.

You need to provide the annotation that you used as reference when running your long-read samples as both the isoforms and the reference. Additionally, you should provide the same orthogonal data (short reads, CAGE, polyA, ...). So the third option.

thank you. like this?

python /PATH/SQANTI3-5.2.2/sqanti3_qc.py \
/PATH/sample_A/sqanti/sample_A.GRCh38_corrected.gtf \
/PATH/GRCh38/genecode/gencode.v46.annotation.gtf \
/PATH/GRCh38/GCA_000001405.15_GRCh38_no_alt/GCA_000001405.15_GRCh38_no_alt_analysis_set.fa \
--CAGE_peak /PATH/polyA_info/human.refTSS_v3.1.hg38.sorted.bed \
--polyA_motif_list /PATH/polyA_info/mouse_and_human.polyA_motif.txt \
-o GCA_000001405.15_GRCh38_no_alt \
-d /PATH/SQANTi3_ref/GRCh38 \
--fasta \
--force_id_ignore \
--cpus 20 --report html

sample_A.GRCh38_corrected.gtf was produced by SQANTI3 QC; its input file was sample_A.collapsed.gff.

No, like what you had in the third option, with gencode.v46.annotation.gtf as both input and reference. And you shouldn't use the --fasta flag, since you are starting from a GTF file.
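
To summarise the advice in this thread, the reference classification comes from a sqanti3_qc.py call in which the reference GTF is passed twice (as isoforms and as annotation), with the same orthogonal data and without --fasta. A sketch that assembles such a command; the helper name `ref_classif_command` is hypothetical, the paths are placeholders, and only flags that already appear in this thread are used:

```python
import shlex


def ref_classif_command(sqanti3_qc, ref_gtf, ref_genome, out_prefix, out_dir, extra=()):
    """Build the sqanti3_qc.py call that classifies a reference annotation
    against itself. Illustrative helper; not part of SQANTI3."""
    cmd = ["python", sqanti3_qc, ref_gtf, ref_gtf, ref_genome,
           "-o", out_prefix, "-d", out_dir]
    cmd.extend(extra)  # same orthogonal data as used for the transcriptome
    if "--fasta" in cmd:
        raise ValueError("--fasta must not be used when the isoforms are a GTF")
    return cmd


if __name__ == "__main__":
    cmd = ref_classif_command(
        "/PATH/SQANTI3-5.2.2/sqanti3_qc.py",
        "/PATH/gencode.v46.annotation.gtf",
        "/PATH/genome.fa",  # placeholder genome path
        "gencode_v46_ref",  # placeholder output prefix
        "/PATH/SQANTI3_ref",
        extra=["--CAGE_peak", "/PATH/human.refTSS_v3.1.hg38.sorted.bed", "--cpus", "20"],
    )
    print(shlex.join(cmd))
```

As noted earlier in the thread, the file depends on the reference annotation and the orthogonal data rather than on an individual sample's transcriptome.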