isoseq3 refine filters many reads
Closed this issue · 1 comments
Operating system
Which operating system and version are you using?
I am using CentOS Linux release 7.6.1810 (Core) and isoseq3 version 4.0.0.
Package name
Which package / tool is causing the problem? Which version are you using, use `tool --version`. Have you updated to the latest version `conda update package`? Have you updated the complete env by running `conda update --all`? Have you ensured that your channel priorities are set up according to the bioconda recommendations at https://bioconda.github.io/#set-up-channels?
isoseq3 --version
isoseq 4.0.0 (commit v4.0.0)
Conda environment
What is the result of `conda list`? (Try to paste that between triple backticks.)
Describe the bug
A clear and concise description of what the bug is.
While analyzing replicate 1 of the ENCODE experiment https://www.encodeproject.org/experiments/ENCSR309IKK/, I found that there were about 2 million reads before refine but only about 400,000 reads after refine. In the ENCODE database, the BAM file aligned to the genome contains 1.73 million reads. I don't know what I did wrong. Here are the commands I ran:
ccs --noPolish --minLength=10 --minPasses=1 --min-rq=0.9 --min-snr=2.5 --reportFile ccs_report.txt ENCFF028FCL.bam ENCFF028FCL_ccs.bam
lima --ccs --num-threads 12 --min-score 0 --min-end-score 0 --min-signal-increase 10 --min-score-lead 0 --same ENCFF028FCL_ccs.bam PB_adapters_same.fasta ENCFF028FCL_fl.bam
samtools view -h ENCFF028FCL_fl.bam > ENCFF028FCL_fl.sam
python flip_reads.py --f ENCFF028FCL_fl.sam --o ENCFF028FCL_flipped.sam
samtools view -bS ENCFF028FCL_flipped.sam > ENCFF028FCL_flipped.bam
isoseq3 refine --num-threads 12 --min-rq -1 ENCFF028FCL_flipped.bam CapTrap_PD_adapters.fasta ENCFF028FCL_flnc.bam
I followed the ENCODE protocol for this analysis. The adapter and barcode sequences were taken from the official PacBio website.
>primer_5p
TCGTCGGCAGCGTC
>primer_3p
GTCTCGTGGGCTCGG
PB_adapters_same.fasta:
>adapter
AAGCAGTGGTATCAACGCAGAGTAC
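As a quick sanity check on the primer files passed to lima and isoseq3 refine, the records can be parsed and inspected with a few lines of Python. This is only a sketch: `parse_fasta` is a hypothetical helper written for this report, not part of any PacBio tool, and it assumes the files use standard FASTA `>` header lines.

```python
def parse_fasta(text):
    """Return a list of (name, sequence) pairs from FASTA-formatted text."""
    records = []
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if name is not None:
                records.append((name, "".join(chunks)))
            header = line[1:].split()
            name, chunks = (header[0] if header else ""), []
        elif name is None:
            raise ValueError("sequence line found before any '>' header")
        else:
            chunks.append(line.upper())
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


# Primer records as listed in this report ('>' header markers are assumed).
primers = """>primer_5p
TCGTCGGCAGCGTC
>primer_3p
GTCTCGTGGGCTCGG
"""

for name, seq in parse_fasta(primers):
    # Flag anything that is not a plain A/C/G/T base.
    unexpected = sorted(set(seq) - set("ACGT"))
    status = "ok" if not unexpected else f"unexpected characters: {unexpected}"
    print(name, len(seq), status)
```

A file whose header lines have lost their `>` marker raises a `ValueError` here, which makes that kind of formatting problem easy to spot before running the pipeline.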
Error message
Paste the error message / stack.
isoseq3 refine log file:
{
    "_comment": "Created by pbcopper v2.3.99",
    "attributes": [
        {
            "id": "sample_name",
            "name": "Sample Name",
            "value": "MortA"
        },
        {
            "id": "num_reads_fl",
            "name": "Full-Length Reads",
            "value": 2040618
        },
        {
            "id": "num_reads_flnc",
            "name": "Full-Length Non-Chimeric Reads",
            "value": 392562
        },
        {
            "id": "num_reads_flnc_polya",
            "name": "Full-Length Non-Chimeric Reads with Poly-A Tail",
            "value": 8778
        }
    ],
    "dataset_uuids": [],
    "id": "isoseq_refine",
    "plotGroups": [],
    "tables": [],
    "title": "Iso-Seq Refine Report",
    "uuid": "eebf1a28-bdcf-4135-8d62-b471b8836cfb",
    "version": "1.0.1"
}
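For reference, the counts in this report can be turned into fractions with a short Python sketch. The function name `summarize_refine_report` is hypothetical, and the embedded JSON is a trimmed copy of the report above that keeps only the three count attributes. With these numbers, roughly 19% of the full-length reads are kept as FLNC, and roughly 2% of the FLNC reads carry a poly-A tail.

```python
import json


def summarize_refine_report(report_text):
    """Compute retention fractions from an Iso-Seq refine report (JSON text)."""
    report = json.loads(report_text)
    # Index the attribute list by its "id" field for easy lookup.
    values = {attr["id"]: attr["value"] for attr in report["attributes"]}
    fl = values["num_reads_fl"]
    flnc = values["num_reads_flnc"]
    polya = values["num_reads_flnc_polya"]
    return {
        "fl": fl,
        "flnc": flnc,
        "flnc_polya": polya,
        "flnc_fraction": flnc / fl if fl else 0.0,
        "polya_fraction_of_flnc": polya / flnc if flnc else 0.0,
    }


# Trimmed copy of the report above: only the count attributes are kept.
report_text = """
{
    "attributes": [
        {"id": "num_reads_fl", "value": 2040618},
        {"id": "num_reads_flnc", "value": 392562},
        {"id": "num_reads_flnc_polya", "value": 8778}
    ]
}
"""

summary = summarize_refine_report(report_text)
print(f"FLNC / FL:       {summary['flnc_fraction']:.1%}")
print(f"poly-A / FLNC:   {summary['polya_fraction_of_flnc']:.1%}")
```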
To Reproduce
Steps to reproduce the behavior. Providing a minimal test dataset on which we can reproduce the behavior will generally lead to quicker turnaround time!
Expected behavior
A clear and concise description of what you expected to happen.
This is not a technical issue with our bioconda binaries. Please reach out to PacBio's support.