Welcome to PAIso-Seq.
This analysis pipeline is implemented in Perl and R. The software versions used are listed below:
Perl v5.26.2
R 3.5.1
minimap2 2.15-r905
Library structure:
Forward style: 5'-TSO--->cDNA--->AAAAA...AAAAA->Barcode->Adapter-3'
Reverse style: 5'-Adapter-Barcode->TTTTT...TTTTT->cDNA->TSO-3'
5'-TSO Sequence: 5'-AAGCAGTGGTATCAACGCAGAGTACATGGG-3' (30 nt)
3'-Adapter Sequence: 5'-GTACTCTGCGTTGATACCACTGCTT-3' (25 nt)
Barcode Sequence: 5'-GAGTGCTACTCTAGTA-3' (16 nt)
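The two library styles can be told apart by where the barcode and adapter sit in a read. Below is a minimal sketch, not part of the pipeline, assuming the reverse style is the reverse complement of the forward style and that the barcode and adapter match exactly (real CCS reads contain errors, so a production tool would need approximate matching). The helper names reverse_complement and read_style are hypothetical.

```python
# Sequences copied from the library description above.
TSO = "AAGCAGTGGTATCAACGCAGAGTACATGGG"
ADAPTER = "GTACTCTGCGTTGATACCACTGCTT"
BARCODE = "GAGTGCTACTCTAGTA"

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def read_style(read, anchor=BARCODE + ADAPTER):
    """Hypothetical helper: classify a read by exact anchor position.

    'forward' if the read ends with Barcode+Adapter,
    'reverse' if it starts with the reverse complement of that anchor,
    'unknown' otherwise.
    """
    if read.endswith(anchor):
        return "forward"
    if read.startswith(reverse_complement(anchor)):
        return "reverse"
    return "unknown"
```

Note that reverse_complement(ADAPTER) equals the first 25 nt of the TSO, so both ends of a read carry closely related sequence; the barcode is what marks the poly(A) side.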
Generate CCS reads from the subreads:
ccs movieX.subreads.bam movieX.ccs.bam --noPolish --minPasses 1 &>ccs.log
However, we now suggest the following command:
ccs movieX.subreads.bam movieX.ccs.bam --richQVs &>ccs.log
The output file is movieX.ccs.bam
Reference: https://github.com/PacificBiosciences/IsoSeq3/blob/master/README_v3.2.md
Convert movieX.ccs.bam to fasta:
bam2fasta -u -o movieX.ccs movieX.ccs.bam
The output file is movieX.ccs.fasta
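Before trimming, it can be useful to confirm that the FASTA conversion produced the expected number of reads. The following is a minimal sketch, not part of the pipeline; parse_fasta is a hypothetical helper that works on any iterable of lines.

```python
def parse_fasta(lines):
    """Yield (name, sequence) pairs from an iterable of FASTA lines.

    The name is the header text after '>' up to the first whitespace.
    Multi-line sequences are joined into one string.
    """
    name = None
    chunks = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            header = line[1:].split()
            name = header[0] if header else ""
            chunks = []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)
```

For example, sum(1 for _ in parse_fasta(open("movieX.ccs.fasta"))) gives the number of CCS reads in the converted file.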
Extract data and trim adapters:
./scripts/trim.py movieX.ccs.fasta sample GAGTGCTACTCTAGTAGTACTCTGCGTTGATACCACTGCTT 22 2 1>sample.out.fasta 2>sample.err.fasta
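The sequence passed to trim.py is the 16 nt barcode followed directly by the 25 nt 3'-adapter (41 nt in total). The sketch below checks that composition and shows one naive way to strip such an anchor from a forward-style read. It is an illustration only: strip_3prime_anchor is a hypothetical helper, it uses exact matching, and it does not reproduce the actual algorithm or the numeric arguments of trim.py.

```python
BARCODE = "GAGTGCTACTCTAGTA"
ADAPTER = "GTACTCTGCGTTGATACCACTGCTT"
# Copied verbatim from the trim.py command above.
TRIM_ARGUMENT = "GAGTGCTACTCTAGTAGTACTCTGCGTTGATACCACTGCTT"


def strip_3prime_anchor(read, anchor=TRIM_ARGUMENT):
    """Hypothetical helper: cut a read at the last exact anchor occurrence.

    Returns the part of the read before the anchor, or None when the
    anchor is absent (such reads would need separate handling).
    """
    position = read.rfind(anchor)
    if position == -1:
        return None
    return read[:position]
```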
Create minimap2 index:
minimap2 -x splice -t 20 -d Mus_musculus.mmi Mus_musculus.GRCm38.dna.toplevel.chromosome.fa &>index.log
Collect CCS passes:
./scripts/GetCCSpass.pl movieX.ccs.bam >ccs.pass.txt
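GetCCSpass.pl is not reproduced here. In PacBio BAM files the number of passes for each CCS read is stored in the np tag, so the same information can be sketched as follows. This is an assumption-laden illustration: it reads SAM text (for example the output of samtools view movieX.ccs.bam), ccs_passes is a hypothetical helper, and the real script's output format in ccs.pass.txt may differ.

```python
def ccs_passes(sam_lines):
    """Yield (read_name, number_of_passes) from SAM-formatted lines.

    Header lines (starting with '@') and records without an np tag
    are skipped.
    """
    for line in sam_lines:
        if line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        # A SAM record has 11 mandatory fields; optional tags follow.
        for tag in fields[11:]:
            if tag.startswith("np:i:"):
                yield fields[0], int(tag[len("np:i:"):])
                break
```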
Run the pipeline:
perl run.pl --ccs sample.out.fasta --sample sampleName --species mm10 --minimap2_index Mus_musculus.mmi --minimap2_thread 10 --pass ccs.pass.txt &>run.log