Please download or obtain access to the following tools before assembly and analysis:
- Trimmomatic
- Unicycler
- SPAdes
- Guppy (an account is needed to download it)
- Flye
- homopolish
- Prokka
The pipeline for generation of core SNP alignment is available from rknx/prok-snptree.
The following tools and scripts were used to generate the dated phylogeny and distance matrix from raw reads:
Function | Tools/scripts |
---|---|
Quality check for raw reads | FᴀsᴛQC ⇨ src · web; MᴜʟᴛɪQC ⇨ src · ref · web |
Adapter identification and trimming | ᴄᴜᴛᴀᴅᴀᴘᴛ ⇨ src · ref; ᴛʀɪᴍ_ɢᴀʟᴏʀᴇ ⇨ src · ref · web |
Genome indexing and read alignment | ʙᴡᴀ ⇨ src · ref · ref · web |
Binary conversion and sorting | Sᴀᴍᴛᴏᴏʟs ⇨ src · ref · ref · web |
Variant calling and selection | GATK ⇨ src · ref · web |
SNP filtration and alignment | In-house code implemented in rknx/prok-snptree |
Dated phylogeny | BEAST ⇨ src · ref · web |
Core SNP matrix for PCA | ʀ/ᴀᴅᴇɢᴇɴᴇᴛ ⇨ src · ref · web |
Pairwise SNP count | FᴀsᴛᴀTᴏSNPCᴏᴜɴᴛ.sʜ ⇨ src |
#Trimmomatic performs quality control (adapter and quality trimming) of Illumina reads
java -jar <path to trimmomatic.jar> PE [-threads <threads>] [-phred33 | -phred64] [-trimlog <logFile>] <input 1> <input 2> <paired output 1> <unpaired output 1> <paired output 2> <unpaired output 2> <step 1> ...
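As a concrete illustration of the template above, the sketch below fills in the placeholders for one sample and prints the resulting command instead of running it. The `trimmomatic_cmd` helper, the `_R1`/`_R2` file naming, the thread count, and the trimming steps with these particular values are illustrative assumptions, not settings taken from this pipeline.

```shell
# Hypothetical helper: build a paired-end Trimmomatic command for one sample
# and print it (dry run). File names and trimming steps are assumptions.
trimmomatic_cmd() {
    sample="$1"                                  # sample prefix, e.g. isolate1
    jar="${TRIMMOMATIC_JAR:-trimmomatic.jar}"    # path to the jar (assumed)
    echo java -jar "$jar" PE -threads 4 -phred33 \
        "${sample}_R1.fastq.gz" "${sample}_R2.fastq.gz" \
        "${sample}_R1.paired.fastq.gz" "${sample}_R1.unpaired.fastq.gz" \
        "${sample}_R2.paired.fastq.gz" "${sample}_R2.unpaired.fastq.gz" \
        ILLUMINACLIP:TruSeq3-PE.fa:2:30:10 SLIDINGWINDOW:4:20 MINLEN:36
}

# Print the command for a made-up sample; paste it into a shell to run it.
trimmomatic_cmd isolate1
```

Printing the command first makes it easy to check the order of the four output files, which is the part of the Trimmomatic syntax that is easiest to get wrong.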
#Unicycler assembles the genome from the Trimmomatic output. Unicycler needs SPAdes to run
for i in <input_dir>*.fastq; do N=$(basename "$i" .fastq); \
unicycler -s "$i" -o <output_dir>$N \
--spades_path <PATH> --no_miniasm; done
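The loops in this document derive a sample name `N` from each input path with `basename`, which strips the directory and the given suffix. A minimal sketch of that step on its own; `sample_name` is a hypothetical helper and the path is made up:

```shell
# Hypothetical helper wrapping the basename call used in the loops.
sample_name() {
    basename "$1" .fastq     # drop the directory and the .fastq suffix
}

sample_name /data/reads/isolate1.fastq   # prints: isolate1
```

The resulting name can then be reused for per-sample output directories and file prefixes.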
#Prokka annotates genomes
for F in <input_dir>*.fasta; do N=$(basename $F .fasta); \
prokka --outdir <PATH>$N --prefix $N --genus Xanthomonas \
--centre $N --addgenes $F; done
#Guppy basecalls the raw Nanopore signal and writes fastq files
cd guppy_612/ont-guppy-cpu/
bin/guppy_basecaller \
-i <input_dir> \
-s <output_dir> \
-c dna_r9.4.1_450bps_fast.cfg \
--barcode_kits "SQK-RBK004" \
--trim_barcodes
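The Flye step takes one fastq file per genome, whereas Guppy with barcoding usually writes several fastq files per barcode. A minimal sketch of merging them, assuming the reads sit in `pass/barcodeNN/` directories under the Guppy output directory (check your own output, since the layout can differ between Guppy versions). `merge_barcodes` and its arguments are hypothetical names.

```shell
# Hypothetical helper: concatenate the fastq files of each barcode into one
# file per barcode. Assumes <guppy_out>/pass/barcodeNN/*.fastq exists.
merge_barcodes() {
    guppy_out="$1"                       # directory given to guppy_basecaller -s
    dest="$2"                            # where the merged files are written
    mkdir -p "$dest"
    for d in "$guppy_out"/pass/barcode*/; do
        [ -d "$d" ] || continue          # skip when the glob matches nothing
        bc=$(basename "$d")              # e.g. barcode01
        cat "$d"*.fastq > "$dest/$bc.fastq"
    done
}

# Example call (paths are placeholders):
# merge_barcodes <output_dir> rawreads
```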
#Flye assembles the genome from the raw Nanopore reads
source activate <env>
flye \
--nano-hq rawreads/genome1.fastq \
-g 4m \
-o <output_dir> \
-i 2 --plasmids
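After the assembly finishes, it can be useful to see which contigs Flye marked as circular (chromosome and plasmid candidates). The sketch below assumes the tab-separated `assembly_info.txt` layout described in the Flye documentation, with the contig name, length, and circular flag (`Y`/`N`) in columns 1, 2, and 4; `circular_contigs` is a hypothetical helper.

```shell
# Hypothetical helper: list name and length of contigs flagged as circular.
# Assumes column 4 of assembly_info.txt holds Y or N and that header lines
# start with '#'.
circular_contigs() {
    awk -F'\t' '!/^#/ && $4 == "Y" { print $1 "\t" $2 }' "$1"
}

# Example call (path is a placeholder):
# circular_contigs <output_dir>/assembly_info.txt
```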
#homopolish polishes the assembled genome
homopolish polish -a <input_file> -g Xanthomonas -m R9.4.pkl -d -o <output_dir>
#Kraken2 assigns taxonomic labels to the paired reads
cd /kraken2/kraken2-2.1.2/
for i in $(ls <input_dir>*fastq.gz | grep "_R1" | cut -f 1 -d "_"); do N=$(basename $i .fastq.gz); \
./kraken2 --db <kraken_db> --gzip-compressed \
--paired \
--report <output_dir>/$N.kreport \
--output <output_dir>/$N.kraken \
${i}_R1.fastq.gz ${i}_R2.fastq.gz;done
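The loop above obtains each sample prefix by cutting the `_R1` file name at the first underscore, which assumes that neither the directory path nor the sample name contains `_`. A sketch of an alternative that strips only the `_R1.fastq.gz` suffix; `read_prefix` is a hypothetical helper and the path is made up:

```shell
# Hypothetical helper: remove the trailing _R1.fastq.gz from a read file path.
read_prefix() {
    printf '%s\n' "${1%_R1.fastq.gz}"
}

f=/data/run_01/isolate1_R1.fastq.gz
echo "$f" | cut -f 1 -d "_"    # prints: /data/run
read_prefix "$f"               # prints: /data/run_01/isolate1
```

With the suffix-stripping form, `${prefix}_R1.fastq.gz` and `${prefix}_R2.fastq.gz` still point at the real files even when a directory name contains an underscore.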
#Kaiju builds its index from a protein fasta file, then classifies the paired reads
cd <PATH>/kaiju/bin
./kaiju-mkbwt -n 5 -a ACDEFGHIKLMNPQRSTVWY -o <output_dir> <fasta_path>
./kaiju-mkfmi <output_dir>
for i in $(ls <raw_data_dir>*.fastq | grep "_R1" | cut -f 1 -d "_");do N=$(basename $i .fastq);\
./kaiju -t nodes.dmp \
-f <output_dir>.fmi \
-i ${i}_R1.fastq -j ${i}_R2.fastq -o <kaiju_output_dir>$N.out -v -s 60 -e 10 -m 20;done
#convert Kaiju output files into Krona text files
for i in <kaiju_output_dir>*.out;do N=$(basename $i .out); \
./kaiju2krona -i $i -o <kaiju_output_dir>$N.krona \
-t nodes.dmp \
-n names.dmp;done
#convert Krona text files into HTML charts
cd <krona_bin_location>
for i in <kaiju_output_dir>*.krona;do N=$(basename $i .krona); \
./ktImportText -o <krona_output_dir>$N.html $i;done