HT-16srRNA.GABA is a bioinformatics pipeline for high-throughput identification of microbial strains from 16S rRNA gene sequencing data. The pipeline covers barcode demultiplexing, taxonomic identification, and abundance profiling.
# create an environment with conda
conda create --name emu
# install emu, seqkit, blast and samtools into the emu environment
conda install --name emu -c bioconda emu seqkit blast samtools
git clone https://github.com/DingDingFan/HT-16srRNA.GABA.git
cd HT-16srRNA.GABA
conda activate emu
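Before running the steps that follow, it can help to confirm that the tools installed above are visible in the activated environment. The helper below is a minimal sketch, not part of the repository; `missing_tools` is a hypothetical name, and the binary names `emu`, `seqkit`, `blastn` and `samtools` are assumed to be what the bioconda packages provide.

```shell
# Hypothetical helper: print the name of every requested tool that is not on PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# An empty result means all four tools were found.
missing_tools emu seqkit blastn samtools
```

If any name is printed, re-check that the `emu` environment is active and that the install step targeted it.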
seqkit fq2fa pass.fq.gz > pass.fq.fasta
samtools faidx pass.fq.fasta
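The `.fai` index written by `samtools faidx` is a tab-separated table whose first two columns are the sequence name and its length. As a rough sanity check before demultiplexing, the read count and mean read length can be summarised from it. This is only a sketch: `fai_summary` is a hypothetical helper, and the demo index below contains made-up values rather than real data.

```shell
# Hypothetical helper: summarise read count and mean length from a .fai index
# (column 1 = read name, column 2 = read length).
fai_summary() {
  awk -F'\t' '{ n++; total += $2 }
    END { if (n) printf "%d reads, mean length %.1f\n", n, total / n }' "$1"
}

# Made-up demo index with three reads; on real data use pass.fq.fasta.fai.
demo_fai=$(mktemp)
printf 'read1\t1500\t7\t1500\t1501\nread2\t1400\t1515\t1400\t1401\nread3\t1600\t2923\t1600\t1601\n' > "$demo_fai"

fai_summary "$demo_fai"   # prints: 3 reads, mean length 1500.0
```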
sh script/blastn.single.sh pass.fq.fasta script/50X50barcode.fasta aln # the database is custom
perl script/deal.pfa.pl pass.fq.fasta.fai aln.blast > aln.blast.deplex
perl script/add.sample.pl dual_barcode_pair.txt aln.blast.deplex > aln.blast.demultiplex.add.sample.xls
# "dual_barcode_pair.txt" is a tabular file that maps each sample to its corresponding barcode pair
# write one command per sample that extracts the sequences of the first 1000 read IDs
for i in deplex/*/seq.id
do
echo "head -1000 $i | perl ~/bin/commom_bin/fishInWinter.pl --ff fasta /dev/stdin pass.fq.fasta >$i.fas "
done > batch.getseq.sh
perl script/commom_bin/multi-process.pl -cpu 20 batch.getseq.sh
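The batch file holds one independent command per line, so if `multi-process.pl` is not at hand, the same pattern can be approximated with `xargs -P`. This is a stand-in sketch rather than the author's runner: the demo batch file and its commands are made up, and it assumes an `xargs` that supports `-P` (GNU and BSD versions do) and command lines without quotes or backslashes, which `xargs` would otherwise interpret.

```shell
# Made-up demo batch file: one independent shell command per line.
workdir=$(mktemp -d)
printf 'echo A > %s/a.txt\necho B > %s/b.txt\necho C > %s/c.txt\n' \
  "$workdir" "$workdir" "$workdir" > "$workdir/batch.demo.sh"

# Run at most 2 lines at a time; each line is handed to its own `sh -c`.
xargs -P 2 -I {} sh -c '{}' < "$workdir/batch.demo.sh"

# All three output files should now exist.
ls "$workdir"
```

On the real data, the equivalent call would read `batch.getseq.sh` and use `-P 20` to match the CPU count above.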
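# write one emu command per sample directory; nano.emu.sh is given by absolute path
# because each command first changes into the sample directory
for i in $PWD/deplex/*
do
echo "cd $i; sh $PWD/script/nano.emu.sh "
done > batch.sh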
perl ~/bin/commom_bin/multi-process.pl -cpu 10 batch.sh
for i in deplex/*/seq.id_rel-abundance.tsv
do
perl ~/bin/meta/got.emu.result.pl $i
done > Nanopore.assign.xls
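To spot-check a single sample by hand, the most abundant taxon can be read straight from its relative-abundance table. The sketch below is not the logic of `got.emu.result.pl`, which is not shown here; `top_taxon` is a hypothetical helper, and it assumes the table is tab-separated with a header row containing `abundance` and `species` columns. The demo table is made up.

```shell
# Hypothetical helper: print the species with the highest abundance in one table.
# Columns are located by header name, so their order does not matter.
top_taxon() {
  awk -F'\t' '
    NR == 1 {
      for (c = 1; c <= NF; c++) {
        if ($c == "abundance") a = c
        if ($c == "species") s = c
      }
      next
    }
    a && s && ($a + 0 > best) { best = $a + 0; raw = $a; name = $s }
    END { if (name != "") printf "%s\t%s\n", name, raw }
  ' "$1"
}

# Made-up demo table; on real data pass deplex/<sample>/seq.id_rel-abundance.tsv.
demo_tsv=$(mktemp)
printf 'tax_id\tabundance\tspecies\n562\t0.05\tEscherichia coli\n1590\t0.95\tLactiplantibacillus plantarum\n' > "$demo_tsv"

top_taxon "$demo_tsv"
```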
Nanopore sequencing test data can be downloaded from BioProject (PRJNA1170189). The other datasets (PacBio HiFi and NGS) are also available for download. Sanger sequencing data are included in the demo directory.
This project is licensed under the MIT License.
For questions or inquiries, please contact [biocomfun@qq.com].