Identify geographic location of 16S rRNA amplicon sequences based on blast searches
The pipeline consists of three modules:
- Blast search and blast output parsing
- GenBank files download
- Extract geographic location from GenBank files
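As a rough illustration of the third module, the sketch below pulls the location qualifier out of a GenBank flat file with awk. It is not GeoBlast's actual parser. It assumes the location is stored in a `/country` or `/geo_loc_name` qualifier whose value fits on one line, and the function name `extract_geo` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: print "<accession><TAB><location>" for each location qualifier
# found in a GenBank flat file. Qualifier values wrapped over several lines
# are not handled.
extract_geo() {
  awk '
    /^ACCESSION/ { acc = $2 }                      # remember the current record
    match($0, /\/(country|geo_loc_name)="[^"]*"/) {
      val = substr($0, RSTART, RLENGTH)
      sub(/^[^"]*"/, "", val)                      # drop qualifier name and opening quote
      sub(/"$/, "", val)                           # drop closing quote
      print acc "\t" val
    }
  ' "$1"
}

# Usage (file name taken from the output listing): extract_geo downloaded.gbk
```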
GeoBlast runs with Docker or Singularity.
To install it, download one of these two wrapper scripts:
Docker: geoblast_runner.sh
Singularity: geoblast_runner.sh
And add the execute permission:
chmod +x geoblast_runner.sh
Usage: geoblast_runner.sh <input file> <output directory> <options>
--help print this help
--min_id NUM minimum percentage of identity
--min_perc_len NUM minimum alignment percentage length
--e_val NUM e-value
--nslots NUM number of slots (default 2)
--overwrite t|f overwrite current directory (default f)
--sample_name CHAR sample name (default input file name)
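A possible invocation, combining the options above, might look like the sketch below. The input file name, output directory, sample name, and every threshold value are illustrative assumptions, not recommended settings. The snippet only prints the assembled command so it can be inspected first.

```shell
#!/usr/bin/env bash
# Hypothetical paths and values; replace them with your own.
input="amplicons.fasta"
outdir="geoblast_out"

cmd=(./geoblast_runner.sh "$input" "$outdir"
     --min_id 97
     --min_perc_len 90
     --e_val 1e-5
     --nslots 4
     --overwrite t
     --sample_name sample01)

# Print the command for inspection; run it with "${cmd[@]}" when ready.
printf '%s ' "${cmd[@]}"; echo
```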
/output_dir:
  blout.tsv (raw blast output)
  blout_filt.tsv (filtered blast output)
  geoblast_output.tsv (geoblast final output table)
  /<query>:
    acc2download.txt (list of acc hits to be downloaded)
    downloaded.gbk (downloaded gbk files of hits)
    query_blout_filt.tsv (section of blout_filt.tsv corresponding to <query>)
    parsed_gbk.tsv (parsed fields of gbk files)
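To get a quick overview of a finished run, something like the following sketch could count the accessions listed for each query. It assumes that each `<query>` directory sits directly under the output directory and that `acc2download.txt` holds one accession per line. The function name `summarize_hits` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: print "<query><TAB><number of accessions>" for every
# per-query directory found under the given output directory.
summarize_hits() {
  local outdir=$1 f
  for f in "$outdir"/*/acc2download.txt; do
    [ -e "$f" ] || continue          # skip when the glob matches nothing
    # grep -c . counts the non-empty lines in the accession list
    printf '%s\t%s\n' "$(basename "$(dirname "$f")")" "$(grep -c . "$f")"
  done
}

# Usage (hypothetical directory name): summarize_hits geoblast_out
```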