A set of scripts to download bacterial and fungal genomes from NCBI after they restructured their FTP site a while ago.
Idea shamelessly stolen from Mick Watson's Kraken downloader scripts, which can also be found in Mick's GitHub repo. However, Mick's scripts are written in Perl and are specific to actually building a Kraken database (as advertised).
So this is a set of scripts that focuses on the actual genome downloading.
pip install ncbi-genome-download
Alternatively, clone this repository from GitHub, then run (in a Python virtual environment):
pip install .
To download all bacterial RefSeq genomes in GenBank format from NCBI, run the following:
ncbi-genome-download bacteria
If you're on a reasonably fast connection, you might want to try running multiple downloads in parallel:
ncbi-genome-download bacteria --parallel 4
To download all fungal GenBank genomes from NCBI in GenBank format, run:
ncbi-genome-download --section genbank fungi
To download all viral RefSeq genomes in FASTA format, run:
ncbi-genome-download --format fasta viral
To download only completed bacterial RefSeq genomes in GenBank format, run:
ncbi-genome-download --assembly-level complete bacteria
To download bacterial RefSeq genomes of the genus Streptomyces, run:
ncbi-genome-download --genus Streptomyces bacteria
Note: This is only a simple string match on the organism name provided by NCBI.
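To illustrate what a plain string match on the organism name implies, here is a minimal sketch. It is not the tool's actual code: `matches_genus` is a hypothetical helper, the organism names are only illustrative, and the README does not say whether the match is a prefix or substring test or whether it is case-sensitive. The sketch assumes a case-sensitive substring test.

```python
def matches_genus(organism_name, genus):
    """Return True if `genus` occurs anywhere in the organism name.

    Assumption: a case-sensitive substring test; the real tool may differ.
    """
    return genus in organism_name


# Illustrative organism names, not taken from an actual NCBI listing.
names = [
    "Streptomyces coelicolor A3(2)",
    "Escherichia coli str. K-12 substr. MG1655",
    "Candidatus Streptomyces philanthi",
]

# Keep only the entries whose name contains the requested genus string.
selected = [name for name in names if matches_genus(name, "Streptomyces")]

for name in selected:
    print(name)
```

Because only the name string is inspected, no taxonomy lookup takes place: under these assumptions any name containing the text is selected, and a differently capitalised query such as `streptomyces` selects nothing.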
To get an overview of all options, run:
ncbi-genome-download --help
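For scripted use, the command lines shown above can be assembled programmatically. The following is a rough sketch only: `build_command` is a hypothetical helper that is not part of this package, it emits only the flags that appear in this README, and it assumes those flags can be combined in a single invocation, which the README does not state explicitly.

```python
import shlex


def build_command(group, section=None, fmt=None, assembly_level=None,
                  genus=None, parallel=None):
    """Assemble an ncbi-genome-download argument list (hypothetical helper)."""
    cmd = ["ncbi-genome-download"]
    # Add each option only when a value was supplied.
    if section is not None:
        cmd += ["--section", section]
    if fmt is not None:
        cmd += ["--format", fmt]
    if assembly_level is not None:
        cmd += ["--assembly-level", assembly_level]
    if genus is not None:
        cmd += ["--genus", genus]
    if parallel is not None:
        cmd += ["--parallel", str(parallel)]
    # The organism group (e.g. bacteria, fungi, viral) goes last.
    cmd.append(group)
    return cmd


if __name__ == "__main__":
    cmd = build_command("bacteria", assembly_level="complete",
                        genus="Streptomyces")
    # Print the command for inspection instead of starting a download.
    print(shlex.join(cmd))
```

To actually run the assembled command, the list can be passed to `subprocess.run(cmd, check=True)`; printing it first makes it easy to check the options before starting a large download.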
All code is available under the Apache License version 2, see the LICENSE file for details.