- Linux
If you plan on running installing the required modules using setup.py, following an apt-get update
the following packages need to be installed:
- git
- build-essential
- python-pip
- python-dev
- zlib1g-dev
- libcurl4-openssl-dev
- libssl-dev
sudo apt-get install -y build-essential git python-pip python-dev zlib1g-dev libcurl4-openssl-dev libssl-dev
Clone the repository (--recursive will clone the necessary submodules):
git clone https://github.com/adamkoziol/RefSeqFTPDownload.git --recursive
Install python dependencies:
cd RefSeqFTPDownload/
python setup.py install
Used to download refseq release ( ftp://ftp.ncbi.nlm.nih.gov/refseq/release/ ). By default the script downloads the entirety of the release, but it is possible to specify one or more of the following categories to download:
- archaea
- bacteria
- fungi
- invertebrate
- plant
- plasmid
- protozoa
- vertebrate_mammalian
- vertebrate_other
- viral
python refseqdownloadrelease.py -c viral /refseq/downloads
A path to the folder in which the downloads are to be placed is required. See usage below:
usage: refseqdownloadrelease.py [-h] [-v] [-d DOWNLOADPATH] [-t THREADS]
[-c CATEGORY]
path
Download genomes from refseq release
positional arguments:
path Specify path
optional arguments:
-h, --help show this help message and exit
-v, --version show program's version number and exit
-t THREADS, --threads THREADS
Number of threads. Default is 4, as I don't want to
anger NCBI. Increase at your own risk
-c CATEGORY, --category CATEGORY
Choose one or more of the following categories of
refseq genomes to download: archaea, bacteria, fungi,
invertebrate, plant, plasmid, protozoa,
vertebrate_mammalian, vertebrate_other, viral.
Multiple selections should be separated by a comma.
The script defaults to using all categories.
Uses the supplied assembly_summary_refseq.txt file to determine ftp paths of every organism in refseq, and downloads each *_genomic.fna.gz file in the database. If you replace the assembly_summary_refseq.txt file with a new version, ( ftp://ftp.ncbi.nlm.nih.gov/genomes/refseq/assembly_summary_refseq.txt ), please ensure that you delete any comments at the top of the file.
python refseqdownload /refseq
A path to the folder in which the downloads are to be placed is required. See usage below:
usage: refseqdownload.py [-h] [-v] [-d DOWNLOADPATH] [-t THREADS] path
Download genomes from refseq
positional arguments:
path Specify path
optional arguments:
-h, --help show this help message and exit
-v, --version show program's version number and exit
-d DOWNLOADPATH, --downloadpath DOWNLOADPATH
Path in which to place the downloads. If not supplied,
this defaults to your current working directory/downloads
-t THREADS, --threads THREADS
Number of threads. Default is 4