A repository for creating automations that download external bioinformatics datasets.
For each database source, this package can be set up to run an automation that checks whether a new version of the database is available. As soon as it detects a new release, the automation downloads it locally.
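The check-then-download cycle can be sketched as below. This is a minimal illustration, not this package's implementation: `check_and_download`, `get_remote_release`, and `download` are hypothetical names, and it assumes a release is "new" whenever its identifier differs from the one recorded locally. That comparison works for both numeric (e.g. Ensembl) and prefixed (e.g. WormBase) release names.

```python
from typing import Callable, Optional


def check_and_download(
    local_release: Optional[str],
    get_remote_release: Callable[[], str],
    download: Callable[[str], None],
) -> Optional[str]:
    """Download the remote release if it differs from the local one (hypothetical helper).

    Returns the release that was downloaded, or None if the local copy is up to date.
    """
    remote = get_remote_release().strip()
    # Nothing recorded locally means this source has never been downloaded.
    if local_release is not None and local_release.strip() == remote:
        return None  # no new release published
    download(remote)
    return remote
```

Passing the lookup and download steps in as callables keeps the decision logic independent of any particular source's website or FTP layout.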
For each database source, the automation creates a root directory, named after the database, under the path set by EXTERNAL_DATA_BASE in the main configuration. How files are organized under each root directory depends on how the given data source publishes its data.
Some databases publish their datasets by release (Ensembl, WormBase, ...) and some do not (MGI, NCBI, ...).
Under each data source root directory, you will find:
- A file (current_release_NUMBER) that stores the latest release of the data source
- A directory for each version downloaded
- A symbolic link named "current" that points to the latest version
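The directory layout above can be sketched as follows. This is an illustrative sketch only: `record_release` is a hypothetical helper, and it assumes `current_release_NUMBER` is the literal file name with the release identifier as its content, and that each version directory is named after its release. The `current` link is built under a temporary name and renamed over the old one, so on POSIX systems it is never missing mid-update.

```python
import os
from pathlib import Path


def record_release(external_data_base: str, db_name: str, release: str) -> Path:
    """Create the version directory and update the bookkeeping files (hypothetical helper)."""
    # Root directory named after the database, under EXTERNAL_DATA_BASE.
    root = Path(external_data_base) / db_name
    version_dir = root / release
    version_dir.mkdir(parents=True, exist_ok=True)

    # Assumption: current_release_NUMBER is a literal file name holding the release.
    (root / "current_release_NUMBER").write_text(release + "\n")

    # Point "current" at the new version via a relative link, swapped in atomically.
    tmp_link = root / "current.tmp"
    if tmp_link.is_symlink() or tmp_link.exists():
        tmp_link.unlink()
    tmp_link.symlink_to(release, target_is_directory=True)
    os.replace(tmp_link, root / "current")  # rename(2) replaces the old link itself
    return version_dir
```

Using a relative link target (the release name rather than an absolute path) keeps `current` valid if the whole EXTERNAL_DATA_BASE tree is moved or mounted elsewhere.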
Within the data source root directory, files are stored by dataset, as specified by the DATASETS and/or TAXA variables in the data source configuration file.
Package Documentation: https://github.com/mdibl/data_downloads/wiki