A program that downloads data from multiple sources over multiple protocols (HTTP and FTP are currently supported) to local disk. It is designed to be extensible to more protocols and to scale even when memory is limited. Partially downloaded files are deleted.
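The design goals above can be illustrated with a minimal sketch. It is not the project's actual implementation: the names `OPENERS`, `register`, `save_stream`, `download` and the 64 KiB chunk size are all hypothetical, and Python 3 is assumed. A scheme registry keeps protocols pluggable, copying in fixed-size chunks keeps memory use bounded, and the partial file is removed if the transfer fails.

```python
import os
import urllib.request
from urllib.parse import urlparse

CHUNK_SIZE = 64 * 1024  # assumed value; bounds memory used per download

# Hypothetical registry: URL scheme -> function returning a file-like stream.
OPENERS = {}


def register(scheme):
    """Decorator that registers an opener function for a URL scheme."""
    def decorator(func):
        OPENERS[scheme] = func
        return func
    return decorator


@register("http")
@register("ftp")
def open_with_urllib(url):
    # urllib.request.urlopen handles both http:// and ftp:// URLs.
    return urllib.request.urlopen(url)


def save_stream(stream, dest_path, chunk_size=CHUNK_SIZE):
    """Copy a file-like object to dest_path chunk by chunk.

    If anything goes wrong mid-transfer, the partial file is deleted
    before the exception is re-raised.
    """
    try:
        with open(dest_path, "wb") as out:
            while True:
                chunk = stream.read(chunk_size)
                if not chunk:
                    break
                out.write(chunk)
    except BaseException:
        if os.path.exists(dest_path):
            os.remove(dest_path)
        raise
    return dest_path


def download(url, output_dir):
    """Pick an opener by URL scheme and stream the resource to output_dir."""
    parsed = urlparse(url)
    opener = OPENERS.get(parsed.scheme)
    if opener is None:
        raise ValueError("unsupported protocol: %r" % parsed.scheme)
    filename = os.path.basename(parsed.path) or "download"
    with opener(url) as stream:
        return save_stream(stream, os.path.join(output_dir, filename))
```

Under this sketch, supporting another protocol would only require registering one more opener function for its scheme.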
For development and testing:
pip install -r requirements-dev.txt
For production:
pip install -r requirements.txt
All tests
nosetests
Unit tests
nosetests tests/unit
Integration tests
nosetests tests/integration
Note that actual files are downloaded into the path defined in the environment variable OUTPUT_DIR (default: /tmp/file_output) and are removed after the tests run.
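As a rough illustration of how a test could resolve and clean up that directory, here is a short sketch. The helper names `get_output_dir` and `temporary_output_dir` are hypothetical and not taken from the project. Only the variable name OUTPUT_DIR and its default come from the note above.

```python
import contextlib
import os
import shutil

DEFAULT_OUTPUT_DIR = "/tmp/file_output"  # default stated in the note above


def get_output_dir(environ=None):
    """Return OUTPUT_DIR from the environment, or the default if unset."""
    environ = os.environ if environ is None else environ
    return environ.get("OUTPUT_DIR", DEFAULT_OUTPUT_DIR)


@contextlib.contextmanager
def temporary_output_dir(environ=None):
    """Create the output directory for a test and remove it afterwards."""
    path = get_output_dir(environ)
    os.makedirs(path, exist_ok=True)
    try:
        yield path
    finally:
        # Removes the whole directory, including any downloaded files.
        shutil.rmtree(path, ignore_errors=True)
```

Because a cleanup like this removes the whole directory, it is safer to point OUTPUT_DIR at a location that holds nothing else.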
Setup
cd <project_path>
export PYTHONPATH=$PYTHONPATH:.
Run
python data_downloader/main.py --url <url1> --url <url2> --output <output_dir>
For help
python data_downloader/main.py -h
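For readers curious how a command line of this shape can be parsed, the following is a minimal sketch using `argparse`. It is an assumption about the approach, not the contents of `data_downloader/main.py`. The `build_parser` name, the help strings, and marking both options as required are hypothetical. Only the `--url`, `--output` and `-h` flags come from the commands above.

```python
import argparse


def build_parser():
    """Build a parser accepting repeated --url options and one --output."""
    parser = argparse.ArgumentParser(
        description="Download data from one or more URLs to local disk."
    )
    parser.add_argument(
        "--url",
        action="append",  # each --url occurrence is appended to a list
        dest="urls",
        required=True,    # assumption: at least one URL must be given
        metavar="URL",
        help="URL to download; may be given multiple times",
    )
    parser.add_argument(
        "--output",
        required=True,    # assumption: the output directory is mandatory
        metavar="OUTPUT_DIR",
        help="directory where downloaded files are written",
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args.urls, args.output)
```

With `action="append"`, passing `--url` twice produces a two-element list in `args.urls`, and `-h` is provided by `argparse` automatically.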