This tool is intended to help machine learning scientists create image datasets.
If you find a useful Tumblr blog with images of one type (cats, dogs, cars, people, etc.), just fill in the blog URLs and wait until Scrapy finishes downloading.
Python 3.x
pip install scrapy
You may want to tweak download speed, number of concurrent requests, and other options in spiders/settings.py. The file is self-explanatory; consult the Scrapy docs for more information.
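As a rough illustration of the kind of options involved, the fragment below uses standard Scrapy setting names. The values are placeholder assumptions, not this project's defaults.

```python
# Illustrative values only; the project's actual settings.py may differ.

# Base delay, in seconds, between consecutive requests to the same website.
DOWNLOAD_DELAY = 0.5

# Upper bound on the requests Scrapy keeps in flight at once, overall.
CONCURRENT_REQUESTS = 8

# Upper bound on simultaneous requests to any single domain.
CONCURRENT_REQUESTS_PER_DOMAIN = 4

# Let Scrapy adjust the delay automatically from observed response latency.
AUTOTHROTTLE_ENABLED = True
```

Raising the delay or lowering the concurrency limits slows the crawl but is gentler on the remote server.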
Fill in start_urls.txt with the start pages of the Tumblr blogs you want to download.
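The exact file format is not spelled out here. The following is a minimal sketch, assuming one blog URL per line, of how such a file could be read. `load_start_urls` is a hypothetical helper, not part of the tool, and skipping blank lines and `#` comments is likewise an assumption.

```python
from pathlib import Path


def load_start_urls(path="start_urls.txt"):
    """Return the non-empty, non-comment lines of the file as a list of URLs."""
    urls = []
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        line = line.strip()
        # Skip blank lines and lines starting with '#' (assumed convention).
        if line and not line.startswith("#"):
            urls.append(line)
    return urls
```

Under that assumption, a file containing `https://cats.tumblr.com/` and `https://dogs.tumblr.com/` on separate lines would yield a two-element list.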
Run:
scrapy crawl tumblr-spider
Images from each blog are saved into images/blogname.tumblr.com/. All images from a blog go into that single folder.
Re-running the spider will not download the same images twice.
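The folder layout and the skip-on-re-run behavior can be pictured with the sketch below. It is not the tool's actual code: `image_path` and `needs_download` are hypothetical helpers, and keeping the image's original file name is an assumption made for illustration.

```python
from pathlib import Path, PurePosixPath
from urllib.parse import urlparse


def image_path(blog_url, image_url, root="images"):
    """Build <root>/<blog host>/<image file name> for a downloaded image."""
    blog_host = urlparse(blog_url).netloc
    # Assumption: the last segment of the image URL path is used as the file name.
    file_name = PurePosixPath(urlparse(image_url).path).name
    return Path(root) / blog_host / file_name


def needs_download(path):
    """Treat any file that already exists on disk as already downloaded."""
    return not Path(path).exists()
```

With helpers like these, a second run would check `needs_download` for each image and skip those already present. Scrapy's own images pipeline may achieve the same effect differently.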