Google, Naver multiprocess image crawler
-
Install Chrome
-
Extract chromedriver.zip
-
Add the directory where you extracted chromedriver to your PATH.
-
pip install -r requirements.txt
-
Write search keywords in keywords.txt
-
Run auto_crawler.py
-
Files will be downloaded to the 'download' directory.
usage: python3 auto_crawler.py [--skip true] [--threads 4] [--google true] [--naver true]
--skip SKIP Skip keywords that have already been downloaded. This is useful when re-downloading.
--threads THREADS Number of threads to use for downloading.
--google GOOGLE Download from google.com (boolean)
--naver NAVER Download from naver.com (boolean)
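
The options above could be parsed roughly as follows. This is only a sketch, not the actual code in auto_crawler.py: the helper names `str_to_bool` and `build_parser` are hypothetical, the defaults are assumed from the usage line, and the accepted boolean spellings are an assumption.

```python
import argparse


def str_to_bool(value: str) -> bool:
    """Interpret common true/false spellings (assumed set, not from the source)."""
    lowered = value.strip().lower()
    if lowered in ("true", "1", "yes", "y"):
        return True
    if lowered in ("false", "0", "no", "n"):
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented flags (defaults are assumptions)."""
    parser = argparse.ArgumentParser(prog="auto_crawler.py")
    parser.add_argument("--skip", type=str_to_bool, default=True,
                        help="Skip keywords that have already been downloaded.")
    parser.add_argument("--threads", type=int, default=4,
                        help="Number of threads to use for downloading.")
    parser.add_argument("--google", type=str_to_bool, default=True,
                        help="Download from google.com (boolean)")
    parser.add_argument("--naver", type=str_to_bool, default=True,
                        help="Download from naver.com (boolean)")
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch runs anywhere.
args = build_parser().parse_args(["--threads", "8", "--naver", "false"])
print(args.skip, args.threads, args.google, args.naver)
```

A custom converter is used here because argparse's `type=bool` would treat any non-empty string, including "false", as true.
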
Detects data imbalance based on the number of files.
When crawling ends, a message shows which directories contain fewer than 50% of the average number of files.
I recommend removing those directories and downloading them again.
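
The imbalance check described above can be sketched like this. It is a minimal illustration, not the crawler's actual implementation: the function name `find_sparse_directories` is hypothetical, and it assumes each keyword has its own subdirectory directly under 'download'.

```python
from pathlib import Path


def find_sparse_directories(root, ratio=0.5):
    """Return {directory name: file count} for subdirectories of `root`
    that hold fewer than `ratio` times the average file count."""
    root = Path(root)
    # Count regular files in each immediate subdirectory (assumed layout).
    counts = {
        d.name: sum(1 for f in d.iterdir() if f.is_file())
        for d in root.iterdir()
        if d.is_dir()
    }
    if not counts:
        return {}
    average = sum(counts.values()) / len(counts)
    return {name: n for name, n in counts.items() if n < average * ratio}


if Path("download").is_dir():
    for name, n in find_sparse_directories("download").items():
        print(f"{name}: only {n} files, consider re-downloading")
```
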