Crawl images from Behance.net, using the field of textile design as an example
Retrieve project URLs and save them as an .xls file
- Install BeautifulSoup4 and selenium
pip install BeautifulSoup4
pip install selenium
- Install the Excel support package (the regular expression module re and the socket module are part of the Python standard library and need no installation)
pip install xlwt
- Install a browser WebDriver
Download and install the WebDriver for your browser from the browser's support page
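Before running the scripts, it may help to confirm that the required modules can be imported. The following is a minimal sketch, assuming the import names bs4 (for BeautifulSoup4), selenium and xlwt; missing_modules is a hypothetical helper and not part of this repository.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names from `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Import names assumed for the packages listed in the install steps;
    # re and socket come with Python itself.
    required = ["bs4", "selenium", "xlwt", "re", "socket"]
    missing = missing_modules(required)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules are available.")
```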
- Run RetrieveProject.py
This script collects project URLs from Behance.net and saves them in the file ProjectURL.xls.
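The URL-collection idea can be illustrated roughly as follows. This is a sketch, not the code in RetrieveProject.py: it assumes project links have the form https://www.behance.net/gallery/<id>/<name>, the sample HTML is invented, and extract_project_urls is a hypothetical helper. It uses only the re module and leaves out the Selenium page loading and the xlwt export.

```python
import re

# Assumed shape of a Behance project link: /gallery/<numeric id>/<name>.
# Query strings and fragments are cut off so duplicates compare equal.
PROJECT_LINK = re.compile(r'href="(https://www\.behance\.net/gallery/\d+/[^"?#]+)')


def extract_project_urls(html):
    """Return the unique project URLs found in `html`, in order of appearance."""
    # dict.fromkeys removes duplicates while keeping the first-seen order.
    return list(dict.fromkeys(PROJECT_LINK.findall(html)))


# Invented page fragment used only for illustration.
sample_html = """
<a href="https://www.behance.net/gallery/111/Floral-Print?tracking=1">A</a>
<a href="https://www.behance.net/gallery/222/Woven-Stripes">B</a>
<a href="https://www.behance.net/gallery/111/Floral-Print">A again</a>
<a href="https://www.behance.net/about">About</a>
"""

for url in extract_project_urls(sample_html):
    print(url)
```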
A pre-generated ProjectURL.xls is provided.
- Run RetrieveImages.py
This script downloads the images of each project listed in ProjectURL.xls and saves them in the folder 'pic1' under the root directory.
Download progress and information are printed.
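A single download with a timeout and printed progress could look like the sketch below. It is an assumption about the general approach, not the code in RetrieveImages.py: download_image is a hypothetical helper built on urllib from the standard library, and the 10-second timeout is an arbitrary choice. It returns 1 on success and 0 on failure, the same status values that are recorded in ProjectURL.xls.

```python
import os
import urllib.parse
import urllib.request


def download_image(url, folder="pic1", timeout=10):
    """Download `url` into `folder`; return 1 on success, 0 on failure."""
    os.makedirs(folder, exist_ok=True)
    # Use the last path segment of the URL as the file name.
    filename = os.path.basename(urllib.parse.urlparse(url).path) or "image"
    target = os.path.join(folder, filename)
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            data = response.read()
    except OSError as exc:
        # urllib.error.URLError and socket timeouts are both OSError subclasses.
        print(f"Failed to download {url}: {exc}")
        return 0
    with open(target, "wb") as handle:
        handle.write(data)
    print(f"Saved {url} -> {target} ({len(data)} bytes)")
    return 1
```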
If an image fails to download from its URL, 0 is written in the corresponding row of ProjectURL.xls; otherwise, 1 is written.
- Run TransformImages.py
This script converts images of different formats to JPEG files in the RGB color space.
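The conversion step might be sketched as below, assuming the Pillow library (pip install Pillow). This README does not say which imaging library TransformImages.py uses, so Pillow is an assumption, and jpeg_target and to_rgb_jpeg are hypothetical helpers.

```python
import os


def jpeg_target(src_path, out_dir):
    """Return the output path: same base name as `src_path`, with a .jpg extension."""
    stem = os.path.splitext(os.path.basename(src_path))[0]
    return os.path.join(out_dir, stem + ".jpg")


def to_rgb_jpeg(src_path, out_dir):
    """Convert one image to an RGB JPEG in `out_dir` and return the new path."""
    # Imported here so jpeg_target stays usable without Pillow installed.
    from PIL import Image

    os.makedirs(out_dir, exist_ok=True)
    target = jpeg_target(src_path, out_dir)
    with Image.open(src_path) as img:
        # convert("RGB") accepts palette, grayscale, CMYK and RGBA sources;
        # any alpha channel is discarded.
        img.convert("RGB").save(target, "JPEG")
    return target
```

For example, to_rgb_jpeg("pic1/example.png", "pic1_jpg") would write pic1_jpg/example.jpg; both paths are placeholders.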