Crawl stylish images from the design community Behance.net, using the field of textile design as an example.

Primary language: Python

Behance-spider

Crawl images from Behance.net, using the field of textile design as an example
Retrieve project URLs and save them as an .xls file

Prerequisites

  1. Install BeautifulSoup4 and selenium
    pip install BeautifulSoup4
    pip install selenium
  2. Install the Excel support package xlwt (re for regular expressions and socket for socket connections are part of the Python standard library and need no installation)
    pip install xlwt
  3. Install the WebDriver for your browser
    Download and install it from the browser's support page
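
Before running the scripts, it can help to confirm that the required modules are importable. The following is a minimal sketch and not part of the repository: the helper name `missing_packages` is hypothetical, and `bs4` is the import name of BeautifulSoup4.

```python
import importlib.util


def missing_packages(module_names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in module_names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # bs4, selenium and xlwt come from pip; re and socket ship with Python.
    required = ["bs4", "selenium", "xlwt", "re", "socket"]
    missing = missing_packages(required)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are available.")
```

Any name reported as missing can then be installed with pip as described above.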

Steps

  1. Run RetrieveProject.py
    This script collects project URLs from Behance.net and saves them in the file ProjectURL.xls.
    A pre-generated ProjectURL.xls is provided.

  2. Run RetrieveImages.py
    This script downloads the images of each project listed in ProjectURL.xls and saves them in the folder 'pic1' under the root directory.
    Download progress and information are printed.
    If an image fails to download from its URL, 0 is written to the corresponding row in ProjectURL.xls. Otherwise, 1 is written.

  3. Run TransformImages.py
    This script converts the downloaded images, whatever their original format, to JPEG files in the RGB color space.
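
The 0/1 success flag recorded in step 2 can be illustrated as follows. This is a minimal sketch under stated assumptions, not the code in RetrieveImages.py: it uses only the standard library (`urllib.request`), and the function name `download_image`, its parameters, and the 30-second default timeout are hypothetical.

```python
import os
import urllib.error
import urllib.request


def download_image(url, dest_path, timeout=30):
    """Download url to dest_path; return 1 on success, 0 on failure."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            data = response.read()
    except (urllib.error.URLError, OSError, ValueError):
        # URLError covers HTTP and connection errors, socket timeouts are
        # OSError subclasses, and ValueError is raised for malformed URLs.
        return 0
    # Create the target folder (for example 'pic1') if it does not exist yet.
    os.makedirs(os.path.dirname(dest_path) or ".", exist_ok=True)
    with open(dest_path, "wb") as fh:
        fh.write(data)
    return 1
```

A caller could store the returned value next to each URL, for example `flag = download_image(image_url, os.path.join("pic1", "0001.jpg"))`, where `image_url` and the file name are placeholders.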