Find architecture portfolios on Issuu
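Issuu crawler
* Finds the most-liked architecture portfolios
* Crawls by following the recommended publications that each Issuu page loads automatically
* Can also download portfolios
Libraries used: urllib + re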
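The two libraries cover the two halves of the job: urllib fetches a page and re pulls links out of its HTML. A minimal sketch of that combination, assuming publication links follow the `https://issuu.com/<user>/docs/<name>` shape; the pattern `PUB_URL_RE` and the helpers `fetch` and `extract_pub_links` are hypothetical, not taken from the scripts:

```python
import re
import urllib.request

# Assumed shape of an Issuu publication link: https://issuu.com/<user>/docs/<name>
PUB_URL_RE = re.compile(r"https?://issuu\.com/[\w.-]+/docs/[\w.-]+")


def fetch(url, timeout=30):
    """Download a page and return its HTML as text (hypothetical helper)."""
    # Some sites reject urllib's default User-Agent, so send a browser-like one.
    req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def extract_pub_links(html):
    """Return unique publication URLs in the order they first appear."""
    # dict.fromkeys keeps first-seen order while dropping duplicates.
    return list(dict.fromkeys(PUB_URL_RE.findall(html)))
```

Query strings such as `?e=1` are cut off because `?` is outside the character class, so the same publication linked with different parameters is counted once.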
Folder 4000: results of crawling 4,000 portfolios, sorted by likes
Folder 10000: results of crawling 10,000 portfolios, sorted by likes (this crawl took about 6 hours on my laptop)
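Both folders hold results ordered by like count. A minimal sketch of that ordering step, assuming the crawler keeps a hypothetical dict mapping each publication URL to its number of likes (the actual data layout in the scripts may differ):

```python
def sort_by_likes(pubs):
    """Return (url, likes) pairs, most-liked first.

    pubs: hypothetical dict mapping publication URL -> like count.
    """
    # sorted() is stable, so publications with equal likes keep their crawl order.
    return sorted(pubs.items(), key=lambda item: item[1], reverse=True)
```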
To start a new crawl:
- change directory to this project
- open crawlForMostLikedPortf.py
- change your starting portfolios: I added some URLs to myqueue, and you can add your favorite portfolios to this queue
- change the saving directory on line 45
- change the number of portfolios to crawl on line 26
- run the script.
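Following recommendations outward from the starting portfolios amounts to a breadth-first traversal that stops once enough portfolios have been visited. A minimal sketch under stated assumptions: `crawl` and `fetch_page` are hypothetical names, `fetch_page(url)` stands in for the real download-and-parse step and returns `(likes, recommended_urls)`, and `max_pubs` plays the role of the portfolio count set on line 26:

```python
from collections import deque


def crawl(myqueue, fetch_page, max_pubs):
    """Breadth-first crawl starting from the URLs in myqueue.

    fetch_page(url) -> (likes, recommended_urls) is a hypothetical helper.
    Returns (pubs, remaining): pubs maps URL -> likes, remaining holds
    queued URLs that were not processed before the limit was reached.
    """
    queue = deque(myqueue)
    pubs = {}
    while queue and len(pubs) < max_pubs:
        url = queue.popleft()
        if url in pubs:
            continue  # already crawled via another recommendation
        try:
            likes, recommended = fetch_page(url)
        except OSError:
            # urllib's URLError and HTTPError are OSError subclasses; skip the page.
            continue
        pubs[url] = likes
        # Queue every recommendation not yet crawled.
        for rec in recommended:
            if rec not in pubs:
                queue.append(rec)
    return pubs, list(queue)
```

Returning the leftover queue is what makes a later resume possible: the unvisited URLs are exactly where the next run should start.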
If you have already finished a crawl, the script will have saved two files, dictPub.csv and dictQueue.csv. To continue from where that crawl stopped:
- change directory to this project
- open LoadAndContinueCrawl.py
- change the saving directory on line 40
- change the number of portfolios to crawl on line 20
- run the script.
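dictPub.csv and dictQueue.csv are what let a later run pick up where the last one stopped. A minimal sketch of saving and reloading that state with the standard `csv` module, assuming (hypothetically) that dictPub.csv holds one `url,likes` row per publication and dictQueue.csv one URL per row; the files written by the actual scripts may use a different layout, and `save_state`/`load_state` are illustrative names:

```python
import csv
import os


def save_state(pubs, queue, folder):
    """Write crawl progress to dictPub.csv and dictQueue.csv in folder."""
    # Assumed layout: one "url,likes" row per crawled publication.
    with open(os.path.join(folder, "dictPub.csv"), "w", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows(pubs.items())
    # Assumed layout: one URL per row, in queue order.
    with open(os.path.join(folder, "dictQueue.csv"), "w", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows([url] for url in queue)


def load_state(folder):
    """Read both files back so a crawl can resume where it stopped."""
    with open(os.path.join(folder, "dictPub.csv"), newline="", encoding="utf-8") as f:
        pubs = {url: int(likes) for url, likes in csv.reader(f)}
    with open(os.path.join(folder, "dictQueue.csv"), newline="", encoding="utf-8") as f:
        queue = [row[0] for row in csv.reader(f) if row]
    return pubs, queue
```

On resume, the loaded queue becomes the starting queue of the next run, and the loaded dict tells the crawler which publications to skip.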
To download a portfolio:
- change directory to this project
- open downloadPublications.py
- change the saving directory on line 12
- set the URL of the portfolio you want to download on line 9
- run the script.
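A minimal sketch of the saving step, assuming you already have direct links to the files that make up a portfolio (for example its page images). How downloadPublications.py obtains those links from an Issuu page is not covered here, so `download_files` and its `urls` argument are hypothetical:

```python
import os
import shutil
import urllib.parse
import urllib.request


def download_files(urls, save_dir, prefix="page"):
    """Save each URL into save_dir as page_001.jpg, page_002.jpg, ...

    urls: hypothetical list of direct file links for one portfolio.
    Returns the list of paths written.
    """
    os.makedirs(save_dir, exist_ok=True)
    paths = []
    for i, url in enumerate(urls, start=1):
        # Keep the original file extension when the URL has one.
        ext = os.path.splitext(urllib.parse.urlparse(url).path)[1] or ".jpg"
        path = os.path.join(save_dir, f"{prefix}_{i:03d}{ext}")
        req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
        # Stream the response straight to disk instead of holding it in memory.
        with urllib.request.urlopen(req, timeout=60) as resp, open(path, "wb") as out:
            shutil.copyfileobj(resp, out)
        paths.append(path)
    return paths
```

Zero-padded numbering keeps the pages in reading order when the folder is sorted by name.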