lvanz's Stars
postlight/parser
📜 Extract meaningful content from the chaos of a web page
LiuXingMing/LinkedinSpider
LinkedIn crawler that scrapes employees' LinkedIn information based on the company name
SpiderClub/smart_login
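Login methods for major websites: some log in through Selenium, others simulate the login directly from captured packets (no longer maintained due to limited time and energy)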
SpiderClub/weibospider
⚡ A distributed crawler for Weibo, built with Celery and Requests.
dataabc/weiboSpider
Sina Weibo crawler: scrapes Sina Weibo data with Python
pwxcoo/chinese-xinhua
📙 Xinhua Dictionary (新华字典) database, including xiehouyu (two-part allegorical sayings), idioms (chengyu), words, and Chinese characters.
luyishisi/Anti-Anti-Spider
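More and more websites have anti-crawler features: some hide key data in images, others use user-hostile CAPTCHAs. This repository collects anti-anti-crawler code and improves technique by taking on websites with different defenses (without malicious intent). (Submissions of hard-to-scrape websites are welcome.) (Project paused for work reasons.)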
scrapinghub/dateparser
Python parser for human-readable dates
CrawlScript/WebCollector
WebCollector is an open-source web crawler framework based on Java. It provides some simple interfaces for crawling the Web; you can set up a multi-threaded web crawler in less than 5 minutes.
qiyeboy/IPProxyPool
IPProxyPool proxy pool project, providing proxy IPs
xlvector/captcha
captcha
MingyuTian/CAPTCHA_verify
Verify the CAPTCHA of the website http://gsxt.gdgs.gov.cn/
p19891117/captcha-ocr
Java decaptcha
KavinLiu/crawler-of-brand
for so.quandashi.com and sbcx.saic.gov.cn:9080/tmois/wszhcx_getZhcx.xhtml
mylove1/CnkiSpider
CNKI (知网) crawler
mylove1/qichacha_spider
A spider that crawls qichacha using Scrapy