jingzhou123/tieba-crawler

A dead-simple, hardcoded-everywhere Baidu Tieba crawler built on Scrapy. It may leak personal information.

Python · MIT

tieba-crawler

Supports the following queries:

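All followers (members) of a forum
All posts in a forum
All forums a user follows
All other users a user follows
All fans of a user
All articles in a forum
All replies to an article (or to a reply); posts that reply to one another within those replies all count as replies to the original poster
All repliers to an article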
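The "replies to replies still count as replies to the original poster" rule amounts to flattening a reply tree. The sketch below shows one way to do that. The `Reply` record and its `id`, `author` and `parent_id` fields are hypothetical, not the crawler's actual item schema: it walks breadth-first from a root item and returns every descendant, whoever it was directly answering.

```python
from collections import defaultdict, deque
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Reply:
    # Hypothetical record; the real crawler's item fields may differ.
    id: int
    author: str
    parent_id: Optional[int]  # id of the item this reply directly answers


def all_replies(root_id, replies):
    """Return every reply under root_id, flattened: a reply to a reply
    is counted as a reply to the root item."""
    children = defaultdict(list)
    for r in replies:
        children[r.parent_id].append(r)
    result, queue, seen = [], deque([root_id]), {root_id}
    while queue:
        for child in children[queue.popleft()]:
            if child.id not in seen:  # guard against malformed, cyclic data
                seen.add(child.id)
                result.append(child)
                queue.append(child.id)
    return result


def all_repliers(root_id, replies):
    """Usernames of everyone who replied anywhere under root_id."""
    return {r.author for r in all_replies(root_id, replies)}
```

For example, with `bob` answering item 1, `carol` answering `bob`, and `bob` answering `carol`, all three replies are attributed to item 1, and `all_repliers(1, ...)` returns `{"bob", "carol"}`.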

Available spiders:

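tieba: general information about a forum
post: main posts (threads) published in a forum
reply: replies to main posts
comment: replies to replies (comments)
user_post, user_reply, user_comment: crawl user details using the usernames collected from main posts, replies and comments
user_fan, user_follow: crawl the details of fans or followed users by following fan and follow relationships
fan, follow: crawl each user's followings and fans (usernames only)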
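The `user_post`, `user_reply` and `user_comment` spiders work from usernames harvested by other spiders, so the same user can show up many times. A minimal sketch of merging those names into one de-duplicated crawl list, assuming each scraped item is a dict with a hypothetical `author` key (the project's real field name may differ):

```python
from typing import Iterable, Mapping


def collect_usernames(*item_streams: Iterable[Mapping[str, str]],
                      field: str = "author") -> list[str]:
    """Merge usernames from several scraped item streams.

    Empty or missing names are skipped; duplicates are dropped while the
    first-seen order is kept (dicts preserve insertion order).
    """
    seen: dict[str, None] = {}
    for items in item_streams:
        for item in items:
            name = item.get(field)  # `field` is an assumed item key
            if name:
                seen.setdefault(name, None)
    return list(seen)
```

Crawling each user once this way avoids re-requesting the same profile page for every post, reply or comment they wrote.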

Usage

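scrapy crawl <spider_name> 2>&1 | tee -a tb.log

This runs the named spider and appends both its standard output and its log output (stderr) to tb.log, while still printing them to the terminal.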

Dependencies:

Before running a spider, it is best to first run the spiders it depends on (listed in brackets):

tieba []
post [tieba]
reply [post]
comment [reply]
member [tieba]
user_member [member]
user_fan [fan]
user_follow [follow]
user_comment [comment]
user_post [post]
fan [user_post user_reply user_comment member]
follow [user_post user_reply user_comment member]
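Because the list above is a dependency graph, a topological sort gives one valid order for running every spider. The sketch below copies the list into a dict and uses the standard library's `graphlib.TopologicalSorter` (Python 3.9+), which raises `CycleError` if the graph has a cycle. The list has no entry for `user_reply`, so the sorter treats it as having no dependencies; the helper names `run_order` and `crawl_commands` are illustrative only.

```python
from graphlib import TopologicalSorter

# Copied from the dependency list: spider -> spiders that should run first.
DEPENDENCIES = {
    "tieba": [],
    "post": ["tieba"],
    "reply": ["post"],
    "comment": ["reply"],
    "member": ["tieba"],
    "user_member": ["member"],
    "user_fan": ["fan"],
    "user_follow": ["follow"],
    "user_comment": ["comment"],
    "user_post": ["post"],
    "fan": ["user_post", "user_reply", "user_comment", "member"],
    "follow": ["user_post", "user_reply", "user_comment", "member"],
}


def run_order(deps=DEPENDENCIES):
    """Spider names ordered so each one comes after all of its dependencies."""
    return list(TopologicalSorter(deps).static_order())


def crawl_commands(deps=DEPENDENCIES, log="tb.log"):
    """The README's usage command, once per spider, in dependency order."""
    return [f"scrapy crawl {name} 2>&1 | tee -a {log}" for name in run_order(deps)]


if __name__ == "__main__":
    for cmd in crawl_commands():
        print(cmd)
```

Several orders are valid (for instance `tieba` and `user_reply` have no prerequisites), so the exact sequence printed may vary; any of them respects every bracketed dependency.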