BIlibiliTop

Scrapes all the information on Bilibili's ranking pages.

Primary language: Python

Target pages:

https://www.bilibili.com/v/popular/rank/all [All]

https://www.bilibili.com/v/popular/rank/bangumi [Bangumi (anime series)]

https://www.bilibili.com/v/popular/rank/guochan [Chinese animation]

https://www.bilibili.com/v/popular/rank/guochuang [Chinese original works and related]

https://www.bilibili.com/v/popular/rank/documentary [Documentary]

https://www.bilibili.com/v/popular/rank/douga [Animation]

https://www.bilibili.com/v/popular/rank/music [Music]

https://www.bilibili.com/v/popular/rank/dance [Dance]

https://www.bilibili.com/v/popular/rank/game [Games]

https://www.bilibili.com/v/popular/rank/technology [Knowledge]

https://www.bilibili.com/v/popular/rank/digital [Digital]

https://www.bilibili.com/v/popular/rank/life [Life]

https://www.bilibili.com/v/popular/rank/food [Food]

https://www.bilibili.com/v/popular/rank/animal [Animals]

https://www.bilibili.com/v/popular/rank/kichiku [Kichiku]

https://www.bilibili.com/v/popular/rank/fashion [Fashion]

https://www.bilibili.com/v/popular/rank/ent [Entertainment]

https://www.bilibili.com/v/popular/rank/cinephile [Film and TV]

https://www.bilibili.com/v/popular/rank/movie [Movies]

https://www.bilibili.com/v/popular/rank/tv [TV series]

https://www.bilibili.com/v/popular/rank/origin [Original]

https://www.bilibili.com/v/popular/rank/rookie [Newcomers]
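
The URLs above share one prefix and differ only in the final path segment, so they can be generated rather than hard-coded. A minimal sketch, assuming the slugs listed above are still valid; `RANK_BASE`, `CATEGORY_SLUGS`, and `rank_url` are illustrative names, not taken from the repository's code:

```python
# Illustrative names; the slugs are copied from the URL list above.
RANK_BASE = "https://www.bilibili.com/v/popular/rank/"

CATEGORY_SLUGS = [
    "all", "bangumi", "guochan", "guochuang", "documentary", "douga",
    "music", "dance", "game", "technology", "digital", "life", "food",
    "animal", "kichiku", "fashion", "ent", "cinephile", "movie", "tv",
    "origin", "rookie",
]


def rank_url(slug: str) -> str:
    """Return the ranking-page URL for one category slug."""
    return RANK_BASE + slug


# One URL per category, in the same order as the list above.
RANK_URLS = [rank_url(slug) for slug in CATEGORY_SLUGS]
```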

Goal: scrape the full contents of every category of the Bilibili rankings.

Procedure

Use XPath to extract information from the page

Get all categories

//ul[@class='rank-tab']/li/text()

Get the rank

//div[@class='num']/text()

Get the video title

//div[@class='info']/a[@class='title']/text()
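
The rank and title expressions can be tried without fetching the live page. A minimal sketch using the standard library's `xml.etree.ElementTree` on a hand-written fragment; the fragment's layout is an assumption modeled on the class names in the expressions above, not a copy of Bilibili's markup. ElementTree implements only a subset of XPath and has no `text()` step, so the sketch selects the elements and reads `.text`. Evaluating the expressions exactly as written on real HTML needs a full XPath engine such as lxml.

```python
import xml.etree.ElementTree as ET

# Hand-written stand-in for two ranking entries (assumed layout,
# placeholder video ids).
SAMPLE = """
<ul class="rank-list">
  <li>
    <div class="num">1</div>
    <div class="info"><a class="title" href="//www.bilibili.com/video/BV_EXAMPLE_1">First video</a></div>
  </li>
  <li>
    <div class="num">2</div>
    <div class="info"><a class="title" href="//www.bilibili.com/video/BV_EXAMPLE_2">Second video</a></div>
  </li>
</ul>
"""

root = ET.fromstring(SAMPLE)

# Counterpart of //div[@class='num']/text()
ranks = [div.text for div in root.findall(".//div[@class='num']")]

# Counterpart of //div[@class='info']/a[@class='title']/text()
titles = [a.text for a in root.findall(".//div[@class='info']/a[@class='title']")]

# Pair each rank with its title; both lists follow document order.
entries = list(zip(ranks, titles))
```

Zipping the two lists is only safe when every entry yields exactly one rank and one title; if a field can be missing, iterating per `li` and querying inside each item is the more robust choice.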

Get the play count

//i[@class='b-icon play']//following::text()[1]
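
`following::text()[1]` selects the first text node after the icon element, which is where the number is expected to sit. A sketch of the same idea with ElementTree, where the text right after an element is its `.tail`. The fragment and the helper `parse_count` are illustrative, and the handling of the `万` (ten thousand) suffix assumes the page abbreviates large counts that way:

```python
import xml.etree.ElementTree as ET

# Hand-written fragment (assumed layout): the count is the text node
# that immediately follows the icon element.
SAMPLE = '<span class="data-box"><i class="b-icon play"></i> 123.4万 </span>'


def parse_count(text: str) -> int:
    """Convert a display count such as '123.4万' or '5678' to an int."""
    text = text.strip()
    if text.endswith("万"):
        # 万 means 10,000; round() removes floating-point residue.
        return round(float(text[:-1]) * 10_000)
    return int(text)


root = ET.fromstring(SAMPLE)
icon = root.find(".//i[@class='b-icon play']")

# icon.tail is the text between </i> and the next tag.
play_count = parse_count(icon.tail)
```

The same pattern would apply to the other icon-based fields, changing only the class name in the predicate.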

Get the danmaku (bullet comment) count

//i[@class='b-icon view']//following::text()[1]

Get the overall score

//div[@class='pts']/div/text()

Get the video URL

//div[@class='info']/a/@href
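
The `href` values may not be absolute URLs. A sketch that resolves them against the ranking page with `urllib.parse.urljoin`, assuming (not confirmed by the source) that links can appear in protocol-relative form such as `//www.bilibili.com/video/...`; `PAGE_URL`, `absolutize`, and the sample links are illustrative:

```python
from urllib.parse import urljoin

# Page the hrefs were extracted from (taken from the target list).
PAGE_URL = "https://www.bilibili.com/v/popular/rank/all"


def absolutize(href: str, page_url: str = PAGE_URL) -> str:
    """Resolve a possibly relative href against the page it came from."""
    return urljoin(page_url, href)


# Placeholder links, not real video ids.
links = [
    "//www.bilibili.com/video/BV_EXAMPLE",        # protocol-relative
    "https://www.bilibili.com/video/BV_EXAMPLE",  # already absolute
]
absolute_links = [absolutize(h) for h in links]
```

Already-absolute links pass through unchanged, so the helper can be applied to every extracted value without checking its form first.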

Get the author

//i[@class='b-icon author']//following::text()[1]

Get the author's profile (space) page

//div[@class='detail']/a/@href

Get the update status

//div[@class='pgc-info']/text()

Get the favorites count

//i[@class='fav']//following::text()[1]
