How to crawl only one category of the rankings
scp23328 opened this issue · 8 comments
scp23328 commented
For example, how can I crawl only the illustration ranking and skip the manga ranking?
CWHer commented
It can probably be done just by changing the ranking URL; I'll take a look after I get off work today.
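For reference, Pixiv's ranking page appears to accept a `content` query parameter, so the change would presumably just be to the URL the crawler requests. A rough sketch (the parameter names are assumptions about Pixiv's ranking endpoint, not code from this repository):

```python
# Sketch only: build a ranking URL restricted to one category.
# "mode", "content", and "format=json" are assumptions about Pixiv's
# ranking endpoint, not taken from the crawler's code.
def ranking_url(mode: str = "daily", content: str = "illust", page: int = 1) -> str:
    return (
        "https://www.pixiv.net/ranking.php"
        f"?mode={mode}&content={content}&p={page}&format=json"
    )

print(ranking_url(content="illust"))  # illustration ranking only
```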
scp23328 commented
Also, whenever I run the program it exits after downloading only a small portion, and I'm not sure where the problem is.
scp23328 commented
CWHer commented
> Also, whenever I run the program it exits after downloading only a small portion, and I'm not sure where the problem is.

Could you share the main.py you're running?
scp23328 commented
from config import DOWNLOAD_CONFIG
from crawlers.bookmark_crawler import BookmarkCrawler
from crawlers.keyword_crawler import KeywordCrawler
from crawlers.ranking_crawler import RankingCrawler
from crawlers.users_crawler import UserCrawler
from utils import checkDir

if __name__ == "__main__":
    checkDir(DOWNLOAD_CONFIG["STORE_PATH"])

    # case 1: (need cookie !!!)
    # download artworks from rankings
    # the only parameter is flow capacity, default is 1024MB
    app = RankingCrawler(capacity=1024)
    app.run()

    # case 2: (need cookie !!!)
    # download artworks from bookmark
    # 1st parameter is max download number, default is 200
    # 2nd parameter is flow capacity, default is 1024MB
    # app = BookmarkCrawler(n_images=20, capacity=200)
    # app.run()

    # case 3:
    # download artworks from a single artist
    # 2nd parameter is flow capacity, default is 1024MB
    # app = UserCrawler(artist_id="32548944", capacity=200)
    # app.run()

    # case 4: (need premium & cookie !!!)
    # download search results of a keyword (sorted by popularity)
    # 1st parameter is keyword
    # 2nd parameter is max download number
    # 3rd parameter is flow capacity
    # app = KeywordCrawler(keyword="百合", n_images=200, capacity=1024*256)
    # app = RankingCrawler(capacity=1024*8)
    # app.run()
CWHer commented
The crawler stops automatically once the traffic limit is exceeded. You can raise the limit by increasing the capacity parameter, for example:
app = RankingCrawler(capacity=1024 * 10)
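Going by the comments in main.py (the default capacity is described as 1024MB), capacity appears to be measured in MB, so 1024 * 10 should allow roughly 10 GB of downloads before the crawler stops on its own.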
scp23328 commented
Got it, thanks a lot.
CWHer commented
- Added a `CONTENT_MODE` option for the ranking mode; the config file is located at `./pixiv_crawler/config.py`.
- `CONTENT_MODE`: download illustrations, manga, or all types of artworks (see `CONTENT_MODES` in that file). Setting it to `"illust"` downloads illustrations only.
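A minimal sketch of what that setting might look like in `./pixiv_crawler/config.py` (where it sits in the file and the exact values listed in `CONTENT_MODES` are assumptions, not copied from the repo):

```python
# Illustrative only: the placement and the exact option list are assumed,
# not taken from the actual config.py.
CONTENT_MODES = ["all", "illust", "manga"]  # assumed set of allowed values

# Restrict the ranking crawler to illustrations only.
CONTENT_MODE = "illust"
```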