CWHer/PixivCrawler

How to crawl only one category of the rankings

scp23328 opened this issue · 8 comments

For example, crawl only the illustration ranking and skip the manga ranking?

CWHer commented

Changing the ranking URL would probably be enough. I'll take a look after work today 🤔

Also, whenever I run the program it exits after downloading only a small part of the artworks, and I'm not sure what is going wrong.

CWHer commented

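> Also, whenever I run the program it exits after downloading only a small part of the artworks, and I'm not sure what is going wrong.

Could you post the main.py you are running?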

from config import DOWNLOAD_CONFIG
from crawlers.bookmark_crawler import BookmarkCrawler
from crawlers.keyword_crawler import KeywordCrawler
from crawlers.ranking_crawler import RankingCrawler
from crawlers.users_crawler import UserCrawler
from utils import checkDir


if __name__ == "__main__":

    checkDir(DOWNLOAD_CONFIG["STORE_PATH"])

    # case 1: (need cookie !!!)
    #   download artworks from rankings
    #   the only parameter is flow capacity, default is 1024MB
    app = RankingCrawler(capacity=1024)
    app.run()

    # case 2: (need cookie !!!)
    #   download artworks from bookmark
    #   1st parameter is max download number, default is 200
    #   2nd parameter is flow capacity, default is 1024MB
    # app = BookmarkCrawler(n_images=20, capacity=200)
    # app.run()

    # case 3:
    #   download artworks from a single artist
    #   1st parameter is artist id
    #   2nd parameter is flow capacity, default is 1024MB
    # app = UserCrawler(artist_id="32548944", capacity=200)
    # app.run()

    # case 4: (need premium & cookie !!!)
    #   download search results of a keyword (sorted by popularity)
    #   1st parameter is keyword
    #   2nd parameter is max download number
    #   3rd parameter is flow capacity
    # app = KeywordCrawler(keyword="百合", n_images=200, capacity=1024*256)
    # app = RankingCrawler(capacity=1024*8)
    # app.run()

CWHer commented

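The program stops automatically once the traffic limit is exceeded. You can raise the limit by increasing the capacity parameter (in MB), for example: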

app = RankingCrawler(capacity=1024 * 10)
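
To illustrate why a small capacity ends the run early, here is a minimal sketch of a traffic budget. It is not the project's actual implementation: FlowTracker and download_all are hypothetical names, and the only detail taken from this thread is that capacity is expressed in MB.

```python
class FlowTracker:
    """Hypothetical helper: tracks traffic against a budget given in MB."""

    def __init__(self, capacity_mb: float = 1024):
        self.capacity_bytes = capacity_mb * 1024 * 1024
        self.used_bytes = 0

    def add(self, n_bytes: int) -> bool:
        """Record n_bytes of traffic; return True while still within budget."""
        self.used_bytes += n_bytes
        return self.used_bytes <= self.capacity_bytes


def download_all(sizes, capacity_mb):
    """Simulate downloading files of the given byte sizes.

    Returns how many files completed before the budget was exceeded.
    """
    tracker = FlowTracker(capacity_mb)
    done = 0
    for size in sizes:
        if not tracker.add(size):
            break  # budget exceeded: stop early, as the crawler does
        done += 1
    return done


if __name__ == "__main__":
    files = [400 * 1024 * 1024] * 5  # five simulated 400 MB downloads
    print(download_all(files, capacity_mb=1024))
    print(download_all(files, capacity_mb=1024 * 10))
```

With the default-sized budget the simulated run stops partway through, while the tenfold budget lets every file finish.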

OK, thanks a lot.

CWHer commented
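  • Added CONTENT_MODE for the ranking mode

    The configuration file is at ./pixiv_crawler/config.py

    CONTENT_MODE: whether to download illustrations, manga, or all types of artworks (see CONTENT_MODES in that file)

    Set it to "illust" to download only illustrations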

    Related commit
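
As a rough sketch of how a content mode could translate into a ranking URL, consider the following. Everything except the "illust" value is an assumption: the CONTENT_MODES tuple, the ranking_url helper, and the query parameters of the ranking.php endpoint are illustrative and may differ from what config.py and the crawler actually use.

```python
from urllib.parse import urlencode

# Assumed values; the real list is CONTENT_MODES in ./pixiv_crawler/config.py
CONTENT_MODES = ("all", "illust", "manga")


def ranking_url(date: str, content_mode: str = "all",
                mode: str = "daily", page: int = 1) -> str:
    """Build a ranking URL (hypothetical helper, assumed query parameters)."""
    if content_mode not in CONTENT_MODES:
        raise ValueError(f"unknown content mode: {content_mode!r}")
    params = {"mode": mode, "date": date, "p": page, "format": "json"}
    if content_mode != "all":
        # Restrict the ranking to a single category, e.g. illustrations
        params["content"] = content_mode
    return "https://www.pixiv.net/ranking.php?" + urlencode(params)


if __name__ == "__main__":
    print(ranking_url("20240101", content_mode="illust"))
```

Validating the mode against a fixed tuple means a typo in the configuration fails immediately instead of silently requesting the combined ranking.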