
Crawl light novels from linovelib (https://w.linovelib.com/) and convert them to EPUB.

Primary language: Python. License: GNU Affero General Public License v3.0 (AGPL-3.0).

linovelib2epub

Crawl light novels from 哔哩轻小说 (linovelib) and convert them to EPUB.



A picture is worth a thousand words. Talk is cheap, so here is the real effect.

[Preview: demo recording]

This demo was recorded with a screen recorder tool.

Features

  • flexible has_illustration and divide_volume options for EPUB output
  • support for downloading specific volumes of a novel
  • built-in HTTP request retry mechanism for better network fault tolerance
  • built-in random browser User-Agent via the fake_useragent library
  • built-in strict integrity check for downloaded images
  • built-in saving of temporary book data via the pickle library
  • multi-process image downloading
  • support for adding custom CSS styles to the EPUB
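A retry wrapper like the one listed above can be pictured roughly as follows. This is a minimal sketch, not the project's actual code: fetch_with_retries is a hypothetical helper, and the fixed delay between attempts is an assumption. Its retries argument plays the role of the http_retries option: the maximum number of retries after a request fails.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def fetch_with_retries(request: Callable[[], T], retries: int = 5, delay: float = 1.0) -> T:
    """Run `request`; on failure, retry it up to `retries` more times.

    Hypothetical helper, not linovelib2epub's real retry code.
    """
    for attempt in range(retries + 1):  # 1 initial try + `retries` retries
        try:
            return request()
        except Exception:  # a real crawler would catch narrower network errors
            if attempt == retries:
                raise  # out of retries: re-raise the last error
            time.sleep(delay)  # assumed fixed pause before the next attempt
    raise AssertionError("unreachable")
```

In a real crawler, request would be a zero-argument function that issues one HTTP GET with the configured timeout, for example lambda: requests.get(url, timeout=10) if the requests library is used.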

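Supported Websites (Planned)

| No. | Site | Language | Crawl difficulty | Support status | Notes |
|---|---|---|---|---|---|
| 1 | 哔哩轻小说 (Mobile) | Simplified/Traditional | Medium 😰 | 🆗 | Default option. |
| 2 | 哔哩轻小说 (Web) | Simplified/Traditional | Medium 😰 | 🚫 | Same resources as Mobile; not needed. |
| 3 | 轻之国度 | Simplified/Traditional | High 🤣👿 | 🚫 | Requires login and a light-coin (轻币) threshold; navigation and categories are messy. |
| 4 | 无限轻小说 | | Medium 😰 | | No login required. One chapter spans multiple pages. |
| 5 | 轻小说文库 | Simplified/Traditional | Medium 😰 | | Requires login. One chapter per page. |
| 6 | 轻小说百科 | Simplified/Traditional | Low 😆 | | No login required; one chapter per page. Unfortunately, the illustrations are low resolution. |
| 7 | 真白萌 | Simplified/Traditional | Medium 😰 | | Requires login; one chapter per page. |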

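Crawler-friendliness comes down to two key factors:

  1. Access barrier: whether login and points are required.
  2. Page structure: a site that renders one chapter across multiple pages counts as medium difficulty.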

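If you find another good light novel source with a rich catalog, timely updates, clear illustrations, and a reasonable crawling barrier, feel free to suggest it in an issue.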

In the code, support for another light novel source comes down to subclassing the BaseNovelWebsiteSpider class and overriding its methods.
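The sketch below only illustrates the inherit-and-override pattern. BaseNovelWebsiteSpider here is a hypothetical stand-in defined inline, and its method names (get_book_title, get_chapter_urls, get_chapter_text, crawl) are assumptions rather than the real interface; check the actual class in the source tree before writing a new spider.

```python
from abc import ABC, abstractmethod


class BaseNovelWebsiteSpider(ABC):
    """Hypothetical stand-in for the project's real base class."""

    def __init__(self, book_id: int) -> None:
        self.book_id = book_id

    @abstractmethod
    def get_book_title(self) -> str: ...

    @abstractmethod
    def get_chapter_urls(self) -> list[str]: ...

    @abstractmethod
    def get_chapter_text(self, url: str) -> str: ...

    def crawl(self) -> dict[str, object]:
        # Shared workflow: subclasses only supply the site-specific pieces.
        return {
            "title": self.get_book_title(),
            "chapters": [self.get_chapter_text(u) for u in self.get_chapter_urls()],
        }


class ExampleSiteSpider(BaseNovelWebsiteSpider):
    """Hypothetical spider for a made-up site; real code would fetch and parse HTML."""

    def get_book_title(self) -> str:
        return f"Book {self.book_id}"

    def get_chapter_urls(self) -> list[str]:
        return [f"https://example.com/{self.book_id}/{i}.html" for i in (1, 2)]

    def get_chapter_text(self, url: str) -> str:
        return f"text of {url}"
```

Keeping the workflow in the base class means a new source only has to implement the site-specific lookups; the abstract methods make Python refuse to instantiate a spider that forgets one.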

Usage

Install from source

  1. Clone this repo:
git clone https://github.com/lightnovel-center/linovelib2epub.git
  2. Set up a clean local Python venv:

See also: creating-virtual-environments

Replace py with your actual Python command if needed, e.g. python or python3.

# create a venv
py -m venv venv

# activate the venv (Windows; on Linux/macOS use: source venv/bin/activate)
.\venv\Scripts\activate

# install dependencies
py -m pip install -r requirements.txt

# install this package locally in editable mode
# run from the project root folder: linovelib2epub/
py -m pip install -e .
  3. You can now use this package just as if it had been installed from PyPI:
from linovelib2epub.linovel import Linovelib2Epub

# Warning: must run under the __main__ guard because of how child processes are spawned.
if __name__ == '__main__':
    linovelib_epub = Linovelib2Epub(book_id=3279)
    linovelib_epub.run()

Install from PyPI

  1. Install this package from PyPI:
pip install linovelib2epub
  2. Create a Python file with the following content:
from linovelib2epub.linovel import Linovelib2Epub

# Warning: must run under the __main__ guard because of how child processes are spawned.
if __name__ == '__main__':
    linovelib_epub = Linovelib2Epub(book_id=3279)
    linovelib_epub.run()

If it finishes without errors, the EPUB file will be in the folder where your Python file is located.

Options

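| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| book_id | number | YES | None | Book ID. |
| base_url | string | NO | 'https://w.linovelib.com/novel' | Homepage URL of 哔哩轻小说 (linovelib). |
| divide_volume | boolean | NO | False | Whether to split the book by volume. |
| select_volume_mode | boolean | NO | False | Volume selection mode. When True, divide_volume is forced to True. |
| has_illustration | boolean | NO | True | Whether to download illustrations. |
| image_download_folder | string | NO | "images" | Temporary folder for downloaded images. Must not start with the relative path ../. |
| pickle_temp_folder | string | NO | "pickle" | Folder where temporary pickle data is saved. |
| http_timeout | number | NO | 10 | Timeout of a single HTTP request, in seconds. Applies to both the connect and read timeouts. |
| http_retries | number | NO | 5 | Maximum number of retries after an HTTP request fails. |
| http_cookie | string | NO | '' | Custom HTTP cookie. |
| custom_style_cover | string | NO | '' | Custom style for cover.xhtml. |
| custom_style_nav | string | NO | '' | Custom style for nav.xhtml. |
| custom_style_chapter | string | NO | '' | Custom style for each chapter (?.xhtml). |
| disable_proxy | boolean | NO | True | Whether to bypass the proxy configured in the current environment. Bypassed by default. |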
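The usage examples pass book_id as a keyword argument, which suggests the other options are passed the same way. The sketch below is a hypothetical helper, build_options, that is not part of linovelib2epub: it fills in the defaults from the table and applies the two documented constraints (select_volume_mode forces divide_volume, and image_download_folder must not start with ../). The choice of TypeError and ValueError is an assumption for illustration.

```python
from typing import Any

# Defaults copied from the Options table; book_id has no default and is required.
DEFAULTS: dict[str, Any] = {
    "base_url": "https://w.linovelib.com/novel",
    "divide_volume": False,
    "select_volume_mode": False,
    "has_illustration": True,
    "image_download_folder": "images",
    "pickle_temp_folder": "pickle",
    "http_timeout": 10,
    "http_retries": 5,
    "http_cookie": "",
    "custom_style_cover": "",
    "custom_style_nav": "",
    "custom_style_chapter": "",
    "disable_proxy": True,
}


def build_options(book_id: int, **overrides: Any) -> dict[str, Any]:
    """Hypothetical helper: merge overrides into the documented defaults."""
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise TypeError(f"unknown option(s): {sorted(unknown)}")
    options = {"book_id": book_id, **DEFAULTS, **overrides}
    if options["select_volume_mode"]:
        options["divide_volume"] = True  # documented: select mode forces divide_volume
    if str(options["image_download_folder"]).startswith("../"):
        raise ValueError("image_download_folder must not start with ../")
    return options
```

Assuming Linovelib2Epub accepts these names as keyword arguments, as the table suggests, the result could be passed on as Linovelib2Epub(**build_options(3279, divide_volume=True)) inside the __main__ guard.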

Todo

  • quality: set up pytest and codecov
  • quality: set up more formatters and linters for maintainability

Contributors

  • GokouRuri: 🐛 💻 (bug reports, code)
  • xxxfhy: 🐛 (bug reports)

Acknowledgements