A manga download framework using Selenium.
Currently supported websites:
- bookwalker.jp
- bookwalker.com.tw
- cmoa.jp

The program automatically detects the website from the given URL.
If the website is not supported, the program raises an error.
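The automatic check can be pictured as a loop over the available site classes, each exposing a `check_url` static method like the `BookwalkerJP` class shown later in this README. This is only a sketch: the stub classes, the `SITE_CLASSES` list, the `find_site_class` helper, and the exception type are assumptions made for illustration, not the project's real code.

```python
class BookwalkerJPStub:
    """Hypothetical stand-in for a site class; only check_url is sketched."""

    @staticmethod
    def check_url(manga_url):
        return 'bookwalker.jp' in manga_url


class CmoaJPStub:
    """Hypothetical stand-in for a second site class."""

    @staticmethod
    def check_url(manga_url):
        return 'cmoa.jp' in manga_url


# Assumed registry of known site classes.
SITE_CLASSES = [BookwalkerJPStub, CmoaJPStub]


def find_site_class(manga_url):
    """Return the first class that claims the URL, or raise if none does."""
    for site_class in SITE_CLASSES:
        if site_class.check_url(manga_url):
            return site_class
    raise NotImplementedError('Unsupported website: %s' % manga_url)
```

Under these assumptions, `find_site_class('https://www.cmoa.jp/bib/speedreader/speed.html')` returns `CmoaJPStub`, and an unknown URL raises an error.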
Batch download is now supported: multiple manga can be downloaded with a single login.
Prepare the following information:
- Manga URL
- Cookies
- Image dir (the name of the folder where the images will be saved)
- For some websites, check the size of the original image and set it in res. Cmoa.jp doesn't need this.

All the settings are in main.py.
    settings = {
        # Manga URLs; all of them must be from the same website.
        'manga_url': [
            'URL_1',
            'URL_2'
        ],
        # Your cookies.
        'cookies': 'YOUR_COOKIES_HERE',
        # Folder names to store the manga, in the same order as manga_url.
        'imgdir': [
            'IMGDIR_FOR_URL_1',
            'IMGDIR_FOR_URL_2'
        ],
        # Resolution, (width, height). For cmoa.jp this doesn't matter.
        'res': (1393, 2048),
        # Sleep time for each page (seconds); normally no need to change.
        'sleep_time': 2,
        # Time to wait for page loading (seconds); if your network is good, you can reduce this.
        'loading_wait_time': 20,
        # Cut image, (left, upper, right, lower) in pixels; None means do not cut the image.
        # This is often used to cut the edges.
        # For example, (0, 0, 0, 3) cuts 3 pixels from the bottom of the image.
        'cut_image': None,
        # File name prefix; if you want file names like 'klk_v1_001.jpg', write 'klk_v1' here.
        'file_name_prefix': '',
        # Number of digits in the file name; if you want file names like '001.jpg', write 3 here.
        'number_of_digits': 3,
        # Start page; to download from page 3, set this to 3. None means from 0.
        'start_page': None,
        # End page; to download until page 10, set this to 10. None means until finished.
        'end_page': None,
    }
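Because `manga_url` and `imgdir` are matched by position, it can help to check that the two lists line up before starting a long download. The helper below is a hypothetical sketch (`pair_urls_with_dirs` is not part of the project) that only uses the two keys shown above.

```python
def pair_urls_with_dirs(settings):
    """Pair each manga URL with its image folder, by position."""
    urls = settings['manga_url']
    dirs = settings['imgdir']
    if len(urls) != len(dirs):
        raise ValueError(
            'manga_url has %d entries but imgdir has %d'
            % (len(urls), len(dirs)))
    return list(zip(urls, dirs))


example_settings = {
    'manga_url': ['URL_1', 'URL_2'],
    'imgdir': ['IMGDIR_FOR_URL_1', 'IMGDIR_FOR_URL_2'],
}
pairs = pair_urls_with_dirs(example_settings)
```

Here `pairs` holds one `(url, folder)` tuple per manga, in the order given in the settings.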
This program currently works with Chrome. If you use another browser, please check this page.
- Install the Python packages selenium and Pillow, and get the Google Chrome driver.
  - For selenium and Pillow:

        pip install selenium
        pip install Pillow
- Change imgdir in main.py to indicate where to put the manga.
- Add your cookies to the program.
  Remember to use F12 (the browser developer tools) to read the cookies, because HttpOnly cookies cannot be read by JavaScript.
  Go to the links below to get the cookies:
  - For [Bookwalker.jp] cookies, go here.
  - For [Bookwalker.com.tw] cookies, go here.
  - For [www.cmoa.jp] cookies, go here. I'm not sure about this website, because I cannot buy manga on it. If there is any problem, please let me know.

  Then follow these steps:
  - Open the page.
  - Press F12.
  - Click the Network tab.
  - Refresh the page.
  - Find the first profile request and click it.
  - On the right there is a Request Headers section; go there.
  - Find the cookie: ... header, copy the string after cookie:, and paste it into main.py in place of YOUR_COOKIES_HERE.
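The string copied from the request header has the form `name1=value1; name2=value2`. A minimal sketch of how such a string could be split into name/value pairs is shown below; `parse_cookie_header` is a hypothetical helper, and whether the project parses cookies this way is an assumption. Each resulting dictionary has the `name` and `value` keys that Selenium's `driver.add_cookie` expects.

```python
def parse_cookie_header(cookie_string):
    """Split 'a=1; b=2' into a list of {'name': ..., 'value': ...} dicts."""
    cookies = []
    for part in cookie_string.split(';'):
        part = part.strip()
        # Skip empty fragments and fragments without a name/value separator.
        if not part or '=' not in part:
            continue
        # Split only on the first '=' because values may contain '='.
        name, value = part.split('=', 1)
        cookies.append({'name': name.strip(), 'value': value.strip()})
    return cookies


# Made-up cookie string, for illustration only.
parsed = parse_cookie_header('session=abc123; token=x=y; ')
```

With a live browser session, each entry could then be passed to `driver.add_cookie(entry)` after loading the login page.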
- Change manga_url in main.py.
  - For [Bookwalker.jp]
    First go to 購入済み書籍一覧 (the list of purchased books), where you can find all your manga.
    The URL to use is the URL of the 'この本を読む' (read this book) button for your manga.
    Right-click this button and click 'Copy link address'.
    The URL starts with member.bookwalker.jp, not viewer.bookwalker.jp. Here we use the manga 【期間限定 無料お試し版】あつまれ!ふしぎ研究部 1 as an example.
    This is the URL of あつまれ!ふしぎ研究部 1: https://member.bookwalker.jp/app/03/webstore/cooperation?r=BROWSER_VIEWER/640c0ddd-896c-4881-945f-ad5ce9a070a6/https%3A%2F%2Fbookwalker.jp%2FholdBooks%2F
  - For [Bookwalker.com.tw]
    Please go to 线上阅读 (read online).
    The manga URL looks like this: https://www.bookwalker.com.tw/browserViewer/56994/read
  - For [Cmoa.jp]
    Open the manga and just copy the URL from the browser's address bar.
    The manga URL looks like this: https://www.cmoa.jp/bib/speedreader/speed.html?cid=0000156072_jp_0001&u0=0&u1=0&rurl=https%3A%2F%2Fwww.cmoa.jp%2Fmypage%2Fmypage_top%2F%3Ftitle%3D156072
    Just copy this URL into manga_url in main.py.
- After editing the settings, run python main.py to start the download.
- sleep_time is 2 seconds by default. You can adjust it to suit your network: if the download contains repeated images, change it to 5 or more; if you think it is too slow, try 1 or even 0.5.
- 'loading_wait_time': 20 is the time to wait for the manga viewer page to load. If your network is not good, you can set it to 30 or 50 seconds.
- Resolution: you can change it as you want, but check the original image resolution first.

      'res': (784, 1200)

  If the original image has a higher resolution, you can change it like this (the resolution is just an example):

      'res': (1568, 2400)

  [Cmoa.jp] does not need this; the resolution is fixed by [Cmoa.jp].
- Sometimes you need to log out and log in again; this website is very strict and uses many methods to prevent abuse.
- You can cut the image by setting cut_image to (left, upper, right, lower). For example, to cut 3px from the bottom of the image, set it to:

      'cut_image': (0, 0, 0, 3)

  This function uses Pillow. To use it, install Pillow with the command: pip install Pillow
  The default is None, which means the image is not cut.
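The four values are amounts to trim from each edge, not coordinates. The sketch below shows one way to turn them into the coordinate box that Pillow's `Image.crop` takes; the helper name `crop_box` and this exact conversion are assumptions for illustration, not necessarily how the project does it.

```python
def crop_box(image_size, cut_image):
    """Convert per-edge trim amounts into a (left, upper, right, lower) box."""
    width, height = image_size
    if cut_image is None:
        # None means keep the whole image.
        return (0, 0, width, height)
    left, upper, right, lower = cut_image
    return (left, upper, width - right, height - lower)


# Trimming 3 pixels from the bottom of a 784x1200 page.
box = crop_box((784, 1200), (0, 0, 0, 3))
```

With Pillow installed, the result could be applied as `image.crop(box)` on an opened `Image`.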
- You can change the file name prefix and the number of digits by changing file_name_prefix and number_of_digits. For example, if you are downloading Kill La Kill Manga Volume 1 and you want the file names to look like this:

      KLK_V1
      │--KLK_V1_001.jpg
      │--KLK_V1_002.jpg
      │--KLK_V1_003.jpg

  then you can set the parameters as below:

      settings = {
          ...,
          'file_name_prefix': 'KLK_V1',
          # File name digits count, if you want your file name like '001.jpg', write 3 here.
          'number_of_digits': 3
      }
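The naming rule can be sketched as a small function. `build_file_name` is a hypothetical helper, and it assumes that an empty prefix yields a bare number such as `001.jpg`, as the comment on `number_of_digits` suggests.

```python
def build_file_name(prefix, number_of_digits, page, extension='jpg'):
    """Build a name such as 'KLK_V1_001.jpg' from the settings values."""
    # Zero-pad the page number to the requested width.
    number = str(page).zfill(number_of_digits)
    # Assumption: no underscore is added when the prefix is empty.
    stem = '%s_%s' % (prefix, number) if prefix else number
    return '%s.%s' % (stem, extension)


names = [build_file_name('KLK_V1', 3, page) for page in (1, 2, 3)]
```

Under these assumptions, `names` matches the three file names in the folder listing above.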
- Concept
  To download manga, we normally proceed like this:

    +------------+     +-----------+      +------------+      +-------------------+    +--------------+
    |            |     |           |      |            |      |                   |OVER|              |
    |   Login    +-----+ Load page +----->+ Save image +----->+ Move to next page +----+   Finished   |
    |            |     |           |      |            |      |                   |    |              |
    +------------+     +-----------+      +-----+------+      +---------+---------+    +--------------+
                                                ^                       |
                                                |                       |
                                                |       More page       |
                                                +-----------------------+

  So we can create a framework that reuses this code; for a new website, we normally only need to write a few of the methods.
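The flow above can be sketched as a loop that only talks to the site-specific methods. Everything here is illustrative: `FakeSite` is an in-memory stand-in, `download_all` is a hypothetical name, the login step is left out, and pages are assumed to be numbered from 0. The real downloader drives a Selenium browser instead.

```python
class FakeSite:
    """Hypothetical in-memory stand-in for a website actions class."""

    def __init__(self, pages):
        self.pages = pages
        self.current = 0

    def get_sum_page_count(self, driver):
        return len(self.pages)

    def move_to_page(self, driver, page):
        self.current = page

    def wait_loading(self, driver):
        pass  # Nothing to wait for in this fake.

    def get_imgdata(self, driver, now_page):
        return self.pages[now_page]


def download_all(site, driver, save):
    """Load page -> save image -> move to next page, until no pages remain."""
    total = site.get_sum_page_count(driver)
    for page in range(total):
        site.move_to_page(driver, page)
        site.wait_loading(driver)
        save(page, site.get_imgdata(driver, page))
    return total


saved = {}
count = download_all(FakeSite([b'a', b'b', b'c']), None, saved.__setitem__)
```

Because the loop depends only on these methods, supporting another website means supplying another class with the same methods.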
- Structure of file

    |--main.py
    │--downloader.py
    │--README.MD
    └─website_actions
        │--abstract_website_actions.py
        │--bookwalker_jp_actions.py
        │--bookwalker_tw_actions.py
        │--cmoa_jp_actions.py
        │--__init__.py
- Introduction to the abstract WebsiteActions class.
  For each website, the class should have the following methods/attributes. Here we use bookwalker.jp as an example:

    import base64

    from selenium.webdriver.support.ui import WebDriverWait


    class BookwalkerJP(WebsiteActions):
        '''
        bookwalker.jp
        '''

        # login_url is the page that we load first and put the cookies.
        login_url = 'https://member.bookwalker.jp/app/03/login'

        @staticmethod
        def check_url(manga_url):
            '''
            This method returns a bool: whether the given manga URL belongs to this class.
            '''
            return manga_url.find('bookwalker.jp') != -1

        def get_sum_page_count(self, driver):
            '''
            This method returns an integer: the total number of pages.
            '''
            return int(str(driver.find_element_by_id('pageSliderCounter').text).split('/')[1])

        def move_to_page(self, driver, page):
            '''
            This method returns nothing; it moves to the given page number.
            '''
            driver.execute_script(
                'NFBR.a6G.Initializer.B0U.menu.a6l.moveToPage(%d)' % page)

        def wait_loading(self, driver):
            '''
            This method returns nothing; it waits for the manga to load.
            '''
            WebDriverWait(driver, 30).until_not(lambda x: self.check_is_loading(
                x.find_elements_by_css_selector(".loading")))

        def get_imgdata(self, driver, now_page):
            '''
            This method returns the image data: a string or something that can be
            written to a file or converted to BytesIO.
            '''
            canvas = driver.find_element_by_css_selector(".currentScreen canvas")
            img_base64 = driver.execute_script(
                "return arguments[0].toDataURL('image/jpeg').substring(22);", canvas)
            return base64.b64decode(img_base64)

        def get_now_page(self, driver):
            '''
            This method returns an integer: the page number of the current page.
            '''
            return int(str(driver.find_element_by_id('pageSliderCounter').text).split('/')[0])
  We also have a before_download method. It runs before the download starts, because some websites need a pop-up component to be closed first.

    def before_download(self, driver):
        '''
        This method returns nothing; it runs before the download.
        '''
        driver.execute_script('parent.closeTips()')
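The methods listed above form a contract that every website class fulfils. One way to express such a contract in Python is an abstract base class, sketched below. `WebsiteActionsSketch` is a hypothetical outline written for this README; the real class in abstract_website_actions.py may be organized differently.

```python
from abc import ABC, abstractmethod


class WebsiteActionsSketch(ABC):
    """Hypothetical outline of the contract, not the project's base class."""

    # Page loaded first so that the cookies can be applied.
    login_url = None

    @staticmethod
    @abstractmethod
    def check_url(manga_url):
        """Return True if manga_url belongs to this website."""

    @abstractmethod
    def get_sum_page_count(self, driver):
        """Return the total number of pages."""

    @abstractmethod
    def move_to_page(self, driver, page):
        """Move the viewer to the given page."""

    @abstractmethod
    def wait_loading(self, driver):
        """Block until the current page has loaded."""

    @abstractmethod
    def get_imgdata(self, driver, now_page):
        """Return the image data of the current page."""

    @abstractmethod
    def get_now_page(self, driver):
        """Return the page number currently shown."""

    def before_download(self, driver):
        """Optional hook run once before downloading; does nothing by default."""
```

With this design, a subclass that forgets one of the required methods cannot be instantiated, while `before_download` stays optional.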