
A Manga download framework using selenium.


NEW METHOD FOR BW TO TRY!

New Version

v0.3.3 (Update Recommended)

  • Update to Chromium 112.0.5590.0.
  • Better support for the page-number pattern of BW books; novels are now supported.

Download it from the releases or here: Windows x64 release build v0.3.3

v0.3.2

This version adds the following features for BW:

  • Can download the cover. If the cover image is a JPEG, check it, or better, convert it to PNG before sharing it, because the JPEG file will contain your BW account info.
  • Names each image automatically by its page number, so you can start at any page and go forward or backward.
  • Names the folder with the BW UUID instead of a random one.
  • Blank or repeated pages are no longer skipped, and image hashing is no longer used to detect repeated pages, which gives better performance.

[Example screenshot]

If all file names become "cover_or_extra_xxx" when downloading some manga, please file a bug. BW may use URL patterns I have not seen yet, or may have changed the pattern; the pattern needs to be covered for the page numbering to work correctly.

Download it from the releases or here: Windows x64 release build v0.3.2

v0.3.1

This version improves the performance of saving snapshots. If the browser becomes very slow during downloading, please try the new version.

Download it from the releases or here: Windows x64 release build v0.3.1

v0.3

Fixed a problem where manga narrower than 800px could not be downloaded (see #113).

Download it from the releases or here: Windows x64 release build v0.3

v0.2.1

Bump Chromium to 109.0.5393; this may fix some problems.

Download it from the releases or here: Windows x64 release build v0.2.1

v0.2

This version is based on Chromium 106.0.5243.0. Changes:

  • Supports https://ebook.tongli.com.tw; downloaded images are saved in C:\bw_export_data\TONGLI_URL_STRING.
  • Supports https://www.dlsite.com, but only by saving the cached images, so the final 3 to 4 pages have to be downloaded as follows (for example, for a 10-page book):
    • Go through pages 1 to 10 (make sure each page is fully loaded before going to the next one).
    • At page 10, you may find images only for pages 1 to 7.
    • Go back from page 10 to page 5; the final pages will then be saved (possibly in reverse order).
    • Currently we cannot do better than this.
  • Works for https://book.dmm.com; use the script below to turn the pages:
  window.i=0;setInterval(()=>{NFBR.a6G.Initializer.views_.menu.options.a6l.moveToPage(window.i);console.log(window.i);window.i++;},3000)

The script above is for DMM, for BW please use the script below:

  window.i=0;setInterval(()=>{NFBR.a6G.Initializer.L7v.menu.options.a6l.moveToPage(window.i);console.log(window.i);window.i++;},3000)
  • May be slightly faster for BW, and can download some images whose width is greater than their height.

Download it from the releases or here: Windows x64 release build v0.2

How to use (same as the old version):

  1. Unzip the file BW-downloader-chrome-bin.zip.
  2. Open PowerShell or cmd and cd to the unzipped browser directory.
  3. Start the browser with the command .\chrome.exe --user-data-dir=c:\bw-downloader-profile --no-sandbox
  4. Browse the manga; it will be saved to C:\bw_export_data
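To check what the customized browser has saved so far, you can count the images in each sub-folder of C:\bw_export_data. The script below is a minimal sketch and not part of this repo; the function name summarize_exports and the assumption that images are saved as .png/.jpg/.jpeg files are both hypothetical.

```python
from pathlib import Path

# Assumed image extensions in the export folders; adjust if yours differ.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def summarize_exports(root):
    """Return {folder_name: image_count} for each sub-folder of root."""
    root = Path(root)
    summary = {}
    for folder in sorted(p for p in root.iterdir() if p.is_dir()):
        # Count only files whose extension looks like an image.
        count = sum(
            1 for f in folder.iterdir()
            if f.is_file() and f.suffix.lower() in IMAGE_SUFFIXES
        )
        summary[folder.name] = count
    return summary


if __name__ == "__main__":
    export_root = Path(r"C:\bw_export_data")
    if export_root.is_dir():
        for name, count in summarize_exports(export_root).items():
            print(f"{name}: {count} images")
```

A folder whose count stops growing while you are still turning pages is a hint that the page-turning interval is too short.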

Do not use it for other websites; use it only as a manga downloader. It is not as safe as the normal Chrome browser!

Old Version

Download it from the releases or here: Windows x64 release build v0.1

If you are looking for a way to download from BW, please try this method; it is really worth trying, and you will like it!

For cmoa, please see below.

There is now a new method: with a customized Chromium browser, it can easily download BW's original images at their original size. It can download both manga and novels, and may work on any website that uses a canvas to render pages (currently only tested on BW).

It is only a dev version and may have bugs or crash, but you can download the customized browser and try it now.

Do not use it for other websites; use it only as a BW downloader. It is not as safe as the normal Chrome browser!

Clone this repo or download only BW-downloader-chrome-bin.zip.

  1. Unzip the file BW-downloader-chrome-bin.zip.

  2. Open PowerShell or cmd and cd to the unzipped browser directory.

  3. Start the browser with the command .\chrome.exe --user-data-dir=c:\bw-downloader-profile --no-sandbox

  4. Resize your browser window so that it is small enough to display only one manga page.

  5. Go to the BW website, log in, and open the manga you want to download. Remember to reset the manga's read status before opening it.

  6. Press F12 and undock DevTools into a separate window. Go to the Console, paste the script below, and press Enter; it turns the pages automatically. The 3000 is the interval in milliseconds (3000 ms = 3 s), so the script moves to the next page every 3 seconds. If your network is fast, you can use a smaller number.

    You can also turn the pages manually with a left mouse click, the keyboard arrow keys, or keyboard-simulation software. Use whichever you like; just make sure the pages keep moving.

    window.i=0;setInterval(()=>{NFBR.a6G.Initializer.L7v.menu.options.a6l.moveToPage(window.i);console.log(window.i);window.i++;},3000)

    If this does not work and shows 'Uncaught TypeError: Cannot read properties of undefined (reading 'menu') at :1:54', BW has probably updated its JavaScript and renamed the object. You can look it up in the console: find the property * for which NFBR.a6G.Initializer.*.menu is not undefined, and use that name in place of L7v. Or just file a bug.


  7. Now check C:\bw_export_data: you will find a folder with a random UUID name containing all the manga images.
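Because the Initializer object name (L7v for BW above, views_ for DMM) changes whenever the site updates its JavaScript, it can help to generate the page-turning script instead of editing it by hand. The Python sketch below is hypothetical (build_page_turner and FIND_MENU_KEY_JS are not part of the repo). It only reproduces the console script from step 6 with the object name, interval and start page filled in, and the JavaScript lookup assumes the object holding menu is an own enumerable property of NFBR.a6G.Initializer.

```python
def build_page_turner(initializer_key="L7v", interval_ms=3000, start_page=0):
    """Build the step 6 console script for a given Initializer key.

    initializer_key: property of NFBR.a6G.Initializer that holds `menu`.
    interval_ms: delay between page turns in milliseconds (3000 = 3 s).
    start_page: value window.i starts from (the first page moved to).
    """
    if not initializer_key.isidentifier():
        raise ValueError(f"not a valid property name: {initializer_key!r}")
    if interval_ms <= 0:
        raise ValueError("interval_ms must be positive")
    return (
        f"window.i={start_page};setInterval(()=>{{"
        f"NFBR.a6G.Initializer.{initializer_key}.menu.options.a6l.moveToPage(window.i);"
        "console.log(window.i);window.i++;"
        f"}},{interval_ms})"
    )


# Paste this into the DevTools console to list keys whose `menu` is defined,
# i.e. candidates to replace L7v after an update.
FIND_MENU_KEY_JS = (
    "Object.keys(NFBR.a6G.Initializer)"
    ".filter(k => NFBR.a6G.Initializer[k] && NFBR.a6G.Initializer[k].menu)"
)

print(build_page_turner())
```

For example, build_page_turner("views_") gives the DMM script from the v0.2 notes, and build_page_turner(interval_ms=1500) turns a page every 1.5 s.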

To download multiple manga at the same time, open as many as you want and repeat steps 5 and 6 for each one.

This method is easy to use and stable, needs no resolution or cookie settings, and downloads the real original images without a barcode.

A browser UI with a download button may be added in the future.

The cover page may not be downloaded yet.

Only built for Windows; there are no builds for other platforms yet.

If you find any problems, please file a bug. Thank you!

Manga_Downloader

A Manga download framework using selenium.

Currently supported websites:

  1. Bookwalker.jp
  2. Bookwalker.com.tw
  3. Cmoa.jp

The program detects the website automatically from the given URL.

If the website is not supported, the program raises an error.
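The URL check can be pictured as asking each website class whether it owns the URL. This is a minimal sketch with hypothetical stand-in classes: the real classes in website_actions implement check_url themselves (BookwalkerJP uses a substring search, as shown under Develop), whereas these stubs match the URL's hostname instead, so a site name that only appears in the query string is not matched. The same-website check follows the settings comment that all manga URLs should be from one website.

```python
from urllib.parse import urlparse


class SiteActionsStub:
    """Hypothetical stand-in for a WebsiteActions subclass (check_url only)."""
    domain = ""

    @classmethod
    def check_url(cls, manga_url):
        host = urlparse(manga_url).hostname or ""
        # Accept the domain itself or any sub-domain (member., www., ...).
        return host == cls.domain or host.endswith("." + cls.domain)


class BookwalkerJP(SiteActionsStub):
    domain = "bookwalker.jp"


class BookwalkerTW(SiteActionsStub):
    domain = "bookwalker.com.tw"


class CmoaJP(SiteActionsStub):
    domain = "cmoa.jp"


ACTIONS = [BookwalkerJP, BookwalkerTW, CmoaJP]


def pick_actions(manga_urls, actions=ACTIONS):
    """Return the one class that handles every URL, or raise ValueError."""
    if not manga_urls:
        raise ValueError("No manga URL given")
    picked = set()
    for url in manga_urls:
        matches = [cls for cls in actions if cls.check_url(url)]
        if not matches:
            raise ValueError(f"Unsupported website: {url}")
        picked.add(matches[0])
    if len(picked) != 1:
        raise ValueError("All manga URLs must belong to the same website")
    return picked.pop()
```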

Multiple manga can now be downloaded in one batch with a single login.

You need to prepare the following information:

  1. Manga URL
  2. Cookies
  3. Image dir (the folder name in which to save the images)
  4. For some websites, check the image size and set it in res. Cmoa.jp does not need this.

How to Use

All settings

All the settings are in main.py.

settings = {
    # Manga urls, should be the same website
    'manga_url': [
        'URL_1',
        'URL_2'
    ],
    # Your cookies
    'cookies': 'YOUR_COOKIES_HERE',
    # Folder names to store the Manga, the same order with manga_url
    'imgdir': [
        'IMGDIR_FOR_URL_1',
        'IMGDIR_FOR_URL_2'
    ],
    # Resolution, (Width, Height), For cmoa.jp this doesn't matter.
    'res': (1393, 2048),
    # Sleep time for each page (seconds); normally no need to change.
    'sleep_time': 2,
    # Time to wait for the page to load (seconds); if your network is fast, you can reduce this.
    'loading_wait_time': 20,
    # Pixels to cut from each side, (left, upper, right, lower); None means do not cut the image. Often used to trim the edges.
    # For example, (0, 0, 0, 3) cuts 3 pixels from the bottom of the image.
    'cut_image': None,
    # File name prefix, if you want your file name like 'klk_v1_001.jpg', write 'klk_v1' here.
    'file_name_prefix': '',
    # File name digits count, if you want your file name like '001.jpg', write 3 here.
    'number_of_digits': 3,
    # Start page, if you want to download from page 3, set this to 3, None means from 0
    'start_page': None,
    # End page, if you want to download until page 10, set this to 10, None means until finished
    'end_page': None,
}
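Since manga_url and imgdir are matched by position, a mismatch in their lengths is an easy mistake. The function below is a hypothetical pre-flight check (validate_settings is not part of main.py); it assumes the key names shown above and treats cut_image as four pixel counts.

```python
def validate_settings(settings):
    """Hypothetical pre-flight check for the settings dict in main.py.

    Returns (manga_url, imgdir) pairs in download order.
    """
    urls, dirs = settings["manga_url"], settings["imgdir"]
    if len(urls) != len(dirs):
        raise ValueError(
            f"{len(urls)} manga_url entries but {len(dirs)} imgdir entries")
    cut = settings.get("cut_image")
    if cut is not None and (len(cut) != 4 or any(v < 0 for v in cut)):
        raise ValueError("cut_image must be None or 4 non-negative pixel counts")
    digits = settings.get("number_of_digits", 3)
    if not isinstance(digits, int) or digits < 1:
        raise ValueError("number_of_digits must be a positive integer")
    start, end = settings.get("start_page"), settings.get("end_page")
    if start is not None and end is not None and start > end:
        raise ValueError("start_page must not be after end_page")
    # The i-th URL is saved into the i-th folder.
    return list(zip(urls, dirs))
```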

Install environment & How to Get URL/Cookies

This program currently works with Chrome; if you use another browser, please check this page.

  1. Install the Python packages selenium and Pillow, and get ChromeDriver.

    1. For selenium and Pillow:
    pip install selenium
    pip install Pillow
    # undetected_chromedriver helps prevent us from being detected by BW
    pip install undetected_chromedriver
    2. For ChromeDriver:

      1. Check your Chrome version via 'Help' -> 'About Google Chrome'.

      2. Download the ChromeDriver that matches your Chrome version here.

      3. Put it into any folder and add that folder to PATH.

    3. For more info, I suggest checking here.

  2. Set imgdir in main.py to choose where to put the manga.

  3. Add your cookies in the program.

    Use F12 (DevTools) to view the cookies, because some HttpOnly cookies cannot be seen by JavaScript!

    Remember to get the cookies from the links below!

    1. For [Bookwalker.jp] cookies, go here.
    2. For [Bookwalker.com.tw] cookies, go here.
    3. For [www.cmoa.jp] cookies, go here; you must get them with the EditThisCookie extension (download it for Chrome here).
    • EditThisCookie: this works for any website above, and is required for cmoa.

      1. Go to user preferences (chrome-extension://fngmhnnpilhplaeedifhccceomclgfbg/options_pages/user_preferences.html) of EditThisCookie
      2. Set the cookie export format to Semicolon separated name=value pairs
      3. Go to cmoa, click the EditThisCookie and click export button
      4. Copy the cookies in the exported text (after the line // Example: http://www.tutorialspoint.com/javascript/javascript_cookies.htm) into main.py
    • The traditional way:

      1. Open the page.
      2. Press F12.
      3. Click on the Network.
      4. Refresh the page.
      5. Find the first profile request, click it.
      6. On the right, go to the Request Headers section.
      7. Find the cookie: header, copy the string after cookie:, and paste it into main.py in place of YOUR_COOKIES_HERE.
  4. Set manga_url in main.py.

    1. For [Bookwalker.jp]

      First go to 購入済み書籍一覧 (purchased books list), where you can find all your manga.

      The URL to use is the link of the 'この本を読む' (Read this book) button for your manga.

      Right click this button, and click 'Copy link address'.

      The URL starts with member.bookwalker.jp, not viewer.bookwalker.jp. Here we use the manga 【期間限定 無料お試し版】あつまれ!ふしぎ研究部 1 (a limited-time free trial edition) as an example.

      This is the URL of the あつまれ!ふしぎ研究部 1: https://member.bookwalker.jp/app/03/webstore/cooperation?r=BROWSER_VIEWER/640c0ddd-896c-4881-945f-ad5ce9a070a6/https%3A%2F%2Fbookwalker.jp%2FholdBooks%2F

    2. For [Bookwalker.com.tw]

      Please go to 线上阅读 (Read online).

      The manga URL looks like this: https://www.bookwalker.com.tw/browserViewer/56994/read

    3. For [Cmoa.jp]

      Open the manga and copy the URL from the browser's address bar.

      The manga URL looks like this: https://www.cmoa.jp/bib/speedreader/speed.html?cid=0000156072_jp_0001&u0=0&u1=0&rurl=https%3A%2F%2Fwww.cmoa.jp%2Fmypage%2Fmypage_top%2F%3Ftitle%3D156072

    Copy this URL into manga_url in main.py.

  5. After editing main.py, run python main.py.
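The framework first loads login_url and then puts the cookies (see login_url under Develop). The sketch below shows one way to turn the semicolon-separated string from step 3 into the name/value dictionaries that Selenium's driver.add_cookie() accepts. parse_cookie_string and add_cookies are hypothetical helpers, not the repo's code; the cookies are added without a domain or path, so Selenium applies them to the currently loaded domain.

```python
def parse_cookie_string(cookie_string):
    """Split 'name1=value1; name2=value2' into Selenium cookie dicts."""
    cookies = []
    for part in cookie_string.split(";"):
        part = part.strip()
        if "=" not in part:
            continue  # skip empty chunks and entries without a value
        name, value = part.split("=", 1)  # a value may itself contain '='
        cookies.append({"name": name.strip(), "value": value.strip()})
    return cookies


def add_cookies(driver, login_url, cookie_string):
    """Load login_url first, then add the cookies and reload the page."""
    # Selenium can only set cookies for the domain that is currently loaded.
    driver.get(login_url)
    for cookie in parse_cookie_string(cookie_string):
        driver.add_cookie(cookie)
    driver.refresh()
```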

Notice

  1. sleep_time defaults to 2 seconds; adjust it to your network. If the download contains repeated images, change it to 5 or more. If it feels too slow, try 1 or even 0.5.

  2. loading_wait_time defaults to 20: the time to wait for the manga viewer page to load. If your network is slow, set it to 30 or 50 seconds.

  3. Resolution (res): you can change it as you like, but check the original image resolution first.

    'res': (784, 1200)

    If the original image has a higher resolution, you can change it like this (the resolution is just an example):

    'res': (1568, 2400)

    [Cmoa.jp] does not need this; its resolution is fixed by [Cmoa.jp].

  4. Sometimes you need to log out and log in again; these websites are very strict and use many methods to prevent abuse.

  5. You can now cut the image by setting cut_image to (left, upper, right, lower).

    For example, to cut 3px from the bottom of the image, set it to:

    'cut_image': (0, 0, 0, 3)

    This feature uses Pillow; to use it, install Pillow with:

    pip install Pillow

    By default it is None, which means the image is not cut.

  6. You can now set the file name prefix and number of digits with file_name_prefix and number_of_digits.

    For example, if you are downloading Kill La Kill volume 1 and want file names like:

        KLK_V1
        │--KLK_V1_001.jpg
        │--KLK_V1_002.jpg
        │--KLK_V1_003.jpg
    

    Then set the parameters as follows:

    settings = {
        ...,
        'file_name_prefix': 'KLK_V1',
        # File name digits count, if you want your file name like '001.jpg', write 3 here.
        'number_of_digits': 3
    }
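cut_image lists how many pixels to remove from each side, while Pillow's Image.crop() expects the coordinates of the box to keep, so the margins have to be converted. The sketch below shows that conversion plus one way to build names such as KLK_V1_001.jpg; the helper names and the underscore between prefix and number are assumptions based on the examples above, not taken from downloader.py.

```python
def crop_box(size, cut):
    """Convert (left, upper, right, lower) margins into a Pillow crop box.

    size: (width, height), e.g. img.size of a PIL image.
    cut:  pixels to remove from each side, as in 'cut_image'.
    """
    width, height = size
    left, upper, right, lower = cut
    box = (left, upper, width - right, height - lower)
    if box[0] >= box[2] or box[1] >= box[3]:
        raise ValueError(f"cut {cut} leaves nothing of a {width}x{height} image")
    return box


def apply_cut(img, cut):
    """Return img unchanged if cut is None, otherwise a cropped copy.

    img is expected to be a PIL.Image.Image; Image.crop() takes the
    (left, upper, right, lower) coordinates of the region to keep.
    """
    return img if cut is None else img.crop(crop_box(img.size, cut))


def make_file_name(prefix, page, digits, ext="jpg"):
    """Build names like 'KLK_V1_001.jpg' (underscore separator assumed)."""
    number = str(page).zfill(digits)  # 1 -> '001' when digits == 3
    return f"{prefix}_{number}.{ext}" if prefix else f"{number}.{ext}"
```

For a 784x1200 page, crop_box((784, 1200), (0, 0, 0, 3)) gives (0, 0, 784, 1197), i.e. the bottom 3 rows are dropped.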

Develop

  1. Concept

    To download Manga, normally we do like this:

    +------------+     +-----------+      +------------+      +-------------------+      +--------------+
    |            |     |           |      |            |      |                   | OVER |              |
    |   Login    +-----+ Load page +----->+ Save image +----->+ Move to next page +----->+   Finished   |
    |            |     |           |      |            |      |                   |      |              |
    +------------+     +-----------+      +-----+------+      +---------+---------+      +--------------+
                                                ^                       |
                                                |                       |
                                                |      More page        |
                                                +-----------------------+
    

    So we can build a framework that reuses this code; for a new website, we normally only need to implement a few methods.

  2. File structure

    │--main.py
    │--downloader.py
    │--README.MD
    └─website_actions
        │--abstract_website_actions.py
        │--bookwalker_jp_actions.py
        │--bookwalker_tw_actions.py
        │--cmoa_jp_actions.py
        │--__init__.py
    
  3. Introduction to the abstract WebsiteActions class.

    For each website, the class should have the following methods/attributes; here we use bookwalker.jp as an example:

    import base64

    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait

    # Abstract base class, from website_actions/abstract_website_actions.py
    from website_actions.abstract_website_actions import WebsiteActions


    class BookwalkerJP(WebsiteActions):
        '''
        bookwalker.jp
        '''

        # login_url is the page that we load first and put the cookies.
        login_url = 'https://member.bookwalker.jp/app/03/login'

        @staticmethod
        def check_url(manga_url):
            '''
            This method returns a bool: whether the given manga URL belongs to this class.
            '''
            return manga_url.find('bookwalker.jp') != -1

        def get_sum_page_count(self, driver):
            '''
            This method returns an integer: the total number of pages.
            '''
            # The counter text looks like "current/total".
            return int(str(driver.find_element(By.ID, 'pageSliderCounter').text).split('/')[1])

        def move_to_page(self, driver, page):
            '''
            This method returns nothing: it moves to the given page number.
            '''
            # The object name (B0U here) changes when BW updates its JavaScript.
            driver.execute_script(
                'NFBR.a6G.Initializer.B0U.menu.a6l.moveToPage(%d)' % page)

        def wait_loading(self, driver):
            '''
            This method returns nothing: it waits until the manga has loaded.
            '''
            # check_is_loading is a helper that is not shown in this excerpt.
            WebDriverWait(driver, 30).until_not(lambda x: self.check_is_loading(
                x.find_elements(By.CSS_SELECTOR, ".loading")))

        def get_imgdata(self, driver, now_page):
            '''
            This method returns the image data (bytes that can be written to a file or wrapped in BytesIO).
            '''
            canvas = driver.find_element(By.CSS_SELECTOR, ".currentScreen canvas")
            # toDataURL returns "data:image/jpeg;base64,<data>"; keep only the part after the comma.
            img_base64 = driver.execute_script(
                "return arguments[0].toDataURL('image/jpeg').split(',')[1];", canvas)
            return base64.b64decode(img_base64)

        def get_now_page(self, driver):
            '''
            This method returns an integer: the current page number.
            '''
            return int(str(driver.find_element(By.ID, 'pageSliderCounter').text).split('/')[0])

    There is also a before_download method, which runs before the download starts, because some websites need to close a pop-up component first.

    def before_download(self, driver):
        '''
        This method returns nothing; it runs before the download starts.
        '''
        driver.execute_script('parent.closeTips()')
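The Login, Load page, Save image, Move to next page loop from the Concept diagram can be sketched on top of these methods. This is a hypothetical outline, not the repo's downloader.py: it assumes the driver is already logged in with the viewer open, treats page numbers as the 0-based indices passed to moveToPage, treats end_page as an exclusive bound, and leaves writing the bytes to a caller-supplied save_image(page, data) callback.

```python
import time


def run_download(driver, actions, save_image, sleep_time=2,
                 start_page=None, end_page=None):
    """Hypothetical main loop over a WebsiteActions object.

    Assumes `driver` is already logged in and showing the manga viewer.
    Pages are treated as 0-based indices for move_to_page (an assumption).
    """
    actions.wait_loading(driver)      # wait for the viewer to finish loading
    actions.before_download(driver)   # e.g. close pop-up tips
    total = actions.get_sum_page_count(driver)
    first = 0 if start_page is None else start_page
    last = total if end_page is None else min(end_page, total)  # exclusive bound
    for page in range(first, last):
        actions.move_to_page(driver, page)
        time.sleep(sleep_time)        # give the canvas time to redraw
        actions.wait_loading(driver)
        save_image(page, actions.get_imgdata(driver, page))
```

Keeping the loop independent of any one website is the point of the framework: a new site only needs its own WebsiteActions subclass.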