This is a simple scraper for the site MangaEden, written in Python 3.9.
It lets you download any manga series from the site https://www.mangaeden.com/it/
I called the script "EDEN".
Premise:
I'm new to GitHub and this is the first time I've uploaded something here.
I'd like to learn more about the platform, but for the moment everything may look a bit rough.
Just take it as it is ;)
HOW DOES IT WORK?
The scraper consists of a single Python 3.9 script ("Eden.py") which takes care of everything.
The script uses wget (GNU Wget 1.20.3 built on mingw32, from https://eternallybored.org/misc/wget/)
to download the main page of each manga.
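For reference, a minimal sketch of that wget step, assuming the Windows build sits next to the script as "wget.exe" (the binary path, function name, and output file name are my assumptions, not necessarily what "Eden.py" does):

    import subprocess

    def download_main_page(manga_url, out_file="main_page.html"):
        # Invoke the wget binary shipped alongside the script and save
        # the manga's main page to out_file.
        subprocess.run(["wget.exe", "-O", out_file, manga_url], check=True)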
Using wget as the base scraping tool was an early idea; as development went on I moved to Python's "requests" library, which I installed through pip (the package installer for Python).
"requests" gave me more control over the download of the single files and, most importantly, made it easy to switch proxies on each request when needed.
The main page of each manga is still downloaded by wget, though; in a future release I'm going to convert that to "requests" too.
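A rough sketch of a per-request proxy switch with "requests" (the "ip:port" proxy format and the function name are assumptions for illustration):

    import requests

    def fetch(url, proxy):
        # "requests" accepts a proxies mapping on every call, so each
        # download can go through a different proxy.
        proxies = {"http": f"http://{proxy}", "https": f"http://{proxy}"}
        response = requests.get(url, proxies=proxies, timeout=10)
        return response.text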
ABOUT PROXIES:
When I started coding the script I didn't know I would need so many proxies to get everything done, but while testing the program I realized how important they are to keep the web server from refusing my requests.
The script keeps a series of proxies saved in a dictionary and checks that they work as soon as execution starts, removing the unreachable ones.
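The startup check might look something like this (I'm assuming plain "ip:port" strings and a simple reachability test; the actual data structure in "Eden.py" is a dictionary):

    import requests

    def working_proxies(candidates, test_url="https://www.mangaeden.com/it/"):
        # Keep only the proxies that answer within a few seconds.
        alive = []
        for proxy in candidates:
            mapping = {"http": f"http://{proxy}", "https": f"http://{proxy}"}
            try:
                requests.get(test_url, proxies=mapping, timeout=5)
                alive.append(proxy)
            except requests.RequestException:
                pass  # unreachable proxy: drop it
        return alive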
ABOUT THE FILE "501" INSIDE THE "Pages" FOLDER:
This file is really important.
Sometimes a page request gets "refused" by the mangaeden web server because too many requests were just sent from the same IP (proxy).
The web server realizes I'm a scraper and decides to mess with me:
it responds with a fake page saying "ERROR 503" (yes, the file name is wrong xD), so the "requests" library does not raise an error and can't tell whether the web server sent the correct response.
When that happens, having this fake page stored lets the program compare it with the page it just downloaded:
if they match, too many requests have been sent through the same proxy, and the current download thread sleeps for 0.5 seconds before retrying.
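Put together, the detection and retry loop could be sketched like this (the path "Pages/501" matches the folder described above; the function name, encoding, and timings other than the 0.5-second sleep are assumptions):

    import time
    import requests

    # Stored copy of the fake "ERROR 503" page the server sends back.
    with open("Pages/501", encoding="utf-8") as f:
        FAKE_PAGE = f.read()

    def fetch_with_retry(url, proxy, delay=0.5):
        proxies = {"http": f"http://{proxy}", "https": f"http://{proxy}"}
        while True:
            response = requests.get(url, proxies=proxies, timeout=10)
            # The fake page arrives as a normal response, so the body is
            # compared against the stored copy instead of the status code.
            if response.text != FAKE_PAGE:
                return response.text
            time.sleep(delay)  # too many requests through this proxy: wait and retry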
Well, basically this is how the script gets everything done.
All the downloaded content is stored inside the "manga/manga-name/chapter-number/" folder.
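Building that output path could be as simple as this (a hypothetical helper, not necessarily how "Eden.py" does it):

    import os

    def chapter_dir(manga_name, chapter_number):
        # Create "manga/<manga-name>/<chapter-number>/" if it doesn't exist.
        path = os.path.join("manga", manga_name, str(chapter_number))
        os.makedirs(path, exist_ok=True)
        return path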
Hope my work is not too bad for you.
With love,
Karma
Milano - Italy