hentai-chan/hentai

Module does not work anymore

FloRRenn opened this issue · 7 comments

I got an error like this when trying to run my code:

Exception has occurred: RetryError
HTTPSConnectionPool(host='nhentai.net', port=443): Max retries exceeded with url: /random (Caused by ResponseError('too many 503 error responses'))

This is my code

from hentai import Hentai, Format

doujin = Hentai(177013)
print(doujin.url)
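The traceback suggests that the library's HTTP client retries a request when it receives HTTP 503 and gives up once its retry budget is spent. The minimal sketch below shows that pattern with exponential backoff. It is not the library's actual implementation. `fetch` is a hypothetical stand-in for a single request that returns a status code, and `get_with_retries` and `RetryExhaustedError` are illustrative names.

```python
import time
from typing import Callable, Collection


class RetryExhaustedError(Exception):
    """Raised when every attempt ended with a retryable status code."""


def get_with_retries(
    fetch: Callable[[], int],
    retries: int = 3,
    backoff_factor: float = 0.5,
    retry_on: Collection[int] = (429, 500, 502, 503, 504),
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call fetch() until it returns a non-retryable status or retries run out.

    `fetch` is a hypothetical stand-in for one HTTP request that returns
    the response's status code.
    """
    if retries < 0:
        raise ValueError("retries must be >= 0")
    for attempt in range(retries + 1):
        status = fetch()
        if status not in retry_on:
            return status
        if attempt < retries:
            # Exponential backoff: 0.5 s, 1 s, 2 s, ... with the defaults.
            sleep(backoff_factor * 2 ** attempt)
    # Mirrors the message seen in the traceback above.
    raise RetryExhaustedError(f"too many {status} error responses")
```

If the server answers every attempt with 503, as a blocking challenge page would, the loop simply exhausts its attempts. That is why retrying alone cannot fix this problem.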

Hi, thanks for filing this issue. Can you still access the website through the browser?

Edit: I have now received a couple of reports via email that mention something similar happening, so this is now a confirmed bug. Instead of responding to each and every one of them, I will try to make sense of it. At the end of the day, though, there's only so much I can do about it, since I'm not in control of the backend. If we're lucky, this could also be only a temporary issue. Also, if anyone knows how to reach the site admin, that would be great; ideally I want to find a solution that works for everyone.


tl;dr: nothing I can fix


After digging into the issue, I found out that nhentai added Cloudflare protection to their website. This is not limited to their API. When you open https://nhentai.net for the first time, you will see the Cloudflare check for a few seconds. Pretending to be someone else with a fake UA string or a full set of request headers won't do much here either, since those are not the only things Cloudflare checks. By the way, we are not the only ones facing this issue; pretty much every service that consumes this API will run into something similar [1] [2]. After some more research, I tried a few libraries that promised to circumvent Cloudflare protection:

#!/usr/bin/env python3

import cloudscraper

settings = {"browser": "chrome", "platform": "android", "desktop": False}
scraper = cloudscraper.create_scraper(browser=settings)
response = scraper.get("https://nhentai.net/api/gallery/177013").text

print(response)  # nope, still blocked

But even if it had worked, I would not have been fond of this solution. In my opinion, trying to outsmart Cloudflare is a lost cause, because the site admin can decide to shut down the API at any time. According to other developers, this protection may be turned off again after a few days. The best solution would be to add OAuth2 authentication to the API, combined with a rate limit, to combat malicious behavior such as scraping the entire catalog. However, this is a process the site admin would have to kick off. I think they mentioned a few years ago that they planned to add authentication to their API, but they never got around to implementing it. In the past, the API has also been shut down occasionally.
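A rate limit like the one proposed here would run on the server, typically with one budget per authenticated client. A common way to implement such a budget is a token bucket. The sketch below is a minimal, hypothetical example (`TokenBucket` is an illustrative name). Nothing in this thread indicates what nhentai actually uses.

```python
import time
from typing import Callable


class TokenBucket:
    """Allow `rate` requests per second on average, with bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: int,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)  # start full so an initial burst is allowed
        self.clock = clock             # injectable so the bucket can be tested
        self.last = clock()

    def allow(self) -> bool:
        """Consume one token if available; return False when over the limit."""
        now = self.clock()
        # Refill in proportion to the elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

A server would keep one bucket per OAuth2 client and reject requests (for example with HTTP 429) whenever `allow()` returns False. The same class also works on the client side, to keep a script polite toward the API.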

Long story short, while there are probably short-term solutions to this problem [3], I don't think they are worth the time and effort if they don't address the underlying issues. Most if not all language bindings have the same problem, but I will keep an eye out to see whether the protection gets turned off again or somebody else finds a good solution.
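To tell whether the protection is still on, rather than the API simply being down, you can inspect the failing response. The heuristic sketch below assumes two things. First, Cloudflare-fronted responses carry a `Server: cloudflare` header. Second, challenge pages come back with status 403 or 503, or with a `cf-mitigated: challenge` header. The function name is hypothetical, and the check only detects a block; it does not try to get around one.

```python
from typing import Mapping


def looks_like_cloudflare_challenge(status: int, headers: Mapping[str, str]) -> bool:
    """Heuristically decide whether a response is a Cloudflare block or challenge.

    Assumptions: Cloudflare responses include "Server: cloudflare", and
    challenge pages use status 403/503 or set "cf-mitigated: challenge".
    """
    # Header names are case-insensitive, so normalise them before lookup.
    lowered = {name.lower(): value.lower() for name, value in headers.items()}
    if lowered.get("cf-mitigated") == "challenge":
        return True
    return status in (403, 503) and lowered.get("server") == "cloudflare"
```

The `headers` argument can be any object with an `items()` method, such as the header collection of a failed response.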

[1] DiamondMiner88/nhentai#14
[2] SylveonDeko/NHentaiAPI#12
[3] https://github.com/FlareSolverr/FlareSolverr

PS: I don't think many users of this library are aware of this feature, but there actually is an alternative way to instantiate a Hentai object, namely by passing the JSON data to the constructor:

from hentai import Hentai

response: dict = None # find a way to obtain the JSON data from the /api/gallery/id endpoint
doujin = Hentai(json=response)

This feature was originally implemented as a form of caching. If you stored the JSON with the export method using options=[Option.Raw], you could regain access to the properties without making unnecessary requests, which in turn makes parsing this data easy again.
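If you can obtain the gallery JSON some other way, keeping it on disk lets you rebuild objects later without touching the API. Below is a minimal stdlib sketch of such a cache. It stores one file per gallery id. The helper names and directory layout are hypothetical, and the sketch is separate from the library's own export method.

```python
import json
from pathlib import Path
from typing import Optional


def save_gallery_json(gallery_id: int, data: dict, cache_dir: Path) -> Path:
    """Store raw /api/gallery/<id> JSON so it can be reloaded without a request."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    path = cache_dir / f"{gallery_id}.json"
    path.write_text(json.dumps(data), encoding="utf-8")
    return path


def load_gallery_json(gallery_id: int, cache_dir: Path) -> Optional[dict]:
    """Return the cached JSON for gallery_id, or None on a cache miss."""
    path = cache_dir / f"{gallery_id}.json"
    if not path.exists():
        return None
    return json.loads(path.read_text(encoding="utf-8"))
```

A loaded dictionary can then be handed to the constructor shown above, e.g. `Hentai(json=load_gallery_json(177013, Path("cache")))`, after checking that the result is not None.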

Thanks for your support :))

Now it's working, everyone! Enjoy!
Again...

Assuming we're still broken and just waiting on a turn-around basically?

Yep. I don't know if they are rate limiting or blocking the requests. Once in a blue moon it works, and other modules use cookies to make it work.