hentai-chan/hentai

[ ENH ] A function to fetch the popular_now section


Utils.get_homepage currently doesn't fetch the Popular Now section, although it should. So I was thinking of making a new function in Utils to fetch the currently trending doujins.

I'm not familiar with the JSON requests needed to fetch this, so I don't have any code that does the job yet. If anyone knows the request format, please post it in a comment and I'll implement the function.

I don't think nhentai has an API endpoint that returns the popular ones directly; we would have to use BeautifulSoup to extract them from the page.
This adds BeautifulSoup as a dependency, though.
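Scraping does not necessarily require a new dependency. Below is a minimal sketch that uses the standard library's html.parser. It assumes that each title sits in a `<div class="caption">` inside the div.index-popular container. That markup has not been checked against the live page. PopularNowParser and extract_popular_titles are hypothetical names.

```python
from html.parser import HTMLParser
from typing import List, Optional


class PopularNowParser(HTMLParser):
    """Collect caption text from galleries inside the div.index-popular container.

    Assumption: each title sits in a <div class="caption"> nested somewhere
    inside the div whose class list contains "index-popular".
    """

    def __init__(self) -> None:
        super().__init__()
        self.titles: List[str] = []
        self._div_depth = 0                        # current <div> nesting depth
        self._popular_depth: Optional[int] = None  # depth where index-popular opened
        self._in_caption = False
        self._buffer: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        self._div_depth += 1
        classes = (dict(attrs).get("class") or "").split()
        if "index-popular" in classes and self._popular_depth is None:
            self._popular_depth = self._div_depth
        elif "caption" in classes and self._popular_depth is not None:
            self._in_caption = True
            self._buffer = []

    def handle_endtag(self, tag):
        if tag != "div":
            return
        if self._in_caption:
            # Assumes the caption div has no nested divs of its own
            self.titles.append("".join(self._buffer).strip())
            self._in_caption = False
        if self._popular_depth == self._div_depth:
            self._popular_depth = None  # left the "Popular Now" container
        self._div_depth -= 1

    def handle_data(self, data):
        if self._in_caption:
            self._buffer.append(data)


def extract_popular_titles(html: str) -> List[str]:
    parser = PopularNowParser()
    parser.feed(html)
    parser.close()
    return parser.titles


SAMPLE = """
<div class="container index-container index-popular">
  <div class="gallery"><a href="/g/1/"><img src="a.jpg"><div class="caption">First Title</div></a></div>
  <div class="gallery"><a href="/g/2/"><div class="caption">Second Title</div></a></div>
</div>
<div class="container index-container">
  <div class="gallery"><a href="/g/3/"><div class="caption">New Upload</div></a></div>
</div>
"""

print(extract_popular_titles(SAMPLE))  # ['First Title', 'Second Title']
```

Captions outside the index-popular container, such as the "New Upload" gallery above, are ignored because the parser only records captions while it is inside that div.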

Let’s not rush to conclusions; I’ll investigate this issue tomorrow. In theory there should be a query that yields the desired result. It’s just a matter of figuring out what exactly ‘popular now’ entails, which in and of itself should not be too hard if we take this caption literally.
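For reference, the query route (Utils.search_by_query with Sort.PopularToday) comes down to a single search request. Below is a minimal sketch of how such a URL could be assembled. It assumes the endpoint https://nhentai.net/api/galleries/search and the parameter names query, page and sort with the value popular-today. These are assumptions about nhentai's unofficial API, not documented facts. build_search_url is a hypothetical helper.

```python
from urllib.parse import urlencode

# Assumed (unofficial) search endpoint; nhentai does not publish API docs.
SEARCH_ENDPOINT = "https://nhentai.net/api/galleries/search"


def build_search_url(query: str, page: int = 1, sort: str = "popular-today") -> str:
    """Build a search URL; the parameter names query/page/sort are assumptions."""
    params = {"query": query, "page": page, "sort": sort}
    # urlencode percent-encodes reserved characters, e.g. "*" becomes "%2A"
    return f"{SEARCH_ENDPOINT}?{urlencode(params)}"


print(build_search_url("*"))
# https://nhentai.net/api/galleries/search?query=%2A&page=1&sort=popular-today
```

A query like this can only sort results. It does not say which of them appear in the "Popular Now" section.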

I checked another API wrapper, written in JS, that does the same thing as this one. They parsed the webpage instead of using the API: they used the API for other things, but not for their getPopular function. So I'm not too confident that there is an API query we can use.

I scraped together a solution that uses requests_html to retrieve the title names inside div.index-popular. One thing I noticed immediately is that

popular_today = Utils.search_by_query(query="*", sort=Sort.PopularToday)

always contains the doujins from the "Popular Now" section. But every query returns 25 doujins, so there is no way of knowing which 5 of them are the ones most popular right now (see also this issue). Otherwise one could just return the top 5 doujins sorted by num_favorites. So what I did instead is search for these titles in popular_today. This gives the correct result, but it brings the total number of requests this function makes to the server up to three. In practice, a single call takes between 2.5 and 3.0 seconds if we insist on returning only the five doujins rather than all twenty-five.
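The filtering step can be separated from the network calls. Below is a minimal sketch that uses a hypothetical Doujin stand-in in place of the library's Hentai class. It keeps the search results whose title exactly matches one of the scraped "Popular Now" titles, and it preserves the section's order. Matching against a lookup of titles avoids a false positive that substring matching (str(doujin) in titles on the section's combined text) can produce: a title that happens to be contained in another caption.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Doujin:
    # Hypothetical stand-in for hentai.Hentai; only the field used here.
    title: str


def pick_popular_now(results: Iterable[Doujin], popular_titles: List[str]) -> List[Doujin]:
    """Return the results whose titles appear in popular_titles, in section order."""
    by_title = {doujin.title: doujin for doujin in results}
    # Exact lookups: a title that is only a substring of another caption won't match
    return [by_title[title] for title in popular_titles if title in by_title]


today = [Doujin("Alpha"), Doujin("Alpha Beta"), Doujin("Gamma")]
print([d.title for d in pick_popular_now(today, ["Alpha Beta", "Gamma"])])
# ['Alpha Beta', 'Gamma']
```

With substring matching against the text "Alpha Beta\nGamma", "Alpha" would also have been selected.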

Since "Popular Now" is only shown on the first page of the homepage, it suffices to change the return type of get_homepage to include this information. Long story short, this is what I came up with:

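from dataclasses import dataclass
from typing import List, Optional

from colorama import Fore
from requests import HTTPError
from requests_html import HTMLSession
from hentai import Hentai, RequestHandler, Sort, Utils

@dataclass
class Homepage:
    popular_now: List[Hentai]
    new_uploads: List[Hentai]

def get_homepage(handler=RequestHandler()) -> Optional[Homepage]:
    try:
        response = HTMLSession().get(Hentai.HOME)
        # requests only raises HTTPError for 4xx/5xx responses when asked to
        response.raise_for_status()
    except HTTPError as error:
        print(f"{Fore.RED}{error}")
        return None

    # "Popular Now" is the div.index-popular container on the first homepage page
    popular = response.html.find("div.index-popular", first=True)
    titles = popular.text if popular is not None else ""

    # The popular-today search always contains the "Popular Now" doujins,
    # so keep only those whose title appears in that section
    popular_today = Utils.search_by_query(query="*", sort=Sort.PopularToday, handler=handler)
    return Homepage(
        popular_now=[doujin for doujin in popular_today if str(doujin) in titles],
        new_uploads=Utils.browse_homepage(1, 1, handler)
    )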
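
The three requests do not depend on each other: the homepage HTML fetch, the popular-today search and browse_homepage. So they could be issued concurrently to cut the 2.5 to 3.0 second wall time. Below is a minimal sketch using concurrent.futures.ThreadPoolExecutor. fetch_popular_titles, fetch_popular_today and fetch_new_uploads are hypothetical stand-ins for the real calls; here they just sleep. Homepage is simplified to hold strings. Whether nhentai tolerates parallel requests without rate limiting is an assumption.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import List


@dataclass
class Homepage:
    popular_now: List[str]
    new_uploads: List[str]


# Hypothetical stand-ins for the three network calls; each sleeps to mimic latency.
def fetch_popular_titles() -> List[str]:
    time.sleep(0.3)
    return ["Alpha", "Gamma"]


def fetch_popular_today() -> List[str]:
    time.sleep(0.3)
    return ["Alpha", "Beta", "Gamma", "Delta"]


def fetch_new_uploads() -> List[str]:
    time.sleep(0.3)
    return ["Epsilon"]


def get_homepage_concurrently() -> Homepage:
    with ThreadPoolExecutor(max_workers=3) as pool:
        # Submit all three requests at once; none depends on another's result
        titles = pool.submit(fetch_popular_titles)
        today = pool.submit(fetch_popular_today)
        uploads = pool.submit(fetch_new_uploads)
        popular = set(titles.result())
        return Homepage(
            popular_now=[t for t in today.result() if t in popular],
            new_uploads=uploads.result(),
        )


start = time.perf_counter()
homepage = get_homepage_concurrently()
elapsed = time.perf_counter() - start
# elapsed is close to one call's latency rather than the sum of all three
print(homepage.popular_now, f"{elapsed:.2f}s")
```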

If it works, that's fine with me. We can optimize it later.