subject-f/guyamoe

Double-page handling in the reader

Opened this issue · 13 comments

The current implementation of double-page mode has issues when there are pre-stitched spreads in an otherwise normal chapter.
When it should be leaving it as a standalone image, it puts it as part of another spread, messing up the page count and spreads thereafter irrevocably thanks to the implementation of double-page mode. Although I like the implementation otherwise, this should be fixed.
Example: https://guya.moe/proxy/imgur/QhQijGx/1/11

Other readers e.g. mcomix and mangadex handle this more gracefully by looking at the resolution or ratio of the image.

This is an issue exclusive to the proxy since chapters that are actually uploaded to Guya have their pages post-processed and these spreads are detected and labeled accordingly. The frontend reader implementation can then know which pages are spreads via the api and handle the view for spreads in the double page format without problems.

The post-processing solution was added when the proxy feature of Guya was just a fun little side project by @funkyhippo and mostly an easter egg for the site. Now that it's public info and getting linked regularly on sites like Reddit, I suppose it's worth looking into a solution built into the reader. @Algoinde might look into it when he has the time.

Personally though, I'm not a huge fan of the proxy being used to spread scanlations of other manga, especially really huge series like OPM that could potentially get Guya in legal trouble. I'm not too keen on working on features just to cater for the proxy userbase, but I won't refuse it either I guess. 🤷 I'll leave it up to Alg whether or not he wants to follow up on this.

I think we could allow imgur proxying, but we need a clear message in the reader that "you are viewing third-party content; if it infringes your rights, here's Imgur DMCA link".

I can try do a fallback for it, but as all image containers are rendered on chapter load, it would be pretty damn hard to address in a non-jank way (if you go to page 10 and page 2 is a double-spread and it is not loaded, how would it know whether page 10 should be on the left or on the right?)

@Jaegrqualm it is now fixed for Imgur proxies - imgur provides dimensions in their API so it's was possible to implement (thanks @funkyhippo). For the rest of the proxies, the challenge remains, so I leave the issue open.

Hey,

I looked into this issue for a bit, and for sources that do not provide any information on the dimensions I see only two possibilities:

  1. The server needs to preload the headers of the images (32 bytes are enough for PNG files, but JPG files have a weird structure and might require up to 1024) and add the wide flag based on that.
  2. The client needs to preload the images and dynamically change the state of the pages.

I have a working PoC for the first implementation, which adds a negligible overhead from what I can see. I guess the best one would be the second one though since the client needs to load the pages anyway, but that would require a non-trivial refactoring.

So, if it is still something you want to fix, I can either provide a PR on https://github.com/subject-f/cubarimoe for the first implementation or look into the second one.

I think the server-side solution is actually neater. We have a hard restriction on the "synchronous page index, asynchronous page data" design of the reader, so in 2-page mode, a middle-of-the-chapter link will invariably land on the wrong page if we were to try to do this on the client side; it's simply impossible if we're not loading all of them.

Also, the reader is incredibly dead, as I have not worked on it for about 2 years now. I mean, it still works, but if you do go through with your idea for the first one, this will require next to no changes for the reader. I think it looks at the _w in the file name and that's it?

Yes, that's what it does, then it splits the pages in two lists (normals and wides).

I have zero recollection on why I did two lists, but thinking about the logic once again, yeah, it needs to know a full index ahead of time. The server can just send that data in the API response inside the image name as a query param or something (funny how query params in this case would reverse the usual flow of data).

So, I guess I will clean the code a bit and post a PR then.
Right now I just edited the MangaDex proxy, but this can be easily factored and used for all non-Imgur proxies.

@funkyhippo says he doesn't think it's feasible for the server to preload the headers of the images since all our requests are blocking until we migrate to ASGI. And I have no idea when we'll do that, it's "wouldn't it be nice to rewrite" and then 6 months pass by and we never speak about this again.

Right now I just edited the MangaDex proxy

Didn't MD have page dimensions in their API? Or am I recalling wrong? They certainly use this information in their reader.

I looked at the public API, and no they have nothing for that, they just provide the links.
It's possible that they have an internal API for their reader though, I haven't checked that far.

Right now for preloading the headers I just use a request with the Range header which seems to work fine, I guess it would fail if the provider uses a non-standard web server though but that doesn't seem to be the case for the proxies supported by cubari.

With this, for every chapter request we'll be making 10-50 more requests on average, so it'd require ASGI for sure. Be aware cubari handles around 2 million requests a month so it'll have to have very little overhead despite this request multiplication. There's also a possibility of someone feeding a giant list of external URLs as a proxy source, effectively turning it into tiny DoS and nuking django in the process. Metadata is cached for some time, but still.

I see, I didn't take into account the scaling for your main instance.
So, I guess I will keep this change on a fork for now (it's good enough for a self-hosted proxy with few users) and go back to this issue whenever you end up refactoring the code base.