anka-213/webcomic_reader

xkcd.com: include `srcset` attribute of comic images so that 2x-resolution versions load when relevant


Background – xkcd offers multiple images using srcset

xkcd uses the srcset HTML attribute to offer different images for different display pixel densities. My browser chooses to load higher-resolution images when possible because my computer has a HiDPI/Retina display that can show the full detail of the higher resolution within the same space.

When I load https://xkcd.com/2161/ without Webcomic Reader, the image I see on my computer’s screen is https://imgs.xkcd.com/comics/an_apple_a_day_2x.png, the “2x” resolution version. Though the src of the image refers to https://imgs.xkcd.com/comics/an_apple_a_day.png, the lower-resolution version, my browser prefers the version in the srcset for my screen.

This is the HTML for the img of xkcd #2161, including the srcset attribute:

<img
  src="//imgs.xkcd.com/comics/an_apple_a_day.png"
  title="Even the powerful, tart Granny Smith cultivar is proving ineffective against new Gran-negative doctors."
  alt="An Apple a Day"
  srcset="//imgs.xkcd.com/comics/an_apple_a_day_2x.png 2x"
>

Problem – this script loads the wrong image on my screen

With Webcomic Reader enabled, on https://xkcd.com/2161/, I see https://imgs.xkcd.com/comics/an_apple_a_day.png, the lower-resolution version. This makes for a worse reading experience because the lower-resolution image is blurrier on my screen.

The img tag put on the page by Webcomic Reader looks like this. It has no srcset attribute. (It is also missing the alt attribute.)

<img
  id="wcr_imagen"
  src="https://imgs.xkcd.com/comics/an_apple_a_day.png"
  title="Even the powerful, tart Granny Smith cultivar is proving ineffective against new Gran-negative doctors."
  style="width: 623px; height: 307px; cursor: url(&quot;[…]QmCC&quot;) 16 16, auto;"
>

The description for paginas[i] in the code says that the img selector “gets the <img> element containing the desired image (not just the src, but the whole <img>)”. But this is not happening in this case. The “whole <img>” is missing two attributes that were on the original img.

Notes for the solution

Webcomic Reader should preserve any srcset attributes it finds on imgs.

Some pages on xkcd, such as https://xkcd.com/3/, don’t have a srcset attribute and offer no high-resolution image.

xkcd is the site where I noticed the problem, but unless xkcd is selecting the image in an unusual way, other comic sites could be affected. There could be other comics with high-resolution images that people are missing.
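To illustrate the idea, here is a minimal sketch of copying the relevant attributes from the page's original img onto the img that the script displays. `copyImageAttributes` is a hypothetical helper, not a function in Webcomic Reader, and the attribute list is an assumption. It also covers pages like https://xkcd.com/3/, where there is no srcset to copy.

```javascript
// Hypothetical helper, not part of Webcomic Reader: copy selected attributes
// from the page's original <img> (source) onto the displayed <img> (target).
function copyImageAttributes(source, target, names = ['srcset', 'sizes', 'alt']) {
  for (const name of names) {
    if (source.hasAttribute(name)) {
      target.setAttribute(name, source.getAttribute(name));
    } else {
      // Drop any value left over from a previously shown comic, so a page
      // without srcset does not keep showing the old high-resolution image.
      target.removeAttribute(name);
    }
  }
  return target;
}
```

In a browser this would be called with two img elements, for example the img matched by the site's img selector and the element with id wcr_imagen.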

This is Webcomic Reader’s current code for xkcd:

{ url: 'xkcd.',
  img: ['//div[@id="comic"]//img'],
  first: '.="|<"',
  last: '.=">|"',
  extra: ['<br/>', ['//div[@id="ctitle"]'], function(html, pos) {
    var href = xpath('//div[@id="comic"]//a/@href', html);
    return '<br/><a href=' + href + '>' +
      (href.indexOf('xkcd') >= 0 ? 'Large version' : 'Bonus Link!') +
      '</a>';
  }, function(html, pos) {
    var comic = xpath('//div[@id="comic"]', html);
    var img = comic.getElementsByTagName('img')[0];
    img.parentNode.removeChild(img);
    return comic;
  }, function(html, pos){
    var nr = link[pos].match(/(\d+)\/$/)[1];
    var url = 'http://www.explainxkcd.com/wiki/index.php/' + nr;
    return '<a target=\"_blank\" href=\"' + url + '\">Explain Xkcd</a>';
  }],
  bgcol: '#fff'
},

Getting WCR to select Image B over Image A if B exists isn't too difficult. I had to do something similar to grab the nav buttons for FurAffinity last year. (It has two sets of nav buttons, one for galleries and one for comics that use the built-in comic nav.)
Example snippet:
back: ['(//span[@class="parsed_nav_links"]//a[contains(.,"PREV")]|//a[@class="auto_link named_url" and contains(.,"PREV")]|//a[@class="prev button-link"])[last()]'],

The problem is grabbing the srcset and then applying it correctly. I had just done something similar with TwoKinds, since many pages have secret extras (mostly sketches) hidden as a link attached to the comic image. Grabbing the href and converting it into an image was simple enough.

However, trying to implement something similar for XKCD runs into a problem.

['//div[@id="comic"]//img/@srcset']
That will grab the srcset as text. I tested it by putting it at the start of the extra field. The output it gave was:
//imgs.xkcd.com/comics/photo_deposit_2x.png 2x

That 2x is the problem, however. XPath has ways to tourniquet the extra bits off, usually substring-before and substring-after. However, so far (after half an hour or so) I have had little luck getting those to work within WCR. I suspect it's because of the quotation marks and apostrophes: they are the only two quoting characters XPath has, and both are already in use.
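For reference, outside of WCR's config format the quoting conflict can be sidestepped by delimiting the JavaScript string with backticks, which leaves both quote characters free for XPath. The sketch below uses the standard `document.evaluate` DOM API. `SRCSET_URL_XPATH` and `firstSrcsetUrl` are hypothetical names, and whether WCR's own xpath handling accepts such an expression is untested here.

```javascript
// XPath 1.0 expression: everything in srcset before the first space.
// concat(..., ' ') guarantees a space exists, so a srcset holding a bare URL
// with no descriptor is still returned whole.
// Backticks delimit the JavaScript string, so " and ' both stay free for XPath.
const SRCSET_URL_XPATH =
  `substring-before(concat(//div[@id="comic"]//img/@srcset, ' '), ' ')`;

// Numeric value of XPathResult.STRING_TYPE in the DOM XPath API.
const XPATH_STRING_TYPE = 2;

// Hypothetical helper: returns the first srcset URL, or '' if there is none.
// `doc` is expected to be a Document (anything exposing evaluate()).
function firstSrcsetUrl(doc) {
  const result = doc.evaluate(SRCSET_URL_XPATH, doc, null, XPATH_STRING_TYPE, null);
  return result.stringValue;
}
```

This only extracts the first candidate, which is enough for xkcd's single "2x" entry but would ignore any further entries in a longer srcset.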

So it may be possible, but I am having trouble doing so.

I figured I would at least put this info down here in case someone else wanted to try their hand at it.
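An alternative to trimming the descriptor in XPath is to grab the raw srcset text and parse it in JavaScript. The following is a rough sketch under some assumptions: `parseSrcset` and `pickImageUrl` are hypothetical helpers (not WCR functions), only density descriptors such as "2x" are handled, and splitting on commas is a simplification of the full srcset parsing rules in the HTML specification.

```javascript
// Parse a srcset value made of density descriptors, e.g. "a.png 1x, b.png 2x".
// Entries with other descriptors (such as widths like "640w") are skipped.
function parseSrcset(srcset) {
  return srcset
    .split(',')
    .map(part => part.trim())
    .filter(Boolean)
    .map(part => {
      // A candidate with no descriptor counts as 1x.
      const [url, descriptor = '1x'] = part.split(/\s+/);
      const match = /^(\d*\.?\d+)x$/.exec(descriptor);
      return match ? { url, density: parseFloat(match[1]) } : null;
    })
    .filter(Boolean);
}

// Choose the lowest-density image that still covers the display's pixel
// ratio, falling back to the highest density available. The plain src always
// takes part as a 1x candidate, so pages without srcset keep working.
function pickImageUrl(src, srcset, pixelRatio) {
  const candidates = [{ url: src, density: 1 }];
  if (srcset) {
    candidates.push(...parseSrcset(srcset));
  }
  candidates.sort((a, b) => a.density - b.density);
  const fit = candidates.find(c => c.density >= pixelRatio);
  return (fit || candidates[candidates.length - 1]).url;
}
```

In a browser the third argument would typically be `window.devicePixelRatio`. Simply copying the srcset attribute onto the displayed img and letting the browser choose avoids this parsing entirely, so this sketch is mainly useful where the script needs a single URL.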

Well it's been several years and I ended up tackling this.

Would you mind checking this dev branch to see how it works out for you? https://github.com/SoraHjort/webcomic_reader/blob/Fixes-and-Additions/webcomic_reader.user.js

The only issues I'm seeing are related to CSS, plus an instance on page 1799 where the regex capture will need to be tweaked to handle the comic image being inside a link, which I'll hopefully work on more soon. But I just want to make sure that its setup works for the request. Give it a try to make sure it doesn't run into any major issues.

Edit: xpath and regex updated to take hyperlinked comic pages into account.
Edit 2: tweaked the CSS and how it is implemented. It should hopefully look good enough and close to the site's normal look, without going off the side of the screen due to errant positioning.