ExtractorHTML matches srcset attribute case-sensitively
ato opened this issue · 0 comments
Links of the form <source srcSet="1.jpg 1x, 2.jpg 2x"> are being extracted as a single URL like 1.jpg%201x,%202.jpg%202x. It appears that the srcset parser is not invoked unless the srcset attribute name is fully lowercase.
This appears to be because ExtractorHTML.elementContext() does not lowercase the attribute name, and processEmbed() then tests the resulting context with a case-sensitive comparison when deciding whether to invoke the srcset parser:
if (context.equals(HTMLLinkContext.IMG_SRCSET.toString())
|| context.equals(HTMLLinkContext.SOURCE_SRCSET.toString())
|| context.equals(HTMLLinkContext.IMG_DATA_SRCSET.toString())
|| context.equals(HTMLLinkContext.IMG_DATA_ORIGINAL_SET.toString())
|| context.equals(HTMLLinkContext.SOURCE_DATA_ORIGINAL_SET.toString())) {
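One possible direction for a fix, sketched in isolation: normalise the context to lowercase before comparing, since HTML attribute names are case-insensitive in HTML documents. This is only an illustration, not a patch against ExtractorHTML. The class name SrcsetContextCheck, the method isSrcsetContext, and the literal context strings are all hypothetical stand-ins for the HTMLLinkContext constants, whose real values are not shown in this issue.

```java
import java.util.Locale;
import java.util.Set;

class SrcsetContextCheck {
    // Hypothetical stand-ins for the HTMLLinkContext constants tested in
    // processEmbed(); the actual string values are assumed, not confirmed.
    private static final Set<String> SRCSET_CONTEXTS = Set.of(
            "img/@srcset",
            "source/@srcset",
            "img/@data-srcset",
            "img/@data-original-set",
            "source/@data-original-set");

    // Lowercase the context with a fixed locale before the lookup, so that
    // srcSet, SRCSET and srcset are all treated the same way.
    static boolean isSrcsetContext(CharSequence context) {
        return SRCSET_CONTEXTS.contains(
                context.toString().toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        System.out.println(isSrcsetContext("source/@srcSet")); // prints true
        System.out.println(isSrcsetContext("img/@src"));       // prints false
    }
}
```

Lowercasing once in elementContext() instead would probably be the smaller change, but it would affect every consumer of the context string, so which place is right depends on how the context is used elsewhere.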