paulhammond/webkit2png

Feature to scroll through website so images are loaded

RSully opened this issue · 12 comments

I attempted to capture an imgur album, and the result was a capture with only the top ~20% of images loaded.

Does adding a --delay=5 option make it better?

If not, I think this might be caused by imgur not loading images that are below the "fold" until you scroll them into view, which would make this bug somewhat similar to #60. That would be messy to fix, but maybe not impossible...

Adding the delay does not help.

If I add --delay=5 --js='window.scrollTo(0, document.body.scrollHeight);' then the top and bottom images are loaded, with the middle ones lost. (Though now the right sidebar is at the bottom of the page instead of the top.)

I read your comments in #60, and while I understand the concepts and problem, I can't come up with any solutions.

Edit: can't quite think of a good name to rename this issue to, either

Just scrolling to the bottom won't work as the JS on the page is smart enough to only load the visible images (and not any above the viewport). Try this:

--js='var i=0; function scroll(){ i += 600; if (i > document.height) { window.scrollTo(0,0);webkit2png.start() } else { window.scrollTo(0, i); window.setTimeout(scroll, 500)}}; webkit2png.stop(); scroll();'

Does that work?
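
For readability, the one-liner above can be unpacked into something like the sketch below. It is not part of webkit2png. The function name `scrollThrough` and its parameters are made up for illustration, and it reads the page height from `document.body.scrollHeight` (as in the earlier `scrollTo` command) rather than the nonstandard `document.height`. The only webkit2png calls it relies on are `webkit2png.stop()` and `webkit2png.start()`, as used in the one-liner.

```javascript
// Scroll down the page in fixed steps, pausing after each step so that
// lazy-loading scripts can react, then return to the top and call done().
// `win` is a window-like object; passing it in keeps the function testable.
function scrollThrough(win, step, pauseMs, done) {
  var y = 0;
  function next() {
    y += step;
    // Re-read the height on every step: the page may grow as images load.
    if (y > win.document.body.scrollHeight) {
      win.scrollTo(0, 0);
      done();
    } else {
      win.scrollTo(0, y);
      win.setTimeout(next, pauseMs);
    }
  }
  next();
}

// Only runs where the webkit2png object exists (assumed: the --js context
// of the development version discussed in this thread).
if (typeof webkit2png !== "undefined") {
  webkit2png.stop();
  scrollThrough(window, 600, 500, function () {
    webkit2png.start();
  });
}
```

The 600-pixel step and 500 ms pause are the values from the one-liner; a slower connection would probably need a longer pause.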

That doesn't work - only loads the top few. (Tried 5s, 20s and no delay.)

Do you have a sample gallery I can test against? I've tried using their random functionality to get test cases, but they're all relatively short and work with the js above...

This should be a good example: http://imgur.com/a/1S2u5

Oh. I'm using the latest development version; I think you're probably using the last release (0.6), which can't wait for async JS before capturing.

Could you try downloading the latest development version and see if it works then?

I am using homebrew's formula.

Do you think we could get another release out soon? A year between releases really hurts when people using the stable version try to submit bug reports, apparently 😉

Edit: tomorrow I'll download and +x the script manually in a testing folder

Using the latest release the --js above mostly works (the last image didn't have enough time to load fully).
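
One possible way to deal with a last image that has not finished loading is to poll the page's images before resuming the capture instead of relying on a fixed pause. The sketch below is an assumption, not something webkit2png provides: `allImagesComplete` and `waitForImages` are hypothetical helper names, and the check relies on the standard `complete` property of image elements. A lazy loader that has not yet set an image's `src` may still report it as complete, so this only helps with images that have already started loading.

```javascript
// True when every image in the list reports that loading has finished.
function allImagesComplete(images) {
  for (var i = 0; i < images.length; i++) {
    if (!images[i].complete) {
      return false;
    }
  }
  return true;
}

// Poll until all images are complete or the attempts run out, then call done().
// `win` is a window-like object; maxTries bounds the wait so a broken image
// cannot block the capture forever.
function waitForImages(win, pollMs, maxTries, done) {
  var tries = 0;
  function check() {
    tries += 1;
    if (allImagesComplete(win.document.images) || tries >= maxTries) {
      done();
    } else {
      win.setTimeout(check, pollMs);
    }
  }
  check();
}
```

In the one-liner above, the final `webkit2png.start()` could then be deferred with something like `waitForImages(window, 250, 40, function () { webkit2png.start(); })`, where the interval and attempt count are arbitrary example values.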

Related, unrelated -- is the delay in seconds?

-h only writes:

Web page functionality:
    --delay=DELAY       delay between page load finishing and screenshot

Specifying the unit there would be kind! :)

@Saeven Yes. The documentation was updated in #98.

I've been seeing this issue on more and more sites recently, especially news websites. I think a feature to automatically scroll through a website and wait a second for images to load might be useful.