yujiosaka/headless-chrome-crawler

Is there a way to scroll?

wemow opened this issue · 4 comments

wemow commented

What is the current behavior?
No documented way of scrolling

What is the expected behavior?
Being able to scroll

What is the motivation / use case for changing the behavior?
Being able to load dynamically loaded (lazy-loaded) content by scrolling the page

Sorry, @yemd, I cannot reach this library's maintainer to get access to publish updates. I'd recommend building a custom solution with puppeteer instead of using this library.

Here is how I kept scrolling through a list of lazy-loaded products until the crawler reached the bottom of the page. I hope this helps :)

const productCrawler = await Crawler.launch({
  /*... */
});

await productCrawler.queue({
  url: '...',
  retryCount: 1,
  maxDepth: 3,
  depthPriority: false,
  waitUntil: 'networkidle0',
  jQuery: false,
  waitFor: {
    options: {},
    args: [config], // args for selectorOrFunctionOrTimeout
    selectorOrFunctionOrTimeout: function (config) {
      const documentHeight = document.documentElement.scrollHeight;

      window.scrollTo(0, documentHeight);

      // You might want to check if there are any elements still loading (look for spinners, other indicators, or just wait)
      // Return true if you are done scrolling, false otherwise

      return true; 
    },
  },
});

await productCrawler.onIdle();
await productCrawler.close();

If not, you can always scroll inside the evaluatePage method:

const productCrawler = await Crawler.launch({
  // ...
  evaluatePage: eval(`() => {
    const documentHeight = document.documentElement.scrollHeight;

    window.scrollTo(0, documentHeight);
  }`),
  // ...
})
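
In the waitFor example above, the "done scrolling" check is left open in the comments. One possible way to fill it in, as a sketch only: scroll on every poll and report completion once the document height has stayed the same for a few polls in a row. The name scrollUntilStable, the window properties __lastScrollHeight and __stablePolls, and the win parameter (there only so the function can be exercised outside a browser) are made up for illustration. It also assumes that waitFor.options is forwarded to Puppeteer's waitForFunction, so that something like options: { polling: 500 } spaces the polls out; with frame-by-frame polling the page would be declared stable almost immediately.

```javascript
// Hypothetical predicate for waitFor.selectorOrFunctionOrTimeout.
// It is re-evaluated in the page on every poll, so state lives on the window object.
function scrollUntilStable(maxStablePolls = 3, win = globalThis) {
  const height = win.document.documentElement.scrollHeight;

  // Scroll to the current bottom to trigger loading of further content.
  win.scrollTo(0, height);

  if (win.__lastScrollHeight === height) {
    // Height did not change since the previous poll.
    win.__stablePolls = (win.__stablePolls || 0) + 1;
  } else {
    // New content arrived (or this is the first poll): reset the counter.
    win.__stablePolls = 0;
    win.__lastScrollHeight = height;
  }

  // Truthy return value ends the wait; falsy keeps polling.
  return win.__stablePolls >= maxStablePolls;
}
```

Under those assumptions it would be passed as selectorOrFunctionOrTimeout: scrollUntilStable, with args: [3] and options: { polling: 500 }. The threshold and interval are guesses and would need tuning against how slowly the target page loads new items.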

Take a look at the get-set-fetch infinite scrolling example. It may prove a viable alternative.
Disclaimer: I'm the repo owner.

It worked for me like this:

        customCrawl: async (page, crawl) => {
            await page.setViewport({
                width: 1200,
                height: 800
            });
            const result = await crawl();

            await page.evaluate(scrollToBottom);
            await page.waitFor(3000);
            return result;
        },
...
async function scrollToBottom() {
    await new Promise(resolve => {
        const distance = 100; // should be less than or equal to window.innerHeight
        const delay = 100;
        const timer = setInterval(() => {
            document.scrollingElement.scrollBy(0, distance);
            if (document.scrollingElement.scrollTop + window.innerHeight >= document.scrollingElement.scrollHeight) {
                clearInterval(timer);
                resolve();
            }
        }, delay);
    });
}
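
One caveat with the scrollToBottom loop above: on a feed that keeps loading content forever, the bottom is never reached and the promise never resolves. A minimal sketch of a bounded variant, assuming a step cap is acceptable for your use case. The name scrollToBottomBounded, the maxSteps parameter, and the win parameter (present only so the function can be run outside a browser) are hypothetical and not part of the library.

```javascript
// Hypothetical bounded variant: stops at the bottom OR after maxSteps scroll steps.
async function scrollToBottomBounded(maxSteps = 200, distance = 100, delay = 100, win = globalThis) {
  const el = win.document.scrollingElement;
  let steps = 0;

  await new Promise(resolve => {
    const timer = setInterval(() => {
      el.scrollBy(0, distance);
      steps += 1;

      const atBottom = el.scrollTop + win.innerHeight >= el.scrollHeight;
      if (atBottom || steps >= maxSteps) {
        clearInterval(timer);
        resolve();
      }
    }, delay);
  });

  // Number of scroll steps actually performed.
  return steps;
}
```

In the customCrawl snippet it could be called as await page.evaluate(scrollToBottomBounded, 200, 100, 100); the extra arguments are passed into the page and win falls back to the page's own window.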