yujiosaka/headless-chrome-crawler

Requesting to enable Host-Resolver-Rules option or Option to Ignore specific Urls from crawling

SriGitHubSri opened this issue · 3 comments

What is the current behavior?
For our website I am currently using headless-chrome-crawler (running the crawler and printing the URLs it crawls), and I can see that it tries to crawl a few pages that we don't want it to crawl. It would be helpful to have an option to give the crawler a list or pattern of URLs to ignore during execution.

If the current behavior is a bug, please provide the steps to reproduce

What is the expected behavior?

What is the motivation / use case for changing the behavior?

Please tell us about your environment:

  • Version: 1.8.0
  • Platform / OS version: Windows 7
  • Node.js version: 8.11.1

The crawler's connect(), launch() and queue() functions accept both allowedDomains and deniedDomains options, each of which takes an array of strings and regular expressions.
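As a rough sketch of how this might look in practice: the options object below sets deniedDomains for whole hosts and uses the preRequest hook to skip individual URLs by pattern, which is closer to the original request. The domains, the URL patterns and the ignoredUrlPatterns name are hypothetical and only for illustration. The sketch assumes that preRequest skips a request when it returns false and that onSuccess results expose the requested URL under result.options.url, so check both against the README for your version.

```javascript
// Hypothetical hosts to exclude entirely (strings and regular expressions).
const deniedDomains = ['admin.example.com', /\.internal\.example\.com$/];

// Hypothetical URL patterns to ignore; these match paths, not just hosts.
const ignoredUrlPatterns = [/\/logout(\?|$)/, /\/private\//];

// Assumption: the crawler calls preRequest with each request's options
// and skips the request when the function returns false.
function preRequest(options) {
  return !ignoredUrlPatterns.some((pattern) => pattern.test(options.url));
}

const crawlerOptions = {
  deniedDomains,
  preRequest,
  // Print each crawled URL (assumes result.options.url is available).
  onSuccess: (result) => console.log(result.options.url),
};

// Usage sketch (needs `npm install headless-chrome-crawler`):
// const HCCrawler = require('headless-chrome-crawler');
// (async () => {
//   const crawler = await HCCrawler.launch(crawlerOptions);
//   await crawler.queue({ url: 'https://example.com/', maxDepth: 2 });
//   await crawler.onIdle();
//   await crawler.close();
// })();
```

Keeping the pattern check in a plain function means it can be tested without launching Chrome, for example by calling preRequest({ url: 'https://example.com/private/a' }) and checking that it returns false.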

@SriGitHubSri did @BubuAnabelas's answer solve your problem?

Closing due to inactivity.