yujiosaka/headless-chrome-crawler

allowedDomains and deniedDomains parameters do not work

liufuhu opened this issue · 2 comments

What is the current behavior?
I set it up like this:

crawler.queue({
  url: 'https://a.example.com',
  obeyRobotsTxt: false,
  maxDepth: 2,
  depthPriority: true,
  allowedDomains: [/example\.com$/],
});

but by the time the options argument reaches the _checkAllowedDomains function in lib/hccrawler.js, it does not include the allowedDomains setting.
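
For completeness, a self-contained script along these lines is how I run it (a sketch of my setup rather than exact code; evaluatePage and onSuccess are just placeholder handlers so the crawl has something to do):

const HCCrawler = require('headless-chrome-crawler');

(async () => {
  const crawler = await HCCrawler.launch({
    // Placeholder handlers so the crawl has something to do.
    evaluatePage: () => document.title,
    onSuccess: result => console.log(result),
  });
  await crawler.queue({
    url: 'https://a.example.com',
    obeyRobotsTxt: false,
    maxDepth: 2,
    depthPriority: true,
    allowedDomains: [/example\.com$/],
  });
  await crawler.onIdle();
  await crawler.close();
})();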

If the current behavior is a bug, please provide the steps to reproduce
It is a bug; please check. The package version is 1.8.0.

The code looks good, so I think it should work like that.

The _checkAllowedDomains function passes the array to the checkDomainMatch function, which runs a regular JavaScript regex test against the requested URL.
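
Conceptually the check boils down to something like this (a simplified sketch, not the library's exact code; the allow-list entries may be plain strings or RegExps):

const { URL } = require('url');

// Simplified sketch: return true if any allow-list entry matches the
// hostname of the queued URL. Entries may be strings or RegExps.
function domainAllowed(allowedDomains, requestUrl) {
  const { hostname } = new URL(requestUrl);
  return allowedDomains.some(domain => (
    domain instanceof RegExp ? domain.test(hostname) : domain === hostname
  ));
}

// domainAllowed([/example\.com$/], 'https://a.example.com'); // => true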

If you find out what's wrong, please post it here.
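
One way to narrow it down is to log the per-request options right before each request, for example via the preRequest hook, and check whether allowedDomains is still present at that point (a sketch, assuming the hook receives the merged per-request options):

const HCCrawler = require('headless-chrome-crawler');

(async () => {
  const crawler = await HCCrawler.launch({
    // Log the per-request options to see whether allowedDomains survives.
    preRequest: options => {
      console.log(options.url, options.allowedDomains);
      return true; // continue with the request
    },
  });
  await crawler.queue({
    url: 'https://a.example.com',
    maxDepth: 2,
    allowedDomains: [/example\.com$/],
  });
  await crawler.onIdle();
  await crawler.close();
})();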

@liufuhu are you still experiencing this issue? I wasn't able to reproduce the issue based on your example.
Let me know if so, and I'll reopen it.