yujiosaka/headless-chrome-crawler

Pages with 403 errors not throwing errors

mrispoli24 opened this issue · 1 comment

What is the current behavior?

When you crawl a page that returns a 403 Forbidden error, the crawler just hangs and stays there indefinitely. It ignores all timeouts and doesn't throw any errors.

If the current behavior is a bug, please provide the steps to reproduce

If you take the current crawler and run it from a remote server on Digital Ocean against sites that block bots, the returned 403 error does not trigger the error promise. This can be reproduced with any Best Buy URL, for example.
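A minimal reproduction sketch, assuming the standard HCCrawler API from the project's README (the Best Buy URL stands in for any bot-blocking site):

```js
const HCCrawler = require('headless-chrome-crawler');

(async () => {
  const crawler = await HCCrawler.launch({
    // Fires for every successfully crawled page.
    onSuccess: result => console.log('success:', result.options.url),
    // Expected to fire for pages that respond with 403, but never does.
    onError: error => console.error('error:', error),
  });
  crawler.queue('https://www.bestbuy.com/'); // any Best Buy URL reproduces the hang
  await crawler.onIdle(); // never resolves when a queued page returns 403
  await crawler.close();
})();
```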

What is the expected behavior?

Sites that return 403 Forbidden errors should trigger the onError function so the crawler can move on to the next URL in the queue.
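In the meantime, one hedged workaround is to race the crawl against an external deadline so a hung page cannot stall the process forever. This sketch assumes a `crawler` instance as in the reproduction above; `MAX_CRAWL_MS` is an assumed budget, not an HCCrawler option:

```js
// External watchdog: fail the whole crawl after a fixed budget instead of
// waiting indefinitely on a page that never settles.
const MAX_CRAWL_MS = 5 * 60 * 1000; // assumed 5-minute budget

const deadline = new Promise((_, reject) =>
  setTimeout(() => reject(new Error('crawl exceeded deadline')), MAX_CRAWL_MS)
);

await Promise.race([crawler.onIdle(), deadline]).catch(err => {
  console.error(err); // surfaces the hang instead of blocking forever
});
await crawler.close();
```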

What is the motivation / use case for changing the behavior?

If a site implements this type of blocking, it halts the entire crawl process without triggering any kind of notification that the URL failed.

To skip these errors and continue the script, you can use a Node.js version < 15.
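This works because Node.js changed its default behavior in v15: unhandled promise rejections now terminate the process instead of only printing a warning. A hedged alternative on newer Node.js versions is a process-level handler; this is a blunt workaround, not a fix for the missing onError call:

```js
// On Node.js >= 15, an unhandled promise rejection crashes the process by
// default. Registering a handler keeps the crawl alive so remaining URLs
// can still be processed.
process.on('unhandledRejection', reason => {
  console.error('Unhandled rejection, continuing:', reason);
});
```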