stevenvachon/broken-link-checker

HTML parser fails for some Sites

vvdwivedi opened this issue · 3 comments

Describe the bug
I am testing with two sites for a recursive scan:

  1. https://pg.vvdwivedi.com/
  2. https://www.capillarytech.com/

First one runs fine. I am using the cli from bin folder to run the build code. After digging through the code, I saw that parseHTML(https://github.com/stevenvachon/broken-link-checker/blob/master/lib/internal/parseHTML.js) is having issues, specifically with:

parser.once(FINISH_EVENT, () => { resolve(parser.document) });

There is a TODO with an issue link (sindresorhus/got#834) at that part of code. I can see that the issue is marked as closed, but even after trying out the suggested solution, my scan is not working.

To Reproduce
Build the code. From root directory, execute

./bin/blc https://www.capillarytech.com/ -r

./bin/blc https://pg.vvdwivedi.com -r

Expected behavior
Both should run a full recursive scan.

Environment:

  • OS and version: macOS Catalina (10.15.2)
  • Node.js version: 12.13.1
  • broken-link-checker version: v-8-alpha (checking after building from master branch)

Yeah, the "finish" event isn't emitting for some reason.

@vvdwivedi Im facing the same issue.

Any updates here?