stevenvachon/broken-link-checker

Alpha version 0.8.0 does not follow found links recursively

ralphbolliger opened this issue · 3 comments

Describe the bug
This morning I played around with release version 0.7.8 (from yarnpkg), which works fine so far. This afternoon I was curious how alpha version 0.8.0 (from GitHub) works. Unfortunately, I can't get it to scan a website recursively: it only logs the URL I defined as the starting point to the console (Node stdout).

This is my index.js:

const {SiteChecker} = require('broken-link-checker');

let options = {
        acceptedSchemes: ['http', 'https'],
        honorRobotExclusions: false,
        cacheResponses: false
    },
    customData = null,
    siteUrl = new URL('https://www.example.com');

const siteChecker = new SiteChecker(options)
    .on('error', (error) => {
    })
    .on('robots', (robots, customData) => {
    })
    .on('html', (tree, robots, response, pageURL, customData) => {
        console.log(pageURL.href)
    })
    .on('queue', () => {
    })
    .on('junk', (result, customData) => {
    })
    .on('link', (result, customData) => {
    })
    .on('page', (error, pageURL, customData) => {
    })
    .on('site', (error, siteURL, customData) => {
        console.log(siteURL.href)
    })
    .on('end', () => {
        console.log('Done!')
    });

siteChecker.enqueue(siteUrl, customData);

To Reproduce

  1. Add broken-link-checker from GitHub via yarn add
  2. Build it via yarn build in node_modules/broken-link-checker
  3. Create an index.js in the project root and paste in the example above
  4. Run node index.js on the command line

Expected behavior
A list of URLs crawled from the given starting URL, like this:
https://www.example.com
https://www.example.com/2017/12/08/kalte-winterdaemmerung-am-rheinfall/
https://www.example.com/author/johndoe/
https://www.example.com/2017/11/12/konzert-kammgarn/
https://www.example.com/2017/10/18/portrait-shooting/
https://www.example.com/2017/10/15/wochenendtrip/
https://www.example.com/2017/10/01/zu-besuch/
https://www.example.com/2017/06/29/gewitterfront/
https://www.example.com/2017/06/15/la-belle-paris/
https://www.example.com/2017/03/13/alvaro-soler/
...

Environment:

  • macOS 10.15.3 (19D76) Catalina
  • Node.js version: v12.16.1
  • broken-link-checker version: 0.8.0 (read from package.json)

Change

acceptedSchemes: ['http', 'https'],

to

acceptedSchemes: ['http:', 'https:'],

Perhaps this should be handled in the options parser to simplify the API.
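
For illustration, here is a minimal sketch of what such normalization could look like. It assumes the checker compares each link's scheme against the WHATWG URL protocol property, which always carries a trailing colon (for example 'https:'). The normalizeSchemes helper is hypothetical and not part of broken-link-checker's API.

```javascript
// Hypothetical helper, NOT part of broken-link-checker's API.
// Lowercases each scheme and appends the trailing colon that
// URL.protocol includes, so 'http' and 'http:' are treated alike.
const normalizeSchemes = (schemes) =>
    schemes.map((scheme) => {
        const lower = scheme.trim().toLowerCase();
        return lower.endsWith(':') ? lower : `${lower}:`;
    });

const acceptedSchemes = normalizeSchemes(['http', 'https:']);

// URL.protocol always ends with a colon.
const {protocol} = new URL('https://www.example.com');

console.log(acceptedSchemes);                    // [ 'http:', 'https:' ]
console.log(acceptedSchemes.includes(protocol)); // true
```

With something like this applied while parsing options, both spellings of acceptedSchemes would behave the same way.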

One should read the manual carefully… 🤦🏼‍♂️
Thanks for the hint, now it works as expected.