mandatoryprogrammer/PaperChaser

URL is not accepted.

When I run the crawl command on a seed file containing a public Google Drive URL, I get the following error:

Phonecian:PaperChaser skiwheelr$ node paperchaser.js crawl seed-file.txt
node:internal/process/promises:246
          triggerUncaughtException(err, true /* fromPromise */);
          ^

TypeError [ERR_INVALID_URL]: Invalid URL
    at new NodeError (node:internal/errors:363:5)
    at onParseError (node:internal/url:536:9)
    at new URL (node:internal/url:612:5)
    at /Users/skiwheelr/PaperChaser/libs/parsing.js:135:28
    at Array.map (<anonymous>)
    at Object.get_ids_from_urls (/Users/skiwheelr/PaperChaser/libs/parsing.js:124:37)
    at Command.<anonymous> (/Users/skiwheelr/PaperChaser/paperchaser.js:52:34)
    at Command.listener [as _actionHandler] (/Users/skiwheelr/PaperChaser/node_modules/commander/lib/command.js:473:17)
    at /Users/skiwheelr/PaperChaser/node_modules/commander/lib/command.js:1173:65
    at Command._chainOrCall (/Users/skiwheelr/PaperChaser/node_modules/commander/lib/command.js:1091:12) {
  input: '',
  code: 'ERR_INVALID_URL'
}

If I replace the URL with arbitrary text (e.g. boogieman), the error does report that string as the input:

  input: 'boogieman',
  code: 'ERR_INVALID_URL'
}

The error reports an empty string as the input only when the seed file contains a Google Drive link.

Is there a particular structure the seed file should follow, or is it just raw URLs separated by newlines?
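
To help narrow this down, here is a minimal sketch of one possible explanation. It assumes, without my having confirmed it against libs/parsing.js, that the seed file is split on newlines and every resulting line is passed to new URL(). Under that assumption a blank line or a trailing newline would be rejected with an empty input, even when the Google Drive link itself parses fine. checkSeedLines is a hypothetical helper written for this illustration, and the example.com URL is a placeholder, not my real link.

```javascript
// Hypothetical helper, not PaperChaser's actual code: try to parse each
// line of a seed file with the built-in URL constructor and record the result.
function checkSeedLines(text) {
  return text.split('\n').map((line, index) => {
    try {
      new URL(line); // throws a TypeError for anything that is not an absolute URL
      return { line: index + 1, input: line, ok: true };
    } catch (err) {
      // Keep the rejected string and the error code for reporting.
      return { line: index + 1, input: line, ok: false, code: err.code };
    }
  });
}

// Assumed seed contents: one valid URL followed by a trailing newline,
// which leaves an empty string as the last "line" after the split.
const results = checkSeedLines('https://example.com/document\n');
for (const r of results) {
  console.log(r.line, JSON.stringify(r.input), r.ok ? 'ok' : 'rejected');
}
```

If that is what is happening, removing empty lines from the seed file (or the trailing newline at its end) should make the error go away. With boogieman as the first line, the first line itself is rejected, which would explain why the string shows up in that case.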