
Parses sitemaps for Node.js


Sitemap-parser

NOTE: This is a fork of the original sitemapper package, fully migrated to ESM and TypeScript. The original package can be found here

Parse a sitemap's XML to get all the URLs for your crawler.

Installation

npm install @yeskiy/sitemapper --save

Simple Example

import Sitemapper from '@yeskiy/sitemapper';

const sitemap = new Sitemapper();

sitemap.fetch('https://www.google.com/work/sitemap.xml').then((sites) => {
    console.log(sites);
});

Options

You can pass options to the Sitemapper constructor when instantiating it.

  • requestHeaders: (Object) - Additional request headers to send with each request (e.g. User-Agent)
  • timeout: (Number) - Maximum timeout in ms for a single URL. Default: 15000 (15 seconds)
  • url: (String) - Sitemap URL to crawl
  • debug: (Boolean) - Enables/disables debug console logging. Default: false
  • concurrency: (Number) - Maximum number of sitemaps crawled concurrently. Default: 10
  • retries: (Number) - Maximum number of retries to attempt on an error response (e.g. 404 or timeout). Default: 0
  • rejectUnauthorized: (Boolean) - If true, throws on invalid certificates, such as expired or self-signed ones. Default: true
  • lastmod: (Number) - Timestamp of the minimum lastmod value allowed for returned URLs
  • gotParams: (GotOptions) - Additional options to pass to the got library. See Got Options

License

MIT