feed-extractor

Reads and normalizes RSS/ATOM/JSON feed data.

(This library is the renamed successor of feed-reader.)

Demo

Install & Usage

Node.js

npm i @extractus/feed-extractor

// es6 module
import { extract } from '@extractus/feed-extractor'

// CommonJS
const { extract } = require('@extractus/feed-extractor')

// you can also specify the exact path to the CommonJS build
const { extract } = require('@extractus/feed-extractor/dist/cjs/feed-extractor.js')

// extract an RSS feed
const result = await extract('https://news.google.com/rss')
console.log(result)

Deno

// deno < 1.28
import { extract } from 'https://esm.sh/@extractus/feed-extractor'

// deno >= 1.28
import { extract } from 'npm:@extractus/feed-extractor'

Browser

import { extract } from 'https://unpkg.com/@extractus/feed-extractor@latest/dist/feed-extractor.esm.js'

Please check the examples for reference.

APIs

extract()
extractFromJson()
extractFromXml()

Note:

The old method read() is deprecated and will be removed in the next major release.

`extract()`

Loads and extracts feed data from the given RSS/ATOM/JSON source. Returns a Promise.

Syntax

extract(String url)
extract(String url, Object parserOptions)
extract(String url, Object parserOptions, Object fetchOptions)

Example:

import { extract } from '@extractus/feed-extractor'

const result = await extract('https://news.google.com/atom')
console.log(result)
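
Because extract() returns a Promise, one failing source can be isolated from the others when several feeds are loaded together. The following is a minimal sketch, not part of the library: `extractMany` and `fakeExtract` are hypothetical names, and the extract function is passed in as an argument so that the snippet runs without network access.

```javascript
// Hypothetical helper: run an extract-like function over several URLs and
// separate successes from failures instead of stopping at the first rejection.
async function extractMany(extractFn, urls) {
  const settled = await Promise.allSettled(
    urls.map(async (url) => extractFn(url))
  )
  const feeds = []
  const errors = []
  settled.forEach((outcome, i) => {
    if (outcome.status === 'fulfilled') {
      feeds.push(outcome.value)
    } else {
      const reason = outcome.reason
      errors.push({
        url: urls[i],
        message: String(reason && reason.message ? reason.message : reason)
      })
    }
  })
  return { feeds, errors }
}

// Stand-in for extract(), used here only so the sketch is self-contained
const fakeExtract = async (url) => {
  if (url.includes('broken')) {
    throw new Error('Request failed')
  }
  return { title: `Feed at ${url}`, entries: [] }
}

extractMany(fakeExtract, [
  'https://example.com/rss',
  'https://example.com/broken'
]).then(({ feeds, errors }) => {
  console.log('loaded:', feeds.length, 'failed:', errors.length)
})
```

In real use, the imported extract function would presumably take the place of `fakeExtract`.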

Without any options, the result should have the following structure:

{
  title: String,
  link: String,
  description: String,
  generator: String,
  language: String,
  published: ISO Date String,
  entries: Array[
    {
      id: String,
      title: String,
      link: String,
      description: String,
      published: ISO Datetime String
    },
    // ...
  ]
}
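
As an illustration of how this structure might be consumed, the sketch below formats each entry as a single line. `formatEntries` and `sampleFeed` are hypothetical; the sample is hand-written in the shape shown above, and the date handling assumes the default ISO datetime strings.

```javascript
// Hypothetical helper: turn a normalized feed result into printable lines.
function formatEntries(feed) {
  const entries = Array.isArray(feed.entries) ? feed.entries : []
  return entries.map((entry) => {
    // With default options `published` is an ISO datetime string,
    // so its first 10 characters are the calendar date.
    const date = entry.published
      ? String(entry.published).slice(0, 10)
      : 'n/a'
    return `${date}  ${entry.title}  (${entry.link})`
  })
}

// Hand-written sample following the documented structure
const sampleFeed = {
  title: 'Example feed',
  link: 'https://example.com/',
  description: 'A sample',
  generator: '',
  language: 'en',
  published: '2023-01-02T00:00:00.000Z',
  entries: [
    {
      id: '1',
      title: 'First post',
      link: 'https://example.com/1',
      description: 'Hello',
      published: '2023-01-01T10:30:00.000Z'
    },
    {
      id: '2',
      title: 'Second post',
      link: 'https://example.com/2',
      description: 'No date here',
      published: ''
    }
  ]
}

console.log(formatEntries(sampleFeed).join('\n'))
```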

Parameters

`url` required

URL of a valid feed source

Feed content must be accessible and conform to one of the following standards: RSS, ATOM, or JSON Feed.

`parserOptions` optional

Object with all or several of the following properties:

normalization: Boolean, normalize feed data or keep original. Default true.
useISODateFormat: Boolean, convert datetime to ISO format. Default true.
descriptionMaxLen: Number, to truncate description. Default 210 (characters).
xmlParserOptions: Object, passed to the XML parser; see fast-xml-parser's docs.
getExtraFeedFields: Function, to get more fields from feed data
getExtraEntryFields: Function, to get more fields from feed entry data

For example:

import { extract } from '@extractus/feed-extractor'

await extract('https://news.google.com/atom', {
  useISODateFormat: false
})

await extract('https://news.google.com/rss', {
  useISODateFormat: false,
  getExtraFeedFields: (feedData) => {
    return {
      subtitle: feedData.subtitle || ''
    }
  },
  getExtraEntryFields: (feedEntry) => {
    const {
      enclosure,
      category
    } = feedEntry
    // optional chaining keeps entries without <enclosure> or <category> from throwing
    return {
      enclosure: {
        url: enclosure?.['@_url'],
        type: enclosure?.['@_type'],
        length: enclosure?.['@_length']
      },
      category: typeof category === 'string' ? category : {
        text: category?.['@_text'],
        domain: category?.['@_domain']
      }
    }
  }
})
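
Callbacks such as getExtraEntryFields receive raw parser output, so a field may be missing, a plain string, an object, or (when a tag repeats) an array. The sketch below shows one defensive way to handle that. `normalizeCategories` is a hypothetical helper, and the key names are assumptions: the '@_' attribute prefix is taken from the example above, and '#text' is the default text node name in fast-xml-parser.

```javascript
// Hypothetical helper for use inside getExtraEntryFields.
// ASSUMPTIONS: attributes use the '@_' prefix, element text sits under
// '#text' (with '@_text' as a fallback, as used in the example above).
function normalizeCategories(category) {
  if (category === undefined || category === null) {
    return []
  }
  const list = Array.isArray(category) ? category : [category]
  return list.map((item) => {
    if (typeof item !== 'object' || item === null) {
      // plain text category, e.g. <category>tech</category>
      return { text: String(item), domain: '' }
    }
    return {
      text: String(item['#text'] ?? item['@_text'] ?? ''),
      domain: String(item['@_domain'] ?? '')
    }
  })
}

const getExtraEntryFields = (feedEntry) => ({
  categories: normalizeCategories(feedEntry.category)
})

// Hand-written entries standing in for parser output
console.log(getExtraEntryFields({ category: 'tech' }))
console.log(getExtraEntryFields({
  category: [{ '#text': 'news', '@_domain': 'example' }, 'misc']
}))
console.log(getExtraEntryFields({}))
```

A function like this could then be passed as the getExtraEntryFields property of parserOptions.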

`fetchOptions` optional

Use this parameter to set the request headers passed to fetch.

For example:

import { extract } from '@extractus/feed-extractor'

const url = 'https://news.google.com/rss'
await extract(url, null, {
  headers: {
    'user-agent': 'Opera/9.60 (Windows NT 6.0; U; en) Presto/2.1.1'
  }
})

You can also specify a proxy endpoint to load remote content, instead of fetching directly.

For example:

import { extract } from '@extractus/feed-extractor'

const url = 'https://news.google.com/rss'

await extract(url, null, {
  headers: {
    'user-agent': 'Opera/9.60 (Windows NT 6.0; U; en) Presto/2.1.1'
  },
  proxy: {
    target: 'https://your-secret-proxy.io/loadXml?url=',
    headers: {
      'Proxy-Authorization': 'Bearer YWxhZGRpbjpvcGVuc2VzYW1l...'
    }
  }
})

Routing requests through a proxy is useful when running @extractus/feed-extractor in the browser. See examples/browser-feed-reader for a reference example.
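
The `?url=` suffix in the target above suggests that the feed address is appended to it. The sketch below shows what the proxied request URL would look like under that assumption. `buildProxyUrl` is a hypothetical helper, and whether the library URL-encodes the address in exactly this way is not confirmed here.

```javascript
// Hypothetical helper: compose the URL a proxied request would hit,
// ASSUMING the feed address is URL-encoded and appended to `proxy.target`.
function buildProxyUrl(target, feedUrl) {
  return target + encodeURIComponent(feedUrl)
}

const proxied = buildProxyUrl(
  'https://your-secret-proxy.io/loadXml?url=',
  'https://news.google.com/rss'
)
console.log(proxied)
// https://your-secret-proxy.io/loadXml?url=https%3A%2F%2Fnews.google.com%2Frss
```

A proxy endpoint written for this scheme would decode the `url` query parameter, fetch that address, and return the response body.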

`extractFromJson()`

Extracts feed data from a JSON string. Returns an object containing the feed data.

Syntax

extractFromJson(String json)
extractFromJson(String json, Object parserOptions)

Example:

import { extractFromJson } from '@extractus/feed-extractor'

const url = 'https://www.jsonfeed.org/feed.json'
// this resource provides data in JSON feed format
// so we fetch remote content as json
// then pass to feed-extractor
const res = await fetch(url)
const json = await res.json()

const feed = extractFromJson(json)
console.log(feed)
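
For reference, a JSON feed is a plain JSON document with a `version` URL, a `title`, and an `items` array, as defined by the JSON Feed specification. The sketch below builds a minimal hand-written sample and checks its shape before it would be handed to the extractor. `looksLikeJsonFeed` is a hypothetical pre-check, not a library function.

```javascript
// A minimal JSON Feed 1.1 document, written by hand for illustration
const sampleJsonFeed = {
  version: 'https://jsonfeed.org/version/1.1',
  title: 'Example Feed',
  home_page_url: 'https://example.org/',
  feed_url: 'https://example.org/feed.json',
  items: [
    {
      id: '1',
      url: 'https://example.org/first',
      title: 'First post',
      content_text: 'Hello',
      date_published: '2023-01-01T00:00:00Z'
    }
  ]
}

// Hypothetical pre-check: does this value look like a JSON Feed?
function looksLikeJsonFeed(value) {
  return Boolean(value) &&
    typeof value === 'object' &&
    typeof value.version === 'string' &&
    value.version.startsWith('https://jsonfeed.org/version/') &&
    Array.isArray(value.items)
}

console.log(looksLikeJsonFeed(sampleJsonFeed)) // true
console.log(looksLikeJsonFeed({ title: 'no items' })) // false
```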

Parameters

`json` required

JSON content loaded from a JSON feed resource. Note that the example above passes the parsed result of res.json() rather than a raw string.

`parserOptions` optional

See parserOptions above.

`extractFromXml()`

Extracts feed data from an XML string. Returns an object containing the feed data.

Syntax

extractFromXml(String xml)
extractFromXml(String xml, Object parserOptions)

Example:

import { extractFromXml } from '@extractus/feed-extractor'

const url = 'https://news.google.com/atom'
// this resource provides data in ATOM feed format
// so we fetch remote content as text
// then pass to feed-extractor
const res = await fetch(url)
const xml = await res.text()

const feed = extractFromXml(xml)
console.log(feed)

Parameters

`xml` required

XML string loaded from an RSS/ATOM feed resource.

`parserOptions` optional

See parserOptions above.
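
When content is fetched manually, the caller has to decide which of the two functions to use. The following is a rough sketch of that decision, based on the response content type with a fallback to the first character of the body. `detectFeedKind` is a hypothetical helper and makes no claim about how the library itself detects feed types.

```javascript
// Hypothetical helper: guess whether a response body is a JSON feed
// or an XML (RSS/ATOM) feed.
function detectFeedKind(contentType, body) {
  const type = String(contentType || '').toLowerCase()
  if (type.includes('json')) {
    return 'json'
  }
  if (type.includes('xml')) {
    return 'xml'
  }
  // No usable content type: look at the first non-whitespace character
  const first = String(body || '').trimStart().charAt(0)
  if (first === '{') {
    return 'json'
  }
  if (first === '<') {
    return 'xml'
  }
  return 'unknown'
}

console.log(detectFeedKind('application/feed+json', '{}')) // json
console.log(detectFeedKind('application/rss+xml; charset=utf-8', '<rss/>')) // xml
console.log(detectFeedKind('', '  <?xml version="1.0"?><feed/>')) // xml
console.log(detectFeedKind('text/plain', 'hello')) // unknown
```

A 'json' result would point to extractFromJson() and an 'xml' result to extractFromXml().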

Test

git clone https://github.com/extractus/feed-extractor.git
cd feed-extractor
npm i
npm test

Quick evaluation

git clone https://github.com/extractus/feed-extractor.git
cd feed-extractor
npm install

npm run eval https://news.google.com/rss

License

The MIT License (MIT)
