To read & normalize RSS/ATOM/JSON feed data.
(This library is the renamed successor of feed-reader.)
npm i @extractus/feed-extractor
// es6 module
import { extract } from '@extractus/feed-extractor'
// CommonJS
const { extract } = require('@extractus/feed-extractor')
// you can also specify the exact path to the CommonJS version
const { extract } = require('@extractus/feed-extractor/dist/cjs/feed-extractor.js')
// extract an RSS feed
const result = await extract('https://news.google.com/rss')
console.log(result)
// deno < 1.28
import { extract } from 'https://esm.sh/@extractus/feed-extractor'
// deno >= 1.28
import { extract } from 'npm:@extractus/feed-extractor'
// browser
import { extract } from 'https://unpkg.com/@extractus/feed-extractor@latest/dist/feed-extractor.esm.js'
Please check the examples for reference.
- The old method read() has been marked as deprecated and will be removed in the next major release.
Loads and extracts feed data from the given RSS/ATOM/JSON source. Returns a Promise.
extract(String url)
extract(String url, Object parserOptions)
extract(String url, Object parserOptions, Object fetchOptions)
Example:
import { extract } from '@extractus/feed-extractor'
const result = await extract('https://news.google.com/atom')
console.log(result)
Without any options, the result should have the following structure:
{
  title: String,
  link: String,
  description: String,
  generator: String,
  language: String,
  published: ISO Date String,
  entries: Array[
    {
      id: String,
      title: String,
      link: String,
      description: String,
      published: ISO Datetime String
    },
    // ...
  ]
}
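Because the normalized result always exposes an entries array with a published date per entry, it can be post-processed with ordinary array methods. The following is a minimal sketch, not part of the library: latestEntries is a hypothetical helper, and it assumes published holds ISO date strings (the default output) so that Date.parse can order them.

```javascript
// Hypothetical helper (not provided by @extractus/feed-extractor).
// Returns the newest entries of a normalized feed, newest first.
// Assumes `published` is an ISO date string, as in the default output.
function latestEntries (feed, limit = 5) {
  const entries = Array.isArray(feed.entries) ? feed.entries : []
  return entries
    // drop entries whose date is missing or unparseable (filter also copies the array)
    .filter((entry) => !Number.isNaN(Date.parse(entry.published)))
    .sort((a, b) => Date.parse(b.published) - Date.parse(a.published))
    .slice(0, limit)
    .map(({ title, link, published }) => ({ title, link, published }))
}
```

With a result obtained from extract(), latestEntries(result, 3) would give the three most recent items without modifying result.entries.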
URL of a valid feed source
Feed content must be accessible and conform to one of the following standards: RSS, ATOM, or JSON Feed.
Object with all or several of the following properties:
- normalization: Boolean, normalize feed data or keep original. Default true.
- useISODateFormat: Boolean, convert datetime to ISO format. Default true.
- descriptionMaxLen: Number, to truncate description. Default 210 (characters).
- xmlParserOptions: Object, used by xml parser, view fast-xml-parser's docs
- getExtraFeedFields: Function, to get more fields from feed data
- getExtraEntryFields: Function, to get more fields from feed entry data
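To make the defaults concrete, the sketch below merges a caller's parserOptions over the three default values documented above. It is only an illustration of which values end up in effect: DOCUMENTED_DEFAULTS and effectiveParserOptions are hypothetical names, and the library's internal option handling may differ.

```javascript
// Default values as documented for parserOptions; the object itself is illustrative.
const DOCUMENTED_DEFAULTS = {
  normalization: true,
  useISODateFormat: true,
  descriptionMaxLen: 210
}

// Hypothetical helper: shows which option values apply for a given parserOptions.
// Accepts null or undefined, since extract(url, null, fetchOptions) is a valid call.
function effectiveParserOptions (parserOptions) {
  return { ...DOCUMENTED_DEFAULTS, ...(parserOptions || {}) }
}
```

For instance, passing only { useISODateFormat: false } leaves normalization enabled and the description limit at 210 characters.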
For example:
import { extract } from '@extractus/feed-extractor'
await extract('https://news.google.com/atom', {
  useISODateFormat: false
})
await extract('https://news.google.com/rss', {
  useISODateFormat: false,
  getExtraFeedFields: (feedData) => {
    return {
      subtitle: feedData.subtitle || ''
    }
  },
  getExtraEntryFields: (feedEntry) => {
    const {
      enclosure,
      category
    } = feedEntry
    return {
      enclosure: {
        url: enclosure['@_url'],
        type: enclosure['@_type'],
        length: enclosure['@_length']
      },
      // category may be a plain string or an object with attributes
      category: typeof category === 'string' ? category : {
        text: category['@_text'],
        domain: category['@_domain']
      }
    }
  }
})
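Not every entry carries an enclosure or a category, so a callback that reads their attributes unconditionally can throw on sparse feeds. The sketch below is a more defensive variant of the getExtraEntryFields callback from the example above. It is an illustration under assumptions: the '@_' attribute keys are taken from that example, and entries with several categories (an array) are simply skipped.

```javascript
// Hypothetical callback for the getExtraEntryFields option.
// Only adds a field when the corresponding data is present on the entry.
function getExtraEntryFields (feedEntry) {
  const { enclosure, category } = feedEntry
  const extra = {}
  if (enclosure && typeof enclosure === 'object') {
    extra.enclosure = {
      url: enclosure['@_url'],
      type: enclosure['@_type'],
      length: enclosure['@_length']
    }
  }
  if (typeof category === 'string') {
    extra.category = category
  } else if (category && typeof category === 'object' && !Array.isArray(category)) {
    extra.category = {
      text: category['@_text'],
      domain: category['@_domain']
    }
  }
  return extra
}
```

It can be passed as-is in parserOptions, for example { getExtraEntryFields }.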
You can use the fetchOptions param to set request headers for fetch.
For example:
import { extract } from '@extractus/feed-extractor'
const url = 'https://news.google.com/rss'
await extract(url, null, {
  headers: {
    'user-agent': 'Opera/9.60 (Windows NT 6.0; U; en) Presto/2.1.1'
  }
})
You can also specify a proxy endpoint to load remote content, instead of fetching directly.
For example:
import { extract } from '@extractus/feed-extractor'
const url = 'https://news.google.com/rss'
await extract(url, null, {
  headers: {
    'user-agent': 'Opera/9.60 (Windows NT 6.0; U; en) Presto/2.1.1'
  },
  proxy: {
    target: 'https://your-secret-proxy.io/loadXml?url=',
    headers: {
      'Proxy-Authorization': 'Bearer YWxhZGRpbjpvcGVuc2VzYW1l...'
    }
  }
})
Passing requests through a proxy is useful when running @extractus/feed-extractor in the browser.
See examples/browser-feed-reader for a reference example.
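The proxy target in the example ends with ?url=, which suggests that the feed address is appended to it as a query value. The sketch below shows how such a request URL could be composed. This is an assumption inferred from the example, not a confirmed description of the library's internals, and buildProxyUrl is a hypothetical helper.

```javascript
// Hypothetical helper: composes the URL a proxy endpoint would receive.
// Assumption: the feed URL is URL-encoded and appended to `target`.
function buildProxyUrl (target, feedUrl) {
  return target + encodeURIComponent(feedUrl)
}
```

A proxy endpoint written along these lines would decode the url query parameter, fetch that address server-side, and return the raw feed body.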
Extracts feed data from a JSON string. Returns an object containing the feed data.
extractFromJson(String json)
extractFromJson(String json, Object parserOptions)
Example:
import { extractFromJson } from '@extractus/feed-extractor'
const url = 'https://www.jsonfeed.org/feed.json'
// this resource provides data in JSON feed format
// so we fetch remote content as json
// then pass to feed-extractor
const res = await fetch(url)
const json = await res.json()
const feed = extractFromJson(json)
console.log(feed)
JSON string loaded from JSON feed resource.
See parserOptions above.
Extracts feed data from an XML string. Returns an object containing the feed data.
extractFromXml(String xml)
extractFromXml(String xml, Object parserOptions)
Example:
import { extractFromXml } from '@extractus/feed-extractor'
const url = 'https://news.google.com/atom'
// this resource provides data in ATOM feed format
// so we fetch remote content as text
// then pass to feed-extractor
const res = await fetch(url)
const xml = await res.text()
const feed = extractFromXml(xml)
console.log(feed)
XML string loaded from RSS/ATOM feed resource.
See parserOptions above.
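When the format of a source is not known in advance, the response's Content-Type header can be used to decide between extractFromJson and extractFromXml. The following is a minimal sketch of that decision. detectFeedFormat is a hypothetical helper, and treating every non-JSON type as XML is a simplifying assumption.

```javascript
// Hypothetical helper: picks a parser family from a Content-Type header value.
// Returns 'json' for JSON media types (e.g. application/feed+json), 'xml' otherwise.
function detectFeedFormat (contentType) {
  // keep only the media type, dropping parameters such as "; charset=utf-8"
  const mime = String(contentType).split(';')[0].trim().toLowerCase()
  return mime === 'application/json' || mime.endsWith('+json') ? 'json' : 'xml'
}
```

After const res = await fetch(url), the value of detectFeedFormat(res.headers.get('content-type')) indicates whether to read the body with res.json() for extractFromJson or with res.text() for extractFromXml.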
git clone https://github.com/extractus/feed-extractor.git
cd feed-extractor
npm i
npm test
git clone https://github.com/extractus/feed-extractor.git
cd feed-extractor
npm install
npm run eval https://news.google.com/rss
The MIT License (MIT)