Add support for normalized entry "full text content"?
isunjn opened this issue · 1 comments
Typically, RSS feeds provide the "full text content" directly in the feed file: `content:encoded` or `description` in RSS format, `content` or `summary` in Atom format, etc.
So could feed-extractor add a normalized `content` property to each entry item?
I know you have another package called article-extractor, but I don't want to parse the HTML manually if the feed already provides its full "content". Also, some websites are not server-rendered and thus cannot be parsed correctly.
@isunjn the reason that `content` is not included in the default result is that websites handle it inconsistently. Some websites provide this content, while others don't. The 4 default fields chosen are `link`, `title`, `description` and `pubdate`, which have the highest stability: almost all feeds return them.
If you know for certain that a website includes `content` in its feed data, you can use getExtraEntryFields() to get it into your extraction result.
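A minimal sketch of that approach, for illustration only. The helper names `textOf` and `pickFullContent` are hypothetical, and the shape of the raw entry (plain strings, or objects with a `#text` key) is an assumption about how the XML parser represents nodes, so check it against the actual feed you are reading. It only looks at `content:encoded` and `content`, since `description` is already among the default fields.

```javascript
// Hypothetical helper: return the text of a raw feed node.
// Assumption: a node is either a plain string or an object holding its text under '#text'.
function textOf (node) {
  if (typeof node === 'string') return node
  if (node && typeof node === 'object' && typeof node['#text'] === 'string') {
    return node['#text']
  }
  return ''
}

// Hypothetical callback for getExtraEntryFields: adds one normalized `content` field.
// Tries RSS `content:encoded` first, then Atom `content`; falls back to an empty string.
function pickFullContent (feedEntry) {
  const candidates = [feedEntry['content:encoded'], feedEntry.content]
  for (const candidate of candidates) {
    const text = textOf(candidate).trim()
    if (text) return { content: text }
  }
  return { content: '' }
}

// Usage (not executed here; assumes @extractus/feed-extractor is installed
// and that the URL below is replaced with a real feed):
//   import { extract } from '@extractus/feed-extractor'
//   const feed = await extract('https://example.com/feed.xml', {
//     getExtraEntryFields: pickFullContent
//   })
//   // each entry in feed.entries should then carry `content` next to the default fields
```

Returning an empty string when nothing is found keeps the field present on every entry, so callers can decide for themselves whether to fall back to `description`.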