Option for full-text feeds

Question

phavx opened this issue 12 years ago · 5 comments

Since a lot of feeds don't offer complete articles, but only snippets, many RSS clients have an option to pass the link to the full story through things like Google Mobilizer/Instapaper/etc and show the result instead of only the snippet.

On a per-feed basis, this would be a nice option.

Answer 1 · 2013-03-20T10:12:26.000Z

You can check out my soup-strainer project and add support. I'll gladly accept the pull request :)

Answer 2 · 2013-03-20T15:09:57.000Z

Thanks for the hint, I'll try, though Python isn't my strong suit and I first have to get accustomed to r2e's code.

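Something like this should work?
(partly pseudo)

import feedparser
import soup_strainer  # assumed import name; "soup-strainer" is not a valid Python identifier

f = feedparser.parse(url)
if not FETCH_FULL_PAGE:
    proceed_like_now(f)
else:
    for i in f.entries:
        # pseudo: call whatever entry point soup-strainer exposes (extract() is a placeholder)
        content = soup_strainer.extract(i["link"])
        rest_as_normal(i, content)  # use the fetched page instead of the snippet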

Could take me a while, but you'll get the idea.
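
To make the idea above a bit more concrete, here is a minimal sketch of the per-feed switch, written so it runs without r2e or soup-strainer. Everything named here is an assumption for illustration: `expand_entries`, the `fetch_full_page` flag, and the `extract` callable (which stands in for whatever soup-strainer would do with a URL) are hypothetical and not part of either project. Entries are plain dicts with `link` and `content` keys rather than real feedparser entries.

```python
from typing import Callable, Dict, Iterable, List

Entry = Dict[str, str]


def expand_entries(
    entries: Iterable[Entry],
    fetch_full_page: bool,
    extract: Callable[[str], str],
) -> List[Entry]:
    """Return copies of the entries; when fetch_full_page is set for this
    feed, replace each entry's 'content' with the extracted full page."""
    result = []
    for entry in entries:
        item = dict(entry)  # leave the caller's entry untouched
        if fetch_full_page and item.get("link"):
            try:
                # extract() is a hypothetical stand-in for the page extractor
                item["content"] = extract(item["link"])
            except Exception:
                # On any fetch or parse error, keep the feed's own snippet.
                pass
        result.append(item)
    return result
```

Passing the extractor in as a parameter keeps the flag logic testable without network access, and falling back to the snippet means a broken site would not stop the rest of the feed from being delivered.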

Answer 3 · 2013-06-27T14:56:14.000Z

👍 soup-strainer support would be awesome

Answer 4 · 2016-03-12T15:06:38.000Z

@rcarmo I don't understand how soup-strainer will handle truncated fields: the problem is not removing junk HTML, but first opening the page that holds the full content and only then running soup-strainer on it.

However, to get full pages from truncated fields, I use rss-bridge (https://github.com/sebsauvage/rss-bridge). It provides a fairly complete list of page processors and can easily be extended.

Answer 5 · 2016-03-12T20:48:58.000Z

Soup strainer would fetch the original page first. It mostly worked when I tried it, but I've never felt the need to get this done, since most sites with truncated feeds turned out not to be worth reading in a MUA. I just ended up visiting the few that were.
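
As a rough illustration of the "fetch the original page first, then strip it" order described here, the sketch below uses only the Python standard library. It is not soup-strainer's algorithm: the names `TextExtractor`, `html_to_text`, and `fetch_full_text` are made up for this example, and the cleanup is deliberately crude (it only drops `<script>` and `<style>` bodies and keeps the remaining text).

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class TextExtractor(HTMLParser):
    """Collect visible text, skipping <script> and <style> bodies."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep non-empty text that is not inside a skipped element.
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def html_to_text(html: str) -> str:
    """Step 2: reduce an HTML document to its text chunks, one per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)


def fetch_full_text(url: str, timeout: float = 10.0) -> str:
    """Step 1: download the page the entry links to, then strip it."""
    with urlopen(url, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        html = resp.read().decode(charset, errors="replace")
    return html_to_text(html)
```

A real extractor would also need to pick out the article body from navigation and sidebars, which this sketch does not attempt.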