rcarmo/rss2imap

Option for full-text feeds

phavx opened this issue · 5 comments

Since a lot of feeds offer only snippets rather than complete articles, many RSS clients have an option to pass the link to the full story through a service such as Google Mobilizer or Instapaper and show the result instead of the snippet.

On a per-feed basis, this would be a nice option.
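
A minimal sketch of what such a per-feed switch could look like, assuming a hypothetical `FEEDS` table and a hypothetical `full_text` flag (neither comes from rss2imap). Entries are treated as dicts with `link` and `summary` keys, which is how feedparser exposes them; the fetcher is passed in so the choice of backend stays open.

```python
from typing import Callable, Dict

# Hypothetical per-feed settings; the keys and URLs are placeholders,
# not part of rss2imap's actual configuration.
FEEDS: Dict[str, dict] = {
    "https://example.com/full.xml": {"full_text": False},
    "https://example.com/snippets.xml": {"full_text": True},
}


def entry_body(feed_url: str, entry: dict, fetch_full: Callable[[str], str]) -> str:
    """Return the text to store for one entry, honouring the per-feed flag."""
    wants_full = FEEDS.get(feed_url, {}).get("full_text", False)
    if wants_full and entry.get("link"):
        try:
            return fetch_full(entry["link"])
        except Exception:
            # Fall back to the snippet if the full page cannot be retrieved.
            pass
    return entry.get("summary", "")
```

Feeds that are not listed, or whose page fetch fails, keep the current behaviour and deliver the snippet.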

You can check out my soup-strainer project and add support. I'll gladly accept the pull request :)

(In reply to phavx's original post of Mar 20, 2013, 01:01.)


Thanks for the hint, I'll try, though Py isn't my strong suit and I first have to get accustomed to r2e's code.

Something like that should work?
(partly pseudo)

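import feedparser
import soup_strainer  # hypothetical module name; "soup-strainer" is not a valid Python identifier

f = feedparser.parse(url)
if not FETCH_FULL_PAGE:
    proceed_like_now(f)  # pseudo: current behaviour, use the feed's own content
else:
    for entry in f.entries:
        # pseudo: extract() stands in for whatever soup-strainer actually exposes
        content = soup_strainer.extract(entry["link"])
        rest_as_normal(entry, content)  # pseudo: carry on with the full text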

Could take me a while, but you get the idea.

👍 soup-strainer support would be awesome

@rcarmo I don't understand how soup-strainer will handle truncated fields: the problem is not removing junk HTML content, but opening the page that contains the full content first and then running soup-strainer on it.

However, to get full pages from truncated fields, I use rss-bridge (https://github.com/sebsauvage/rss-bridge). It provides quite a complete list of page processors and can be easily extended.

soup-strainer would fetch the original page first. It mostly worked when I tried it, but I've never felt the need to get this done, since most sites with truncated feeds turned out not to be worth reading in a MUA. I just ended up visiting the few that were.
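
The two steps discussed here (open the linked page, then strip it down) can be sketched with the standard library alone. This is not soup-strainer's code or API: `ParagraphText`, `extract_text` and `fetch_full_text` are hypothetical names, and keeping only `<p>` text is a crude stand-in for real article extraction.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class ParagraphText(HTMLParser):
    """Collect the text inside <p> elements and ignore everything else."""

    def __init__(self):
        super().__init__()
        self.in_p = False
        self.paragraphs = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self.in_p = True
            self.paragraphs.append("")

    def handle_endtag(self, tag):
        if tag == "p":
            self.in_p = False

    def handle_data(self, data):
        if self.in_p:
            self.paragraphs[-1] += data


def extract_text(html: str) -> str:
    """Step 2: reduce a full HTML page to its paragraph text."""
    parser = ParagraphText()
    parser.feed(html)
    parser.close()
    return "\n\n".join(p.strip() for p in parser.paragraphs if p.strip())


def fetch_full_text(link: str) -> str:
    """Step 1: open the page the entry links to, then hand it to step 2."""
    with urlopen(link, timeout=10) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return extract_text(resp.read().decode(charset, errors="replace"))
```

Splitting fetching from extraction keeps the extraction step testable on a saved page, and makes it easy to swap in a better extractor later.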