shpaker/feedforbot

separate 'url' and 'id'

Closed this issue · 2 comments

url=raw_entry.get('id') if 'id' in raw_entry else raw_entry.get('url'),

'url' and 'id' need to be separated: the line above uses 'id' as the URL whenever it is present, but 'id' doesn't always contain a link.

test:

import feedparser
from string import Template


def test_feed(feed_url: str):
    feed = feedparser.parse(feed_url)

    s = Template('$link\n$id')
    o = s.substitute(feed.entries[0])
    print(o)
    print('———')


test_feed('http://www.youtube.com/feeds/videos.xml?channel_id=UCAtFkapSeoEGPxm5bC3tvaw')
test_feed('https://github.com/vapoursynth/vapoursynth/commits/master.atom')
test_feed('https://forum.manjaro.org/c/announcements/stable-updates.rss')
test_feed('https://u2.dmhy.org/torrentrss.php?rows=10&cat16=1&trackerssl=1&search=xxx')
test_feed('https://www.opennet.ru/opennews/opennews_all.rss')
test_feed('https://habr.com/ru/rss/feed/posts/xxx/?with_hubs=true')
test_feed('https://dot.kde.org/rss.xml')
test_feed('https://www.linux.com/feed/')
test_feed('https://www.reddit.com/r/linux/top/.rss')
test_feed('https://www.archlinux.org/feeds/news')

output (first line is the 'link', i.e. the url; second line is the 'id'):

https://www.youtube.com/watch?v=QAySwZkKLFo
yt:video:QAySwZkKLFo
———
https://github.com/vapoursynth/vapoursynth/commit/83c63ad716112ddc4977de8be91100b6fd13f375
tag:github.com,2008:Grit::Commit/83c63ad716112ddc4977de8be91100b6fd13f375
———
https://forum.manjaro.org/t/stable-update-2020-03-24-kernels-kde-frameworks-5-68-gnome-3-36-libreoffice-6-4-2/131228
forum.manjaro.org-topic-131228
———
https://u2.dmhy.org/details.php?id=37328
85bfei4e6733644b0a643d72e5f56ceb0bca1f28
———
https://www.opennet.ru/opennews/art.shtml?num=52621
https://www.opennet.ru/opennews/art.shtml?num=52621
———
https://habr.com/ru/post/494370/?utm_source=habrahabr&utm_medium=rss&utm_campaign=494370
https://habr.com/ru/post/494370/
———
https://dot.kde.org/2020/03/26/plasma-tv-presenting-plasma-bigscreen
4430 at https://dot.kde.org
———
https://www.linux.com/articles/kubecf-is-what-devops-wanted-marrying-cloud-foundry-with-kubernetes/
https://www.linux.com/?p=579049
———
https://www.reddit.com/r/linux/comments/fpgn71/track_coronavirus_covid19_on_command_line_it/
https://www.reddit.com/r/linux/top/t3_fpgn71
———
https://www.archlinux.org/news/hplip-3203-2-update-requires-manual-intervention/
tag:www.archlinux.org,2020-03-19:/news/hplip-3203-2-update-requires-manual-intervention/
———
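
To make the difference in the output above concrete, here is a minimal sketch that checks which of the printed 'id' values parse as http(s) links. `looks_like_url` is a hypothetical helper, not part of feedforbot or feedparser; it only relies on `urllib.parse.urlparse` from the standard library.

```python
from urllib.parse import urlparse


def looks_like_url(value: str) -> bool:
    """Hypothetical helper: True only for strings with an http(s) scheme and a host."""
    parts = urlparse(value)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


# 'id' values copied from the output above
ids = [
    "yt:video:QAySwZkKLFo",
    "tag:github.com,2008:Grit::Commit/83c63ad716112ddc4977de8be91100b6fd13f375",
    "forum.manjaro.org-topic-131228",
    "4430 at https://dot.kde.org",
    "https://www.opennet.ru/opennews/art.shtml?num=52621",
    "https://www.linux.com/?p=579049",
]

for entry_id in ids:
    print(looks_like_url(entry_id), entry_id)
```

Even when 'id' is link-shaped (habr.com, linux.com and reddit.com above), it can differ from the entry's link, so a shape check alone would not be enough to treat 'id' as the url.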

Thanks. I think that should be url = raw_entry.get('url') if 'url' in raw_entry else raw_entry.get('id'). Please reopen the issue if you still think that id and url should be strictly separated.
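
A minimal sketch of the expression suggested above, written as a standalone function so it can be tried on plain dicts. `entry_url` is a hypothetical name, and the 'url' key is taken from the comment as-is; this is not necessarily the code that shipped in 2.2.3.

```python
from typing import Mapping, Optional


def entry_url(raw_entry: Mapping[str, str]) -> Optional[str]:
    """Hypothetical helper: prefer 'url', fall back to 'id' only when 'url' is absent."""
    if "url" in raw_entry:
        return raw_entry.get("url")
    return raw_entry.get("id")


# Values copied from the test output earlier in this issue
youtube_entry = {
    "url": "https://www.youtube.com/watch?v=QAySwZkKLFo",
    "id": "yt:video:QAySwZkKLFo",
}
id_only_entry = {"id": "https://www.opennet.ru/opennews/art.shtml?num=52621"}

print(entry_url(youtube_entry))   # the 'url' value wins over the non-link 'id'
print(entry_url(id_only_entry))   # no 'url' key, so 'id' is used as a fallback
```

With this order a non-link 'id' such as yt:video:QAySwZkKLFo is only used when the entry has no 'url' at all.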

Fixed in 2.2.3