AttributeError: object has no attribute 'publish' when parsing CNN source

Question

Opened this issue 3 years ago · 1 comment

Hello author. I've run into a problem. I'm trying to parse the RSS feed from CNN, but I get the AttributeError shown in the title.

This is my code:
import feedparser

url = "http://rss.cnn.com/rss/edition.rss"
feed = feedparser.parse(url)
for news in feed.entries:
    print(news.published)

What is wrong with my code?

Answer 1 · 2021-06-14T13:01:41.000Z

Hello @buinguyenhoangtho, thanks for reporting this.

Some feed entries are missing publication dates:

>>> import re
>>> import requests
>>> url = "http://rss.cnn.com/rss/edition.rss"
>>> text = requests.get(url).text
>>> sum(1 for i in re.findall('<item>', text))
50
>>> sum(1 for i in re.findall('<pubDate>', text))
49

I manually checked the feed XML to confirm that one of the <pubDate> fields is part of the feed metadata, not a feed entry, and confirmed that feedparser was parsing this correctly:

>>> import feedparser
>>> feed = feedparser.parse(text)
>>> len(feed.entries)
50
>>> 1 if 'published' in feed.feed else 0
1
>>> sum(1 for i in feed.entries if 'published' in i)
48

It appears that the CNN feed has 2 entries out of 50 that are missing <pubDate> fields.
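To see which items lack a date rather than just how many, something like the following might help. It is only a sketch using the standard library's ElementTree on a tiny made-up feed; `SAMPLE` and `titles_missing_pubdate` are hypothetical names for illustration, and the real CNN XML may need extra handling.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature RSS feed standing in for the CNN XML.
# The channel has its own <pubDate>, and one of the two items has none.
SAMPLE = """<rss version="2.0"><channel>
<title>Example feed</title>
<pubDate>Mon, 14 Jun 2021 12:00:00 GMT</pubDate>
<item><title>Dated story</title><pubDate>Mon, 14 Jun 2021 11:00:00 GMT</pubDate></item>
<item><title>Undated story</title></item>
</channel></rss>"""


def titles_missing_pubdate(xml_text):
    """Return the titles of <item> elements that have no <pubDate> child."""
    root = ET.fromstring(xml_text)
    return [
        item.findtext("title", default="(untitled)")
        for item in root.iter("item")
        if item.find("pubDate") is None
    ]


print(titles_missing_pubdate(SAMPLE))
```

For the live feed, the `text` variable fetched with requests above could be passed in place of `SAMPLE`.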

In general, you will need to check whether a key or attribute exists before accessing it, either by using Python's getattr function with a default value or by wrapping the access in an if condition.
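As a rough illustration of both approaches, here is a minimal sketch. The helper name `published_or_default` and the placeholder text are made up for this example, and the `SimpleNamespace` objects only stand in for feed entries so the snippet runs without network access.

```python
from types import SimpleNamespace


def published_or_default(entry, default="(no publication date)"):
    """Return entry.published if the attribute exists, otherwise the default."""
    return getattr(entry, "published", default)


# Stand-in entries (an assumption for this sketch): one with a date, one without.
entries = [
    SimpleNamespace(title="Dated story", published="Mon, 14 Jun 2021 11:00:00 GMT"),
    SimpleNamespace(title="Undated story"),
]

for news in entries:
    # Approach 1: getattr with a default value.
    print(published_or_default(news))

    # Approach 2: an explicit check before accessing the attribute.
    if hasattr(news, "published"):
        print(news.published)
```

In the original loop, the same idea would be `print(getattr(news, "published", "(no publication date)"))` for each `news` in `feed.entries`. Because feedparser entries are dict-like, `if 'published' in news:` (as used in the session above) also works as the guard.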