Error when passing either the URL or the feed URL
will76 opened this issue · 2 comments
will76 commented
This is what I get when passing the feed URL:
python blogspot_downloader.py -p -f https://foo.blogspot.com/feeds/posts/default
Download in rss feed mode
Scraping rss feed... https://foo.blogspot.com/feeds/posts/default?start-index=1&max-results=25
Traceback (most recent call last):
File "blogspot_downloader.py", line 636, in <module>
main()
File "blogspot_downloader.py", line 610, in main
url = download(url, url, d_name, ext)
File "blogspot_downloader.py", line 348, in download
print('\ntitle: ' + title_raw)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xf4' in position 25: ordinal not in range(128)
Exception -2
Traceback (most recent call last):
File "blogspot_downloader.py", line 636, in <module>
main()
File "blogspot_downloader.py", line 610, in main
url = download(url, url, d_name, ext)
File "blogspot_downloader.py", line 348, in download
print('\ntitle: ' + title_raw)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xf4' in position 25: ordinal not in range(128)
Exception -1
And this is what I get when passing only the plain URL:
python blogspot_downloader.py -p https://foo.blogspot.com
Download in rss feed mode
Scraping rss feed... https://foo.blogspot.com?start-index=1&max-results=25
Try to scrape rss feed url automatically ... https://foo.blogspot.com
Traceback (most recent call last):
File "blogspot_downloader.py", line 636, in <module>
main()
File "blogspot_downloader.py", line 610, in main
url = download(url, url, d_name, ext)
File "blogspot_downloader.py", line 213, in download
soup = BeautifulSoup(r, "lxml")
File "/usr/local/lib/python2.7/dist-packages/BeautifulSoup.py", line 1522, in __init__
BeautifulStoneSoup.__init__(self, *args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/BeautifulSoup.py", line 1147, in __init__
self._feed(isHTML=isHTML)
File "/usr/local/lib/python2.7/dist-packages/BeautifulSoup.py", line 1189, in _feed
SGMLParser.feed(self, markup)
File "/usr/lib/python2.7/sgmllib.py", line 104, in feed
self.goahead(0)
File "/usr/lib/python2.7/sgmllib.py", line 174, in goahead
k = self.parse_declaration(i)
File "/usr/local/lib/python2.7/dist-packages/BeautifulSoup.py", line 1463, in parse_declaration
j = SGMLParser.parse_declaration(self, i)
File "/usr/lib/python2.7/markupbase.py", line 109, in parse_declaration
self.handle_decl(data)
File "/usr/local/lib/python2.7/dist-packages/BeautifulSoup.py", line 1448, in handle_decl
self._toStringSubclass(data, Declaration)
File "/usr/local/lib/python2.7/dist-packages/BeautifulSoup.py", line 1381, in _toStringSubclass
self.endData(subclass)
File "/usr/local/lib/python2.7/dist-packages/BeautifulSoup.py", line 1251, in endData
(not self.parseOnlyThese.text or \
AttributeError: 'str' object has no attribute 'text'
will76 commented
I was able to get rid of the second error. Now only getting the encoding one.
limkokhole commented
You can try adjusting your terminal encoding settings or, better, use Python 3.
Python 2 has been end-of-life since 1 Jan 2020. I may remove the Python 2 code in the future.
[UPDATE]:
If you encounter a UnicodeEncodeError, try this:
export PYTHONIOENCODING=utf8; python3 blogspot_downloader.py
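If you would rather handle this inside a script than through the environment, something along these lines might work on Python 3. This is only a rough sketch, not code from blogspot_downloader.py: the helper names utf8_text_stream and force_utf8_stdout and the sample title are made up for illustration, and sys.stdout.reconfigure needs Python 3.7 or newer.

```python
import io
import sys


def utf8_text_stream(binary_stream):
    """Wrap a binary stream so that text written to it is encoded as UTF-8."""
    # Hypothetical helper: useful when stdout must be rebuilt from its byte buffer.
    return io.TextIOWrapper(
        binary_stream, encoding="utf-8", errors="replace", line_buffering=True
    )


def force_utf8_stdout():
    """Switch the existing stdout to UTF-8 in place, if the stream supports it."""
    # Hypothetical helper: reconfigure() exists on TextIOWrapper since Python 3.7.
    reconfigure = getattr(sys.stdout, "reconfigure", None)
    if reconfigure is not None:
        reconfigure(encoding="utf-8", errors="replace")


if __name__ == "__main__":
    force_utf8_stdout()
    # "\u00f4" is the character u'\xf4' named in the traceback; the title is invented.
    print("\ntitle: " + "H\u00f4tel")
```

The effect is similar to setting PYTHONIOENCODING, except that it applies only to the script that calls it.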
Another possible cause of UnicodeEncodeError is opening a file without setting encoding='utf-8', although based on your log that is not what is happening here.
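For the file case, a minimal sketch of what passing the encoding explicitly could look like on Python 3. The functions save_post and load_post and the sample text are hypothetical, not taken from blogspot_downloader.py.

```python
import os
import tempfile


def save_post(path, title, body):
    """Write a post to disk as UTF-8, independent of the locale default."""
    # Without encoding=..., open() falls back to the locale's preferred encoding,
    # which can raise UnicodeEncodeError on non-ASCII titles.
    with open(path, "w", encoding="utf-8") as f:
        f.write(title + "\n\n" + body)


def load_post(path):
    """Read the post back using the same explicit encoding."""
    with open(path, "r", encoding="utf-8") as f:
        return f.read()


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "post.txt")
    save_post(path, "H\u00f4tel", "body text")
    # ascii() escapes non-ASCII characters so the demo prints on any terminal.
    print(ascii(load_post(path)))
```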