limkokhole/blogspot-downloader

Error when passing either the blog URL or the feed URL

will76 opened this issue · 2 comments

This is what I get when passing the feed URL:

python blogspot_downloader.py -p -f https://foo.blogspot.com/feeds/posts/default
Download in rss feed mode
Scraping rss feed... https://foo.blogspot.com/feeds/posts/default?start-index=1&max-results=25
Traceback (most recent call last):
  File "blogspot_downloader.py", line 636, in <module>
    main()
  File "blogspot_downloader.py", line 610, in main
    url = download(url, url, d_name, ext)
  File "blogspot_downloader.py", line 348, in download
    print('\ntitle: ' + title_raw)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xf4' in position 25: ordinal not in range(128)

Exception -2
Traceback (most recent call last):
  File "blogspot_downloader.py", line 636, in <module>
    main()
  File "blogspot_downloader.py", line 610, in main
    url = download(url, url, d_name, ext)
  File "blogspot_downloader.py", line 348, in download
    print('\ntitle: ' + title_raw)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xf4' in position 25: ordinal not in range(128)

Exception -1

And this is what I get when passing only the plain blog URL:

python blogspot_downloader.py -p https://foo.blogspot.com
Download in rss feed mode
Scraping rss feed... https://foo.blogspot.com?start-index=1&max-results=25
Try to scrape rss feed url automatically ... https://foo.blogspot.com
Traceback (most recent call last):
  File "blogspot_downloader.py", line 636, in <module>
    main()
  File "blogspot_downloader.py", line 610, in main
    url = download(url, url, d_name, ext)
  File "blogspot_downloader.py", line 213, in download
    soup = BeautifulSoup(r, "lxml")
  File "/usr/local/lib/python2.7/dist-packages/BeautifulSoup.py", line 1522, in __init__
    BeautifulStoneSoup.__init__(self, *args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/BeautifulSoup.py", line 1147, in __init__
    self._feed(isHTML=isHTML)
  File "/usr/local/lib/python2.7/dist-packages/BeautifulSoup.py", line 1189, in _feed
    SGMLParser.feed(self, markup)
  File "/usr/lib/python2.7/sgmllib.py", line 104, in feed
    self.goahead(0)
  File "/usr/lib/python2.7/sgmllib.py", line 174, in goahead
    k = self.parse_declaration(i)
  File "/usr/local/lib/python2.7/dist-packages/BeautifulSoup.py", line 1463, in parse_declaration
    j = SGMLParser.parse_declaration(self, i)
  File "/usr/lib/python2.7/markupbase.py", line 109, in parse_declaration
    self.handle_decl(data)
  File "/usr/local/lib/python2.7/dist-packages/BeautifulSoup.py", line 1448, in handle_decl
    self._toStringSubclass(data, Declaration)
  File "/usr/local/lib/python2.7/dist-packages/BeautifulSoup.py", line 1381, in _toStringSubclass
    self.endData(subclass)
  File "/usr/local/lib/python2.7/dist-packages/BeautifulSoup.py", line 1251, in endData
    (not self.parseOnlyThese.text or \
AttributeError: 'str' object has no attribute 'text'

I was able to get rid of the second error; now I'm only getting the encoding one.
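The thread doesn't say how the second error was fixed. One plausible cause, assuming the module at /usr/local/lib/python2.7/dist-packages/BeautifulSoup.py is the legacy BeautifulSoup 3 package: its constructor treats the second positional argument as parseOnlyThese (a parse filter), not as a parser name. So BeautifulSoup(r, "lxml") stores the string "lxml" as the filter, and endData later fails when it looks up .text on that string. In bs4 (the beautifulsoup4 package), the second argument is the parser name. The sketch below switches to bs4. make_soup is a hypothetical helper, not part of blogspot_downloader.py, and it defaults to "html.parser" so lxml doesn't need to be installed.

```python
# Assumption: the AttributeError comes from BeautifulSoup 3, whose constructor
# reads a second positional argument as parseOnlyThese rather than a parser name.
# bs4 (pip install beautifulsoup4) reads it as the parser name ("features").
try:
    from bs4 import BeautifulSoup
except ImportError:  # beautifulsoup4 is not installed
    BeautifulSoup = None


def make_soup(markup, parser="html.parser"):
    """Hypothetical helper: parse markup with bs4.

    "html.parser" ships with Python; "lxml" also works if the lxml package is installed.
    """
    if BeautifulSoup is None:
        raise RuntimeError("beautifulsoup4 is required: pip install beautifulsoup4")
    return BeautifulSoup(markup, parser)


if __name__ == "__main__" and BeautifulSoup is not None:
    soup = make_soup("<p>hello</p>")
    print(soup.p.get_text())
```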

You may try adjusting your terminal's encoding settings or, better, use Python 3.
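To check which encoding print() will use, and to see why u'\xf4' fails under ASCII, here is a small sketch. to_printable is a hypothetical helper, not part of blogspot_downloader.py. It only masks the problem by replacing unencodable characters with "?", so fixing the terminal encoding or moving to Python 3 is still the better fix.

```python
import sys


def to_printable(text, encoding=None):
    """Hypothetical helper: replace characters the target encoding can't represent with '?'."""
    # Use the stream's reported encoding, falling back to ASCII if it reports none
    # (Python 2 behaves like this when stdout is piped rather than a terminal).
    encoding = encoding or getattr(sys.stdout, "encoding", None) or "ascii"
    return text.encode(encoding, errors="replace").decode(encoding)


if __name__ == "__main__":
    print("stdout encoding:", getattr(sys.stdout, "encoding", None))
    # Under ASCII, U+00F4 is outside range(128), the same failure as in the traceback.
    print(to_printable("title: C\u00f4te", "ascii"))
```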

Python 2 has been end-of-life since 1 January 2020. I may remove the Python 2 code in the future.

[UPDATE]:
If you encounter a UnicodeEncodeError, try this:

export PYTHONIOENCODING=utf8; python3 blogspot_downloader.py
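PYTHONIOENCODING sets the encoding Python uses for stdin, stdout and stderr. The command above sets it for the whole shell session. The sketch below instead sets it for a single child interpreter, which lets you compare both settings side by side. CHILD_CODE and run_child are illustrative names, and the child only imitates the failing print('\ntitle: ' + title_raw) line. It doesn't run the downloader itself.

```python
import os
import subprocess
import sys

# Child program printing a title that contains U+00F4 ("ô"), like the failing
# print('\ntitle: ' + title_raw) call in the traceback.
CHILD_CODE = "print('title: C' + chr(0xf4) + 'te')"


def run_child(io_encoding):
    """Run CHILD_CODE in a fresh interpreter with PYTHONIOENCODING set to io_encoding."""
    env = dict(os.environ, PYTHONIOENCODING=io_encoding)
    return subprocess.run([sys.executable, "-c", CHILD_CODE], env=env, capture_output=True)


if __name__ == "__main__":
    ascii_run = run_child("ascii")
    print("ascii exit code:", ascii_run.returncode)  # non-zero: UnicodeEncodeError
    utf8_run = run_child("utf8")
    # Print the raw bytes so this parent process can't hit the same encoding error.
    print("utf8 output bytes:", utf8_run.stdout.strip())
```

To apply the setting to a single run without exporting it, the usual shell form is PYTHONIOENCODING=utf8 python3 blogspot_downloader.py.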

Another possible cause of UnicodeEncodeError is opening a file for writing without setting encoding='utf-8', although based on your log that is not the problem here.
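For that file case, a minimal Python 3 sketch follows. save_post and load_post are hypothetical helpers and don't come from blogspot_downloader.py. Without an explicit encoding, open() falls back to the locale's preferred encoding. That can be ASCII on some systems, and writing "ô" then fails.

```python
import os
import tempfile


def save_post(path, title, body):
    """Hypothetical helper: write a post as UTF-8 regardless of the locale default."""
    # Passing encoding explicitly avoids depending on the locale's preferred encoding.
    with open(path, "w", encoding="utf-8") as f:
        f.write(title + "\n\n" + body + "\n")


def load_post(path):
    """Read the file back with the same explicit encoding."""
    with open(path, encoding="utf-8") as f:
        return f.read()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "post.txt")
        save_post(path, "C\u00f4te", "body text")
        print(repr(load_post(path)))
```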