limkokhole/blogspot-downloader

Various RSS PDF download issues

xthursdayx opened this issue · 1 comment

Thanks for this useful script!

For some reason, the only way I can download the specific blog I'm after is through the RSS feed (with option -p). However, every time I run the command, the scraping and downloading stops at a particular post.

I've tried using the -a, -s, and -p flags to download a specific year (or month) after the post that seems to be causing the problem, but I get the following error:

title: Baru Samarinda
link: https://blog-name.blogspot.com//2015/07/baru-samarinda-terima.html
Download html as PDF, please be patient...18/71
file path: /home/xthursdayx/blogspot-downloader/blog name.blogspot.com /Baru Samarinda Terima Penganugerahan P....pdf
pdfkit IOError

I also tried running python3 blogspot_downloader.py -lo http://blog-name.blogspot.com/, exporting the results to urls.list, and then running python3 blogspot_downloader.py -p -1 <urls.list, which gave the following error:

URL: Create single pdf: /home/vidrir/blogspot-downloader/flores borneo.blogspot.com.pdf
IOError --one:  wkhtmltopdf reported an error:
Loading page (1/2)
Error: Failed to load https://blog-name.blogspot.com/2015/07/baru-samarinda-terima.html.html?action=backlinks&widgetId=Blog1&widgetType=Blog&responseType=js&postID=6408130842748688230&xssi_token=AOuZoY7PJVRw0EwDhHe-xNsCx9cPbEV4gQ400A1620183076462, with network status code 302 and http status code 400 - Error transferring https://blog-name.blogspot.com/2015/007/baru-samarinda-terima.html?action=backlinks&widgetId=Blog1&widgetType=Blog&responseType=js&postID=6408130842748688230&xssi_token=AOuZoY7PJVRw0EwDhHe-xNsCx9cPbEV4gQ%3A1620183076462 - server replied:
Printing pages (2/2)
Done
Exit with code 1 due to network error: ProtocolInvalidOperationError

Any idea what my problem is? Thanks for the help!

Note: the blog name and post were changed for the owner's benefit.

The blog page no longer exists. But if you get an error, try not using -p: it relies on the third-party library pdfkit and the tool wkhtmltopdf, which are out of my control, unlike the EPUB output (pypub is bundled with this script). Also, -p does not support multiple links.
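For anyone who wants to experiment with the -p route anyway: pdfkit builds a wkhtmltopdf command line from an options dict, and the failure in the error log came from a sub-request (the blog's backlinks widget, `?action=backlinks...`) that returned HTTP 400, after which wkhtmltopdf exited with code 1. The sketch below is not blogspot_downloader.py's code. It is a minimal standalone example, assuming pdfkit and the wkhtmltopdf binary are installed, that passes wkhtmltopdf's `--load-error-handling ignore` option so a failed sub-request is less likely to abort the whole page. The function names are hypothetical, and whether this gets past pdfkit's own error check may depend on what your wkhtmltopdf version prints.

```python
"""Sketch (hypothetical helpers): render one Blogspot post to PDF with pdfkit,
asking wkhtmltopdf to ignore failed sub-requests such as a widget URL
that answers HTTP 400."""


def build_pdfkit_options(ignore_load_errors: bool = True) -> dict:
    # pdfkit turns each key into a wkhtmltopdf flag, e.g.
    # {"load-error-handling": "ignore"} -> --load-error-handling ignore
    options = {"encoding": "UTF-8"}
    if ignore_load_errors:
        options["load-error-handling"] = "ignore"
    return options


def post_to_pdf(url: str, output_path: str) -> bool:
    # Imported lazily so build_pdfkit_options() can be used without pdfkit.
    import pdfkit

    try:
        # pdfkit.from_url shells out to the wkhtmltopdf binary.
        pdfkit.from_url(url, output_path, options=build_pdfkit_options())
        return True
    except OSError as exc:
        # pdfkit raises IOError (an alias of OSError in Python 3) when
        # wkhtmltopdf is missing or reports an error.
        print(f"pdfkit failed for {url}: {exc}")
        return False
```

Usage would look like `post_to_pdf("https://blog-name.blogspot.com/2015/07/some-post.html", "post.pdf")`, where the URL is a placeholder. If it still fails, fall back to the EPUB output (no -p).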

I also fixed pypub so that it can download some images and text. So -p is not preferred unless the EPUB version fails because of JavaScript.
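Because -p does not accept multiple links, one possible workaround for a urls.list produced by -lo is to run the script once per URL. This is a minimal sketch under several assumptions: that blogspot_downloader.py takes a single URL as a positional argument (as in the -lo invocation in the issue), that it sits in the current directory, and that a non-zero exit status signals failure (the script may print an error such as "pdfkit IOError" and still exit 0). `read_urls`, `build_command`, and `download_all` are hypothetical helpers, not part of the script. By default the sketch leaves out -p, in line with the advice to prefer EPUB.

```python
"""Sketch (hypothetical helpers): run blogspot_downloader.py once per URL
from a urls.list file, since -p does not handle multiple links per run."""
import subprocess
import sys
from pathlib import Path


def read_urls(list_path: str) -> list[str]:
    # One URL per line; skip blank lines and anything that is not http(s).
    lines = Path(list_path).read_text(encoding="utf-8").splitlines()
    return [ln.strip() for ln in lines
            if ln.strip().startswith(("http://", "https://"))]


def build_command(url: str, use_pdf: bool = False) -> list[str]:
    # Assumption: the script accepts one URL as a positional argument.
    # sys.executable reuses the current interpreter instead of "python3".
    cmd = [sys.executable, "blogspot_downloader.py"]
    if use_pdf:
        cmd.append("-p")  # PDF output via pdfkit/wkhtmltopdf
    cmd.append(url)
    return cmd


def download_all(list_path: str, use_pdf: bool = False) -> list[str]:
    # Returns the URLs whose run exited with a non-zero status.
    failed = []
    for url in read_urls(list_path):
        result = subprocess.run(build_command(url, use_pdf))
        if result.returncode != 0:
            failed.append(url)
    return failed
```

For example, `download_all("urls.list")` processes each post in its own run; pass `use_pdf=True` to try -p one post at a time.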