Various RSS PDF download issues
xthursdayx opened this issue · 1 comments
Thanks for this useful script!
For some reason I seem to only be able to download the specific blog I'm after through the RSS feed (with option -p); however, every time I run the command, the scraping and downloading stops at a particular post.
I've tried using the -a, -s and -p flags to download a specific year (or month) after the post which seems to be causing the problem, but I get the following error:
title: Baru Samarinda
link: https://blog-name.blogspot.com//2015/07/baru-samarinda-terima.html
Download html as PDF, please be patient...18/71
file path: /home/xthursdayx/blogspot-downloader/blog name.blogspot.com /Baru Samarinda Terima Penganugerahan P....pdf
pdfkit IOError
I also tried the command python3 blogspot_downloader.py -lo http://blog-name.blogspot.com/, exporting the results to urls.list, and then ran the command python3 blogspot_downloader.py -p -1 <urls.list and got the following error:
URL: Create single pdf: /home/vidrir/blogspot-downloader/flores borneo.blogspot.com.pdf
IOError --one: wkhtmltopdf reported an error:
Loading page (1/2)
Error: Failed to load https://blog-name.blogspot.com/2015/07/baru-samarinda-terima.html.html?action=backlinks&widgetId=Blog1&widgetType=Blog&responseType=js&postID=6408130842748688230&xssi_token=AOuZoY7PJVRw0EwDhHe-xNsCx9cPbEV4gQ400A1620183076462, with network status code 302 and http status code 400 - Error transferring https://blog-name.blogspot.com/2015/007/baru-samarinda-terima.html?action=backlinks&widgetId=Blog1&widgetType=Blog&responseType=js&postID=6408130842748688230&xssi_token=AOuZoY7PJVRw0EwDhHe-xNsCx9cPbEV4gQ%3A1620183076462 - server replied:
Printing pages (2/2)
Done
Exit with code 1 due to network error: ProtocolInvalidOperationError
Any idea what my problem is? Thanks for the help!
**blog name and post changed for the owner's benefit.
The blog page no longer exists. But if you get an error, try not using -p, since it relies on the third-party library pdfkit and the tool wkhtmltopdf, which are out of my control, unlike the EPUB path (pypub is bundled with this script). Also, -p does not support multiple links.
I have also fixed pypub so that it can download some images and text. So -p is not preferred unless the EPUB version does not work because of JavaScript.
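If PDF output is still needed, one possible workaround outside this script is to call pdfkit directly, one URL at a time, and catch the error so that a single failing post does not stop the whole run. The sketch below is only an illustration under assumptions: the names `convert_all`, `read_urls` and `LENIENT_OPTIONS` are hypothetical and not part of blogspot_downloader.py, and the two wkhtmltopdf options shown may or may not be enough to get past the failing backlinks request in the log above.

```python
import os
from typing import Callable, Dict, Iterable, List, Tuple

# Assumption: asking wkhtmltopdf to ignore resources that fail to load may
# let it finish a page even when a secondary request is rejected.
LENIENT_OPTIONS: Dict[str, str] = {
    "load-error-handling": "ignore",
    "load-media-error-handling": "ignore",
}


def _pdfkit_convert(url: str, out_path: str) -> None:
    """Convert one URL to a PDF file with pdfkit."""
    # Imported lazily so the helpers below can be used without pdfkit.
    import pdfkit  # third-party; also needs the wkhtmltopdf binary

    pdfkit.from_url(url, out_path, options=LENIENT_OPTIONS)


def read_urls(lines: Iterable[str]) -> List[str]:
    """Return the non-empty, stripped lines of a URL list such as urls.list."""
    return [line.strip() for line in lines if line.strip()]


def convert_all(
    urls: Iterable[str],
    out_dir: str,
    convert: Callable[[str, str], None] = _pdfkit_convert,
) -> List[Tuple[str, str]]:
    """Convert each URL to its own PDF and return the (url, error) failures."""
    failures: List[Tuple[str, str]] = []
    for index, url in enumerate(urls, start=1):
        out_path = os.path.join(out_dir, f"post-{index:03d}.pdf")
        try:
            convert(url, out_path)
        except OSError as err:  # IOError is an alias of OSError in Python 3
            # Record the failure and carry on with the next post.
            failures.append((url, str(err)))
    return failures
```

A possible usage, assuming urls.list holds one post URL per line: open the file, pass it to `read_urls`, then call `convert_all(urls, "pdf-out")` with an existing output directory and print the returned failures. Writing one PDF per post is a deliberate choice here, since a single bad post then costs only its own file.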