SlideGrubber is a Python package that downloads SlideShare presentations as PDF files. It is built on BeautifulSoup, ImageMagick (through Wand), and Requests.
ImageMagick must be installed on your system for the image-to-PDF conversion to work:
$ apt-get install imagemagick
$ pip install slidegrubber
Initialize the class with the presentation URL, then call grub() to download the presentation as a PDF file.
>>> from slidegrubber import SlideGrubber
>>> s = SlideGrubber('http://www.slideshare.net/author/my-slide')
Your presentation My Slide by author is ready for processing.
>>> s.grub()
'/current_working_directory/my-slide-by-author.pdf'
If no filename or path is specified, the presentation is saved to the current working directory under a name built from the URL. You can also specify the output path:
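Judging only from the example above, the default name appears to be "<slug>-by-<author>.pdf", taken from the last two segments of the URL. The sketch below mimics that mapping in Python 3; `default_pdf_path` is a hypothetical helper inferred from the README, not SlideGrubber's actual code, and it assumes URLs of the form http://www.slideshare.net/<author>/<slug>.

```python
import os
from urllib.parse import urlparse


def default_pdf_path(url, directory=None):
    """Hypothetical helper: mimic the default output path shown above.

    Assumes http://www.slideshare.net/<author>/<slug>; inferred from the
    README example, not taken from SlideGrubber itself.
    """
    # Non-empty path segments, e.g. ['author', 'my-slide']
    segments = [part for part in urlparse(url).path.split('/') if part]
    author, slug = segments[-2], segments[-1]
    name = '{}-by-{}.pdf'.format(slug, author)
    # Fall back to the current working directory, as grub() does by default
    return os.path.join(directory or os.getcwd(), name)


print(default_pdf_path('http://www.slideshare.net/author/my-slide', '/tmp'))
```

A trailing slash on the URL does not change the result, because empty segments are discarded before the author and slug are picked out.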
>>> s.grub('/my_local_path/my_slide.pdf')
'/my_local_path/my_slide.pdf'
You can also read the title, the author, and (once the presentation has been downloaded) the output file name:
>>> s.title
u'My Slide'
>>> s.author
u'The Author'
>>> s.filename
u'my-slide-by-author'
The slides' markup is also exposed as the slides_markup property. Passing part of it to grub() as the second argument selects which images are converted to PDF. This is helpful when you need only some of the slides, though it takes more work.
As of v2.7, the slides are sorted.
>>> # get entire markup
>>> markup = s.slides_markup
>>> markup
(u'<img ...>', u'<img ...>', ...)
>>> # grab first five slides
>>> s.grub('', markup[:5])
u'/current_working_directory/my-slide-by-author.pdf'
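Slicing only covers contiguous runs. To pick arbitrary slides, the markup tuple can be filtered first; `pick_slides` below is a hypothetical helper, and it assumes that, since the slides are sorted as of v2.7, position n-1 in slides_markup holds slide n.

```python
def pick_slides(markup, numbers):
    """Hypothetical helper: return markup for 1-based slide numbers, in order given."""
    picked = []
    for n in numbers:
        # Reject 0 and negatives: markup[-1] would silently pick the last slide
        if not 1 <= n <= len(markup):
            raise IndexError('slide number out of range: {}'.format(n))
        picked.append(markup[n - 1])
    # Return a tuple, the same type as the slice passed to grub() above
    return tuple(picked)


# Stand-in markup so the helper can be tried without a network connection
fake_markup = tuple('<img src="slide-{}.jpg">'.format(i) for i in range(1, 11))
print(pick_slides(fake_markup, [1, 3, 5]))

# With a real presentation (not run here):
# s.grub('', pick_slides(s.slides_markup, [1, 3, 5]))
```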
v2.4 adds log messages throughout SlideGrubber. The logging still needs a closer review.
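If those messages go through Python's standard logging module (an assumption; the README does not name the logger), they can be quieted with the usual configuration. The logger name 'slidegrubber' below is a guess based on the package name.

```python
import logging

# Assumed logger name, guessed from the package name; check the source
slidegrubber_log = logging.getLogger('slidegrubber')

# Hide the package's DEBUG/INFO messages; warnings and errors still pass
slidegrubber_log.setLevel(logging.WARNING)
```

Child loggers such as 'slidegrubber.<module>' inherit this threshold unless they set their own level. To see everything instead, set the level to logging.DEBUG and call logging.basicConfig(level=logging.DEBUG) so the records have a handler.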