This is a Python script that converts a PDF to a series of HTML <img> tags with alt texts. This makes a presentation suitable for embedding in a blog post and for reading on a mobile device.
Example Workflow:
- Export the presentation from Apple Keynote to a PDF file. In the Export dialog, untick include date and add borders around slides.
- Run the script against the generated PDF file to convert it to a series of JPEG files and an HTML snippet with <img> tags
- Optionally, the script adds a full URL prefix to <img src>, so you don't need to manually link the images to the absolute URL of your hosting service
- Copy-paste the generated HTML into your blog post
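The HTML part of this workflow can be illustrated with a minimal sketch. It is not the code of pdf2html.py: the function name build_img_snippet, the file names, and the example.com prefix are assumptions made up for the illustration. It only shows how a list of slide images and alt texts could be turned into <img> tags with an optional URL prefix.

```python
from html import escape


def build_img_snippet(image_names, alt_texts, url_prefix=""):
    """Return one <img> tag per slide, joined by newlines.

    Hypothetical helper: the real script may build its output differently.
    """
    tags = []
    for name, alt in zip(image_names, alt_texts):
        # Prepending the prefix turns a bare file name into an absolute URL.
        src = url_prefix + name
        tags.append(
            '<img src="{}" alt="{}">'.format(
                escape(src, quote=True), escape(alt, quote=True)
            )
        )
    return "\n".join(tags)


snippet = build_img_snippet(
    ["slide1.jpg", "slide2.jpg"],
    ["Title slide", 'Why "alt" texts matter'],
    url_prefix="http://example.com/uploads/",
)
print(snippet)
```

Escaping the alt text matters because slide text often contains quotes or angle brackets, which would otherwise break the attribute.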
Tested with Apple Keynote exported PDFs, but the approach should work for any PDF content.
See example blog post and presentation.
Dependencies (OSX):
sudo port install ghostscript
Please note that Ghostscript 9.06 crashed for me during the export. Please upgrade to 9.07.
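If you want to guard against the crashing version, a small check along the following lines may help. This is a sketch and not part of the script: the function names are hypothetical, and treating every version below 9.07 as needing an upgrade is an assumption (only 9.06 is known to crash). The version string itself can be obtained by running gs --version.

```python
def parse_version(text):
    """Turn a version string such as '9.06' into a comparable tuple (9, 6)."""
    return tuple(int(part) for part in text.strip().split("."))


def needs_upgrade(version_text, minimum=(9, 7)):
    """Return True if the version is older than the assumed minimum 9.07."""
    return parse_version(version_text) < minimum


# Version strings as printed by "gs --version".
for candidate in ("9.06", "9.07"):
    print(candidate, "needs upgrade:", needs_upgrade(candidate))
```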
Setting up virtualenv and installing the code:
git clone xxx
cd pdf-presentation-to-html
curl -L -o virtualenv.py https://raw.github.com/pypa/virtualenv/master/virtualenv.py
python virtualenv.py venv
. venv/bin/activate
pip install pyPdf
Example:
. venv/bin/activate
python pdf2html.py test.pdf output
Advanced example:
. venv/bin/activate
python pdf2html.py test.pdf output
Even more advanced example with hardcoded URL:
GHOSTSCRIPT=/usr/local/bin/gs python pdf2html.py test.pdf output http://opensourcehacker.com/wp-content/uploads/wpd2013/
Then upload the files to the server for WordPress to access:
rsync -av pycon2014 yourserver.example.com:/srv/yoursite/wordpress/wp-content/uploads
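The GHOSTSCRIPT variable in the hardcoded URL example points the script at a specific Ghostscript binary. How pdf2html.py handles it internally is not shown here, so the following is only a sketch under assumptions: the function name ghostscript_command, the fallback to plain gs, the 150 dpi resolution, and the slide%d.jpg naming pattern are all made up for illustration. The Ghostscript options themselves (-dNOPAUSE, -dBATCH, -sDEVICE=jpeg, -r, -sOutputFile) are standard.

```python
import os


def ghostscript_command(pdf_path, output_dir, env=None):
    """Build (but do not run) a Ghostscript command that renders JPEG slides.

    Hypothetical helper: the binary comes from the GHOSTSCRIPT environment
    variable and falls back to "gs" on the PATH when it is not set.
    """
    env = os.environ if env is None else env
    gs_binary = env.get("GHOSTSCRIPT", "gs")
    # %d is expanded by Ghostscript to the page number of each slide.
    output_pattern = os.path.join(output_dir, "slide%d.jpg")
    return [
        gs_binary,
        "-dNOPAUSE",      # do not pause between pages
        "-dBATCH",        # exit after the last page
        "-sDEVICE=jpeg",  # write JPEG images
        "-r150",          # assumed resolution in dpi
        "-sOutputFile=" + output_pattern,
        pdf_path,
    ]


command = ghostscript_command(
    "test.pdf", "output", env={"GHOSTSCRIPT": "/usr/local/bin/gs"}
)
print(" ".join(command))
```

The resulting list could be passed to subprocess.check_call to perform the conversion.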