bradmontgomery/word2html

word2html doesn't work

Closed this issue · 4 comments

When I run word2html 1.docx, I get this error:
RuntimeError: Invalid input format! Got "docx" but expected one of these: docbook, haddock, html, json, latex, markdown, markdown_github, markdown_mmd, markdown_phpextra, markdown_strict, mediawiki, native, opml, rst, textile

Hi, thanks for opening the issue. Can you answer the following?

  1. What version of pandoc do you have installed? (run pandoc -v)
  2. Check the input formats that pypandoc reports: python -c "import pypandoc; print(pypandoc.get_pandoc_formats()[0])". If you don't see docx in that list, then for some reason your version of pandoc doesn't know how to read Word documents.
  3. Just for completeness, what version of python and which operating system are you using?

This project (word2html) is just a convenient wrapper around pypandoc, which is a Python wrapper around pandoc. If either of those isn't working, then this project won't work either. Hopefully the above can help me troubleshoot this with you.
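The check in step 2 can also be scripted. The following is a minimal sketch, not part of word2html: the helper names supports_input_format and installed_input_formats are made up for illustration, and the only pypandoc call used is get_pandoc_formats(), the same one as in step 2. The REPORTED list is copied from the error message above.

```python
def supports_input_format(fmt, input_formats):
    """Return True if `fmt` appears in the list of pandoc input formats."""
    return fmt in input_formats


def installed_input_formats():
    """Ask pypandoc which input formats the local pandoc can read.

    Hypothetical helper: pypandoc is imported lazily so that
    supports_input_format() stays usable without it installed.
    """
    import pypandoc
    # get_pandoc_formats() returns (input_formats, output_formats).
    return pypandoc.get_pandoc_formats()[0]


# Input formats named in the RuntimeError message from this issue.
REPORTED = ['docbook', 'haddock', 'html', 'json', 'latex', 'markdown',
            'markdown_github', 'markdown_mmd', 'markdown_phpextra',
            'markdown_strict', 'mediawiki', 'native', 'opml', 'rst',
            'textile']

if __name__ == "__main__":
    print(supports_input_format('docx', REPORTED))
```

On a machine with pypandoc installed, supports_input_format('docx', installed_input_formats()) runs the same check against the local pandoc instead of the copied list.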

Hi

  1. pandoc 1.12.3.1
  2. $ python -c "import pypandoc; print(pypandoc.get_pandoc_formats()[0])"
    [u'docbook', u'haddock', u'html', u'json', u'latex', u'markdown', u'markdown_github', u'markdown_mmd', u'markdown_phpextra', u'markdown_strict', u'mediawiki', u'native', u'opml', u'rst', u'textile']
  3. Python 2.7.5, CentOS Linux release 7.3.1611

Well, your version of pandoc is a little old (the current release is 1.19.2.x), and it definitely doesn't support reading .docx files. Otherwise you'd see docx in the list output in step 2.
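A version comparison like the one above can be sketched in a few lines. This is only an illustration: the function names are hypothetical, and the (1, 13) threshold is an assumption based on my recollection that the docx reader first shipped in pandoc 1.13. Checking the format list from step 2 remains the more reliable test.

```python
import re

# Assumption (not confirmed in this thread): docx reading arrived in pandoc 1.13.
ASSUMED_MIN_DOCX_VERSION = (1, 13)


def parse_pandoc_version(first_line):
    """Turn the first line of `pandoc -v` (e.g. 'pandoc 1.12.3.1') into a tuple of ints."""
    match = re.search(r'(\d+(?:\.\d+)+)', first_line)
    if match is None:
        raise ValueError("no version number found in: %r" % first_line)
    return tuple(int(part) for part in match.group(1).split('.'))


def can_read_docx(version, minimum=ASSUMED_MIN_DOCX_VERSION):
    """Compare version tuples element by element against the assumed minimum."""
    return version >= minimum


if __name__ == "__main__":
    reported = parse_pandoc_version("pandoc 1.12.3.1")
    print(reported, can_read_docx(reported))
```

Tuples are used instead of string comparison so that 1.12 sorts below 1.13 and 1.9 would sort below 1.12.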

I'm not 100% sure how to fix this. Any chance you could try upgrading pandoc to see if that works?

Since I think this is a problem with pandoc, and not with word2html directly, I'm closing this issue. Feel free to reopen if this doesn't work with a pandoc version that supports .docx files.