How to get XML from docx file
theasteve opened this issue · 2 comments
I'm trying to convert a docx file into PDF. My plan was to convert the docx file into an HTML file, and then convert the HTML into PDF. However, the result of this process wasn't what I expected.
testing.pdf
This is what the output looks like after the process described above. Here is a link to the original docx file:
https://www.dropbox.com/s/f1klwguv4r9iyje/testing.docx?dl=0
I think Word documents use XML, so converting the docx file to XML and then into PDF might improve how the document is rendered. (You might have a better direction on this.)
So far I have doc = Docx::Document.open('testing.docx')
When I try to get the XML from the document, I get nil:
[61] pry(#<PDFProducer>)> doc.xml
=> nil
Is it possible to get the XML from a Word document, or am I wrong in assuming that Word documents use XML?
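For reference, this is how I currently generate the HTML file:
doc = Docx::Document.open('testing.docx')
# Write the HTML rendering of the document to testing.html
File.open("testing.html", 'wb') do |f|
  f << doc.to_html
end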
@unixmonkey Just saw your answer; I had just updated my post. Should I close this issue and open a new one, and bring the old question back so that it matches your answer? Yes, your answer is correct; I came across it earlier.