CenterForOpenScience/pydocx

Question: How to turn off included style tag in html head element?

Hello PyDocx team,

I have the code below to generate HTML files from the Word files in a folder, and it is working fine.
How can I turn off the embedded style element in the head of the output HTML file, and possibly replace it with a link tag pointing to an external CSS file?

import codecs
import os

from pydocx import PyDocX

sourcedir = os.listdir('sourcedocx/')
# Iterate over every file in the source directory
for file in sourcedir:
    # Convert the document; the with block closes the input file afterwards
    with open('sourcedocx/' + file, 'rb') as docx:
        html = PyDocX.to_html(docx)
    # Write the result to a new file in the output directory
    with codecs.open('outdocxhtml/' + file + '.html', 'w', 'utf-8') as f:
        f.write(html)
print('Done writing html files')

I am new to Python and would like to apologize for any inconvenience.
Thanks for your help.

Hello,

You can extend PyDocX and override the head method to use an external CSS sheet. As for removing styles from individual elements, I believe most of them add classes rather than inline styles. For any element, though, you can override the handler for that tag and have it drop any attribute added to the tag that you don't want to include.
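For anyone who would rather not subclass the exporter, here is a rough sketch of a different route: post-processing the HTML string that PyDocX.to_html returns. This is not the head-method override described above, and it does not touch any PyDocX internals. It uses only the standard library, and the function name use_external_css and the file name styles.css are made up for illustration. It assumes the output contains an ordinary style element and a closing head tag.

```python
import re
from html import escape

# Matches a whole <style ...>...</style> element, case-insensitively and across lines.
STYLE_BLOCK = re.compile(r"<style\b[^>]*>.*?</style\s*>", re.IGNORECASE | re.DOTALL)
# Matches the closing head tag so the link can be placed just before it.
HEAD_CLOSE = re.compile(r"</head\s*>", re.IGNORECASE)


def use_external_css(document, css_href):
    """Drop embedded <style> elements and link css_href from <head> instead.

    Hypothetical helper, not part of PyDocX.
    """
    stripped = STYLE_BLOCK.sub("", document)
    link = '<link rel="stylesheet" href="{}">'.format(escape(css_href, quote=True))
    # Insert the link before the first </head>; if there is no head, only the
    # style elements are removed and nothing is inserted.
    return HEAD_CLOSE.sub(lambda m: link + m.group(0), stripped, count=1)
```

In the loop from the question, this would mean writing use_external_css(html, 'styles.css') to the output file instead of html. A regular expression is a blunt tool for HTML, so this only makes sense when the converter's output is predictable. Overriding the exporter's head method remains the cleaner fix.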

Thanks for your reply.

Is there any example code for inspiration?

Our documentation has some examples.

Thanks so much.