Ignoring alt text when converting from docx to txt
caphefalumi opened this issue · 1 comment
caphefalumi commented
Currently, when I convert from docx to txt, the alt text of each image is included along with the paragraphs, as something like "[ALT TEXT]". How do I exclude the alt text?
Here is my code:
pypandoc.convert_file(docx_path, 'plain', extra_args=['--wrap=none'], outputfile='output.txt')
JessicaTegner commented
From the pandoc user guide:
A link immediately preceded by a ! will be treated as an image. The link text will be used as the image’s alt text:
![la lune](lalune.jpg "Voyage to the moon")
![movie reel]
[movie reel]: movie.gif
Extension: implicit_figures
An image with nonempty alt text, occurring by itself in a paragraph, will be rendered as a figure with a caption. The image’s alt text will be used as the caption.
![This is the caption](/url/of/image.png)
[...]
If you just want a regular inline image, just make sure it is not the only thing in the paragraph. One way to do this is to insert a nonbreaking space after the image:
![This image won't be a figure](/url/of/image.png)\
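The passage above explains where the alt text comes from, but not how to drop it from plain output. One possible approach, sketched here under assumptions, is to remove image elements with a pandoc Lua filter before the plain writer runs. The helper names (write_image_filter, build_extra_args, docx_to_plain_without_images) and the file name drop_images.lua are hypothetical and only for illustration. The sketch relies on pandoc's --lua-filter option and has not been checked against the document from the question.

```python
import os
import tempfile

# Lua filter source: returning an empty list from the Image handler
# deletes every image element (and with it the alt text) from the document.
LUA_DROP_IMAGES = """\
function Image(el)
  return {}
end
"""


def write_image_filter(directory):
    """Write the Lua filter into `directory` and return its path (hypothetical helper)."""
    path = os.path.join(directory, "drop_images.lua")
    with open(path, "w", encoding="utf-8") as fh:
        fh.write(LUA_DROP_IMAGES)
    return path


def build_extra_args(filter_path):
    """Keep the original --wrap=none flag and add the Lua filter."""
    return ["--wrap=none", f"--lua-filter={filter_path}"]


def docx_to_plain_without_images(docx_path, out_path):
    """Convert a docx file to plain text with images filtered out."""
    # Imported here so the helpers above can be used without pypandoc installed.
    import pypandoc

    with tempfile.TemporaryDirectory() as tmp:
        filter_path = write_image_filter(tmp)
        pypandoc.convert_file(
            docx_path,
            "plain",
            extra_args=build_extra_args(filter_path),
            outputfile=out_path,
        )
```

Usage would be docx_to_plain_without_images(docx_path, 'output.txt'). If an image is rendered as a figure, its caption may still appear in the output, in which case the filter would likely need an additional handler for that element.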