ShayHill/docx2python

Extracted image looks different from the one displayed on Word2021

songyuc opened this issue · 4 comments

Hi guys,
I am new to docx2python and am learning to use it.
I have found that an extracted image looks different from the one displayed in Word 2021.
Image A, as displayed in Word 2021:
image
Image B extracted with python:
image

They look different: A appears to be only a part of B.
How can I solve this?

Your answer and guide will be appreciated!

Can you post an example file?


Here is the file, https://docs.google.com/document/d/1kUnmt8HfXDjr6OQN9aiiBeFSJAsrfnk7/edit?usp=sharing&ouid=117403696964406551444&rtpof=true&sd=true

Thank you for your patience. I have examined the file. A docx file keeps its images inside an internal folder. In this case, the image is "image1.tiff", which is your "Image B". Word crops this image when displaying it, so you only see the upper portion ("Image A"). The stored image file itself is not cropped. The only way to replicate what Word shows would be to alter the "image1.tiff" image file, which is outside the scope of docx2python.

Thank you for reaching out, however. And thank you for using docx2python.
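
For anyone who wants to reproduce the cropping described above outside of docx2python, here is a minimal sketch using only the Python standard library. It is not part of the docx2python API. It assumes the crop is stored as an `<a:srcRect>` element in `word/document.xml`, with `l`, `t`, `r`, `b` attributes given as integers in thousandths of a percent (the usual form in files saved by Word). The function names `read_src_rects` and `crop_box` are made up for this illustration.

```python
import zipfile
import xml.etree.ElementTree as ET

# DrawingML namespace; <a:srcRect> holds the crop as l/t/r/b attributes.
A_NS = "http://schemas.openxmlformats.org/drawingml/2006/main"


def read_src_rects(docx_path):
    """Return one dict of crop fractions (0.0 to 1.0) per <a:srcRect> found.

    Hypothetical helper: only looks at the main document part, so images
    in headers, footers, or other parts are not covered.
    """
    with zipfile.ZipFile(docx_path) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    rects = []
    for rect in root.iter(f"{{{A_NS}}}srcRect"):
        # Assumption: integer values in thousandths of a percent.
        # A missing side means no crop on that side.
        rects.append({side: int(rect.get(side, "0")) / 100000 for side in "ltrb"})
    return rects


def crop_box(width, height, rect):
    """Convert crop fractions into a (left, upper, right, lower) pixel box."""
    return (
        round(width * rect["l"]),
        round(height * rect["t"]),
        round(width * (1 - rect["r"])),
        round(height * (1 - rect["b"])),
    )
```

The resulting box has the shape expected by Pillow's `Image.crop`, so with Pillow installed (an extra dependency, not something the thread uses) the extracted file could be cropped with something like `img.crop(crop_box(*img.size, rect))`. Matching each `srcRect` to its image file is left out here; with a single image in the document, as in this case, there is only one to match.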