ankushshah89/python-docx2txt

A pure python based utility to extract text and images from docx files.
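As a rough illustration of what extracting text and images from a docx file involves, here is a minimal standard-library sketch. It is not docx2txt's own code. It relies only on the fact that a .docx file is a zip archive whose main body lives in word/document.xml and whose embedded images are stored under word/media/. The helper names extract_text and list_images are hypothetical, and headers, footers, footnotes, and list numbering are deliberately ignored.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by word/document.xml
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def extract_text(docx_file):
    """Return body text, one line per paragraph (hypothetical helper).

    docx_file may be a path or a binary file-like object.
    """
    with zipfile.ZipFile(docx_file) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    paragraphs = []
    for para in root.iter(f"{{{W_NS}}}p"):
        # Join every text run (<w:t>) found inside this paragraph
        text = "".join(node.text or "" for node in para.iter(f"{{{W_NS}}}t"))
        paragraphs.append(text)
    return "\n".join(paragraphs)


def list_images(docx_file):
    """Return archive names of embedded media files (hypothetical helper)."""
    with zipfile.ZipFile(docx_file) as archive:
        return [n for n in archive.namelist() if n.startswith("word/media/")]
```

Because zipfile.ZipFile accepts either a path or a file-like object, a sketch like this works on in-memory bytes as well as on files on disk. A file that is not a valid zip archive raises zipfile.BadZipFile.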

Python · MIT

Issues

docx2text - unwrapping zip - fails and crashes
#48 opened a year ago by ventz
1
Is there the possibility to pass the entire file instead of the file name in the process() function?
#33 opened 4 years ago by Zast996
1
Python does not recognize the italic font in Docx
#46 opened a year ago by me-suzy
0
Page numbers for each page
#44 opened 2 years ago by Higgs32584
3
Fails to extract .emf image from the .docx document
#45 opened 2 years ago by sand1k
0
text = docx2txt.process("file.docx", "/tmp/img_dir") File not found
#40 opened 2 years ago by Charlie77-E
1
Add license classifier to package metadata
#43 opened 2 years ago by C-nit
0
how to maintain the format of the File
#42 opened 2 years ago by shashankmuralidhar
0
The result contains extra newline
#39 opened 3 years ago by DanteAndroid
0
Exception when using docx2txt
#38 opened 3 years ago by DanteAndroid
1
difficulty with opening file- updated
#37 opened 3 years ago by mmiesner
1
Reading .doc file format
#29 opened 5 years ago by bpkapkar
2
Two argument process gives error (but one argument is fine)
#36 opened 3 years ago by demongolem-biz
0
strikethrough strings are not removed as of now
#35 opened 4 years ago by sdeepmars
0
It does not convert numbered items
#12 opened 8 years ago by robo3945
10
Have an option to include repeating images in extracted folder
#32 opened 4 years ago by rahulchowdhuryce
1
Can I print all contents of a doc file including images as well as text with images in its original position?
#31 opened 4 years ago by Aniket573
1
docx created with word online
#16 opened 7 years ago by burbma
5
BadZipFile: File is not a zip file (while iterating through directory of docx files)
#30 opened 5 years ago by youssefavx
1
Save list numeration
#24 opened 6 years ago by goshulina
4
Read text and associated hyperlink paragraph by paragraph
#28 opened 5 years ago by sebastiansajie
0
Image Paths in generated Text
#21 opened 7 years ago by rushikesh988
2
Extract footnote?
#27 opened 5 years ago by vivlio-kumihan
7
formats
#26 opened 6 years ago by red-frog
1
py3 support
#11 opened 6 years ago by deanmalmgren
6
Thank you! Thanks!
#23 opened 6 years ago by lcl1995225
0
h
#25 opened 6 years ago by danabner
0
extract Images?
#13 opened 8 years ago by lxj0276
3
Can't pass file name as an argument when used in a function.
#20 opened 7 years ago by bharrath22
1
Don't work how I expected
#17 opened 7 years ago by juniordiasjfd
4
How to differentiate between header text vs paragraph text?
#15 opened 7 years ago by tbell511
1
How to extract hyperlinks?
#9 opened 8 years ago by badbye
3
excuse me,it can't run from python
#7 opened 8 years ago by Mengqi777
3
text = docx2txt.process("file.docx", "/tmp/img_dir") Function not working
#6 opened 9 years ago by GarrettHartley
6
Error on print
#5 opened 9 years ago by MannyGrewal
5
Is this a working application?
#3 opened 9 years ago by GarrettHartley
3
Directory is not empty
#1 opened 9 years ago by superlou
3