dbashford/textract

Node.js module for extracting text from HTML, PDF, DOC, DOCX, XLS, XLSX, CSV, PPTX, PNG, JPG, GIF, RTF and more!

HTML · MIT License

Issues

CVE-2021-33623
#240 opened 4 months ago by david-nikolai-mueller · 0 comments
CVE-2022-39353
#239 opened 4 months ago by david-nikolai-mueller · 0 comments
Update got Dependency to be Compatible with Current Node.js Versions
#238 opened 4 months ago by CONSUMERTHOUGHTS · 0 comments
fromUrl doesn't work when passing urls to PDF files?
#237 opened a year ago by davidawad · 0 comments
CVE-2021-3803
#228 opened 3 years ago by prafullkulkarni · 1 comment
How to get the total page count of a file
#236 opened a year ago by venkatesh-pro · 0 comments
unrtf not throwing error
#235 opened a year ago by divyeshrajpura4114 · 0 comments
Accessing a 20 MB doc/docx file produces no response
#232 opened 2 years ago by dengzhenhai · 0 comments
OCR for PDFs
#231 opened 3 years ago by boazl-cyera · 0 comments
Get picture from doc?
#229 opened 3 years ago by bigbird231 · 0 comments
CVE-2021-23362
#217 opened 3 years ago by prafullkulkarni · 5 comments
CVE-2021-33623
#220 opened 3 years ago by prafullkulkarni · 2 comments
CVE-2021-21366
#215 opened 4 years ago by OlivierB-OB · 5 comments
CVE-2021-32014, CVE-2021-32012, and CVE-2021-32013
#227 opened 3 years ago by prafullkulkarni · 0 comments
CVE-2021-23413
#226 opened 3 years ago by prafullkulkarni · 0 comments
CVE-2021-33587
#225 opened 3 years ago by prafullkulkarni · 0 comments
Regular Expression Denial of Service in marked & xmldom dependency
#224 opened 3 years ago by sgadekar81 · 0 comments
Abandoned project - viable forks or alternatives
#221 opened 3 years ago by nosferatu500 · 0 comments
CVE-2021-23362
#216 opened 3 years ago by OlivierB-OB · 1 comment
'pdftotext' does not appear to be installed
#213 opened 4 years ago by codingalien-d · 1 comment
What is the suggested architecture for turning this into an API?
#200 opened 5 years ago by johnernest02 · 3 comments
Method to check if mime type is supported
#212 opened 4 years ago by ari62 · 0 comments
The docx extractor missed all the Emojis
#211 opened 4 years ago by andyli · 0 comments
bug: update the j library dependency
#186 opened 5 years ago by qinst64 · 4 comments
Support reading docx files in flat opc format
#207 opened 4 years ago by jessrosenfield · 0 comments
image inside pdf
#183 opened 6 years ago by deepdil-sp · 5 comments
Security: update package use of marked library
#202 opened 4 years ago by camsjams · 1 comment
Support srt (application/x-subrip) files
#201 opened 5 years ago by altwohill · 0 comments
preserveOnlyMultipleLineBreaks does not work on PDF when eol:'dos'
#199 opened 5 years ago by thegoatherder · 0 comments
Memory maxed out for a 70-page document
#187 opened 5 years ago by tiholic · 1 comment
Word does not appear to really be a .doc file, nodejs application
#196 opened 5 years ago by anthonyli · 0 comments
Error: Incorrect parameters passed to textract.
#198 opened 5 years ago by sunnysharma03 · 1 comment
Please update marked
#194 opened 5 years ago by ram-you · 2 comments
Not able to extract text from a file using the Python package
#195 opened 5 years ago by swamyaddala · 0 comments
Header and footer missing in .odt
#192 opened 5 years ago by fsandx · 1 comment
Regular Expression Denial of Service in marked dependency
#190 opened 5 years ago by madnight · 4 comments
not able to change language
#191 opened 5 years ago by swamyaddala · 0 comments
Textract Returns Null Value
#185 opened 5 years ago by Jodyadriene · 0 comments
Get meta info: page count
#170 opened 6 years ago by raulromanp · 1 comment
newline in between sentences which is not a linebreak
#184 opened 6 years ago by deepdil-sp · 0 comments
AWS S3 bucket file gives "does not exist" error
#178 opened 6 years ago by rmr-code · 6 comments
Extract Hyperlink from images
#182 opened 6 years ago by deepdil-sp · 0 comments
No PPT support
#181 opened 6 years ago by carlosvini · 0 comments
Please update the npm package
#180 opened 6 years ago by apporoad · 1 comment
How to use a regex on the extracted text?
#177 opened 6 years ago by alexauvray · 0 comments
Problems with garbled characters in docx files
#176 opened 6 years ago by uptown · 1 comment
Equations in docx, pdf extraction
#174 opened 6 years ago by zscrca · 0 comments
Temporary files are not cleaned up
#171 opened 6 years ago by edelache · 0 comments
Support for msg files
#169 opened 6 years ago by roydiasbytes · 0 comments
Error trying to read larger files
#168 opened 6 years ago by lic001rabby · 1 comment