Extract text from doc files (Windows 10 64-bit)
SHocker-Yu opened this issue · 10 comments
"DOC extraction requires antiword be installed, link, unless on OSX in which case textutil (installed by default) is used."
OS: Windows 10 64-bit
I tried to install antiword.exe but it failed, and I don't know how to solve this problem...
Have you added the directory containing your antiword.exe file to the PATH environment variable?
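One quick way to check this from Node is sketched below. It is only an illustration: `findOnPath` is a hypothetical helper name, not part of textract, and it assumes the binary is named `antiword.exe`.

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical helper: return the first directory listed in pathValue that
// contains fileName, or null if none does.
function findOnPath(fileName, pathValue = process.env.PATH || '', delimiter = path.delimiter) {
  for (const dir of pathValue.split(delimiter)) {
    // Skip empty entries, which appear when PATH has a trailing delimiter.
    if (dir && fs.existsSync(path.join(dir, fileName))) {
      return dir;
    }
  }
  return null;
}

console.log(findOnPath('antiword.exe') || 'antiword.exe was not found on PATH');
```

If this prints a directory, a newly opened terminal should be able to start antiword from any location.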
@zzzwx Thanks for your reply. antiword does not support Windows.
@SHocker-Yu I am using it on Windows (7 and 10).
Someone kindly compiled it for Windows; you can get it here: http://www-stud.rbi.informatik.uni-frankfurt.de/~markus/antiword/
@zzzwx I appreciate your kind reply.
I downloaded it this time, but when I try to run antiword.exe, it flashes back.
OS: Windows10
Have you come across this situation?
Could you tell me how to get it running successfully?
@SHocker-Yu What do you mean by "flash back"?
Here are the steps I followed to make it work on Windows:
0/ modify textract/lib/extractors/doc.js to fix a bug reported in a GitHub issue
- if ( error.toString().indexOf( 'is not a Word Document' ) ) {
+ if ( error.toString().indexOf( 'is not a Word Document' ) > 0 ) {
1/ download the Windows binary
2/ add the antiword directory to the Windows PATH environment variable
=> at this point it worked, but only when the path to the doc file contained no spaces
3/ modify textract/lib/extractors/doc.js again to add quotes so that it reads the input path as is
- var escapedPath = filePath.replace( /\s/g, '\\ ' );
+ var escapedPath = filePath/*.replace( /\s/g, '\\ ' )*/;
- exec( 'antiword ' + escapedPath,
+ exec( 'antiword "' + escapedPath + '"',
=> at this point it worked for every path
4/ modify textract/lib/extractors/doc.js one last time to handle UTF-8 encoding of the output text
- exec( 'antiword "' + escapedPath + '"',
+ exec( 'antiword -m UTF-8.txt "' + escapedPath + '"',
=> and after that it worked well all the time :)
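The steps above can also be sketched in a way that avoids manual quoting altogether. This is a rough illustration, not the code textract ships: `isNotWordDocument`, `antiwordArgs` and `extractDocText` are hypothetical names, and it assumes antiword is reachable through PATH. Two points it tries to show: `indexOf` returns -1 (which is truthy) when the text is absent, so it should be compared explicitly or replaced by `includes`; and `execFile` passes each argument to the program as-is, so a path with spaces needs neither escaping nor added quotes.

```javascript
const { execFile } = require('child_process');

// indexOf() returns -1 when the substring is missing, and -1 is truthy,
// so a bare `if (str.indexOf(...))` fires on almost every error.
// includes() (or `indexOf(...) !== -1`) also covers a match at index 0.
function isNotWordDocument(error) {
  return String(error).includes('is not a Word Document');
}

// Build the argument list: -m UTF-8.txt selects the UTF-8 mapping file,
// as in step 4. The path is a separate argument, so spaces are preserved.
function antiwordArgs(filePath) {
  return ['-m', 'UTF-8.txt', filePath];
}

// Hypothetical wrapper: run antiword and hand the extracted text to callback.
function extractDocText(filePath, callback) {
  execFile('antiword', antiwordArgs(filePath), (error, stdout) => {
    if (error) {
      if (isNotWordDocument(error)) {
        callback(new Error('file "' + filePath + '" is not a Word document'));
        return;
      }
      callback(error);
      return;
    }
    callback(null, stdout);
  });
}
```

A call would look like `extractDocText('C:\\my docs\\report.doc', (err, text) => { ... })`, where the path is only an example.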
hope this helps you
@zzzwx I really appreciate your kindness. Sorry about my poor English: "flash back" means "crash". I have had to work all day these past few days, so I am sorry for replying so late. I have read your reply, and I will try it and then tell you the result.
Best wishes.
@zzzwx It works! Thank you so much!!!
FYI, I've implemented the changes from above across a few different commits over the last few months (sorry for being so slow!).
Published as 2.1, thanks!
Hi @dbashford, thank you for your work.