morungos/node-word-extractor

Incorrect text when extracting fields

Closed this issue · 0 comments

When extracting the body of a Word file, fields are not handled properly. For example, extracting test01.doc can produce the following text:

If you find any bugs, or have any suggestions, please email me at ray@camdenfamily.com. You can also go to the BlogCFC Forums at HYPERLINK "http://ray.camdenfamilywww.coldfusionjedi.com/forums/forums.cfm?conferenceid=CBD210FD-AB88-8875-EBDE545BF7B67269" http://ray.camdenfamilywww.coldfusionjedi.com/forums/forums.cfm?conferenceid=CBD210FD-AB88-8875-EBDE545BF7B67269. You may also go to the BlogCFC Project page at HYPERLINK "http://ray.camdenfamily.com/projects/blogcfc" http://ray.camdenfamily.com/projects/blogcfc.riaforge.org. Lastly – you can read news about BlogCFC at http://www.blogcfc.com.

In practice, the original text is as follows:

[image: the original text as it appears in the document]

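The differences are mainly in the fields: the extracted text includes each field's instruction (the HYPERLINK code and its quoted URL) in addition to the text the field displays.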
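
One way to approach this is sketched below. It is not the library's actual fix. In the binary Word format, a field is delimited by three control characters: 0x13 (field begin), 0x14 (separator between the instruction and the result), and 0x15 (field end). Assuming the raw extracted text still contains those markers, a hypothetical helper (`stripFieldCodes` is a name invented here, not part of node-word-extractor) could drop the instruction and keep only the displayed result:

```javascript
// Hypothetical helper, not part of node-word-extractor's API.
// Assumes the raw text still contains Word's field marker characters.
const FIELD_BEGIN = '\x13'; // starts a field; the instruction follows
const FIELD_SEP = '\x14';   // separates the instruction from the result
const FIELD_END = '\x15';   // ends the field

function stripFieldCodes(text) {
  let out = '';
  // One entry per open field: true while we are in its instruction part.
  const stack = [];
  for (const ch of text) {
    if (ch === FIELD_BEGIN) {
      stack.push(true);
    } else if (ch === FIELD_SEP) {
      if (stack.length > 0) stack[stack.length - 1] = false;
    } else if (ch === FIELD_END) {
      stack.pop();
    } else if (!stack.includes(true)) {
      // Keep text only when no enclosing field is in its instruction part.
      out += ch;
    }
  }
  return out;
}

const raw = 'page at \x13 HYPERLINK "http://www.blogcfc.com" \x14http://www.blogcfc.com\x15.';
console.log(stripFieldCodes(raw));
```

The stack handles nested fields: text inside an inner field's result is still discarded if an outer field is in its instruction part. A field with no separator has no displayed result, so it is dropped entirely.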