This code is still at a very early stage of development, and I will need to find the time to work on it :)
A simple library for extracting text content from popular document types such as Word, PowerPoint, Excel, and PDF.
I started developing this module because I need it for another application I've been building, and I wanted something royalty-free and high-performance. I intend to add support for additional document types over time.
This initial version only supports Word/docx documents.
After you have installed Go, run this command to install the textract package:
go get github.com/chchench/textract
- 1.0.0 - Initial release; supports text extraction from Word 2007 and later (docx) files