textract

Golang module for extracting text from XML-based MS Office documents


This code is still at a very early stage of development. I will need to find the time to work on it :)

Description

A simple library for extracting text content from popular document types such as Word, PowerPoint, Excel, and PDF.

I started developing this module because I needed it for another application I've been building, and I was looking for something royalty-free and high-performance. I intend to add support for additional document types over time.

This initial version only supports Word/docx documents.

Installation

After installing Go, run this command to install the textract package:

go get github.com/chchench/textract

Roadmap

  • 1.0.0 - Initial release: supports text extraction from Word .docx files (Office 2007 and later)

License

The source code for this app is available under the MIT License.